Safety Engineer, Dad, Husband, Pilot, Musician. Not necessarily in that order.

Functional safety engineer, father, husband, pilot, musician. Not necessarily in that order.

  • 2 Posts
  • 26 Comments
Joined 2 years ago
Cake day: June 11th, 2023

  • Then why do you think manufacturers still list these failure rates (to be sure, it is marked as a limit, not an actual rate)? I’m not being sarcastic or facetious, but genuinely curious. Do you know for certain that it doesn’t happen regularly? During a scrub, these are the kinds of errors that are quietly corrected (although the scrub log would list them), as they are during normal operation (also logged).

    My theory is that they are being cautious and/or perhaps don’t have any high-confidence data that is more recent.


  • Hopfgeist@feddit.de to Selfhosted@lemmy.world: How to fix my ZFS pool mistakes
    10 months ago

    Bit error rates have barely improved since then, while disk capacities have grown. So the probability of an error when reading a substantial fraction of a disk is now higher than it was in 2013.

    But as others have pointed out, RAID is not, and never was, a substitute for a backup. Its purpose is to increase availability. If availability is critical to your enterprise, these things need to be taken into account, and it may turn out that raidz1 with 8 TB disks is fine for your application, or it may not. For private use, I wouldn’t fret, but make frequent backups.

    This article was not about total disk failure, but about the much more insidious undetected bit error.


  • Let’s do the math:

    The error rate of modern hard disks is usually on the order of one unrecoverable read error per 1E15 bits read; see, for example, the data sheet for the Seagate Exos 7E10. An 8 TB disk contains 6.4E13 (usable) bits, so when reading the whole disk you have roughly a 1 in 16 chance of hitting an unrecoverable read error. That is fine with zfs as long as all disks are working: the error correction will detect and correct it. But during a resilver it can be a big problem.
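
The arithmetic above can be sketched in a few lines of Python. This is only a rough model: it assumes bit errors are independent and occur exactly at the data-sheet limit of one per 1E15 bits, which is a worst-case reading of the spec. The function names and the `n_disks` parameter are my own, introduced for illustration.

```python
import math

URE_PER_BIT = 1e-15  # assumed: data-sheet limit, one unrecoverable error per 1E15 bits


def expected_errors(capacity_bytes: float, n_disks: int = 1) -> float:
    """Expected number of unrecoverable read errors when reading n_disks in full."""
    bits_read = capacity_bytes * 8 * n_disks
    return bits_read * URE_PER_BIT


def p_at_least_one_error(capacity_bytes: float, n_disks: int = 1) -> float:
    """Probability of at least one error, assuming independent bit errors.

    Computes 1 - (1 - p)^bits in a numerically stable way via log1p/expm1.
    """
    bits_read = capacity_bytes * 8 * n_disks
    return -math.expm1(bits_read * math.log1p(-URE_PER_BIT))


if __name__ == "__main__":
    eight_tb = 8e12  # 8 TB in bytes (decimal terabytes, as disk vendors count)
    print("expected errors, one full read:", expected_errors(eight_tb))
    print("P(at least one error), one disk:", p_at_least_one_error(eight_tb))
    # During a resilver every surviving disk is read, so the exposure scales
    # with the number of disks (3 here is just an example value).
    print("P(at least one error), 3 disks:", p_at_least_one_error(eight_tb, n_disks=3))
```

The expected error count for one full read is 6.4E13 / 1E15 = 0.064, which is where the "1 in 16" figure comes from; the exact probability under the independence assumption is slightly lower.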


  • I also use this, and it works great. Another downside is that when using the free service, others can just use subdomains of your registered domains. You can always deny it, but you have to do it manually. With the premium subscriptions you can prevent that automatically for a number of domains, depending on how much you pay.


  • To add, unlike “traditional” RAID, ZFS is also a volume manager and can have an arbitrary number of dynamic “partitions” sharing the same storage pool (literally called a “pool” in zfs). It also uses checksumming to determine if data has been corrupted. On redundant setups it will then quietly repair the corrupted parts with the redundant information while reading.
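
To illustrate the idea of checksum-based repair on read, here is a toy model in Python. It is not how ZFS is implemented: the `ToyMirror` class, its in-memory dictionaries, and the choice of SHA-256 are all assumptions made for the sketch. It only shows the principle that a checksum stored apart from the data lets the reader tell which copy is good and overwrite the bad one.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Checksum of a block (SHA-256 chosen only for this sketch)."""
    return hashlib.sha256(data).hexdigest()


class ToyMirror:
    """Hypothetical two-way mirror: two copies per block, checksum kept separately."""

    def __init__(self) -> None:
        self.copies = [{}, {}]  # one dict per simulated disk
        self.sums = {}          # block key -> expected checksum

    def write(self, key: str, data: bytes) -> None:
        for disk in self.copies:
            disk[key] = data
        self.sums[key] = checksum(data)

    def read(self, key: str) -> bytes:
        expected = self.sums[key]
        # Find the first copy whose checksum matches the stored one.
        good = next(
            (disk[key] for disk in self.copies if checksum(disk[key]) == expected),
            None,
        )
        if good is None:
            raise IOError(f"block {key!r}: all copies fail their checksum")
        # Quietly repair any copy that does not match the good data.
        for disk in self.copies:
            if disk[key] != good:
                disk[key] = good
        return good


if __name__ == "__main__":
    pool = ToyMirror()
    pool.write("block0", b"important data")
    pool.copies[0]["block0"] = b"important dat\x00"  # simulate silent corruption
    print(pool.read("block0"))                        # served from the good copy
    print(pool.copies[0]["block0"] == b"important data")  # bad copy was repaired
```

With only a single copy, the same checksum would still detect the corruption, but there would be nothing to repair it from.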




  • Hopfgeist@feddit.de (OP) to Selfhosted@lemmy.world: Different "geometries" for same disk model?
    1 year ago (edited)

    Sure, SCSI disks will show their defect lists (“primary defects”, as delivered by the factory, and grown defects, accumulated during use), and they all have a couple hundred primary defects. But I don’t see why that would affect the reported geometry, given that it is fictional anyway. And all disks have enough spare tracks to accommodate the defects and still offer the full specified number of sectors, even with a long list of grown defects. Incidentally, all the 4TB disks are still “perfect” in that they have no grown defects.

    And yes, ever since LBA, nobody has used sectors and cylinders for anything.


  • I’m not touching that post again. But a small rant about typesetting in lemmy: It seems there is no way whatsoever to put angle brackets in a “code” section. In an overzealous attempt to prevent HTML injection, everything in angle brackets is simply removed when posting (although it is still there in the preview). In normal text, you can use “&lt;”, but not inside “code” segments, where it will be retained verbatim.



  • If you’re as paranoid as me about data integrity, SAS drives on a host adapter card in “Initiator Target” (IT) mode, with the write cache on the disks disabled, are the safest option. This will degrade performance when writing many small files concurrently, but not as badly as with SATA drives (that’s for spinning disks, of course, not SSDs). With a good error-correcting redundant system such as ZFS you can probably get away with an enabled write cache in most cases. Until you can’t.


  • RAID is generally a good thing but don’t get complacent, follow the 3-2-1 method

    To expand on that: Redundant drive setup and backups serve completely different purposes. The only overlap is in case of a single disk failure, where RAID (or similar) may save the data.

    Redundancy is all about reducing downtime in case of single hardware failures. Backups not only protect you from data loss in case of multiple simultaneous failures, but also from accidental deletion. Failures that require restoration of data almost always involve downtime. In short: You always need backups (unless it’s strictly a local cache, and easily recreatable), but if you want high availability, redundancy may help.

    3-2-1-rule for backups, in case you’re unfamiliar: 3 copies of important data, on 2 different media, with 1 off-site.





  • It’s much more than a fan shroud. It’s a baffle specifically designed to guide cooling air over the CPU heatsinks and the RAM modules. This kind of airflow design is very common in servers. I wouldn’t trust it without, especially since the CPU heatsinks have no dedicated fans, but rely on the aerodynamic functioning of the baffle.

    And yes, I know they are very similar, in fact I am quite (but not absolutely) certain that they are identical except for the actual second CPU socket. It’s almost as if you didn’t read my post. Even the soldering points for the second CPU socket are there in the single-CPU T320. They certainly won’t have different PSU connectors. They even share part numbers for the case.




  • I don’t think there’s anything intrinsically wrong, but as far as I can see you are using only a single disk for the zfs pool, which will give you integrity checks (you will know when something is corrupted), but no way to fix it.

    Since this is, by today’s standards, a tiny disk at 100G, I assume this is just a test setup? I’m not sure zfs is particularly well suited for use inside virtual machines; I think it is better to have the host handle the physical data integrity, either by keeping the disk image on a zfs filesystem or by giving the VM a zfs volume (block device) directly.




  • What are the advantages of raid10 over zfs raidz2? It requires more disk space per usable space as soon as you have more than 4 disks, it doesn’t have zfs’s automatic checksum-based error correction, and it is, in general, less resilient against multiple disk failures. In the worst case, two lost disks can mean the loss of the whole array, whereas raidz2 can tolerate the loss of any 2 disks. Plus, with raid you still need an additional volume manager and filesystem.
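
The space and resilience comparison can be checked with a small Python sketch. It assumes raid10 means a stripe over two-way mirror pairs (disks 0+1, 2+3, and so on) and ignores metadata and padding overhead; the function names are mine, chosen for illustration.

```python
from itertools import combinations


def usable_fraction_raid10(n_disks: int) -> float:
    """Two-way mirrors: half of the raw capacity is usable, whatever the disk count."""
    return 0.5


def usable_fraction_raidz2(n_disks: int) -> float:
    """raidz2: two disks' worth of parity (overhead ignored in this sketch)."""
    return (n_disks - 2) / n_disks


def raid10_fatal_pair_fraction(n_disks: int) -> float:
    """Fraction of all two-disk failures that hit both halves of one mirror pair."""
    pairs = list(combinations(range(n_disks), 2))
    fatal = [p for p in pairs if p[0] // 2 == p[1] // 2]
    return len(fatal) / len(pairs)


if __name__ == "__main__":
    for n in (4, 6, 8):
        print(
            n, "disks:",
            "raid10 usable", usable_fraction_raid10(n),
            "raidz2 usable", round(usable_fraction_raidz2(n), 3),
            "raid10 fatal two-disk failures", round(raid10_fatal_pair_fraction(n), 3),
        )
```

At 4 disks both layouts give half the raw capacity; beyond that raidz2 pulls ahead. Under the same assumptions, the share of two-disk failures that destroy a raid10 works out to 1/(n-1), while for raidz2 it is zero.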