Backblaze, of course, but we aren’t talking about the probability of seeing a failure somewhere. We’re talking about one of your disks failing and, more importantly, data loss. A binomial probability distribution is a simplified way to model the scenario.
Let’s pretend all disks have a failure rate of 2% in year one.
If you have 2 disks, each disk’s probability of failing is 2%: the first disk in the array is 2%, and the second is 2%. If both disks fail in Z1, you lose data. This isn’t a 1% (half) chance, because the failure of one disk does not affect the other, but the risk is well below 2%.
So we use a binomial probability distribution to be more accurate: a .02 probability in year one, 2 trials, and 2 failures, which gives a probability of .0004 (0.04%) for data loss.
If you have 6 disks, each disk’s probability of failing is also 2%: the first disk in the array is 2%, the second is 2%, and so on. With a 6-disk Z2, three must fail to lose data, reducing your risk further (not to .08%, but lower than Z1).
So with a binomial probability distribution, this would be a .02 probability with 6 trials and 3 or more failures, which gives a cumulative probability of about .00015 (0.015%) for data loss.
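If you want to check the two numbers yourself, here is a minimal sketch in Python. It assumes the same simplification as above (independent failures at a flat 2% in year one, no rebuild window, no correlated failures), and the function name `prob_at_least` is just something I made up for illustration.

```python
from math import comb


def prob_at_least(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p): at least k of n disks fail."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p = 0.02  # assumed year-one failure rate per disk

# 2 disks, data loss needs both to fail
mirror_loss = prob_at_least(2, 2, p)

# 6 disks in Z2, data loss needs 3 or more to fail
z2_loss = prob_at_least(6, 3, p)

print(f"2-disk data loss:    {mirror_loss:.6f}")
print(f"6-disk Z2 data loss: {z2_loss:.6f}")
print(f"ratio:               {mirror_loss / z2_loss:.2f}x")
```

This should come out to about .0004 for the 2-disk case and about .00015 for the 6-disk Z2 case, so the 2-disk setup is roughly 2.6 times more likely to lose data under these assumptions.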
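That’s a significantly smaller risk: .00015 versus .0004, or roughly 2.6 times lower. The other interesting part is that the probability of any given disk failing is no higher in a 6-disk array than in a 2-disk array. It is 2% either way, because each disk’s failure rate is independent of how many other disks sit beside it. (The chance of seeing at least one failure somewhere in the array does go up, to about 11.4% versus about 4%, but a single failure is not data loss.) And this doesn’t even take into account that large disks have a greater failure rate to start.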
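To keep the two quantities apart (the fixed per-disk rate versus the chance of seeing at least one failure somewhere in the array), here is a small Python sketch under the same assumption of independent failures at 2%. The helper name `prob_any_failure` is hypothetical.

```python
def prob_any_failure(n, p):
    """Chance that at least one of n independent disks fails."""
    return 1 - (1 - p) ** n


p = 0.02  # assumed year-one failure rate per disk; the same for every disk

for n in (2, 6):
    print(f"{n} disks: per-disk {p:.2%}, at least one failure {prob_any_failure(n, p):.2%}")
```

The per-disk number never moves. The at-least-one-failure number is roughly 4% for 2 disks and roughly 11.4% for 6, but with Z2 a single failure costs you a resilver, not your data.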
I’m not saying mirroring two larger disks is a bad idea, just that there are tradeoffs and the risk is much greater.
No dark mode, no use.