Everything can break; you need to remind yourself of this whenever a vendor is talking to you.
Over the past year, I have personally come across the following:
1) A triple disk failure in a RAID6 environment which resulted in data loss
2) Data loss due to a bug on an array
3) Data loss due to an application bug
4) Data loss due to failed back-ups
Of these, only the first three are in any way partially excusable. Hardware will fail and software inevitably has bugs; we would hope that, in general, all scenarios are tested, but anyone who has been involved in testing knows that sometimes things get concessioned or simply missed.
But the last one is pretty much inexcusable; failed back-ups should be caught and fixed before they become a problem. The only acceptable SLA/OLA for a back-up environment is 100%; if you are willing to accept that there is a chance that you might lose that data, perhaps you shouldn't be backing it up in the first place.
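Catching failed back-ups early can be as simple as refusing to treat anything short of 100% as a pass. The following is only a minimal sketch of that idea: the BackupJob record, its fields and the host names are hypothetical, and a real back-up product would supply the job statuses through its own reporting.

```python
from dataclasses import dataclass


@dataclass
class BackupJob:
    client: str      # hypothetical host name
    succeeded: bool  # final status as reported by the back-up tool


def failed_jobs(jobs):
    """Return the jobs that need to be rerun or investigated."""
    return [job for job in jobs if not job.succeeded]


def success_rate(jobs):
    """Fraction of jobs that completed successfully (1.0 means 100%)."""
    if not jobs:
        return 0.0  # no jobs ran at all: treat that as a failure, not a pass
    return sum(job.succeeded for job in jobs) / len(jobs)


# Illustrative data only; not taken from any real environment.
last_night = [
    BackupJob("db01", True),
    BackupJob("file01", False),
    BackupJob("mail01", True),
]

for job in failed_jobs(last_night):
    print(f"FAILED: {job.client} - rerun before the next back-up window")
```

The empty-list case is deliberately treated as a failure: a night on which no jobs ran at all should raise an alarm rather than pass silently.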
Martin
Hence the point of replicating multiple copies; it all depends on how valuable that data is. If it is the life of your company, 2/3/4 copies are worth the investment.
And the number of stories I can tell you about messed-up replication: disks added to the primary but not to the replicas, etc.
Manage, monitor and audit... often the missing processes after implementation.
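The "disks added to the primary but not to the replicas" case is the kind of thing a routine audit can catch. As a rough sketch only, assuming you can export the list of volumes from each site (the volume and site names below are made up, and missing_from_replicas is a hypothetical helper, not a feature of any array), the check is a set difference:

```python
def missing_from_replicas(primary, replicas):
    """Map each replica name to the primary volumes it does not hold."""
    gaps = {}
    for name, volumes in replicas.items():
        missing = set(primary) - set(volumes)
        if missing:
            gaps[name] = sorted(missing)
    return gaps


# Illustrative inventory only; real lists would come from the array's own tooling.
primary_volumes = {"vol01", "vol02", "vol03"}
replica_volumes = {
    "site-b": {"vol01", "vol02", "vol03"},
    "site-c": {"vol01", "vol02"},
}

for replica, missing in missing_from_replicas(primary_volumes, replica_volumes).items():
    print(f"{replica} is missing: {', '.join(missing)}")
```

Run on a schedule, a check like this turns a forgotten replica into a report line instead of a surprise during a recovery.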
Data can be unavailable, but there is no justification for losing it! Completely agree.
Agreed! Nothing is “Set it and forget it” (SIFI) and failed backups should always be addressed. Secondly, whatever is making you need to do constant recoveries should be addressed.
I don’t see anything which talks about constant recoveries but, to be honest, in any reasonably sized estate with a reasonable number of users, you expect to be recovering files in some manner on a regular basis.
I’d really be curious to hear the details around the triple disk failure….what storage array, RAID technology, rebuild times, all that.
Not looking to start a bashing session on whichever vendor….but just really curious to know the details (as some vendors do have stuff to make triple disk failure less likely and curious what technologies were in play).
I have seen the triple disk failure where I currently work; in fact, the same array has single disk failures every week and the vendor can’t explain why. For info, the array is an HP XP24000 series array.
Spill the beans – what was the array with the triple disk failure, what was the array with the bug (and what was the bug), what was the application and what was the back-up software?
Not really relevant to your point (the above could happen with any array / software) just curious…