If Schrödinger were alive today, I don't think he'd be writing about cats; he'd be writing about IT, and he'd have a wealth of things to write about. But I'm going to pick on two storage-related ones:
- Backups – all backups are good until you try to recover them
- Disaster Recovery Plans – they all work until you try them
It is scary when I talk to colleagues who seem to think that just because the back-up server has reported that something has backed up correctly, you actually have a good back-up. In all likelihood you have backed up something, but until you try to recover it, you can't be sure. So it would be wise to put a testing regime in place.
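To make that concrete, here is a minimal sketch of what a scheduled restore test might look like. The paths, the `.sha256` sidecar convention and the archive format are all assumptions for illustration; the real check should use whatever your backup product actually produces, and ideally go all the way to an application-level restore.

```python
# A minimal sketch of a scheduled restore test, not a drop-in tool.
# Assumes (hypothetically) that each backup job leaves a .tar.gz archive plus
# a recorded SHA-256 digest in a sidecar file next to it.
import hashlib
import pathlib
import shutil
import tarfile

BACKUP_DIR = pathlib.Path("/backups/nightly")      # assumed location
SCRATCH = pathlib.Path("/restore-test/scratch")    # assumed scratch area

def sha256_of(path: pathlib.Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def test_latest_backup() -> bool:
    archives = sorted(BACKUP_DIR.glob("*.tar.gz"))
    if not archives:
        print("FAIL: no backups found at all")
        return False
    latest = archives[-1]

    # 1. Does the archive still match the digest recorded at backup time?
    digest_file = latest.with_name(latest.name + ".sha256")
    recorded = digest_file.read_text().split()[0]
    if sha256_of(latest) != recorded:
        print(f"FAIL: {latest.name} no longer matches its recorded digest")
        return False

    # 2. Can it actually be unpacked somewhere harmless?
    if SCRATCH.exists():
        shutil.rmtree(SCRATCH)
    SCRATCH.mkdir(parents=True)
    with tarfile.open(latest) as tar:
        tar.extractall(SCRATCH)
    print(f"OK: {latest.name} restored to {SCRATCH}")
    return True

if __name__ == "__main__":
    test_latest_backup()
```

Even a crude check like this run on a schedule tells you far more than the back-up server's green tick ever will.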
But the scariest thing is the Disaster Recovery one: the number of companies I come across who think that because
1) they have some kit to recover on, and
2) they have a plan,
they can recover in a disaster. But when you ask if they test it, you will be faced with blank looks and excuses: it is too complicated and complex, and anyway, who has time to test it?
Well, if it is too complicated and complex to do when you are not working under the pressure of restoring a business that is currently down, how much worse is it going to be when you are actually trying to save your business?
In the current climate, I can see a lot of these things falling by the wayside because we've got another excuse: we don't have enough resource, and we can't afford it anyway.
Badger your management to do their DR tests and to periodically do recovery testing… otherwise you might find yourself with a lot of boxes full of dead cats. And eventually they'll start to smell, and you won't even have to open them to find that out.
And to further comment on what you have said:
• They might have a plan, but how up to date is it?
• If the business does come back after a fashion, how many of those CxOs will still have a job, or stay out of jail?
Forget DR. Seriously, start thinking about running two production sites, and giving yourself the flexibility to slide workloads from one to the other on demand, or when one of the datacentres pops. There are quite a few of these types of infrastructure around in the UK, some with three datacentres. They needn't be that expensive, and they are way, way cheaper than having to do DR testing for the awful day. Every day should be a DR test.
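As a rough illustration of the idea, here is a minimal sketch of traffic simply sliding to whichever site is healthy. The site names, URLs and the `route_traffic_to()` hook are assumptions; in a real setup the switch would be done by a global load balancer, DNS weighting or similar.

```python
# A minimal sketch of "every day is a DR test" thinking, not production code.
import time
import urllib.request

SITES = {
    "dc-east": "https://dc-east.example.com/healthz",   # hypothetical
    "dc-west": "https://dc-west.example.com/healthz",   # hypothetical
}

def site_is_healthy(url: str) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=3) as resp:
            return resp.status == 200
    except OSError:
        return False

def route_traffic_to(site: str) -> None:
    # Placeholder: in reality this would update the load balancer or DNS.
    print(f"routing production traffic to {site}")

def main() -> None:
    while True:
        healthy = [name for name, url in SITES.items() if site_is_healthy(url)]
        if healthy:
            # Both sites run production all the time; traffic simply slides
            # to whichever is up, so a failover is nothing special.
            route_traffic_to(healthy[0])
        else:
            print("no healthy site: this is the day the plan was for")
        time.sleep(30)

if __name__ == "__main__":
    main()
```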
Hmmm… depends where your data centres are and the distance between them!
I disagree – I actually think the operational recovery problem is more important in a lot of ways than disaster recovery. The thing is that the backup issue is even worse than you describe.
Trusting that your backup has completed successfully is important, but ultimately, even if you know with 100% certainty that the data on the tape (or disk) is recoverable, you still only really know that a chunk of data can be recovered. That tells you nothing about whether or not you can actually recover an application, or to what point in time you can recover it.
The problem is that as applications become more complex, the appreciation of operational recovery issues doesn't necessarily become more sophisticated with them. Issues of consistency, which people (usually) deal with when thinking about DR, tend to get swept under the carpet a bit in the OR world.
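The gap between "the data came back" and "the application came back" can be shown with a small sketch. The database path, the `orders` table, the `created_at` column and the RPO figure are all assumptions for illustration; a real check would exercise the actual application.

```python
# A minimal sketch of validating an application-level restore, not just a file.
import datetime
import sqlite3

RESTORED_DB = "/restore-test/scratch/orders.db"   # hypothetical restored copy
MAX_DATA_LOSS = datetime.timedelta(hours=4)       # assumed RPO

def validate_restore() -> bool:
    conn = sqlite3.connect(RESTORED_DB)
    try:
        # 1. Structural consistency: the file is readable and not corrupt.
        if conn.execute("PRAGMA integrity_check").fetchone()[0] != "ok":
            print("FAIL: restored database is corrupt")
            return False

        # 2. Application-level sanity: the data the business cares about exists.
        (order_count,) = conn.execute("SELECT COUNT(*) FROM orders").fetchone()
        if order_count == 0:
            print("FAIL: restore succeeded but there are no orders in it")
            return False

        # 3. Point in time: how old is the newest record we got back?
        (latest,) = conn.execute("SELECT MAX(created_at) FROM orders").fetchone()
        age = datetime.datetime.now() - datetime.datetime.fromisoformat(latest)
        if age > MAX_DATA_LOSS:
            print(f"FAIL: newest order is {age} old, outside the assumed RPO")
            return False

        print("OK: application-level restore check passed")
        return True
    finally:
        conn.close()

if __name__ == "__main__":
    validate_restore()
```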
Orgs really need, as part of their end-to-end management, to document the recoverability of their applications.
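What that documentation might look like is sketched below. The fields and example values are assumptions; the point is that a record like this lives somewhere versioned, gets reviewed, and flags applications whose recovery has never actually been proven.

```python
# A minimal sketch of documenting recoverability per application.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class RecoverabilityRecord:
    application: str
    rpo_hours: float                 # how much data loss the business accepts
    rto_hours: float                 # how long the business can wait
    backup_method: str
    consistency_method: str          # e.g. app quiesce, crash-consistent snapshot
    depends_on: list[str] = field(default_factory=list)
    last_restore_test: date | None = None

CATALOGUE = [
    RecoverabilityRecord(
        application="order-processing",          # hypothetical example entry
        rpo_hours=4,
        rto_hours=8,
        backup_method="nightly full + hourly logs",
        consistency_method="database log shipping",
        depends_on=["customer-db", "payments-gateway"],
        last_restore_test=None,
    ),
]

def overdue_for_testing(records: list[RecoverabilityRecord], max_age_days: int = 90):
    """Flag applications whose recovery has not been proven recently (or ever)."""
    today = date.today()
    return [
        r.application
        for r in records
        if r.last_restore_test is None
        or (today - r.last_restore_test).days > max_age_days
    ]

if __name__ == "__main__":
    print("overdue for a restore test:", overdue_for_testing(CATALOGUE))
```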
Beyond that, for large orgs, I agree with Alex – at the very least, you reduce the amount of work that needs doing when a DC does ‘go pop’.