So there are two possible default back-up policies: back up everything or back up nothing.
I am coming round to the latter as a default policy and, before you think I’ve gone mad, I’ll explain why. Backing up everything is a lazy option which requires little thought from you and your users, but it is terribly inefficient and is probably a major cause of failures in your back-up environment.
But surely you need to back up something? What about the operating system? What about the localised settings? Well, actually, no, not really, and this does lead to some bizarre conversations.
I’ve had discussions with my colleagues in server teams which go along these lines:
‘Why do you want the operating system backed up?’
‘Because we always have done and we need it backed up!’
‘Are you sure? Will you ever do a bare-metal recovery?’
‘No, we use our provisioning servers and simply re-provision!’
‘So, how about we simply back up the provisioning server?’
‘No, we must have the operating systems and related files backed up!’
And so it goes round and round; it’s backed up because it always has been, but no one will ever use the back-up.
In fact, I can think of very little which needs to be backed up from a core infrastructure/application platform point of view. Just back up the core deployment environments and you should be able to rebuild the environment quickly and simply from them. Okay, there are infrastructure-supporting applications as well which we as infrastructure teams need backing up: authentication servers, name-servers and the like. As infrastructure teams we need to keep our own house in order from that point of view.
So, this leaves us with the data, be it user data, databases or anything else that might be created by an application. Well, it’s probably about time that we got the data owners to tell us what value it has, and backed it up only if asked. Is it transient data? Can it be easily rebuilt?
Making it policy that nothing gets backed up unless requested takes out all ambiguity. There can be no assumptions about what is being backed up; it becomes someone’s responsibility as opposed to an assumed default.
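As a rough sketch of what “nothing unless requested” could look like in practice, here is a small Python illustration. The names, paths and retention figures are all made up for the example and do not come from any real back-up product; the only point is that the default answer is “no” and every “yes” has a named owner attached.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BackupRequest:
    """An explicit request from a data owner (all fields are hypothetical)."""
    path: str            # what the owner wants backed up
    owner: str           # who asked for it and is accountable
    retention_days: int  # how long the owner needs it kept


def find_request(path: str, requests: List[BackupRequest]) -> Optional[BackupRequest]:
    """Return the request covering this path, or None.

    None means "not backed up": that is the default, not an error.
    """
    for req in requests:
        prefix = req.path.rstrip("/") + "/"
        if path == req.path or path.startswith(prefix):
            return req
    return None


# Illustrative entries only: the provisioning server and one requested database.
requests = [
    BackupRequest("/srv/provisioning", owner="infrastructure", retention_days=30),
    BackupRequest("/data/finance-db", owner="finance", retention_days=90),
]
```

With this in place, a path such as `/etc/hosts` is simply not covered by anyone’s request, so it is not backed up, while anything under `/srv/provisioning` is, and we know who to ask about it.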
I think starting from a policy of zero backups and then building from there is a much better approach than backing up everything and reducing from that position.
And at least if you back up nothing, you can really run a zero-tolerance policy on back-up failures.
[…] has discussed his reasoning behind his default policy here, in “Don’t BackUp”, which I encourage you to read before continuing. There is, indeed, as Martin suggested in a […]
Your post got me thinking about a recent experience I had with a client, where we said a newly provisioned server didn’t need to be backed up.
The infrastructure team’s response was that they’d still back it up, but only retain the data for a few days instead of the usual 30+, in case of disasters, etc. I suppose we could’ve pushed them to not back up anything, but it seemed rather pointless.
That event and your blog post have got me thinking about the visibility of backups to users. If we do take an approach of “Don’t back up anything unless explicitly requested”, we need to get much better at providing clear information on what is and isn’t being backed up.
With a “services” oriented infrastructure, it’s highly likely that multiple teams will be using the same resources, but the owner might be associated with just one of those teams. When multiple teams are each using the same resource, there’s a very good chance that one will assume that the other shares their high valuation of the data, and the backup requirements, while the original application owner considers the data to be of a low priority.
I think in most organisations, it would be pretty difficult for someone who is just a “user” of a service to find out the data retention policies for a service they use.
I’ve not really put my thoughts down on this and tried to edit them into a clear form, but I hope you understand what I’m getting at?