A lot of storage management is really rather easy; we like to veil it in mysterious terms and generally pretend that it is a bit of a black art. There's probably only one group of IT people who are worse at this and that's the Unix guys; I've been both, so I should know. But the interfaces are getting easier and it probably wouldn't take that long to turn a decent sysadmin into a storage admin. Okay, the command lines take a little longer to learn (generally due to the screwy and arcane syntaxes that we all love) but the GUIs are pretty intuitive.
But life is getting interesting; not the day-to-day but actually planning and properly managing the environment. Why? Because we are beginning to use techniques which mean that our arrays are going to lie to us on a regular basis! And this is going to bring a number of real headaches with it. What do I mean by lie?
Well, Steve Foskett's blog on thin provisioning and Symantec making VxFS thin-provisioning-aware kicked off some thoughts; it's something which has been niggling me for a while.
Firstly, let's take thin provisioning. I assume everyone reading this blog understands the premise? We can thin-provision storage but it only gets 'consumed' when data is written to it for the first time; this hopefully allows us to cope with some of the profligacy of our customers, who always ask for more storage than they are actually going to use.
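To make the premise concrete, here's a minimal, purely illustrative sketch (hypothetical classes, not any vendor's implementation): the LUN advertises its full logical size, but it only draws real blocks from the shared pool on the first write to each block.

```python
# Purely illustrative thin-provisioning sketch (hypothetical classes, not any
# vendor's implementation): the LUN advertises its full logical size, but real
# capacity is only drawn from the shared pool on the first write to a block.

class Pool:
    def __init__(self, physical_blocks):
        self.free = physical_blocks

    def consume(self, blocks=1):
        if blocks > self.free:
            raise RuntimeError("pool exhausted - the array finally tells the truth")
        self.free -= blocks

class ThinLun:
    def __init__(self, logical_blocks, pool):
        self.logical_blocks = logical_blocks  # what the host is told it has
        self.pool = pool                      # shared physical pool
        self.written = set()                  # blocks touched at least once

    def write(self, block):
        if block not in self.written:         # first write: consume real space
            self.pool.consume()
            self.written.add(block)

pool = Pool(physical_blocks=1_000)
lun = ThinLun(logical_blocks=10_000, pool=pool)  # 10:1 over-commit on one LUN
lun.write(42)   # costs a physical block
lun.write(42)   # rewriting the same block costs nothing more
```

The host happily believes it owns all 10,000 blocks; the pool only shrinks as blocks are actually written, which is exactly why this works so well right up until it doesn't.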
Some really nasty traps do lie ahead, though; for example, we will over-commit our storage, i.e. we will have more storage logically allocated than we have physically available. It's the only way of making thin provisioning really useful and of actually saving us money. However, we have to be very careful to monitor what is actually being used and the rate of growth; otherwise, one morning, we might come in and find that all our storage is gone. I have suggested ways that we can help to prevent this, e.g. convincing sys-admins not to allocate the whole LUN, but at some point we are going to run out of space.
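The monitoring itself isn't hard, it just has to actually be done. Roughly this arithmetic (with entirely made-up numbers here) tells you how over-committed you are and how long you have before the pool runs dry:

```python
# Entirely made-up figures; the point is the sums a capacity report must do.
physical_tb    = 100.0   # usable capacity in the pool
allocated_tb   = 250.0   # what we've promised out to hosts (over-committed)
consumed_tb    = 70.0    # what has actually been written so far
growth_tb_day  = 0.5     # observed daily growth in consumed capacity

overcommit_ratio   = allocated_tb / physical_tb                     # 2.5x
days_to_exhaustion = (physical_tb - consumed_tb) / growth_tb_day    # 60 days

print(f"over-commit {overcommit_ratio:.1f}x, roughly {days_to_exhaustion:.0f} "
      f"days until the pool is full at the current growth rate")
```

If that number of days is shorter than the time it takes you to procure new capacity or migrate, you already have a problem.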
What happens when we've committed more space than the array can actually be expanded to? I can see some really nasty and rapid migrations having to happen if this is not carefully monitored. So if you are going down the thin provisioning route, be careful: you have generally made your day-to-day life easier, but now you've got to keep a tight handle on capacity/demand management. And you'd better make sure that you have migration strategies to cope with the over-commit scenarios; plan to migrate, and plan to do it probably more often than you do already.
Now, there is another really sneaky trap lying in wait: de-dupe of primary storage. De-dupe of primary storage is going mainstream slowly but surely; it will happen! A-SIS already has the capability; expect EMC to do things with Avamar running within a head, and IBM to do something with Diligent running in an XIV (lots of CPU power in those RAIN-based arrays) or in a partition on a DS8K.
So there you are, happily deduping storage, ensuring there's lots of commonality and lots of space savings. And then someone does something stupid: patches an operating system on, let's say, half a dozen blade servers with 30 VMs on them. Suddenly you see some of that commonality go; perhaps they update some Oracle binaries, re-encode some files; any manner of things. And the next thing you know, pretty much without warning, you are out of space.
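To put rough (and entirely invented) numbers on it: nothing changes in the logical picture, but the physical one moves a long way.

```python
# Invented figures: the same logical data needs far more physical space once
# a patch or re-encode breaks the commonality that dedupe was relying on.
logical_tb   = 50.0
ratio_before = 10.0   # 10:1 - lots of identical OS images and binaries
ratio_after  = 3.0    # 3:1  - after the patch and re-encoded files

physical_before = logical_tb / ratio_before   # 5 TB
physical_after  = logical_tb / ratio_after    # ~16.7 TB

print(f"same {logical_tb:.0f} TB logical, but physical use jumps from "
      f"{physical_before:.1f} TB to {physical_after:.1f} TB")
```

The hosts haven't asked for a single extra gigabyte; the array just can't fold the data up as small as it used to.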
Okay, the same thing can potentially happen with snaps; all it takes is someone running a big update job and you find that your reserved pool of snap space is gone (I've seen it happen).
None of the above is a reason not to use thin provisioning, de-dupe or even snaps. But we need to get better at managing what is going to be an increasingly volatile environment where, at the moment, a lot of the tools lie to us. Our day-to-day admin has got easier and our users know this; they are demanding storage quicker and demanding that we use the facilities which our vendors are telling our CIOs etc. are going to save them money. Let's just take care that we have the tools and the techniques/disciplines in place to enable us to manage our storage well.
Our colleagues in the server and network disciplines have it easier than us; if they run out of capacity, most of the time things just run a little bit slowly… if we run out of capacity, things stop.
Nice! I’d like to highlight one thing you said here, because it’s the key element: Thin provisioning and other space-saving technologies MUST over-provision space or you lose their main benefit. If you’re not at risk of a space crunch, you’re not really getting what you were looking for. And if you are at risk, well, we all know what that means…
Indeed, so we need management tools, not administration tools; and there is a difference which is only just really becoming apparent to me. The industry has concentrated on making the day-to-day arcaneness easier, but we need to move away from hedge-witchery and into full-blown high magick.
A Solomon’s Seal for Storage to tame those demons!!!
Martin,
As you know, our focus on making the day-to-day arcaneness easier comes from not being able to affect the OS/FS side of the abyss.
Old assumptions die hard – such as the assumption that file systems need to reserve storage space in order to use it safely and efficiently. My guess is that file system developers would be happy to get rid of that responsibility and replace the old assumption with a newer one: storage systems and administrators can be intelligent! (Go ahead, shoot thyself, human.)
Given a clean slate, we might decide:
1) Storage capacity would never be reserved in advance. Instead, storage allocations would always be done on a just-in-time basis.
2) Disk utilization stats would be generated by a storage-side applet and not by piecing together scraps of data from scores of file systems.
3) Free space management would include all forms of data being stored, including system, application, de-duped (and what happens if it is "re-duped"), snaps and backup. It would also allow admins or policies to decide which resources are the best candidates for returning capacity to the free space pool.
4) Data and storage management would become more aligned (storage analytics). Being able to get a “whole storage perspective” on demand makes it possible to manage the whole mess and not give us another “bad day to give up sniffing dry erase markers.”
Ahhh, if we could just fix the file system side of the world.