A recent blog by Tony Asaro had some of my fellow bods in the storage world rolling their eyes, letting out tired sighs and expressing general disbelief that something so hackneyed is still being blogged about.
Tony blogs about Intelligent Storage Tiering and the fact that huge amounts of data are basically never accessed again 90 days after being created. Nothing especially new here, and then obviously what you need to do is implement virtualisation to allow you to move this data around. This data movement is all seamless and invisible to the users and the applications etc….
This is such an over-simplification that it is really beginning to get on my proverbials! Firstly, even without virtualisation I have already implemented storage tiering; I do it in the array. I have my primary disks, which in my case are generally 300 gig FC, and I have a lower tier of 500 gig 'low-cost FC'. I am considering using 1 TB SATA drives in the array as well. Simply implementing the two tiers has saved me a huge amount of money: I can use the lower tier as a clone/replication target and for data which doesn't need the screaming performance that the users have become addicted to. I keep this all within an array boundary and, to be honest, I don't really want a single application spread across multiple arrays if I can avoid it.
Currently my big space-hogs tend to be databases; I could tier these within the array (and arguably we do – we have another tier of disk which we don't talk about… some very small RAID-1 volumes where we can put redo logs).
But to tier any more means a lot of work, not for the storage team but for the DBAs and application developers. In a previous job we used Outerbay to achieve this, but the tiering was a side-effect of the work we needed to do to get data out of a Peoplesoft environment so that it could be upgraded in a reasonable time.
And how long do you carry out your data classification for? When do you move data? After 90 days? Six months? A year? If you move the data, how quickly can you move it back? How do you ensure that you've got enough fast disk in case you need to move it back? What happens to data which is written once, not accessed for 90 days and duly moved, and then, let's say, an annual billing reconciliation job runs and needs access to all of it? Sure, it's still accessible, but unfortunately you've just dramatically increased the length of the billing run.
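To make the point concrete, here is a minimal sketch (Python, with made-up tier names and thresholds – nothing vendor-specific) of the sort of naive last-access rule that automated tiering tends to boil down to; note how the annual job that touches everything ends up reading the lot from the slow tier.

```python
from datetime import datetime, timedelta

# Hypothetical tiers and thresholds - every site will argue about these numbers.
TIER_RULES = [
    (timedelta(days=90),  "tier1-fast-fc"),     # touched in the last 90 days
    (timedelta(days=180), "tier2-lowcost-fc"),  # touched in the last six months
]
ARCHIVE_TIER = "tier3-sata"                     # everything older than that

def place(last_accessed, now=None):
    """Pick the tier a naive age-based policy would put a dataset on."""
    now = now or datetime.now()
    age = now - last_accessed
    for threshold, tier in TIER_RULES:
        if age <= threshold:
            return tier
    return ARCHIVE_TIER

# The billing-run problem: data written once and untouched for 300 days has
# long since been demoted, so the annual reconciliation job reads it all
# from the slowest tier - still accessible, just dramatically slower.
print(place(datetime.now() - timedelta(days=300)))  # -> tier3-sata
```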
The magic bullet of Storage Virtualisation is really not magic at all if you want to reduce costs; it is a bullet which needs a lot of aiming and calibration. It might hit the bullseye, it might just wing the target. You need to understand what you are doing – you could cripple your business!
I'll tell you where the greatest potential value of virtualisation is for me: heterogeneous data mobility. But it's not huge cross-array storage pools, not today anyway. And it's not currently intelligent storage tiering.
Concentrate on building me Storage Management tools which enable me to easily apply my intelligence to storage tiering.
Hear hear – another thousand-mile view from the analysts. Unfortunately, like everything else in the storage industry, the Devil’s in the detail.
Indeed – I agree that any fix at the storage ‘data layer’ gives single-digit percentage benefits versus double-digit benefits in the ‘information layer’. App designers need to manage information lifecycle at the business-process and application layers. Storage infrastructure then needs to provide the relevant differentiated ‘buckets’ for the apps to migrate the information between. Anything else is hype/myth/tinkering… Ever done a TCO on database ILM to save money? We’ve done loads, and they all cost much more than simply extending the disks… If the reason isn’t cost then there are good justifications (RTO improvement, performance, maxed-out infrastructure etc.) but otherwise ILM is currently a joke.
So virtualisation – another good idea killed by the luddites in the storage manufacturers worried about strangling their cash cows… 🙁
Abstraction/indirection is the key thing – once we have this then we can move, but all the products today are add-ons, or constrained (by the size/type of problem they are dealing with), or a nightmare to diagnose etc.
Why are so many people moving from array-based replication to HBR? Well, because it works great for migrations between tiers…
What I want is:
a) RAID 5/6 between arrays (yup, treat each array like a disk – then I can power down a DMX/USP etc. without worry… see the sketch after this list) – I’ve had this in as an RFE with 3 companies for 4 years!
b) We need to work with common capabilities, not widget point products (e.g. MirrorView, SRDF, Recoverpoint et al.) – if replication worked between products (without having to add more layers) then migration would be easier.
c) Policy-based, fully automated, transparent storage tiering within a single enclosure would be a good starting point, but it has to be built to seamlessly span multiple enclosures.
d) Manufacturers to get it that “enterprise storage IS a commodity” and that for 90% of the features/use cases it’s irrelevant which manufacturer is in use. Thus it will be the one that makes it easiest to move to/from that gets first-mover traction…
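Ian’s wish (a) is easy to reason about with a little arithmetic. Here is a purely illustrative sketch (Python, hypothetical capacities, no claim about how any vendor would actually implement it) of treating whole arrays as RAID members:

```python
# Hypothetical: build a RAID set whose "disks" are whole arrays, as per wish (a).
# Usable capacity is limited by the smallest member, exactly as with real disks.

def raid_usable_tb(array_capacities_tb, parity_members):
    """Usable TB of a RAID set built from whole arrays.

    parity_members: 1 for a RAID 5-style layout (survive one array down),
                    2 for a RAID 6-style layout (survive two arrays down).
    """
    n = len(array_capacities_tb)
    if n <= parity_members:
        raise ValueError("need more arrays than parity members")
    smallest = min(array_capacities_tb)
    return smallest * (n - parity_members)

# Four 50 TB arrays in a RAID 6-style layout: 100 TB usable, and any one
# (or two) arrays can be powered down without losing access to the data.
print(raid_usable_tb([50, 50, 50, 50], parity_members=2))  # -> 100
```

The design point is the same as with real disks: the smallest member bounds the usable capacity, and the parity count determines how many whole arrays you could power down or lose.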
Actually, you would be surprised at how common an issue this is, and it is not being prioritized because of all the other challenges facing IT professionals. Yes – you can tier within a storage system, but these things are complementary and not at odds with one another. However, internal tiering only gets you so far because the cost of the system is typically associated with the controller. Additionally, you begin to add to the complexity of managing the storage system on many levels. I do agree that the devil is in the details, but that isn’t what blogging is for – the goal was to present ideas for further investigation and consideration. But I agree – more details are needed.
Tony, arguably all virtualisation does is abstract the controller slightly further away from the disk trays. I tend to think of things like SVC and USP as abstracted disk controllers.
Ian’s suggestion of being able to RAID arrays is one we’ve discussed before. Actually, you can RAID-1 arrays using SVC. One of these days IBM will simply admit that the SVC and the DS controllers are pretty much the same thing; one runs Intel and one runs PowerPC. The CLIs are certainly very similar; I’d be interested to know how much code they have in common. Barry?
Interesting thread with a lot of good points, but I tend to think tiering is good (y’all get to choose how you tier and what you define as a tier 1/2/3…9) and virtualization is good (and again, y’all get to choose your method of virtualization).
If tiering within a controller like the USP-V or DMX suits your needs, go for it. If the controller also supports virtualization of external storage (like the USP-V), doesn’t that just expand your options? Certainly to the extent that you can now demote data to spin-down disks.
To Ian’s point, what is missing is a full policy-based scheme that seamlessly demotes/promotes data to fit application-level requirements, not storage requirements. And we’ll get there. This all reminds me of the brouhaha that preceded System Managed Storage back on the mainframe, circa the 1980s.
Claus, indeed all we are doing is trying to recreate SMS for Open Systems. It’s probably a much harder thing to do with the file-system paradigms that we have created for Open Systems.
Things like EMC’s Atmos kind of make me think of SMS; I’m not sure they meant it like that but it might be a move along those lines.
Spot on!
We help customers manage storage and free up storage capacity – and we do recommend and implement storage virtualization. On top of that, I used to research and generate IP around data classification, virtualization, and data mover technologies.
With all that said, this stuff is complex and does take services, support, and know-how.
However, the low-hanging fruit – the simplest thing to start with – is often not done. And that is simply classifying your applications by their impact on and importance to your business. This simple step can guide you to buy different tiers of storage, as you state above.
You also bring up another point that is often overlooked – “people processes” have just as much impact (if not more) on all this stuff as a “Virtualization Solution” can. How you allocate (or over-allocate) your application space, databases, database tables, etc. has a HUGE impact on your storage space.
So, by all means use virtualization to your advantage – but if you don’t take those rudimentary data management steps first, then you are just putting an abstraction layer on top of a complex (and more than likely inefficient) infrastructure. And that will lead to more problems than solutions…