A common theme which appears to be coming more to the fore is the need for a new type of storage: object-based storage. Okay, it's not new, but it is certainly not yet mainstream. EMC's Atmos, for example, is an implementation of object-based storage; whether it is optimised for the Cloud, I'll let you decide.
This could have happened years ago; I would argue that it is simply an extension of what was done on mainframes years ago: defining data-sets, giving them characteristics and so on, letting them be automagically moved about, whisked off to slower disk, online tape and ultimately off-sited somewhere.
So why has this taken such a long time to develop for open systems? Well, if you sit and discuss this with a lot of open-systems guys, they don't get it. It feels clunky: what do you mean, I need to know something about the files before I store them?
But they already do, or certainly in the past they did: we would lay things out on disks for optimal performance to eke the last I/O out of the spindle. Now, in most cases, data just gets thrown at some disk and we hope that the arrays are chunky enough to sort it out.
We probably don't need to be as anal about layout as we used to be, and with multi-petabyte sites, I suspect life just is not long enough. So we need to start categorising and defining classes of storage; I don't mean tiers, I mean defining something about the data that is going to be stored. Is it a database? Is it a log file? Is it unstructured? And so on. We might also need to build some intelligence in, so that the storage can make some intelligent guesses about data which we can't categorise beforehand. A rough sketch of what that might look like follows below.
And yes, we are going to need some fundamental changes to how applications expect storage to respond and behave.
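To make that a little more concrete, here is a minimal sketch in Python of what declaring something about the data up front might look like, in the spirit of SMS data classes and ACS routines. The class names, attributes and rules are entirely hypothetical; this is an illustration of the idea, not any vendor's API.

```python
# Illustrative sketch only: the storage classes, attributes and rules below are
# hypothetical, loosely modelled on the SMS idea of data classes and ACS routines.

from dataclasses import dataclass

@dataclass
class StorageClass:
    name: str
    tier: str                # e.g. "fast-disk", "slow-disk", "tape"
    replicas: int            # protection level
    migrate_after_days: int  # when idle data may be whisked off to a slower tier

# A small catalogue of classes defined up front, before any data arrives.
CLASSES = {
    "database":     StorageClass("database",     "fast-disk", 2, migrate_after_days=0),
    "log":          StorageClass("log",          "slow-disk", 1, migrate_after_days=30),
    "unstructured": StorageClass("unstructured", "slow-disk", 1, migrate_after_days=90),
}

def classify(hints: dict) -> StorageClass:
    """Pick a storage class from whatever the application tells us about the data.

    If the application tells us nothing useful, fall back to a guess based on
    the object's name: the 'intelligent guess' for data we can't categorise
    beforehand.
    """
    kind = hints.get("kind")
    if kind in CLASSES:
        return CLASSES[kind]
    name = hints.get("name", "")
    if name.endswith((".log", ".trc")):
        return CLASSES["log"]
    return CLASSES["unstructured"]

# The application declares what it is storing...
print(classify({"kind": "database", "name": "orders.db"}).tier)   # fast-disk
# ...or it doesn't, and the storage makes a guess.
print(classify({"name": "webserver-2009-06-01.log"}).tier)        # slow-disk
```

The point is not the code itself but where the knowledge lives: the application (or the person deploying it) says something about the data once, and the storage then has enough to place, protect and migrate it without anyone laying out spindles by hand.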
I would suggest that people start with this article, "SMS: The Discipline"; forgive the mainframe-isms. We've got a lot to learn from some of these mainframe guys, especially that it's not about the hardware; it's about the software.
I’ve mentioned a number of times how good the mainframe storage environment was. I worked on early SMS, found some of the original bugs and wrote a lot of ACS routines. That said, at one site I managed a massive 300GB of storage and knew almost every dataset. Sadly, that’s not possible now.
The problem is the end users. At month-end they touch a lot more data than they do during the month. Year-end … same. And there is always that certain class of users who treat the database like a datamart. Try to expunge those records? “We already have a data warehouse!” you proclaim. So you end up with databases holding 7, 8, 10 years of data, and good luck purging the old records. “Don’t we own that?”
So do you tier those four-year-old records off to tape automatically? Maybe, until Monday when those painful users come in and run their specialized queries.
I’d guess these open systems became more popular because they were cheaper and the data storage was cheaper. I/we all know users who keep it all because they can. Go ahead and tier it at your peril, is my view. If those queries start taking much longer to run, the end users will come calling.
This harks back to EMC’s idea of a “Data Tone”, mooted many years ago: http://www.informationweek.com/806/emc.htm.
It needs defining and automating, but it seems the idea was always there…