The storage community on Twitter is an active and vocal bunch; well, some of us are, anyway. Tony Asaro posted a link on Twitter pointing to his blog entry on Compellent and Automated Storage Tiering.
This spiralled into a bit of a conversation on whether Compellent was merely another ILM solution and not that special; whether Object storage with metadata describing the data was really the answer; and what the future might be.
What Compellent do at the moment is pretty damn cool; they keep metadata on blocks and allow you to set up rules which move data between tiers depending on last access. Barry Whyte suggested that this was only done on a weekly basis, which led to the question of whether that was actually enough.
I would suggest that it is a massive improvement on where we are sitting today. Compellent will allow you to have a LUN which spans tiers, with 'hot' blocks sitting on Tier 1 and colder blocks sitting on some other tier. But is weekly enough? I think we need to move fairly rapidly towards real-time automated storage tiering, based on rules derived from last access, rate of change of data and so on.
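To make the 'rules derived from last access, rate of change of data' idea a bit more concrete, here's a rough sketch in Python of what a continuously-running tiering pass might look like. The Block structure, thresholds and tier numbers are purely my own invention for illustration; this is not how Compellent actually implements it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Block:
    lba: int
    tier: int                  # 1 = fast FC/SSD, 3 = cheap SATA (illustrative)
    last_access: datetime
    writes_last_day: int       # crude "rate of change of data" metric

def choose_tier(block: Block, now: datetime) -> int:
    """Very simple rules based on last access and rate of change."""
    idle = now - block.last_access
    if idle < timedelta(hours=1) or block.writes_last_day > 100:
        return 1               # hot block: keep it on Tier 1
    if idle < timedelta(days=7):
        return 2
    return 3                   # cold block: demote to the cheap tier

def tiering_pass(blocks: list[Block], now: datetime) -> list[tuple[int, int, int]]:
    """One pass over the block metadata; run it continuously rather than weekly."""
    moves = []
    for b in blocks:
        target = choose_tier(b, now)
        if target != b.tier:
            moves.append((b.lba, b.tier, target))   # (lba, from_tier, to_tier)
            b.tier = target
    return moves
```

Run something like that every few minutes rather than once a week and you are, conceptually at least, most of the way to real-time automated storage tiering.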
I'd love to hear some real-world opinions/experiences from end-users who are actually using Compellent in anger; it'd be interesting to see a visualisation of flows of data around the array. For example, would it ensure that the file-system metadata ends up on the fastest disk? Automated storage tiering may have some interesting impacts on next-generation file-system design.
File-system design leads us on to Object Storage: in a world where Object storage becomes more common, what impact does this have on file-systems and data access? Dave Graham is blogging lots of good stuff on this and it is nowhere near as vendor-focussed as I feared, considering his employer is EMC. But I don't see it as AST vs Object Storage; I see it as Objects on top of AST.
I think we have two strands of development at different levels of abstraction; both are complementary and both independently valuable. However, when applied together they become hugely more powerful: objects which move around storage tiers automatically, taking only the space and I/O footprint that they require, replicating and protecting themselves according to the SLAs defined. All of this done by an appliance which sits above a modularised storage estate; there's a thought.
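As a back-of-an-envelope illustration of what 'Objects on top of AST' might mean, here's a hedged sketch: the object carries its SLA as metadata and the appliance picks the cheapest tier that still meets it, along with the replica count and retention. The ObjectSLA/place_object names and the tier latency figures are assumptions for the example, not any vendor's API.

```python
from dataclasses import dataclass

# tier number -> rough access latency in ms (invented figures)
TIERS = {1: 5, 2: 20, 3: 80}

@dataclass
class ObjectSLA:
    max_latency_ms: int
    replicas: int
    retain_days: int

@dataclass
class StorageObject:
    key: str
    size_bytes: int
    sla: ObjectSLA

def place_object(obj: StorageObject) -> dict:
    """Pick the cheapest tier that still meets the object's latency SLA."""
    candidates = [t for t, lat in TIERS.items() if lat <= obj.sla.max_latency_ms]
    tier = max(candidates) if candidates else 1    # nothing cheap enough? use Tier 1
    return {"key": obj.key,
            "tier": tier,
            "replicas": obj.sla.replicas,
            "retain_days": obj.sla.retain_days}

clip = StorageObject("broadcast/clip-001.mxf", 2_000_000_000,
                     ObjectSLA(max_latency_ms=25, replicas=3, retain_days=3650))
print(place_object(clip))    # -> placed on Tier 2 with 3 replicas
```

The interesting bit is that the placement decision belongs to the object and its SLA, not to the LUN it happens to live on; the AST layer underneath just provides the tiers to move between.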
Not sure what work there will be left for the poor storage admin in this world, but I'm sure we can obfuscate it in acronyms etc. (Modularised Automated Storage Tiering Utilising Realtime Based Advanced Technology Engine) and keep ourselves in gainful employment. And when it goes wrong, boy will it go wrong!
Great post. I responded to Barry on Twitter and said that Compellent was policy-driven and doesn’t have to be a week. I don’t know why he is fixed on that. It is based on frequency of access: if something is still being used, why demote it? Having said that, Compellent does put in defaults to make things easier for their customers. But we live in a world where people leave dormant data on Tier 1 storage for months if not years. Btw, Compellent is not a client of mine, for the cynics who think they might be 🙂
I am a proponent of object-based storage, but there are some fundamental challenges. Creating the standard is only part of the problem; you also need host applications to support it. Objects also create latency, which can really slow down performance.
I like your idea of an appliance integrating with AST. I think data movement between tiers is just the beginning. We can set all sorts of policies and also get greater insight. We could use federated search, de-dupe everything, create protection policies, eliminate backup, and do tiering.
I don’t think the job of the storage admin would go away; instead of being the guy who spends time on nuts and bolts, you would think about how storage can change and improve your core business: a combination of left- and right-brain thinking. Which is something you are already doing at a technology level (from what I’ve seen on your blog and Twitter).
Great blog Martin, and great comment Tony, keep going…
You have to wonder how long it will be before some level of object functionality is added to the existing virtualisation platforms. That aside though, it’s interesting to think about object-based storage in the media industry in particular. The video problem is of a category with things like medical imaging and seismic surveys, where treating data as objects (or object-ish) makes so much sense that everyone’s already doing it to some extent (MXF et al. in media, DICOM in imaging, etc.). What’s missing is the link into the infrastructure to take advantage of the information that’s stored along with the data.
Clearly things are coming, but I wonder whether they’ll be tripped up by the performance problems that some systems have had in the past (especially when dealing with high-bandwidth stuff like video). Still, at least bandwidth is in many ways an easier problem to solve than latency, so fingers crossed.
Nice post Mr Bod.
To answer Matt, MatrixStore is an object-based system that stores the metadata alongside the asset itself, thus making it a living object in the archive.
In the media space this is important as the context around a piece of footage may change over time. The ability to update the context of a piece of data without compromising the integrity of the content is very important.
Of course, protecting metadata in the archive makes it searchable, which means that your multi-terabyte clustered object store becomes an in-house googlable (not a word, I know) archive.
The old adage "if you can’t find it, you don’t have it" is a particular problem in the media space, an industry that loses or cannot find 75% of the content it creates.
The other problem is managing hundreds of TB of content. This is where the base functionality of object stores (providing a single namespace, automated data protection and recovery, simple addition of capacity) really can and does drive down administration cost/effort. Does it mean the end of the sysadmin? No.
As for policies, not all object storage is the same. We create vaults in which objects are tied to policies, but we also use the metadata capabilities to determine what actions can be taken upon an object.
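To sketch the idea (the names below are invented for illustration and are not the actual MatrixStore API): the actions permitted on an object can be derived from the vault policy combined with the object’s own metadata, something like:

```python
from dataclasses import dataclass, field

@dataclass
class VaultPolicy:
    allow_delete: bool
    allow_metadata_update: bool
    min_copies: int

@dataclass
class ArchivedObject:
    object_id: str
    metadata: dict = field(default_factory=dict)

def allowed_actions(obj: ArchivedObject, policy: VaultPolicy) -> set[str]:
    """Actions come from the vault policy plus the object's own metadata."""
    actions = {"read", "search"}
    if policy.allow_metadata_update:
        actions.add("update_metadata")   # context may change; content never does
    if policy.allow_delete and not obj.metadata.get("legal_hold", False):
        actions.add("delete")
    return actions

broadcast_vault = VaultPolicy(allow_delete=False, allow_metadata_update=True, min_copies=2)
clip = ArchivedObject("clip-7f3a", {"programme": "News at Ten", "legal_hold": False})
print(allowed_actions(clip, broadcast_vault))   # content is protected, context can evolve
```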
Clustered object storage using commodity components will become part of the storage fabric across industries… we hope 😉
Nick, so when are we going to get a version of OM which runs on something apart from Mac? A Linux version would be cool. A Linux VSA would be nicer! Well, you aren’t ever going to be legally able to release a MacOS VSA.
Martin, OM runs on Linux as well as Mac. The Mac version is a DIY ‘install it yourself’ version; Linux only comes as a pre-configured, all-in-one ‘this is the kit we have qualified’ version. EMC’s Centera is the same (runs on Linux, but you wouldn’t know it), as are a few others out there. (In fact I strongly suspect MOST other cluster solutions run on Linux, from Isilon to Permabit.)
Why run only on all-in-one solutions? (1) The system can be qualified, with carefully tuned response times from other nodes and guaranteed system settings such as flush-to-disk. (2) It’s an easier concept to sell: plug-and-play archiving. (3) Profit (from the hardware sale).
One differentiator OM claims for their software is that you can turn it off if you ever want to, thereby leaving you with highly reusable Linux node(s) and your data in a standard mountable fs.
Is there real demand for an install-it-yourself Linux version of such a product?
I think there really is a demand. If I can repurpose kit I already have, potentially my cost of entry is relatively small. And selfishly, if you can give me a capacity limited Virtual Appliance, I can add it to my list of VSAs that I can play with at home.
Strikes me that so long as it’s not got a native GUI to break things, it’d work just fine on Darwin if you’re willing to go to the effort :). Certainly a lot more fun than boring old Linux.
Nick. This is the broadcast industry. Of course there’s a demand for a ‘DIY’ version. Preferably you should publish circuit diagrams and demonstrate how you make it work on a breadboard. Furthermore, any system diagrams should be typed and hung from the rack by a bit of string.
We have been installing Compellent systems for over 5 years, and while discussing what is “best” becomes a religious discussion, Phil Soran and company have certainly got it right. I know of no system that utilizes space better and more efficiently than Compellent. It also allows a client to build for performance while saving $’s. Now, if you want to spend your time at the CLI, you can do that too, but the real power of this system is the simple fact that it actually is simple to use.
The in-depth features go on and on, but the ability to build a pool large enough to handle the active data (blocks), meeting IOPS requirements, and then have everything else sitting in big, inexpensive SATA cans is simply a smart and economically intelligent way to manage what has always been an expensive “filing cabinet”.
’09 will see more petabytes shipped than ’08, but the overall $ spend will be flat or actually dip. This will put an enormous squeeze on the “Big 5” to keep their market share. Every purchase will be scrutinized and reassessed as to the value it brings. In this, EMC will be the big loser. HDS, NetApp, IBM, and HP will hold their share. Those that will gain will be Compellent and EqualLogic.
I love it when really smart people break the mold, and head in new directions. Keep your eye on these guys.
Paul Clifford
Davenport Group
http://www.Davenportgroup.com