Chuck’s post about Big Data Storage (as opposed to your tiny data storage) is pretty much on the money; simplicity is very much the key and it is this ‘no frills’ approach to storage which means that small teams can manage large amounts of storage with the minimum amount of fuss.
The key for us is the ability to scale quickly and easily: add storage, then simply use software to do the clever stuff like balancing. Be it OneFS, GPFS or StorNext, the job is relatively simple, assuming you've done the initial set-up correctly.
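To make the "add capacity and let the software balance it" idea concrete, here is a deliberately simplified Python sketch. This is emphatically not how OneFS, GPFS or StorNext work internally; it just illustrates the principle that new capacity plus a software rebalance keeps placement even.

```python
# Toy illustration of scale-out rebalancing -- NOT how OneFS, GPFS or
# StorNext actually implement it, just the principle: add an empty node
# and let software even out the block placement.

def rebalance(placement: dict[str, list[int]]) -> dict[str, list[int]]:
    """Move blocks from the fullest node to the emptiest until even."""
    nodes = sorted(placement, key=lambda n: len(placement[n]))
    while len(placement[nodes[-1]]) - len(placement[nodes[0]]) > 1:
        block = placement[nodes[-1]].pop()    # take from the fullest
        placement[nodes[0]].append(block)     # give to the emptiest
        nodes.sort(key=lambda n: len(placement[n]))
    return placement

# Three nodes holding 300 blocks between them...
cluster = {"node1": list(range(100)),
           "node2": list(range(100, 200)),
           "node3": list(range(200, 300))}
cluster["node4"] = []          # ...then we simply add an empty node
rebalance(cluster)
print({n: len(b) for n, b in cluster.items()})   # -> 75 blocks each
```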
Much of what he says about Snaps, DeDupe and the other features of the more traditional general-purpose storage arrays also rings very true. Much of the data we deal with does not lend itself to DeDupe, and it has another interesting characteristic: once a file is written, it is never changed.
Think of it like a RAW file from a digital camera: you don't actually change the file, you develop it. In some applications, we save a file which details the edits and transformations required to produce the processed file; in others, we save a copy of the processed file.
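As a rough sketch of that write-once workflow (the file names and edit format here are hypothetical), something like the following: the original is never modified, the edits are recorded in a separate file, and any processed result is a brand-new copy.

```python
# Hypothetical sketch of the write-once workflow described above: the
# original file is never modified; edits live in a separate "recipe"
# file, and the processed result is written out as a new file.
import json
from pathlib import Path

def save_edits(original: Path, edits: list[dict]) -> Path:
    """Record edits alongside the original instead of changing it."""
    recipe = original.with_name(original.name + ".edits.json")
    recipe.write_text(json.dumps(edits, indent=2))
    return recipe

def develop(original: Path, recipe: Path, out: Path) -> None:
    """Apply the recorded edits to produce a new processed copy."""
    edits = json.loads(recipe.read_text())
    data = original.read_bytes()       # the original is read-only here
    # ...apply each edit (crop, exposure, etc.) to `data`...
    out.write_bytes(data)              # the processed result is a copy

recipe = save_edits(Path("IMG_0001.raw"),
                    [{"op": "crop", "box": [0, 0, 3000, 2000]},
                     {"op": "exposure", "ev": 0.7}])
```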
Replication and archiving are handled at the application level; we could do this at the file-system or storage level, but it is easier to let the application handle it, and this means we can be completely storage-agnostic.
We are relinquishing a certain amount of control and empowering the user to take responsibility for their environment. They decide on the number of replicas they require and, to a certain extent, where these replicas are stored; if they want to store eight copies locally, they are empowered to do so.
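A minimal sketch of what that user-controlled, application-level replication might look like, assuming a hypothetical policy format: because the application only deals in plain paths, the back-end storage underneath can be anything.

```python
# Hypothetical sketch of user-controlled, application-level replication:
# the user's policy says how many replicas and where; the application
# just copies files to plain paths, so any back-end storage will do.
import shutil
from pathlib import Path

def replicate(source: Path, policy: dict) -> list[Path]:
    """Write the replicas the user's policy asks for, nothing more."""
    targets = policy["locations"][: policy["replicas"]]
    written = []
    for target in targets:
        dest = Path(target) / source.name
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, dest)     # storage-agnostic: just a path
        written.append(dest)
    return written

# A user who wants eight local copies simply says so:
policy = {"replicas": 8,
          "locations": [f"/mnt/local/copy{i}" for i in range(8)]}
# replicate(Path("results/run42.dat"), policy)
```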
We do need better instrumentation so we can tell users exactly how much data they are storing and how much bandwidth they are consuming. This could develop into a charge-back system, but I suspect it will be more of an awareness exercise. It would also allow us to model the impact of moving to a public cloud provider, for instance, where both available bandwidth and bandwidth consumption are important factors.
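As a sketch of the kind of instrumentation I mean (the paths and prices below are made-up placeholders, not any provider's real tariff): measure what each user stores, then model what the same footprint and bandwidth would cost at a public cloud provider.

```python
# Hypothetical sketch of the instrumentation discussed above: measure
# how much each user stores, then model what that footprint plus their
# bandwidth would cost at a public cloud provider. The per-GB prices
# are made-up placeholders, not real cloud tariffs.
from pathlib import Path

def bytes_stored(user_root: Path) -> int:
    """Total bytes under a user's directory tree."""
    return sum(f.stat().st_size
               for f in user_root.rglob("*") if f.is_file())

def monthly_cloud_cost(stored_gb: float, egress_gb: float,
                       price_per_gb_stored: float = 0.02,
                       price_per_gb_egress: float = 0.09) -> float:
    """Toy charge-back model: storage plus bandwidth consumed."""
    return (stored_gb * price_per_gb_stored
            + egress_gb * price_per_gb_egress)

# for user in Path("/data/users").iterdir():
#     gb = bytes_stored(user) / 1e9
#     print(f"{user.name}: {gb:.1f} GB stored, "
#           f"~${monthly_cloud_cost(gb, egress_gb=0.1 * gb):.2f}/month")
```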
This model may have longer-term implications for general-purpose storage: if VMware and other operating systems continue to add features such as snapshots and replication to their software stacks, then the back-end array becomes almost entirely a commodity.
If we consider the impact of caching flash moving up into the server stack, yet more array functionality becomes server-level functionality. Of course, there are still challenges in this space around ensuring that the array is aware of and co-operating with the server, but if replication and the other functions are actually carried out at the server level, then this becomes more feasible.
I can imagine a time when the actual brains of the array are completely virtualised and live on the servers; virtualised VMAXs, VNXs, Filers, v7000s etc. are all within the realms of possibility in the near future and probably exist today in various vendor labs. The rust and SSDs become completely commodity items.
Where does this leave the storage team? In the same place it is today: at the core of the Enterprise, working to provide simple and effective access to the information asset. They may be working at different levels and collaborating with different people, but they'll still be there. And they might have smiles on their faces instead of frowns… now there's a thought!
Martin — thanks for letting us know that we got our basic understandings correct.
Now we have to deliver on it!
— Chuck