I seem to be doing a lot of thinking about clouds, dynamic data centres and what it all means. I do believe that the architectures of the future will become increasingly dynamic and virtualised. I was playing with EC2 and AWS at the weekend and I can see a time when I won't bother with the ridiculous amount of hardware that I have at home for playing with virtual appliances and 'stuff'*. And I can see that it makes an increasing amount of sense for a lot of the things we do at work, but… I have some questions/thoughts about storage in the public cloud and, to a certain extent, the private cloud.
1) All the pricing is per gig, which is a very simplistic model. I know that people will argue that you wouldn't put your highest-performing apps in the cloud, but you do need some kind of performance guarantee. Anyone want to benchmark Amazon's Storage Cloud, an SPC for Cloud? (There's a rough timing sketch after this list.)
2) Replication between private and public clouds, and between public clouds, i.e. between cloud providers. Or is this simply done at an application level? Has anyone tried using database replication between applications running in different clouds?
3) Related to the above, redundancy in the cloud? We provision network links from diverse suppliers to try to protect ourselves from a catastrophic outage taking out an entire supplier; do you do the same in the cloud, or is it enough to have DR between different clouds from the same supplier?
4) Dedupe in the cloud? Can you dedupe cloud storage? Have people considered writing dedupe appliances to run in the cloud? For example, would Ocarina run as a virtual appliance in the cloud?
5) Backup in the cloud? How do we back our cloud storage up when running in a public cloud? Would you back up to a different cloud?
6) A virtual array? Before you think I'm mad, it might be interesting to be able to pre-purchase a storage pool which can be allocated to virtual servers. This storage pool could be thin-provisioned, over-committed etc. as per traditional thin provisioning (there's a toy sketch of the over-commit accounting below).
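On the benchmarking question, here's the sort of crude, back-of-an-envelope test I have in mind; a minimal sketch only, assuming Python with the boto3 library, configured AWS credentials, and a hypothetical bucket called 'my-benchmark-bucket'. It just times a handful of PUTs and GETs, nothing like a proper SPC-style workload.

```python
# Rough sketch: time a few PUTs and GETs against S3 to get a feel for throughput.
# Assumes boto3 is installed, credentials are configured, and the bucket
# 'my-benchmark-bucket' (hypothetical name) already exists.
import time
import boto3

s3 = boto3.client("s3")
bucket = "my-benchmark-bucket"        # hypothetical bucket name
payload = b"x" * (8 * 1024 * 1024)    # 8 MiB test object

def timed(label, fn):
    start = time.time()
    fn()
    elapsed = time.time() - start
    mb = len(payload) / (1024 * 1024)
    print(f"{label}: {elapsed:.2f}s ({mb / elapsed:.1f} MiB/s)")

for i in range(5):
    key = f"bench/object-{i}"
    timed(f"PUT {key}", lambda k=key: s3.put_object(Bucket=bucket, Key=k, Body=payload))
    timed(f"GET {key}", lambda k=key: s3.get_object(Bucket=bucket, Key=k)["Body"].read())
```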
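And for the virtual array idea, a toy illustration of the over-commit accounting I mean; this is purely my own sketch, not any vendor's implementation. The pool promises more logical capacity to servers than has been physically purchased, and only counts space against the physical pool as data is actually written.

```python
# Toy illustration of a thin-provisioned, over-committed storage pool.
# Purely a sketch of the accounting, not a real array implementation.
class ThinPool:
    def __init__(self, physical_gb, overcommit_ratio=3.0):
        self.physical_gb = physical_gb
        self.logical_limit_gb = physical_gb * overcommit_ratio
        self.allocated_gb = 0.0   # logical capacity promised to virtual servers
        self.written_gb = 0.0     # physical capacity actually consumed

    def allocate_volume(self, size_gb):
        """Promise logical capacity to a virtual server (thin: no space used yet)."""
        if self.allocated_gb + size_gb > self.logical_limit_gb:
            raise RuntimeError("over-commit limit reached")
        self.allocated_gb += size_gb

    def write(self, size_gb):
        """Only writes consume real, physical capacity."""
        if self.written_gb + size_gb > self.physical_gb:
            raise RuntimeError("pool exhausted, buy more physical storage")
        self.written_gb += size_gb

pool = ThinPool(physical_gb=1000)          # 1 TB purchased
pool.allocate_volume(800)                  # server A sees an 800 GB volume
pool.allocate_volume(800)                  # server B too: 1.6 TB promised, 1 TB real
pool.write(200)                            # only written data hits the physical pool
print(pool.allocated_gb, pool.written_gb)  # 1600.0 200.0
```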
Just my thoughts, any answers? Any questions of your own?
*This is a blatant lie; I have ridiculous amounts of hardware because I enjoy fiddling and hacking about with it. Pretending it is for research is just an excuse I give myself; my wife is aware of the real truth but she humours me!
Martin
Sounds like you’re thinking too much about the hardware. I’d suggest that with Cloud Computing the focus should be on service:
Replication – don’t care – just give me a service that offers it.
Redundancy – don’t care – just give me a guaranteed service.
Dedupe – don’t care – do it and don’t tell me, but price my service right.
Backup – don’t care, but offer me different service levels including integration with other locations/vendors (in-band versus out-of-band backup).
Virtual Array – don’t care – just give me cheap storage.
Chris
Actually, I don’t care about the hardware but I do care about the service and how it is offered.
For example, replication: I might want a service which offers replication between private and public cloud(s). I need to know the details of the service and how I get it to interoperate between clouds of various types.
Redundancy: a guaranteed service with punitive damages sufficient to allow my business to deal with any revenue/reputational impact due to failure? I am not sure that the external cloud providers are ready for that yet.
I think we do have to question the service providers and get the right services in place. There are still a lot of publicly unanswered questions about interop between various clouds.
Is cloud computing the future for most computing requirements? Sitting here today, almost certainly. But we have a journey to make; the technology is very nearly ready, but organisationally I suspect most of us are not.
Great post, Martin. Here are my thoughts about the question you raised. There are two places you can dedupe in the cloud — at the client side and in the cloud. By deduping (and compressing) at the client side, you reduce not only storage costs but also bandwidth between client and cloud.
Also, you’re more likely to find meaningful duplicate file information within a customer data set than across customer data sets. Doing deduplication across the data of multiple customers can raise co-tenancy, privacy, and security concerns – is the dedupe engine mixing data from multiple customers in some unsavory way?
Finally, encryption is an issue that everyone should think through when talking about dedupe and cloud storage. If you encrypt data – not a bad idea when moving it across the public internet – it obscures data patterns.
Encrypting data will make the block level data look like random noise – this dramatically lowers the statistical likelihood of finding duplicates, even in the case where there were easy duplicates to find in the original data set. By doing deduplication and compression at the customer side of the cloud interface, before encryption takes place, you can find all the meaningful duplicates before they are obscured. (If you are finding a lot of duplicates on a data set that has been encrypted, you might want to think about getting a different encryption product….)
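To make that ordering concrete, here is a rough sketch of a client-side flow: my illustration only, not Ocarina's actual pipeline. It uses fixed-size chunks, SHA-256 hashes to spot duplicates, zlib compression on the unique chunks, and (as an assumed example dependency) the Python cryptography library's Fernet to encrypt only what would actually be uploaded. Run the same hashing after encryption and the chunks would almost never match.

```python
# Sketch of client-side dedupe + compress, then encrypt. My illustration,
# not Ocarina's pipeline. Fixed-size chunks and SHA-256 for brevity.
import hashlib
import zlib
from cryptography.fernet import Fernet  # assumed available for this example

CHUNK = 64 * 1024
key = Fernet.generate_key()
cipher = Fernet(key)

store = {}      # chunk hash -> compressed, encrypted chunk (what goes to the cloud)
recipe = []     # ordered list of chunk hashes needed to rebuild the data

def ingest(data: bytes):
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        digest = hashlib.sha256(chunk).hexdigest()
        recipe.append(digest)
        if digest not in store:                        # dedupe happens on plaintext
            compressed = zlib.compress(chunk)          # then compress
            store[digest] = cipher.encrypt(compressed) # encrypt last, before upload

ingest(b"hello world" * 100000)
ingest(b"hello world" * 100000)  # second copy adds no new chunks to the store
print(len(recipe), "chunk references,", len(store), "unique chunks stored")
```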
At Ocarina, we treat the cloud as a storage platform – to us, it is like a big file server, possibly with different interfaces than a standard filer, and with different latency and performance properties, but a filer nonetheless.
Thanks,
Carter George, VP Products, Ocarina Networks
http://www.ocarinanetworks.com