Life is certainly a bit odd in the storage world at the moment.
Today IBM announced automated sub-LUN storage tiering in the DS8700; so since EMC announced FAST, both 3PAR and IBM have announced and are shipping product before EMC? Although IBM's appears to be chubby chunks; not chunklets, more chunkosaurus!
So from IBM, you can hedge your bets with regard to storage tiering:
1) Tiering is rubbish; you only need one tier and one type of disk, i.e. XIV.
2) Tiering is rubbish, but you need more than one tier at the moment, and we can speed things up with a big read cache, i.e. nSeries aka NetApp.
3) Automated tiering is the way to go, i.e. DS8700.
So whichever philosophy you subscribe to, you can buy from IBM. And if you want something to bind them all together, you can also get an SVC.
And then we have EMC; storage virtualisation was rubbish, what you want is federation! What you do is you get all your arrays and you federate them behind an appliance. But if you want NAS or Object Storage, you're going to need a different device.
And then our friends at NetApp; that EMC fragmented storage stuff, that's rubbish! What you want is a unified storage platform! Manage all your protocols from a single operating system. What's that? Object Storage? You want Object Storage? Really, ummm…okay! Here's a company we've just bought; integrated into our operating system? Ummm, no…not really, since you ask. But I'm sure it will be, someday.
HP? Well we can make it all…we have all our own technology, we've bought just about everyone and it'll all be integrated someday. Just don't ask us who makes XP and SVSP. Please!
Knock, knock….Hu's there? Is there anyone there? Are you sure? I'm sure HDS now stands for Historically Did Storage.
It's like watching monkeys bashing away at typewriters at the moment! I'm sure someone will save us but I'm not sure who at the moment.
In some ways NetApp's caching approach is very similar in concept to sub-LUN automatic tiering: move data to fast media when it becomes a hot-spot and then back to its source…
IMHO, sub-LUN tiering is a temporal thing, because of data updates and growth, although some “chunks” of a database could live there forever (indexes, some tables, etc.).
With a caching algorithm you start moving data into your limited cache space, and that data stays there until you find another hot-spot, always assuming a large cache.
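To make the comparison concrete, here is a minimal Python sketch of the two behaviours as I understand them (purely illustrative; the class names, thresholds and the tick-based demotion pass are my own assumptions, not any vendor's actual algorithm): a cache promotes on the first miss and evicts as soon as something hotter needs the space, while a sub-LUN tier only promotes a chunk once it has proven itself hot, and then leaves it on fast media until it has been cold for a long time.

    from collections import OrderedDict

    class ReadCache:
        """Cache-style behaviour: fast media holds whatever was touched most
        recently; the coldest entry is evicted the moment space is needed."""
        def __init__(self, capacity_chunks):
            self.capacity = capacity_chunks
            self.entries = OrderedDict()            # chunk_id -> True, in LRU order

        def access(self, chunk_id):
            if chunk_id in self.entries:
                self.entries.move_to_end(chunk_id)  # refresh recency
                return "hit"
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)    # evict the coldest immediately
            self.entries[chunk_id] = True           # promote on first miss
            return "promoted"

    class SubLunTier:
        """Tier-style behaviour: promote only chunks that prove themselves hot,
        demote only after they have been idle for a long time."""
        def __init__(self, capacity_chunks, promote_after=1000, demote_after_idle=100000):
            self.capacity = capacity_chunks
            self.promote_after = promote_after          # accesses needed before migrating up
            self.demote_after_idle = demote_after_idle  # idle sweeps before migrating back
            self.heat = {}                              # chunk_id on HDD -> access count
            self.idle = {}                              # chunk_id on SSD -> sweeps since last access

        def access(self, chunk_id):
            if chunk_id in self.idle:
                self.idle[chunk_id] = 0
                return "ssd-hit"
            self.heat[chunk_id] = self.heat.get(chunk_id, 0) + 1
            if (self.heat[chunk_id] >= self.promote_after
                    and len(self.idle) < self.capacity):
                del self.heat[chunk_id]
                self.idle[chunk_id] = 0                 # migrate up, and it sticks
                return "promoted"
            return "hdd-hit"

        def background_sweep(self):
            for chunk_id in list(self.idle):            # periodic demotion pass
                self.idle[chunk_id] += 1
                if self.idle[chunk_id] > self.demote_after_idle:
                    del self.idle[chunk_id]             # migrate back to its source tier

The point of the contrast is the time-scale: the cache changes its mind on every miss, while the tier only moves data once the evidence has accumulated, which is the temporal behaviour described above.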
I think it’s what everyone expected from SVC SSDs…
One thing is true… Hu was right, but EMC talk about NAS + SAN + Geolocation and still forget backups (VTLs, tapes, SW, etc.).
WRT chubby-chunks – based on testing and input from IBM’s software group, 1GB is a reasonable size for a hot-spot within a database etc. The key is that you don’t want to thrash the system by constantly migrating; you need to make a decision about the longer-term value of migrating and hence make it stick for a while.
Which is exactly why standard caching algorithms are not a good fit for standard flash/SSD devices. They think in seconds, not days or weeks. Keeping that amount of meta-data in a cache, and trying to react that quickly, is still going to be limited by the 100MB/s (if you are lucky) you get from a single HDD in the lower tier.
The whole idea is to improve response time and get the best use out of a few SSDs. We all marvel at the amazing IOPS rates of the devices, and yes, they can (as we have with SVC) be used by a niche market that needs huge IOPS from a small amount of capacity. However, that was a stepping stone to Easy Tier – it’s much better if you can automatically detect which areas of an entire enterprise are hot and keep them on SSD for some time.
There is an inherent cost in moving the data – especially into a cache like NetApp’s, where it’s likely to be evicted again quickly. You want to make sure that once moved, it sticks.
This same function is coming to SVC soon, as we have said numerous times (we were working on it before EMC pre-pre-pre-announced FAST v2), and it will be based on the extent sizes you use in the MDG, so down to 16MB if you so desire.
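As a rough illustration of that “decide slowly, make it stick” trade-off (my own hypothetical sketch, not Easy Tier’s or SVC’s actual algorithm; the window length, hysteresis factor and minimum-residency rule are invented for the example), a planner could rank extents by I/O counts gathered over a long window and only move an extent when the move is clearly worth its cost:

    def plan_migrations(extent_heat, on_ssd, residency, ssd_capacity,
                        hysteresis=2.0, min_residency_windows=4):
        """
        extent_heat: {extent_id: I/O count over the last long window (e.g. 24h)}
        on_ssd:      set of extent_ids currently on the SSD tier
        residency:   {extent_id: number of windows the extent has spent on SSD}
        Returns (promote, demote) lists, computed once per window rather than
        per I/O, so a brief spike cannot thrash extents up and down the tiers.
        """
        ranked = sorted(extent_heat, key=extent_heat.get, reverse=True)
        want_on_ssd = set(ranked[:ssd_capacity])

        # Demote only extents that fell out of the hot set AND have already had
        # a fair stay on SSD, so one quiet window does not undo the migration.
        demote = [e for e in on_ssd - want_on_ssd
                  if residency.get(e, 0) >= min_residency_windows]

        # Promote the hottest non-resident extents into whatever space is free
        # after demotions, but only if they are clearly hotter than the coolest
        # extent left on SSD (hysteresis), so the move is worth its cost.
        free_slots = max(ssd_capacity - (len(on_ssd) - len(demote)), 0)
        floor = min((extent_heat.get(e, 0) for e in on_ssd - set(demote)), default=0)
        promote = [e for e in ranked
                   if e not in on_ssd and extent_heat[e] > hysteresis * floor]
        return promote[:free_slots], demote

With 1GB chunks that plan only ever touches a handful of candidates per window; with finer extents like SVC’s, the same logic simply sees more, smaller candidates.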
Barry,
really interesting, and the chunk size is logical: smaller chunks => higher overhead. Just asking myself a couple of things:
1) Why SSD and not PCIe SLC? SVC is System X, isn’t it?
2) If you have to decide, then you move the hot-spots in, not out. Is it some kind of batch process? Do you need to “reprocess data” to remove chunks?
3) Is there some way to create pools? If you have different workloads then you need to separate them so they don’t compete for the same resources, I guess.