The Storage Buddhist's blog about IBM Easy Tier with SATA and SSD here started me thinking.
And I quote from his blog:
'Easy Tier was left to automatically learn the SPC-1 benchmark and respond (again, automatically)'
Very cool, and IBM get some good results from this, but I have a few questions; these go out to all vendors who are doing automated tiering of some sort (I include NetApp's PAM here).
1) How long does the array take to learn the workload, and how quickly does it achieve an optimal state?
This is very important because it impacts any new application deployment. Do I deploy to the fastest disk and artificially throttle it to the required SLAs? Users are funny, you see: if they discover that application performance is degrading over time, even if it is within the SLAs, they generally complain. Or do I deploy to the slowest tier and let the array tune itself to meet the required SLAs? That might be the worst possible time to degrade performance, and it may hurt user acceptance of the application.
Do I have to come up with some artificial way of simulating the production load and run that for a period to let the array tune itself before any real users arrive? If so, that has an impact on my ability to respond quickly to my users' demands, and my dynamic data-centre becomes less dynamic than I want.
2) How does the automated tiering impact replication?
In traditional replication technologies, one tries to keep the layout of the local and the remote array in sync. This is challenging enough in a non-dynamic environment and is often a manual task. How do you keep the array layout in sync if the array is constantly changing its layout to reflect the load on it? How can you be sure that your remote recovery array is in an optimal state? Do you even want to? I suppose the answer is very much dependent on my first question: how long does it take to reach an optimal state? Is it hours? Is it days? What is the delta between optimal and non-optimal performance?
If you have to keep the arrays completely in sync, what is the impact of sending array layout changes to the remote array? How much additional network bandwidth do we require?
3) What is the impact on restores, both from tape and from snaps/clones?
The backup application has no idea how the underlying physical structure has changed; it will just sequentially restore the blocks on the LUNs it can see. However, the array has no idea whether the blocks being restored are the hot blocks it moved onto SSD or the cold blocks it moved onto SATA. It could be a right dog's dinner, and obviously the array will see a completely different I/O pattern in a restore scenario; you might not want it tuning itself at that point.
I suspect that the answers to these questions could be quite complex, but I am certainly interested in how the various vendors mitigate some of the risks I highlight. Automated Storage Tiering is still in its infancy; I think we've a lot of lessons to learn.
Or do I worry too much?
Martin,
I’ll take a crack at responding to this (unofficially of course) from a Compellent perspective:
1 – There’s really no “learning” per se with our system. Rather, data is ingested at the highest tier available at RAID 10. Users may decide not to do this (let’s say you’re doing a data migration, for example – you can pin the volume to the lowest tier @ RAID 5 or whatever makes sense to you). As for the data progression between tiers, this occurs at the page (extent if you prefer) level (the default is 2MB pages in our system). I/O activity within pages is monitored and access frequency noted to determine which pages are less/more active and are candidates for movement – there’s a rough sketch of this after item 3 below.
SO – no learning, but rather a “settling” of data over time. By the way, any pages which are part of a replay set (snapshot in our world) are considered read-only and are re-striped as RAID 5 at each data progression run (by default, once a day).
2 – Tiering in our system doesn’t impact replication in that the source/target volumes can be independently tiered as appropriate. For example, some customers opt to use only large SATA disk for the DR target volumes – one tier. The production volume can be tiered across three drive speeds, but this will not require an identical configuration at the target site.
3 – A tape restore would look like incoming data and be handled as per item 1 above. However, a volume could be temporarily moved to a different tiering profile while a restore was being done (i.e. moved into tier 2 or 3) or it could be restriped/demoted manually after the restore is finished (to keep the write/ingest flowing at the fastest possible speed).
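To make the page movement in item 1 a bit more concrete, here’s a very rough sketch of the idea – purely illustrative, not our actual implementation; the tier names, counters and thresholds are made up for this example:

```python
from collections import defaultdict

PAGE_SIZE = 2 * 1024 * 1024                      # the 2MB default page size mentioned above
TIERS = ["tier1_ssd", "tier2_fc", "tier3_sata"]  # hypothetical three-tier layout

class PageProgression:
    """Tracks I/O per page between progression runs; new pages are ingested
    at the top tier, mirroring the 'ingest at the highest tier' behaviour."""

    def __init__(self):
        self.page_tier = {}                      # page_id -> index into TIERS
        self.access_count = defaultdict(int)     # page_id -> I/Os since the last run

    def record_io(self, page_id, is_write=False):
        if is_write and page_id not in self.page_tier:
            self.page_tier[page_id] = 0          # new data lands in the highest tier
        self.access_count[page_id] += 1

    def progression_run(self, promote_at=1000, demote_at=10):
        """The (by default daily) pass: busy pages move up a tier, idle pages
        move down. The thresholds here are invented purely for illustration."""
        for page_id, tier in list(self.page_tier.items()):
            hits = self.access_count[page_id]
            if hits >= promote_at and tier > 0:
                self.page_tier[page_id] = tier - 1       # promote an active page
            elif hits <= demote_at and tier < len(TIERS) - 1:
                self.page_tier[page_id] = tier + 1       # demote a quiet page
        self.access_count.clear()                        # start a fresh measurement window
```

The real system also handles RAID level changes and replay (snapshot) pages, which this sketch leaves out.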
Hope this helps – I don’t think you worry too much! Good questions and thanks for the opportunity.
Martin,
The “learning curve” is 24 hours. You can turn it on and off, or you can do it manually, but to use Sub-LUN tiering you need automation.
The rest is a matter of policy and design. This is optimization, not automation.
Martin,
Yes, it takes 24 hours to learn. All extents are sorted hottest to coldest; the hottest are moved from HDD to SSD and the coldest from SSD to HDD, until the coldest SSD extent is hotter than the hottest HDD extent. The list is re-evaluated every 5 minutes, with extents being swapped to maintain an optimal state.
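Purely as an illustration of that description (this is a sketch, not the actual Easy Tier code – the extent structure, field names and heat values are invented), one re-evaluation pass would look something like this:

```python
from dataclasses import dataclass

@dataclass
class Extent:
    extent_id: int
    heat: float      # access frequency measured over the monitoring window
    tier: str        # "SSD" or "HDD"

def rebalance(extents):
    """One pass, as described above: swap the hottest HDD extents with the
    coldest SSD extents until the coldest extent left on SSD is hotter than
    the hottest extent left on HDD. The real system repeats this evaluation
    every 5 minutes against continuously updated heat statistics."""
    ssd = sorted((e for e in extents if e.tier == "SSD"), key=lambda e: e.heat)                # coldest first
    hdd = sorted((e for e in extents if e.tier == "HDD"), key=lambda e: e.heat, reverse=True)  # hottest first

    swaps = []
    for cold_ssd, hot_hdd in zip(ssd, hdd):
        if cold_ssd.heat >= hot_hdd.heat:
            break                                          # stopping condition reached
        cold_ssd.tier, hot_hdd.tier = "HDD", "SSD"         # demote the cold extent, promote the hot one
        swaps.append((hot_hdd.extent_id, cold_ssd.extent_id))
    return swaps
```

The resulting swap list is what would drive the background extent migrations.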
Replication is not affected. Destination copies do not need to match identical HDD/SSD configuration.
As for restores, it depends on the method. If you do a block-level restore, the hottest data will still be on the same blocks as before, based on the files or data sets contained on them. If you are doing file-level restores and this puts the new files in different blocks, then it will go through standard re-evaluation every 5 minutes to swap blocks as needed between HDD and SSD.
— Tony (IBM)
Tony,
if you wanted your DR target to be identical to your production target, surely you would have to send the metadata across to the DR target to move the extents at the DR end? Otherwise, when you cut over to DR, you find yourself with a non-optimal array; this may or may not be important to you, but since in most DR situations you are not running optimally anyway, anything which keeps you close to optimal would be useful.
And yes, I guessed that file-level restores would mean that the restored file needs to be re-optimised. It gets kind of interesting: what impact does the restore activity have on the optimisation algorithms, given that the array would see a completely different type of I/O pattern whilst restoring? Just wondering what the impact of this would be.
Maybe these are simply edge cases, but even edge cases need to be taken into account.
Thanks to IBM and Compellent for answering.
So Barry B? How do EMC do it? Any answers or are you just too busy getting ready for your annual junket?
Martin,
same as with servers, automatic movement of any kind of resource is not desirable in conservative environments. Uncapped schemes and things like that make reports go mad, so if you are that kind of conservative, don’t use anything automatic – not in production and never in DR – because it’s not so predictable.
But if you try to squeeze every last breath out of your system, this kind of stuff helps you tune the engines.
NOTE: PowerVM on IBM Power Systems supports uncapped partitions, so one partition can borrow CPU from another that is not using it in real time, but some sysadmins hate that feature because reports show over 100% CPU usage, despite the performance gain.