

/dev/null – The only truly Petascale Archive

As data volumes increase in all industries and the challenges of data management continue to grow, we look for places to store our ever-increasing data hoard, and inevitably the subject of archiving and tape comes up.

It is the cheapest place to archive data by some way; my calculations currently put its four-year cost somewhere in the region of five to six times cheaper than the cheapest commercial disk alternative. However, tape’s biggest advantage is almost its biggest problem: it is considered to be cheap and hence, for some reason, no-one factors in the long-term costs.
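
As a rough illustration of where that sort of ratio comes from, here is a minimal four-year comparison sketch; every unit cost in it is an assumption picked purely for illustration, not a quote from any vendor.

```python
# A rough sketch of a four-year tape-vs-disk archive comparison.
# Every unit cost below is an assumption chosen for illustration.

YEARS = 4
ARCHIVE_TB = 1_000  # a petabyte of archive

# Assumed cost per usable TB: acquisition plus annual running costs
tape_cost_per_tb = 15 + 5 * YEARS     # cheap media, modest library/power/support costs
disk_cost_per_tb = 120 + 20 * YEARS   # dearer hardware that spins (and is supported) 24x7

tape_total = ARCHIVE_TB * tape_cost_per_tb
disk_total = ARCHIVE_TB * disk_cost_per_tb

print(f"tape: £{tape_total:,}  disk: £{disk_total:,}  ratio: {disk_total / tape_total:.1f}x")
# With these made-up inputs the disk archive comes out ~5.7x dearer,
# the right order of magnitude for the five-to-six-times claim above.
```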

Archives by their nature live for a long time; more and more companies are talking about archives which will grow and exist forever. And as companies no longer seem able to categorise data into data to keep and data not to keep, and with exponential data growth and generally bad data management, multi-year, multi-petabyte archives will eventually become the norm for many.

This could spell the death of the tape archive as it stands, or it will necessitate some significant changes in both user and vendor behaviour. A ten-year archive will see at least four refreshes of the LTO standard on average, and since LTO drives typically read back only a couple of generations, your latest tape technology will not be able to read your oldest tapes. It is also likely that you are looking at some kind of extended maintenance and associated costs for your oldest tape drives; they will certainly be End of Support Life. Media may be certified for 30 years; drives aren’t.

Migration will become a way of life for these archives and it is this that will be a major challenge for storage teams and anyone maintaining an archive at scale.

It currently takes around 88 days to migrate a petabyte of data from LTO-5 to LTO-6; this assumes 24×7 operation, no drive issues, no media issues and a pair of drives dedicated to the migration. You will also be loading about 500 tapes and unloading about 500 tapes. You can cut this time by putting in more drives, but your costs will soon start to escalate as SAN ports, servers and peripheral infrastructure mount up.
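
For anyone who wants to check that figure, here is a minimal sketch of the arithmetic; the LTO-5 numbers (1.5 TB native capacity, roughly 140 MB/s native streaming rate) and the per-tape mount overhead are assumptions, so treat the output as an order-of-magnitude check rather than a migration plan.

```python
# Rough tape-migration arithmetic for one read/write drive pair.
# Capacity, throughput and mount overhead are assumed figures.

TB = 10**12
PB = 10**15

archive_size = 1 * PB            # bytes to migrate
drive_speed = 140 * 10**6        # bytes/sec, assumed LTO-5 native streaming rate
tape_capacity = 1.5 * TB         # assumed LTO-5 native capacity per cartridge
mount_overhead = 120             # seconds per tape to load, position and unload (assumed)

tapes_to_read = archive_size / tape_capacity        # depends on how full each tape actually is
streaming_seconds = archive_size / drive_speed
overhead_seconds = tapes_to_read * mount_overhead

days = (streaming_seconds + overhead_seconds) / 86_400
print(f"~{tapes_to_read:.0f} source tapes, ~{days:.0f} days of continuous streaming")
# ~667 source tapes, ~84 days -- before a single drive fault, media error
# or recall interrupts the stream.
```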

And then all you need is for someone to recall the data whilst you are trying to migrate it; 88 days is extremely optimistic.

Of course a petabyte seems like an awful lot of data, but archives of a petabyte-plus are becoming less uncommon. The vendors are pushing the value of data, so no-one wants to delete what is a potentially valuable asset. In fact, working out the value of an individual datum is extremely hard, and hence we tend to place the same value on every byte archived.

So although tape might be the only economical place to store data today, as data volumes grow it becomes less viable as a long-term archive unless it is a write-once, read-never (and I mean never) archive…and if that is the case then perhaps, in Unix parlance, /dev/null is the only sensible place for your data.

But if you think your data has value, or more importantly your C-levels think that your data has value, there’s a serious discussion to be had…before the situation gets out of hand. Just remember: any data migration which takes longer than a year will most likely fail.

Flashman and the Silo of Doom

Flash, flash and more flash seems to be the order of the day from all vendors; whether that is flash today or flash tomorrow; whether it’s half-baked, yesterday’s left-overs rehashed or an amuse-bouche; 2013 is all about the flash.

Large and small, the vendors all have a story to tell, but flash still makes up a tiny amount of the total capacity shipped and is a drop in the ocean even on a revenue basis. There doesn’t even seem to be a huge amount of consensus as to how it should be deployed; is it a cache, is it a networked cache or is it an all-flash array? Yes, seems to be the answer.

Storage is making the shift from mechanical to solid state and finally should be able to keep up with the servers of today. Well until we change to optical computing or something new.

And like the shift from mechanical computing machines; the whole market is in flux and I don’t see anyone who is definitely going to win. What I see is a whole lot of confusion; a focus on stuff and a focus on hype.

Another storage silo in the data-centre.

Data still finds itself in siloed pools, and until the data management problem is solved, the flow of data between compute environments, being re-purposed and re-used simply and effectively, will continue to be hindered.

Data will duplicate and replicate; I see people selling the power efficiency of flash…yet the overall estate will be even less power efficient, because it is highly likely that I will still have all those mechanical disk arrays and only the active data-set will live on flash. Despite the wishful thinking of some senior sales-guys, few people are going to rip out their existing disk-estate and replace it entirely with flash.

I may be able to replace a few 15k disks, but data growth currently means that any saving is far outstripped by the rack-loads of SATA that the average enterprise is having to put in place.

Whilst I continue to read articles full of hyperbole about speeds and feeds, of features rather than usage models…I simply see a faster disk adding new complexity to an already overly complex and fragile environment.

So let’s see some proper data-management tools and progress in that area…


Tiny Frozen Hand

So EMC have finally announced VMAX Cloud Edition; a VMAX iteration that has little to do with technology and everything to do with the way that EMC want us to consume storage. I could bitch about the stupid branding but too many people are expecting that!

Firstly, and in many ways most importantly, the announcement is about the cost model: EMC have moved to a linear cost model. In the past, purchasing a storage array carried a relatively large front-loaded cost in that you had to purchase the controllers etc.; this meant that your cost per terabyte was high to start with, then declined, then potentially rose again as you added more controllers, and then declined again.

This led to a storage-hugging attitude: that’s my storage array and you can’t use it. A linear cost model allows IT to provide the Business with a fixed cost per terabyte whether you are the first to use the array or the last. This allows us to move to a consumption and charging model that is closer to that of Amazon and the Cloud providers.
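
To make the difference concrete, here is a minimal sketch of the two charging curves; every price in it is invented purely for illustration and bears no relation to EMC’s actual pricing.

```python
# Front-loaded vs linear cost per terabyte; all prices are made up.

def front_loaded_cost_per_tb(tb_used, controller_cost=250_000, cost_per_tb=800):
    """Traditional model: a big up-front controller spend amortised over the capacity used."""
    return (controller_cost + tb_used * cost_per_tb) / tb_used

def linear_cost_per_tb(tb_used, cost_per_tb=1_200):
    """Linear model: the first terabyte costs the same as the five-hundredth."""
    return cost_per_tb

for tb in (10, 100, 500):
    print(f"{tb:>4} TB: front-loaded ~${front_loaded_cost_per_tb(tb):,.0f}/TB, "
          f"linear ${linear_cost_per_tb(tb):,.0f}/TB")
# The front-loaded curve starts very high and falls as the array fills up,
# which is exactly what encourages the storage-hugging described above.
```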

It is fair to point out that EMC and other vendors already have various ways of doing this, but they could be complex and often relied on financial tools to enable them.

Secondly, EMC are utilising a RESTful API to allow storage to be allocated programmatically from a service catalogue. There are also methods of metering and charging back for storage utilisation. Along with an easy-to-use portal, the consumption model continues to move towards an on-demand model. If you work in IT and are not comfortable with this, you are in for a rough ride for quite some time.
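
As a purely hypothetical sketch of what allocating storage programmatically from a service catalogue might look like (the endpoint, payload fields and authentication below are invented for illustration; they are not EMC’s actual API):

```python
# Hypothetical allocation request against an imagined service-catalogue API.
import requests

CATALOGUE_URL = "https://array.example.com/api/v1"   # assumed endpoint

def request_volume(tier: str, size_gb: int, project: str) -> dict:
    """Ask for a volume from a named service-catalogue tier and return its details."""
    resp = requests.post(
        f"{CATALOGUE_URL}/volumes",
        json={"tier": tier, "size_gb": size_gb, "project": project},
        headers={"Authorization": "Bearer <token>"},   # placeholder credentials
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()   # e.g. a volume identifier plus the metered cost per month

# Example: request_volume("silver", 2048, "payments")
```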

Thirdly, the cost models that I have seen are very aggressive; EMC want to push this model and this technology. If you want to purchase 50TB and beyond and you want it on EMC, I can’t see why you would buy any other block storage from EMC. It is almost as if EMC are forcing VNX into an SMB niche. In fact, if EMC can hit some of the price-points I’ve had hinted at, everyone is in a race to the bottom. It could be a Google vs Amazon price-battle.

Fourthly, and probably obviously, EMC are likely to be shipping more capacity than an end-user requires, allowing them to grow with minimal disruption. If I was EMC, I’d ship quite a lot of extra capacity and allow a customer to burst into it at no charge for a fair proportion of the year. Burst capacity often turns into bought capacity; our storage requirements are rarely temporary, and the temporary quickly becomes permanent. Storage procurement is never zipless; it always has long-term consequences, but if EMC can make it look and feel zipless…

I’m expecting EMC to move to a similar model for the Isilon storage as well; it is well suited to this sort of model. And yet again, this leaves VNX in an interesting position.

Out in the cold, with a tiny frozen hand….dying of consumption.


Who’ll Do a Linux to Storage?

Are we heading to a Linux moment in the storage world where an open-source ‘product’ truly breaks out and causes the major vendors a headache?

I’ve had this conversation a few times recently with both vendors and end-users; the general feeling is that we are pretty close to it. What is needed is someone to do a Red Hat: package up some of the open-source products, make them pretty and simple to use, and take on the incumbents. And then give it away…

Of course, Nexenta have already done this rather successfully, and if I was looking for a bog-standard traditional dual-head filer product, I’d seriously consider them against the traditional filers.

But great product that it is, it hardly breaks new ground; well apart from price.

What I’m thinking of is something which forces its way into the scalable space…block, file and object. Ceph is probably the technology that is closest to this, and although it is pretty simple to get going, it is still a bit of a science project for most. I’m not sure I’d want to manage a Ceph environment at scale yet; I’d certainly be nervous about running heavy production workloads on it.

Integrating it into a traditional mixed data-centre environment running Linux, Windows and a variety of virtualisation products would be a big challenge.

I’m looking at InkTank to do something but I’m not sure that they have the funding to push it to the level required.

Yet I think the storage market is ripe for this sort of disruption; especially in the object and ‘hyperscale’ space, the big vendors aren’t there quite yet.

Or perhaps a big vendor will finally realise that they can take the open-source building blocks and use them as a weapon…it may mean sacrificing some margin, but they could guide the direction and gain some serious advantage. If I was already building commodity hardware, I’d be looking at building proper commodity storage.

5 Minutes

One of the frustrations when dealing with vendors is actually getting real availability figures for their kit; you will get generalisations, like it is designed to be 99.999% available or perhaps 99.9999% available. But what do those figures really mean to you and how significant are they?

Well, 99.999% available equates to a bit over 5 minutes of downtime a year and 99.9999% equates to a bit over 30 seconds of downtime a year. And in the scheme of things, that sounds pretty good.
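
The arithmetic behind those headline numbers is easy enough to check; a minimal sketch, assuming a 365.25-day year:

```python
# Downtime budget per year for a given number of nines.
MINUTES_PER_YEAR = 365.25 * 24 * 60

for availability in (0.999, 0.9999, 0.99999, 0.999999):
    downtime_minutes = (1 - availability) * MINUTES_PER_YEAR
    print(f"{availability:.6f} -> {downtime_minutes:8.2f} minutes/year")
# 0.999000 ->   525.96 minutes/year
# 0.999900 ->    52.60 minutes/year
# 0.999990 ->     5.26 minutes/year  (a bit over five minutes)
# 0.999999 ->     0.53 minutes/year  (a bit over thirty seconds)
```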

However, these are design criteria and aims; what are the real-world figures? Vendors, you will find, are very coy about this; in fact, every presentation I have had with regard to availability has been under very strict NDA, and sometimes not even note-taking is allowed. Presentations are never allowed to be taken away.

Yet, there’s a funny thing…I’ve never known a presentation where the design criteria were not met or even significantly exceeded. So why are the vendors so coy about their figures? I have never been entirely sure; it may be that their ‘mid-range’ arrays display very similar real-world availability figures to their more ‘Enterprise’ arrays…or it might be that once you have real-world availability figures, you might start to ask some harder questions.

Sample size: raw availability figures are not especially useful if you don’t know the sample size. Availability figures are almost always quoted as an average, and unless you’ve got a really bad design, more arrays can skew the figures in the vendor’s favour.
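
A toy example of how a fleet-wide average can hide an individual horror story (all the figures are made up):

```python
# Ten arrays, one of which was down for eight hours; the average still looks fine.
MINUTES_PER_YEAR = 365.25 * 24 * 60
fleet_minutes_down = [0, 0, 0, 0, 0, 0, 0, 0, 0, 480]

average_availability = 1 - (sum(fleet_minutes_down) / len(fleet_minutes_down)) / MINUTES_PER_YEAR
worst_availability = 1 - max(fleet_minutes_down) / MINUTES_PER_YEAR

print(f"fleet average: {average_availability:.5%}")   # ~99.99087% -- comfortably 'four nines'
print(f"worst array:   {worst_availability:.5%}")     # ~99.90874% -- the one you happened to own
```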

Sample characteristics: I’ve known vendors, when backed into a corner to provide figures, do some really sneaky things; for example, they may provide figures for a specific model and software release only. This is often done to hide a bad release. You should always try to ask for the figures for the entire life of a product; this will allow you to judge the quality of the code. If possible, ask for a breakdown on a month-by-month basis, annotated with the code release schedule.

There are many tricks that vendors try to pull to hide causes of downtime and non-availability, but instead of focusing on the availability figures, as a customer it is sometimes better to ask different, more specific questions.

What is the longest outage that you have suffered on one of your arrays? What was the root cause? How much data loss was sustained? Did the customer have to invoke disaster recovery or any recovery procedures? What is the average length of outage on an array that has gone down?

Do not believe a vendor when they tell you that they don’t have these figures and this information closely and easily to hand. They do, and if they don’t, they are pretty negligent about their QC and analytics. Surely they don’t just use all their Big Data capability to crunch marketing stats? Scrub that, they probably do.

Another nasty thing that vendors are in the habit of doing is forcing customers not to disclose to other customers that they have had issues and what they were. And of course we all comply and never discuss such things.

So 5 minutes…it’s about long enough to ask some awkward questions.

The Complexity Legacy

I don’t blog about my day-job very often, but I want to relate a conversation I had today. I was chatting to one of the storage administrators who works on our corporate IT systems; they’ve recently put in some XIV systems (some might be an understatement) and I asked how he was getting on with them. He’s been doing the storage administrator thing for a long time and cut his teeth on the Big Iron arrays, and I thought he might be a bit resentful at how easy the XIV is to administer, but no…he mentioned a case recently when they needed to allocate a large chunk of storage in a real hurry; it took 30 minutes to do a job which he felt would take all day on a VMAX.

And I believe him but…

Here’s the thing: in theory, using the latest GUI tools such as Unisphere for VMAX, surely the same should be true of the VMAX? So what is going on? Quite simply, the Big Iron arrays are hampered by a legacy of complexity; even experienced administrators, and perhaps especially experienced administrators, like to treat them as complex, cumbersome beasts. It is almost as if we’ve developed a fear of them and treat them with kid gloves.

And I don’t believe it is just VMAX that is suffering from this; all of the Big Iron arrays suffer from this perception of complexity. Perhaps because they are still expensive, perhaps because the vendors like to position them as Enterprise beasts and not as something which is as easy to configure as your home NAS, and perhaps because the storage community are completely complicit in the secret occult world of Enterprise storage?

Teach the elephants to dance…they can and they might not crush your toes.

Love to VMAX

It may come as a surprise to some people, especially those reading this, but I quite like the Symmetrix (VMAX) as a platform. Sure, it is long in the tooth, or in marketing speak ‘a mature platform’, and it is a bit arcane at times; Symmetrix administrators seem at times to want to talk a different language and use acronyms when a simple word might do, but it’s rock solid.

Unfortunately EMC see it as a cash-cow and have pushed it at times when better fits within their own product set would have suited a customer better. This means that many resent it and like to hold it up as an example of all that is wrong with EMC. I certainly have done in the past.

And it might end up being the most undersold and under-rated product; the product pushed into the marketing niche that is Enterprise Storage. Yet it could be so much more.

I think that there is a huge market for it in the data-centres of the future; more so than for EMC’s other ‘legacy’ array, the VNX. For many years, I thought that EMC should drop the Symmetrix and build up the Clariion (VNX), but I see now that I was wrong; EMC need to drop the Clariion and shrink down the Symmetrix. They need to produce a lower-end Symmetrix which can scale out block much in the way that Isilon can scale out file. Actually, a smaller Isilon would be a good idea too; a three-node cluster that could fit into three or four U, presenting 20-40 terabytes.

In fact, for those customers who want it, perhaps a true VNX replacement utilising the virtual versions of the Symmetrix and Isilon might be the way to go, but only if there is a seamless way to scale out.

I guess this will never happen outside the labs of the mad hackers, because EMC will continue to price the Symmetrix at a premium…which is a pity really.

Defined Storage…

Listening to the ‘Speaking In Tech’ podcast got me thinking a bit more about the software-defined meme and wondering if it is a real thing as opposed to a load of hype; so for the time being I’ve decided to treat it as a real thing, or at least accept that it might become a real thing…and in time, maybe a better real thing?

So Software Defined Storage?

The role of the storage array seems to be changing at present, or arguably simplifying; the storage array is becoming where you store stuff which you want to persist. That may sound silly, but basically what I mean is that the storage array is not where you are going to process transactions. Your transactional storage will be as close to the compute as possible, or at least this appears to be the current direction of travel.

But there is also a certain amount of discussion and debate about storage quality of service, guaranteed performance and how we implement it.

Bod’s Thoughts

This all comes down to services, discovery and a subscription model. Storage devices will have to publish their capabilities via some kind of API; applications will use this to find what services and capabilities an array has and then subscribe to them.

So a storage device may publish available capacity, IOPS capability and latency, but it could also publish that it has the ability to do snapshots, replication, and thick and thin allocation. It could also publish a cost associated with each of these.

Applications, application developers and support teams might make decisions at this point about which services they subscribe to; perhaps a fixed capacity and IOPS, perhaps taking the array-based snapshots but doing the replication at the application layer.

Applications will have a lot more control over what storage they have and use; they will make decisions about whether certain data is pinned in local SSD or never gets anywhere near the local SSD, and whether it needs sequential or random-access storage. They might have their RTO and RPO parameters, making decisions about which transactions can be lost and which need to be committed now.
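
To make that a little less abstract, here is a hand-wavy sketch of the discovery-and-subscribe idea; the field names and the notion of a published capability catalogue are assumptions for illustration, not any vendor’s actual API.

```python
# An imagined capability document published by an array, and a subscription
# that an application (or its support team) might make against it.

published_capabilities = {
    "array": "array-01",
    "capacity_free_tb": 120,
    "max_iops": 250_000,
    "latency_ms_p99": 2.0,
    "features": ["snapshots", "replication", "thin-provisioning"],
    "cost_per_tb_month": 95,              # a published price signal
}

subscription = {
    "capacity_tb": 10,
    "guaranteed_iops": 20_000,
    "use_array_snapshots": True,
    "use_array_replication": False,       # replication done at the application layer instead
}

def can_satisfy(caps: dict, req: dict) -> bool:
    """Crude check that the published capabilities cover the requested subscription."""
    return (caps["capacity_free_tb"] >= req["capacity_tb"]
            and caps["max_iops"] >= req["guaranteed_iops"]
            and (not req["use_array_snapshots"] or "snapshots" in caps["features"]))

print(can_satisfy(published_capabilities, subscription))   # True
```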

And when this happens, the data-centre becomes something which is managed as a whole, as opposed to a collection of siloed components.

I’ve probably not explained my thinking as well as I could do but I think it’s a topic that I’m going to keep coming back to over the months.


Enterprising Marketing

I love it when Chuck invents new market segments; ‘Entry-Level Enterprise Storage Arrays’ appears to be his latest one, and he’s a genius when he comes up with these terms. And it is always a space where EMC have a new offering.

But is it a real segment or just m-architecture? Actually, the whole Enterprise Storage Array thing is getting a bit old; I am not sure whether it has any real meaning any more, and it is all rather disparaging to the customer. You need Enterprise, you don’t need Enterprise…you need 99.999% availability, you only need 99.99% availability.

As a customer, I need 100% availability; I need my applications to be available when I need them. Now, this may mean that I actually only need them to be available an hour a month but during that hour I need them to be 100% available.

So what I look for from vendors is the way that they mitigate failure and understand my problems, but I don’t think the term ‘Enterprise Storage’ brings much value to the game, especially when it is constantly being misused and appropriated by the m-architecture consultants.

But I do think it is time for some serious discussions about storage architectures: dual-head scale-up architectures vs multiple-head scale-out architectures vs RAIN architectures. Understanding the failure modes and behaviours is probably much more important than the marketing terms which surround them.

EMC have offerings in all of those spaces; all at different cost points but there is one thing I can guarantee, the ‘Enterprise’ ones are the most expensive.

There is also a case for looking at the architecture as a whole; too many times I have come across the thinking that what we need to do is make our storage really available, when the biggest cause of outage is application failure. Fix the most broken thing first; if your application is down because it’s poorly written or architected, no amount of Enterprise anything is going to fix it. Another $2000 per terabyte is money you need to invest elsewhere.

Just How Much Storage?

A good friend of mine recently got in contact to ask my professional opinion on something for a book he was writing; it always amazes me that anyone asks my professional opinion on anything…especially people who have known me for many years, but as he’s a great friend, I thought I’d try to help.

He asked me how much a petabyte of storage would cost today and when I thought it would be affordable for an individual. Both parts of the question are interesting in their own way.

How much would a petabyte of storage cost? Why, it very much depends; it’s not as much as it cost last year, but not as cheap as some people would think. Firstly, it depends on what you might want to do with it; capacity, throughput and I/O performance are just part of the equation.

Of course then you’ve got the cost of actually running it; 400-500 spindles of spinning stuff takes a reasonable amount of power, cooling and facilities. Even if you can pack it densely, it is still likely to fall through the average floor.

There are some very good deals to be had mind you but you are still looking at several hundred thousand pounds, especially if you look at a four year cost.

And when will the average individual be able to afford a petabyte of storage? Well, without some significant changes in storage technology, we are some time away from this being feasible. Even with 10-terabyte disks, we are talking over a hundred disks.
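
A quick back-of-the-envelope check of that disk count, with an assumed 25% overhead for data protection and formatting:

```python
# Raw disk count for a usable petabyte; the overhead figure is an assumption.
USABLE_TB = 1_000        # one petabyte usable
DISK_TB = 10             # per-disk capacity
OVERHEAD = 0.25          # assumed protection + formatting overhead

raw_tb_needed = USABLE_TB / (1 - OVERHEAD)
disks = -(-raw_tb_needed // DISK_TB)          # ceiling division
print(f"~{int(disks)} x {DISK_TB} TB disks")  # ~134 disks, before hot spares
```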

But will we ever need a petabyte of personal storage? That’s extremely hard to say; I wonder if we will we see the amount of personal storage peak in the next decade?

And as for on-premises personal storage?

That should start to go into decline; for me it is already beginning to do so. I carry less storage around than I used to…I’ve replaced my 120GB iPod with a 32GB phone, but if I’m out with my camera, I’ve probably got 32GB+ of cards with me. Yet with connected cameras coming and 4G (once we get reasonable tariffs), this will probably start to fall off.

I also expect to see the use of spinning rust go into decline as PVRs are replaced with streaming devices; it seems madness to me that a decent proportion of the world’s storage is storing redundant copies of the same content. How many copies of EastEnders need to be stored on locally spinning drives around the world?

So I am not sure that we will get to a petabyte of personal storage any time soon but we already have access to many petabytes of storage via the Interwebs.

Personally, I didn’t buy any spinning rust last year and although I expect to buy some this year; this will mostly be refreshing what I’ve got.

Professionally, looks like over a petabyte per month is going to be pretty much run-rate.

That is a trend I expect to see continue; the difference between commercial and personal consumption is going to grow. There will be scary amounts of data about you and generated by you; you just won’t know it or have access to it.