
Flashman and the Silo of Doom

Flash, flash and more flash seems to be the order of the day from all vendors; whether that is flash today or flash tomorrow; whether it’s half-baked, yesterday’s left-overs rehashed or an amuse-bouche; 2013 is all about the flash.

Large and small, the vendors all have a story to tell, but flash still makes up a tiny fraction of the total capacity shipped and is a drop in the ocean even on a revenue basis. There doesn’t even seem to be much consensus on how it should be deployed; is it a cache, is it a networked cache or is it an all-flash array? Yes, seems to be the answer.

Storage is making the shift from mechanical to solid state and finally should be able to keep up with the servers of today. Well until we change to optical computing or something new.

And just as with the shift away from mechanical computing machines, the whole market is in flux and I don’t see anyone who is definitely going to win. What I see is a whole lot of confusion; a focus on stuff and a focus on hype.

Another storage silo in the data-centre.

Data still finds itself in siloed pools, and until the data-management problem is solved, the flow of data between compute environments, and its simple and effective re-purposing and re-use, will continue to be hindered.

Data will duplicate and replicate; I see people selling the power efficiency of flash…yet the overall estate will probably be even less power efficient, because it is highly likely that I will still have all those mechanical disk arrays and only the active data-set will live on flash. Despite the wishful thinking of some senior sales-guys, few people are going to rip out their existing disk-estate and replace it entirely with flash.

I may be able to replace a few 15k disks but data growth currently means that saving is far outstripped by the rack-loads of SATA that the average enterprise is having to put in place.

Whilst I continue to read articles full of hyperbole about speeds and feeds; of features, not usage models…I simply see a faster disk adding a new complexity into an already overly complex and fragile environment.

So let’s see some proper data-management tools and progress in that area…


Tiny Frozen Hand

So EMC have finally announced VMAX Cloud Edition; a VMAX iteration that has little to do with technology and everything to do with the way that EMC want us to consume storage. I could bitch about the stupid branding but too many people are expecting that!

Firstly, and in many ways the most important part of the announcement, is the cost model; EMC have moved to a linear cost model. In the past, purchasing a storage array carried a relatively large front-loaded cost because you had to purchase the controllers and so on; this meant that your cost per terabyte was high to start with, then declined, then potentially rose again as you added more controllers, and then declined again.

This led to a storage-hugging attitude: that’s my storage array and you can’t use it. A linear cost model allows IT to provide the Business with a fixed cost per terabyte, whether you are the first to use the array or the last. This allows us to move to a consumption and charging model that is closer to that of Amazon and the other Cloud providers.
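
Purely as an illustration (the numbers below are made up and have nothing to do with EMC’s actual pricing), here’s a quick sketch of why a front-loaded purchase punishes the first consumer of an array while a linear model charges everyone the same rate:

# Illustrative only: cost per terabyte for the early consumers of an array
# under a front-loaded purchase versus a linear, utility-style price.

CONTROLLER_COST = 250_000.0   # up-front controllers/cabinet (made-up figure)
DISK_COST_PER_TB = 1_000.0    # incremental cost per usable TB (made-up figure)
LINEAR_RATE_PER_TB = 3_000.0  # flat utility rate per TB (made-up figure)

def front_loaded_cost_per_tb(tb_consumed):
    """Cost per TB when the controllers have been paid for up front."""
    return (CONTROLLER_COST + DISK_COST_PER_TB * tb_consumed) / tb_consumed

def linear_cost_per_tb(tb_consumed):
    """Cost per TB under a linear model: same rate for the 1st TB or the 500th."""
    return LINEAR_RATE_PER_TB

for tb in (10, 50, 200, 500):
    print(f"{tb:>4} TB consumed: front-loaded £{front_loaded_cost_per_tb(tb):>8,.0f}/TB, "
          f"linear £{linear_cost_per_tb(tb):,.0f}/TB")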

It is fair to point out that EMC and other vendors already have various ways of doing this, but they could be complex and often relied on financial tools to make them work.

Secondly, EMC are utilising a RESTful API to allow storage to be allocated programmatically from a service catalogue; there are also methods for metering and charging back storage utilisation. Along with an easy-to-use portal, this moves consumption ever closer to an on-demand model. If you work in IT and are not comfortable with this, you are in for a rough ride for quite some time.
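
EMC haven’t published the API details here, so the following is only a hypothetical sketch of what programmatic allocation against a service catalogue tends to look like; the portal URL, endpoints and field names are all invented and are not the VMAX Cloud Edition API:

# Hypothetical sketch of programmatic allocation against a service catalogue.
# The endpoint, fields and tiers below are invented for illustration only.
import requests

PORTAL = "https://storage-portal.example.com/api/v1"
TOKEN = {"Authorization": "Bearer <api-token>"}

# 1. Discover what the catalogue offers (tiers, cost per TB, limits).
catalogue = requests.get(f"{PORTAL}/catalogue", headers=TOKEN).json()
for item in catalogue["services"]:
    print(item["tier"], item["cost_per_tb_month"], item["max_iops"])

# 2. Request an allocation from a chosen tier; the portal does the array-side
#    provisioning and starts metering the capacity for chargeback.
order = requests.post(
    f"{PORTAL}/allocations",
    headers=TOKEN,
    json={"tier": "gold", "capacity_tb": 10, "project": "billing-db"},
)
print(order.json()["allocation_id"], order.json()["status"])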

Thirdly, the cost models that I have seen are very aggressive; EMC want to push this model and this technology. If you want to purchase 50TB and beyond and you want it on EMC, I can’t see why you would buy any other block storage from EMC. It is almost as if EMC are forcing the VNX into an SMB niche. In fact, if EMC can hit some of the price-points I’ve had hinted at, everyone is in a race to the bottom. It could be a Google vs Amazon price-battle.

Fourthly, and probably obviously, EMC are likely to be shipping more capacity than an end-user requires, allowing them to grow with minimal disruption. If I were EMC, I’d ship quite a lot of extra capacity and allow a customer to burst into it at no charge for a fair proportion of the year. Burst capacity often turns into bought capacity; our storage requirements are rarely temporary, and temporary quickly becomes permanent. Storage procurement is never zipless; it always has long-term consequences, but if EMC can make it look and feel zipless…

I’m expecting EMC to move to a similar model for Isilon as well; it is well suited to this sort of consumption. And yet again, this leaves the VNX in an interesting position.

Out in the cold, with a tiny frozen hand….dying of consumption.


Who’ll Do a Linux to Storage?

Are we heading to a Linux moment in the storage world where an open-source ‘product’ truly breaks out and causes the major vendors a headache?

I’ve had this conversation a few times recently with both vendors and end-users; the general feeling is that we are pretty close to it. What is needed is for someone to do a Red-Hat: take some of the open-source products and package them up, make them pretty and simple to use, and then give them away…

Of course, Nexenta have already done this rather successfully and if I was looking for a bog-standard traditional dual-head filer product; I’d seriously consider them against the traditional filers.

But great product that it is, it hardly breaks new ground; well apart from price.

What I’m thinking of is something which forces its way into the scalable space…block, file and object. Ceph is probably the technology closest to this, and although it is pretty simple to get going, it is still a bit of a science project for most. I’m not sure I’d want to manage a Ceph environment at scale yet; I’d certainly be nervous about running heavy production workloads on it.
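
To be fair to the ‘simple to get going’ part: once a cluster is actually up, talking to it as an object store is only a few lines, assuming the standard librados Python bindings (the rados module), a working ceph.conf and an existing pool; getting to that point, and keeping it healthy at scale, is the science project.

# Minimal sketch using Ceph's librados Python bindings (the 'rados' module).
# Assumes a reachable cluster, a valid ceph.conf/keyring and an existing pool
# called 'data' -- getting to that point is the real work.
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx("data")      # open an I/O context on the pool
    ioctx.write_full("hello-object", b"stored once, replicated by the cluster")
    print(ioctx.read("hello-object"))       # read the object straight back
    ioctx.close()
finally:
    cluster.shutdown()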

Integrating it into a traditional mixed data-centre environment running Linux, Windows and a variety of virtualisation products would be a big challenge.

I’m looking to InkTank to do something here, but I’m not sure that they have the funding to push it to the level required.

Yet I think the storage market is ripe for this sort of disruption, especially in the object and ‘hyperscale’ space, where the big vendors aren’t quite there yet.

Or perhaps a big vendor will finally realise that they can take the open-source building blocks and use them as a weapon; it may mean sacrificing some margin, but they could guide the direction and gain some serious advantage. If I were already building commodity hardware, I’d be looking at building proper commodity storage.

5 Minutes

One of the frustrations when dealing with vendors is actually getting real availability figures for their kit; you will get generalisations, like ‘it is designed to be 99.999% available’ or perhaps 99.9999% available. But what do those figures really mean to you, and how significant are they?

Well, 99.999% available equates to a bit over 5 minutes of downtime a year, and 99.9999% equates to a bit over 30 seconds a year. And in the scheme of things, that sounds pretty good.
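
The arithmetic is simple enough to check for yourself; a quick sketch:

# Convert an availability percentage into allowed downtime per year.
MINUTES_PER_YEAR = 365.25 * 24 * 60

def downtime_minutes(availability_pct):
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

for pct in (99.9, 99.99, 99.999, 99.9999):
    print(f"{pct}% -> {downtime_minutes(pct):.1f} minutes of downtime per year")
# 99.999%  -> about 5.3 minutes a year
# 99.9999% -> about 0.5 minutes, a bit over 30 seconds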

However, these are design criteria and aims; what are the real-world figures? Vendors, you will find, are very coy about this; in fact, every presentation I have had regarding availability has been under very strict NDA, and sometimes not even note-taking is allowed. The presentations themselves are never allowed to be taken away.

Yet here’s a funny thing…I’ve never known a presentation where the design criteria were not met or even significantly exceeded. So why are the vendors so coy about their figures? I have never been entirely sure; it may be that their ‘mid-range’ arrays display very similar real-world availability figures to their more ‘Enterprise’ arrays…or it might be that once you have real-world availability figures, you might start to ask some harder questions.

Sample size: raw availability figures are not especially useful if you don’t know the sample size. Availability figures are almost always quoted as an average and, unless the design is really bad, a larger fleet of arrays will skew the figures in the vendor’s favour.
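
A purely illustrative example of how the average can hide an individual disaster:

# Illustrative: a big enough fleet lets one very bad outage vanish into the average.
MINUTES_PER_YEAR = 365.25 * 24 * 60

fleet_size = 1000                 # arrays in the vendor's sample
outage_minutes = 8 * 60           # one array down for eight hours

total_array_minutes = fleet_size * MINUTES_PER_YEAR
fleet_availability = 100 * (1 - outage_minutes / total_array_minutes)
print(f"Fleet-wide availability: {fleet_availability:.5f}%")
# ~99.99991% -- comfortably better than 'six nines' across the fleet, even
# though one customer spent a working day explaining an outage to their business.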

Sample characteristics: I’ve known vendors, when backed into a corner and forced to provide figures, to do some really sneaky things; for example, they may provide figures only for a specific model and software release, which is often done to hide a bad release. You should always ask for the figures across the entire life of a product; this will allow you to judge the quality of the code. If possible, ask for a breakdown on a month-by-month basis, annotated with the code-release schedule.

There are many tricks that vendors try to pull to hide causes of downtime and non-availability, but instead of focusing on the availability figures, it is sometimes better as a customer to ask different, more specific questions.

What is the longest outage that you have suffered on one of your arrays? What was the root cause? How much data loss was sustained? Did the customer have to invoke disaster recovery or any recovery procedures? What is the average length of outage on an array that has gone down?

Do not believe a vendor when they tell you that they don’t have these figures and this information close and easily to hand. They do; and if they don’t, they are pretty negligent about their QC and analytics. Surely they don’t just use all that Big Data capability to crunch marketing stats? Scrub that, they probably do.

Another nasty habit vendors have is forcing customers not to disclose to other customers that they have had issues and what those issues were. And of course we all comply and never discuss such things.

So 5 minutes…it’s about long enough to ask some awkward questions.

The Complexity Legacy

I don’t blog about my day-job very often, but I want to relate a conversation I had today. I was chatting to one of the storage administrators who works on our corporate IT systems; they’ve recently put in some XIV systems (some might be an understatement) and I asked how he was getting on with them. He’s been doing the storage administration thing for a long time and cut his teeth on the Big Iron arrays, and I thought he might be a bit resentful at how easy the XIV is to administer, but no…he mentioned a case recently where they needed to allocate a large chunk of storage in a real hurry; it took 30 minutes to do a job which he felt would have taken all day on a VMAX.

And I believe him but…

Here’s the thing: in theory, using the latest GUI tools such as Unisphere for VMAX, the same should be true of a VMAX. So what is going on? Quite simply, the Big Iron arrays are hampered by a legacy of complexity; even experienced administrators, and perhaps especially experienced administrators, like to treat them as complex, cumbersome beasts. It is almost as if we’ve developed a fear of them and treat them with kid gloves.

And I don’t believe it is just the VMAX that suffers from this; all of the Big Iron arrays carry this perception of complexity. Perhaps because they are still expensive; perhaps because the vendors like to position them as Enterprise beasts rather than something as easy to configure as your home NAS; and perhaps because the storage community is completely complicit in the secret, occult world of Enterprise storage?

Teach the elephants to dance…they can and they might not crush your toes.

Doctors in the Clouds

At the recent London Cloud Camp, there was a lot of discussion about DevOps on the UnPanel; as the discussion went on, I was expecting the stage to be stormed by some of the older members of the audience. Certainly some of the tweets and back-channel conversations were expressing incredulity at some of the statements from the panel.

Then, over beer and pizza, there were a few conversations on the subject and I had a great chat with Florian Otel, who, for a man who tries to position HP as a Cloud company, is actually a reasonable and sane guy (although he does have the slightly morose Scandinavian thing down pat, but that might just be because he works for HP). The conversation batted the subject around a bit until I hit on an analogy for DevOps that I liked, and over the past twenty-four hours I have knocked it around a bit more in my head. Although it doesn’t quite work, I can use it as the basis for an illustration.

Firstly, I am not anti-DevOps at all; the whole DevOps movement reminds me of when I was a fresh-faced mainframe developer. We were expected to know an awful lot about our environment and infrastructure, and we tended to interact with and configure that infrastructure through code; EXITs of many forms were part of our life.

DevOps however is never going to kill the IT department (note: when did the IT department become exclusively linked with IT Operations?) and you are always going to have specialists who are required to make and fix things.

So here goes: it is really a very simple process to instantiate a human being. The inputs are well known and it’s a repeatable process. This rather simple process, however, instantiates a complicated thing which can go wrong in many ways.

When it goes wrong, often the first port of call is your GP; they will poke and prod, ask questions and the good GP will listen and treat the person as a person. They will fix many problems and you go away happy and cured. But most GPs actually have only a rather superficial knowledge of everything that can go wrong; this is fine, as many problems are rather trivial. It is important however that the GP knows the limits of their knowledge and knows when to send the patient to a specialist.

The specialist is a rather different beast; what they generally see is a component that needs fixing. They often have lousy bedside manners and will do drastic things to get things working again. They know their domain really well and you really wouldn’t want to be without them. But, to be honest, are they a good investment? If a GP can treat 80% of the cases they are faced with, why bother with the specialists? Because having people drop dead from something that could have been treated is not especially acceptable.

As Cloud and Dynamic Infrastructures make it easier to throw up new systems with complicated interactions with other systems; unforeseeable consequences may become more frequent, your General Practitioner might be able to fix 80% of the problems with a magic white-pill or tweak here or there….but when your system is about to collapse in a heap, you might be quite thankful that you still have your component specialists who make it work again. Yes, they might be grumpy and miserable; their bedside manner might suck but you will be grateful that they are there.

Yes, they might work for your service provider, but the IT Ops guys aren’t going away; in fact, you DevOps types have taken away a lot of the drudgery of the Ops role. And when the phone rings, we know it is going to be something interesting and not just an ingrown toe-nail.

Of course, the really good specialist also knows when the problem presented is not their speciality and passes it on. And then there is the IT Diagnostician; they are grumpy Vicodin addicts and really not very nice!

Love to VMAX

It may come as a surprise to some people, especially those reading this, but I quite like the Symmetrix (VMAX) as a platform. Sure, it is long in the tooth, or in marketing speak ‘a mature platform’, and it is a bit arcane at times; Symmetrix administrators seem at times to want to talk a different language and use acronyms when a simple word might do. But it’s rock solid.

Unfortunately, EMC see it as a cash-cow and have pushed it when, at times, a better fit from their own product set would have suited a customer better. This means that many resent it and like to hold it up as an example of all that is wrong with EMC. I certainly have done in the past.

And it might end up being the most undersold and under-rated product; the product pushed into the marketing niche that is Enterprise Storage. Yet it could be so much more.

I think that there is a huge market for it in the data-centres of the future; more so than for EMC’s other ‘legacy’ array, the VNX. For many years, I thought that EMC should drop the Symmetrix and build up the Clariion (VNX), but I now see that I was wrong; EMC need to drop the Clariion and shrink down the Symmetrix. They need to produce a lower-end Symmetrix which can scale out block much in the way that Isilon can scale out file. Actually, a smaller Isilon would be a good idea too; a three-node cluster that could fit into three or four U, presenting 20-40 terabytes.

In fact, for those customers who want it, perhaps a true VNX replacement utilising the virtual versions of the Symmetrix and Isilon might be the way to go, but only if there is a seamless way to scale out.

I guess this will never happen outside the labs of the mad hackers, because EMC will continue to price the Symmetrix at a premium…which is a pity really.

Defined Storage…

Listening to the ‘Speaking In Tech’ podcast got me thinking a bit more about the software-defined meme and wondering whether it is a real thing as opposed to a load of hype; for the time being I’ve decided to treat it as a real thing, or at least as something that might become a real thing…and, in time, maybe a better real thing?

So Software Defined Storage?

The role of the storage array seems to be changing at present, or arguably simplifying; the storage array is becoming the place where you store stuff you want to persist. That may sound silly, but what I mean is that the storage array is not where you are going to process transactions. Your transactional storage will be as close to the compute as possible, or at least that appears to be the current direction of travel.

But there is also a certain amount of discussion and debate about storage quality of service, guaranteed performance and how we implement it.

Bod’s Thoughts

This all comes down to services, discovery and a subscription model. Storage devices will have to publish their capabilities via some kind of API; applications will use this to find what services and capabilities an array has and then subscribe to them.

So a storage device may publish available capacity, IOPS capability and latency, but it could also publish that it has the ability to do snapshots, replication, and thick and thin allocation. It could also publish a cost associated with each of these.

Applications, application developers and support teams might make decisions at this point about which services they subscribe to; perhaps a fixed capacity and IOPS, perhaps taking the array-based snapshots but doing the replication at the application layer.

Applications will have a lot more control over what storage they have and use; they will make decisions about whether certain data is pinned in local SSD or never gets anywhere near the local SSD, and whether it needs sequential or random-access storage. An application might carry its own RTO and RPO parameters, making decisions about which transactions can be lost and which need to be committed now.
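
Nothing like this API exists today, as far as I know, so the sketch below is strictly a thought-experiment; every field and function name is invented, but it shows the shape of the publish-and-subscribe model I have in mind:

# Thought-experiment only: what a storage device might publish, and what an
# application might subscribe to. Every field name here is invented.

published_capabilities = {
    "device": "array-07",
    "capacity_free_tb": 120,
    "max_iops": 250_000,
    "latency_ms_p99": 2.0,
    "features": ["snapshots", "replication", "thin", "thick"],
    "cost_per_tb_month": 300,
}

# The application (or its support team) picks the services it wants: fixed
# capacity and IOPS, array snapshots, but replication handled higher up.
subscription_request = {
    "capacity_tb": 5,
    "guaranteed_iops": 20_000,
    "features": ["snapshots", "thin"],
    "pin_to_local_ssd": False,   # this data never needs to live on local flash
    "rpo_minutes": 15,           # how much data the app can afford to lose
    "rto_minutes": 60,           # how quickly it must be back
}

def can_satisfy(capabilities, request):
    """Crude check that the device can honour the subscription."""
    return (request["capacity_tb"] <= capabilities["capacity_free_tb"]
            and request["guaranteed_iops"] <= capabilities["max_iops"]
            and set(request["features"]) <= set(capabilities["features"]))

print(can_satisfy(published_capabilities, subscription_request))  # True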

And as this happens, the data-centre becomes something which is managed as a whole, as opposed to a set of siloed components.

I’ve probably not explained my thinking as well as I could do but I think it’s a topic that I’m going to keep coming back to over the months.


The Right Stuff?

I must be doing something right, or perhaps very wrong, but the last few months have seen this blog pick up a couple of ‘accolades’ that have left me feeling pretty chuffed.

Firstly Chris Mellor asked whether El Reg could carry my blog; as a long-term reader of Chris’ and of The Register, this made my year. To be picked up by the scurrilous El Reg is pretty cool.

And yesterday I got an email from EMC telling me that I had been voted into EMC Elect! Now, that’s a pretty good start to the year.

This doesn’t mean that I’m going to go easy on EMC; I don’t think that’s what they want from me and if I did, El Reg wouldn’t want me either.

So I guess I’ll keep doing what I’m doing and hope you continue to enjoy it.

Enterprising Marketing

I love it when Chuck invents new market segments; ‘Entry-Level Enterprise Storage Arrays’ appears to be his latest one. He’s a genius when he comes up with these terms, and it is always a space where EMC happen to have a new offering.

But is it a real segment or just m-architecture? Actually, the whole ‘Enterprise Storage Array’ thing is getting a bit old; I am not sure whether it has any real meaning any more, and it is all rather disparaging to the customer. You need Enterprise, you don’t need Enterprise…you need 99.999% availability, you only need 99.99% availability.

As a customer, I need 100% availability; I need my applications to be available when I need them. Now, this may mean that I actually only need them to be available an hour a month but during that hour I need them to be 100% available.

So what I look for from vendors is how they mitigate failure and whether they understand my problems; I don’t think the term ‘Enterprise Storage’ brings much value to the game, especially when it is constantly being misused and appropriated by the m-architecture consultants.

But I do think it is time for some serious discussions about storage architectures: dual-head scale-up architectures vs multiple-head scale-out architectures vs RAIN architectures. Understanding their failure modes and behaviours is probably much more important than the marketing terms which surround them.

EMC have offerings in all of those spaces, all at different cost points, but there is one thing I can guarantee: the ‘Enterprise’ ones are the most expensive.

There is also a case for looking at the architecture as a whole; too many times I have come across the thinking that what we need to do is make our storage really available, when the biggest cause of outage is application failure. Fix the most broken thing first: if your application is down because it’s poorly written or poorly architected, no amount of Enterprise anything is going to fix it, and another $2,000 per terabyte is money you need to invest elsewhere.
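
A crude way to see why, with purely illustrative figures: the availability your users actually experience is roughly the product of every layer a request passes through, so the weakest layer sets the ceiling regardless of what the storage alone can do.

# Illustrative: end-to-end availability is roughly the product of the layers,
# so a fragile application swamps whatever you spent on 'Enterprise' storage.
layers = {
    "enterprise array": 0.99999,   # five nines
    "network":          0.9999,
    "application":      0.99,      # poorly written / poorly architected
}

end_to_end = 1.0
for name, availability in layers.items():
    end_to_end *= availability

print(f"End to end: {end_to_end * 100:.3f}%")   # roughly 99% -- the application sets the ceiling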