
Storage is Interesting…

A fellow blogger has a habit of referring to storage as snorage, and I suspect that is the attitude of many. What’s so interesting about storage? It’s just that place where you keep your stuff. And many years ago, as an entry-level systems programmer, there were two teams that I was never going to join…one being the test team and the other being the storage team, because they were boring. Recently I have run both a test team and a storage team and enjoyed the experience immensely.

So why do I keep doing storage? Well, firstly I have little choice but to stick to infrastructure; I’m a pretty lousy programmer and it seems that I can do less damage in infrastructure. If you ever received more cheque-books in the post from a certain retail bank than you asked for, I can only apologise.

But storage is cool; firstly it’s BIG and EXPENSIVE; who doesn’t like raising orders for millions? It is also so much more than that place where you store your stuff; you have to get it back, for starters. I think that people are beginning to realise that storage might be a little more complex than first thought; a few years ago, the average home user only really worried about how much disk they had, but the introduction of SSDs into the consumer market has hammered home how the type of storage matters and the impact it can have on the user experience.

Spinning-rust platters keep getting bigger, but for many this just means that the amount of free disk keeps increasing; an increase in speed is what people really want. Instant On…it changes things.

So even in the consumer market, storage is taking on a multi-dimensional personality; it scales in both capacity and speed. In the Enterprise, things are more interesting.

Capacity is obvious; how much space do you need? Performance? Well, performance is more complex and has more facets than most realise. Are you interested in IOPS? Are you interested in throughput? Are you interested in aggregate throughput or single stream? Are you dealing with large or small files? Large or small blocks? Random or sequential?
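
To make that concrete, here is a rough back-of-envelope sketch (illustrative numbers only, not from any particular array) showing why ‘fast’ in IOPS and ‘fast’ in throughput are not the same thing:

```python
# Illustrative only: the same array can look "fast" or "slow" depending on
# whether you measure IOPS or MB/s, because block size links the two.

def throughput_mb_s(iops: float, block_size_kb: float) -> float:
    """Convert an IOPS figure into MB/s for a given block size."""
    return iops * block_size_kb / 1024

# Small-block random workload: lots of IOPS, modest throughput.
print(throughput_mb_s(iops=20_000, block_size_kb=4))     # ~78 MB/s

# Large-block sequential stream: few IOPS, much higher throughput.
print(throughput_mb_s(iops=200, block_size_kb=1_024))    # 200 MB/s
```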

Now for 80% of use-cases; you can probably get away with taking a balanced approach and just allocating storage from a general purpose pool. But 20% of your applications are going to need something different and that is where it gets interesting.

Most of the time when I have conversations with application teams or vendors and ask what type of storage they require, the answer that comes back is generally ‘fast’. There then follows a conversation about what fast means and whether the budget matches their desire to be fast.

If we move to ‘Software Defined Storage’, this could be a lot more complex than people think. Application developers may well have to really understand how their applications store data and how they interact with the infrastructure that they live on. Pick the wrong pool and your application performance could drop through the floor; pick the wrong availability level and you could experience a massive outage.

So if you thought storage was snorage (and most developers still do), you might want to start taking an interest. If infrastructure becomes code, I may need to get better at coding, but some of you are going to have to get better at infrastructure. Move beyond fast and large and understand the subtleties; it is interesting…I promise you!

Viperidae – not that venomous?

There’s a lot of discussion about what ViPR is and what it isn’t; how much of this confusion is deliberate and how much is simply the normal fog of war which pervades the storage industry is debatable. Having had some more time to think about it, I have some more thoughts and questions.

Firstly, it is a messy announcement; there’s a hotch-potch of products here, utilising IP from acquisitions and from internal EMC initiatives. There’s also an attempt to build a new narrative which doesn’t seem to work; perhaps it worked better when put into the context of an EMC World event but not so much from the outside.

And quite simply, I don’t see anything breathtaking or awe-inspiring but perhaps I’m just hard to impress these days?

But I think there are some good ideas here.

ViPR as a tool to improve storage management and turn it into something which is automatable is a pretty good idea. But we’ve had the ability to script much of this for many years; the problem has always been that every vendor has a different way of doing it; syntax and tools differ and are often not even internally consistent.

Building pools of capability and service; calling it a virtual array…that’s a good idea but nothing special. If ViPR can have virtual arrays which federate and span multiple arrays; moving workloads around within the virtual array, maintaining consistency groups and the like across arrays from different vendors; now that’d be something special. But that would almost certainly put you into the data-path and you end up building a more traditional storage virtualisation device.

Taking an approach where the management of arrays is abstracted and presented in a consistent manner; this is not storage virtualisation, so perhaps it is storage management virtualisation?

EMC have made a big deal about the API being open and that anyone will be able to implement plug-ins for it; any vendor should be able to produce a plug-in which will allow ViPR to ‘manage’ their array.

I really like the idea that this also presents a consistent API to the ‘user’, allowing the user not to care about which storage vendor is at the other end; they just ask for disk from a particular pool and off it goes. This should be able to be done from an application, a web front-end or anything else which interacts with an API.
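
To make the idea concrete, here is a minimal sketch of what ‘asking for disk from a particular pool’ might look like from an application. The endpoint, field names and pool name are all my invention for illustration; this is not the actual ViPR API.

```python
import requests  # third-party HTTP client

# Hypothetical service-broker request: "give me 500 GB from the 'gold' pool and
# export it to this host" -- the broker works out which array and which
# vendor-specific commands actually satisfy it.
BROKER = "https://storage-broker.example.com/api"

payload = {
    "virtual_pool": "gold",                        # capability-based pool, not a specific array
    "size_gb": 500,
    "protocol": "fc",
    "export_to_host": "app-server-01.example.com",
}

resp = requests.post(f"{BROKER}/block/volumes", json=payload,
                     headers={"X-Auth-Token": "REDACTED"}, timeout=30)
resp.raise_for_status()
print(resp.json())  # e.g. a task or volume identifier to poll for completion
```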

So ViPR becomes basically a translation layer.

Now, I wonder how EMC will react to someone producing their own clean-room implementation of the ViPR API? If someone does a Eucalyptus to them? Will they welcome it? Will they start messing around with the API? I am not talking about plug-ins here, I am talking about a ViPR-compatible service-broker.

On more practical things, I am also interested in how ViPR will be licensed. A capacity-based model? A service-based model? Number of devices?

What I am not currently seeing is something which looks especially evil! People talk about lock-in? Okay, if you write a lot of ViPR based automation and provisioning, you are going to be kind of locked-in but I don’t see anything that stops your arrays working if you take ViPR out. As far as I can see, you could still administer your arrays in the normal fashion?

But that in itself could be a problem; how does ViPR keep itself up to date with the current state of a storage estate? What if your storage guys try to manage both via ViPR and the more traditional array management tools?

Do we again end up with the horrible situation where the actual state of an environment is not reflected in the centralised tool?

I know EMC will not thank me for trying to categorise ViPR as just another storage management tool ‘headache’ and I am sure there is more to it. I’m sure that there will be someone along to brief me soon.

And I am pretty positive about what they are trying to do. I think the vitriol and FUD being thrown at it is out of all proportion but then again, so was the announcement.

Yes, I know I have ignored the Object on File or File on Object part of the announcement. I’ll get onto that in a later post.


A New Job – for you maybe?

Although it is no great secret as to who my employer is; I’m sure you can all use Google and LinkedIn, I very rarely post about the specifics of my job. It does influence what I write about obviously but often less than you might think.

This post however is entirely linked to my employer. I have a vacancy in my team and perhaps someone reading this might be interested in coming to work for me. The job advert is here and very good it is too; well, I wrote most of it. However perhaps you might want some more detail.

My storage team provides support and delivery expertise for storage in our Broadcast Technology group. A mixture of delivery, support, some design and general madness will make up your day. Tasks vary from the very mundane, such as bar-coding tapes and loading libraries, to placating users, planning the installation of new storage devices, scripting, and sitting through vendor PowerPoints while trying not to laugh too hard; with all points in between as well.

We use a mixture of traditional storage and some more esoteric stuff; so you may find yourself upgrading a NetApp filer one day, adding clients to a TSM server the next, and working with the server and network teams to work out why GPFS is misbehaving the day after.

At times, you will find yourself on the bleeding edge and doing things that few have done before. Answering big questions like how you build a grow-forever archive and little questions as to how to restore from said archive.

You will get frustrated both at the pace of change and, at times, at why it takes so long to adopt some technologies. But you will get to do some cool stuff, and when you turn the TV on, you can smile and take pleasure in the fact that you helped put those pictures on the screen.

You can see the technologies I’m interested in in the advert. But more important is attitude and aptitude. You need to want to learn and you must bring a sense of fun and enjoyment with you.

If you’ve got any questions…ask below or work out my email address.

And NO AGENCIES please!

And just in case you missed it…click here


Snakebite….

So EMC have unveiled ViPR, their software-defined storage initiative; like many EMC World announcements, there’s not a huge amount of detail, especially if you aren’t at EMC World. It has left many of my blogger peers scratching their heads and wondering what the hell it is and whether it is something new.

Now like them, I am in that very same camp but unlike them, I am foolish enough to have a bit of a guess and make myself look a fool when the EMCers descend on me and tell me how wrong I am.

Firstly, let me say what I think it isn’t: I really don’t believe it is a storage virtualisation product in the same way that SVC and VSP are. The closest EMC have to a product like that is VPLEX, a product which sits in the data-path and virtualises the disk behind it; I don’t think ViPR is that kind of product. Arguably these products are mis-named anyway; I think of them as Storage Federation products.

So that is what ViPR isn’t (and can I say that I really hate products with a mix of upper and lower case in their names!).

It is worth looking back in time to one of EMC’s most hated products (by me and many users): Control Center. I think ViPR might have some roots in ECC; to me it feels as though someone has taken Control Center and turned it into a web service, so instead of interacting via a GUI, you interact via an API.

And I wonder if that was how the control component of ViPR came about; when rewriting the core of ECC, I posit that it was abstracted away from the GUI component and perhaps some bright spark came along and thought…what if we exposed the core via an API?

Okay, it might not have been ECC and it could have been Unisphere, but this seems a fairly logical thing to do. So perhaps the core of ViPR is nothing really that new; it’s just a change in presentation layer.

[Update: So a lot of the code came from Project Orion which Chad talks about here. So it has been kicking around in EMC for some time, this kind of programmable interface was being discussed and asked for at various ECC user-group/briefings prior to that.]

Then EMC have brought some additional third-party arrays into the mix; NetApp seems to be the first one. Perhaps they are using IP that EMC picked up when they bought the UK company WysDM, who had both a very nice backup reporting tool and a NAS/fileserver management tool?

Building additional third party support should be relatively simple using either their CLI or in some cases an exposed API.
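
As a sketch of why that should be fairly straightforward: a plug-in really only needs to translate a handful of abstract operations into vendor-specific CLI or API calls. Everything below (class names, methods and the vendor commands) is invented for illustration.

```python
from abc import ABC, abstractmethod
import subprocess

class ArrayPlugin(ABC):
    """The abstract operations a management layer needs from any array."""

    @abstractmethod
    def create_volume(self, name: str, size_gb: int) -> str: ...

    @abstractmethod
    def list_volumes(self) -> list[str]: ...

class CliBackedArray(ArrayPlugin):
    """A plug-in that shells out to a vendor CLI (command names are made up)."""

    def create_volume(self, name: str, size_gb: int) -> str:
        subprocess.run(["vendorcli", "mkvol", name, f"--size={size_gb}g"], check=True)
        return name

    def list_volumes(self) -> list[str]:
        result = subprocess.run(["vendorcli", "lsvol"], check=True,
                                capture_output=True, text=True)
        return result.stdout.splitlines()
```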

So there you go, ViPR is basically a storage management tool without a GUI, or at least it is GUI-optional. And with its REST API, perhaps you could build your own GUI or your own CLI? Or perhaps your development teams can get on and generally consume all the storage you’ve got, but in a programmatic way.

It all seems pretty obvious and raises the question of why no-one did this before. I think it might have been arrogance and complacency; this tool should make it easier to plug anyone’s storage into your estate.

But if this was all ViPR was, it’d be pretty tedious. Still, EMC obviously read my blog and obviously read this and rapidly turned it into a product; or perhaps they simply talk to lots of people too. If I’ve thought it, plenty of others have.

Object Storage has struggled to find a place in many Enterprises; it doesn’t lend itself to many applications and many developers just don’t get it. But for some applications it is ideal; it seems that it would be better to have both Object and File access to the same data, and you probably don’t want to store it twice either.

So yet again, it’s all about changing the presentation layer without impacting the underlying constructs. However, unlike the more traditional gateways into an Object Store, EMC are putting an Object Gateway onto an NFS/SMB share (note to Chuck: call it SMB, not CIFS). Now this is almost certainly going to have to sit in the data-path for Objects. There will be some interesting locking/security model challenges and the like; simultaneous NFS/SMB and Object access is going to be interesting.
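
As a toy illustration of the presentation-layer point (this is my sketch, not how EMC have built it): the simplest possible object gateway just maps object keys onto paths under the existing share, which is exactly where the locking and security fun starts once NFS/SMB clients start touching the same files.

```python
from pathlib import Path

class FileBackedObjectStore:
    """Toy object gateway: object keys become file paths under a share root.

    A real gateway also has to maintain object metadata and cope with
    NFS/SMB clients renaming, locking or editing the same files underneath it.
    """

    def __init__(self, share_root: str):
        self.root = Path(share_root)

    def put(self, key: str, data: bytes) -> None:
        path = self.root / key                     # "rushes/clip01.mxf" -> file on the share
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()

store = FileBackedObjectStore("/tmp/demo-share")   # stand-in for an NFS/SMB mount
store.put("rushes/clip01.mxf", b"not really video")
print(len(store.get("rushes/clip01.mxf")))
```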

It will also require the maintenance of a separate metadata-store, something with a fast database to get that metadata out of. And perhaps EMC own some technologies to do this as well. A loosely coupled metadata store does bring some problems but it allows EMC to leverage Isilon’s architecture and also grab hold of data sitting on 3rd party devices.

[Update: Seems like EMC are using Cassandra as their underlying database. Whether it is Object on File or File on Object; not sure but whatever happens, it is allowing you access via Object or File.]

So ViPR is really at least two products; not one. So..perhaps it’s a Snakebite..

Question is…will it leave them and us lying in the gutter staring at the stars wondering why everyone is looking at us strangely?


Can Pachyderms Polka?

Chris’ pieces on IBM’s storage revenues here and here make for some interesting reading. Things are not looking great with the exception of XIV and Storwize products. I am not sure if Chris’ analysis is entirely correct as it is hard to get any granularity from IBM. But it doesn’t surprise me either; there are some serious weaknesses in IBM’s storage portfolio.

Firstly, there is still an awful lot of OEMed kit from NetApp in the portfolio; it certainly appears that this is not selling as well as it did in the past. So IBM’s struggles have some interesting knock-on effects for NetApp.

IBM are certainly positioning the Storwize products in the space which was traditionally occupied by the OEMed LSI (now NetApp) arrays; pricing is pretty aggressive and places them firmly in the space occupied by other competing dual-head arrays. And they finally have a feature set to match their competitors, well, certainly in the block space.

XIV seems to compete pretty well when put up against the lower-end VMAX and HDS ‘enterprise-class’ arrays. It is incredibly easy to manage, performs well enough but is not the platform for the most demanding applications. But IBM have grasped one of the underlying issues with storage today: it all needs to be simplified. I still have some doubts about the architecture, but XIV have tried to solve the spindle-to-gigabyte issue. There is no doubt in my mind that traditional RAID-5 and 6 are broken in the long term; if not today, very soon. The introduction of SSDs into the architecture appears to have removed some of the more interesting performance characteristics of the architecture. XIV is a great example of ‘good enough’.
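
For anyone who wants the back-of-envelope behind that claim, here is the commonly quoted unrecoverable-read-error argument; the drive size and URE rate are typical published figures, not anything vendor-specific, and real-world behaviour is hotly debated.

```python
# Back-of-envelope: chance of completing a RAID-5 rebuild without hitting an
# unrecoverable read error (URE), using commonly quoted spec-sheet figures.

drive_tb = 4                 # capacity of each drive in the group
drives_remaining = 5         # surviving drives that must be read end-to-end
ure_per_bit = 1e-14          # typical quoted URE rate for nearline SATA drives

bits_to_read = drives_remaining * drive_tb * 1e12 * 8
p_clean_rebuild = (1 - ure_per_bit) ** bits_to_read

print(f"bits to read during rebuild: {bits_to_read:.2e}")
print(f"probability of a URE-free rebuild: {p_clean_rebuild:.0%}")   # ~20% with these numbers
# Which is why dual parity (RAID-6) exists -- and why even that gets uncomfortable
# as drives keep growing and rebuild times stretch into days.
```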

So IBM have some good products from the low-end to the lowish-enterprise block space. Of course, there is an issue in that they seriously overlap; nothing new there though, I’ve never known a company compete against itself so often.

DS8K only really survives for one reason: to support the mainframe. If IBM had been sensible and had the foresight to do so, they would have looked at FICON connectivity for SVC and done it. Instead IBM decided that the mainframe customers were so conservative that they would never accept a new product, or at least that it would take 10 years or so for them to do so. So now they are going to end up building and supporting the DS8K range for another 10 years at least; if they’d invested the time earlier, they could be considering sunsetting the DS8K.

But where IBM really, really suffer and struggle is in the NAS space. They’ve had abortive attempts at building their own products; they re-sell NetApp in the form of nSeries these days and also have SONAS/V7000-Unified. Well, the nSeries is NetApp; it gets all of the advantages and disadvantages that brings, i.e. a great product whose best days seem behind it at present.

SONAS/V7000-Unified are not really happening for IBM; although built on solid foundations, the delivery has not been there and IBM really have no idea how to market or sell the product. There have been some quality issues and arguably the V7000-Unified was rushed and not thought all the way through. I mean, who thought a two-node GPFS cluster was ever a good idea for a production system?

And that brings me onto my favourite IBM storage product: GPFS. The one that I will laud to the hills; a howitzer of a product which will let you blow your feet off but which could also be IBM’s edge. Yet in the decade and a bit that I have been involved with it, IBM almost never sells it. Customers buy it, but really you have to know about it; most IBM sales would have no idea where to start or even when it might be appropriate.

At the GPFS User Group this week, I saw presentations on GPFS with OpenStack, Hadoop, hints of object-storage and more. But you will probably never hear an IBMer outside of a very select bunch talk about it. If IBM were EMC, you’d never hear them shut-up about it.

One of the funniest things I heard at the GPFS User Group was the guys who repurposed an Isilon cluster as a GPFS cluster. It seems it might work very well.

I personally think it’s about time that IBM open-sourced GPFS and put it into the community. It’s too good not to, and perhaps the community could turn it into the core of a software-defined-storage solution to shake a few people up. I could build half-a-dozen interesting appliances tomorrow.

Still I suspect like Cinderella, GPFS will be stuck in the kitchen waiting for an invite to the ball.

Free Insight Here!

It never ceases to amaze me that often vendors believe that they can charge an additional fee for something which makes their product work right in the first place.

I am a founder member of the ‘Free PowerPath Alliance’, who strongly believe that PowerPath licenses should be provided free for any host talking to an EMC block-storage array; but in light of EMC’s intransigence, I’ve abandoned that cause for the time being, and as I’m not really responsible for that much EMC block storage…I don’t really care so much anymore.

But I have a new target; another product which should be free and I struggle in many ways to understand why it’s not. I can’t imagine it gets sold an awful lot and probably most customers only come across it when an evaluation license is given to them. And normally, this is because something is broken and 3rd-line support need more information and granularity than the normal free tools give.

I want you to imagine a self-tuning storage device; one that offers very little opportunity for the end-user to tune; one where, if any bespoke tuning is carried out on it, it is almost always at the behest of the vendor. So charging for a tool which monitors performance at a level that is incredibly hard for a customer to make any real use of…is a little strange; charging for a tool which actually makes the vendor’s support teams’ lives easier, well guys…not really on!

And wouldn’t you know it….wouldn’t you know who the vendor is…Yep, it’s EMC again.

The tool…’EMC Isilon InsightIQ’

This tool is really useful when you need it but you really don’t need it very often; it makes EMC’s life easier both from a support point of view and from a sales point of view.

It gives really good performance diagnostics, many of which though are not that useful until you are in a support situation. It allows growth forecasting, so you can buy more EMC Isilon. It is an invaluable tool for the EMC account team…in fact I should charge EMC to let them use it against my estate…but I’ll settle for them just making it free!


Object Paucity

Another year, another conference season sees me stuck on this side of the pond, watching the press releases from afar and promising myself that I’ll watch the keynotes online or ‘on demand’ as people have it these days. I never find the time and have to catch up with the 140-character synopses that regularly appear on Twitter.

I can already see the storage vendors pimping their stuff at NAB; especially the Object storage vendors who want to push their wares. Yet, it still isn’t really happening…

I had a long chat recently with one of my peers who deals with the more usual side of IT; the IT world full of web developers and the like. He’d spent many months investigating Object Storage, putting together a proposition firmly targeted at the development community: Object APIs and the like; S3-compatible, storage-on-demand, built on solid technology.

And what has he ended up implementing? A bloody NFS/CIFS gateway into their shiny new object storage, because it turns out what the developers really want is a POSIX file-system.

Sitting here on the broadcast/media side of the fence, where we want gobs of storage provisioned quickly to store large objects with relatively intuitive metadata, we are finding the same thing. I’ve not gone down the route of putting in an Object storage solution because finding one which is supported across all the tools in today’s workflows is near impossible. So it seems that we are looking more and more to NFS to provide us with the sort of transparency we need to support complex digital workflows.

I regularly suggest that we put in feature requests to the tools vendors to at least support S3; the looks I generally get are ones of quiet bemusement or outright hostility, and mutterings about Amazon and Cloud.

Then again, look how long it has taken for NFS to gain general acceptance and for vendors to not demand ‘proper’ local file-systems. So give it 20 years or so and we’ll be rocking.

If I was an object storage vendor and I didn’t have my own gateway product; I’d be seriously considering buying/building one. I think it’s going to be a real struggle otherwise and it’s not the Operations teams who are your problem.

Me, I’d love for someone to put an object-storage gateway into the base operating system; I’d love to be able to mount an object-store and have it appear on my desktop. At least at that point, I might be able to con some of the tools into working with an object-store. If anyone has a desktop gateway which I can point at my own S3-like store, I’d love to have a play.
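
In the meantime, FUSE-based tools such as s3fs get part of the way there, and even a crude script can make an S3-like store ‘appear’ on the desktop by mirroring a prefix into a local folder. The sketch below uses boto3; the endpoint, bucket and prefix are invented for illustration.

```python
import os
import boto3  # AWS SDK; also talks to many S3-compatible stores via endpoint_url

s3 = boto3.client("s3", endpoint_url="https://object-store.example.com")  # assumed endpoint

def mirror_prefix(bucket: str, prefix: str, dest: str) -> None:
    """Crude one-way sync: copy every object under a prefix to a local folder."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):          # skip "directory" placeholder objects
                continue
            local = os.path.join(dest, key)
            os.makedirs(os.path.dirname(local), exist_ok=True)
            s3.download_file(bucket, key, local)

mirror_prefix("media-archive", "rushes/", os.path.expanduser("~/Desktop/rushes"))
```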


#storagebeers – Greg’s Buying!

Greg Knierieman, one of the ‘Gang of Three’ who host Speaking In Tech, will be in London in mid-April; so it seems only right to celebrate with beer, probably followed by a curry.

This is your chance to meet Greg and ask him all the questions that you have, like

1) Is Eddie really that obnoxious and unemployable? Or is there some other reason why he can’t stay at a company longer than a few months?

2) Is Sarah really carrying a Neanderthal baby?

3) Has Greg been working out in case the Dell deal goes south?

If you fancy talking crap about storage and enterprise tech, or just general crap, come along!

So the date is April 17th and the venue is The Dispensary nr Aldgate, East London. Followed by curry afterwards at The Halal Restaurant.

/dev/null – The only truly Petascale Archive

As data volumes increase in all industries and the challenges of data management continue to grow; we look for places to store our increasing data hoard and inevitably the subject of archiving and tape comes up.

It is the cheapest place to archive data by some way; my calculations currently give it a four-year cost somewhere in the region of five to six times cheaper than the cheapest commercial disk alternative. However, tape’s biggest advantage is almost its biggest problem; it is considered to be cheap and hence, for some reason, no-one factors in the long-term costs.

Archives by their nature live for a long time; more and more companies are talking about archives which will grow and exist forever. And as companies no longer seem able to categorise data into data to keep and data not to keep, and with exponential data growth and generally bad data management, multi-year, multi-petabyte archives will eventually become the norm for many.

This could spell the death of the tape archive as it stands, or it will necessitate some significant changes in both user and vendor behaviour. A ten-year archive will see at least four refreshes of the LTO standard on average; this means that your latest tape technology will not be able to read your oldest tapes. It is also likely that you are looking at some kind of extended maintenance and associated costs for your oldest tape drives; they will certainly be End of Support Life. Media may be certified for 30 years; drives aren’t.

Migration will become a way of life for these archives and it is this that will be a major challenge for storage teams and anyone maintaining an archive at scale.

It currently takes 88 days to migrate a petabyte of data from LTO-5 to LTO-6; this assumes 24×7 operation, no drive issues, no media issues and a pair of drives dedicated to the migration. You will also be loading about 500 tapes and unloading about 500 tapes. You can cut this time by putting in more drives, but your costs will soon start to escalate as SAN ports, servers and peripheral infrastructure mount up.
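
For anyone wanting to sanity-check the 88 days, the sketch below reproduces the arithmetic; the LTO-5 read speed is the published native figure and the tape-handling overhead is my own rough assumption.

```python
# Rough check on the LTO-5 -> LTO-6 single-pair migration estimate.

petabyte_bytes = 1e15
lto5_read_mb_s = 140          # native LTO-5 throughput; the reading drive is the bottleneck
tape_handling_hours = 150     # assumed total load/unload/positioning time across ~1,000 mounts

streaming_days = petabyte_bytes / (lto5_read_mb_s * 1e6) / 86_400
total_days = streaming_days + tape_handling_hours / 24

print(f"pure streaming time: {streaming_days:.0f} days")   # ~83 days
print(f"with tape handling:  {total_days:.0f} days")       # ~89 days
# And that still assumes 24x7 operation, perfect media and no recalls mid-migration.
```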

And then all you need is for someone to recall the data whilst you are trying to migrate it; 88 days is extremely optimistic.

Of course a petabyte seems an awful lot of data, but archives of a petabyte-plus are becoming less uncommon. The vendors are pushing the value of data, so no-one wants to delete what is a potentially valuable asset. In fact, working out the value of an individual datum is extremely hard and hence we tend to place the same value on every byte archived.

So although tape might be the only economical place to store data today, as data volumes grow it becomes less viable as a long-term archive unless it is a write-once, read-never (and I mean never) archive…and if that is the case, then perhaps, in Unix parlance, /dev/null is the only sensible place for your data.

But if you think your data has value or more importantly your C-levels think that your data has value; there’s a serious discussion to be had…before the situation gets out of hand. Just remember, any data migration which takes longer than a year will most likely fail.

Service Power..

Getting IT departments to start thinking like service providers is an uphill struggle; getting beyond cost to value seems to be a leap too far for many. I wonder if it is a psychological thing, driven by fear of change but also a fear of assessing value.

How do you assess the value of a service? Well, arguably, it is quite simple…it is worth whatever someone is willing to pay for it. And with the increased prevalence of service providers vying with internal IT departments, it should be relatively simple; they’ve pretty much set the baseline.

And then there are the things that the internal IT department should just be able to do better; they should be able to assess Business need better than an external provider. They should know the Business and be listening to the ‘water cooler’ conversations.

They should become experts in what their company does; understand the frustrations and come up with ways of doing things better.

Yet there is often a fear of presenting the Business with innovative and better services. I think it is a fear of going to the Business and presenting a costed solution; there is a fear of asking for money. And there is certainly a fear of Finance but present the costs to the Business users first and get them to come to the table with you.

So we offer the same old services and wonder why the Business are going elsewhere to do the innovative stuff and, while they are at it, start procuring the services we used to provide. Quite frankly, many Corporate IT departments are in a death spiral, trying to hang on to things that they could let go.

Don’t think ‘I can’t ask the Business for this much money to provide this new service’…think ‘what if the Business wants this service and asks someone else?’ At least you are going to be bidding on your own terms and not being forced into a competitive bid against an external service provider; when it comes down to it, the external provider almost certainly employs a better sales team than you.

By proposing new services yourself or perhaps even taking existing ‘products’ and turning them into a service; you are choosing the battle-ground yourselves…you can find the high ground and fight from a position of power.