Storagebod

NetApp leaps into the Future

So NetApp have entered an agreement to resell Quantum’s Stornext product; this is the next step in their big-data strategy and, after the LSI acquisition, it makes a lot of sense. There are people who put Stornext in front of NetApp filers and the like, but honestly it makes much more sense putting it in front of relatively dumb disk like the LSI arrays. The LSI disks in my experience really shine in a clustered file-system and are capable of delivering great throughput, but they are not the most feature-laden disks that you have ever come across.

It might make an interesting proposition if you could somehow integrate the Stornext filesystem into the filer-heads; this could allow NetApp to overcome some of my current objections to the cluster-mode in OnTap 8.

Perhaps NetApp could go and buy/merge with Quantum and have another go at integrating OnTap with something else? Perhaps go the whole hog and look at pluggable file-system technology in OnTap? Okay, it’s a bit of a science experiment but might be interesting.

The LSI acquisition and now the partnering with Quantum for Stornext is interesting though; it shows that NetApp are finally at the stage where they are confident enough to quietly acknowledge that one size does not fit all. This does not mean that their current products are flawed, but it shows the maturity to accept that one should not be trying to bang square pegs into round holes.

It also takes us further down the path where storage is software; hardware is just a commodity and should be able to be swapped out and changed with ease. Another pretty solid move from NetApp.

Don’t BackUp

So there are two possible default back-up policies: back up everything or back up nothing.

I am coming round to the latter as a default policy and, before you think I’ve gone mad, I’ll explain why. Backing up everything is a lazy option which requires little thought from you and your users, but it is terribly inefficient and is probably a major cause of failures in your back-up environment.

But surely you need to back up something? What about the operating system? What about the localised settings? Well, actually no; not really, and this does lead to some bizarre conversations.

I’ve had discussions with colleagues in server teams which go along these lines:

‘Why do you want the operating system backed up?’

‘Because we always have done and we need it backed up!’

‘Are you sure, will you ever do a bare metal recovery?’

‘No, we use our provisioning servers and simply re-provision!’

‘So, how about we simply back-up the provisioning server?’

‘No, we must have the operating systems and related files backed up!’

And so it goes round and round; it’s backed up because it always has been, but no-one will ever use the back-up.

In fact, I can think of very little which needs to be backed up from a core infrastructure/application platform point of view. Just back up the core deployment environments and you should be able to rebuild the environment quickly and simply from them. Okay, there are infrastructure-supporting applications as well which we as infrastructure teams need backing up; authentication servers, name-servers and the like. As infrastructure teams we need to keep our own house in order from that point of view.

So, this leaves us with the data; be it user data, databases or anything else that might be created by an application. Well, it’s probably about time that we got the data owners to tell us what value it has and only back it up if asked; is it transient data? Can it be easily rebuilt?

Making it policy that nothing gets backed up unless requested takes out all ambiguity. There can be no assumptions about what is being backed up; it makes it someone’s responsibility as opposed to an assumed default.
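As a rough illustration, here is a minimal sketch of how an opt-in policy might look in practice, assuming a hypothetical registry of explicitly requested datasets; anything that nobody has registered simply never gets backed up. The owners, paths and retention figures are made up for the example.

    from dataclasses import dataclass

    @dataclass
    class BackupRequest:
        owner: str            # the data owner who asked for the backup
        path: str             # what they asked us to protect
        retention_days: int   # how long they have asked us to keep it

    # Nothing is protected by default; every entry here was explicitly requested
    # and has a named owner attached to it.
    requested = [
        BackupRequest("finance", "/data/ledger", retention_days=2555),
        BackupRequest("infrastructure", "/srv/provisioning", retention_days=90),
    ]

    def include_list(requests):
        """Build the include list fed to whatever back-up tool is actually in use."""
        return sorted(r.path for r in requests)

    if __name__ == "__main__":
        for path in include_list(requested):
            print(path)

Everything else falls through to the default of no back-up at all, which is exactly the point: if it matters, someone has to say so.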

I think starting from a policy of zero backups and then building from there is a much better approach than backing up everything and reducing from that position.

And at least if you back up nothing, you can really run a zero-tolerance policy on back-up failures.

Backups==Archives?

Steven’s post and a Twitter conversation between Storagezilla, W. Curtis Preston and others about archiving, backup, retention and deletion of data bring home some of the realities of data management and policy.

It is often easier to keep data than delete it; this is both a technical issue and an organisational issue. Many applications are not written with good data management in mind; if you want to scare a developer, ask them how you age data out of their application. Ask them how you might bulk-delete data, or how you would move historical data to an archive and still have it available in some form for reporting purposes.
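To make that concrete, here is a minimal sketch of the sort of thing that rarely gets built in, assuming a hypothetical ‘orders’ table: rows older than an assumed retention period are copied to an archive table, which a reporting view can still see, and then deleted from the live table.

    import sqlite3

    RETENTION_DAYS = 365 * 7   # an assumed retention period; the business rarely tells you

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, created DATE, detail TEXT);
        CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, created DATE, detail TEXT);
        -- reporting sees live and archived rows through a single view
        CREATE VIEW orders_all AS
            SELECT * FROM orders UNION ALL SELECT * FROM orders_archive;
    """)

    def age_out(conn, retention_days):
        """Move rows past their retention period into the archive, then delete them."""
        cutoff = f"-{retention_days} days"
        with conn:
            conn.execute(
                "INSERT INTO orders_archive SELECT * FROM orders "
                "WHERE created < date('now', ?)", (cutoff,))
            conn.execute(
                "DELETE FROM orders WHERE created < date('now', ?)", (cutoff,))

    age_out(conn, RETENTION_DAYS)

It is not hard in principle; the point is that hardly anyone writes it, and nobody will tell you what the retention period should be.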

Organisationally, trying to get an answer out of a business user about how long they want to keep their data around for is like getting blood from a stone. You will not get an answer, and getting them to sign up to any kind of policy which you dictate is next to impossible as well.

So we end up in a situation where it is technically hard to either delete data or archive it and even if we could, we would never get anyone to agree to it anyway. So the default is to keep everything for ever and if you are doing this, the difference between archiving and backup becomes very blurred very quickly.

Now we all know that the right thing to do is to archive, set good retention periods, delete data and we can keep pushing this rock up the hill. And indeed, I would argue that we should but we also need a certain amount of pragmatism.

Archives and backups in many cases have become the same thing, so companies like EMC should not really be castigated for making products which enable this and make the best out of a bad job. But we should, as the experts in the field, try where possible to do the right thing; there are cases where doing the right thing is the easiest thing to do, so do it there.

And when someone asks why their backups are taking longer and longer; why they are spending more and more on storage; why their databases appear to have data in them which goes back to the dawn of time; you can nod sagely and say ‘Well, you see; backups are not archives, but due to business realities backups have become so, and if you really want to move to a better solution, it will require this, this and this…’

I will bet that they will continue to treat backups as archives because to do anything else has become simply too hard. It’s the same old HSM story; it’s too hard…it requires thought and planning, neither of which is overly abundant in today’s IT.

Simulated Interest

Greg posts here on the Cisco Learning Labs; a long overdue learning tool for Cisco networkers but rather limited in usefulness for people like Greg who want to experiment with different designs and features and go beyond the basic exercises which are provided in the labs.

I really think it’s about time that vendors with internal simulators got them into a state where they can be released into the wild, and I believe there are some really good reasons to do so.

It’s worth looking at Computer Science/Engineering courses, for example: they turn out programmer after programmer but it is very unusual for them to turn out anyone with an interest in infrastructure.

Why? Because it’s very expensive to kit out labs with lots of infrastructure and this is really frustrating for those of us in the industry because we know there are good infrastructure simulators/emulators available. Most of us have used them on expensive courses provided by vendors.

If we want to encourage and capture the best, I think that the vendors can help with this by getting their emulators out there. Don’t think of it as giving something away, think about it as an investment.

Do you think Java would be quite as popular if it wasn’t taught so extensively in universities? And why do you think it gets taught so extensively? Because it got given away.

I know many of the storage vendors have already started to do so but it’s time for some of the rest of you to step up to the plate. I’m looking at you IBM; I’m looking at you HDS…Cisco, Brocade, Juniper and the rest, time to stand up.

EMC, keep up the good work and get the rest of your simulators out there. NetApp, you too.

Serial Killing

So this is a rant; apologies in advance!

Sometimes vendors make me despair; I don’t know why I have never learnt my lesson, but still they do. We have a mysterious problem with some Cisco MDS switches but, as many of you probably know, you can’t easily buy MDS switches from Cisco; you generally go through another vendor. In this case it is IBM!

Anyway, one of my guys logs a call with IBM; first of all it gets bounced from the software call-centre to the hardware call-centre, which obviously necessitates a different PMR number. Why is beyond me, but it’s always been that way with IBM.

So he logs the call against what he thinks is the correct serial number, and what the switch displays as its serial number. We wait many hours and hear nothing; he prods them a few times and still we get nothing. Eventually, we escalate the call to the district support manager to have a rant and we find that there is a note on the call to the effect that the switches are not under maintenance. Funny that, considering only yesterday I had actually approved this quarter’s maintenance payment. That, and they had also managed to record the wrong contact number and misspell an email address.

But of course the switches are under maintenance; it’s just that IBM physically stick their own serial number on the back of the switch and, to find it out, you need to have it recorded somewhere or go and have a look. Why, oh why do vendors do this; or at least, why don’t you have two fields in your maintenance database which tie the OEM and your own serial numbers together?
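For what it’s worth, what I’m asking for is trivial; here is a minimal sketch, with entirely made-up serial numbers and record names, of a maintenance record that carries both serials and can be looked up by either one.

    from dataclasses import dataclass

    @dataclass
    class MaintenanceRecord:
        oem_serial: str      # the serial number the kit itself reports
        vendor_serial: str   # the sticker the reseller puts on the back
        contract: str
        contact_email: str

    records = [
        MaintenanceRecord("FOX1234ABCD", "IBM-78-54321",
                          "contract-001", "storage.team@example.com"),
    ]

    # Index the same record under both serial numbers so a call can be
    # validated against whichever one the customer reads out.
    by_serial = {}
    for record in records:
        by_serial[record.oem_serial] = record
        by_serial[record.vendor_serial] = record

    def under_maintenance(serial):
        return serial in by_serial

That’s it; one extra column and an index. There is no excuse for bouncing a call because the customer quoted the serial number the switch actually reports.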

And it’s not just Cisco kit with IBM; it’s their rebadged NetApp, rebadged LSI, rebadged DDN and so on. I don’t want my engineers to have to look up these sorts of details when they are trying to fix problems; I want them to be able to use the serial number the machine itself reports. They are rarely in the secure computer rooms, and gaining access means raising access requests and the like.

[Of course, I would also have expected IBM to tell me at point of first contact that they didn’t recognise the serial number and they believed that there was no maintenance contract.]

I want to have a single serial number to work from; at present, I need two for a single piece of kit and that is crap. And I generally do need the OEM serial number because all of the software licensing is tied to that and not to your made-up number.

Of course, don’t get me started on IBM part-numbers!

BTW, we don’t have this problem with EMC; we can give them a Cisco serial number and they can cope with it!

Still, IBM are not the only vendor who do stupid things like this, but please can I suggest that a piece of kit should have a single serial number, and it should be the one that the piece of kit reports as its serial number.

Half-Empty Post

One of the EMC announcements earlier in the year got me thinking; now it might come as a bit of a surprise to most of you, but I’m really not a glass-half-empty kind of guy and generally try to look at the positive things in most announcements. However, there was one announcement which really took me aback and almost immediately a cynical ‘point of view’ popped into my mind.

No, it wasn’t the slightly cynical rebrand of Clariion and Celerra into a ‘single’ product called VNX; that was pretty transparent and I’ve never had a real problem with the Frankenstorage; I’m all about the ease of management anyway. As far as I am concerned, you can lash half-a-dozen products together, stick a pretty unified interface on it and, as long as I can manage it as a single device, that’ll pretty much do.

No, it was the VMAX announcement. The doubling in performance with a software upgrade; what’s not to like, you ask? A performance upgrade for free…well, as long as you have maintenance on the array.

Now if I was a VMAX customer, I might actually be asking another question: why have I been running my VMAX at half its performance capability for the past year or so? Did I really need all the VMAX kit that I bought, and did I need to swell EMC’s coffers by quite so much when I was trying to seriously curtail my IT spend? And am I going to see a similar ‘free’ kicker in performance next year?

It does make one wonder if the VMAX was actually released earlier than intended and before it was really ready.

FAST-2 seemed to take a long time to come to fruition, with FAST-1 being a re-brand of Symmetrix Optimizer: a place-holder as opposed to anything really revolutionary.

Was the VMAX Enginuity code release running poorly translated/emulated PowerPC code? And are the optimisations nothing more than the Enginuity team finally having had the time to understand and write a true native Enginuity release?

This is obviously speculation and it probably says a lot that VMAX was still performant even though it was running under-baked code. But if EMC continue to improve VMAX performance without hardware kickers, I’d be looking in their code for sleep statements, timing loops and all other kinds of tricks that programmers pull.

I’ll be watching carefully….

An Alternative Reality…

EMC, the world leader in information infrastructure solutions, today introduced a new family of unified data centre solutions – the EMC DCE – Data Center Environment.

These new systems are the result of many years’ work and collaboration between EMC’s industry-leading storage, server and virtualisation divisions. Technologies from the Symmetrix, AViiON, McData and VMware divisions have been brought together to build the world’s first complete Data-Center Solution.

The boundaries between storage and compute have now been removed, with the new DCE range allowing compute workloads to be run alongside storage workloads within a single device, greatly reducing the need for complex storage network infrastructures.

EMC are leveraging their VMware technology to provide the capability to consolidate server workloads into a single unified environment, and utilising NUMA expertise and technology from the AViiON division to build a scale-out server environment which can support both storage and compute workloads, with seamless migration of workloads between the individual engines in the new Data Center Environment.

These technologies running alongside EMC’s leading Enginuity storage operating system in a single environment have enabled EMC to demonstrate the capability to run an entire Data-Center from within a single system including the provision of VDI down to the desktop.

With the McData division providing networking expertise and technology, this is a truly unified Data-Center environment.

Yes, the above is based on the premise that EMC did not can Data General’s server division and continued to develop the technology that it acquired, and also that EMC did not spin off VMware and McData; but how far is it from being a reality if EMC decided to make it so?

Could they take the VMAX hardware platform and turn it into a single Data Centre Environment? I suspect the answer is yes, but their partnership with Cisco somewhat precludes them from doing this. But what if Cisco ever moved into storage?

The only piece that EMC are really missing is the network but I reckon that is a rectifiable situation; if EMC ever buy a network company, you know where EMC are going.

Ever More Confusion?

Stuiesav blogs about EMC’s positioning of VMAX vs VNX and the problems that EMC marketing appear to have with putting clear blue water between the products. This has been a long-running problem which started with the purchase of DG and the introduction of the Clariion range into the EMC portfolio, leading to some serious internal conflict within EMC between competing product teams.

It did appear, however, that peace had more or less broken out in the camps, but I wonder. As VNX continues to grow in capability and scalability, the high-end product operates in a more and more rarefied atmosphere and there is less requirement for it. And like other companies before them, EMC are at risk of some quite serious internal competition; like IBM, where you would often find AS/400, RS/6000 and mainframe all pitched at the same customer to do the same job, there is a clear and present danger of this happening here.

Actually, EMC have a tremendous amount of overlap across most of their product range at the moment: VMAX overlaps with VNX, which overlaps with Isilon, which overlaps with Atmos, which overlaps with Centera; there’s not a huge amount of external coherence.

The senior guys, the marketing gods and the bloggers within EMC will pitch this as strength in diversity, but will the sales guys get it and will they be able to pitch it successfully, or will they simply concentrate on the products which generate the most revenue and get them closer to their target? If I’m being cynical, I’d expect the account manager to try to get VMAX through the door in preference to VNX, but if there was a risk of a customer going NTAP/HP/IBM, then I’d pitch in with VNX.

And of course, I’m aware that I’ve not mentioned one of EMC’s platforms: VPLEX. I suspect, like many, I’m wondering what is going on with VPLEX and what direction it is going in. It’d be nice to see The Storage Anarchist put together a blog on VPLEX and VPLEX directions. But perhaps it’s simply easier to trash-talk the opposition and blog about meaningless benchmarks?

Waffle on no WAFL

We have a lot of LSI disk; media and broadcast companies tend to. It pops up in many forms, from traditional IT vendors such as IBM, SGI and Sun/Oracle to broadcast/media vendors who offer LSI as their badged solution.

For the most part, we use almost none of the functionality beyond RAID and it is used as big lumps of relatively high-throughput, cheap disk. Many of our internal customers are only interested in one thing: can it stream uncompressed HD to their workstations, and can it do it reliably and consistently? We tend to have very low attach rates, with one smallish array split between a handful of hosts; sometimes even down to one.
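To give a rough sense of scale, here is a back-of-envelope calculation of the per-stream throughput involved, assuming 10-bit 4:2:2 1080p at 25 frames per second (actual formats and frame rates vary):

    # Rough, assumed figures for one uncompressed HD stream (10-bit 4:2:2, 1080p25).
    width, height = 1920, 1080
    bits_per_pixel = 20           # 4:2:2 sampling at 10 bits per component
    frames_per_second = 25

    bytes_per_frame = width * height * bits_per_pixel / 8
    mb_per_second = bytes_per_frame * frames_per_second / 1e6

    print(f"{mb_per_second:.0f} MB/s per stream")   # roughly 130 MB/s, before any overhead

Multiply that by a few workstations hanging off one smallish array and you can see why raw, consistent throughput matters far more to us than features.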

If we need special functionality, it’ll often be done at the application layer or the file-system layer; for example, we use GPFS and Stornext clustered file-systems. We don’t make huge use of NAS in these specific areas either; we would like to use more, but at times we struggle to get application vendor support. For what is at times seen as a bleeding-edge industry, we are very conservative about deploying new infrastructure; when you are trying to put stories to screen before the other guy, you tend not to pee about. If it works, you don’t mess about with it.

Thin provisioning, snaps etc. don’t have a huge amount of worth; dedupe is practically worthless. Once a project is finished, it gets archived away (to tape 🙂 ) and it’s onto the next one. There’s little time to dedupe or compress even if we got reasonable ratios.

LSI is great for this small but growing niche; I’m hoping NetApp don’t mess it up too much and that we don’t have a horde of NetApp sales-guys trying to up-sell us to something we don’t need and our users don’t want. There are plenty of other smaller storage vendors who would fancy filling that niche but I, like a lot of people in the media game, have loads of this stuff and we would rather not have to start again with a new storage environment.

However, starting again is always an option because our data doesn’t tend to live on disk for very long and much of the storage logic is in the application anyway. And you see, this might actually be a model for the future for everyone; the disk won’t really matter that much to us, so you had better be able to control that cost and get it as low as possible, and then focus on how you get value-add much, much further up the stack.

The much vaunted ‘Big Data’ might well be a case in point; it is entirely possible that many ‘Big Data’ applications will use very little of your advanced array functionality and will be looking for drag-racer type storage which gets from ‘A’ to ‘B’ as quickly as possible.

‘Cheap, Fast and Dumb’…it’s not sexy but it’s a good reason for NetApp to buy LSI; if you told a NetApp engineer to develop ‘Cheap, Fast and Dumb’, they’d probably walk out in disgust or deliver something five years later which was none of those things but was a really good general purpose, all things to all men ‘Unified Storage System’.

Arguably, NetApp had done the hard stuff already; they need someone to do the simple stuff for them.

NetApp’s moment of Engenio?

NetApp’s acquisition of the external storage division of LSI yesterday was a bit of a shock; for me, it was who did the acquiring as opposed to Engenio being acquired. I’ve been expecting the acquisition for some time but I was expecting it to be someone like Dell or even IBM, not NetApp.

In general, I think that the acquisition is a good one for the industry; whether it turns out to be a good one for NetApp, time will tell, but I suspect it’s a bit of a curate’s egg driven by necessity and realism within the NetApp ranks. But like EMC’s DG acquisition, the benefits will probably be realised in the longer term.

NetApp are still growing fast, that’s for sure, but I think that they want to accelerate even more and doing this purely organically is quite hard; you get the feeling when talking to some of the guys that they are pedalling as hard as they can but are struggling to keep up with themselves.

Also, I think OnTap 8 is the beginning of a story which will, in the long term, lead to a very different appliance; one which, shockingly, may not be completely built on WAFL and RAID-DP. The LSI acquisition is possibly part of that picture.

But there are a few places where LSI makes sense today:

Bycast: the combination of Bycast and OnTap appliances is not especially attractive to the core audience for Bycast’s object storage technology at the moment. The costs do not really stack up and I wonder if, like EMC, NetApp have come to the conclusion that they need some ‘cheap’ and relatively dumb RAID arrays to put behind it. LSI’s disk would fill this role for them very nicely; the DE6900 disk enclosure in particular is made for that sort of application.

High Density Filers: yet again, the DE6900 and Engenio 7900 make a lot of sense here. If OnTap 8.1 delivers significant improvements in both aggregate size and c-mode operation, this sort of density becomes a must. And certainly in the HPC and media applications which NetApp wants a piece of, there is a need for this sort of density.

IBM: IBM have sold a lot of nSeries gateways with rebadged Engenio disk; NetApp can get their hands on all of this revenue. It also means that this relationship could end up being healthier than it is currently; it could extend the lifetime of the relationship somewhat. It may also hasten its end!

VNXe: NetApp’s current low-end offering is expensive when compared to the cost of a VNXe. LSI gives NetApp the potential to engineer their own VNXe killer.

Direct Attach: There is still an awful lot of direct attach disk sold; NetApp really don’t have an offering in this space. This will enable them to have the conversation and look at owning the whole of the storage estate.

So there are quite a few reasons to do the deal, but… NetApp are not great at acquisitions, and the Engenio acquisition comes with complexities as it has a rich channel and OEM customers who compete with NetApp. If these relationships are not maintained, the acquisition could quickly turn into a nightmare.

And what happens to the Unified Storage message? That’s not as clear as it was.