Storagebod

How Big Is It?

Sometimes I think storage vendors use a special type of man-ruler when measuring the size of their bits and bytes; especially when it comes down to working out how much storage their array can support. Please note, this is not about utilisation; this is about the maximum number of disks that an array can support and actually use.

Every time a vendor announces a new array, there is inevitably an increase in the maximum spindle count; and as spindles get bigger, a commensurate increase in capacity. More spindles, more spindles and even more spindles are the order of the day. These beasts get bigger and bigger according to their spec sheets, but then you have conversations with your account manager.

'Okay, so how much disk can I put in your array?'

'Well, that depends; what are you going to use it for?'

'Oh you know, stuff!'

'Let us do a detailed study and we'll get back to you…'

'But you say you support 1000 spindles?'

'Ahhh, we do but only when the wind blows from NNE and the Sun is waxing in Aries and the moon is mooning Virgo! Then we support 1000 spindles but at all other times, we support rather less than we say! I can't tell you how much less…but we do support less or more, it depends!'

At times, the whole business is more opaque than a black-out blind! And when software is priced on a per frame/head/array basis; sometimes based on the maximum capacity that an array could support on that special day when it can support its maximum amount of disk, it's quite frankly mildly irritating!

Let's have sensible incremental software pricing, let's have proper disclosure as to what is meant when talking about maximum capacity and let's get some transparency into this business.

Protocols, Religions and Heresy!

I've just come back from a NetApp training course; a good course and recommended for anyone who wants to pick up some storage fundamentals. It covers all the NetApp bases and, by the end of it, you should be fairly confident doing pretty much all the day-to-day routine tasks that you might be asked to do as an administrator of a NetApp array.

It does not cover SAN in any detail and the FC coverage is limited to 'this is how you present a LUN as a Fibre Channel device', but this led on to some interesting conversations around the complexity of FC vs iSCSI.

I have been saying for some time that the complexity of FC is over-stated and that it is not really any harder than iSCSI. This often leads to looks of disbelief and complete disagreement; it is almost as if I am spouting heresy. The iSCSI camp think I am mad and the FC camp seem to think that I'm diminishing them.

But, if you take OnTap, there is really very little difference in how you present an iSCSI LUN compared to how you present an FC LUN. It is certainly no harder to do FC from an array management point of view.
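To make that concrete, here's a minimal sketch of the two workflows as I remember them from Data ONTAP 7-mode; it's just Python assembling the CLI strings for illustration, the volume path, igroup names and initiator identifiers are all made up, and you should check the exact option syntax against your own ONTAP release rather than trusting my memory.

```python
# Sketch: the near-identical workflows for presenting an iSCSI vs an FC LUN
# from Data ONTAP 7-mode. Syntax is from memory; all names and identifiers
# below are hypothetical.

ISCSI_INITIATOR = "iqn.1991-05.com.microsoft:host1.example.com"  # hypothetical IQN
FC_INITIATOR = "10:00:00:00:c9:12:34:56"                         # hypothetical WWPN

def present_lun(protocol_flag, igroup, initiator, lun_path="/vol/vol1/lun0",
                size="100g", ostype="windows", lun_id=0):
    """Return the ONTAP CLI steps for presenting a LUN to a host.

    protocol_flag is '-i' for iSCSI or '-f' for FC; everything else is identical.
    """
    return [
        f"igroup create {protocol_flag} -t {ostype} {igroup} {initiator}",
        f"lun create -s {size} -t {ostype} {lun_path}",
        f"lun map {lun_path} {igroup} {lun_id}",
    ]

if __name__ == "__main__":
    for label, steps in [
        ("iSCSI", present_lun("-i", "ig_host1_iscsi", ISCSI_INITIATOR)),
        ("FC", present_lun("-f", "ig_host1_fc", FC_INITIATOR)),
    ]:
        print(f"--- {label} ---")
        for step in steps:
            print(step)
```

The only things that change between the two are the igroup type flag and the initiator identifier (an IQN versus a WWPN); the LUN creation and mapping steps are identical, which is rather the point.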

And then we go to the host; let's take Windows, for example. Now this is where I think a lot of the perception of the simplicity of iSCSI comes in: Microsoft have implemented a very nice software initiator, it's there and it's standard. A bit of pointing and clicking and you are there.

Unfortunately, traditional FC cannot be implemented completely in software and needs FC HBAs, hence we need to install additional drivers and software to make it work; these are not part of the standard Windows build and suddenly it all becomes 'complex'.

If we go to Unix, we end up mucking about with configuration files for both iSCSI and FC; so really it's not any harder to do FC than iSCSI.

So if it's not hard to do at the host level and it's not hard to do at the array, where is it hard? And this is where I think it becomes more interesting: it's the network! Is a large Data Centre IP network harder to set up than a large Data Centre FC network?

Arguably, the FC network is easier, but it is different. In the FC network you have a lot less to worry about: you run fewer protocols and services, it's non-routable, the security model is simpler, there is less potential for different workloads to clash, and you have neither address space management nor name services to worry about. If I were a Network Admin, I would argue that it's them who are being diminished by this constant claim that iSCSI is easier.

To do either iSCSI or FC properly is probably equally hard. If you just want to bung a block-level array in, do not care about segregation of traffic or quality of service, don't care whether your IP back-up traffic and IP storage traffic contend and make your back-ups over-run, and you know you've got enough headroom on your existing IP network to carry your block traffic, then go ahead with iSCSI; it's easier because you've already got the infrastructure in place.

But if you want to put in place a dedicated storage network, the choice is not as clear-cut as people would like to make out. Even when you start looking at cost: yes, GbE is cheaper than FC, but FC is generally running at 4 gig now and is faster, and FC ports are, in my experience, significantly cheaper than 10GbE ports. So if you need the throughput, then FC might well be cheaper than iSCSI.
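If you want to put rough numbers on the throughput argument, it boils down to cost per Gbit/s of port bandwidth. Here's a trivial sketch of that framing; I've deliberately not put any prices in because street prices vary so much, so plug in whatever you are actually being quoted.

```python
# Compare connectivity options on cost per Gbit/s of port bandwidth.
# No real prices here - substitute your own quotes.

def cost_per_gbps(port_price, port_speed_gbps):
    """Cost per Gbit/s for a single port."""
    return port_price / port_speed_gbps

def fc_wins(fc_port_price, gbe_port_price):
    """True if a 4Gb FC port is cheaper per Gbit/s than a 1GbE port.

    In other words, FC wins on cost per throughput as long as the FC port
    costs less than four times the GbE port.
    """
    return cost_per_gbps(fc_port_price, 4) < cost_per_gbps(gbe_port_price, 1)
```

The same framing works for 10GbE once you have a real per-port price for it, which at the time of writing tends to be the expensive part.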

There might be a small saving in the number of FTEs you require because you could have a single Network team, but I believe that FC is actually simple enough that, if you get over the politics, a single Network team could manage both FC and IP anyway. The separation is purely politics and a turf war!

If you are a small shop and you have just a few administrators who do everything, iSCSI might also make sense but don't believe FC is hard; administering a small SAN with a couple of switches might not add a huge amount of additional overhead.

Neither iSCSI nor FC is a wrong answer, but make sure that when you have got to the answer, you show your working. And when a vendor tells you what the answer is, ask them to show you their working and challenge it.

Of course, if I was building a green-field data-centre and could simply start again, I'd probably look at putting in Data Centre Ethernet which would give me the option of FCoE. I would have a single Network team from the get go.

It'll be interesting to see if Microsoft bundle a software initiator for FCoE into Windows at some point; then I think we'll see perceptions of complexity change again.

Yes, I've completely ignored NAS but actually many of the same arguments apply.

I'd welcome some thoughts; that's if anyone is still reading and hasn't exploded in apoplectic rage at the heretic!

Fishworks Simulator Hint and other VSA stuff

The Fishworks Simulator has been driving me a bit nuts; I couldn't get ESXi to reliably use a Fishworks iSCSI LUN as a data-store; it would consistently hang whilst installing a Virtual Machine. I couldn't see anything wrong but it would just die. I tried moving things around, running two copies of the simulator, one providing the NFS share with the install files on it and one providing the target iSCSI LUN, and it was still dying.

Fortunately, the guys from the Fishworks team have made contact and I can ask them questions. They suggested reducing the number of virtual disks, so I've dropped down to two disks now: one for the Fishworks binaries and one to store data. This has made all the difference; it now works and doesn't die.

So people, if you are actually trying to use the Fishworks simulator to do useful stuff, you might want to consider reducing the number of virtual disks; I might try increasing the number of virtual disks and putting them on discrete spindles but I don't have 15 disks to play with; so I might try 3+1.

As per usual, I get all my VSAs built and then both Nexenta and EMC release new versions. And someone has found me another VSA to play with. Falconstor seem to be ignoring my attempts to register for their free version.

At the moment, I'm ignoring the VSAs which provide features like backups and dedupe, but VSAs seem to be really taking off. I do like the concept of providing discrete infrastructure capabilities as a bundled virtual appliance and at some point I shall branch out into looking at some of these as well.

Arrays Now Good Enough?

I loved this blog entry about Web 2.0 because the last paragraph is beginning to have relevance in our own world of Storage.

"The lesson to take away is that, in this cycle anyway, the hard architectural problems have been solved to a "good enough" degree, now its about making them easier to use etc. Its like post war cars – once it was sorted out how they worked, where to put the pedals etc, it became largely a styling game for quite a while."

Despite a lot of the huffing and puffing by the various vendors, I think that we are about to enter a cycle where this is true for array-based storage. Most vendors, if you have a serious and open conversation, admit that their competitors' arrays are pretty much good enough for whatever you want them to do. Of course, there are a few features which differentiate, but they are all catching up with each other in this space.

It will get harder and harder to choose between vendors based on features; it is going to come down to usability, manageability and reliability.

The array is heading towards functional maturity and after this year is done, I'm expecting the battle-ground to move to the skies.

V is for value??

I have a problem with Kostadis’ latest posts on virtualisation, especially the idea of using the vSeries as a virtualisation controller to reclaim unused disk and improve efficiency. The problem with the vSeries going into an existing environment is that, of the triumvirate of SVC, USP-V and vSeries virtualisation appliances/controllers, it is just about the most disruptive thing that you can do.

SVC and USP-V will go in with no data migration and you can simply use them to virtualise your existing physical LUNs; that is the beauty of keeping things relatively simple and not relying on an additional abstraction layer such as WAFL. You can then migrate into the native SVC and USP-V formats at your leisure if you wish but you are not forced into making changes which potentially lock you in to the virtualisation vendor.

This is extremely important in today’s environment: firstly, it minimises the amount of swing space that you require to do the conversion into a virtualised environment; secondly, it minimises the amount of potential outage; and thirdly, it’s not a one-way trip.

If all I want to do is reclaim unused disk and I am prepared to take the level of disruption that putting a vSeries into my environment entails, I suspect I might be better off spending the money on analysing my existing environment, coming up with a better, more efficient lay-out and working at doing things better in my current environment.

For example, I would consider looking at Virtual Provisioning in my EMC environment; sure the licenses cost but it’s not going to be as expensive as going to a vSeries. I could convert from BCVs to Clones or Snaps. There’s a multitude of things which could be done before going virtualised. And once you've got your environment fixed, then take up NetApp and their space guarantee; don't make things easy for them!

Don’t fix with new technology that which could be fixed by being a little smarter with your existing technology. You should always ask the question when being sold something new, can I do that already? It is amazing how often the answer is yes!

WAFL is a great technology with some great features but putting it in as a virtualisation technology would be an expensive mistake at the moment.  Put NetApp in because you want to use NetApp but don’t put it in to virtualise your existing environment unless you are prepared for a whole lot of work. Take it from me, I’ve looked at it.

If you simply want to virtualise and build a consolidated pool of disk, you might well be better looking at SVC or USP-V. If you are looking at re-engineering your environment, NetApp is one of many companies you should look at but you know that anyway.

Extreme Cash Cow – Redux

I’m still getting comments on the ‘Extreme Cash Cow’ entry I wrote last year as a diatribe against the current state of the SRM market. I feel it’s probably about time that I updated it.

Firstly, since then I have moved jobs and no longer have responsibility for day-to-day storage management. I do obviously keep an eye on things but unfortunately it means that I cannot really influence how SRM develops in my organisation. This is a bit sad because I actually had a very positive response from EMC, who took the brunt of my diatribe, and I have been unable to take them up on their offer to work with me on understanding what I would like to see from an SRM product. I do hope that someone in the organisation I work for does take them up on it and helps them understand what a large storage end-user requires and where the issues are.

Anyway, I thought I would put some thoughts together about the challenges that SRM tools face, or at least pose some questions.

Is the problem one of scale and complexity? We currently expect the SRM tool to understand our storage environment end-to-end. So look at what an SRM tool needs to do.


1) It needs to understand the array and how that is configured – easy

2) It needs to understand the switch fabric – fairly easy

3) It needs to understand the IP fabric – moderately hard

4) It needs to understand the hosts/servers, including virtual – moderately hard

5) It needs to understand the applications – hard

6) It needs to be able to correlate all the above information into a usable and consistent model – potentially very hard
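As a sketch of why that last point is the hard one, here's a toy model of the sort of correlation an SRM tool has to build; every class and field name is hypothetical, and a real tool would be populating something like this from array, fabric and host discovery rather than hand-built objects, but it shows where the joins have to happen.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Toy end-to-end SRM model: each layer on its own is easy enough to describe;
# the hard part is the correlation layer that joins them consistently.

@dataclass
class Lun:
    array: str           # the array and its configuration - easy
    lun_id: str
    raid_group: str

@dataclass
class FabricPath:
    switch: str          # the switch fabric - fairly easy
    zone: str

@dataclass
class Host:
    name: str            # hosts/servers, including virtual - moderately hard
    hypervisor: Optional[str] = None

@dataclass
class Application:
    name: str            # the applications - hard
    owner: str

@dataclass
class StorageView:
    """The correlation layer - the 'potentially very hard' bit."""
    application: Application
    host: Host
    paths: List[FabricPath] = field(default_factory=list)
    luns: List[Lun] = field(default_factory=list)

    def raid_groups_behind(self) -> Set[str]:
        # Walk application -> host -> LUNs -> RAID groups. The join only
        # works if naming is consistent across every layer.
        return {f"{lun.array}:{lun.raid_group}" for lun in self.luns}
```

Each layer in isolation is tractable; it's keeping the identifiers consistent enough for the joins to work that makes the whole thing hard.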

So to be fair to the SRM vendors, what they are trying to do is non-trivial and we the end-users don’t always make their jobs easy. We have a duty to ensure that organisational standards are set, adhered to and maintained otherwise the data consistency checking becomes horrible. We have to give them a chance.

Do we want an end-to-end management tool which allows us to understand our whole IT infrastructure, because storage and data are intrinsically linked?

What do we actually want from a SRM tool which will make it useful to us so that we do not carry on cursing the vendors and writing our own scripts? Perhaps we should hand over the contents of our individual script/tools directories and say, we want a tool which does all this and does it reliably. Perhaps the SRM vendors should send out an investigatory team wearing red shirts to discover what the storage civilisations are up to?

We can probably say that we don’t want ECC and its ilk; perhaps SanScreen is closer to what we want. I suspect that is very much the case; we do not want an all-singing, all-dancing provisioning/configuration tool, but we do want something which gives us an immediate view of our storage environment and allows us to drill down through the layers into the individual components, getting performance, capacity and configuration details. It would be incredibly useful if it understood the reality, which is a heterogeneous storage environment with SAN, NAS and, in future, Object/Cloud.

And vendors, if you continue to expand the number of different storage families in your product range and do not standardise on your management APIs, interfaces etc, you are making your job harder. Even within a product family, 37 varieties of LUN are not making your job any easier. As part of the development track of any new feature, the question needs to be asked: how will this be managed? And it needs to be asked early in the development cycle.

So what do you want your SRM tool to do?

Deal or No Deal?

Stephen has a post here about pricing, about getting close to your vendor and developing a relationship with just a few trusted partners. Nice idea but in any relationship there needs to be some tension to keep it fresh and alive; otherwise you find yourselves doing something because you have always done so, it becomes comfortable.

Now comfort is all very well at home and in your personal life but when you are spending lots of money (your lots will vary) with a vendor, comfort is not good. I have walked into situations where the position has been far too comfortable and ultimately it becomes dangerous for both parties. You need to shake things up once in a while.

1) Single-vendor relationships are not good for driving value. Competition is key; this does not mean that every bid should be competitive, there aren't enough hours in the day and everyone gets tired. But every eighteen months, pick a technology area and review it. Do both a technology review and a commercial review. Your review cycles may be shorter; it depends on your workload.

2) Review street prices regularly; there are a variety of sources for this, some formal, some informal. Vendors do not like it but they know it happens. But it is important to understand why prices differ; it could be the size of your organisation or it could be a prestige thing. Let your sales-man know that you are paying above street price and that you are reviewing things.

3) New requirements need to go competitive. A sign that things have got too comfortable is that you simply default to giving new business to the incumbent.

4) Be aware of the market and talk to the incumbent about competitive products; let them know that you are aware of the competition. Attend trade-shows and talk to other vendors. I know sales-guys are irritating, but when the incumbent phones up and finds you are in a meeting with their rivals, there will be a moment of doubt.

And when you are doing the deal, here's a few tips and thoughts. I am not a procurement expert but I have spent a few million here and there on storage.

1) List price is meaningless; discount levels are meaningless. Vendors should produce a cost of manufacture and then try to negotiate a premium on this. Vendors also hate breaking things down by line-item as it reveals that you are paying massive amounts for commodity items. Do not let a vendor flannel you with a cost for a solution; get it broken down and understand what you are paying for.

2) Maintenance; you should always be able to negotiate improved maintenance terms. Hardware maintenance is pretty easy to get extended gratis, software is often harder. Review maintenance regularly; if a piece of software is at or close to its terminal release, consider dropping the maintenance. If you really need it at a later point, you can often reinstate it; you'll have to pay the back maintenance but you'll likely not need it anyway. Make sure this is contractually agreed.

3) Technical refresh/take-out; if you are refreshing with the current vendor, only pay maintenance on one lot of kit whilst the refresh is happening. If you are refreshing with a new vendor, agree that the new maintenance/warranty period only starts when the migration is complete. Always try to negotiate a trade-in.

4) Software licensing; try to negotiate paying for the amount you use as opposed to paying for the whole frame! And always try to agree that software licenses are transferable between frames.

5) A vendor TCO model is worthless unless they are willing to guarantee it without caveats. If they think their kit will save you money, skin in the game is key!

6) Training; I have had teams which have had more training than any other team in a department because I ensure that any deal is sweetened by the provision of training for 'free'. Big deals should come with free training and I am amazed at the number of people who do not leverage this.

7) One-off deals; you know, the end-of-quarter/year specials? We all do them, we all regret them at times. Plan your one-off deals; you know they're coming, so treat them like a normal deal. You know when the vendor quarter/year-ends are, so try and align your procurement cycles if you can. And I've never had the pricing on a one-off deal pulled because I have missed the cut-off.

8) Guaranteed price decline; the cost of kit goes down all the time, so ensure that you've got a guaranteed price deflator on a quarter-by-quarter basis to reflect this.
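The arithmetic behind that last point is worth spelling out, because even a modest quarterly deflator compounds quickly; the 5% per quarter figure in this little sketch is purely illustrative, not something any vendor has agreed to.

```python
# Compound effect of a guaranteed quarterly price deflator.
# The 5% per quarter figure is illustrative only.

def deflated_price(price_today, quarterly_deflator, quarters):
    """Contract price after applying a fixed percentage reduction each quarter."""
    return price_today * (1 - quarterly_deflator) ** quarters

for quarters in (4, 8):
    pct = deflated_price(100.0, 0.05, quarters)
    print(f"after {quarters} quarters you should be paying {pct:.1f}% of today's price")
# after 4 quarters: ~81.5%; after 8 quarters: ~66.3%
```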

You will not get all of the above but at least vendors will know that you are serious and that you are thinking about things. And if your sales-man agrees to any kind of special, non-standard terms, get it in writing and keep the evidence. Sales-men move around a lot and the next guy may not honour a verbal, gentleman's agreement; get the evidence.

I am sure there are more tricks that I have forgotten or am not even aware of; please share!

Don’t Be Blinded by the Flash!

TSA has done a good if slightly skewed write-up on the current positioning of Enterprise Flash disks and what the various vendors are doing with them. And at the moment, they are all using them as faster spindles; as faster replacements for spinning rust. In the same way that I could remove my laptop/PC drive and replace it with an SSD, I can do it in an array at some extortionate price.

I know there’s been a fair amount of tweaking etc to get them into the various arrays but it doesn’t appear to be the proverbial rocket science! So can we have some rocket science now? 

We’ve actually looked at leveraging EFDs in our current environment, but at the moment, as well as the initial expense, the actual retrofitting into the environment is either a lot of work or it means buying a lot more SSD than we could possibly need. The layout in a shared array becomes arcane and possibly more complex than a mere ‘bod can hold in their head. This does not mean that we won't do it, but it's not a magic bullet; it'd probably be a lot easier if we were not looking to implement into an already running, business-critical environment.

I think if the industry simply looks at these as faster disk spindles, we are missing a huge opportunity to make storage a lot more efficient. We need to understand a lot more about what actually goes on at the application layer; for example, which individual files are hot and need to be stored on faster storage. In a hot LUN (a concept which must die), what percentage of the data is actually hot? Is the whole LUN a hotspot or is it a tiny chunk of that LUN? If it's a tiny proportion of that LUN, is it not really wasteful to put the whole LUN onto this expensive resource?
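To put a number on that, here's a rough sketch of the sort of sub-LUN analysis I mean; it assumes you can get per-extent (or per-chunk) I/O counts out of your array or host tooling, which is exactly the visibility most arrays don't expose today, and the workload figures are made up purely to illustrate a skewed access pattern.

```python
# Given per-extent I/O counts for a "hot" LUN, what fraction of the extents
# actually serves the bulk of the I/O? If the answer is a few percent,
# promoting the whole LUN to EFD is mostly wasted money.

def hot_fraction(extent_io_counts, io_share=0.9):
    """Fraction of extents needed to serve `io_share` of the LUN's I/O."""
    total = sum(extent_io_counts)
    if total == 0:
        return 0.0
    served, extents_needed = 0, 0
    for count in sorted(extent_io_counts, reverse=True):
        served += count
        extents_needed += 1
        if served >= io_share * total:
            break
    return extents_needed / len(extent_io_counts)

# Made-up, deliberately skewed workload: 1000 extents, 20 of them doing
# almost all of the work.
counts = [10_000] * 20 + [10] * 980
print(f"{hot_fraction(counts):.1%} of extents serve 90% of the I/O")
# -> roughly 2% of the LUN is genuinely hot
```

That, in a nutshell, is why moving whole LUNs onto EFD is the wrong granularity.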

Things like Sun’s ZFS hybrid disk pools might help a lot; we need to see more collaboration between the Operating System/File System and Storage Vendors to get this all to work efficiently and cost effectively.

No, you don’t need to make huge changes to use EFDs, but to use them efficiently, and not simply give the greedy storage vendors even more money, you might want to think about making huge changes.

Of course, by the time we've fixed the problems, EFDs will probably have come down so far in price that the economic issues will have gone away. So perhaps the solution is for all the vendors to take the hit now and bring the costs of EFDs down to the cost of traditional disk?

Doing More (with the same or less)

Capital budgets are pretty tight this year and IT teams all over the globe are being asked to do more with the same or less. This is good in many ways and hopefully we will overcome some of the profligacy of the past, but the responsibility is in our (the end-users') hands; we can’t simply expect the vendors who have encouraged profligacy in the past to come riding to our rescue. There is an old adage which, depending on which continent you sit on, goes either ‘Beware Greeks bearing gifts’ or ‘Beware Indians bearing gifts’, or perhaps in modern parlance, ‘There ain't no such thing as a free lunch’.

So when a vendor comes knocking on your door saying they can save you 30/40/50% of your current opex/capex, you have to look very carefully and closely at what they are offering, and before you do, I would suggest that there are a few things you can try first.

1) Go through your spreadsheets or however you manage your storage and find all that storage which was reserved for projects which never happened!
2) Whilst you’re at it, try and locate the storage which has been freed up from all those servers which have been consolidated and virtualised. Those are the ones that Control Center has been spamming you with alerts about and you studiously ignore as no-one is complaining about the server being down.
3) Tell users that there is no more storage available and that they should start deleting things. It is amazing how much stuff can be removed/archived. I’ve just taken over a new team and they have project documentation going back years. I’ve asked them to move all the stale project documentation into an archive directory which I will then zip. Will I gain huge amounts of storage back? No but I’ll get some back and if everyone did this, we’d get quite a lot back.
4) Look at your RAID levels; do you actually need everything at RAID-1? Even the Evil Machine Company no longer recommends RAID-1 for everything. You can generate huge wins with this (there's a quick sketch of the arithmetic after this list). Last year, we allocated a lot more storage than we bought by doing this.
5) And whilst you do the above, perhaps take the chance to right-size over-sized file-systems.
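Here's the quick sketch of the RAID arithmetic promised in point 4; the drive count, drive size and the 7+1 RAID-5 group width are all illustrative, and the performance trade-offs still need thinking about, but the capacity win is real.

```python
# Usable capacity from the same pile of drives under different RAID schemes.
# Drive count/size and the 7+1 group width are illustrative.

def usable_tb(drive_count, drive_tb, scheme):
    if scheme == "RAID-1":         # mirrored pairs: half the raw capacity
        return drive_count * drive_tb / 2
    if scheme == "RAID-5 (7+1)":   # one parity drive per eight-drive group
        return drive_count * drive_tb * 7 / 8
    raise ValueError(f"unknown scheme: {scheme}")

drives, drive_tb = 240, 0.45       # e.g. 240 x 450GB spindles
for scheme in ("RAID-1", "RAID-5 (7+1)"):
    print(f"{scheme}: {usable_tb(drives, drive_tb, scheme):.0f} TB usable")
# RAID-1: 54 TB; RAID-5 (7+1): ~94 TB from exactly the same spindles
```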

Okay, all this stuff takes time and time isn't always available, but before you buy ‘pixie dust’ from vendors, it might be worth having a look at doing some of the above. It's not a free lunch, but it's a lunch you packed yourself and it might be a little cheaper than that free lunch you've just been offered.

Just some thoughts anyway….

 

Ready?

The announcements are starting to pile up; nothing dramatic yet tho' but we are probably at the start of the year when most Storage vendors refresh their main product lines and it's going to be interesting to see the variety of approaches which are taken.

Am I expecting dramatic things from the big boys? Almost certainly! Now, it is too soon for the economic downturn to have dramatically impacted what are often very long R&D cycles, so much of what we will see will have been in the labs for a long time. The downturn may have impacted some final feature sets, with things like efficiency rising up the stack, but these features were well up the list anyway.

When we talk about efficiency, what do we mean? We can focus on the amount of storage we can actually use in our array: can we use 50% of that slow SATA disk before performance tails off and it becomes unusable for anything else? Cheap ≠ efficient. We could focus on operational efficiency: how many heads does it take to manage your disk? How quickly can we service the business? What level of complexity do we expose to the business, and what level of complexity is exposed to the storage administrators?

Actually, I'm expecting the big product announcements to focus on all of these things. How do I make SSDs and SATA work efficiently and effectively? Note, SSDs and SATA; not FC! FC as a disk technology will head to the toilet and look concerned about its prospects of being flushed! How do I automate storage provisioning and tiering? The gravy train for basic storage admin will come off the rails; there will be some delays due to the wrong type of rain/snow/leaves, but automation will become prevalent.

This, along with a whole raft of server virtualisation announcements, will change the IT infrastructure environment; I already see server/network admins looking greedily at the storage domain, wanting to take charge of this key part of the infrastructure. I don't see the currently harassed storage admins looking greedily the other way; I see a lot of wagons starting to go round in circles.

If you work in storage and want to continue to work in storage, it's time for you to start getting ready for a different world and to prepare yourselves to move away from Hypers, Metas, LSSs, RDF, PPRC etc; time to get ready to manage more end-to-end. Sure, you'll still be storage specialists but you'll need to know a lot more; it'll be fun!

An old manager of mine used to say 'A Career isn't a sprint, it's a marathon'; he was wrong: 'A Career isn't a sprint, it's a series of them'. I had another one who used to say 'Get focused or get f**ked'; he was more right, but you've got to worry about the Bokeh, not just the focus!