
What a Waste..

Despite the rapid changes in the storage industry at the moment, it is amazing how much everything stays the same. Despite compression, dedupe and other ways people try to reduce and manage the amount of data that they store, it still seems that storage infrastructure tends to waste many £1000s just by using it according to the vendor’s best practice.

I spend a lot of my time with clustered file-systems of one type or another, from Stornext to GPFS to OneFS to various open-source systems, and the constant refrain comes back: you don’t want your utilisation running too high..certainly no more than 80% or, if you’re feeling really brave, 90%. But the thing about clustered file-systems is that they tend to be really large and wasting 10-20% of your capacity rapidly adds up to 10s of £1000s. This is already on top of the normal data-protection overheads…
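
To put rough numbers on that refrain, here is a minimal sketch in Python. The 1 PB file-system size and the £300 per terabyte price are made-up figures for illustration only; the 80% and 90% ceilings are the ones quoted above.

```python
def headroom_cost(capacity_tb, price_per_tb, ceiling_pct):
    """Return (held_back_tb, cost) for capacity left idle under a utilisation ceiling."""
    held_back_tb = capacity_tb * (100 - ceiling_pct) / 100
    return held_back_tb, held_back_tb * price_per_tb

# Hypothetical figures: a 1 PB (1000 TB) file-system bought at £300 per TB.
for ceiling_pct in (80, 90):
    held_back, cost = headroom_cost(1000, 300, ceiling_pct)
    print(f"ceiling {ceiling_pct}%: {held_back:.0f} TB held back, roughly £{cost:,.0f}")
```

The price is the only thing that changes the conclusion here; the held-back terabytes follow directly from the ceiling.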

Of course, I could look at utilising thin-provisioning but the way that we tend to use these large file-systems does not lend itself to it; dedupe and compression rarely help either.

So I sit there with storage which the vendor will advise me not to use but I’ll tell you something, if I was to suggest that they didn’t charge me for that capacity? Dropped the licensing costs for the capacity that they recommend that I don’t use; I don’t see that happening anytime soon.

So I guess I’ll just have to factor in that I am wasting 10-20% of my storage budget on capacity that I shouldn’t use, and if I do use it, the first thing that the vendor will do if I raise a performance-related support call is to suggest that I either reduce the amount of data that I store or spend even more money with them.

I guess it would be nice to be actually able to use what I buy without worrying about degrading performance if I actually use it all. 10% of that nice bit of steak you’ve just bought…don’t eat it, it’ll make you ill!


12 Comments

  1. TimC says:

    Are you also pissed that the vendor who made your car suggests you not drive it with your foot to the floor 24 hours out of the day? And that if you keep bringing it in with a blown engine, they’ll recommend you not keep your foot mashed to the floor 24 hours a day? You paid for a car that will let you push the pedal to the floor, but the recommendation is not to do it except for short periods of time. Why is it such an issue that this fact pervades pretty much every facet of our lives? If you want something to work well for a long time, you don’t press it fully all the time. Vendors who allow you to run something 100% all of the time generally have designed it to work at 120% capacity and just artificially limit you.

  2. Car analogies…what is it with IT and car analogies…they don’t work!

    If you sell me 100 terabytes of storage, I think I should be able to use 100 terabytes of storage. If you are selling me licensing around 100 terabytes of storage, I should be able to reasonably use that storage. If you want a car analogy: if you sell me a gallon of petrol, telling me I can only use 80% of it…well, that’s crazy!

    If I buy a house and I can only use 80% of the available space…

    Car analogies….the last refuge of the scoundrel!

  3. TimC says:

    Because cars and computers go hand in hand! IT guys (generally) love cars and guns, it’s just something in the genetics.

    In any case, if you buy a house and try stacking lead plates floor to ceiling it will collapse. You absolutely are not allowed to use 100% of your house. They are not engineered to take the amount of weight you could potentially put on the floor. There is literally almost no product you can buy that is engineered to be used at 100% of its capacity.

  4. Alex says:

    I wonder what industry TimC works in ?

    Just to refute tim’s argument….

    Pretty much every container used in the food industry – think how full your milk carton is ?
    Pretty much every container used in the shipping industry – what shipper wants to carry empty space ?
    Starbucks coffee cups
    Pint glasses

    The first two industries have especially big incentives to not carry empty space around, and who likes a half full pint ?

    So apart from the global shipping, food distribution and drinks retail industries – you are spot on.

    Alex

    1. TimC says:

      Really? So food packaging designers design their products to be utilized at exactly 100% capacity? A gallon of milk weighs approximately 9 lbs. A gallon milk container holds WAYYY more than 9lbs. Try filling a gallon container of milk to the brim with wet sand. You’ve nearly doubled the weight of the milk, and the carton won’t break. Why? Because they engineered it to hold more. A gallon milk container was absolutely designed to hold more than what they ship you. You’re going to have to try again.

  5. OneFS Customer says:

    i wish we only wasted 10-20% capacity. we store many millions of small files that get doubled and tripled (under 128k) causing our logical vs physical size to be a joke on isilon (+3x). anyone who stores many small files, run from isilon as fast as you can because the product will waste a ton of space and only frustrate you.

    we ran from emc/celerra misery to isilon and it broke our hearts to have emc acquire them…

  6. John Martin says:

    Martin,
    I kind of agree with you and I kind of don’t. Firstly empty space is valuable for a variety of reasons, maybe it’s just because I’ve finished reading the sayings of Lao Tzu where he talks about emptiness and fullness “The utility of a house comes from the empty space inside of it” .. . On a more practical side of things reasonable amounts of unused space makes capacity planning decisions easier and allows a useful amount of slack within the provisioning processes, so it’s not just an esoteric eastern philosophy thing.

    Having said that IF you buy a storage container that is advertised to contain X amount of storage at a certain performance level then IMHO you should be able to store X amount without dire warnings about performance implications. Most SSDs have much more actual capacity in them than you get to use for exactly this reason. In some cases I’ve heard the usable amount presented to the customer is less than half of the flash installed in the unit due to performance and durability reserves. I suspect that if customers were asked to put flash chips inside of SSDs rather than having them as unopenable boxes, hard questions would be asked of SSD vendors about flash chip capacity efficiency.

    Enterprise Storage Array vendors could choose to do the same thing with disk arrays and filesystems, but the amount you need to reserve would need to change depending on the workload. I’ve sometimes argued that NetApp should provide an option to increase the WAFL reserve from the default 10% (which is a reasonably good number for a typical home-directory workload), to 20% for high throughput random I/O intensive write workloads, or for write seldom read many style workloads to drop it to some smaller number like 5%. Having said that, given the amount of flak that NetApp has had around “WAFL Tax”, I’m not surprised that this isn’t an entirely popular option.

    The other problem is that most people have very little ability to control the kinds of workloads that are placed on their arrays over the lifetime of the array, so determining the correct “hidden” reserve percentage in the first place would be incredibly difficult. Plus I also suspect that if some admins had the ability to dial down the reserves when they’re running out of space they’d do that even at the risk of killing performance.

    In my experience the issue you raise is partly because of I/O density issues generally, which I wrote about almost 2 years ago in a blog post here http://storagewithoutborders.com/2011/09/10/how-does-capacity-utilisation-affect-performance/ but it is also partly due to the challenges that write allocation algorithms have when trying to find new places to write new data to disk: as the systems fill, it gets harder to find new places to write data efficiently. Disks are tricky beasts that don’t like being driven to much more than 60-70% utilization before they start doing unpredictable things with latency response times. Flash (or at least SSDs) get around this problem partly by hiding the capacity reserves needed for consistent performance and partly by their very nature of not having to worry about what order is the most efficient way of moving a drive head to access the blocks on a disk. As flash gets cheaper and more filesystems work out how to make really effective use of it, I think you’ll get what you’re looking for, and I don’t think that time is very far away at all.

  7. Jason says:

    I thought it a very simple and logical question.

    If I buy 100TB of storage pitched by the vendor as being able to deliver sustained 500k IOPS … then I should be able to use all 100TB and still get 500k IOPS.

    If the performance is predicated on a minimum free overhead or a utilisation ceiling then either I have been sold something that is not fit for purpose or the vendor needs to make provision for that in the device specification.

    Given that it’s usually the case that you can’t add more useable storage to your array without obtaining the appropriate upsized licence, why can’t the storage vendors roll in a utilisation metric that allows for just this scenario?

    No real impact outside of the customer getting what they ordered, other than the vendor having to cover the cost of the 10% or 20% of overhead which they would just pass on in the cost of the storage anyway … oh wait isn’t that what’s fundamentally happening now?

  8. As users get more and more used to a metered environment; this sort of question and requirement is going to become common. Now despite the fact that Enterprise Storage environments are not Cloud or even Cloud-like; our users and financial overlords are getting used to purchasing IT services like this, in their personal and professional lives.

    So you try telling a user who you have provisioned 100 terabytes of storage that actually they can’t use 10-20% of it. You can witter on as much as you like, they can still see that they are only using 80% of it. Of course, you might decide to charge in a more complex method and look at IOPs; then we need a very simple way of showing users how many IOPs they are using.

    Storage is complex, users’ understanding of it is not quite so complex…although SSDs have helped in that they understand that it’s not all about capacity.

    But it might be about time for vendors to look at different charging models; perhaps as suggested useable space at a certain amount of guaranteed IOPs. Oh look, some vendors are beginning to move to this but we probably need all vendors to do so to make it work in the marketplace.

    1. Rob says:

      Why would you need to tell a user they can’t use 10-20% of their provisioned storage? Shouldn’t we as the admins be handling that overhead prior to giving them storage?

      We face the same 10% recommendation on our system. All our projections and usable space account for not having that 10%. It needs to be part of the calculation from the beginning. The problem comes in if the vendor isn’t up front about this overhead.

      1. Often it is not the end-user; although when they hit df or such commands and see that they’ve still got capacity, they expect to be able to use it. And you can’t really hide it from them. But actually it is the end-user manager/customer who most often gets the hump; they end up paying out of their budgets for storage that they can see but can’t use. And then try to have a discussion with a finance and procurement team; yes…we’ve bought a petabyte of storage but once we’ve taken into account that we need to protect using an appropriate protection level and the overhead for best performance, you’ve only got half the storage…it’s not an easy conversation.

        And btw, when you start down the road of clustered file-systems, the overhead even after RAID protection can be 20-30%; you are generally paying licensing on the whole file-system.
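
        A minimal sketch of how a petabyte shrinks in that conversation. The 30% protection overhead and the 25% performance headroom are assumed figures chosen for illustration, not numbers from any particular vendor.

```python
def usable_tb(raw_tb, protection_overhead_pct, headroom_pct):
    """Capacity left after protection overhead, then performance headroom."""
    after_protection = raw_tb * (100 - protection_overhead_pct) / 100
    return after_protection * (100 - headroom_pct) / 100

raw = 1000  # 1 PB expressed in TB
left = usable_tb(raw, protection_overhead_pct=30, headroom_pct=25)
print(f"{left:.0f} TB usable out of {raw} TB bought")
```

        The two deductions compound: the headroom is taken from what is left after protection, not from the raw figure.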

  9. Al says:

    ‘Take 1’ – Disk overheads:
    A bunch of disks (or SSDs) that have a physical capacity of size X.
    Remove capacity for the low level format to use on whatever storage system you want (A).
    Remove capacity for use as hot spares (B)
    Remove capacity for use as ‘wear balancing’ (SSD only) (C)
    Remove capacity for parity when configured into some sort of protection grouping (RAID?) (D).
    Remove capacity for file system structure when formatted to an OS (E).
    Remove capacity required for OS housekeeping of file system (F).
    Keep a capacity margin to allow for growth (G)
    Take off a bit more capacity (I must have forgotten something) (H)*
    Real usable capacity = X – (A+B+C+D+E+F+G+H).

    Any customer should be able to understand at least some of these. If they don’t, then explain. Create a PowerPoint and educate them.
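
    The ‘Take 1’ arithmetic can be sketched in a few lines of Python. The field names and the sample values are hypothetical and only there to exercise the formula; the letters in the comments map back to the list above.

```python
from dataclasses import dataclass, fields

@dataclass
class Overheads:
    """Capacity deductions in TB, lettered as in the 'Take 1' list."""
    low_level_format: float  # A
    hot_spares: float        # B
    wear_balancing: float    # C (SSD only)
    parity: float            # D
    fs_structure: float      # E
    os_housekeeping: float   # F
    growth_margin: float     # G
    forgotten: float         # H

    def total(self) -> float:
        # Sum every deduction declared on the dataclass.
        return sum(getattr(self, f.name) for f in fields(self))

def real_usable(physical_tb: float, overheads: Overheads) -> float:
    """Real usable capacity = X - (A+B+C+D+E+F+G+H)."""
    return physical_tb - overheads.total()

# Hypothetical values, in TB, for a 100 TB set of disks.
example = Overheads(2, 5, 0, 20, 3, 2, 10, 3)
print(real_usable(100, example))
```

    Keeping each deduction as a named field makes it easy to show a customer which line items they can influence and which they cannot.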

    ‘Take 2’ – Real world manufacturer metrics (the obligatory car analogy) :
    Speed – You’re almost never legally allowed to use the car at that speed. And how often are the roads clear enough anyway?
    Passengers – What percentage of time is this relevant? Most cars are either parked or frequently contain only the driver. (Anyone car share?)
    Consumption – Does anyone actually get the quoted mpg? Real life conditions are not the same as the rolling road in a building in Korea. When did you last check? How heavy is your foot?
    Trunk size – Who cares about standard suitcases? Why isn’t it measured in Ikea boxes? How often is it full anyway?
    Torque, power, displacement – Who but the manufacturer measures these? Ever measured your torque?

    Did you pay for all of those – sure! But when did you last go back to the salesman and complain about your under utilization against any of those metrics?

    In this “metered environment” it is incumbent upon the storage provider to educate their customers about the overheads. If one of them is really only a vendor limit (and you can’t change vendor or – more likely – can’t find one who doesn’t also have the limit) then explain that. Explain how it impacts their requirements and utilization in the real world. Or alternatively take a lead from the car salesmen and provide metrics that avoid the question altogether.

    Too much? Then just charge them for the used capacity and increase the per/TB price accordingly!
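
    As a rough sketch of what “accordingly” means, assuming the provider wants to recover the same total cost from only the capacity that can actually be used (the £300 per TB price and the 80% figure are hypothetical):

```python
def price_per_usable_tb(price_per_raw_tb, usable_pct):
    """Per-TB price that recovers the raw cost from the usable share only."""
    return price_per_raw_tb * 100 / usable_pct

# Hypothetical: £300 per raw TB, of which 80% may be used.
print(price_per_usable_tb(300, 80))
```

    Note that holding back 20% raises the per-TB price by 25%, not 20%, because the cost is spread over a smaller base.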

    (* Database structure, system files, block size etc. etc. …)
