
Cloud

Unclear Utility….

So one of the largest public clouds has suffered a major outage; it was always going to happen, as were the inevitable recriminations, the crowing competitors and a general Nelson-like ‘Ha ha’ from many. As reality dawns on many about the nature of single sourcing, the lack of meaningful SLAs, credulous users and the lack of architectural vision, do we find ourselves at a crossroads with regards to Cloud?

Amazon should be and are taking some heat on this; their lack of transparency has been frightening many for some time. The AWS dashboard is about as useful as your average ISP’s and needs to be much more detailed, or at least allow the expert user to drill down further. In this kind of event, accurate information is essential.

Public Cloud Providers such as Amazon need to take a lead in guiding their customers on how to design ‘infrastructure’ on top of Public Clouds for availability. This needs a full breakdown of the risks and the mitigations a user can take to remove SPOFs.

For many, the promise of Cloud was that you could get rid of your costly and obstructive infrastructure teams; these teams can be obstructive and costly because they carry the wounds of the past, but don’t for one moment believe that getting rid of them will stop you suffering those wounds. The principles of availability, reliability and supportability are as valid now as they were in decades gone by.

Anyone who has worked in Enterprise IT will tell you that ‘proofs of concept’ rapidly become products and you suddenly find yourself running an infrastructure which takes no account of your non-functional requirements. The agility that Amazon offers almost formalises and legitimises this practice. Your business model may be predicated on the Public Cloud, but don’t imagine that this absolves you of the need to understand infrastructure and its impact on your applications and business.

We aren’t yet at the stage where computing is provided as a pure utility like electricity; and even then, companies still have back-up generators, M&E teams, Facilities teams and so on.

Yet, despite this, the advantages of something like AWS in getting product to market can be amazing; we’ve seen a startling renaissance of new businesses getting up and running very quickly. Many of these businesses may actually fail, but let’s hope that they fail because they were bad ideas and not because they built their infrastructure on sand.

I think that we are at a crossroads, or at least at a significant point in the journey; I think we shall see some shoring-up and some rethinking but ultimately the pain will be worthwhile. The utility providers such as Amazon will have to think a bit more about how they build their infrastructure, the same way the traditional utility companies had to; users will have to understand the nature of the utility and what it means if that utility is not available.

However, it’s far too early to say whether the model is flawed or not; expect more failures but expect their impact to diminish as people understand better how to design for failure. Expect new development techniques but also traditional architectural disciplines to have a place.

And expect more navel-gazing from the likes of me and other wannabe pundits… if things like Amazon didn’t fail once in a while, we’d have to write about successes and that’s just boring!

Evil Cloud!

There is much to say about the AWS outage and much that has already been said. However, whilst the outage was happening I was at the UK National Science Fiction Convention (yes, I am that much of a geek) and one of our favourite t-shirt designers had the following design available…

Evil Cloud

It seemed very appropriate and made me laugh, so obviously I bought one. But if you want one too, and I suggest that you all do, you can get them from Genki Gear. They’ve got lots of other cool designs and I won’t be held responsible for your credit card bills!

They obviously have very little idea as to why it is so funny but I think all ‘Cloud Experts’ can appreciate it, no matter where they stand on Cloud!

Tempo

In my youth, I played a fair amount of chess to a reasonable club standard and there was often discussion about gaining a tempo. Gaining tempo was the sign of a good and clever player: one who could achieve a desired position in fewer moves than normally expected, or at least force the opponent into making moves and hence wasting them. There was always a certain amount of debate about what tempo was worth and whether a sacrifice was worth making in order to gain it; the answer was ‘it depends’. A very good player can often make tempo pay off, whereas a less good player would probably be better sticking to a more conventional position.

I am beginning to look at Cloud as a way for businesses to gain tempo; public clouds can allow new start-ups to gain tempo early on in their development, and this speed of deployment can allow a position to be developed which would take much longer with a conventional infrastructure deployment. It allows experimentation and flexibility which is not usually available, but there is a risk that you can manoeuvre yourself into a position that is hard to get out of longer term. You still need to develop a strategy and, just like chess, you need to look at the moves ahead.

More established companies can use the public cloud in a similar way to start-ups, but they also have established infrastructure and organisational structures to deal with; private Cloud may be a way to build flexibility and agility into the organisation, though often at the sacrifice of some budget.

Either way, once you have built your Cloud infrastructure, you will almost certainly gain tempo over both your competitors and your own internal customers.

But tempo is pointless without some kind of idea of where it takes you. I think that is probably the biggest risk of all; if Cloud changes nothing, then you might have been better sticking to a conventional strategy.

If we are talking chess, think Karpov and Kasparov.

Email BackUp

Okay, so you’ve read the Gmail horror story and now feel a little bit vulnerable; well, this product referenced by Lifehacker looks quite sweet. Yes, it is Windows only unfortunately, but you might want to give it a spin and then back up the files to your favourite other Cloud provider.

Cluster Fscked

For another project, I had reason to try and find out the limits of OnTap 8 Cluster-Mode; apart from the fact that NetApp need taking out and shooting with regards to their documentation, I’ve ended up reading more about Cluster-Mode than was probably entirely necessary for this project. It has been fascinating though, and some subsequent conversations with people have left me with a very distinct impression that few people really understand what Cluster-Mode does. I had made assumptions about what it was, based on biases and prejudices from using other products.

For some reason, I equated OnTap 8 Cluster-Mode with SONAS and Isilon, for example; this was a mistake. NetApp have, as usual, taken their own approach and produced something quite different. It is probably useful to understand some terminology.

Global Namespace is a term which gets thrown about a lot and is probably going to cause confusion when comparing products.

A File System is a method of storing and organising computer files and their data.

A Global Namespace provides an aggregated view of an Enterprise’s File Systems allowing files to be accessed without knowing where they physically live; it is a virtual directory structure.

NetApp’s Cluster-Mode is really the aggregation of a number of NetApp Filers and volumes into a single virtual server, providing a Global Namespace for this aggregation, meaning that instead of having to know which filer a share lives on, it is possible to move from mounts which look like:

\\server1\share1
\\server2\share2

to mounts which look more like

\\company\share1
\\company\share2

This is obviously a lot easier for the end-users and means migrations etc do not require reconfiguration of end-user machines. This is a good thing!
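To make the idea concrete, here is a minimal sketch in Python of what a global namespace amounts to: a mapping layer that resolves a stable virtual path to whichever physical share currently backs it, so a migration only means updating the mapping, not the clients. The server and share names are entirely made up and this is a toy illustration, not NetApp’s implementation.

# Illustrative toy only -- not NetApp's implementation. A global namespace is
# essentially a junction table: stable virtual paths on the left, the physical
# filer shares that currently back them on the right.
NAMESPACE = {
    r"\\company\share1": r"\\server1\share1",
    r"\\company\share2": r"\\server2\share2",
}

def resolve(virtual_path: str) -> str:
    """Translate a namespace path into the physical share that backs it."""
    for prefix, physical in NAMESPACE.items():
        if virtual_path.startswith(prefix):
            return physical + virtual_path[len(prefix):]
    raise KeyError(f"no junction found for {virtual_path}")

print(resolve(r"\\company\share1\projects\report.doc"))
# -> \\server1\share1\projects\report.doc

# Migrating share1 to a new filer is just a change to the mapping; end-users
# carry on using \\company\share1 and need no reconfiguration.
NAMESPACE[r"\\company\share1"] = r"\\server3\share1"
print(resolve(r"\\company\share1\projects\report.doc"))
# -> \\server3\share1\projects\report.doc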

OnTap 8 Cluster-Mode does allow you to aggregate the performance of up to 24 filers; all of them can see all of the data and serve it out. But as far as I understand (and I will be happy to be corrected), you are still limited to file-system sizes of 100 TB, i.e. the maximum size of a 64-bit aggregate, and as far as I can tell, you are limited to having the file-system owned by a single HA pair. Data is not striped across all the nodes, and the loss of an HA pair will result in the file-systems hosted by that pair going off-line.

Isilon, IBM and HP in their products allow file-system sizes far in excess of this and measure their sizes in petabytes; this is because at their back-end there is a true cluster file-system. This enables an individual directory, for example, to be considerably larger than 100 TB, or even individual files to be larger than this. Some of you are probably now shaking your heads and wondering what sort of files could possibly be that large; I will admit that 100 TB would be a very large file, but files in excess of 1 TB are not that uncommon in my world, and a 100 TB file-system could only hold 100 of those, which is not that much really.

A single file-system can also be made up of pools of storage which perform in different ways and can fairly easily be automatically tiered; you can have files in the same directory which actually live on differing tiers of storage. Take your home directory, for example: the files which you are currently working on could be sitting on active disk, whereas the files which are gathering dust could live on spun-down SATA.

Isilon, IBM and HP don’t have to implement a Global Namespace because their file systems are large enough and are distributed in such a way that a single file system could provide all file space required by an Enterprise.

Now, NetApp’s approach does have some huge advantages when building Secure Multi-Tenancy, for example, and it allows for a very granular approach to replication and the like. Also, NetApp don’t have to deal with huge amounts of meta-data and their file-locking is probably easier; but it is different.

There is certainly a take-away from my research…

Global Namespace != Single Huge Filesystem

Now perhaps you already knew this but I suspect many were under the same delusion as me! And does this mean that NetApp don’t have a true Scale-Out Solution? I can certainly make arguments for both views.

Bring on OnTap 9! I reckon that’s the release which will combine it all including Striped Volumes from 7G.

P.S. If I am completely wrong, it’s NetApp’s fault; the documentation and description of Cluster-Mode is rubbish! Truly terrible! And yes, I could get someone from NetApp to come and talk to me about it, but if I can’t get it from the documentation… you fscked up!

Take Ownership and Back-Up!

And so the humble subject of backup bubbles up again as Google manage to lose 150,000 Gmail accounts. Yes, they will probably restore them and I suspect that most people will get their precious emails back. But people, you really need to take responsibility for your own data!

Google even tell you how to back up your email using POP3:

‘Backing up your mail with POP’

Learn to back up everything that you are storing in the Cloud. It’s not interesting, it’s not exciting, but it could save you a whole lot of time and hassle.
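If you fancy rolling your own rather than trusting yet another tool, the sketch below shows roughly what that looks like: a minimal Python script using the standard poplib module to pull messages over POP3-over-SSL and write each one to disk as a .eml file. The host and port are Gmail’s documented POP settings; the account details are obviously placeholders, POP access has to be enabled in Gmail first, and this is a sketch rather than a polished backup tool.

import poplib
from pathlib import Path

HOST, PORT = "pop.gmail.com", 995           # Gmail's POP3-over-SSL endpoint
USER, PASSWORD = "you@gmail.com", "secret"  # placeholders -- use your own
OUT_DIR = Path("mail-backup")
OUT_DIR.mkdir(exist_ok=True)

box = poplib.POP3_SSL(HOST, PORT)
box.user(USER)
box.pass_(PASSWORD)

count, _ = box.stat()                       # number of messages in the mailbox
for i in range(1, count + 1):
    _, lines, _ = box.retr(i)               # fetch message i as raw byte lines
    (OUT_DIR / f"message-{i:06d}.eml").write_bytes(b"\r\n".join(lines))

box.quit()
print(f"backed up {count} messages to {OUT_DIR}/")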

Not Quite As Grumpy

Now, some people might think I was being a bit unfair picking on EMC/Isilon in my previous entry and, to be honest, I was, but in a good cause. I’d picked up on a story which ‘Zilla retweeted a link to, and it actually gave me the hook for something that I had been intending to write for some time.

IT Procurement policies need to change to reflect new paradigms and technologies. In the story that I picked apart, the customer bought 1.4 petabytes of Isilon disk up front; the customer only needed 200 terabytes of disk to cope with the first year of growth, with an additional 340 terabytes per year thereafter.

Isilon’s technology would make it easy to simply purchase additional nodes as needed, and there was probably no technical reason for this up-front procurement. But I suspect the up-front purchase was a requirement of the way that the project was financed and budgeted; there was no internal mechanism to procure capacity when needed.

Now I suspect EMC/Isilon would have offered a decent additional discount for the initial up-front purchase but would it have offset the additional maintenance, power-requirements etc that would be accrued by early installation? And would it offset the year-on-year price decline of storage?
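As a back-of-an-envelope illustration of that trade-off, here is a small sketch comparing the two approaches. Every figure in it (price per terabyte, discounts, maintenance per terabyte, the annual price decline) is an assumption invented purely for illustration; the point is the shape of the comparison, not the numbers.

# All figures below are assumptions for illustration only.
PRICE_PER_TB = 1000.0        # assumed year-0 price per terabyte
UPFRONT_DISCOUNT = 0.40      # assumed extra discount for the 1.4 PB order
INCREMENTAL_DISCOUNT = 0.20  # assumed discount on smaller, later orders
MAINT_PER_TB = 150.0         # assumed annual maintenance per installed TB
PRICE_DECLINE = 0.25         # assumed year-on-year fall in price per TB
YEARS = 4                    # year 0 plus three further years of growth

def upfront_cost() -> float:
    capex = 1400 * PRICE_PER_TB * (1 - UPFRONT_DISCOUNT)
    maintenance = 1400 * MAINT_PER_TB * YEARS   # paying to keep idle capacity warm
    return capex + maintenance

def incremental_cost() -> float:
    total, installed = 0.0, 0
    for year in range(YEARS):
        buy = 200 if year == 0 else 340                      # TB added this year
        price = PRICE_PER_TB * (1 - PRICE_DECLINE) ** year   # prices fall each year
        installed += buy
        total += buy * price * (1 - INCREMENTAL_DISCOUNT)    # capex for this tranche
        total += installed * MAINT_PER_TB                    # maintenance on installed capacity
    return total

print(f"up-front purchase:     {upfront_cost():>12,.0f}")
print(f"incremental purchases: {incremental_cost():>12,.0f}")

With these made-up numbers the incremental route comes out well ahead, mostly because you only pay maintenance on what is actually installed and the later tranches are bought at lower prices.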

I can’t really blame a vendor for taking all the money up front, and I certainly would not blame the sales team for accepting an order which might have blown their targets. (There is a downside for the sales team tho’; next year’s target might assume that you do another large deal, even though you have just sold a customer three years’ capacity.)

I have read people talking about the possibilities and opportunities for IT to move to a purely Opex model using the Public Cloud; let’s get real for a minute, we are not currently capable of tweaking our Capex models to reflect a reality where money does not all need to be spent up front. What are the odds that we can completely change our finance models to an Opex model? In my reality, we are constantly being challenged to reduce our Opex; we often do this with Capex purchases. I’m not sure suggesting increasing my Opex, even with a decreased Capex, would meet with an entirely favourable reaction.

If the Capex model were tweaked to allow a more linear, time-based investment, I could have an even more dramatic impact on Opex and also reduce Capex; but I’m a simple IT manager, what do I know?

Cloud and Dynamic IT is technically achievable in large parts today and not at huge disruptive cost but it does require disruptive thinking.

Big Data, Little Information?

Big Data, like Cloud Computing, is going to be one of those phrases which, like a bar of soap, is hard to grasp and get hold of; the firmer the hold you feel you have on the definition, the more likely you are to be wrong. The thing about Big Data is that it does not have to be big; the storage vendors want you to think that it is all about size, but it isn’t necessarily.

Many people reading this blog might think that I am extremely comfortable with Big Data; hey, it’s part of my day-to-day work, isn’t it? Dealing with large HD files, that’s Big Data, isn’t it? Well, they are certainly large, but are they Big Data? The answer could be yes, but the answer generally today is no. As I say, Big Data is not about size or general bigness.

But if it’s not big, what is Big Data? Okay, in my mind Big Data is about data-points and analysing those data-points to produce some kind of meaningful information; I have a little mantra which I repeat to myself when thinking about it: ‘Big Data becomes Little Information’.

The number of data-points that we now collect about an interaction of some sort is huge; we are massively increasing the resolution of data collection for pretty much every interaction we make. Retail web-sites can analyse your whole path through a site: not just the clicks you make but the time you hover over a particular option. This results in hundreds of data-points per visit; these data-points are individually quite small and collectively may still result in a relatively small data-set.

Take a social media web-site like Twitter, for example; a tweet is 140 characters, so even if we allow a 50% overhead for other information about the tweet, it could be stored in 210 bytes, and I suspect possibly even less; a billion tweets (an American billion) would take up about 200 Gigabytes by my calculations. But to process these data-points into useful information will need something considerably more powerful than a 200 Gigabyte disk.
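For what it’s worth, the arithmetic behind that claim is simple enough to check (the 50% metadata overhead is my own assumption from above, not a Twitter figure):

bytes_per_tweet = 140 * 1.5              # 140 characters plus a 50% allowance = 210 bytes
tweets = 1_000_000_000                   # an American billion
total_bytes = bytes_per_tweet * tweets
print(f"{total_bytes / 1e9:.0f} GB")     # 210 decimal gigabytes
print(f"{total_bytes / 2**30:.0f} GiB")  # roughly 196 binary gigabytes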

A soccer match, for instance, could be analysed in a lot more detail than it is at the moment and could generate Big Data; so those HD files that we are storing could be used to produce Big Data, which in turn produces Little Information. The Big Data will probably be much smaller than the original data-set and the resulting information will almost certainly be smaller still.

And then of course there is everyone’s favourite Big Data: the Large Hadron Collider. Now that certainly does produce Big Big Data, but let’s be honest, there aren’t that many Large Hadron Colliders out there. Actually, a couple of years ago I attended a talk by one of the scientists involved with the LHC and CERN, all about their data storage strategies and some of the things they do. Let me tell you, they do some insane things, including writing bespoke tape-drivers to use the redundant tracks on some tape formats; he also admitted that they could probably get away with losing a large proportion of their data and still derive useful results.

That latter point may actually be true of much of the Big Data out there and that is going to be something interesting to deal with; your Big Data is important but not that important.

So I think the biggest point to consider about Big Data is that it doesn’t have to be large but it’s probably different to anything you’ve dealt with so far.

Verily, 'tis Consolidated Evil

Okay, despite the title, I don't actually think that VCE is Evil; certainly no more so than any other IT company. As we move to an increasingly virtualised data-centre environment, I believe that VCE does have something to offer the market: the more vertically integrated a stack, the more chance that it is going to work within design parameters and with a reduced management footprint.

Yet I do have a problem with VCE, and that is that it is not consolidated enough; it's like some kind of trial marriage or perhaps some troilistic civil partnership. (And I have similar problems with the even looser Flexpod arrangement between NetApp, Cisco and VMware.)

As a customer, it seems that I am expected to make a huge commitment and move to a homogeneous infrastructure, even if it is some kind of 'virtual homogeneity'. Yes, there are management benefits, but the problem with both vBlock and Flexpods is what happens when the relationship founders and fractures? What happens to my investment in the consolidated management tools that support these environments; where is the five- or ten-year roadmap? And where is the commitment to deliver? Who gets custody of the kids?

Are the relationships going to last more than one depreciation/refresh cycle? Perhaps the problem is that the evil is not consolidated enough? Or am I just cynical in expecting the relationship to fail? I would say that the odds are that one of them will fail; what about you?

Perhaps Cisco/EMC/NetApp should de-risk by all merging!

Elegance

I went to the London Turing Lecture given by a very cold-ridden and croaky Donald Knuth; you always know things are going to be interesting when the speaker opens with comments along the lines of 'I got the idea for the format of this lecture from my colleague and friend at Caltech in the 60s, Richard Feynman'. See, even the great Donald Knuth can name-drop with the best of them. The format was a question-and-answer session in which Donald took questions on any subject from the floor, and I believe that it will be available to watch as a web-cast; please note that he was very cold-ridden and it's probably not his best 'performance'.

Don has for a long time been talking about Literate Programming and arguing that programs should be written for human beings to read and not just for computers to process: 'They tend to work more correctly, are easier to change, easier to maintain, and just a pleasure to write and to read'. He is passionate that code can be beautiful and art; funnily enough, I feel very similarly about IT infrastructure and I think that is what Cloud can potentially bring to the world of infrastructure.

I'm not sure we can have a 'Literate Infrastructure' but I wonder if we can get to 'Elegant Infrastructure'; I come across infrastructures all the time which make me question the byzantine perversity of infrastructure architects and designers. At times it is like an artist who has decided to throw all the colours in his palette at a canvas with little understanding of aesthetics and form; yes, you can do this but you really have to understand what you are doing and unless you are very good, you will simply produce a mess. 

This is why the various block-based infrastructures are potentially so appealing (this is not a discussion as to the merits of vBlock versus Flex-pod versus another-Lego-block) as they restrict the tendencies of techies to throw everything but the kitchen sink at a problem. Yet the most stringent advocate of these infrastructures has to acknowledge that they will not solve every problem and at times, a little subtle complexity is more elegant than adding more and more blocks. 

The infrastructures of the future will be simple, understandable but not necessarily devoid of colour and subtlety. Otherwise we'll fall into another trap that Don hates; 'the 90% good enough' trap. Infrastructure needs to be 100% good enough; 90% won't do because 90% will not be easy to manage or understand. I think this is the challenge that the vendors will face as they try to understand what they are selling and creating.