Open Source Scale Out Storage

So you want to build yourself a storage cloud but you don’t have the readies to build one using one of the commercial products available. Well, don’t worry: there are open source alternatives which might allow you to get a taste of Scale Out without breaking the bank.

Gluster is one such open source alternative and is now part of OpenStack, the open source cloud computing platform being built by a number of developers and vendors.

Gluster is available as a commercial software appliance, or you can simply download the packages and install them on a variety of Linux distributions, including Ubuntu and the Red Hat derivatives. I have recently built a small cluster using Scientific Linux 6.0 and ESXi (Scientific Linux is my new favourite Red Hat-derived distribution, and SL 6.0 is based on RHEL 6).

The initial set-up is pretty easy and it took me less than a couple of hours to stand up a three-node cluster and build a small environment. The documentation is clear and should be simple for anyone with a modicum of Linux knowledge to follow.
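For the curious, here is a minimal sketch of the sort of commands involved (the hostnames, volume name and brick paths are mine for illustration, and the exact syntax may vary between Gluster versions):

    # from node1, once glusterd is running on all three nodes:
    gluster peer probe node2
    gluster peer probe node3

    # create and start a simple distributed volume, one brick per node
    gluster volume create testvol transport tcp \
        node1:/data/brick1 node2:/data/brick1 node3:/data/brick1
    gluster volume start testvol

    # on a client, mount it using the native FUSE client
    mount -t glusterfs node1:/testvol /mnt/gluster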

I will give people a couple of tips: if you do not want to play with iptables, turn it off to get yourself up and running. And the latest version of Gluster requires rsync 3.0.7 for its geo-replication; there does not appear to be an RPM for RHEL 6.0 at present, but the Fedora RPM appears to work fine.
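Concretely, that amounted to something like the following on my SL 6.0 nodes (the rsync package name here is a placeholder; grab whichever Fedora build provides 3.0.7):

    # lab shortcut: stop iptables rather than opening up the Gluster ports
    service iptables stop
    chkconfig iptables off

    # geo-replication needs rsync >= 3.0.7; no RHEL 6 RPM yet,
    # so install the Fedora one (exact filename will vary)
    rpm -Uvh rsync-3.0.7-x.fcXX.x86_64.rpm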

Adding additional nodes is simple; I quickly added a fourth virtual node non-disruptively, and then it is simply a case of telling Gluster to rebalance the files across the nodes.
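Roughly speaking, and with the same illustrative names as above:

    # probe the new node and give the volume a brick on it
    gluster peer probe node4
    gluster volume add-brick testvol node4:/data/brick1

    # spread the existing files across all four nodes
    gluster volume rebalance testvol start
    gluster volume rebalance testvol status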

But Gluster only runs on Linux, which means that serving files to other operating systems means utilising NFS or CIFS. There seems to be conflicting information on whether Gluster supports CTDB and the necessary locking, so at present I am only exporting NFS from a single node, with no fail-over support yet. My next experiment will be to see whether I can get it configured as a true scale-out NAS solution.

I will let you know how I get on!!


AoE – Video 2000

Recently Coraid have been trying to make noise again about their storage arrays and ATA over Ethernet. I’ve always dismissed it without ever actually trying AoE, so I thought it was about time I at least gave it a go before writing it off again.

I quickly brought up another Linux server on one of my lab boxes and loaded the AoE modules, grabbed the vblade package (which allows you to serve AoE targets from a Linux server), had a quick peruse of the how-to and set the server up to serve an AoE target. I must admit it was ridiculously simple, a lot simpler than configuring iSCSI targets.
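For anyone wanting to repeat the exercise, the target end boils down to something like this (device and interface names are examples):

    # on the target box, export /dev/sdb as AoE shelf 1, slot 1 over eth0
    # (vbladed is the daemonised form of: vblade <shelf> <slot> <netif> <device>)
    vbladed 1 1 eth0 /dev/sdb

    # a Linux initiator would load the aoe driver and see it as /dev/etherd/e1.1
    modprobe aoe
    aoe-discover
    aoe-stat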

I then downloaded the StarWind software AoE initiator for Windows, installed it and fired it up; it automatically found the target I had already configured on the Linux server, and I added that in.

Job done: from start to finish, excluding the time to install a new Linux server instance, it probably took me fifteen to twenty minutes. Not bad considering I’d never looked at AoE before.

So if it is this simple, why are AoE and Coraid not making a big splash?

It’s certainly simpler than iSCSI, and in my initial tests on the same hardware I am getting about 20-30% better performance than iSCSI.

It’s quite simple really: it’s just too late. In the same way that Video 2000 was too late to make an impact on the video market despite being a superior format, AoE and Coraid just don’t have the momentum, and I don’t see them ever getting it. Their best chance is to get some of the smaller home/SMB network storage vendors to pick it up and try to build ground-swell there, but it’s going to be tough.

But if you have an hour to kill, you might as well give it a go…

If Buddha was Welsh would he still go OM?

Commodity hardware, commodity disk and commodity platforms are enabling interesting developments in the world of storage infrastructure, especially in what I like to think of as Boutique storage, which is probably a posh and pretentious way of saying Niche!

Object Matrix, with their MatrixStore product, is one such development. Aimed squarely at the video market (post houses, in-house video production teams and anyone who wants to store and archive video assets in a cost-effective and efficient manner), the team has developed an object store which has quietly become a bit of a hit in the UK and is now spreading far from its roots in Wales.

Chatting with Nick Pearce, one of the co-founders, you find that he has a background in object storage stretching back to the early days of Centera, being responsible for… well, I’ll let him admit to his folly, but suffice to say he is a bit of a veteran, with all the scars required to mean that he knows of what he speaks and sells.

But most importantly, he knows that they don’t need to spend a fortune on hardware R&D and that they can concentrate on building a robust software product which runs on top of an open source operating system. They have decided to still ship with a hardware platform; Nick explained that this is to minimise the sheer amount of testing and verification they have to do, which keeps support and development costs down. However, it is still built on industry-standard components, which allows them to leverage improvements in performance and capability quickly and easily.

We can expect improvements in network capability, although it is not really a bottleneck for them at the moment; these are nearline and archive solutions, not designed to support live streaming and editing.

We can also expect them to leverage the newer large-capacity hard drives in the near future, further enhancing the value proposition. Talk to Object Matrix if you want their costs, but even the list prices are actually very attractive and took me aback. These guys are not about building out of commodity components and sticking a crazy premium on top like some of the other broadcast specialist companies.

Supporting compliance, distributed search, off-site replication, application integration and an exposed API, MatrixStore might actually be the product that many post-production teams are looking for, and the guys actually talk your language. They use the same jargon, they know the same people and they know the industry dirt… and they won’t turn up in suits looking like they are here to collect the rent. They have the local knowledge which is very important in this space, but they also know the IT side; they know that Broadcast IT is more about Ethernet cables than SDI cables these days, but they do know what an SDI cable is.

More importantly, Object Matrix are no longer what I would class as a start-up, and they dropped some seriously large names in conversation; they are quietly building themselves a market position. No, you won’t come across them in your IT department, but if you look beyond it, you might find that your in-house video production team has one or two stuck in a cupboard pretending to be something innocuous.

And although they are rather cautious about moving beyond the video and broadcast arena, it might be worth having a look at them if you have an object storage requirement…

They also have some interesting ideas as to what to do with all these extra CPU cores; things which address some of the potential data-refresh and format challenges going forward.

[Disclaimer: Nick Pearce bought me a couple of drinks and ham, egg and chips but if you think that’d sway my opinion, I’ve got a bridge in the centre of London to sell you.]

Future Positive

Chuck’s post about Big Data Storage (as opposed to your tiny data storage) is pretty much on the money; simplicity is very much the key and it is this ‘no frills’ approach to storage which means that small teams can manage large amounts of storage with the minimum amount of fuss.

The key for us is the ability to scale quickly and easily: adding storage and then simply using software to do the clever stuff like balancing. Be it OneFS, GPFS or StorNext, the job is relatively simple, assuming that you’ve done the initial set-up correctly.

Much of what he says about snaps, dedupe and the other features of the more traditional general-purpose storage arrays also rings very true. Much of the data we deal with does not lend itself to dedupe, and it has some other interesting characteristics; once a file is written, it is never changed.

Think of it like a RAW file from a digital camera: you don’t actually change the file, you develop it. In some applications, we save a file which details the edits and transformations required to produce the processed file; in others, we save a copy of the processed file.

Replication and archiving are handled at the application level; we could do it at the file-system or storage level, but it is easier to let the application handle it, and this means we can be completely storage-agnostic.

We are relinquishing a certain amount of control and empowering the user to take responsibility for their environment. They decide on the number of replicas they require and to a certain extent, they decide as to where these replicas are stored; if they want to store eight copies locally, then they are empowered to do so.

We do need better instrumentation to allow us to tell users exactly how much data they are storing and how much bandwidth etc. they are consuming. This could develop into a charge-back system, but I suspect it will be more of an awareness exercise. It would also allow us to model the impact of moving to a public cloud provider, for instance, where both available bandwidth and bandwidth consumption are important factors.

Looking at this model may have longer-term implications for general-purpose storage; if VMware and other operating systems continue to add features such as snapshots and replication to their software stacks, then the back-end array becomes almost entirely commodity.

If we consider the impact of caching flash moving up into the server stack, yet more array functionality becomes server-level functionality. Of course, there are still challenges in this space in ensuring that the array is aware of and co-operating with the server, but if replication and the other functionality are actually carried out at the server level, this becomes more feasible.

I can imagine a time when the actual brains of the array are completely virtualised and live on the servers; virtualised VMAXs, VNXs, Filers, v7000s and the like are all within the realms of possibility in the near future, and probably exist today in various vendor labs. The rust/SSDs become a completely commodity item.

Where does this leave the storage team? In the same place it is today, at the core of the Enterprise working to provide simple and effective access to the information asset; they may be working at different levels and collaborating with different people but they’ll still be there. But they might have smiles on their faces instead of frowns…now there’s a thought!

Beyond Tiresome….

Please stop it now….

Would NetApp please stop using their infamous CX3 Model 40 benchmark; you can’t even buy that array from EMC any more, and haven’t been able to for some time. Weasel-worded apologies that you can’t find any newer data are just that… weasel words!

Move on, you’ll be better for it! And if you constantly compare yourself to outdated technology, it really does not do you any favours at all!


The Dynamics of a Relationship

After his ‘no more tiers’ gaffe (and I will still maintain it was a bit of a gaffe), Georgens has redeemed himself in my eyes with his statements on the IBM relationship; he’s a smart guy who seems to really understand the OEM dynamic.

Lifted from Chris’s excellent article at The Register:

“The desire by their [IBMs] internal groups to develop their own products makes the positioning very, very complicated. And are we happy with the positioning? No. On the other hand, our engagement with IBM’s customer facing groups, the people who actually have to put solutions in front of customers, that relationship is actually exceptionally strong. So I think that an approach to storage from a pure platform perspective and basically creating SAN products and NAS products and unified products in a very, very hardware point of view is interesting, but it’s just recreating the fractured product line that’s given us an opportunity to gain share … IBM has introduced products that are competitive with both Engenio and with the NetApp offerings over the years, yet this business still continues to grow, it still continues to be robust.

I think that [IBM] internal groups are looking to compete and develop competitive products and if they truly are competitive and they truly can compete with our feature set both from a hardware and a software perspective, clearly demand will shift in that direction. But if we continue to out-innovate them and have a higher development gains and introduce products to market faster, then we’ll preserve the business. It’s no different. It all comes down to innovation and execution excellence. And it’s been that way for the last five years and that’s the nature of the OEM business.”

It really represents the dynamic that I see being played out as an IBM customer: deep down, IBM don’t really want to sell the NetApp stuff, but whilst the NetApp stuff is better than theirs, they’ll take the margin and revenue. But there’s no illusion: if SONAS and the v7000 can really gain a foothold, IBM will be off.

As a customer, it’s really interesting seeing this play out, and if at the end of the day IBM end up with a viable, thriving and credible storage story of their own, they are going to owe NetApp big time for giving them the space and the time to do so. But Georgens’ confidence in his own product and its ability to keep its place in the IBM portfolio speaks volumes.

Now Val, about this roadmap you keep promising?

More Storage Apparel

Okay, another t-shirt for the storage geek!

Can be found here!

Back to Excel!

When I started this blog, what seems like years ago, people got used to me railing against the state of storage management tools, and especially SRM tools.

Since then I’ve changed roles and moved away from the more traditional corporate back-office storage environment; I no longer have the same level of exposure to SRM tools, and we have a much simpler infrastructure.

But this is beginning to change: our environment is growing incredibly quickly as we store more and more content, and as we start to roll out more traditional IT storage to support our Creative teams. With over 20 arrays, multiple clustered NAS environments, multiple tape libraries and backup/archive environments, I am starting to look for a tool to manage and report on this heterogeneous infrastructure.

So I am starting to have another look at the state of the art in storage management tools… and sadly I am finding that the state of the art is still woeful; the tools appear to have moved on not one iota.

Talking to peers and colleagues, I still find that spreadsheets are almost universally used; the vendors have still not delivered the improved tools they were promising two or three years ago. I still hear ECC derided on a regular basis; I hear people expressing disappointment that NetApp’s stewardship of SANscreen has not delivered, and that IBM’s TPC is still lacking in features.

It does seem to be pretty much beyond the storage industry to deliver a tool to manage storage in a heterogeneous and scalable fashion. It’s obviously just not sexy, and we should all move to integrated stacks or perhaps the Cloud; oh well, back to Excel!

Handbags At Dawn….

Every now and then, it is good to see Storagezilla let fly, as he does here and here. Yes, I know I normally rail against such behaviour, but it’s always good to see Storagezilla frothing at the mouth in full rabid attack-dog mode.

I find myself in the strange position of agreeing with both Zilla and Alex tho’…

Alex is right that the market has changed and some not-so-new requirements have bubbled up the agenda and are becoming more than a niche. And NetApp needed to do something to address these customer requirements.

Storagezilla is right that NetApp are basically finding themselves dangling from their own petard and could do with a dose of mea culpa.

The Cult of WAFL really meant that NetApp could not develop themselves out of the situation they had got themselves into; the only way to jump-start their presence in this ‘new’ market was to buy their way into it.

Much as we’d all like this storage thing to be unified and simple (well, some of us would, anyway), it’s not, and I for one welcome NetApp to reality.

But it does give NetApp a small market positioning issue: you can’t well point at EMC and say that they obviously don’t believe in Unified Storage because they have different products to meet different requirements… EMC have acknowledged in a few places that they got it wrong with regard to Unified Storage; perhaps, just perhaps, NetApp can say the same thing.

Of course, EMC never, ever told us that Celerra with MPFS was a better solution for scale-out storage than Isilon. They certainly don’t now.

I also sometimes wonder why EMC never released Infiniflex as a stand-alone product; now that’d be a very interesting product in the Big Data world. Sometimes the value is in the lack of frills; not every product needs frills beyond excellent engineering.

But I’m not sure EMC would want to add another storage product to their portfolio….well, not this week.


V-Convergence?

I studiously avoided blogging on the EMC World announcements last week; I wasn’t there, and there was enough verbiage from those who were attending. Chad appeared to have a never-ending stream of blog entries, all of which I suspect were prepared in advance to cover the various announcements. But it was this one about Project Lightning which especially caught the eye:

Then, we vMotioned a bandwidth-constrained workload to a vSphere cluster which was running co-resident on the same hardware running Isilon, increasing the amount of bandwidth dramatically. Yes, this idea (vSphere running on the arrays) does indeed exist within the walls of EMC, as does vSphere running co-resident on VMAX hardware.   If you think about a big Isilon cluster, with 100+ nodes of Intel x86 based power, or a future generation VMAX with 16+ similarly Intel-powered storage engines, it makes all the sense in the world – particularly for workloads where bandwidth and the parameters of the dataset make it easier to move the compute closer to the data rather than the other way around.

So EMC have vSphere running on Isilon nodes and VMAX; I suspect we could add VNX to that list as well. And I completely agree with Chad that it does make all the sense in the world, but is it a good idea?

Now the techie in me says yes; IBM have had the unused and unexplored capability to run AIX/Linux workloads in the DS8k for some time and it has always bugged me that they have never leveraged this. There are simply some workloads that you might want to run as close to the storage as possible.

But there is another part of me which says no! This is not the techie but the person who cares about the complex eco-system that has built up around VMware. VMware has thrived because of the support of various other companies, and those companies have also grown with VMware’s support; companies such as NetApp have been part of VMware’s journey to dominance in the server virtualisation marketplace.

The server companies have also embraced VMware, and as VMware was not competing at a hardware level, it was allowed to become the de facto standard.

If EMC utilise VMware to give themselves a serious competitive advantage and a unique capability in their storage platforms, do we risk a splintering of the server virtualisation marketplace? Vendors such as IBM, HP and Dell might well look at producing their own hypervisors to allow workloads to run in their arrays; as I say, IBM already have the capability, though only for Power-based workloads. Could IBM leverage their technical expertise in hypervisor technologies to build an x86-based hypervisor allowing them to run workloads on the v7000, SONAS, SVC and even XIV?

And of course, if EMC start to muscle in on the server market as well, where does that leave VCE? Will EMC even need Cisco at that point? I’ve said to people before that Cisco need EMC more than EMC need Cisco.

As I say, technically a good idea… but I would suggest that the jury is out on whether it is a great idea for EMC/VMware in the long term.