Cluster Fscked

For another project, I had reason to try to find out the limits of OnTap 8 Cluster-Mode; apart from the fact that NetApp need taking out and shooting with regards to their documentation, I've ended up reading more about Cluster-Mode than was probably entirely necessary for this project. It has been fascinating, though, and some subsequent conversations have left me with the very distinct impression that few people really understand what Cluster-Mode does. I had made assumptions about what it was based on biases and prejudices from using other products.

For some reason, I equated OnTap 8 Cluster-Mode with SONAS and Isilon, for example; this was a mistake. NetApp have, as per usual, taken their own approach and produced something quite different. It is probably useful to understand some terminology.

Global Namespace is a term which gets thrown about a lot and is probably going to cause confusion when comparing products.

A File System is a method of storing and organising computer files and their data.

A Global Namespace provides an aggregated view of an Enterprise's File Systems, allowing files to be accessed without knowing where they physically live; it is a virtual directory structure.

NetApp's Cluster-Mode is really the aggregation of a number of NetApp Filers and volumes into a single virtual server, providing a Global Namespace for this aggregation; this means that, instead of having to know which filer a share lives on, it is possible to move from mounts which look like:

\\server1\share1
\\server2\share2

to mounts which look more like

\\company\share1
\\company\share2

This is obviously a lot easier for the end-users and means that migrations etc. do not require reconfiguration of end-user machines. This is a good thing!
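
To make the indirection concrete, here is a minimal sketch of a namespace table that maps stable \\company paths to whichever filer currently holds each share. This is purely illustrative and not how OnTap implements it; all the names are made up.

    # Illustrative sketch only -- not NetApp's implementation. A junction table maps
    # stable logical paths to whichever filer/share currently holds the data.
    JUNCTION_TABLE = {
        r"\\company\share1": (r"\\server1", "share1"),
        r"\\company\share2": (r"\\server2", "share2"),
    }

    def resolve(logical_path):
        """Translate a logical namespace path into its current physical location."""
        filer, share = JUNCTION_TABLE[logical_path]
        return filer + "\\" + share

    def migrate(logical_path, new_filer):
        """Move a share to another filer; clients keep using the same logical path."""
        _, share = JUNCTION_TABLE[logical_path]
        JUNCTION_TABLE[logical_path] = (new_filer, share)

    print(resolve(r"\\company\share1"))      # \\server1\share1
    migrate(r"\\company\share1", r"\\server3")
    print(resolve(r"\\company\share1"))      # \\server3\share1 -- no client change needed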

OnTap 8 Cluster-Mode does allow you to aggregate the performance of up to 24 filers; all of them can see all of the data and serve it out. But as far as I understand, and I will be happy to be corrected, you are still limited to file-system sizes of 100TB, i.e. the maximum size of a 64-bit aggregate, and as far as I can tell, each file-system is owned by a single HA pair. Data is not striped across all the nodes, and the loss of an HA pair will result in the file-systems hosted by that pair going off-line.
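
The ownership point is easier to see with a toy model (my reading of the behaviour, not anything taken from NetApp's documentation): each file-system lives on exactly one HA pair, so losing that pair takes its file-systems offline even though the other 22 nodes are still healthy. All names below are invented.

    # 24 nodes arranged as 12 HA pairs.
    HA_PAIRS = {f"pair{p}": [f"node{2*p-1}", f"node{2*p}"] for p in range(1, 13)}

    # Each file-system is owned by a single HA pair -- not striped across the cluster.
    FILESYSTEMS = {
        "fs_projects": "pair1",
        "fs_home": "pair7",
    }

    def online_filesystems(failed_pair):
        """Return the file-systems still reachable after an HA pair fails."""
        return [fs for fs, owner in FILESYSTEMS.items() if owner != failed_pair]

    print(online_filesystems("pair1"))   # ['fs_home'] -- fs_projects has gone offline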

Isilon, IBM and HP in their products allow file-system sizes far in excess of this and measure them in petabytes; this is because at the back-end there is a true cluster file-system. This enables an individual directory, for example, to be considerably larger than 100TB, or even individual files to be larger than this. Some of you are probably now shaking your heads and wondering what sort of files could possibly be that large; I will admit that 100TB would be a very large file, but files in excess of 1TB are not that uncommon in my world, and a 100TB file-system could only hold 100 of these, not that much really.

A single file-system can also be made up of pools of storage which perform in different ways and can fairly easily be automatically tiered; you can have files in the same directory which actually live on different tiers of storage. In your home directory, for example, the files which you are currently working on could be sitting on active disk, whereas the files which are gathering dust could live on spun-down SATA.
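
As a rough sketch of what such a tiering policy boils down to (my own illustration, not any vendor's policy engine; the actual data movement would be done by the storage layer, here it is just reported):

    import os
    import time

    ACTIVE_TIER, ARCHIVE_TIER = "fast-disk", "spun-down-sata"
    IDLE_DAYS = 90  # files untouched this long become candidates for the slow tier

    def choose_tier(path):
        """Pick a tier based on when the file was last accessed."""
        idle_secs = time.time() - os.path.getatime(path)
        return ARCHIVE_TIER if idle_secs > IDLE_DAYS * 86400 else ACTIVE_TIER

    def apply_policy(directory):
        """Report (or, in a real system, perform) the tier placement for each file."""
        for name in os.listdir(directory):
            path = os.path.join(directory, name)
            if os.path.isfile(path):
                print(path, "->", choose_tier(path))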

Isilon, IBM and HP don’t have to implement a Global Namespace because their file systems are large enough and are distributed in such a way that a single file system could provide all file space required by an Enterprise.

Now, NetApp's approach does have some huge advantages, when building Secure Multi-Tenancy for example, and it allows for a very granular approach to replication etc. NetApp also don't have to deal with huge amounts of meta-data, and their file-locking is probably easier; but it is different.

There is certainly a take-away from my research…

Global Namespace != Single Huge Filesystem

Now perhaps you already knew this but I suspect many were under the same delusion as me! And does this mean that NetApp don’t have a true Scale-Out Solution? I can certainly make arguments for both views.

Bring on OnTap 9! I reckon that’s the release which will combine it all including Striped Volumes from 7G.

p.s. If I am completely wrong, it's NetApp's fault; the documentation and description of Cluster-Mode is rubbish! Truly terrible! And yes, I could get someone from NetApp to come and talk to me about it, but if I can't get it from the documentation…you fscked up!


21 Comments

  1. Blue Arc’s “Global Namespace” is similar. You’re still limited to 256TB per FS, IIRC, and you have to address it like \\company\fs1 and \\company\fs2

    I always thought “global namespace” = “one giant filesystem” until I had to read up on the Blue Arc implementation. You can see tutorial videos on YouTube for more details.

    I much prefer GPFS or Isilon.

  2. Storagezilla says:

    Any idea how many people are using this warmed over version of DFS in production?

    1. VMTyler says:

      @Storagezilla

      Not many. They were offering full HW swaps w/ migration for GX customers to get them on 8 Cluster mode.

  3. Storagesavvy says:

    Based on what I’ve read, I have the same impression as you that NetApp Cluster-Mode does not stripe across filers in the cluster. The workaround, from what I’ve heard, though I haven’t seen it to verify, is to replicate the filesystem between all of the nodes, which obviously uses more storage than a striped filesystem like Isilon presents. It would also seem that the filers that own a filesystem would carry a higher load than the other filers, so load balancing would not be completely even. But, just as you indicated, the documentation is not entirely clear on all of these details and the marketing collateral is even worse.

  4. Martin, here’s an offer.

    I’m going to DM tweet you with my phone number, you call, and we agree to a time & place when I can get you a full explanation of cluster mode from the finest brains in NetApp.

    And I’ve taken the doc issue on board.

    Deal?

    1. Glad to see you are back blogging, Alex. I didn’t get a chance to sit down with you at Insight this year but maybe this coming one we can share a pint again.

      -Rich

  5. Gerald Coon says:

    ^NetApp employee

    I also wanted to add, for anyone who sees room for improvement in our documentation: NetApp Information Engineering puts a public e-mail address, doccomments@netapp.com, in all the documentation. They are always glad to have feedback on how the documentation can be improved.

    1. Gerald – As someone who has been on both sides of NetApp’s door, I understand what a challenge it is to provide good documentation in this industry.

      That said, if I’m not doing a good job, it’s not the customer’s responsibility to figure out how I should improve.

      NetApp needs to hire tech writers who write in plain English, not tech-marketing lingo. Conveying information should come before marketing agendas.

  6. Neil OKeefe says:

    Scale Out NAS is a hot technology at the moment, so the buyer does need to beware and understand clearly how the approaches differ in order to help make informed choices about what is best for the problem they are trying to solve.

    There are only two technologies which implement standard NAS capabilities (i.e. CIFS and NFS access) in a form which is straightforward to consume (i.e. it’s a product, not a consulting exercise) and has good scalability – Isilon and IBRIX (HP’s X9000 product).

    SONAS, like BlueArc and GX, also aggregates a number of smaller filesystems (2PB each) together to get to its claimed 14PB. Whilst the GPFS limit for a filesystem is much higher, you need a custom SONAS engagement to get around the 2PB filesystem limit.

    I was interested to spot the Huawei-Symantec OceanSpace system appear recently which also purports to be a NAS system which scales to 15PB. It might, but with filesystems which are 256TB in size.

    IBRIX and Isilon build systems which bolt together to scale to in excess of 10PB, and hundreds if not thousands of file serving nodes. Their core filesystem architectures are more modern and were designed specifically to scale.

    As datasets in a number of industries become much bigger, limitations on physical filesystem size become inhibitors to managing that data on a filer which is not designed from the outset to scale. In genetic sequencing research, for example, multi-PB datasets are starting to become commonplace.

    1. Richard Swain says:

      I am not sure where 2PB came from but I think you need to look again.

      Straight from the documentation:

      Max. size of a single shared file system (GPFS™) 524288 Yobibytes (2^99 Bytes)

      http://publib.boulder.ibm.com/infocenter/sonasic/sonas1ic/index.jsp

      Attribute Limit
      Max. number of interface nodes 30
      Max. number of storage nodes 60
      Max. capacity 134217728 Yobibytes (2^107 Bytes)‏
      Max. size of a single shared file system (GPFS™) 524288 Yobibytes (2^99 Bytes)
      Max. number of file systems within one cluster 256
       Max. size of a single file 16 Exbibytes (2^64 Bytes)
      Max. number of files per file system 2.8 quadrillion (2^48)
      Max. number of snapshots per file system (GPFS) 256
      Max. number of subdirectories per directory 2^16 (65536)
      Max. number of exports that can be created per service 1000
      Max. number of groups a user can be part of 1000
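
      The power-of-two figures are easy to sanity-check; for example, a few lines of Python confirm the headline conversions:

        YOBIBYTE = 2**80
        EXBIBYTE = 2**60

        print(2**99 // YOBIBYTE)    # 524288 -> max size of a single shared file system
        print(2**107 // YOBIBYTE)   # 134217728 -> max capacity
        print(2**64 // EXBIBYTE)    # 16 -> max size of a single file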

  7. Martin Glassborow says:

    Neil,
    I’ve been discussing the SONAS limit with IBM and they deny that this is the case; I’m absolutely fascinated because this is not the first time I’ve heard this and I wonder if you can point to a spot in the SONAS documentation?

    Martin

    1. Neil OKeefe says:

      It’s in the IBM Redbook published in November 2010, “IBM Scale Out Network Attached Storage Concepts”.

      (http://www.redbooks.ibm.com/redbooks/pdfs/sg247874.pdf)

      3.6.1 SONAS file system scalability and maximum sizes.

      “The SONAS maximum file system size (in TB) for standard support is 2 PB. Larger PB file systems are possible by submitting a request to IBM for support.”

      As I was very careful to point out in my response, neither HP (I work for HP – should have noted that in my first reply – apologies) nor EMC restrict filesystem size in practice, whilst quoting higher numbers in their marketing materials. I don’t pretend to understand why IBM put such a restriction in place – I’m sure there are good support reasons for doing so.

      My point being that true Scale-Out NAS, where you get the capabilities you’re used to in the mid-range but which seamlessly grows in capacity and performance, is actually very hard to do. NetApp bought Spinnaker in 2004, and it’s taken many years to get to GX and then OnTap 8.

      There is a lot of hype and aspirational marketing around at the moment. Behind the marketing speak there are some technologies designed to scale, and some older technologies being retrofitted. The market will work out which is most viable over time, but currently buyer beware, and make sure you understand the architecture!

      1. Martin Glassborow says:

        Where you see custom, I see RPQ. I’ve been through enough RPQs on enough different systems to know that it’s not that much of a red-flag.

        To be honest, anyone buying a 2PB NAS environment is not going to be doing so without some consultancy; I’d expect that to be pretty much standard. 2PB of storage is not an off-the-shelf deployment….yet! I suspect pretty much all of you would want to sit down with a customer who is buying a 2PB+ NAS solution and qualify the solution?

        And I see nothing which implies that it is not a global namespace but is instead stitching file-systems together. I think you were reaching…

        IBM, HP and EMC all have true Scale-Out NAS solutions.

      2. Richard Swain says:

        The 2PB was a support statement carried over from older GPFS development. The create-file-system GUI/CLI doesn’t stop you from creating a larger one.

        If you think about it, it kinda makes sense too. You may not want all of your data in an 8-14PB file system. Breaking it up into smaller chunks makes it a little more manageable, and remember the benefits you can get using ILM movement of data from faster, higher $/TB disk to lower $/TB disk.

        In the grand scheme it may not be that big of a deal.

      3. Neil and Martin,
        “Global Name Space” could be either a big huge file system OR a mount point aggregator of multiple file systems.

        SONAS can be one huge file system, or can be divided into smaller file systems. In the case of smaller file systems, SONAS does not aggregate these into a bigger file system. These are separate file systems that can be exported separately, administered separately, and backed up separately. In theory, these separate file systems could be “mount aggregated” externally, such as with the F5 Acopia product, but SONAS does not aggregate multiple file systems. Separate file systems are intended for multi-tenancy environments such as cloud service providers.

        SONAS 1.1 with TSM 6.1 had a 2PB limit selected to simplify backup. Backing up a file system bigger than 2PB requires dividing it up into multiple backup domains and coordinating recovery from multiple TSM servers. Anyone who wants file systems bigger than 2PB merely submits an RPQ, and we help them set up their backup environment.

        SONAS 1.2 with TSM 6.2 lifted this to 8PB. Again, if you want more than 8PB, you need to submit an RPQ. With current RAID and disk size options, the biggest usable file system space can be up to 11.5PB in size.

        While SONAS also supports snapshots, and synchronous and asynchronous mirroring, we realize that this is not always a complete solution. Recent examples include Google’s recovery of lost Gmail data and EMC’s failure at the State of Virginia, both of which required tape recovery after disk-to-disk replications were unable to provide full recovery.

        I don’t know whether HP Ibrix or EMC Isilon have scalable backup solutions like IBM SONAS does. Neil? Care to enlighten us?

  8. Martin Glassborow says:

    BTW Neil, it is considered common courtesy on most of the storage blogs to disclose your vendor affiliation…so in case anyone is interested, Neil works as the EMEA Sales Manager for HP Scale-Out-Storage.

    1. Neil OKeefe says:

      Noted – and apology in my most recent post.

  9. […] since Isilon and SONAS have interpretations that are different from NetApp. You may want to read Martin Glasborrow’s (Storagebod) post which talks about this.)  The data sheet for Clustered Mode Data Ontap is available here. […]

  10. Darren says:

    We have GX, 8C and Isilon in our datacentre.

    If I can help clarify any of the docs/differences we’ve found then I’d be more than happy to – drop me a mail.

    1. dave says:

      Darren,
      I’m interested in understanding the differences. Drop me a note at david.j.berkes@citi.com

  11. Bart Dozier says:

    Intelligent File Virtualization is a viable alternative to the “one box solution” for a Global Namespace. F5 ARX offers this (it used to be Acopia); I encourage you to look into it.
    It provides a layer in the network that decouples logical access to files from their physical location. Behind the F5 ARX are filers accessed via CIFS or NFS. File Virtualization gives you the freedom to choose the file storage that best meets requirements over time, avoiding vendor lock-in.
    It goes way beyond a federation of filers; it allows you to stripe across a number of filers and to define placement rules based on file name, size and age.
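
    As a toy illustration of the idea (not ARX configuration syntax, just what a placement rule amounts to; all the share names are made up):

      import os
      import time

      # Hypothetical placement rules; the first matching rule wins.
      RULES = [
          {"suffix": ".mp4", "target": r"\\filer2\media"},              # by file name
          {"min_bytes": 10 * 2**30, "target": r"\\filer3\bigfiles"},    # by size (>= 10 GiB)
          {"idle_days": 180, "target": r"\\filer4\archive"},            # by age
      ]
      DEFAULT_TARGET = r"\\filer1\general"

      def place(path):
          """Decide which back-end filer share a file should live on."""
          size = os.path.getsize(path)
          idle_days = (time.time() - os.path.getatime(path)) / 86400
          for rule in RULES:
              if "suffix" in rule and path.endswith(rule["suffix"]):
                  return rule["target"]
              if "min_bytes" in rule and size >= rule["min_bytes"]:
                  return rule["target"]
              if "idle_days" in rule and idle_days >= rule["idle_days"]:
                  return rule["target"]
          return DEFAULT_TARGET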

    Note: I work for an F5 partner. No, marketing is not their strong suit.
    Bart Dozier
