A few people have asked me about the project I'm working on at the moment. I'm going to have to be careful about precisely what I say, because I suspect much of the detail is still covered by various confidentiality agreements, but I think it is possible to talk about it in very general terms without going into details of the technology.
Firstly, most people are probably aware that I work for a large broadcaster, so what I'm discussing may not be entirely relevant to most of you, but it is possible that some of the concepts will trigger some ideas for what you do, especially around archiving and infrequently changing files.
Broadcast is an industry which has undergone, and is still undergoing, rapid change as the consumption model for media changes; people want to consume media on pretty much any device which has a screen and at a time of their choosing. This brings many infrastructure challenges, but one of the largest for many broadcasters is that all of our video still sits on video tape and all of our workflows involve physically moving tapes around. And we have a lot of tapes!
In the new multi-format world this is not a fantastic place to be. It also means that it isn't as easy as it might be to sweat the asset, and of course we have a massive tail as well which, if it were more easily accessible, we would be able to do more with. So the time has come to move to a completely digital workflow, or should I say workflows.
The heart of the system is called Tapeless, which is a bit of a misnomer as you will see. The plan is to remove all of our video tapes and replace them with digital media, but also to ensure that the resulting digital archive is searchable, scalable and online, i.e. all of the media handling is automated; no more humans running round the place with tapes.
The archive will grow 'forever', which brings some interesting challenges, but we have no intention of expiring stuff from the archive. Feasibly stuff could be deleted, but in general, once it goes in, it stays in. Technical refresh and migration have been factored in pretty much from the get-go.
So if we look at the life of an asset, it's really quite simple.
The asset is ingested into the system from a variety of sources, from video-tape to a multitude of digital sources.
The asset is stored on high-speed disk; this disk is basically a working cache, as most content will only be worked on in the first few days of its life. Content will be deleted from the cache once various thresholds are reached. High-speed disk for us is highly optimised for sequential workloads; random I/O is not that much of an issue for us but throughput is.
The asset is simultaneously stored on tape; as I said, Tapeless is a misnomer.
The asset is also stored in a browse format on disk; this is the equivalent of thumb-nails of photos. Browse copies are never deleted from disk and this area will also grow forever. Browse copies will be viewable from a standard desktop, and most rough edits can be done there before (and if) they are transferred to a craft edit suite for refinement.
Metadata about the asset is stored in a central application, from basic information like its filename to more detailed information about what the asset actually is. This application also manages how the asset moves about the whole digital environment.
It is important to note that assets go straight into the archive and straight onto tape; this copy is never changed, you do not 'edit' this copy. It is possible that an edited copy could also be stored in the archive but we always have a clean copy.
All media assets on disk are also accessible from any of the hosts in the cluster; this is done via a clustered file-system.
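To make the life of an asset a little more concrete, here is a toy sketch in Python. It is purely illustrative and is not how the real system is implemented: the class names, the single fill-level threshold and the oldest-first eviction are all my own assumptions, since I can't describe the actual thresholds or the application's logic.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    asset_id: str
    size_gb: float
    ingested_day: int  # hypothetical: day number on which the asset arrived


@dataclass
class Archive:
    # Every name and threshold below is an illustrative assumption.
    cache_capacity_gb: float
    high_watermark: float = 0.8  # assumed: start evicting above 80% full
    cache: dict = field(default_factory=dict)     # high-speed disk, a working cache
    tape: dict = field(default_factory=dict)      # clean archive copy, never edited
    browse: dict = field(default_factory=dict)    # browse copies, never deleted
    metadata: dict = field(default_factory=dict)  # stand-in for the central application

    def ingest(self, asset: Asset) -> None:
        """Store the asset in every tier at once, then tidy the cache."""
        self.cache[asset.asset_id] = asset
        self.tape[asset.asset_id] = asset
        self.browse[asset.asset_id] = asset
        self.metadata[asset.asset_id] = {"filename": asset.asset_id, "on_tape": True}
        self.evict_if_needed()

    def cache_used_gb(self) -> float:
        return sum(a.size_gb for a in self.cache.values())

    def evict_if_needed(self) -> list:
        """Drop the oldest assets from the cache until it is under the watermark."""
        evicted = []
        limit = self.cache_capacity_gb * self.high_watermark
        for asset in sorted(self.cache.values(), key=lambda a: a.ingested_day):
            if self.cache_used_gb() <= limit:
                break
            # Safe to drop: the tape and browse copies are untouched.
            del self.cache[asset.asset_id]
            evicted.append(asset.asset_id)
        return evicted


archive = Archive(cache_capacity_gb=100.0)
archive.ingest(Asset("a", 50.0, ingested_day=1))
archive.ingest(Asset("b", 40.0, ingested_day=2))
print(sorted(archive.cache), sorted(archive.tape))
```

The point of the sketch is simply that eviction from the cache never loses anything, because the tape copy and the browse copy were written at ingest.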
So there you go, it's all rather simple. We have minimal backups to worry about, and we have predictable workloads, or at least easily calculable workloads, i.e. we know how much bandwidth a video stream will use, and our only real variable is the users and the workflows they use. This is probably our biggest challenge, as the various teams will work in different ways and put different strains on the system; we expect sports content to be handled in a very different way to drama, for example.
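As a rough sketch of what "easily calculable" means here: if you know the bitrate of a stream and how many streams run at once, the sustained throughput and the storage consumed fall out of simple arithmetic. The function names and the figures in the example (20 streams at 100 Mbit/s) are made up for illustration and are not our real numbers.

```python
def aggregate_throughput_mb_per_s(streams: int, bitrate_mbit_per_s: float) -> float:
    """Sustained throughput in megabytes per second for concurrent streams."""
    return streams * bitrate_mbit_per_s / 8.0


def storage_per_hour_gb(bitrate_mbit_per_s: float) -> float:
    """Decimal gigabytes consumed by one hour of video at the given bitrate."""
    megabytes_per_second = bitrate_mbit_per_s / 8.0
    return megabytes_per_second * 3600 / 1000


# Hypothetical example values, chosen only to make the arithmetic easy to follow.
print(aggregate_throughput_mb_per_s(20, 100.0))  # 250.0 MB/s
print(storage_per_hour_gb(100.0))                # 45.0 GB per hour
```

What this can't tell you is how many streams the users will actually run at once, which is exactly the variable I mentioned above.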
People do ask why the archive goes to tape as opposed to a massive disk farm, but putting it on disk would not give us a huge amount of benefit for a massive increase in cost, and tape is really rather good at sequential workloads dealing with large files.
People also ask why a clustered file-system as opposed to NAS. Application requirements are the easy answer; we are already moving into unknown territory for some of our application vendors without throwing an additional variable into the mix.
I suppose one of the things which people might consider in more general environments is archiving earlier rather than later and treating faster disk as a cache tier.
And I wish I could take credit for designing the system but other people can take the blame for that; my team just have to make the storage bit of it work.
Sounds like a *lot* of fun …
Out of idle curiosity — was there any discussion of an object-based approach that had metadata bound to the object?
Or was it a more traditional separate repository & datastore type of arrangement?
Thanks!
— Chuck
It’s all down to what the application supports; I have pointed EMC at the application vendors in the past. If you want to play in this space (and you might not, because it is niche), you need to get the application vendors on board.
But having a looser coupling between meta-data repository and data-store has some advantages for this sort of environment, especially one which has lifespan potentially measured in decades.
And yes it’s fun and dealing with media folks is very different at times to dealing with your normal corporate IT user.
Point taken on the “working w/application vendors”.
Any help or guidance on the ones to work with that are relevant for your industry would be most appreciated — we usually don’t have the deep vertical understanding we need to figure out who’s best to work with ..
Thanks!
— Chuck