So hopefully we all agree that EFDs have a place in the Storage Infrastructure of the future, but we also have to ask ourselves what this infrastructure is going to look like. If we look at some of the press releases and comments regarding Fusion-IO, you would probably believe that the SAN is on the way out, or indeed that shared storage in general is going to die.
Some of the figures are impressive; an unnamed company believed it was losing 15% of its potential web business to storage timeouts and the slow response of the array.
That's a huge amount of business to be losing due to the slowness of the array, but I wonder how true that is; was it really due to the end-to-end slowness of the system? Was it due to non-optimised SQL? I've seen SQL queries tuned down from 300 accesses to half a dozen with a couple of hours' work. Did they blame the storage because storage was the one team that couldn't give a transparent view of its environment?
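To make that "300 accesses down to half a dozen" point concrete, here is a minimal Python sketch; it is not from the original case, and the schema, table names and functions are invented purely for illustration. It contrasts the classic one-query-per-row pattern with a single join that returns the same answer in far fewer accesses:

```python
import sqlite3

# Hypothetical in-memory schema standing in for the kind of OLTP workload discussed above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE order_lines (order_id INTEGER, sku TEXT, qty INTEGER);
""")

# Untuned pattern: one query per order, so N orders cost N+1 round trips
# to the database (and N+1 bursts of small I/O against the array).
def totals_untuned(customer_id):
    orders = conn.execute(
        "SELECT id FROM orders WHERE customer_id = ?", (customer_id,)
    ).fetchall()
    return {
        oid: conn.execute(
            "SELECT SUM(qty) FROM order_lines WHERE order_id = ?", (oid,)
        ).fetchone()[0]
        for (oid,) in orders
    }

# Tuned pattern: a single join/aggregate answers the same question in one access.
def totals_tuned(customer_id):
    rows = conn.execute(
        """SELECT o.id, SUM(l.qty)
             FROM orders o LEFT JOIN order_lines l ON l.order_id = o.id
            WHERE o.customer_id = ?
            GROUP BY o.id""",
        (customer_id,),
    ).fetchall()
    return dict(rows)

# Tiny usage example with invented data.
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 42), (2, 42)])
conn.executemany("INSERT INTO order_lines VALUES (?, ?, ?)",
                 [(1, "sku-a", 3), (1, "sku-b", 1), (2, "sku-a", 2)])
assert totals_untuned(42) == totals_tuned(42)
```

The untuned version hits the database once per order, which is exactly the kind of chatty access pattern that shows up at the array as a flood of small reads and gets blamed on "slow storage".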
Often storage is a great diagnostic tool; just looking at the I/O profile can lead to interesting questions. If you see weird I/O ratios which step way outside the normal profile for an OLTP application, it can be an indicator of sub-optimal code. But to do so, you need tools which present the information in a quick and easily digestible manner.
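As a rough illustration of what "stepping outside the normal profile" might look like, here is a small Python sketch. The thresholds, counter names and the sample interval are assumptions made up for the example, not figures from the post or from any particular tool:

```python
# Hypothetical sketch: flag I/O profiles that fall outside an assumed "normal" OLTP band.

def io_profile(reads, writes, read_kb, written_kb):
    """Summarise an interval of I/O counters into a few ratios."""
    total = reads + writes
    return {
        "read_pct": 100.0 * reads / total if total else 0.0,
        "avg_read_kb": read_kb / reads if reads else 0.0,
        "avg_write_kb": written_kb / writes if writes else 0.0,
    }

def looks_suspicious(profile, read_pct_band=(50, 80), max_avg_io_kb=16):
    """Rough check: OLTP is typically small, mixed I/O, so big transfers or an
    extreme read/write skew is worth a question to the application team."""
    low, high = read_pct_band
    return (
        not (low <= profile["read_pct"] <= high)
        or profile["avg_read_kb"] > max_avg_io_kb
        or profile["avg_write_kb"] > max_avg_io_kb
    )

# Example interval: heavy, large reads that look more like a table scan than OLTP.
sample = io_profile(reads=95_000, writes=5_000,
                    read_kb=95_000 * 64, written_kb=5_000 * 8)
print(sample, "suspicious:", looks_suspicious(sample))
```

The counters themselves come from the array or host tools; the value is in presenting them alongside the application context so the question can be asked quickly.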
At the moment, it is all too easy to blame the storage because the management tools are not great and the estate becomes very opaque to the outside viewer. If we had the right tools, we could become a crack team of dysfunctional diagnosticians, like House and his team, and people would come to us saying, "we know it's not a storage problem, but perhaps you can help us identify what is going on in our infrastructure."
That'd be a great step forward, don't you think?
As is pretty well known in the performance tuning game, resolving a bottleneck here (points with finger) simply makes it appear over there (points further away).
The ultimate blame game — and I’ve given as well as received — is to point to the bit of the pipe *as far away as possible* from where the problem manifests itself, to where there are no metrics, or over-generic metrics that don’t let you assign them to this specific app, as you note. Over there (handwaves in general direction of storage in data centre) is here, if storage admin is where you’re at.
There are tools that try to pull all this together in some (il)logical fashion, but it’s been a bugbear since the first shared resource made an appearance.
I don’t think it’s going to get any better either; the vision of the “super mainframe” that Maritz (VMware) has fills me with dread. Same for cloud compute/storage/apps.
If you can’t measure it, how can you manage it?
On the other hand, I could put my capacity/performance hat back on and make a tidy living by pointing and handwaving. 🙂
The best tool at the moment seems to be Analytics in Sun’s Storage 7000 system. Best of all, it’s free.
It really pisses me off when I have performance issues and I need to send historic performance data over to the vendor.
Makes it really difficult to solve problems without having a live view. Vendors seldom know the application well enough; all they see is some kind of I/O without knowing what the cause is.
@Alex – LOL at handwaving.
A quick point
Putting aside the question of toolsets for a moment, I often wonder at the disjointed way in which new applications are sometimes developed in organisations: the developers are working out which problems to fix in code and which to fix in the infrastructure; the DBAs are wondering how on earth they’ll get the performance required; the security guys are wondering how to manage the shotgun blast of port requirements the application will probably need; the server guys are wondering what platform might be best; and so on, until eventually we get down to the storage team, who get asked for XXXGB please – yesterday by preference.
It’s really quite rare that anyone in the process has a really strong technical understanding of how performance flows through the system being built. When those folk are in place, you’ll usually see a far more robust system built. More importantly, it will probably be a system which is easily understood and ultimately, more easily fixed when things go wrong.
Solution architects who really understand application and physical infrastructure seem to be the way forward. In other words, generalists – people who can interact at every level of the development chain and translate requirements between them.
Okay, I agree with Brainy that the best analytics tool available is currently the one provided with Sun’s 7000 series, but it’s not really free! It comes bundled with the array; you have to purchase the array to get it.
And I’ve made it very clear that I like the Sun 7000 series, but it needs some serious beefing up before it’s there. Still, it has potential.