I've been thinking a bit about benchmarking and benchmarketing. Pretty much everyone agrees that SPC is a very poor representation of real-world storage performance, but at the moment it's the only thing that most of the market supports, with one key exception. So I thought I'd come up with my own; let me introduce the SSAC.
Storagebod's Storage Assault Course
This is a multi-part benchmark and is supposed to reflect the real-world life of a storage array. You may bid what you like, but all costs, including the costs of the team who support and run the benchmark, must be declared.
1) You must specify an array to run the SPC benchmarking suite; this array must be ordered through the normal ordering process.
2) A short period of time prior to the delivery of the array, you will be informed that the workload for the array has been changed to another random workload. However, the agreed delivery date must still be met, so any changes to the configuration must be made without impact to the delivery.
3) You must carry out an audit of the existing SAN environment and ensure that the installation of your array will not cause any impact to the already running environment. In the course of the audit, you will discover that pretty much the whole of the environment is down-level and not currently certified against your new array. You must agree any outage required to upgrade the new array; this may involve you auditing every single server and switch to ensure that key variables such as time-outs on multipathing are set correctly.
The day before installation, you will discover a dozen servers which you were not aware of and at least three of these will be running operating systems which are so far out of support that no-one is sure what is going to happen.
You will be responsible for raising changes, carrying out risk assessments and arranging site surveys, traffic plans and any other supporting tasks.
4) On the day of delivery, you will discover that the array is to be installed in another part of the data centre, which has neither power nor a floor that can take the weight; the array will probably fall through it, crushing the secretarial team below.
You will be responsible for arranging the remedial works and rescheduling all the changes required.
5) Once the array is installed, you will be responsible for powering it up and installing the initial configuration.
6) A Major Production Incident is declared and you will be responsible for convincing everyone that it is not your new array which caused the problem.
7) You are finally allowed to run your first benchmark.
8) You discover that all of the SAN switches have been set to run at the wrong speed and you need to raise changes to correct this.
9) You are allowed to re-run your benchmark, but halfway through a Performance Major Production Incident is declared. You are responsible first for proving that it is not your array which caused the problem, and then for fixing the Incident, which has nothing to do with the work you are carrying out.
10) You manage to successfully run your benchmark.
11) You receive an urgent change request and you must find space on your array for a new workload; this workload has completely different performance requirements but you will be told to JFDI.
12) You run your second benchmark.
13) You receive an urgent call from the original application team; apparently they mis-stated their requirement and they are servicing twice as many requests as originally thought. You must expand the environment as an urgent priority.
14) Through the normal ordering channels, you must arrange any necessary expansion to the current environment.
15) You must carry out any necessary reconfiguration without impact to the running workloads or agree any downtime with the business to allow you to carry out the necessary reconfiguration.
16) You will receive an emergency technical notice from your vendor support teams; a fix must be implemented immediately or the array will catch fire (possibly).
17) You must re-audit the environment and carry out any remedial work on servers, switches and applications to support the new code level.
18) You will be informed that the system must now have a DR capability due to government regulation.
19) Carry out any design and certification work to provide DR capability. The order for the DR capability will be held up in procurement but you still need to implement within the government timescales or face severe financial penalties.
20) Re-benchmark the entire environment showing the impact of replication on the environment.
Explain to a non-technical person that the speed of light in fibre differs from that in a vacuum. Explain in a louder voice that no, the speed of light is not something which can be overcome and no, rival company X has not overcome it, despite what they might say.
21) A third workload will be detailed; this must be shoe-horned onto the array at no extra cost and with no impact to the existing applications.
22) Benchmark again to prove this case.
23) You are summoned to a meeting to explain and justify your array to a Senior Manager who has just been bought a three star lunch by a rival vendor.
24) You will experience a failure on your array; record how many logs and diagnostics the vendor support team (and no cheating, you must use the customer route) asks you for before agreeing to replace the failed disk.
25) You will be informed that Group Audit have looked at the application that you provided DR for and decided that actually it was exempt, hence you have wasted time and money.
26) Plan to repurpose DR array for a new workload requirement.
27) A new capability is announced and will require a major firmware upgrade that may require downtime but your support team are not sure. Carry out the risk analysis and plan for upgrades.
28) You are outsourced!
At this stage please disclose:
1) Total Staff costs including time-off for stress related illness
2) Total Capital Expenditure
3) Total Downtime due to upgrades and other non-disruptive activities
4) Any other expenditure due to work required to existing SAN environment and Server infrastructure
And those performance benchmarks, throw them away; you didn't think I was actually interested in them!?
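P.S. For anyone who ends up having the speed-of-light conversation during the replication step, here is a rough back-of-envelope sketch in Python. It only estimates the propagation delay that distance adds to a synchronously replicated write; it ignores switches, protocol overhead and the arrays themselves. The refractive index of 1.47 is an assumed typical figure for silica fibre, and the function names are my own invention for illustration, not anything from a vendor tool.

```python
# Back-of-envelope floor on replication latency imposed by distance.
# Everything here is an illustrative assumption, not a vendor formula.

C_VACUUM_KM_PER_S = 299_792.458   # speed of light in a vacuum
FIBRE_REFRACTIVE_INDEX = 1.47     # ASSUMPTION: typical silica fibre; real cables vary


def one_way_delay_ms(distance_km, refractive_index=FIBRE_REFRACTIVE_INDEX):
    """Propagation delay in milliseconds for light to cover distance_km."""
    speed_km_per_s = C_VACUUM_KM_PER_S / refractive_index
    return distance_km / speed_km_per_s * 1000.0


def sync_write_penalty_ms(distance_km, round_trips=1):
    """Minimum extra latency per synchronous write.

    ASSUMPTION: each write needs `round_trips` full round trips to the
    remote site before it can be acknowledged to the host.
    """
    return 2 * one_way_delay_ms(distance_km) * round_trips


if __name__ == "__main__":
    for km in (10, 100, 1000):
        print(f"{km:>5} km: at least {sync_write_penalty_ms(km):.3f} ms per write")
```

Whatever rival company X says, no firmware release will get a write acknowledged faster than this floor; it works out at roughly 5 microseconds per kilometre each way.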
You forgot to mention the “We over-estimated our capacity requirements therefore you sold us too much storage. We want a credit but you can’t have the storage back”.
I’ve never done that…you should have qualified my requirements better!
Awesomeness. You should work for a vendor! heheh
No vendor would have me…I’m far too much trouble!
Nothing beats a real-world benchmark :)
J
Mate, that is hilarious, and so on the mark. I am preparing for a customer preso at the moment – I think I might lead with your blog post then exit stage right :)
very nice. happened to me once, vendor sold us our nice 8gb fc switches, and put in 1 gb sfp’s, probably happens more often than people think