I was chatting to one of the XIV salesmen at Storage Expo and he threw out a figure which has been bugging me for a few days; he was claiming that an XIV box can do 60,000+ IOPS.
Now I don't have any science to go on, but that seems awfully high for a box with 180 7.2k SATA drives in it; let's be really generous and say that a 7.2k SATA drive can sustain 100 IOPS, which gives me 18,000 IOPS that the disks can handle. XIV's cache algorithms must be fantastic, or the workloads they are testing must be the most cache-friendly in the world.
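For what it's worth, here is the back-of-envelope arithmetic as a quick Python sketch; the 100 IOPS per spindle is my generous guess, not a measured figure:

# Back-of-envelope: aggregate random IOPS from raw spindle count.
# 100 IOPS per 7.2k SATA drive is a generous assumption, not a measurement.
drives = 180
iops_per_sata_drive = 100
raw_spindle_iops = drives * iops_per_sata_drive
print(raw_spindle_iops)  # 18000 -- a long way short of the claimed 60,000+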
I didn't get a chance to quiz him much beyond that, i.e. what type of workload, the read/write mix, and what the latency climbs to.
But I did get to see and play with the XIV GUI; if you get a chance, go and have a play! I think IBM got their money's worth for the GUI alone!! It's not as good as a VR network-management interface concept I saw at BT's labs once, but it's still pretty damn nice!
What they will do (they = vendors) is replay a real-world transaction log or some such. I'll wager it isn't an SPC-1 test. I trust they can drive it that fast, as there will be quite a bit of sequential read and you can pre-fetch that. Here is IBM doing "IOPS from disk read":
http://www.redbooks.ibm.com/redbooks/pdfs/sg246363.pdf
It shows the higher-end DS boxes delivering 86,000 IOPS; 86,000 IOPS over 224 drives works out to 383 IOPS per drive.
383 IOPS per 15K FC drive is pretty much "impossible" for random IO, so again there is quite a bit of pre-fetch going on. Pre-fetch large chunks, serve the 4K/8K reads out of cache, and your overall rate works out. You can Google and see 270 IOPS for modern 15K FC drives, random read, totally saturated. The naive seek-and-rotate math would tell you that you get somewhere near 180 IOPS, but everyone neglects to take TCQ into account. With SATA II, you have the equivalent, NCQ.
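Roughly, that naive math looks like this (a Python sketch with datasheet-ish seek figures I'm assuming, not numbers measured from any particular drive):

# Naive service time for one random read on a 15K drive, ignoring TCQ re-ordering.
avg_seek_ms = 3.5                          # assumed average seek time
avg_rotational_ms = 0.5 * 60000 / 15000    # half a revolution at 15,000 rpm = 2 ms
naive_iops = 1000 / (avg_seek_ms + avg_rotational_ms)
print(round(naive_iops))       # ~182 IOPS without command queuing

# The redbook figure spread across its spindles:
print(86000 // 224)            # 383 IOPS per drive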
So the mystery is: what is the ratio of sequential reads in that XIV benchmark? Pretty high, I'd say.
The XIV can drive its disks as fast as they can go; 15 quads will drive things a lot faster than the 2 or 4 quads that others have. They won't be running out of CPU any time soon.
Any rules against immediate double posts?
Good.
Perhaps it is a data-warehouse full table scan. The XIV (guessing) would read the 1 MB blob into cache, so those 64K DW IOs would have 16 in a row satisfied per XIV read. Maybe. But when you do the math and get really high IO rates per 7.2K SATA spindle, some sort of mostly sequential IO is the only way to hit that 60,000 IOPS (other than a cache cheat of re-reading the same blocks, and I don't believe that's the case).
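A quick sketch of why that kind of scan flatters the per-spindle numbers (the 1 MB staging size is my guess at XIV's behaviour, not something I've confirmed):

# If 1 MB is staged into cache per physical read, a stream of 64K IOs
# only touches the spindle once every 16 host IOs.
staged_kb = 1024          # assumed XIV read size per disk access
host_io_kb = 64           # data-warehouse style IO size
host_ios_per_disk_read = staged_kb // host_io_kb
print(host_ios_per_disk_read)                        # 16
print(60000 // host_ios_per_disk_read)               # 3750 physical reads/sec...
print(round(60000 / host_ios_per_disk_read / 180))   # ...about 21 per SATA spindle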
Rob, I beg to differ re: “383 IOPS per 15K FC drive is pretty much “impossible””
I've seen some recent drives easily sustain over 400 IOPS and even up to 500 IOPS. This all depends on I/O re-ordering, elevator seeking, etc.; these very clever algorithms, combined with the smarts in the back end of the DS8000 (which the other half of our team works on), mean that you can seriously reduce seek time, especially when a drive is very busy.
As for XIV: not sequential, not a table scan, a real OLTP-type simulation… I'm still collecting data myself on how it's done 🙂
Rob, true, I keep forgetting about TCQ and NCQ. But I still think XIV are doing something pretty clever to get 60k+ IOPS out of 180 SATA spindles; it seems an awful lot and it would be interesting to see how they are getting it.
I look forward to Barry posting an exposition on his blog!
“I’ve seen some recent drives easily sustain over 400 IOPs and even up to 500 IOPs.”
500 random-read IOPS? That's without a head seek; you're staying on the same track there. A small random sample, I'd say.
I've seen similar numbers claimed. A vendor (not one of the big 4) had us running random-read IOMeter to test their arrays; I had to chuckle. We were testing a 20 million block range on 100 GB LUNs carved out of 146 GByte disks. The heads had to move a few tracks at most.
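Just to put a number on how short-stroked that test was (assuming 512-byte blocks, which is my assumption about the setup rather than anything the vendor stated):

# How much of each disk that IOMeter test actually exercised.
block_bytes = 512                 # assumed block size
test_range_blocks = 20_000_000
disk_bytes = 146 * 10**9
test_range_gb = test_range_blocks * block_bytes / 10**9
print(round(test_range_gb, 1))    # ~10.2 GB of address range touched
print(round(100 * test_range_blocks * block_bytes / disk_bytes))   # ~7% of a 146 GB disk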
“This all depends on I/O re-ordering, elevator seek etc etc”
It's all about TCQ. IOs are re-ordered and the closest one is serviced next, with starvation taken into account. Of course it is a lot fancier than that, and I'm sure the drive vendors jealously guard their algorithms.
I've tested a number of 15K disks; I get a few more IOPS with a queue depth of 32 random read IOs versus 16 or so. At 32 pending IOs, I've seen the 270 IOPS that others mention. I also see 380 IOPS mentioned, and then read that they have write caching turned on at the drive level and are running a mixed IO stream.
Without knowing exactly what you are testing, how it is set up, and the model of the drives, this is a lot like waving at each other in passing. And oh, if you are hitting those numbers, I'd also be very interested in the transfer times.