Storage Spaces Direct Mirroring vs MRV (Parity) performance

HP Lefthand

Several years ago (about six or seven, if I remember correctly), before Storage Spaces and Storage Spaces Direct (S2D) existed, a vendor called LeftHand did more or less the same trick. LeftHand was bought by HP and the product was renamed to P4000 and later to HP StoreVirtual 4000. The principle behind this type of storage is different from other vendors: each storage node runs a local RAID level across its own disks, and additional nodes with the exact same hardware are pooled into a cluster. On top of the cluster you create volumes with a network RAID level. So a disk in a node (one disk in the case of a single local RAID 5 pool) can die without losing the node. In this setup you have redundancy on the storage node itself, but also across the entire cluster.

You could start with two systems and create a mirrored volume. The raw capacity of one node, minus the local RAID overhead, was the total usable capacity. Take a two-node setup with 12x 1TB disks per node: 12 disks in a RAID 5 give roughly 11TB, and after some lost bits and bytes give or take 10.5TB per node, so about 21TB of raw capacity across the two nodes. With mirrored volumes you end up with 10.5TB of usable space, because the data is mirrored across both nodes. Mirroring data like that gives you high availability at the node level: you can lose a storage node without the volume going offline, but it costs you half of the raw storage capacity. If you add an extra node to make a total of three nodes, you have 10.5 * 3 = 31.5TB of raw capacity, and taking the mirroring into consideration about 15.7TB of usable capacity. This keeps going: in a ten-node setup you have 105TB of raw capacity and roughly 50+TB of usable capacity. So you always lose half of the capacity in a network mirror volume. If you choose a 3-way or even a 4-way mirror (I don't know why you would, but it is possible) you get massive redundancy and performance but terrible efficiency, because in a 4-way mirror on four nodes only 25% of the raw capacity is available for data.
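Just to make the arithmetic explicit, here is the same calculation as a tiny sketch with the example numbers above (plain math, nothing vendor specific):

```powershell
# Quick sanity check of the capacity arithmetic above (example values only,
# not a LeftHand or S2D API - just plain math).
$nodes        = 3        # number of storage nodes
$rawPerNode   = 10.5     # TB usable per node after local RAID 5
$mirrorCopies = 2        # a network mirror keeps 2 copies of every block

$rawCluster = $nodes * $rawPerNode
$usable     = $rawCluster / $mirrorCopies

"{0} nodes: {1} TB raw, ~{2} TB usable with a {3}-copy mirror" -f `
    $nodes, $rawCluster, $usable, $mirrorCopies
# 3 nodes: 31.5 TB raw, ~15.75 TB usable with a 2-copy mirror
```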

Yes, it seems like a waste of space, so LeftHand (and later HP) came up with Network RAID 5. With three or more nodes you could set up a volume with Network RAID 5: the data is placed on two nodes and the third node holds the parity. You could still lose a disk in a node or an entire node… BUT it was dreadfully slow, and HP recommended AGAINST setting up Network RAID 5.
So I hear you thinking: what is with all this “old stuff” about LeftHand? Well, Storage Spaces Direct works on much the same principle, and the same caveat applies to parity volumes… But bear with me on this 🙂

Mirror vs MRV Volumes

With Storage Spaces the resiliency level is set at the volume level. That means you can create mirrored and parity volumes, and also a new flavor named Mixed Resiliency Volumes (MRV) with both a mirrored and a parity portion. With traditional hardware RAID, mirrored disks are always faster than parity disks because of the parity calculation, and the same applies to S2D. With mirrored volumes all data is mirrored across a number of nodes and disks in the cluster. By default Storage Spaces Direct uses a 3-way mirror layout: every block written to disk is copied to two other nodes (in a cluster of three nodes or more). Because of this you lose 2/3 of your raw capacity by default.
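As an illustration (not the exact commands used in this project), creating the two basic flavors could look roughly like this; the pool name, volume names and sizes are placeholders:

```powershell
# Sketch of creating a mirrored and a parity volume on an S2D cluster.
# "S2D*" matches the default pool name created by Enable-ClusterStorageSpacesDirect.

# 3-way mirrored volume (default mirror layout on 3+ nodes)
New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName "Mirror01" `
    -FileSystem CSVFS_ReFS -ResiliencySettingName Mirror -Size 1TB

# Parity volume
New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName "Parity01" `
    -FileSystem CSVFS_ReFS -ResiliencySettingName Parity -Size 1TB
```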

Microsoft S2D Program Manager Cosmos Darwin created a nice website to calculate how much usable space you get with different combinations of disks, capacity and resiliency settings; check it out at http://aka.ms/s2dcalc

When you create a 1 TB 3-way mirrored volume, that volume has a 3 TB footprint. That is a simple calculation: the 1 TB of data is copied two additional times in a 3-way mirror, which makes 3 TB. When you create a 1 TB MRV with, for example, 30% mirrored capacity and 70% parity capacity, we have to do a bit more math. The 300 GB mirrored portion times 3 is 900 GB. The 700 GB parity portion needs roughly double the space, so 1,400 GB. The total footprint of a 1 TB MRV is 900 GB + 1,400 GB = 2,300 GB. So compared to a full 3-way mirror, an MRV saves you about 700 GB of footprint on a 1 TB volume.
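The same arithmetic as a quick sketch (plain math, not a Storage Spaces API; the 2x parity multiplier is the assumption used above, and real parity efficiency depends on the number of nodes):

```powershell
# Footprint of a mixed resiliency volume, using the multipliers assumed above
# (3x for 3-way mirror, ~2x for parity).
$volumeSizeGB    = 1000
$mirrorFraction  = 0.30
$parityFraction  = 0.70

$mirrorFootprint = $volumeSizeGB * $mirrorFraction * 3   # 900 GB
$parityFootprint = $volumeSizeGB * $parityFraction * 2   # 1400 GB
$totalFootprint  = $mirrorFootprint + $parityFootprint   # 2300 GB

"MRV footprint: {0} GB (vs {1} GB for a full 3-way mirror)" -f `
    $totalFootprint, ($volumeSizeGB * 3)
```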

Because of the massive loss of capacity with a 3-way mirror, people (most of them the people responsible for the budget) push for, or at least strongly recommend considering, parity or a form of mixed resiliency to get more GBs/TBs out of their hardware. But at what cost?

Test Hardware

During a recent project we got to the point where we needed to set up volumes, but I wanted to test this first, so the customer could make a good decision based on capacity, performance and costs instead of just capacity and costs. Below is the hardware I used for the test:

7 nodes, HP DL380 Gen9 servers:
  • All nodes have 256GB of memory
  • 2 processors with 8 cores each (I know 8 might be a bit low, but since Datacenter licensing has changed, lower core counts can be cheaper in some cases)
  • For the disks, 3x 960GB mixed-I/O SSDs and 6x 4TB HDDs per node (unfortunately no NVMe drives)
  • 2x 10Gbit Mellanox HP 546FLR-SFP+ network cards with RDMA (beware, RDMA is very important with S2D). Driver version: 5.25.12665.0, firmware version: 2.40.5016.
The 7 nodes are joined in a cluster and set up with Storage Spaces Direct. The setup and configuration of S2D is not discussed in this blog, that’s for a later one 😉
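To give a rough idea anyway (the node and cluster names are made up, and this skips cluster validation, networking and cache configuration), the basic steps look something like this:

```powershell
# Rough outline only - names are hypothetical placeholders.
$nodes = "S2D-N01","S2D-N02","S2D-N03","S2D-N04","S2D-N05","S2D-N06","S2D-N07"

# Verify the Mellanox adapters actually expose RDMA before going further
Invoke-Command -ComputerName $nodes { Get-NetAdapterRdma | Where-Object Enabled }

# Create the cluster and enable Storage Spaces Direct on it
New-Cluster -Name "S2D-CLU01" -Node $nodes -NoStorage
Enable-ClusterStorageSpacesDirect -CimSession "S2D-CLU01"
```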

VMFleet as workload

To simulate the workload I used VMFleet. VMFleet is a PowerShell tool based on DiskSpd and a set of PowerShell scripts. You can get a copy of VMFleet here, and a guide on how to set it up here.

Basically you set up several volumes as desired, either three-way mirror, parity or MRV, and let VMFleet deploy an amount of VMs on them. You start the VMs and let them connect to a volume over an internal network VMFleet creates. On the volume a test file is placed that all the VMs run against at the same time to generate load.
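As a rough illustration of the deployment step, something like the following, run from the VMFleet script folder; the paths and credentials are placeholders and the exact parameter names may differ per VMFleet version, so treat this as a sketch rather than the exact commands used here:

```powershell
# Deploy the fleet from a gold VHDX (illustrative path and credentials)
.\create-vmfleet.ps1 -BaseVHD "C:\ClusterStorage\Collect\fleet-gold.vhdx" `
    -VMs 4 -AdminPass "P@ssw0rd" -ConnectUser "domain\user" -ConnectPass "P@ssw0rd"

# Start the fleet and watch cluster-wide IOPS while the tests run
.\start-vmfleet.ps1
.\watch-cluster.ps1
```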

For my test I set up 4 VMs per CSV, and every node has its “own” CSV, so a total of 28 VMs were created. All the tests were run with 40 outstanding I/Os, 2 threads, and a duration of 300 seconds.
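Expressed as a VMFleet sweep, one of those runs looks roughly like this (the parameter letters follow DiskSpd conventions; the exact switches may differ per VMFleet version, so this is an illustration of the settings above, not a copy of my command line):

```powershell
# 4K blocks, 2 threads, 40 outstanding I/O, 30% write, 300 seconds
# (-b block size in KB, -t threads, -o outstanding I/O, -w write %, -d duration)
.\start-sweep.ps1 -b 4 -t 2 -o 40 -w 30 -d 300
```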

Test Mirrored Volumes

I then kicked off all six tests on the 7 mirrored volumes, with the parameters and results below.

| Test Name | Access | Block size | IOPS | MB/s |
|---|---|---|---|---|
| 100% Read / 0% Write | 100% Read | 4K | 1,043,406 | 4,273 |
| 70% Read / 30% Write | 70% Read | 4K | 576,155 | 2,356 |
| 0% Read / 100% Write | 0% Read | 4K | 254,686 | 1,043 |
| 100% Read / 0% Write | 100% Read | 64K | 150,637 | 9,854 |
| 70% Read / 30% Write | 70% Read | 64K | 52,470 | 3,364 |
| 0% Read / 100% Write | 0% Read | 64K | 18,515 | 1,209 |

Test MRV Volumes

All VMs and volumes were deleted, and I set up 7 new volumes, this time with a mixed resiliency layout of 30% mirrored data and 70% parity data. The same tests were run, as you can see below, but with completely different results.
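For illustration, a 1 TB volume with that 30/70 split could be created roughly like this, assuming the default Performance (mirror) and Capacity (parity) tiers that Enable-ClusterStorageSpacesDirect creates on a hybrid system; the names and sizes are placeholders, not the exact commands from this test:

```powershell
# Mixed resiliency volume: 300 GB mirror tier + 700 GB parity tier
New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName "MRV01" `
    -FileSystem CSVFS_ReFS `
    -StorageTierFriendlyNames Performance, Capacity `
    -StorageTierSizes 300GB, 700GB
```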

| Test Name | Access | Block size | IOPS | MB/s |
|---|---|---|---|---|
| 100% Read / 0% Write | 100% Read | 4K | 940,226 | 3,849 |
| 70% Read / 30% Write | 70% Read | 4K | 159,698 | 653 |
| 0% Read / 100% Write | 0% Read | 4K | 52,283 | 212 |
| 100% Read / 0% Write | 100% Read | 64K | 128,067 | 8,391 |
| 70% Read / 30% Write | 70% Read | 64K | 19,278 | 1,190 |
| 0% Read / 100% Write | 0% Read | 64K | 6,914 | 394 |

Conclusion

When we compare the tests on the two different types of volumes, we can see that the first test in both scenarios, with 100% read, is not that different, since it is all served from the SSDs that are used as cache. But as soon as we start writing data, performance is a lot lower on the MRVs.

When we put it into numbers, with percentages:

| Test (IOPS) | Block size | Mirror | MRV | Mirror faster than MRV |
|---|---|---|---|---|
| 100% Read / 0% Write | 4K | 1,043,406 | 940,226 | 10.97% |
| 70% Read / 30% Write | 4K | 576,155 | 159,698 | 260.78% |
| 0% Read / 100% Write | 4K | 254,686 | 52,283 | 387.13% |
| 100% Read / 0% Write | 64K | 150,637 | 128,067 | 17.62% |
| 70% Read / 30% Write | 64K | 52,470 | 19,278 | 172.18% |
| 0% Read / 100% Write | 64K | 18,515 | 6,914 | 167.79% |

| Test (MB/s) | Block size | Mirror | MRV | Mirror faster than MRV |
|---|---|---|---|---|
| 100% Read / 0% Write | 4K | 4,273 | 3,849 | 11.02% |
| 70% Read / 30% Write | 4K | 2,356 | 653 | 260.80% |
| 0% Read / 100% Write | 4K | 1,043 | 212 | 391.98% |
| 100% Read / 0% Write | 64K | 9,854 | 8,391 | 17.44% |
| 70% Read / 30% Write | 64K | 3,364 | 1,190 | 182.69% |
| 0% Read / 100% Write | 64K | 1,209 | 394 | 206.85% |
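To be explicit about how the percentage column is calculated: it is the mirror result relative to the MRV result, for example:

```powershell
# How the percentage column is derived, here for the 4K 70/30 IOPS numbers
# from the table above: (mirror - MRV) / MRV * 100
$mirrorIops = 576155
$mrvIops    = 159698
"{0:N2}%" -f (($mirrorIops - $mrvIops) / $mrvIops * 100)   # 260.78%
```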

So, considering these values and results: maybe it is better to add a couple of ‘cheap’ SATA HDDs to the setup and keep a good balance of resiliency, performance and capacity in your cluster, before you reach for parity or mixed resiliency volumes for the cheap capacity win at the price of a huge dip in performance. Or create a couple of parity or MRV volumes only for some static, low-I/O, low-bandwidth VMs: the capacity gain is modest, but you also lose as little performance as possible. And if you really want more performance, go all-SSD or all-NVMe. That is way more expensive 🙂 and the mirrored vs. MRV performance penalty still applies.

So, we’re at the end of this blog. A big thank you to my colleague Darryl for his time in all our discussions during our projects with Storage Spaces Direct and for reviewing this blog.

If you have any questions or feedback, please feel free to leave a comment or contact me on Twitter.

Pascal

2 thoughts on “Storage Spaces Direct Mirroring vs MRV (Parity) performance”

  1. One needs to measure cache hits/misses and publish those to help folks understand just how much hot data is spilling out of the cache in the MRV/MAP setup especially.

    Cache sizing could make a big difference on those numbers depending on the cache hit count.

  2. Hi Philip,

    Good comment! Yes, you are right, it could be a good option when you have a small set of hot data. Much has been learned since I wrote this blog. I need to update it, it’s been static too long 🙂

