More on RAID performance

So, I wrote a script that would try various combinations of stripe_cache and read_ahead settings, using bonnie++ to measure the impact on performance.
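For reference, this is roughly the shape of that sweep script. The device name, mount point and output file are placeholders rather than my actual setup, so treat it as a sketch:

#!/bin/bash
# Sketch of the parameter sweep: try each stripe_cache_size / read-ahead
# combination and keep the bonnie++ CSV output for later analysis.
# /dev/md0 and /mnt/test are assumed names, not necessarily the real ones.
DEV=md0
MOUNT=/mnt/test
RESULTS=results.csv

for stripe in 256 512 1024 2048 4096; do
    for ra in 1024 2048 4096 8192; do
        echo "$stripe" > /sys/block/$DEV/md/stripe_cache_size
        blockdev --setra "$ra" /dev/$DEV    # read-ahead is in 512-byte sectors
        # With -q, bonnie++ writes its machine-readable CSV line to stdout;
        # prefix it with the settings that produced it and append to the log.
        bonnie++ -d "$MOUNT" -u root -q | tail -n 1 \
            | sed "s/^/$stripe,$ra,/" >> "$RESULTS"
    done
done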

On my test / lower-spec server, I ran it once, and the results looked a bit funny. I wrote a second script that combined all the results and put them into a spreadsheet, normalising the throughput for each of the bonnie++ measures to 100% of the best result, and colour-coding the cells: those close to 100% green, those a bit worse orange, and those worse still red. That gave me tables like this:

PutBlock 3

              RA 1024   RA 2048   RA 4096   RA 8192
Stripe  256    54.75%    50.62%    54.15%    54.52%
Stripe  512    80.35%    72.37%    85.11%    48.63%
Stripe 1024    78.55%    45.86%    89.34%    75.62%
Stripe 2048    83.17%    74.32%    71.44%    98.80%
Stripe 4096    82.90%    83.64%    81.80%   100.00%

And this:

Seeks 7

              RA 1024   RA 2048   RA 4096   RA 8192
Stripe  256    63.75%    33.93%    77.76%    43.98%
Stripe  512    82.16%    95.40%    91.80%    97.70%
Stripe 1024    38.86%   100.00%    36.39%    93.42%
Stripe 2048    63.06%    59.55%    59.81%    97.58%
Stripe 4096    72.24%    86.42%    89.12%    88.76%

As expected, the results differed across the different measures: some benefited from a larger stripe_cache, some were harmed by it. So I consolidated them into a minimum and an average for each combination, trying to find the settings that didn't hurt any particular metric too much (there's a sketch of this consolidation step after the tables below). This gave me:

Minimum

              RA 1024   RA 2048   RA 4096   RA 8192
Stripe  256    44.74%    19.12%    54.15%    20.17%
Stripe  512    32.98%    34.34%    70.32%    48.63%
Stripe 1024    30.46%    45.86%    21.39%    75.43%
Stripe 2048    29.82%    19.35%    55.74%    48.43%
Stripe 4096    60.16%    49.24%    77.41%    36.57%

Average

              RA 1024   RA 2048   RA 4096   RA 8192
Stripe  256    55.06%    38.76%    73.00%    52.03%
Stripe  512    68.45%    74.56%    86.31%    81.00%
Stripe 1024    59.09%    78.48%    59.00%    89.06%
Stripe 2048    71.58%    60.79%    73.95%    77.59%
Stripe 4096    76.35%    67.12%    86.07%    78.23%
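The consolidation step was conceptually just this (an awk sketch, assuming the interesting bonnie++ fields have already been pulled out as positive numeric values in columns 3 onwards of results.csv, and that bigger is better for each of them; the real script also handled the colour coding in the spreadsheet):

# Normalise each measure to the best result seen for that measure, then
# report the minimum and average percentage for each stripe/RA combination.
# Columns 1-2 are stripe_cache and read_ahead; columns 3+ are the measures.
awk -F, '
    {
        key[NR] = $1 "," $2
        for (i = 3; i <= NF; i++) {
            val[NR,i] = $i
            if ($i > best[i]) best[i] = $i
        }
        rows = NR; cols = NF
    }
    END {
        for (r = 1; r <= rows; r++) {
            min = 1; sum = 0
            for (i = 3; i <= cols; i++) {
                pct = val[r,i] / best[i]
                if (pct < min) min = pct
                sum += pct
            }
            printf "%s  min=%.2f%%  avg=%.2f%%\n", key[r], 100 * min, 100 * sum / (cols - 2)
        }
    }
' results.csv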

The problem is that some of the measures just look funny. I was thinking that the 86.31% average, which had no measure worse than 70%, was pretty good (RA of 4096, stripe of 512). The thing is, it makes no sense that the minimum was only 70%. So I did another run, which gave this summary:

Minimum

              RA 1024   RA 2048   RA 4096   RA 8192
Stripe  256    64.70%    18.02%    48.38%    14.47%
Stripe  512    32.60%    46.00%    58.48%    56.90%
Stripe 1024    74.53%    53.95%    38.91%    53.92%
Stripe 2048    44.25%    35.40%    76.09%    27.19%
Stripe 4096    74.51%    65.63%    72.29%    38.90%

Average

              RA 1024   RA 2048   RA 4096   RA 8192
Stripe  256    80.04%    63.47%    70.18%    51.37%
Stripe  512    67.23%    78.47%    89.11%    78.39%
Stripe 1024    87.61%    80.83%    65.80%    80.20%
Stripe 2048    77.47%    62.84%    86.13%    60.23%
Stripe 4096    88.54%    82.93%    87.63%    75.14%

And it looked nothing like the first one.  So then I did a third run, and that gave me this:

Minimum

              RA 1024   RA 2048   RA 4096   RA 8192
Stripe  256    62.23%    60.30%    39.87%    32.92%
Stripe  512    84.43%    46.62%    65.63%    52.95%
Stripe 1024    44.37%    27.95%    30.57%    67.63%
Stripe 2048    70.47%    74.60%    66.64%    45.58%
Stripe 4096    43.03%    51.92%    70.09%    49.50%

Average

              RA 1024   RA 2048   RA 4096   RA 8192
Stripe  256    82.83%    73.95%    66.00%    70.29%
Stripe  512    87.91%    72.64%    79.51%    74.65%
Stripe 1024    80.74%    55.33%    59.99%    79.28%
Stripe 2048    86.67%    81.20%    82.75%    73.36%
Stripe 4096    72.40%    81.32%    78.53%    79.35%

Still no correlation between the runs. Sigh.

My learning here is that something else is affecting my performance. There are definitely some middle-of-the-road settings that deliver a benefit and don't hurt any particular element of performance too much, but there's no magic setting here. Further to the point, the combination that is perhaps best here is 4096/4096, but that consumes a lot of RAM, and I'm not sure it's worth it: it's not that much better than some of the other settings. The scripts and spreadsheet are available if anyone wants them.
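On the RAM point, my understanding from the md documentation is that the stripe cache costs roughly stripe_cache_size x 4 KiB per member disk, so as a back-of-envelope calculation (the 4-disk array here is just an assumed example, not my actual layout):

# stripe_cache_size entries x 4 KiB page x number of member disks
echo $(( 4096 * 4 * 4 )) KiB    # -> 65536 KiB, i.e. 64 MiB per array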

Next, I ran the same process on my other server (my main server). Interestingly, the throughput on this server was dramatically lower: all the tests came in at about 30% of the throughput of my nominally less capable server, at all settings. Hmm. I then tried using "disk utility" directly against the drives and the RAID arrays, and that gave much better results, similar to the second server. Hmm.
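As another cross-check on that kind of discrepancy, a quick raw sequential read straight off the array device (bypassing LVM and the filesystem) is a useful sanity test. A sketch, with /dev/md0 as an assumed device name:

# Raw sequential read from the array itself, no filesystem involved.
sudo dd if=/dev/md0 of=/dev/null bs=1M count=4096 iflag=direct
# Or the classic quick-and-dirty timing test:
sudo hdparm -t /dev/md0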

So, overall, my thought is that I need to look further afield for my performance issues. As noted in the earlier post, the main problem on my secondary server (VM throughput) I've tracked down to RAM settings, not RAID. And on my main server, I think the problem lies between the arrays and the file system, so I'll be looking more closely at LVM and the ext4 settings (maybe stride and stripe-width are useful after all) when I get time.
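For the record, the ext4 stride and stripe-width values are derived from the RAID geometry: stride = chunk size / filesystem block size, and stripe-width = stride x number of data disks. A sketch, assuming a RAID array with 4 data disks, a 512 KiB chunk and 4 KiB ext4 blocks, and a made-up LV name (those numbers and names are illustrative, not my actual layout):

# stride       = 512 KiB chunk / 4 KiB block = 128
# stripe-width = 128 stride * 4 data disks   = 512
mkfs.ext4 -E stride=128,stripe-width=512 /dev/mainvg/datalv

# Or, on an existing filesystem (note tune2fs spells it stripe_width):
tune2fs -E stride=128,stripe_width=512 /dev/mainvg/datalv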
