25 Jun 2012 09:24
[patch 00/10 v3] raid5: improve write performance for fast storage
Shaohua Li <shli <at> kernel.org>
2012-06-25 07:24:47 GMT
2012-06-25 07:24:47 GMT
Hi, Like raid 1/10, raid5 uses one thread to handle stripe. In a fast storage, the thread becomes a bottleneck. raid5 can offload calculation like checksum to async threads. And if storge is fast, scheduling async work and running async work will introduce heavy lock contention of workqueue, which makes such optimization useless. And calculation isn't the only bottleneck. For example, in my test raid5 thread must handle > 450k requests per second. Just doing dispatch and completion will make raid5 thread incapable. The only chance to scale is using several threads to handle stripe. Simpliy using several threads doesn't work. conf->device_lock is a global lock which is heavily contended. patch 3-9 in the set are trying to address this problem. With them, when several threads are handling stripe, device_lock is still contended but takes much less cpu time and not the heavist locking any more. Even the 10th patch isn't accepted, the patch 3-9 look good to merge. I did stress test (block size range 1k - 64k with a small total size, so overlap/stripe sharing guaranteed) with the patches and looks fine except some issues fixed in the first two patches. That issues aren't related to the series, but I need them in stress test. With the locking issue solved (at least largely), switching stripe handling to multiple threads is trival. Threads are still created in advance (default thread number is disk number) and can be reconfigured by user. Automatically creating and reaping threads is great, but I'm worrying about numa binding. In a 3-disk raid5 setup, 2 extra threads can provide 130% throughput(Continue reading)
> <at> <at> -3984,6 +3985,51 <at> <at> static struct stripe_head *__get_priorit
> return sh;
> }
>
> +#define raid5_unplug_list(mdcb) (struct list_head *)(mdcb + 1)
I really don't like this sort of construct. It is much cleaner (I think) to
add to a structure by embedding it in a larger structure, then using
"container_of" to map from the inner to the outer structure. So I have
changed that.
> <at> <at> -4114,7 +4161,14 <at> <at> static void make_request(struct mddev *m
> if ((bi->bi_rw & REQ_SYNC) &&
> !test_and_set_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
> atomic_inc(&conf->preread_active_stripes);
> - release_stripe(sh);
RSS Feed