Shaohua Li | 12 Mar 04:04 2012

[patch 0/7] Add TRIM support for raid linear/0/1/10

The patches add TRIM support for raid linear/0/1/10. I'll add TRIM support for
raid 4/5/6 later. The implementation is pretty straightforward and
self-explained.

Thanks,
Shaohua
Shaohua Li | 12 Mar 04:04 2012

[patch 2/7] md: linear supports TRIM

This makes md linear support TRIM.

Signed-off-by: Shaohua Li <shli <at> fusionio.com>
---
 drivers/md/linear.c |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

Index: linux/drivers/md/linear.c
===================================================================
--- linux.orig/drivers/md/linear.c	2012-03-09 16:56:41.173790011 +0800
+++ linux/drivers/md/linear.c	2012-03-12 10:15:44.916611071 +0800
 <at>  <at>  -129,6 +129,7  <at>  <at>  static struct linear_conf *linear_conf(s
 	struct linear_conf *conf;
 	struct md_rdev *rdev;
 	int i, cnt;
+	bool discard_supported = false;

 	conf = kzalloc (sizeof (*conf) + raid_disks*sizeof(struct dev_info),
 			GFP_KERNEL);
 <at>  <at>  -171,6 +172,8  <at>  <at>  static struct linear_conf *linear_conf(s
 		conf->array_sectors += rdev->sectors;
 		cnt++;

+		if (blk_queue_discard(bdev_get_queue(rdev->bdev)))
+			discard_supported = true;
 	}
 	if (cnt != raid_disks) {
 		printk(KERN_ERR "md/linear:%s: not enough drives present. Aborting!\n",
 <at>  <at>  -178,6 +181,11  <at>  <at>  static struct linear_conf *linear_conf(s
 		goto out;
(Continue reading)

Shaohua Li | 12 Mar 04:04 2012

[patch 1/7] block: makes bio_split support bio without data

discard bio hasn't data attached. We hit a BUG_ON with such bio. This makes
bio_split works for such bio.

Signed-off-by: Shaohua Li <shli <at> fusionio.com>
---
 fs/bio.c |   22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

Index: linux/fs/bio.c
===================================================================
--- linux.orig/fs/bio.c	2012-03-09 16:56:41.203790008 +0800
+++ linux/fs/bio.c	2012-03-12 10:10:40.696612399 +0800
 <at>  <at>  -1492,7 +1492,7  <at>  <at>  struct bio_pair *bio_split(struct bio *b
 	trace_block_split(bdev_get_queue(bi->bi_bdev), bi,
 				bi->bi_sector + first_sectors);

-	BUG_ON(bi->bi_vcnt != 1);
+	BUG_ON(bi->bi_vcnt != 1 && bi->bi_vcnt != 0);
 	BUG_ON(bi->bi_idx != 0);
 	atomic_set(&bp->cnt, 3);
 	bp->error = 0;
 <at>  <at>  -1502,17 +1502,19  <at>  <at>  struct bio_pair *bio_split(struct bio *b
 	bp->bio2.bi_size -= first_sectors << 9;
 	bp->bio1.bi_size = first_sectors << 9;

-	bp->bv1 = bi->bi_io_vec[0];
-	bp->bv2 = bi->bi_io_vec[0];
-	bp->bv2.bv_offset += first_sectors << 9;
-	bp->bv2.bv_len -= first_sectors << 9;
-	bp->bv1.bv_len = first_sectors << 9;
(Continue reading)

Shaohua Li | 12 Mar 04:04 2012

[patch 4/7] md: raid 1 supports TRIM

This makes md raid 1 support TRIM.
If one disk supports discard and another not, or one has discard_zero_data and
another not, there could be inconsistent between data from such disks. But this
should not matter, discarded data is useless. This will add extra copy in rebuild
though.

Signed-off-by: Shaohua Li <shli <at> fusionio.com>
---
 drivers/md/raid1.c |   21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

Index: linux/drivers/md/raid1.c
===================================================================
--- linux.orig/drivers/md/raid1.c	2012-03-12 10:09:42.426612652 +0800
+++ linux/drivers/md/raid1.c	2012-03-12 10:16:50.276610783 +0800
 <at>  <at>  -673,7 +673,12  <at>  <at>  static void flush_pending_writes(struct
 		while (bio) { /* submit pending writes */
 			struct bio *next = bio->bi_next;
 			bio->bi_next = NULL;
-			generic_make_request(bio);
+			if (unlikely((bio->bi_rw & REQ_DISCARD) &&
+			    !blk_queue_discard(bdev_get_queue(bio->bi_bdev))))
+				/* Just ignore it */
+				bio_endio(bio, 0);
+			else
+				generic_make_request(bio);
 			bio = next;
 		}
 	} else
 <at>  <at>  -835,6 +840,7  <at>  <at>  static void make_request(struct mddev *m
(Continue reading)

Shaohua Li | 12 Mar 04:04 2012

[patch 5/7] md: raid 10 supports TRIM

This makes md raid 10 support TRIM.
If one disk supports discard and another not, or one has discard_zero_data and
another not, there could be inconsistent between data from such disks. But this
should not matter, discarded data is useless. This will add extra copy in rebuild
though.

Signed-off-by: Shaohua Li <shli <at> fusionio.com>
---
 drivers/md/raid10.c |   26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)

Index: linux/drivers/md/raid10.c
===================================================================
--- linux.orig/drivers/md/raid10.c	2012-03-12 10:09:42.426612652 +0800
+++ linux/drivers/md/raid10.c	2012-03-12 10:19:05.046610195 +0800
 <at>  <at>  -800,7 +800,12  <at>  <at>  static void flush_pending_writes(struct
 		while (bio) { /* submit pending writes */
 			struct bio *next = bio->bi_next;
 			bio->bi_next = NULL;
-			generic_make_request(bio);
+			if (unlikely((bio->bi_rw & REQ_DISCARD) &&
+			    !blk_queue_discard(bdev_get_queue(bio->bi_bdev))))
+				/* Just ignore it */
+				bio_endio(bio, 0);
+			else
+				generic_make_request(bio);
 			bio = next;
 		}
 	} else
 <at>  <at>  -926,6 +931,7  <at>  <at>  static void make_request(struct mddev *m
(Continue reading)

Shaohua Li | 12 Mar 04:04 2012

[patch 7/7] blk: use correct sectors limitation for discard request

max_discard_sectors doesn't equal to max_sectors/max_hw_sectors. Without this,
discard request merge might be ignored.

Signed-off-by: Shaohua Li <shli <at> fusionio.com>
---
 block/blk-merge.c      |    9 +++++++--
 include/linux/blkdev.h |    5 +++++
 2 files changed, 12 insertions(+), 2 deletions(-)

Index: linux/block/blk-merge.c
===================================================================
--- linux.orig/block/blk-merge.c	2012-03-09 14:05:35.562062857 +0800
+++ linux/block/blk-merge.c	2012-03-09 14:07:55.432062246 +0800
 <at>  <at>  -228,13 +228,16  <at>  <at>  no_merge:
 int ll_back_merge_fn(struct request_queue *q, struct request *req,
 		     struct bio *bio)
 {
-	unsigned short max_sectors;
+	unsigned int max_sectors;

 	if (unlikely(req->cmd_type == REQ_TYPE_BLOCK_PC))
 		max_sectors = queue_max_hw_sectors(q);
 	else
 		max_sectors = queue_max_sectors(q);

+	if (unlikely(req->cmd_flags & REQ_DISCARD))
+		max_sectors = queue_max_discard_sectors(q);
+
 	if (blk_rq_sectors(req) + bio_sectors(bio) > max_sectors) {
 		req->cmd_flags |= REQ_NOMERGE;
(Continue reading)

Vivek Goyal | 13 Mar 17:00 2012
Picon

Re: [patch 7/7] blk: use correct sectors limitation for discard request

On Mon, Mar 12, 2012 at 11:04:19AM +0800, Shaohua Li wrote:
> max_discard_sectors doesn't equal to max_sectors/max_hw_sectors. Without this,
> discard request merge might be ignored.
> 
> Signed-off-by: Shaohua Li <shli <at> fusionio.com>
> ---
>  block/blk-merge.c      |    9 +++++++--
>  include/linux/blkdev.h |    5 +++++
>  2 files changed, 12 insertions(+), 2 deletions(-)
> 
> Index: linux/block/blk-merge.c
> ===================================================================
> --- linux.orig/block/blk-merge.c	2012-03-09 14:05:35.562062857 +0800
> +++ linux/block/blk-merge.c	2012-03-09 14:07:55.432062246 +0800
>  <at>  <at>  -228,13 +228,16  <at>  <at>  no_merge:
>  int ll_back_merge_fn(struct request_queue *q, struct request *req,
>  		     struct bio *bio)
>  {
> -	unsigned short max_sectors;
> +	unsigned int max_sectors;
>  
>  	if (unlikely(req->cmd_type == REQ_TYPE_BLOCK_PC))
>  		max_sectors = queue_max_hw_sectors(q);
>  	else
>  		max_sectors = queue_max_sectors(q);
>  
> +	if (unlikely(req->cmd_flags & REQ_DISCARD))
> +		max_sectors = queue_max_discard_sectors(q);
> +

(Continue reading)

Shaohua Li | 12 Mar 04:04 2012

[patch 6/7] blk: add plug for blkdev_issue_discard

In raid 0 case, a big discard request is divided into several small requests
in chunk_size unit. Such requests can be merged in low layer if we have
correct plug added. This should improve the performance a little bit.

raid 10 case doesn't matter, as we dispatch request in a separate thread
and there is plug there.

Signed-off-by: Shaohua Li <shli <at> fusionio.com>
---
 block/blk-lib.c |    3 +++
 1 file changed, 3 insertions(+)

Index: linux/block/blk-lib.c
===================================================================
--- linux.orig/block/blk-lib.c	2012-03-09 16:56:41.043790011 +0800
+++ linux/block/blk-lib.c	2012-03-12 10:21:38.716609525 +0800
 <at>  <at>  -47,6 +47,7  <at>  <at>  int blkdev_issue_discard(struct block_de
 	struct bio_batch bb;
 	struct bio *bio;
 	int ret = 0;
+	struct blk_plug plug;

 	if (!q)
 		return -ENXIO;
 <at>  <at>  -78,6 +79,7  <at>  <at>  int blkdev_issue_discard(struct block_de
 	bb.flags = 1 << BIO_UPTODATE;
 	bb.wait = &wait;

+	blk_start_plug(&plug);
 	while (nr_sects) {
(Continue reading)

Vivek Goyal | 13 Mar 16:51 2012
Picon

Re: [patch 6/7] blk: add plug for blkdev_issue_discard

On Mon, Mar 12, 2012 at 11:04:18AM +0800, Shaohua Li wrote:
> In raid 0 case, a big discard request is divided into several small requests
> in chunk_size unit. Such requests can be merged in low layer if we have
> correct plug added. This should improve the performance a little bit.

Martin posted a patch to remove the support for allowing merging of discard
requests. But this seems to be a reasonable use case for allowing mering
discard requests. CCing Martin.

Thanks
Vivek
Martin K. Petersen | 13 Mar 18:04 2012
Picon

Re: [patch 6/7] blk: add plug for blkdev_issue_discard

>>>>> "Vivek" == Vivek Goyal <vgoyal <at> redhat.com> writes:

Vivek> On Mon, Mar 12, 2012 at 11:04:18AM +0800, Shaohua Li wrote:
>> In raid 0 case, a big discard request is divided into several small
>> requests in chunk_size unit. Such requests can be merged in low layer
>> if we have correct plug added. This should improve the performance a
>> little bit.

Vivek> Martin posted a patch to remove the support for allowing merging
Vivek> of discard requests. But this seems to be a reasonable use case
Vivek> for allowing mering discard requests. CCing Martin.

Merging discard requests is hard given how we need to prepare the
command payload at the bottom of the stack. The current upstream merge
code pretends to be working but it actually doesn't. That's why I want
it dead and buried.

I have some changes pending (that I need for the REQ_COPY support) that
will make merging of non-rw requests easier to deal with. But that's a
kernel release cycle away...

--

-- 
Martin K. Petersen	Oracle Linux Engineering
Vivek Goyal | 13 Mar 18:14 2012
Picon

Re: [patch 6/7] blk: add plug for blkdev_issue_discard

On Tue, Mar 13, 2012 at 01:04:58PM -0400, Martin K. Petersen wrote:
> >>>>> "Vivek" == Vivek Goyal <vgoyal <at> redhat.com> writes:
> 
> Vivek> On Mon, Mar 12, 2012 at 11:04:18AM +0800, Shaohua Li wrote:
> >> In raid 0 case, a big discard request is divided into several small
> >> requests in chunk_size unit. Such requests can be merged in low layer
> >> if we have correct plug added. This should improve the performance a
> >> little bit.
> 
> Vivek> Martin posted a patch to remove the support for allowing merging
> Vivek> of discard requests. But this seems to be a reasonable use case
> Vivek> for allowing mering discard requests. CCing Martin.
> 
> Merging discard requests is hard given how we need to prepare the
> command payload at the bottom of the stack. The current upstream merge
> code pretends to be working but it actually doesn't. That's why I want
> it dead and buried.
> 
> I have some changes pending (that I need for the REQ_COPY support) that
> will make merging of non-rw requests easier to deal with. But that's a
> kernel release cycle away...

So first we will get rid of mering discard request and then enable after
one release cycle once REQ_COPY support is in?

Thanks
Vivek
Martin K. Petersen | 13 Mar 18:19 2012
Picon

Re: [patch 6/7] blk: add plug for blkdev_issue_discard

>>>>> "Vivek" == Vivek Goyal <vgoyal <at> redhat.com> writes:

Vivek> So first we will get rid of mering discard request 

We'll get rid of code that doesn't do anything other than confuse.

Vivek> and then enable after one release cycle once REQ_COPY support is
Vivek> in?

The new code will be entirely different. And one cycle is optimistic
given the current state of affairs in T10.

--

-- 
Martin K. Petersen	Oracle Linux Engineering
Shaohua Li | 12 Mar 04:04 2012

[patch 3/7] md: raid 0 supports TRIM

This makes md raid 0 support TRIM.

Signed-off-by: Shaohua Li <shli <at> fusionio.com>
---
 drivers/md/raid0.c |   17 +++++++++++++++++
 1 file changed, 17 insertions(+)

Index: linux/drivers/md/raid0.c
===================================================================
--- linux.orig/drivers/md/raid0.c	2012-03-09 16:56:41.133790013 +0800
+++ linux/drivers/md/raid0.c	2012-03-12 10:16:18.526610922 +0800
 <at>  <at>  -88,6 +88,7  <at>  <at>  static int create_strip_zones(struct mdd
 	char b[BDEVNAME_SIZE];
 	char b2[BDEVNAME_SIZE];
 	struct r0conf *conf = kzalloc(sizeof(*conf), GFP_KERNEL);
+	bool discard_supported = false;

 	if (!conf)
 		return -ENOMEM;
 <at>  <at>  -201,6 +202,9  <at>  <at>  static int create_strip_zones(struct mdd
 		if (!smallest || (rdev1->sectors < smallest->sectors))
 			smallest = rdev1;
 		cnt++;
+
+		if (blk_queue_discard(bdev_get_queue(rdev1->bdev)))
+			discard_supported = true;
 	}
 	if (cnt != mddev->raid_disks) {
 		printk(KERN_ERR "md/raid0:%s: too few disks (%d of %d) - "
 <at>  <at>  -278,6 +282,11  <at>  <at>  static int create_strip_zones(struct mdd
(Continue reading)

Roberto Spadim | 12 Mar 04:18 2012
Picon

Re: [patch 0/7] Add TRIM support for raid linear/0/1/10

nice!

Em 12 de março de 2012 00:04, Shaohua Li <shli <at> fusionio.com> escreveu:
> The patches add TRIM support for raid linear/0/1/10. I'll add TRIM support for
> raid 4/5/6 later. The implementation is pretty straightforward and
> self-explained.
>
> Thanks,
> Shaohua
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo <at> vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--

-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Holger Kiehl | 12 Mar 19:22 2012
Picon

Re: [patch 0/7] Add TRIM support for raid linear/0/1/10

Hello,

On Mon, 12 Mar 2012, Shaohua Li wrote:

> The patches add TRIM support for raid linear/0/1/10. I'll add TRIM support for
> raid 4/5/6 later. The implementation is pretty straightforward and
> self-explained.
>
First, thanks for this patch!

I have applied those patches against 3.3.0-rc7 and during boot the kernel
reports a lot of the following:

    Mar 12 18:56:00 c3po kernel: [    7.611045] md/raid0:md3: make_request bug: can't convert block across
chunks or bigger than 512k 18861064 512
    Mar 12 18:56:00 c3po kernel: [    7.611047] md/raid0:md3: make_request bug: can't convert block across
chunks or bigger than 512k 18862088 512
    Mar 12 18:56:00 c3po kernel: [    7.611049] md/raid0:md3: make_request bug: can't convert block across
chunks or bigger than 512k 18863112 512
    Mar 12 18:56:00 c3po kernel: [    7.611052] md/raid0:md3: make_request bug: can't convert block across
chunks or bigger than 512k 18864136 512
    Mar 12 18:56:00 c3po kernel: [    7.611054] md/raid0:md3: make_request bug: can't convert block across
chunks or bigger than 512k 18865160 512
    Mar 12 18:56:00 c3po kernel: [    7.611056] md/raid0:md3: make_request bug: can't convert block across
chunks or bigger than 512k 18866184 512

The raid looks as follows:

   cat /proc/mdstat
   Personalities : [raid0] [raid1]
(Continue reading)

NeilBrown | 14 Mar 03:24 2012
Picon

Re: [patch 0/7] Add TRIM support for raid linear/0/1/10

On Mon, 12 Mar 2012 11:04:12 +0800 Shaohua Li <shli <at> fusionio.com> wrote:

> The patches add TRIM support for raid linear/0/1/10. I'll add TRIM support for
> raid 4/5/6 later. The implementation is pretty straightforward and
> self-explained.
> 
> Thanks,
> Shaohua

Thanks.
They look mostly OK.

In raid0.c, I think you'll need to change

		/* Sanity check -- queue functions should prevent this happening */
		if (bio->bi_vcnt != 1 ||
		    bio->bi_idx != 0)
			goto bad_map;

to also allow for 'bi_vcnt == 0' like you did in bio_split.

Also I wonder about handling failure in RAID1.
I think the code will currently treat it like a write error, and
maybe record a bad block (then fail the device is writing the badblock
record fails). Is that what were want?

And of course resync/recovery will mess up the discarded sector information,
so this isn't a complete solution for RAID1.  But  it is a reasonable start.

Thanks,
(Continue reading)

Shaohua Li | 14 Mar 03:47 2012

Re: [patch 0/7] Add TRIM support for raid linear/0/1/10

On 3/14/12 10:24 AM, NeilBrown wrote:
>  On Mon, 12 Mar 2012 11:04:12 +0800 Shaohua Li <shli <at> fusionio.com> wrote:
>
> > The patches add TRIM support for raid linear/0/1/10. I'll add TRIM 
support for
> > raid 4/5/6 later. The implementation is pretty straightforward and
> > self-explained.
> >
> > Thanks,
> > Shaohua
>
>  Thanks.
>  They look mostly OK.
>
>  In raid0.c, I think you'll need to change
>
>  /* Sanity check -- queue functions should prevent this happening */
>  if (bio->bi_vcnt != 1 ||
>  bio->bi_idx != 0)
>  goto bad_map;
>
>  to also allow for 'bi_vcnt == 0' like you did in bio_split.
>
>  Also I wonder about handling failure in RAID1.
>  I think the code will currently treat it like a write error, and
>  maybe record a bad block (then fail the device is writing the badblock
>  record fails). Is that what were want?
Mainly to simplify the code. And I thought a normal discard should not fail.
If it fails, something is wrong, marked it as badblock maybe not bad.

(Continue reading)

Mark Lord | 17 Mar 19:14 2012

Re: [patch 0/7] Add TRIM support for raid linear/0/1/10

On 12-03-13 10:47 PM, Shaohua Li wrote:
> On 3/14/12 10:24 AM, NeilBrown wrote:
>>  On Mon, 12 Mar 2012 11:04:12 +0800 Shaohua Li <shli <at> fusionio.com> wrote:
..
>>  Also I wonder about handling failure in RAID1.
>>  I think the code will currently treat it like a write error, and
>>  maybe record a bad block (then fail the device is writing the badblock
>>  record fails). Is that what were want?
> Mainly to simplify the code. And I thought a normal discard should not fail.
> If it fails, something is wrong, marked it as badblock maybe not bad.

That sounds like a VERY bad idea.
Failures happen for lots of reasons,
but generic comm errors are not an excuse to suddenly mark sectors as bad.
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Shaohua Li | 18 Mar 03:03 2012

Re: [patch 0/7] Add TRIM support for raid linear/0/1/10

2012/3/18 Mark Lord <kernel <at> teksavvy.com>:
> On 12-03-13 10:47 PM, Shaohua Li wrote:
>> On 3/14/12 10:24 AM, NeilBrown wrote:
>>>  On Mon, 12 Mar 2012 11:04:12 +0800 Shaohua Li <shli <at> fusionio.com> wrote:
> ..
>>>  Also I wonder about handling failure in RAID1.
>>>  I think the code will currently treat it like a write error, and
>>>  maybe record a bad block (then fail the device is writing the badblock
>>>  record fails). Is that what were want?
>> Mainly to simplify the code. And I thought a normal discard should not fail.
>> If it fails, something is wrong, marked it as badblock maybe not bad.
>
> That sounds like a VERY bad idea.
> Failures happen for lots of reasons,
> but generic comm errors are not an excuse to suddenly mark sectors as bad.
Not sure if I got it, but we treat discard similar like write. Did you
mean discard
request error is common?

Gmane