Christian Balzer | 22 Jun 2012 09:06

MD Raid10 recovery results in "attempt to access beyond end of device"


Hello,

the basics first:
Debian Squeeze, custom 3.2.18 kernel.

The Raid(s) in question are:
---
Personalities : [raid1] [raid10] 
md4 : active raid10 sdd1[0] sdb4[5](S) sdl1[4] sdk1[3] sdj1[2] sdi1[1]
      3662836224 blocks super 1.2 512K chunks 2 near-copies [5/5] [UUUUU]

md3 : active raid10 sdh1[7] sdc1[0] sda4[5](S) sdg1[3] sdf1[2] sde1[6]
      3662836224 blocks super 1.2 512K chunks 2 near-copies [5/4] [UUUU_]
      [=====>...............]  recovery = 28.3% (415962368/1465134592) finish=326.2min speed=53590K/sec
---

Drives sda to sdd are on nVidia MCP55 and sde to sdl on SAS1068E, sdc to
sdl are identical 1.5TB Seagates (about 2 years old, recycled from the
previous incarnation of these machines) with a single partition spanning
the whole drive like this:
---
Disk /dev/sdc: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System

NeilBrown | 22 Jun 2012 10:07

Re: MD Raid10 recovery results in "attempt to access beyond end of device"

On Fri, 22 Jun 2012 16:06:32 +0900 Christian Balzer <chibi <at> gol.com> wrote:

> 
> Hello,
> 
> the basics first:
> Debian Squeeze, custom 3.2.18 kernel.
> 
> The Raid(s) in question are:
> ---
> Personalities : [raid1] [raid10] 
> md4 : active raid10 sdd1[0] sdb4[5](S) sdl1[4] sdk1[3] sdj1[2] sdi1[1]
>       3662836224 blocks super 1.2 512K chunks 2 near-copies [5/5] [UUUUU]

I'm stumped by this.  It shouldn't be possible.

The size of the array is impossible.

If there are N chunks per device, then there are 5*N chunks on the whole
array, and there are two copies of each data chunk, so
5*N/2 distinct data chunks, so that should be the size of the array.

So if we take the size of the array, divide by chunk size, multiply by 2,
divide by 5, we get N = the number of chunks per device.
i.e.
  N = (array_size / chunk_size)*2 / 5

If we plug in 3662836224 for the array size and 512 for the chunk size,
we get 2861590.8, which is not an integer.
i.e. impossible.
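
The arithmetic above can be checked with a minimal sketch in Python. It assumes, as the /proc/mdstat output suggests, that the array size and the chunk size are both given in 1K blocks; the function and parameter names are illustrative only and do not come from the md code.

```python
from fractions import Fraction


def chunks_per_device(array_size_kib, chunk_size_kib, devices=5, copies=2):
    """Return N = (array_size / chunk_size) * copies / devices as an exact fraction.

    Hypothetical helper: the names are made up for this illustration.
    """
    return Fraction(array_size_kib, chunk_size_kib) * copies / devices


# Values taken from the mdstat output quoted above.
n = chunks_per_device(3662836224, 512)

print(float(n))            # 2861590.8
print(n.denominator == 1)  # False: N is not a whole number of chunks
```

Using Fraction instead of floating point keeps the division exact, so the check for a whole number of chunks does not depend on rounding.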

Christian Balzer | 22 Jun 2012 10:42

Re: MD Raid10 recovery results in "attempt to access beyond end of device"


Hello,

On Fri, 22 Jun 2012 18:07:48 +1000 NeilBrown wrote:

> On Fri, 22 Jun 2012 16:06:32 +0900 Christian Balzer <chibi <at> gol.com>
> wrote:
> 
> > 
> > Hello,
> > 
> > the basics first:
> > Debian Squeeze, custom 3.2.18 kernel.
> > 
> > The Raid(s) in question are:
> > ---
> > Personalities : [raid1] [raid10] 
> > md4 : active raid10 sdd1[0] sdb4[5](S) sdl1[4] sdk1[3] sdj1[2] sdi1[1]
> >       3662836224 blocks super 1.2 512K chunks 2 near-copies [5/5]
> > [UUUUU]
> 
> I'm stumped by this.  It shouldn't be possible.
> 
> The size of the array is impossible.
> 
> If there are N chunks per device, then there are 5*N chunks on the whole
> array, and there are two copies of each data chunk, so
> 5*N/2 distinct data chunks, so that should be the size of the array.
> 
> So if we take the size of the array, divide by chunk size, multiply by 2,

Christian Balzer | 23 Jun 2012 06:13

Re: MD Raid10 recovery results in "attempt to access beyond end of device"

On Fri, 22 Jun 2012 17:42:57 +0900 Christian Balzer wrote:

> 
> Hello,
> 
> On Fri, 22 Jun 2012 18:07:48 +1000 NeilBrown wrote:
> 
> > On Fri, 22 Jun 2012 16:06:32 +0900 Christian Balzer <chibi <at> gol.com>
> > wrote:
> > 
> > > 
> > > Hello,
> > > 
> > > the basics first:
> > > Debian Squeeze, custom 3.2.18 kernel.
> > > 
> > > The Raid(s) in question are:
> > > ---
> > > Personalities : [raid1] [raid10] 
> > > md4 : active raid10 sdd1[0] sdb4[5](S) sdl1[4] sdk1[3] sdj1[2] sdi1[1]
> > >       3662836224 blocks super 1.2 512K chunks 2 near-copies [5/5]
> > > [UUUUU]
> > 
> > I'm stumped by this.  It shouldn't be possible.
> > 
> > The size of the array is impossible.
> > 
> > If there are N chunks per device, then there are 5*N chunks on the
> > whole array, and there are two copies of each data chunk, so
> > 5*N/2 distinct data chunks, so that should be the size of the array.

NeilBrown | 25 Jun 2012 06:07

Re: MD Raid10 recovery results in "attempt to access beyond end of device"

On Fri, 22 Jun 2012 17:42:57 +0900 Christian Balzer <chibi <at> gol.com> wrote:

> 
> Hello,
> 
> On Fri, 22 Jun 2012 18:07:48 +1000 NeilBrown wrote:
> 
> > On Fri, 22 Jun 2012 16:06:32 +0900 Christian Balzer <chibi <at> gol.com>
> > wrote:
> > 
> > > 
> > > Hello,
> > > 
> > > the basics first:
> > > Debian Squeeze, custom 3.2.18 kernel.
> > > 
> > > The Raid(s) in question are:
> > > ---
> > > Personalities : [raid1] [raid10] 
> > > md4 : active raid10 sdd1[0] sdb4[5](S) sdl1[4] sdk1[3] sdj1[2] sdi1[1]
> > >       3662836224 blocks super 1.2 512K chunks 2 near-copies [5/5]
> > > [UUUUU]
> > 
> > I'm stumped by this.  It shouldn't be possible.
> > 
> > The size of the array is impossible.
> > 
> > If there are N chunks per device, then there are 5*N chunks on the whole
> > array, and there are two copies of each data chunk, so
> > 5*N/2 distinct data chunks, so that should be the size of the array.

Christian Balzer | 25 Jun 2012 08:06

Re: MD Raid10 recovery results in "attempt to access beyond end of device"


Hello Neil,

On Mon, 25 Jun 2012 14:07:54 +1000 NeilBrown wrote:

> On Fri, 22 Jun 2012 17:42:57 +0900 Christian Balzer <chibi <at> gol.com>
> wrote:
> 
> > 
> > Hello,
> > 
> > On Fri, 22 Jun 2012 18:07:48 +1000 NeilBrown wrote:
> > 
> > > On Fri, 22 Jun 2012 16:06:32 +0900 Christian Balzer <chibi <at> gol.com>
> > > wrote:
> > > 
> > > > 
> > > > Hello,
> > > > 
> > > > the basics first:
> > > > Debian Squeeze, custom 3.2.18 kernel.
> > > > 
> > > > The Raid(s) in question are:
> > > > ---
> > > > Personalities : [raid1] [raid10] 
> > > > md4 : active raid10 sdd1[0] sdb4[5](S) sdl1[4] sdk1[3] sdj1[2] sdi1[1]
> > > >       3662836224 blocks super 1.2 512K chunks 2 near-copies [5/5]
> > > > [UUUUU]
> > > 
> > > I'm stumped by this.  It shouldn't be possible.

Christian Balzer | 26 Jun 2012 16:48

Re: MD Raid10 recovery results in "attempt to access beyond end of device"


Hello,

On Mon, 25 Jun 2012 15:06:51 +0900 Christian Balzer wrote:

> 
> Hello Neil,
> 
> On Mon, 25 Jun 2012 14:07:54 +1000 NeilBrown wrote:
> 
> > On Fri, 22 Jun 2012 17:42:57 +0900 Christian Balzer <chibi <at> gol.com>
> > wrote:
> > 
> > > 
> > > Hello,
> > > 
> > > On Fri, 22 Jun 2012 18:07:48 +1000 NeilBrown wrote:
> > > 
> > > > On Fri, 22 Jun 2012 16:06:32 +0900 Christian Balzer <chibi <at> gol.com>
> > > > wrote:
> > > > 
> > > > > 
> > > > > Hello,
> > > > > 
> > > > > the basics first:
> > > > > Debian Squeeze, custom 3.2.18 kernel.
> > > > > 
> > > > > The Raid(s) in question are:
> > > > > ---
> > > > > Personalities : [raid1] [raid10] 

NeilBrown | 3 Jul 2012 03:46

Re: MD Raid10 recovery results in "attempt to access beyond end of device"

On Tue, 26 Jun 2012 23:48:45 +0900 Christian Balzer <chibi <at> gol.com> wrote:

> 

> The patch worked fine:
> ---
> [  105.872117] md: recovery of RAID array md3
> [28981.157157] md: md3: recovery done.
> ---
> 
> Thanks a bunch. I'd suggest including this patch in any and all
> feasible backports and, of course, future kernels.
> 
> Regards,
> 
> Christian
> 

Thanks for the confirmation.  I've added your "tested-by" and will forward to
Linus and -stable later today.

Thanks,
NeilBrown
