Ken Smith | 22 Apr 2012 17:50

Software Raid 5 MD0 just stopped working

I'm helping a friend with an old FC6 system I set up for him ages ago.

It has a Logical Volume made from MD0 and MD1, which in turn are two 
three-disk RAID 5 sets.

One day MD0 decided not to play any more. When I looked at the system, 
MD0 was no longer mentioned in /proc/mdstat, and the VG was showing that 
it was made of an unknown device and MD1.
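
The "is the array still listed in /proc/mdstat" check described above can be scripted. What follows is only a minimal sketch: the sample text is a made-up /proc/mdstat (device names and block counts are hypothetical, not taken from this machine), and the function names are illustrative.

```python
import re

# Matches array lines such as "md1 : active raid5 sdd1[0] sdf1[2] sde1[1]".
# The RAID level is optional because an inactive array may not report one.
MD_LINE = re.compile(r"^(md\d+)\s*:\s*(\S+)(?:\s+(raid\d+|linear|multipath))?")


def arrays_in_mdstat(text):
    """Return {array name: (state, level)} for every md line in the text."""
    found = {}
    for line in text.splitlines():
        m = MD_LINE.match(line)
        if m:
            found[m.group(1)] = (m.group(2), m.group(3))
    return found


def missing_arrays(text, expected):
    """Return the expected array names that do not appear in the text."""
    present = arrays_in_mdstat(text)
    return [name for name in expected if name not in present]


# Hypothetical contents, for illustration only.
SAMPLE = """\
Personalities : [raid6] [raid5] [raid4]
md1 : active raid5 sdd1[0] sdf1[2] sde1[1]
      976767872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUID]

unused devices: <none>
"""

if __name__ == "__main__":
    print(missing_arrays(SAMPLE, ["md0", "md1"]))
```

On a live system the text would come from reading /proc/mdstat, and running something like this from cron would give a dated record of when an array dropped out.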

I reassembled MD0; the RAID appeared to be happy and re-established 
the UUID of MD0, and the LV was found again, but the ext3 filesystem on 
the LV was a shambles. It's all backed up, so it can all be put back.

The machine runs smartctl -a on its disks daily and I have the records 
of that going back for over a year. MD0 is made of two Western Digital 
500G drives and a Seagate 500G. All the smartctl data looks fine, except 
that the Seagate is showing:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   114   099   006    Pre-fail  Always       -       70478277
  3 Spin_Up_Time            0x0003   094   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       26
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   081   060   030    Pre-fail  Always       -       126904381
  9 Power_On_Hours          0x0032   061   061   000    Old_age   Always       -       34946
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       26
187 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
189 Unknown_Attribute       0x003a   100   100   000    Old_age   Always       -       0
190 Temperature_Celsius     0x0022   069   060   045    Old_age   Always       -       554958879
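
Since a year of daily smartctl records is available, a table like this can be checked mechanically. The sketch below is only an illustration: it compares the normalised VALUE column against THRESH (RAW_VALUE is vendor-specific, so large raw numbers are not treated as errors here) and also reports anything with WHEN_FAILED set. The function names are made up, and the sample rows are three of the rows quoted above.

```python
def parse_attributes(text):
    """Parse the attribute rows of `smartctl -A` style output into dicts."""
    rows = []
    for line in text.splitlines():
        # RAW_VALUE can contain spaces, so split into at most 10 fields.
        parts = line.split(None, 9)
        if len(parts) < 10 or not parts[0].isdigit():
            continue  # header line or anything that is not an attribute row
        rows.append({
            "id": int(parts[0]),
            "name": parts[1],
            "value": int(parts[3]),
            "worst": int(parts[4]),
            "thresh": int(parts[5]),
            "type": parts[6],
            "when_failed": parts[8],
            "raw": parts[9],
        })
    return rows


def suspicious(rows):
    """Rows whose normalised value is at or below a non-zero threshold,
    or which have anything other than "-" in WHEN_FAILED."""
    return [
        r for r in rows
        if (r["thresh"] > 0 and r["value"] <= r["thresh"])
        or r["when_failed"] != "-"
    ]


SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   114   099   006    Pre-fail  Always       -       70478277
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   081   060   030    Pre-fail  Always       -       126904381
"""

if __name__ == "__main__":
    for row in suspicious(parse_attributes(SAMPLE)):
        print(row["id"], row["name"], row["value"], row["thresh"])
```

For the three sample rows nothing is reported, since every VALUE is above its THRESH. That only says the drive has not crossed its own failure thresholds, not that it is healthy.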

Philip Hands | 29 Apr 2012 23:30

Re: Software Raid 5 MD0 just stopped working

On Sun, 22 Apr 2012 16:50:00 +0100, Ken Smith <kens <at> kensnet.org> wrote:
...
> I'm trying to fathom why MD0 just packed up and went home. Nothing in the 
> log files to give a clue.

Is it mentioned in mdadm.conf, and is what's in mdadm.conf the same as
what:

  mdadm --scan --examine

spits out?
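
A rough way to do that comparison without eyeballing it is sketched below. This is only an illustration under a few assumptions: it looks at single-line ARRAY entries only (continuation lines in mdadm.conf are ignored), it compares the uuid= values case-insensitively, and the device names and UUIDs in the sample are invented.

```python
def parse_arrays(text):
    """Map device name -> UUID (or None) for each single-line ARRAY entry."""
    arrays = {}
    for line in text.splitlines():
        words = line.split()
        if len(words) < 2 or words[0].upper() != "ARRAY":
            continue  # comments, DEVICE lines, blank lines, and so on
        uuid = None
        for word in words[2:]:
            key, sep, value = word.partition("=")
            if sep and key.lower() == "uuid":
                uuid = value.lower()
        arrays[words[1]] = uuid
    return arrays


def compare(conf_text, scan_text):
    """Describe the differences between mdadm.conf and the scan output."""
    conf = parse_arrays(conf_text)
    scan = parse_arrays(scan_text)
    problems = []
    for dev in sorted(set(conf) | set(scan)):
        if dev not in scan:
            problems.append(f"{dev}: in mdadm.conf but not reported by the scan")
        elif dev not in conf:
            problems.append(f"{dev}: reported by the scan but not in mdadm.conf")
        elif conf[dev] != scan[dev]:
            problems.append(f"{dev}: UUID differs ({conf[dev]} vs {scan[dev]})")
    return problems


# Hypothetical contents, for illustration only.
CONF = """\
DEVICE partitions
ARRAY /dev/md0 level=raid5 num-devices=3 uuid=11111111:22222222:33333333:44444444
ARRAY /dev/md1 level=raid5 num-devices=3 uuid=aaaaaaaa:bbbbbbbb:cccccccc:dddddddd
"""
SCAN = """\
ARRAY /dev/md0 level=raid5 num-devices=3 UUID=55555555:66666666:77777777:88888888
ARRAY /dev/md1 level=raid5 num-devices=3 UUID=aaaaaaaa:bbbbbbbb:cccccccc:dddddddd
"""

if __name__ == "__main__":
    for problem in compare(CONF, SCAN):
        print(problem)
```

In practice CONF would be the contents of mdadm.conf and SCAN the saved output of the command above.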

I'm not a Fedora user, but various upgrades of mdadm in Debian have
broken things, either because the device names changed or, on one
system I saw, because the UUIDs changed (for no reason I could work
out). There have also occasionally been issues with initrds not being
rebuilt with all the RAIDs listed to be started (but that would only
stop it appearing at boot time, rather than make it not be there at all).

Cheers, Phil.
-- 
|)|  Philip Hands [+44 (0)20 8530 9560]    http://www.hands.com/
|-|  HANDS.COM Ltd.                    http://www.uk.debian.org/
|(|  10 Onslow Gardens, South Woodford, London  E18 1NE  ENGLAND
--
Gllug mailing list  -  Gllug <at> gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug

David Damerell | 3 May 2012 01:23

Re: Software Raid 5 MD0 just stopped working

On Sunday, 29 Apr 2012, Philip Hands wrote:
>I'm not a Fedora user, but various upgrades of mdadm in Debian have
>broken things either because the device names changed, or I saw one
>system where the UUIDs changed (for no reason I could work out)

I saw that once because mke2fs -S had been used (don't ask why, I
never did find that out), which also assigns a new UUID, but leaves
the filesystem otherwise intact.

-- 
David Damerell <damerell <at> chiark.greenend.org.uk>
And now, a seemingly inexplicable shot of a passing train.
Today is First Wednesday, May.
Tomorrow will be First Thursday, May.