Mark Lord | 1 Feb 2008 15:06

Re: libata pm

dusty <at> gmx.li wrote:
>>> Sorry for my late answer, but I had to sort this out first.
>>> After replacing the first PSU with a new Corsair 650W the power no
>>> longer fluctuated by more than 0.01 V (and this only when spinning up
>>> the drives...) I did a full resync on both raid arrays and got no more
>>> errors or resets, but there were some inconsistencies during sync and
>>> the xfs filesystem on both arrays had to be repaired. Are these problems
>>> caused by the pm resets ?
>>>
>> libata EH won't lose any data as long as the hardware doesn't.  If power
>> fluctuates enough to make a drive briefly power down (this does happen, and
>> you can hear the drive doing an emergency unload when it does), the data in
>> the write buffer can be lost.  When the drive comes back, all libata can
>> know is that the PHY suffered a brief connection loss, so it resets the
>> device and carries on; whatever was in the cache is gone.  It's basically
>> pulling the power plug from the hard drive while a write is in progress and
>> plugging it back in quickly.  You're bound to lose data.
>>
> After I got the new PSU and the raid was in full sync without any error
> for 48h, I thought all problems were gone. Today the sata errors
> reappeared and whenever the load is high enough I get the following:
..

What exact brand/model drives are those again (hdparm --Istdout, please) ?

If I have a similar unit here, I may try to reproduce this.

Cheers
-
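To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html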

dusty@gmx.li | 1 Feb 2008 16:12

Re: libata pm

On Fri, 1.02.2008, 14:06, Mark Lord wrote:
> dusty <at> gmx.li wrote:
>>>> Sorry for my late answer, but I had to sort this out first.
>>>> After replacing the first PSU with a new Corsair 650W the power no
>>>> longer fluctuated by more than 0.01 V (and this only when spinning up
>>>> the drives...) I did a full resync on both raid arrays and got no
>>>> more errors or resets, but there were some inconsistencies during
>>>> sync and the xfs filesystem on both arrays had to be repaired. Are
>>>> these problems caused by the pm resets ?
>>>>
>>> libata EH won't lose any data as long as the hardware doesn't.  If
>>> power fluctuates enough to make a drive briefly power down (this does
>>> happen, and you can hear the drive doing an emergency unload when it
>>> does), the data in the write buffer can be lost.  When the drive comes
>>> back, all libata can know is that the PHY suffered a brief connection
>>> loss, so it resets the device and carries on; whatever was in the
>>> cache is gone.  It's basically pulling the power plug from the hard
>>> drive while a write is in progress and plugging it back in quickly.
>>> You're bound to lose data.
>>>
>> After I got the new PSU and the raid was in full sync without any error
>>  for 48h, I thought all problems were gone. Today the sata errors
>> reappeared and whenever the load is high enough I get the following:
> ..
>
>
> What exact brand/model drives are those again (hdparm --Istdout, please)
> ?
>
>

Tejun Heo | 1 Feb 2008 16:05

Re: libata pm

Mark Lord wrote:
>> After I got the new PSU and the raid was in full sync without any error
>> for 48h, I thought all problems were gone. Today the sata errors
>> reappeared and whenever the load is high enough I get the following:
> ..
> 
> What exact brand/model drives are those again (hdparm --Istdout, please) ?
> 
> If I have a similar unit here, I may try to reproduce this.

Dusty, can you please provide the info for Mark?  Let's see if Mark can
reproduce this.

Thanks.

-- 
tejun

dusty@gmx.li | 5 Feb 2008 19:50

Re: libata pm

Hi everybody,

It looks like this will be a never-ending story...

>>> After I got the new PSU and the raid was in full sync without any
>>> error for 48h, I thought all problems were gone. Today the sata errors
>>>  reappeared and whenever the load is high enough I get the following:
>>>
I replaced two (probably failing) of the eight hard drives with new ones.
All remaining disks report a good SMART state and are fully readable when
the raid is not active. As long as I mount the filesystem on the raid
read-only, no errors occur, but the moment I mount it rw and try to copy
something to the fs on the raid I get the already known timeout.
I am getting a little desperate now...

ata10.00: failed to read SCR 1 (Emask=0x40)
ata10.01: failed to read SCR 1 (Emask=0x40)
ata10.02: failed to read SCR 1 (Emask=0x40)
ata10.03: failed to read SCR 1 (Emask=0x40)
ata10.04: failed to read SCR 1 (Emask=0x40)
ata10.05: failed to read SCR 1 (Emask=0x40)
ata10.00: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.01: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.01: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 1 cdb 0x0 data 0
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata10.01: status: { DRDY }
ata10.02: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.03: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.04: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
ata10.05: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
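
For anyone trying to read a log like the one above, here is a minimal
sketch (not part of the original mail) that groups the Emask values per
device and decodes them. The bit names are my recollection of the
AC_ERR_* flags in include/linux/libata.h and should be checked against
the kernel tree in use; the helper names decode_emask and summarize are
made up for this example. Note that the command that timed out, 0xea, is
FLUSH CACHE EXT, which would fit the errors showing up only on writes.

```python
import re
from collections import defaultdict

# Assumption: AC_ERR_* bit values as recalled from include/linux/libata.h.
# Verify against your kernel version before relying on them.
EMASK_BITS = {
    0x001: "AC_ERR_DEV",
    0x002: "AC_ERR_HSM",
    0x004: "AC_ERR_TIMEOUT",
    0x008: "AC_ERR_MEDIA",
    0x010: "AC_ERR_ATA_BUS",
    0x020: "AC_ERR_HOST_BUS",
    0x040: "AC_ERR_SYSTEM",
    0x080: "AC_ERR_INVALID",
    0x100: "AC_ERR_OTHER",
}

DEV_RE = re.compile(r"^(ata\d+\.\d+):")
# Matches both "Emask=0x40" and "Emask 0x100".
EMASK_RE = re.compile(r"Emask[= ](0x[0-9a-fA-F]+)")


def decode_emask(mask):
    """Return the names of all known bits set in an Emask value."""
    names = [name for bit, name in sorted(EMASK_BITS.items()) if mask & bit]
    return names or ["(none)"]


def summarize(log_text):
    """Map each ataN.MM device to the sorted Emask values seen for it."""
    seen = defaultdict(set)
    last_dev = None
    for raw in log_text.splitlines():
        line = raw.strip()
        dev_match = DEV_RE.match(line)
        if dev_match:
            last_dev = dev_match.group(1)
        elif not line.startswith("res "):
            # Neither a device line nor a "res ..." continuation line.
            continue
        if last_dev is None:
            continue
        for hexmask in EMASK_RE.findall(line):
            seen[last_dev].add(int(hexmask, 16))
    return {dev: sorted(masks) for dev, masks in seen.items()}


if __name__ == "__main__":
    import sys
    # Usage: dmesg | python this_script.py
    for dev, masks in summarize(sys.stdin.read()).items():
        for mask in masks:
            print(dev, hex(mask), " ".join(decode_emask(mask)))
```

The "res ..." continuation line carries no device prefix, so the sketch
attributes it to the most recent ataN.MM line, which matches how the
kernel prints the cmd/res pair.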