Rumen Telbizov | 26 Oct 22:04 2010

Degraded zpool cannot detach old/bad drive

Hello everyone,

After a few days of struggle with my degraded zpool on a backup server I decided to ask for help here, or at least get some clues as to what might be wrong with it.
Here's the current state of the zpool:

# zpool status

  pool: tank
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME                          STATE     READ WRITE CKSUM
        tank                          DEGRADED     0     0     0
          raidz1                      DEGRADED     0     0     0
            spare                     DEGRADED     0     0     0
              replacing               DEGRADED     0     0     0
                17307041822177798519  UNAVAIL      0   299     0  was
/dev/gpt/disk-e1:s2
                gpt/newdisk-e1:s2     ONLINE       0     0     0
              gpt/disk-e2:s10         ONLINE       0     0     0
            gpt/disk-e1:s3            ONLINE      30     0     0
            gpt/disk-e1:s4            ONLINE       0     0     0
[...]

Rumen Telbizov | 28 Oct 01:22 2010

Re: Degraded zpool cannot detach old/bad drive

No ideas whatsoever?

On Tue, Oct 26, 2010 at 1:04 PM, Rumen Telbizov <telbizov <at> gmail.com> wrote:

> Hello everyone,
>
> After a few days of struggle with my degraded zpool on a backup server I
> decided to ask for
> help here or at least get some clues as to what might be wrong with it.
> Here's the current state of the zpool:
>
> # zpool status
>
>   pool: tank
>  state: DEGRADED
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: none requested
> config:
>
>         NAME                          STATE     READ WRITE CKSUM
>         tank                          DEGRADED     0     0     0
>           raidz1                      DEGRADED     0     0     0
>             spare                     DEGRADED     0     0     0
>               replacing               DEGRADED     0     0     0
>                 17307041822177798519  UNAVAIL      0   299     0  was
> /dev/gpt/disk-e1:s2
[...]

Artem Belevich | 28 Oct 02:22 2010

Re: Degraded zpool cannot detach old/bad drive

Are you interested in what's wrong or in how to fix it?

If fixing is the priority, I'd boot from an OpenSolaris live CD and try importing the array there. Just make sure you don't upgrade ZFS to a version that is newer than the one FreeBSD supports.

OpenSolaris may be able to fix the array. Once it's done, export it, boot back into FreeBSD, and re-import it.

--Artem

On Wed, Oct 27, 2010 at 4:22 PM, Rumen Telbizov <telbizov <at> gmail.com> wrote:
> No ideas whatsoever?
>
> On Tue, Oct 26, 2010 at 1:04 PM, Rumen Telbizov <telbizov <at> gmail.com> wrote:
>
>> Hello everyone,
>>
>> After a few days of struggle with my degraded zpool on a backup server I
>> decided to ask for
>> help here or at least get some clues as to what might be wrong with it.
>> Here's the current state of the zpool:
>>
>> # zpool status
>>
>>   pool: tank
>>  state: DEGRADED
>> status: One or more devices has experienced an error resulting in data
>>         corruption.  Applications may be affected.
>> action: Restore the file in question if possible.  Otherwise restore the
[...]

Rumen Telbizov | 28 Oct 03:05 2010

Re: Degraded zpool cannot detach old/bad drive

Thanks Artem,

I am mainly concerned with fixing this immediate problem first; beyond that, if I can provide more information so that the developers can look into the problem, I'd be happy to.

I'll try the OpenSolaris live CD and see how it goes. Either way, I'll report back here.

Cheers,
Rumen Telbizov

On Wed, Oct 27, 2010 at 5:22 PM, Artem Belevich <fbsdlist <at> src.cx> wrote:

> Are you interested in what's wrong or in how to fix it?
>
> If fixing is the priority, I'd boot from OpenSolaris live CD and would
> try importing the array there. Just make sure you don't upgrade ZFS to
> a version that is newer than the one FreeBSD supports.
>
> Opensolaris may be able to fix the array. Once it's done, export it,
> boot back to FreeBSD and re-import it.
>
> --Artem
>
>
>
> On Wed, Oct 27, 2010 at 4:22 PM, Rumen Telbizov <telbizov <at> gmail.com>
[...]

Rumen Telbizov | 29 Oct 04:15 2010

Re: Degraded zpool cannot detach old/bad drive

Hello Artem, everyone,

Here's an update on my case.
After following Artem's advice I downloaded the OpenSolaris 2009.06 LiveCD and booted from it (over an IPMI share). After aliasing the proper disk driver I got access to all the JBOD disks I had before. They had different names (OpenSolaris style), but the order and configuration seemed fine. I was in fact able to remove those old/nonexistent devices under OpenSolaris without a problem, using the same commands I had been using under FreeBSD. The pool started resilvering, which wasn't that important at that stage, so I exported the pool and rebooted back into FreeBSD.
FreeBSD saw the pool and I managed to mount it fine; all the data was there and resilvering was initiated.
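
The exact commands aren't shown in the truncated message above; presumably they were something along these lines, detaching the stale half of the "replacing" vdev by the GUID shown in the earlier zpool status output and then returning the hot spare:

# zpool detach tank 17307041822177798519   # the UNAVAIL device that was /dev/gpt/disk-e1:s2
# zpool detach tank gpt/disk-e2:s10        # return the hot spare to the spare list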

There is a problem though. Initially I used GPT-labeled partitions to construct the pool. They had names like /dev/gpt/disk-e1:s15, each representing a GPT partition on top of an mfidXX device underneath.
Now, before I import the pool, I do see them in /dev/gpt just fine:

# ls /dev/gpt

 disk-e1:s10 disk-e1:s11 disk-e1:s12 disk-e1:s13
 disk-e1:s14 disk-e1:s15 disk-e1:s16 disk-e1:s18
 disk-e1:s19 disk-e1:s20 disk-e1:s21 disk-e1:s22
 disk-e1:s23 disk-e1:s3 disk-e1:s4 disk-e1:s5
 disk-e1:s6 disk-e1:s7 disk-e1:s8 disk-e1:s9
[...]

Artem Belevich | 29 Oct 06:46 2010

Re: Degraded zpool cannot detach old/bad drive

> but only those 3 devices in /dev/gpt and absolutely nothing in /dev/gptid/
> So is there a way to bring all the gpt labeled partitions back into the pool
> instead of using the mfidXX devices?

Try re-importing the pool with "zpool import -d /dev/gpt". This will
tell ZFS to use only devices found within that path and your pool
should be using gpt labels again.
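
A rough sketch of that sequence, assuming the pool name used earlier in the thread; note that "zpool import -d /dev/gpt" on its own only lists importable pools, while naming the pool actually imports it:

# zpool export tank
# ls /dev/gpt                    # labels should reappear once nothing holds the disks open
# zpool import -d /dev/gpt tank  # search only /dev/gpt for vdevs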

--Artem

Rumen Telbizov | 29 Oct 07:51 2010

Re: Degraded zpool cannot detach old/bad drive

Hi Artem, everyone,

Thanks for your quick response. Unfortunately, I already tried this approach.
Applying -d /dev/gpt only picks up the bare three remaining gpt-labeled disks, which leaves the pool completely unusable (none of the mfid devices are found). Maybe those labels are removed shortly after they are accessed during the import attempt?

What I don't understand is what exactly makes those gpt labels disappear when the pool is imported, while otherwise they are just fine. Something to do with OpenSolaris? On top of it all, gpart show -l keeps showing all the labels correctly even while the pool is imported.

Any other clues would be appreciated.

Thank you,
Rumen Telbizov

On Thu, Oct 28, 2010 at 9:46 PM, Artem Belevich <fbsdlist <at> src.cx> wrote:

> > but only those 3 devices in /dev/gpt and absolutely nothing in
> /dev/gptid/
> > So is there a way to bring all the gpt labeled partitions back into the
> pool
> > instead of using the mfidXX devices?
>
[...]

Stefan Bethke | 29 Oct 08:32 2010

Re: Degraded zpool cannot detach old/bad drive


Am 29.10.2010 um 07:51 schrieb Rumen Telbizov:

> Thanks for your quick response. Unfortunately I already did try this
> approach. Applying -d /dev/gpt only limits the pool to the bare three remaining disks
> which turns pool completely unusable (no mfid devices). Maybe those labels are removed
> shortly they are being tried to be imported/accessed?
> 
> What I don't understand is what exactly makes those gpt labels disappear
> when the pool is imported and otherwise are just fine?!
> Something to do with OpenSolaris ? On top of it all gpart show -l keeps
> showing all the labels right even while the pool is imported.
> 
> Any other clues would be appreciated.

The labels are removed by glabel as soon as something opens the underlying provider, i. e. the disk device,
for writing.  Since that process could change the part of the disk that the label information is extracted
from, the label is removed.  glabel will re-taste the provider once the process closes it again.

Since you're using gpt labels, I would expect them to continue to be available, unless zpool import somehow
opens the disk devices (instead of the partition devices).
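
A quick way to watch this from userland, if it helps with debugging (mfid1 below is just an example device): glabel status lists the label providers that currently exist, while gpart show -l reads the label names straight out of the partition table, which is why the latter keeps working even while the pool holds the disks open.

# glabel status | grep gpt/   # entries vanish while something holds the disk open for writing
# gpart show -l mfid1         # labels as stored on disk, regardless of who has the device open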

Stefan

-- 
Stefan Bethke <stb <at> lassitu.de>   Fon +49 151 14070811

[...]

Artem Belevich | 29 Oct 09:26 2010

Re: Degraded zpool cannot detach old/bad drive

On Thu, Oct 28, 2010 at 10:51 PM, Rumen Telbizov <telbizov <at> gmail.com> wrote:
> Hi Artem, everyone,
>
> Thanks for your quick response. Unfortunately I already did try this
> approach.
> Applying -d /dev/gpt only limits the pool to the bare three remaining disks
> which turns
> pool completely unusable (no mfid devices). Maybe those labels are removed
> shortly
> they are being tried to be imported/accessed?

In one of the previous emails you've clearly listed many devices in
/dev/gpt and said that they've disappeared after pool import.
Did you do "zpool import -d /dev/gpt" while /dev/gpt entries were present?

> What I don't understand is what exactly makes those gpt labels disappear
> when the pool is imported and otherwise are just fine?!

This is the way GEOM works. If something (ZFS in this case) uses the raw device, the derived GEOM entities disappear.

Try exporting the pool. Your /dev/gpt entries should be back. Now try to import with the -d option and see if it works.

You may try bringing the labels back the hard way, by detaching each raw drive and re-attaching it via its label, but resilvering one drive at a time will take a while.

--Artem
[...]

Rumen Telbizov | 29 Oct 20:34 2010

Re: Degraded zpool cannot detach old/bad drive

Hi Artem, everyone,

Thanks once again for your feedback and help.
Here's more information.

# zpool export tank

# ls /dev/gpt
disk-e1:s10 disk-e1:s11 disk-e1:s12 disk-e1:s13
disk-e1:s14 disk-e1:s15 disk-e1:s16 disk-e1:s18
disk-e1:s19 disk-e1:s20 disk-e1:s21 disk-e1:s22
disk-e1:s23 disk-e1:s3 disk-e1:s4 disk-e1:s5
disk-e1:s6 disk-e1:s7 disk-e1:s8 disk-e1:s9
disk-e2:s0 disk-e2:s1 disk-e2:s10 disk-e2:s11
disk-e2:s2 disk-e2:s3 disk-e2:s4 disk-e2:s5
disk-e2:s6 disk-e2:s7 disk-e2:s8 disk-e2:s9
newdisk-e1:s17
newdisk-e1:s2

All the disks are here! Same for /dev/gptid/. Now, importing the pool back as you suggested:

# zpool import -d /dev/gpt
  pool: tank
    id: 13504509992978610301
 state: UNAVAIL
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
   see: http://www.sun.com/msg/ZFS-8000-5E
config:
[...]

Artem Belevich | 29 Oct 22:36 2010

Re: Degraded zpool cannot detach old/bad drive

On Fri, Oct 29, 2010 at 11:34 AM, Rumen Telbizov <telbizov <at> gmail.com> wrote:
> The problem I think comes down to what I have written in the zpool.cache
> file.
> It stores the mfid path instead of the gpt/disk one.
>       children[0]
>              type='disk'
>              id=0
>              guid=1641394056824955485
>              path='/dev/mfid33p1'
>              phys_path='/pci <at> 0,0/pci8086,3b42 <at> 1c/pci15d9,c480 <at> 0/sd <at> 1,0:a'
>              whole_disk=0
>              DTL=55

Yes, phys_path does look like something that came from solaris.

> Compared to a disk from a partner server which is fine:
>       children[0]
>              type='disk'
>              id=0
>              guid=5513814503830705577
>              path='/dev/gpt/disk-e1:s6'
>              whole_disk=0

If you have an old copy of /boot/zfs/zpool.cache you could try using "zpool import -c old-cache-file".

I don't think zpool.cache is needed for import; the import should work without it just fine. Just remove /boot/zfs/zpool.cache (or move it somewhere else) and then try importing with -d /dev/gpt again.
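
Roughly, with the pool name and file names assumed:

# mv /boot/zfs/zpool.cache /boot/zfs/zpool.cache.bak   # move the stale cache aside, keeping a copy
# zpool import -d /dev/gpt tank

# or, if an older cache that still had the gpt paths were available:
# zpool import -c /boot/zfs/zpool.cache.old tank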

[...]

Rumen Telbizov | 29 Oct 23:19 2010

Re: Degraded zpool cannot detach old/bad drive

Artem,

> If you have old copy of /boot/zfs/zpool.cache you could try use "zpool
> import -c old-cache-file".
>

Unfortunately I don't :(
I'll make a habit of creating a copy from now on!
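
For example, something as simple as this after any pool reconfiguration (the destination path is just a suggestion):

# cp -p /boot/zfs/zpool.cache /root/zpool.cache.bak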

>
> I don't think zpool.cache is needed for import. Import should work
> without it just fine. Just remove /boot/zfs/zpool.cache (or move it
> somewhere else and then try importing with -d /dev/gpt again.
>

You're right. zpool export tank seems to remove the cache file, so the import has nothing to consult and it doesn't make any difference.

I guess my only chance at this point would be to somehow manually edit the zpool configuration, via the zpool.cache file or otherwise, and substitute the mfid paths with the gpt/disk ones.
Is there a way to do this?

Thanks,
-- 
Rumen Telbizov
http://telbizov.com
[...]

Artem Belevich | 30 Oct 01:06 2010

Re: Degraded zpool cannot detach old/bad drive

On Fri, Oct 29, 2010 at 2:19 PM, Rumen Telbizov <telbizov <at> gmail.com> wrote:
> You're right. zpool export tank seems to remove the cache file so import has
> nothing to consult so doesn't make any difference.
> I guess my only chance at this point would be to somehow manually edit
> the zpool configuration, via the zpool.cache file or not, and substitute
> mfid with gpt/disk?!
> Is there a way to do this?

I'm not aware of any tools to edit zpool.cache.

What's really puzzling is why GPT labels disappear in the middle of
zpool import. I'm fresh out of ideas why that would happen.

What FreeBSD version are you running? The SVN revision of the sources would be good, but a date may also work.

--Artem

Rumen Telbizov | 30 Oct 01:42 2010

Re: Degraded zpool cannot detach old/bad drive

Hi Artem,

> What's really puzzling is why GPT labels disappear in the middle of
> zpool import. I'm fresh out of ideas why that would happen.

Thanks for your support anyway. Appreciated.

> What FreeBSD version are you running? SVN revision of the sources
> would be good, but date may also work.

FreeBSD 8.1-STABLE #0: Sun Sep  5 00:22:45 PDT 2010

That's when I csuped and rebuilt world/kernel.
I can (and probably will) very easily csup to the latest stable and try to upgrade ZFS to version 15, which was merged shortly after this build, and see if that makes any difference.

I wonder if there's a way to fix it from the OpenSolaris LiveCD: somehow load the gpt-labeled partitions and save the cache file.

Regards,
-- 
Rumen Telbizov
http://telbizov.com
[...]

Artem Belevich | 30 Oct 02:01 2010

Re: Degraded zpool cannot detach old/bad drive

On Fri, Oct 29, 2010 at 4:42 PM, Rumen Telbizov <telbizov <at> gmail.com> wrote:
> FreeBSD 8.1-STABLE #0: Sun Sep  5 00:22:45 PDT 2010
> That's when I csuped and rebuilt world/kernel.

There have been a lot of ZFS-related MFCs since then. I'd suggest updating to the most recent -STABLE and trying again.

I've got another idea that may or may not work. Assuming that GPT
labels disappear because zpool opens one of the /dev/mfid* devices,
you can try to do "chmod a-rw /dev/mfid*" on them and then try
importing the pool again.
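
That would be roughly the following, run between the export and the import; the glob is illustrative and meant to leave the boot disk alone, and note that devfs permissions reset at reboot:

# chmod a-rw /dev/mfid[1-9]*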

--Artem

Rumen Telbizov | 30 Oct 03:24 2010

Re: Degraded zpool cannot detach old/bad drive

Thanks Artem,

I'll upgrade to the latest stable and ZFS v15 tomorrow or Sunday, and see if that makes it any better. If not, I'll also try the chmod operation below.

Thanks for the suggestions. I'll report back here.

Regards,
Rumen Telbizov

On Fri, Oct 29, 2010 at 5:01 PM, Artem Belevich <fbsdlist <at> src.cx> wrote:

> On Fri, Oct 29, 2010 at 4:42 PM, Rumen Telbizov <telbizov <at> gmail.com>
> wrote:
> > FreeBSD 8.1-STABLE #0: Sun Sep  5 00:22:45 PDT 2010
> > That's when I csuped and rebuilt world/kernel.
>
> There were a lot of ZFS-related MFCs since then. I'd suggest updating
> to the most recent -stable and try again.
>
> I've got another idea that may or may not work. Assuming that GPT
> labels disappear because zpool opens one of the /dev/mfid* devices,
> you can try to do "chmod a-rw /dev/mfid*" on them and then try
> importing the pool again.
>
> --Artem
>

-- 
[...]

Rumen Telbizov | 31 Oct 20:53 2010

Re: Degraded zpool cannot detach old/bad drive

Hi Artem, everyone,

Here's the latest update on my case.
I upgraded the system to the latest stable: FreeBSD 8.1-STABLE #0: Sun Oct 31 11:44:06 PDT 2010.
After that I ran zpool upgrade and zfs upgrade -r on all the filesystems; I am currently running zpool version 15 and zfs version 4.
Everything went fine with the upgrade, but unfortunately my problem still persists; there's no difference in this respect. I still have mfid devices. I also tried chmod-ing the /dev/mfid devices as you suggested, but zfs/zpool didn't seem to care and imported the array regardless.

So at this point, since no one else seems to have any ideas and we seem to be stuck, I am almost ready to declare defeat on this one.
Although the pool is usable, I couldn't bring it back to exactly the same state it was in before the disk replacements took place.
Disappointing indeed, although not a complete show stopper.

I still think that if there were a way to edit the cache file and change the devices, that might do the trick.

Thanks for all the help,
Rumen Telbizov

On Fri, Oct 29, 2010 at 5:01 PM, Artem Belevich <fbsdlist <at> src.cx> wrote:

> On Fri, Oct 29, 2010 at 4:42 PM, Rumen Telbizov <telbizov <at> gmail.com>
> wrote:
> > FreeBSD 8.1-STABLE #0: Sun Sep  5 00:22:45 PDT 2010
[...]

jhell | 7 Nov 04:59 2010

Re: Degraded zpool cannot detach old/bad drive

On 10/31/2010 15:53, Rumen Telbizov wrote:
> Hi Artem, everyone,
> 
> Here's the latest update on my case.
> I did upgrade the system to the latest stable: 8.1-STABLE FreeBSD 8.1-STABLE
> #0: Sun Oct 31 11:44:06 PDT 2010
> After that I did zpool upgrade and zfs upgrade -r all the filesystems.
> Currently I am running zpool 15 and zfs 4.
> Everything went fine with the upgrade but unfortunately my problem still
> persists. There's no difference in this aspect.
> I still have mfid devices. I also tried chmod-ing as you suggested /dev/mfid
> devices but zfs/zpool didn't seem to care and imported
> the array regardless.
> 
> So at this point since no one else seems to have any ideas and we seem to be
> stuck I am almost ready to declare defeat on this one.
> Although the pool is usable I couldn't bring it back to exactly the same
> state as it was before the disk replacements took place.
> Disappointing indeed, although not a complete show stopper.
> 
> I still think that if there's a way to edit the cache file and change the
> devices that might do the trick.
> 
> Thanks for all the help,
> Rumen Telbizov
> 
> 
> On Fri, Oct 29, 2010 at 5:01 PM, Artem Belevich <fbsdlist <at> src.cx> wrote:
> 
>> On Fri, Oct 29, 2010 at 4:42 PM, Rumen Telbizov <telbizov <at> gmail.com>
[...]

Rumen Telbizov | 16 Nov 22:15 2010

Re: Degraded zpool cannot detach old/bad drive

Hello everyone,

jhell, thanks for the advice. I'm sorry I couldn't try it earlier, but the server was pretty busy and I only just found a window to test this. I think I'm pretty much there, but I'm still having a problem.
Here's what I have:

I exported the pool.
I hid the individual disks (except mfid0, which is my root) in /etc/devfs.rules like you suggested:

/etc/devfs.rules

add path 'mfid1' hide
add path 'mfid1p1' hide
...
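
For the archives: an /etc/devfs.rules fragment normally sits under a ruleset header and is activated via rc.conf. The ruleset name and the single glob below are illustrative; the glob hides every mfid device (and its partitions) except mfid0:

/etc/devfs.rules:
[localrules=10]
add path 'mfid[1-9]*' hide

/etc/rc.conf:
devfs_system_ruleset="localrules"

Then apply it without a reboot:
# /etc/rc.d/devfs restart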

I checked that those were gone from /dev/.
Then here's what happened when I tried to import the pool:

# zpool import
  pool: tank
    id: 13504509992978610301
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        tank                                            ONLINE
          raidz1                                        ONLINE
[...]

jhell | 17 Nov 05:55 2010

Re: Degraded zpool cannot detach old/bad drive

On 11/16/2010 16:15, Rumen Telbizov wrote:
> It seems like kern.geom.label.gptid.enable=0 does not work anymore? I am
> pretty sure I was able to hide all the /dev/gptid/* entries with this
> sysctl variable before, but now it doesn't quite work for me.

I could be wrong but I believe that is more of a loader tuneable than a
sysctl that should be modified at run-time. Rebooting with this set to 0
will disable showing the /dev/gptid directory.
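
That is, something along these lines in /boot/loader.conf, so it takes effect before GEOM tastes the disks:

kern.geom.label.gptid.enable="0"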

This makes me wonder if those sysctls should be marked read-only at run-time. Though you could even rm -rf /dev/gptid ;)

-- 

 jhell,v

Rumen Telbizov | 18 Nov 03:16 2010

Re: Degraded zpool cannot detach old/bad drive

Hi jhell, everyone,

Thanks for your feedback and support, everyone.
Indeed, after successfully disabling /dev/gptid/*, ZFS managed to find all the gpt/ labels without a problem and the array looked exactly the way it did in the very beginning. So at that point I can say that I was able to fully recover the array, without data loss, to exactly the state it was in when it was first created. Not without adventure though ;)

Ironically, for some other reasons, just after I fully recovered it I had to destroy it and rebuild it from scratch with raidz2 vdevs (of 8 disks) rather than raidz1s (of 4 disks) ;)
Basically I need better redundancy so that I can handle a double disk failure within a vdev. It seems the chance of a second disk failing while the pool rebuilds for some 15 hours on those 2TB disks is quite significant.
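
Purely for illustration (not the command actually used; labels picked arbitrarily from the earlier /dev/gpt listing), a two-vdev raidz2 layout would look something like:

# zpool create tank \
    raidz2 gpt/disk-e1:s3 gpt/disk-e1:s4 gpt/disk-e1:s5 gpt/disk-e1:s6 \
           gpt/disk-e1:s7 gpt/disk-e1:s8 gpt/disk-e1:s9 gpt/disk-e1:s10 \
    raidz2 gpt/disk-e2:s0 gpt/disk-e2:s1 gpt/disk-e2:s2 gpt/disk-e2:s3 \
           gpt/disk-e2:s4 gpt/disk-e2:s5 gpt/disk-e2:s6 gpt/disk-e2:s7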

I wonder if this conversion will cut the pool's IOPS in half ...

Anyway, thank you once again. Highly appreciated. I hope this is a helpful piece of discussion for other people having similar problems.

Cheers,
Rumen Telbizov
[...]

