Lasse Kliemann | 9 Mar 2008 22:45

Re: refuse to update certain files upon extraction

* Message by -Lasse Kliemann- from Sun 2008-03-09:

> There was no error message (tried -vv, but not -debug). Some time passed 
> since, and I do not have those files anymore. But I found a way to reproduce 
> this, or at least something similar. How to do this is slightly complicated, 
> and I do not understand it fully myself, so I go right to the results. There 
> is a level 0 dump and a level 1 dump, and a file F that changed between the 
> dumps. If the level 1 dump is extracted with 'star -xpU', then the file F is 
> restored to the state it was when the dump was taken. However, if the two 
> dumps are extracted onto each other with 'star -xpU -restore', mysteriously F 
> is made a hardlink to some other file, and hence of course is not restored to 
> its correct state.
> 
> Now, the dumps are taken off from a Linux LVM filesystem snapshot. As you 
> know, I already discovered irregularities with those (but could not yet
> investigate it further satisfactorily due to time constraints). Would you 
> suggest that the above problem is caused by a faulty filesystem snapshot 
> implementation, or might this be a problem in star? You may inspect the 
> tarfiles if you wish, it's just about 2 MB; I uploaded them to 
> 
> http://unix.plastictree.net/tmp/20080309/dump0.tar
> http://unix.plastictree.net/tmp/20080309/dump1.tar
> 
> The file to look out for is `send-backup-test/supervise/pid'. It is made a 
> hardlink to `send-backup-test/log/supervise/pid' upon restore as described 
> above.
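
A minimal sketch of how one might check whether two restored paths have ended up as hard links to the same file. It assumes GNU stat (`stat -c`); the helper name same_inode and the scratch files are made up for illustration and are not taken from the uploaded dumps.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if both paths refer to the same inode on the
# same device, i.e. they are hard links to one file. Assumes GNU stat.
same_inode() {
    [ "$(stat -c '%d:%i' "$1")" = "$(stat -c '%d:%i' "$2")" ]
}

# Self-contained demonstration in a scratch directory.
work=$(mktemp -d)
echo 123 > "$work/pid"
echo 456 > "$work/other"
ln "$work/pid" "$work/linked"

same_inode "$work/pid" "$work/linked" && echo "pid and linked: hard links"
same_inode "$work/pid" "$work/other"  || echo "pid and other: distinct files"

rm -rf "$work"
```

After a restore, the same helper could be pointed at `send-backup-test/supervise/pid' and `send-backup-test/log/supervise/pid'.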

I've tracked this thing down further, and now I have a simple way to 
reproduce this, and it doesn't even involve snapshots. Attached is a small 
program rename.c. Assume that this program is available under the command 

Joerg Schilling | 18 Mar 2008 15:45

Re: refuse to update certain files upon extraction

Lasse Kliemann <lasse-list-star-users <at> mail.plastictree.net> wrote:

> * Message by -Lasse Kliemann- from Sun 2008-03-09:
>  
> > There was no error message (tried -vv, but not -debug). Some time passed 
> > since, and I do not have those files anymore. But I found a way to reproduce 
> > this, or at least something similar. How to do this is slightly complicated, 
> > and I do not understand it fully myself, so I go right to the results. There 
> > is a level 0 dump and a level 1 dump, and a file F that changed between the 
> > dumps. If the level 1 dump is extracted with 'star -xpU', then the file F is 
> > restored to the state it was when the dump was taken. However, if the two 
> > dumps are extracted onto each other with 'star -xpU -restore', mysteriously F 
> > is made a hardlink to some other file, and hence of course is not restored to 
> > its correct state.
> > 
> > Now, the dumps are taken off from a Linux LVM filesystem snapshot. As you 
> > know, I already discovered irregularities with those (but could not yet
> > investigate it further satisfactorily due to time constraints). Would you 
> > suggest that the above problem is caused by a faulty filesystem snapshot 
> > implementation, or might this be a problem in star? You may inspect the 
> > tarfiles if you wish, it's just about 2 MB; I uploaded them to 
> > 
> > http://unix.plastictree.net/tmp/20080309/dump0.tar
> > http://unix.plastictree.net/tmp/20080309/dump1.tar
> > 
> > The file to look out for is `send-backup-test/supervise/pid'. It is made a 
> > hardlink to `send-backup-test/log/supervise/pid' upon restore as described 
> > above.

Well, I was afraid that there might have been a problem in star short time 

Lasse Kliemann | 18 Mar 2008 16:23

Re: refuse to update certain files upon extraction

* Message by -Joerg Schilling- from Tue 2008-03-18:

> The problem is a Linux kernel bug.
> 
> Any other backup tool (like e.g. dump/restore) will run into the same problem
> as Linux did not update the timestamp from the directory 
> "send-backup-test/log/supervise/".
> 
> You need to send a bug-report against Linux and get a fix for this problem....

I've already prepared a minimal example to prove this, see the attached 
files. The ctime of `sub' changes the first time that `rename' is run, but 
not the second. I also tried this on Solaris, and there the ctime is 
updated after the second run of `rename'.

Lasse

#!/bin/sh

rm -fr sub new &&
mkdir sub &&
stat sub &&
sleep 2 &&
echo running rename &&
./rename a &&
stat sub &&
sleep 2 &&
echo running rename again &&
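
A self-contained variant of this kind of test might look as follows. It is only a sketch: it uses mv(1) as a stand-in for the rename program (an assumption, since rename.c is not reproduced here) and GNU `stat -c %Z' to read the ctime in whole seconds. It renames a file over an existing directory entry and reports whether the directory's ctime moved.

```shell
#!/bin/sh
# Sketch only: mv(1) stands in for the thread's rename program, and
# GNU `stat -c %Z` (ctime in seconds since the epoch) is assumed.
dir_ctime() { stat -c %Z "$1"; }

work=$(mktemp -d)
mkdir "$work/sub"
echo old > "$work/sub/a"
before=$(dir_ctime "$work/sub")

sleep 1                               # make a change visible at 1 s resolution
echo new > "$work/tmp"
mv "$work/tmp" "$work/sub/a"          # rename over the existing entry
after=$(dir_ctime "$work/sub")

if [ "$after" -gt "$before" ]; then
    echo "ctime of sub was updated"
else
    echo "ctime of sub was NOT updated"
fi
rm -rf "$work"
```

Which branch is taken depends on the kernel and filesystem, so no particular output is claimed here.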

Joerg Schilling | 18 Mar 2008 16:22

Re: refuse to update certain files upon extraction

Lasse Kliemann <lasse-list-star-users <at> mail.plastictree.net> wrote:

> * Message by -Joerg Schilling- from Tue 2008-03-18:
>
> > The problem is a Linux kernel bug.
> > 
> > Any other backup tool (like e.g. dump/restore) will run into the same problem
> > as Linux did not update the timestamp from the directory 
> > "send-backup-test/log/supervise/".
> > 
> > You need to send a bug-report against Linux and get a fix for this problem....
>
> I've already prepared a minimal example to prove this, see the attached 
> files. The ctime of `sub' changes the first time that `rename' is run, but 
> not the second. I also tried this on Solaris, and there the ctime is 
> updated after the second run of `rename'.

Do you know whether your original problem was caused by a rename or rather by
an unlink/create chain?

From looking at inode numbers and file content, a rename is less probable than 
an unlink/create chain together with an inode allocation algorithm that reuses 
old inodes too quickly.
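
The two cases can be told apart by inode number, roughly as in the following sketch. It assumes GNU stat (`stat -c %i`); the helper name inode_of and the file names are illustrative only. A rename within one filesystem keeps the inode, whereas after unlink plus create the new file may or may not get the old inode number back, depending on the filesystem's allocator.

```shell
#!/bin/sh
# Sketch: compare inode numbers across a rename and across unlink + create.
# Assumes GNU stat; inode_of is a made-up helper name.
inode_of() { stat -c %i "$1"; }

work=$(mktemp -d)
echo v1 > "$work/pid"
orig=$(inode_of "$work/pid")

mv "$work/pid" "$work/pid.renamed"     # rename: the inode is kept
renamed=$(inode_of "$work/pid.renamed")

rm "$work/pid.renamed"                 # unlink
echo v2 > "$work/pid"                  # create: inode may or may not be reused
recreated=$(inode_of "$work/pid")

echo "original=$orig renamed=$renamed recreated=$recreated"
rm -rf "$work"
```

If `recreated' equals `original', the filesystem reused the inode number immediately, which is the situation described above.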

Jörg

--

-- 
 EMail:joerg <at> schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js <at> cs.tu-berlin.de                (uni)  
       schilling <at> fokus.fraunhofer.de     (work) Blog: http://schily.blogspot.com/

Lasse Kliemann | 18 Mar 2008 16:49

Re: refuse to update certain files upon extraction

* Message by -Joerg Schilling- from Tue 2008-03-18:
> Lasse Kliemann <lasse-list-star-users <at> mail.plastictree.net> wrote:
> 
> > * Message by -Joerg Schilling- from Tue 2008-03-18:
> >
> > > The problem is a Linux kernel bug.
> > > 
> > > Any other backup tool (like e.g. dump/restore) will run into the same problem
> > > as Linux did not update the timestamp from the directory 
> > > "send-backup-test/log/supervise/".
> > > 
> > > You need to send a bug-report against Linux and get a fix for this problem....
> >
> > I've already prepared a minimal example to prove this, see the attached 
> > files. The ctime of `sub' changes the first time that `rename' is run, but 
> > not the second. I also tried this on Solaris, and there the ctime is 
> > updated after the second run of `rename'.
> 
> Do you know whether your original problem was caused by a rename or rather by
> an unlink/create chain?

It was caused by a rename, as in the attached rename.c (that's the one from 
my old message <20080309214556.GK2723 <at> lasse.mail.plastictre.net> and not the 
one I sent minutes ago).

This rename.c is more complicated, but it was the simplest thing I managed to 
construct to reproduce the problem in conjunction with incremental dumps. 

By the way, I just did some research on the rename-ctime-thing on Linux, and 
it appears that something similar was noticed back in 2004 already. There are 

Lasse Kliemann | 18 Mar 2008 17:45

Re: refuse to update certain files upon extraction

* Message by -Lasse Kliemann- from Tue 2008-03-18:

> By the way, I just did some research on the rename-ctime-thing on Linux, and 
> it appears that something similar was noticed back in 2004 already. There are 
> close to no responses to the patches sent in at that time. Someone even wrote 
> that this behavior is okay according to the SuS, but I am not completely sure 
> that he refers to the same thing. The current Linux Bug Tracker does not give 
> any hits for 'ctime rename', so I will file a bug report there now. Let's see 
> what happens.

I'm getting really close now.

Remember that I wrote last week that I could reproduce this on ext2, ext3, 
xfs, and reiserfs? It turned out that on each of them -save ext3- the problem 
was due to that 'rapid dump' thing. If I insert a sleep 1 after the dump, the 
problem only persists on ext3. Now we also have the explanation why the 
problem disappeared on Solaris after I inserted the sleep 1.

The idea occurred to me after noticing that I could not reproduce the ctime 
problem on xfs.

Hence I filed that as an ext3 bug now on the Linux Bug Tracker.

_______________________________________________
Star-users mailing list
Star-users <at> lists.berlios.de
https://lists.berlios.de/mailman/listinfo/star-users

Joerg Schilling | 18 Mar 2008 17:45

Re: refuse to update certain files upon extraction

Lasse Kliemann <lasse-list-star-users <at> mail.plastictree.net> wrote:

> I'm getting really close now.
>
> Remember that I wrote last week that I could reproduce this on ext2, ext3, 
> xfs, and reiserfs? It turned out that on each of them -save ext3- the problem 
> was due to that 'rapid dump' thing. If I insert a sleep 1 after the dump, the 
> problem only persists on ext3. Now we also have the explanation why the 
> problem disappeared on Solaris after I inserted the sleep 1.

So do you propose to add a sleep(1) to star.c before it starts the
incremental dump, and to use >= for the time comparison?
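
The difference between the two comparisons can be sketched with whole-second timestamps. The predicates selected_gt and selected_ge below are hypothetical and are not star's actual code; they only show which files an incremental dump would select relative to the previous dump time.

```shell
#!/bin/sh
# Hypothetical selection predicates; arguments are file time and time of the
# previous dump, both in whole seconds since the epoch.
selected_gt() { [ "$1" -gt "$2" ]; }   # file_time >  dump_time
selected_ge() { [ "$1" -ge "$2" ]; }   # file_time >= dump_time

dump_time=1000
for file_time in 999 1000 1001; do
    selected_gt "$file_time" "$dump_time" && gt=yes || gt=no
    selected_ge "$file_time" "$dump_time" && ge=yes || ge=no
    echo "file_time=$file_time  '>': $gt  '>=': $ge"
done
```

A file changed in the same second as the previous dump (file_time=1000) is skipped with > and selected with >=. With >= it may be archived a second time if the previous dump already contained the change, but it cannot be missed.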

Jörg

--

-- 
 EMail:joerg <at> schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js <at> cs.tu-berlin.de                (uni)  
       schilling <at> fokus.fraunhofer.de     (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily

Lasse Kliemann | 18 Mar 2008 20:05

Re: refuse to update certain files upon extraction

* Message by -Joerg Schilling- from Tue 2008-03-18:
> Lasse Kliemann <lasse-list-star-users <at> mail.plastictree.net> wrote:
> 
> > I'm getting really close now.
> >
> > Remember that I wrote last week that I could reproduce this on ext2, ext3, 
> > xfs, and reiserfs? It turned out that on each of them -save ext3- the problem 
> > was due to that 'rapid dump' thing. If I insert a sleep 1 after the dump, the 
> > problem only persists on ext3. Now we also have the explanation why the 
> > problem disappeared on Solaris after I inserted the sleep 1.
> 
> 
> So do you propose to add a sleep(1) into star.c before it starts with the
> incremental dump and to use >= for the time compare?

Yes.

At first, I could not see how sleep(1) could prevent the same file from being 
archived twice. In fact, it will not make any difference when using snapshots 
of live filesystems. But if there is a cycle (change files, dump, change 
files, dump, ...), then it will prevent files from being archived multiple 
times.

And *maybe*, as I wrote in my first message, there should be < in the 
statement further down. I actually suggested > there in my first message, but 
that's clearly wrong. But I am not sure about < either, because I don't know 
what that case is for. It does not seem to be relevant for dump levels greater 
than 0 anyway.

Lasse

Joerg Schilling | 19 Mar 2008 17:01

Re: refuse to update certain files upon extraction

Lasse Kliemann <lasse-list-star-users <at> mail.plastictree.net> wrote:

> At first, I could not see how sleep(1) could prevent the same file from being 
> archived twice. In fact, it will not make any difference when using snapshots 
> of live filesystems. But if there is a cycle (change files, dump, change 
> files, dump, ...), then it will prevent files from being archived multiple 
> times.

I am no longer sure how to do the sleep(1) correctly..... I'll fall back to
the >= solution, which has the disadvantage that a file may occasionally be
archived when this is not really needed.

My example from the man page

     echo > /tmp/snapstamp

     mount -r `fssnap -F ufs -o \
         backing-store=/var/tmp/EXPORT-NFS.snap /export/nfs` /mnt

     star -c -xdev -sparse -acl -link-dirs level=0 -wtardumps \
         f=archive-name dumpdate=/tmp/snapstamp \

will also archive some files twice, although this is not needed, because
/tmp/snapstamp is older than the snapshot itself.
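
The effect can be simulated without a real snapshot, as in the following sketch. It makes several assumptions: `cp -Rp' stands in for the snapshot, `find -newer' (which compares mtime only) stands in for the selection done by the next incremental dump, and all file names are illustrative. A write that lands between recording the dump date and taking the snapshot ends up in the level 0 dump and is selected again later.

```shell
#!/bin/sh
# Sketch: a file written between "record dump date" and "take snapshot" is
# contained in the level 0 dump and still looks newer than the dump date.
work=$(mktemp -d)
mkdir "$work/fs"
echo early > "$work/fs/early"
sleep 1
: > "$work/stamp"                      # step 1: record the dump date
sleep 1
echo late > "$work/fs/late"            # a write that lands before the snapshot
cp -Rp "$work/fs" "$work/snap"         # step 2: stand-in for the snapshot

# Step 3 would dump everything in snap at level 0. The next incremental dump
# then selects whatever is newer than the recorded dump date:
again=$(cd "$work/snap" && find . -type f -newer "$work/stamp")
echo "in level 0 and selected again next time: $again"
rm -rf "$work"
```

Only the file written after the stamp is reported; the longer the snapshot takes to set up, the more files fall into that window.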

Jörg

--

-- 
 EMail:joerg <at> schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js <at> cs.tu-berlin.de                (uni)  
       schilling <at> fokus.fraunhofer.de     (work) Blog: http://schily.blogspot.com/

Lasse Kliemann | 19 Mar 2008 18:05

Re: refuse to update certain files upon extraction

* Message by -Joerg Schilling- from Wed 2008-03-19:
> Lasse Kliemann <lasse-list-star-users <at> mail.plastictree.net> wrote:
> 
> > At first, I could not see how sleep(1) could prevent the same file from being 
> > archived twice. In fact, it will not make any difference when using snapshots 
> > of live filesystems. But if there is a cycle (change files, dump, change 
> > files, dump, ...), then it will prevent files from being archived multiple 
> > times.
> 
> I am no longer sure how to do the sleep(1) correctly..... I'll fall back to
> the >= solution, which has the disadvantage that a file may occasionally be
> archived when this is not really needed.

I have no objection to this.

Users can always put a sleep in their scripts if they wish.

> My example from the man page
> 
> 
>      echo > /tmp/snapstamp
> 
>      mount -r `fssnap -F ufs -o \
>          backing-store=/var/tmp/EXPORT-NFS.snap /export/nfs` /mnt
> 
>      star -c -xdev -sparse -acl -link-dirs level=0 -wtardumps \
>          f=archive-name dumpdate=/tmp/snapstamp \
> 
> 
> will also archive some files twice, although this is not needed, because
> /tmp/snapstamp is older than the snapshot itself.

Joerg Schilling | 20 Mar 2008 12:08

(no subject)

Lasse Kliemann <lasse-list-star-users <at> mail.plastictree.net> wrote:

> > My example from the man page
> > 
> > 
> >      echo > /tmp/snapstamp
> > 
> >      mount -r `fssnap -F ufs -o \
> >          backing-store=/var/tmp/EXPORT-NFS.snap /export/nfs` /mnt
> > 
> >      star -c -xdev -sparse -acl -link-dirs level=0 -wtardumps \
> >          f=archive-name dumpdate=/tmp/snapstamp \
> > 
> > 
> > will also archive some files twice, although this is not needed, because
> > /tmp/snapstamp is older than the snapshot itself.
>
> Yes, and a sleep would have no effect here (save for the whole procedure 
> taking one second longer than necessary :-).
>
>
> By the way, this three-step-method (touch file, take fs snapshot, dump) can 
> in fact lead to a significant amount of data archived twice, at least on 
> Linux and under extreme conditions. I did tests once with blogbench running 

It may do so even on Solaris, as it takes more than a minute to set up a
snapshot for a big filesystem. Whether this is really a problem is hard to tell,
as fssnap(1) first applies a write lock to the filesystem...

> If one does that rsync-mirror-method I described earlier (keep a mirror of 

Joerg Schilling | 18 Mar 2008 17:37

Re: refuse to update certain files upon extraction

Lasse Kliemann <lasse-list-star-users <at> mail.plastictree.net> wrote:

> It was caused by a rename, as in the attached rename.c (that's the one from 
> my old message <20080309214556.GK2723 <at> lasse.mail.plastictre.net> and not the 
> one I sent minutes ago).
>
> This rename.c is more complicated, but it was the simplest thing I managed to 
> construct to reproduce the problem in conjunction with incremental dumps. 
>
> By the way, I just did some research on the rename-ctime-thing on Linux, and 
> it appears that something similar was noticed back in 2004 already. There are 
> close to no responses to the patches sent in at that time. Someone even wrote 
> that this behavior is okay according to the SuS, but I am not completely sure 
> that he refers to the same thing. The current Linux Bug Tracker does not give 
> any hits for 'ctime rename', so I will file a bug report there now. Let's see 
> what happens.

The behavior is not OK, as neither mtime nor ctime is updated.

BTW: The Linux discussions I am aware of mention that Linux "only" updates ctime
which would be OK for star.

I am not able to say more now as opengroup.org is not working....

Jörg

--

-- 
 EMail:joerg <at> schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js <at> cs.tu-berlin.de                (uni)  
       schilling <at> fokus.fraunhofer.de     (work) Blog: http://schily.blogspot.com/

Lasse Kliemann | 18 Mar 2008 16:58

Re: refuse to update certain files upon extraction

* Message by -Lasse Kliemann- from Tue 2008-03-18:
> * Message by -Joerg Schilling- from Tue 2008-03-18:
> > Lasse Kliemann <lasse-list-star-users <at> mail.plastictree.net> wrote:
> > 
> > > * Message by -Joerg Schilling- from Tue 2008-03-18:
> > >
> > > > The problem is a Linux kernel bug.
> > > > 
> > > > Any other backup tool (like e.g. dump/restore) will run into the same problem
> > > > as Linux did not update the timestamp from the directory 
> > > > "send-backup-test/log/supervise/".
> > > > 
> > > > You need to send a bug-report against Linux and get a fix for this problem....
> > >
> > > I've already prepared a minimal example to prove this, see the attached 
> > > files. The ctime of `sub' changes the first time that `rename' is run, but 
> > > not the second. I also tried this on Solaris, and there the ctime is 
> > > updated after the second run of `rename'.
> > 
> > Do you know whether your original problem was caused by a rename or rather by
> > an unlink/create chain?
>  
> It was caused by a rename, as in the attached rename.c (that's the one from 
> my old message <20080309214556.GK2723 <at> lasse.mail.plastictre.net> and not the 
> one I sent minutes ago).

Oh, of course this also includes unlink and create, I think, for the file is
opened with O_TRUNC and O_CREAT.

By the way, the files you examined show how the problem can be reproduced 