afters | 25 Nov 12:16 2010

Incremental backups

hi folks,

As I'm about to implement it myself, I'm curious to know how people handle
incremental backups for their DBs.

The straightforward way, as I see it, is to shut down couch and use a tool
like rsync or duplicity to back up the db files. It should do the job well, and
as an added bonus, it could also be used to back up views.

Does anyone know if a similar backup could be done while couch is still
running (and the db is being updated)?

Does anyone do incremental backups using replication?

 a.
Robert Newson | 25 Nov 12:21 2010

Re: Incremental backups

Just copy the file; there's no need to stop CouchDB. Replication would
be another way, of course.

afters | 25 Nov 12:41 2010

Re: Incremental backups

I guess I'm not sure what's going on with the file as it is being
copied and changed at the same time. I assume that the file must be
append-only for this to work, but are ALL db changes indeed append-only? I
remember reading that updating a tree node also updates all its ancestors
(to track the latest seq), and I wonder if those changes are also append-only.

Robert Newson | 25 Nov 13:36 2010

Re: Incremental backups

"but are ALL db changes indeed append only?"

Yes. The file is opened with O_APPEND, so we *can't* write anywhere but
the end of the file.

Earlier versions of CouchDB would update the header (the first 4 KB of
the file), but that is no longer the case. It's strictly append-only.
Any truncation of the .couch file will yield a consistent database
(obviously missing the most recent changes).
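
O_APPEND is a POSIX open(2) flag: before every write, the kernel moves the file offset to the current end of the file, so nothing written through that descriptor can overwrite earlier bytes. A minimal Python sketch of that guarantee (the file name and record contents are made up for illustration; this is not CouchDB's own code):

```python
import os
import tempfile


def append_only_demo(path: str) -> bytes:
    """Write through an O_APPEND descriptor and return the resulting file bytes."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, b"header-1;")
        # Explicitly seek back to the start of the file...
        os.lseek(fd, 0, os.SEEK_SET)
        # ...but with O_APPEND this write still lands at the end.
        os.write(fd, b"doc-a;")
    finally:
        os.close(fd)
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    # Hypothetical file name inside a throwaway directory.
    with tempfile.TemporaryDirectory() as d:
        print(append_only_demo(os.path.join(d, "demo.couch")))
```

On a fresh file this prints b'header-1;doc-a;': the second record is appended instead of overwriting the first, which is why a concurrent reader or copier only ever sees a file that grows at the end.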

B.

Robert Newson | 25 Nov 13:38 2010

Re: Incremental backups

The best way to back up is via replication, in my opinion. You can do
this continuously so your backup will be very close to the original.
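
For reference, a continuous replication is started by POSTing a JSON body with "source", "target" and "continuous": true to the server's /_replicate endpoint. A minimal Python sketch that only builds the request; the server URL and database names are hypothetical, and authentication is omitted. Replications started this way are not persisted across server restarts.

```python
import json
import urllib.request


def continuous_replication_request(couch_url: str, source: str, target: str) -> urllib.request.Request:
    """Build (but do not send) a POST /_replicate request for continuous replication."""
    body = {"source": source, "target": target, "continuous": True}
    return urllib.request.Request(
        couch_url.rstrip("/") + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical hosts and database names.
    req = continuous_replication_request(
        "http://localhost:5984", "mydb", "http://backup.example.com:5984/mydb"
    )
    print(req.get_method(), req.full_url, req.data.decode())
    # Against a live server you would then call urllib.request.urlopen(req).
```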

afters | 25 Nov 14:48 2010

Re: Incremental backups

This way I could recover from failure, but I couldn't roll back to a
previous point in time as I could with incremental backups.

Maybe, theoretically, I could roll back any db simply by chopping
off some bytes from the end?

Robert Newson | 25 Nov 14:55 2010

Re: Incremental backups

"Maybe, theoretically, I could roll back any db simply by chopping
off some bytes from the end?"

Yup, you can do that. CouchDB will seek backward from the end of the
file for the latest db header and then carry on as normal.
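
The recovery rule (the last complete header wins, and anything written after it is ignored) can be illustrated with a toy log. The record format below is hypothetical and far simpler than CouchDB's real on-disk layout, and for simplicity this sketch scans forward instead of seeking backward from the end; in this toy, both approaches land on the same last complete header.

```python
import struct

# Hypothetical toy format, NOT CouchDB's on-disk layout:
# 1-byte kind (b"D" = data, b"H" = header), 4-byte big-endian length, payload.
HEADER, DATA = b"H", b"D"


def append_record(log: bytearray, kind: bytes, payload: bytes) -> None:
    """Append one record to the in-memory log (append-only, never rewritten)."""
    log.extend(kind + struct.pack(">I", len(payload)) + payload)


def last_committed_header(log: bytes):
    """Return the payload of the last *complete* header in `log` (bytes or bytearray), or None."""
    pos, last = 0, None
    while pos + 5 <= len(log):
        kind = bytes(log[pos:pos + 1])
        (length,) = struct.unpack(">I", bytes(log[pos + 1:pos + 5]))
        end = pos + 5 + length
        if end > len(log):  # record cut off by truncation: ignore the tail
            break
        if kind == HEADER:
            last = bytes(log[pos + 5:end])
        pos = end
    return last


if __name__ == "__main__":
    log = bytearray()
    append_record(log, DATA, b"doc1 rev1")
    append_record(log, HEADER, b"seq=1")
    append_record(log, DATA, b"doc1 rev2")
    append_record(log, HEADER, b"seq=2")
    print(last_committed_header(log))       # b'seq=2'
    print(last_committed_header(log[:-3]))  # b'seq=1' (last header chopped off)
```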

The real question is why you would want to. This scheme also prevents
you from ever compacting, as compaction removes the previous revisions
of your documents.

B.

afters | 25 Nov 15:17 2010

Re: Incremental backups

I could do incremental backups for, say, every day of the week, and at the
end of the week compact my DB and do a full backup. This way I could roll
back to any day during the last week.

If I could precisely roll back a DB (something like what Dirkjan said:
roll-back-n-seqs, or roll-back-t-seconds), I could skip the incremental
backup and simply replicate. That way I would only need to do a backup
before compaction.

Nils Breunese | 25 Nov 17:31 2010

Re: Incremental backups

afters wrote:

> I could do incremental backups for, say, every day of the week, and at the
> end of the week compact my DB and do a full backup. This way I could roll
> back to any day during the last week.
>
> If I could precisely roll back a DB (something like what Dirkjan said:
> roll-back-n-seqs, or roll-back-t-seconds), I could skip the incremental
> backup and simply replicate. That way I would only need to do a backup
> before compaction.

I'd look into using a tool like rdiff-backup or the more involved (but great) BackupPC. It'll only
copy/transfer a diff from the previous backup and you'll have multiple points in time you can go back to.

Nils.
------------------------------------------------------------------------
 VPRO
 phone:  +31(0)356712911
 e-mail: info@...
 web:    www.vpro.nl
------------------------------------------------------------------------

afters | 25 Nov 17:43 2010

Re: Incremental backups

Thanks, that's what I'm working on currently (another possible tool, which
also encrypts, is Duplicity), and it seems like a decent solution now that I
know I can do it without stopping couch.

Nils Breunese | 25 Nov 12:39 2010

Re: Incremental backups

afters wrote:

> The straightforward way, as I see it, is to shut down couch and use a tool
> like rsync or duplicity to back up the db files. It should do the job well, and
> as an added bonus, it could also be used to back up views.
>
> Does anyone know if a similar backup could be done while couch is still
> running (and the db is being updated)?

Yes, you can just rsync or copy the files while CouchDB is running. Thanks to CouchDB's append-only file
structure this is safe.

Nils.

Dirkjan Ochtman | 25 Nov 12:40 2010

Re: Incremental backups

On Thu, Nov 25, 2010 at 12:16, afters <afters.mail@...> wrote:
> Does anyone do incremental backups using replication?

Yeah, we mostly just replicate our databases from our local LAN server
to a server in a data center (for geographic redundancy). We use
continuous replication for this in several places, too. In fact, we'll
probably do it all using continuous replication as soon as we get 1.1
set up (which has the _replicator database, to keep continuous
replication going in the face of trouble).
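
With the _replicator database (CouchDB 1.1 and later), a replication is described by an ordinary document, and CouchDB keeps it running across restarts. A minimal sketch that builds the PUT for such a document; the server URL, document id and database names are hypothetical, and authentication is omitted.

```python
import json
import urllib.parse
import urllib.request


def replicator_doc_request(couch_url: str, doc_id: str, source: str, target: str) -> urllib.request.Request:
    """Build (but do not send) a PUT of a continuous-replication document into _replicator."""
    doc = {"source": source, "target": target, "continuous": True}
    url = f"{couch_url.rstrip('/')}/_replicator/{urllib.parse.quote(doc_id, safe='')}"
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # Hypothetical LAN server replicating to a hypothetical data-center copy.
    req = replicator_doc_request(
        "http://lan-server:5984", "backup-mydb", "mydb", "http://dc.example.com:5984/mydb"
    )
    print(req.get_method(), req.full_url, req.data.decode())
```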

Cheers,

Dirkjan

afters | 25 Nov 13:35 2010
Picon

Re: Incremental backups

Could you elaborate a bit? This sounds like a full backup to me?

On 25 November 2010 13:40, Dirkjan Ochtman <dirkjan@...> wrote:

> On Thu, Nov 25, 2010 at 12:16, afters <afters.mail@...> wrote:
> > Does anyone do incremental backups using replication?
>
> Yeah, we mostly just replicate our databases from our local LAN server
> to a server in a data center (for geographic redundancy). We use
> continuous replication for this in several places, too. In fact, we'll
> probably do it all using continuous replication as soon as we get 1.1
> set up (which has the _replicator database, to keep continuous
> replication going in the face of trouble).
>
> Cheers,
>
> Dirkjan
>
Dirkjan Ochtman | 25 Nov 13:39 2010

Re: Incremental backups

On Thu, Nov 25, 2010 at 13:35, afters <afters.mail@...> wrote:
> Could you elaborate a bit? This sounds like a full backup to me?

(please post your reply below the text you're replying to, to
facilitate linear reading)

No, replication only propagates document revisions that are new in the
source database. You can see in Futon how a database maintains an
"Update Seq", which is an identifier for the full contents of the
database at the time. By transmitting only the updates after a
seq the target database already has, replication is fairly efficient.
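
The checkpoint idea can be illustrated with a toy in-memory model. ToyDb and replicate are hypothetical names, not CouchDB's API: real replication reads the source's _changes feed and compares revisions, while this sketch keeps only the "send what comes after the last seen seq" part.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyDb:
    """Hypothetical in-memory stand-in for a database with an update sequence."""
    docs: dict = field(default_factory=dict)
    changes: list = field(default_factory=list)  # (seq, doc_id, body) in update order

    @property
    def update_seq(self) -> int:
        return self.changes[-1][0] if self.changes else 0

    def put(self, doc_id: str, body) -> int:
        seq = self.update_seq + 1
        self.docs[doc_id] = body
        self.changes.append((seq, doc_id, body))
        return seq


def replicate(source: ToyDb, target: ToyDb, checkpoint: int) -> tuple[int, int]:
    """Send only the changes after `checkpoint`; return (new checkpoint, changes sent)."""
    sent = 0
    for seq, doc_id, body in source.changes:
        if seq > checkpoint:
            target.put(doc_id, body)
            sent += 1
    return source.update_seq, sent


if __name__ == "__main__":
    src, dst = ToyDb(), ToyDb()
    src.put("a", 1)
    src.put("b", 2)
    cp, n = replicate(src, dst, 0)   # first run sends both changes
    src.put("a", 3)
    cp, n = replicate(src, dst, cp)  # second run sends only the new change
    print(cp, n, dst.docs)           # 3 1 {'a': 3, 'b': 2}
```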

Cheers,

Dirkjan

Jan Lehnardt | 25 Nov 14:37 2010

Re: Incremental backups


On 25 Nov 2010, at 13:39, Dirkjan Ochtman wrote:

> On Thu, Nov 25, 2010 at 13:35, afters <afters.mail@...> wrote:
>> Could you elaborate a bit? This sounds like a full backup to me?
> 
> (please post your reply below the text you're replying to, to
> facilitate linear reading)

Please don't rip out most of the discussion so I know where this
belongs in context.

(not seriously, but making a point: everybody has different taste in
 email etiquette and no one quoting style is correct for everybody.
 If anything, please ask nicely if the poster would consider changing
 his style.)

FWIW, I don't care much either way, as I get a mix of top- and bottom-quoted
emails all the time. I even write both :)

Cheers
Jan

Dirkjan Ochtman | 25 Nov 14:47 2010

Re: Incremental backups

On Thu, Nov 25, 2010 at 14:37, Jan Lehnardt <jan@...> wrote:
> (not seriously, but making a point: everybody has different taste in
>  email etiquette and no one quoting style is correct for everybody.
>  If anything, please ask nicely if the poster would consider changing
>  his style.)

I understand the point, and agree with it. I think what I wrote was
fairly friendly, but maybe next time I'll include a question mark. :)

Cheers,

Dirkjan

afters | 25 Nov 14:41 2010

Re: Incremental backups

On 25 November 2010 14:39, Dirkjan Ochtman <dirkjan@...> wrote:

> On Thu, Nov 25, 2010 at 13:35, afters <afters.mail@...> wrote:
> > Could you elaborate a bit? This sounds like a full backup to me?
>
> (please post your reply below the text you're replying to, to
> facilitate linear reading)
>
> No, replication only propagates document revisions that are new in the
> source database. You can see in Futon how a database maintains an
> "Update Seq", which is an identifier for the full contents of the
> database at the time. By transmitting only the updates after a
> seq the target database already has, replication is fairly efficient.
>

I should have been clearer. I'd like to know if with this method you can
recover not only the latest state of the db, but also previous
states. If I simply back up a file with rsync once every day, I can recover
the state of the db from each day.

Dirkjan Ochtman | 25 Nov 14:48 2010

Re: Incremental backups

On Thu, Nov 25, 2010 at 14:41, afters <afters.mail@...> wrote:
> I should have been clearer. I'd like to know if with this method you can
> recover not only the latest state of the db, but also previous
> states. If I simply back up a file with rsync once every day, I can recover
> the state of the db from each day.

Well, as long as you don't compact the database, you can always
technically recover the state of the db at any time from the full
current database. I'm not sure there's any friendly interface for
that, though (replicate-up-to-x, anyone?).

Cheers,

Dirkjan

Robert Newson | 25 Nov 14:49 2010

Re: Incremental backups

You can copy the .couch file at any time and this will yield a usable
snapshot of the database.
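
A daily snapshot job along these lines can be very small. A minimal sketch, assuming a hypothetical database file and backup directory: it is a plain byte copy, so anything appended while the copy runs may or may not be included, and per the append-only behaviour described above the copy still opens at its last complete header. Each run produces a full copy; tools like rdiff-backup only store the difference.

```python
import shutil
from datetime import datetime
from pathlib import Path


def snapshot_couch_file(db_file: Path, backup_dir: Path, when: datetime) -> Path:
    """Copy a (possibly live) .couch file to a timestamped file in backup_dir."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    # e.g. mydb.couch -> mydb-20101125-134900.couch
    dest = backup_dir / f"{db_file.stem}-{when:%Y%m%d-%H%M%S}{db_file.suffix}"
    # Plain byte copy; copy2 also preserves the file's modification time.
    shutil.copy2(db_file, dest)
    return dest
```

Usage, with hypothetical paths: snapshot_couch_file(Path("/var/lib/couchdb/mydb.couch"), Path("/backups/couchdb"), datetime.now()).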

B.

Kimberlad | 29 Feb 08:49 2012

Re: Incremental backups

Robert Newson <robert.newson <at> ...> writes:

> 
> You can copy the .couch file at any time and this will yield a usable
> snapshot of the database.
> 
> B.

Mark Hahn | 29 Feb 09:45 2012

Re: Incremental backups

rdiff should work great. As a matter of fact, I think I'll try using it.

Jason Smith | 29 Feb 09:58 2012

Re: Incremental backups

On Wed, Feb 29, 2012 at 3:45 PM, Mark Hahn <mark@...> wrote:
> rdiff should work great. As a matter of fact, I think I'll try using it.

Another reason replication is not quite a backup is that it does not
copy _local docs or the _security object. Also, in some situations (if
you have good storage capacity and bandwidth) you may want to back
up views too, which is much easier at the OS layer than at the CouchDB
layer.
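
Since replication does not carry the _security object, one option is to fetch it separately with GET /{db}/_security and store the JSON next to the file-level backup. A minimal sketch; the server URL, database name and output path are hypothetical, authentication may be required and is omitted here, and _local documents are not covered.

```python
import urllib.parse
import urllib.request
from pathlib import Path


def security_url(couch_url: str, db: str) -> str:
    """URL of a database's _security object (db names may contain '/', so encode them)."""
    return f"{couch_url.rstrip('/')}/{urllib.parse.quote(db, safe='')}/_security"


def save_security(couch_url: str, db: str, out_file: Path) -> None:
    """Fetch the _security object and store the raw JSON bytes in out_file."""
    with urllib.request.urlopen(security_url(couch_url, db)) as resp:
        out_file.write_bytes(resp.read())
```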

I wrote about this in a similar discussion here:
http://mail-archives.apache.org/mod_mbox/couchdb-user/201012.mbox/%3CAANLkTimSfPnDUHNd4YqH5xMcaQPU9a3RFxj=mq-prMX7-JsoAwUIsXosN+BqQ9rBEUg <at> public.gmane.org%3E

-- 
Iris Couch

Jens Alfke | 29 Feb 23:02 2012

Re: Incremental backups


On Feb 28, 2012, at 11:49 PM, Kimberlad wrote:

> So I'm none the wiser as to how I scale a Couchdb backup
> other than I can possibly use something like rdiff
> to do incrementals?

I’m not familiar with rdiff, but rsync would be a good approach, maybe combined with an SCM (e.g. rsync
into a git repository and then commit the changes).

CouchDB files are pretty delta-friendly because they’re append-only, so most of the time nothing will
change in the file except for new data added at the end. When a database is compacted the file is completely
rewritten, though.
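
The delta-friendliness of append-only files can be made concrete. A minimal sketch (incremental_copy is a hypothetical helper, not part of rsync, rdiff-backup or CouchDB): if the backup is still a byte-for-byte prefix of the live file, only the new tail is copied; otherwise, as after a compaction rewrites the file, it falls back to a full copy. For simplicity it hashes the whole prefix on both sides, whereas rsync-style tools use rolling checksums.

```python
import hashlib
import shutil
from pathlib import Path


def _sha256_prefix(path: Path, length: int) -> str:
    """SHA-256 of the first `length` bytes of `path`, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        remaining = length
        while remaining > 0:
            chunk = f.read(min(1 << 20, remaining))
            if not chunk:
                break
            h.update(chunk)
            remaining -= len(chunk)
    return h.hexdigest()


def incremental_copy(src: Path, backup: Path) -> str:
    """Bring `backup` up to date with append-only `src`; return "appended" or "full"."""
    if backup.exists():
        old_size = backup.stat().st_size
        if (src.stat().st_size >= old_size
                and _sha256_prefix(src, old_size) == _sha256_prefix(backup, old_size)):
            # Backup is still a prefix of the live file: copy only the new tail.
            with src.open("rb") as s, backup.open("ab") as b:
                s.seek(old_size)
                shutil.copyfileobj(s, b)
            return "appended"
    # First run, or the file was rewritten (e.g. by compaction): copy everything.
    shutil.copyfile(src, backup)
    return "full"
```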

—Jens
Robert Newson | 29 Feb 23:15 2012

Re: Incremental backups

I assumed rdiff-backup was the tool in question. It should work nicely
until you compact, which you should be doing frequently on active
databases.

There's definitely a gap for a solid backup tool that works
incrementally even across compactions.

B.

Gmane