Mickaël CANÉVET | 24 Jan 10:05 2012

zfs send recv without uncompressing data stream

Hi,

Unless I misunderstood something, zfs send of a volume that has
compression enabled decompresses it. So if I do a zfs send | zfs receive
from a compressed volume to a compressed volume, my data is
decompressed and compressed again. Right?

Is there a more efficient way to do it (without decompression and
recompression)?
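For reference, the operation in question looks like this (a minimal sketch; the pool and dataset names `tank/data` and `backup/data` are hypothetical):

```shell
# Hypothetical pool/dataset names; both sides have compression enabled.
zfs get compression tank/data               # e.g. compression=lzjb on the source
zfs snapshot tank/data@xfer
zfs send tank/data@xfer | zfs receive backup/data
# The stream between the two commands carries uncompressed userdata:
# blocks are decompressed on read from tank and recompressed on write,
# according to backup/data's own compression property.
```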

Cheers,
Mickaël
_______________________________________________
zfs-discuss mailing list
zfs-discuss <at> opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Jim Klimov | 24 Jan 16:52 2012

Re: zfs send recv without uncompressing data stream

2012-01-24 13:05, Mickaël CANÉVET wrote:
> Hi,
>
> Unless I misunderstood something, zfs send of a volume that has
> compression enabled decompresses it. So if I do a zfs send | zfs receive
> from a compressed volume to a compressed volume, my data is
> decompressed and compressed again. Right?
>
> Is there a more efficient way to do it (without decompression and
> recompression)?

While I cannot confirm or deny this, it was my impression as well.
The rationale is that the two systems might demand different
compression settings (e.g. "lzjb" or "none" on the original system
and "gzip-9" on the backup one), just as they probably have different
VDEV layouts, or even different encryption or dedup settings.

Compression, like many other components, lives on the
layer "under" logical storage (userdata blocks), and
gets applied to newly written blocks only (i.e. your
datasets can have a mix of different compression levels
for different files or even blocks within a file, if
you switched the methods during dataset lifetime).
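A sketch of that behaviour, with hypothetical names (only newly written blocks pick up a changed compression setting):

```shell
# Hypothetical dataset; the compression property applies at write time only.
zfs create -o compression=lzjb tank/mixed
cp /var/tmp/file1 /tank/mixed/          # these blocks are written with lzjb
zfs set compression=gzip-9 tank/mixed   # affects future writes only
cp /var/tmp/file2 /tank/mixed/          # these blocks are written with gzip-9
# Nothing rewrites file1's existing blocks, so the dataset now holds
# a mix of lzjb- and gzip-9-compressed blocks.
```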

Actually, I would not be surprised if the zfs-send userdata stream
operates above the block level (i.e. it would seem normal to me if
many small userdata blocks on the original pool became one big block
on the recipient).


Richard Elling | 24 Jan 18:53 2012

Re: zfs send recv without uncompressing data stream

On Jan 24, 2012, at 7:52 AM, Jim Klimov wrote:
> 2012-01-24 13:05, Mickaël CANÉVET wrote:
>> Hi,
>>
>> Unless I misunderstood something, zfs send of a volume that has
>> compression enabled decompresses it. So if I do a zfs send | zfs receive
>> from a compressed volume to a compressed volume, my data is
>> decompressed and compressed again. Right?

Correct.


>> Is there a more efficient way to do it (without decompression and
>> recompression)?


> While I cannot confirm or deny this, it was my impression as well.
> The rationale is that the two systems might demand different
> compression settings (e.g. "lzjb" or "none" on the original system
> and "gzip-9" on the backup one), just as they probably have different
> VDEV layouts, or even different encryption or dedup settings.

That "feature" falls out of the implementation.


> Compression, like many other components, lives on the layer "under"
> logical storage (userdata blocks), and gets applied to newly written
> blocks only (i.e. your datasets can have a mix of different
> compression levels for different files or even blocks within a file,
> if you switched the methods during dataset lifetime).
>
> Actually, I would not be surprised if the zfs-send userdata stream
> operates above the block level (i.e. it would seem normal to me if
> many small userdata blocks on the original pool became one big block
> on the recipient).
>
> So while some optimizations are possible, I think they would violate
> layering quite a bit.

Data in the ARC is uncompressed; compression/decompression
occurs in the ZIO pipeline layer, below the DSL.


> But, for example, it might make sense for zfs-send to include the
> original compression algorithm information in the sent stream and
> send the compressed data (less network traffic or intermediate
> storage requirement, to say the least, at zero cost beyond
> recompressing to something perhaps more efficient), and, if the
> recipient dataset's algorithm differs, to unpack and recompress it
> on the receiving side.
>
> If that's not done already :)

The compression parameter value is sent but, as you mentioned
above, blocks in a snapshot can be compressed with different
algorithms, so you only actually get the last setting at the time of
the snapshot.


> So far my over-the-net zfs sends are piped into gzip or pigz, ssh,
> and gunzip, and that often speeds up the overall transfer. It can
> probably be done with less overhead using "ssh -C" on
> implementations that have it.

The UNIX philosophy is in play here :-) Sending the data uncompressed
to stdout allows you to pipe it into various transport or transform programs.
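For example, a few such pipelines (hypothetical host and dataset names; gzip, pigz, or ssh's own compression can each play the transform role):

```shell
# Compress the send stream in flight, in the Unix way (hypothetical names).
zfs send tank/data@xfer | gzip -c | ssh backuphost 'gunzip -c | zfs receive backup/data'

# Or let ssh compress the channel itself:
zfs send tank/data@xfer | ssh -C backuphost 'zfs receive backup/data'

# pigz is a drop-in parallel gzip, where installed:
zfs send tank/data@xfer | pigz | ssh backuphost 'pigz -d | zfs receive backup/data'
```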
 -- richard

--
ZFS Performance and Training
+1-760-896-4422



Jim Klimov | 24 Jan 19:37 2012

Re: zfs send recv without uncompressing data stream

2012-01-24 19:52, Jim Klimov wrote:
> 2012-01-24 13:05, Mickaël CANÉVET wrote:
>> Hi,
>>
>> Unless I misunderstood something, zfs send of a volume that has
>> compression enabled decompresses it. So if I do a zfs send | zfs receive
>> from a compressed volume to a compressed volume, my data is
>> decompressed and compressed again. Right?
>>
>> Is there a more efficient way to do it (without decompression and
>> recompression)?
>
>
> The rationale is that the two systems
> might demand different compression settings (e.g. "lzjb" or "none"
> on the original system and "gzip-9" on the backup one).

One more rationale: compatibility, including some degree of
future-proofing (the zfs-send format explicitly does not guarantee
that it won't change incompatibly). I mean transfer of data between
systems that do not implement the same set of compression
algorithms in ZFS.

Say, as a developer I find a way to use bzip2 or 7zip to compress
my local system's blocks (just as gzip appeared recently, after
there were only lzjb and none). If I zfs-send the compressed blocks
as they are, another system won't be able to interpret them unless
it supports the same algorithm and format. And since zfs-send can
be used via files (i.e. distribution media with flar-like archives),
there is no way for the zfs sender and recipient to negotiate a
common format, other than using a fixed predefined one:
uncompressed.

Wrapping the stream with external programs in the Unix way is
outside ZFS's scope and can be arranged by other software on the
OSes.
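As a sketch of that arrangement (hypothetical paths; inside the gzip wrapper the archived stream stays in the plain zfs-send format, so any receiver that understands the stream version can use it):

```shell
# Archive a send stream on neutral media, compressed by an external tool.
zfs send tank/data@xfer | gzip > /media/archive/data-xfer.zfs.gz

# Any compatible system can later receive it, regardless of which
# compression algorithms its local ZFS implements:
gunzip -c /media/archive/data-xfer.zfs.gz | zfs receive backup/data
```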

HTH,
//Jim
David Magda | 24 Jan 20:16 2012

Re: zfs send recv without uncompressing data stream

On Tue, January 24, 2012 13:37, Jim Klimov wrote:

> One more rationale - compatibility, including future-proof
> somewhat (the zfs-send format explicitly does not guarantee
> that it won't change incompatibly). I mean stransfer of data
> between systems that do not implement the same set of
> compression algoritms in ZFS.

The format of 'zfs send' has now been committed:

> The format of the stream is committed. You will be able to receive your
> streams on future versions of ZFS.

    http://docs.oracle.com/cd/E19253-01/816-5166/zfs-1m/index.html

This was fixed in some update of Solaris 10, though I can't find the exact
one.

    http://hub.opensolaris.org/bin/view/Community+Group+on/2008042301
Edward Ned Harvey | 25 Jan 15:05 2012

Re: zfs send recv without uncompressing data stream

> From: zfs-discuss-bounces@opensolaris.org
> [mailto:zfs-discuss-bounces@opensolaris.org] On Behalf Of Mickaël CANÉVET
> 
> Unless I misunderstood something, zfs send of a volume that has
> compression enabled decompresses it. So if I do a zfs send | zfs receive
> from a compressed volume to a compressed volume, my data is
> decompressed and compressed again. Right?
>
> Is there a more efficient way to do it (without decompression and
> recompression)?

Better yet: zfs send decompresses the data, then you're probably piping it into gzip or lzop or
something, and then into ssh, so it gets re-compressed and encrypted. Then at the receiving end, it
gets decrypted, decompressed, and recompressed.   ;-)

While there are lots of reasons behind this, I think you'll find it usually doesn't matter. It only
matters if you have really fast disks, an underpowered processor, or very heavy compression (like
gzip, or worse, gzip-9). Default compression is very fast and lightweight.
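A rough, ZFS-free illustration of that tradeoff, using plain gzip on a throwaway file (paths are arbitrary):

```shell
# Generate ~10 MB of highly compressible data, then compare gzip levels.
head -c 10000000 /dev/zero > /tmp/zfs_demo_sample
time gzip -1 -c /tmp/zfs_demo_sample > /tmp/zfs_demo_1.gz   # fast, cheap
time gzip -9 -c /tmp/zfs_demo_sample > /tmp/zfs_demo_9.gz   # slow, CPU-heavy
ls -l /tmp/zfs_demo_1.gz /tmp/zfs_demo_9.gz                 # compare sizes
rm -f /tmp/zfs_demo_sample /tmp/zfs_demo_1.gz /tmp/zfs_demo_9.gz
```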
