Liu Bo | 1 May 18:27 2013

[RFC PATCH v3 0/2] Online data deduplication

NOTE: This leads to a FORMAT CHANGE, DO NOT use it on real data!

Data deduplication is a specialized data compression technique for eliminating
duplicate copies of repeating data.[1]

This patch set is also related to "Content based storage" in project ideas[2].

PATCH 1 fixes a hang that occurs when deduplication is on, but it is also useful
in practice without deduplication.

For more implementation details, please refer to PATCH 2.

TODO:
* a bit-by-bit comparison callback.

All comments are welcome!

[1]: http://en.wikipedia.org/wiki/Data_deduplication
[2]: https://btrfs.wiki.kernel.org/index.php/Project_ideas#Content_based_storage

v3:
  * add COMPRESS support
  * add a real ioctl to enable dedup feature
  * change the maximum allowed dedup blocksize to 128k because of compressed
    range limit
v2:
  * To avoid enlarging the file extent item's size, add another index key used
    for freeing dedup extent.
  * Freeing a dedup extent now works like deleting a checksum.
  * Add support for alternative deduplication blocksizes larger than PAGESIZE.
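The v2 and v3 notes let the dedup blocksize exceed PAGESIZE but cap it at 128k because of the compressed range limit. A minimal sketch of such a check follows. The 4096-byte page size, the requirement that the blocksize be a whole number of pages, and the function name `dedup_blocksize_valid()` are assumptions for illustration, not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define DEDUP_PAGE_SIZE     4096u          /* assumed PAGESIZE */
#define DEDUP_MAX_BLOCKSIZE (128u * 1024u) /* 128k cap from the v3 notes */

/*
 * Hypothetical validator: accept a dedup blocksize between one page and
 * 128k and (an assumption) require it to be a whole number of pages.
 */
bool dedup_blocksize_valid(uint32_t bs)
{
	if (bs < DEDUP_PAGE_SIZE || bs > DEDUP_MAX_BLOCKSIZE)
		return false;
	return bs % DEDUP_PAGE_SIZE == 0;
}
```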

Liu Bo | 1 May 18:27 2013

[PATCH v3 1/2] Btrfs: skip merge part for delayed data refs

When data deduplication is on, we hang in the merge step because it has
to verify every queued delayed data ref related to this disk offset.

For delayed data refs, there are usually not many refs to merge anyway.

So it's safe to skip the merge for data refs.
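A toy userspace model of why the early return helps (this is not the kernel code): the merge step is modeled as a naive pairwise scan over the refs queued on one head, which is a simplifying assumption about btrfs_merge_delayed_refs(). With dedup, many data refs can land on the same disk offset, so the scan grows quickly; returning early for data heads mirrors the `head->is_data` check in this patch. `struct toy_ref_head` and `toy_merge_delayed_refs()` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a delayed ref head; field names are hypothetical. */
struct toy_ref_head {
	bool   is_data;   /* mirrors head->is_data in the patch */
	size_t nr_refs;   /* refs queued against this disk offset */
};

/*
 * Returns how many ref pairs a naive pairwise merge would compare.
 * Data heads return early, like the check added by this patch.
 */
size_t toy_merge_delayed_refs(const struct toy_ref_head *head)
{
	size_t comparisons = 0;

	if (head->is_data)
		return 0;	/* skip merging for data refs */

	for (size_t i = 0; i < head->nr_refs; i++)
		for (size_t j = i + 1; j < head->nr_refs; j++)
			comparisons++;	/* stands for comparing ref i with ref j */
	return comparisons;
}
```

With 1000 refs queued on a non-data head, the model performs 499500 comparisons; the same head marked as data performs none.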

Signed-off-by: Liu Bo <bo.li.liu <at> oracle.com>
---
 fs/btrfs/delayed-ref.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index b7a0641..34670c8 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -316,6 +316,13 @@ void btrfs_merge_delayed_refs(struct btrfs_trans_handle *trans,
 	struct rb_node *node;
 	u64 seq = 0;

+	/*
+	 * Delayed data refs usually don't have many refs to merge, so skip
+	 * merging them.
+	 */
+	if (head->is_data)
+		return;
+
 	spin_lock(&fs_info->tree_mod_seq_lock);

Liu Bo | 1 May 18:27 2013

[PATCH v3 2/2] Btrfs: online data deduplication

(NOTE: This leads to a FORMAT CHANGE, DO NOT use it on real data.)

This introduces the online data deduplication feature for btrfs.

(1) WHY do we need deduplication?
    To improve our storage efficiency.

(2) WHAT is deduplication?
    Practical deduplication implementations differ in two key ways:
    *  When the data is deduplicated
       (inband vs background)
    *  The granularity of the deduplication.
       (block level vs file level)

    For btrfs, we choose
    *  inband (synchronous)
    *  block level

    We choose them for the same reasons ZFS does:
    a)  To get an immediate benefit.
    b)  To remove redundant parts within a file.

    So we have an inband, block level data deduplication here.
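A minimal userspace sketch of inband, block-level dedup as described above: each incoming block is hashed and looked up at write time; a match is reused as a reference, and anything else is stored as a new extent. Assumptions: a fixed 4K block, a small open-addressing table, and FNV-1a as a stand-in hash (it is not collision resistant, so a real implementation would want a cryptographic hash). The memcmp() check plays the role of the bit-by-bit comparison callback listed in the cover letter's TODO. All names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define BLK       4096   /* assumed dedup blocksize */
#define TABLE_SZ  64     /* tiny hash table, power of two */
#define MAX_EXT   64     /* max unique extents stored */

/* FNV-1a 64-bit: a stand-in hash, NOT collision resistant. */
static uint64_t fnv1a(const unsigned char *p, size_t n)
{
	uint64_t h = 0xcbf29ce484222325ULL;

	for (size_t i = 0; i < n; i++) {
		h ^= p[i];
		h *= 0x100000001b3ULL;
	}
	return h;
}

struct dedup_store {
	unsigned char extents[MAX_EXT][BLK]; /* stored unique blocks */
	int nr_extents;
	uint64_t hash[TABLE_SZ];
	int slot_ext[TABLE_SZ];              /* extent index, -1 if empty */
};

void dedup_init(struct dedup_store *s)
{
	s->nr_extents = 0;
	for (int i = 0; i < TABLE_SZ; i++)
		s->slot_ext[i] = -1;
}

/*
 * Write one block inband: return the index of the extent that holds its
 * data, reusing an existing extent when an identical block is found.
 * Returns -1 if the store or table is full.
 */
int dedup_write_block(struct dedup_store *s, const unsigned char *blk)
{
	uint64_t h = fnv1a(blk, BLK);

	for (size_t probe = 0; probe < TABLE_SZ; probe++) {
		size_t i = (h + probe) & (TABLE_SZ - 1);
		int e = s->slot_ext[i];

		if (e < 0) {
			/* Not found: store the block as a new extent. */
			if (s->nr_extents == MAX_EXT)
				return -1;
			e = s->nr_extents++;
			memcpy(s->extents[e], blk, BLK);
			s->hash[i] = h;
			s->slot_ext[i] = e;
			return e;
		}
		/* Hash match, then compare bytes to rule out a collision. */
		if (s->hash[i] == h && memcmp(s->extents[e], blk, BLK) == 0)
			return e;	/* duplicate: share the existing extent */
	}
	return -1;	/* table full */
}
```

Writing block A, then block B, then A again yields extent indices 0, 1, 0, with only two extents stored.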

(3) HOW does deduplication work?
    This makes full use of file extent back references, in the same way as
    IOCTL_CLONE, which lets us easily store multiple copies of a set of
    data as a single copy along with an index of references to the copy.
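Sharing one copy among many references can be modeled as a refcount on the stored extent: every file block that maps to it holds one reference, as a clone does, and the extent is freed only when the last reference is dropped. At that point its dedup index entry has to go too, otherwise later writes could be matched against freed space; the v2 notes handle that with an extra index key that is deleted much like a checksum. This sketch is hypothetical and leaves out the actual btrfs back-reference items.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory model of a shared (deduplicated) extent. */
struct shared_extent {
	uint64_t disk_bytenr;  /* where the single copy lives */
	uint32_t refs;         /* file extents pointing at it */
	bool     hashed;       /* still present in the dedup index */
};

/* A new file block references the extent, as clone or dedup would. */
void extent_get(struct shared_extent *ex)
{
	ex->refs++;
}

/*
 * Drop one reference. On the last one the extent is freed and its dedup
 * index entry is removed with it. Returns true if the extent was freed.
 */
bool extent_put(struct shared_extent *ex)
{
	if (ex->refs == 0)
		return false;	/* already free; nothing to drop */
	if (--ex->refs > 0)
		return false;	/* other references still share the copy */
	ex->hashed = false;	/* delete the hash entry with the extent */
	return true;
}
```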

    Here we have

Josef Bacik | 1 May 19:30 2013

Re: [PATCH v3 2/2] Btrfs: online data deduplication

On Wed, May 01, 2013 at 10:27:38AM -0600, Liu Bo wrote:
> (NOTE: This leads to a FORMAT CHANGE, DO NOT use it on real data.)
> 
> This introduce the online data deduplication feature for btrfs.
> 
> (1) WHY do we need deduplication?
>     To improve our storage effiency.
> 
> (2) WHAT is deduplication?
>     Two key ways for practical deduplication implementations,
>     *  When the data is deduplicated
>        (inband vs background)
>     *  The granularity of the deduplication.
>        (block level vs file level)
> 
>     For btrfs, we choose
>     *  inband(synchronous)
>     *  block level
> 
>     We choose them because of the same reason as how zfs does.
>     a)  To get an immediate benefit.
>     b)  To remove redundant parts within a file.
> 
>     So we have an inband, block level data deduplication here.
> 
> (3) HOW does deduplication works?
>     This makes full use of file extent back reference, the same way as
>     IOCTL_CLONE, which lets us easily store multiple copies of a set of
>     data as a single copy along with an index of references to the copy.
> 

Gabriel de Perthuis | 1 May 20:07 2013

Re: [PATCH v3 2/2] Btrfs: online data deduplication

>  #define BTRFS_IOC_DEV_REPLACE _IOWR(BTRFS_IOCTL_MAGIC, 53, \
>  				    struct btrfs_ioctl_dev_replace_args)
> +#define BTRFS_IOC_DEDUP_REGISTER	_IO(BTRFS_IOCTL_MAGIC, 54)

This ioctl number (54) is already used by the offline dedup patches.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Liu Bo | 1 May 18:27 2013

[PATCH] Btrfs-progs: add dedup register

This adds 'btrfs filesystem dedup-register', which can be used
to enable dedup on a filesystem.

Signed-off-by: Liu Bo <bo.li.liu <at> oracle.com>
---
 cmds-filesystem.c |   36 ++++++++++++++++++++++++++++++++++++
 ioctl.h           |    1 +
 2 files changed, 37 insertions(+), 0 deletions(-)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 3f386e2..aa21cf6 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -515,6 +515,41 @@ static int cmd_label(int argc, char **argv)
 		return get_label(argv[1]);
 }

+static const char * const cmd_dedup_usage[] = {
+	"btrfs filesystem dedup-register <path>",
+	"Register a dedup tree",
+	NULL
+};
+
+static int cmd_dedup(int argc, char **argv)
+{
+	int 	fd, res, e;
+	char	*path;
+
+	if (check_argc_exact(argc, 2))
+		usage(cmd_dedup_usage);

David Sterba | 13 May 17:55 2013

Re: [PATCH] Btrfs-progs: add dedup register

On Thu, May 02, 2013 at 12:27:39AM +0800, Liu Bo wrote:
> +static int cmd_dedup(int argc, char **argv)
> +{
> +	int 	fd, res, e;
> +	char	*path;
> +
> +	if (check_argc_exact(argc, 2))
> +		usage(cmd_dedup_usage);
> +
> +	path = argv[1];
> +
> +	fd = open_file_or_dir(path);
> +	if (fd < 0) {
> +		fprintf(stderr, "ERROR: can't access to '%s'\n", path);
> +		return 12;

Please do not introduce the strange return values in new code.

> +	}
> +
> +	printf("register dedup on '%s'\n", path);
> +	res = ioctl(fd, BTRFS_IOC_DEDUP_REGISTER);
> +	e = errno;
> +	close(fd);
> +	if( res < 0 ){
> +		fprintf(stderr, "ERROR: unable to register dedup '%s' - %s\n", 
> +			path, strerror(e));
> +		return 32;

Ditto.
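Following this review, a minimal sketch of the same command with conventional exit codes (0 on success, 1 on any error) in place of 12 and 32. Assumptions: plain open(2) stands in for btrfs-progs' open_file_or_dir() helper, the function name dedup_register() is hypothetical, and BTRFS_IOC_DEDUP_REGISTER is defined locally exactly as in this patch. It is not an upstream ioctl, and its number collides with the offline dedup ioctl, as pointed out in the thread.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical: mirrors this patch's definition, not an upstream ioctl. */
#define BTRFS_IOCTL_MAGIC		0x94
#define BTRFS_IOC_DEDUP_REGISTER	_IO(BTRFS_IOCTL_MAGIC, 54)

/* Returns 0 on success, 1 on any failure. */
int dedup_register(const char *path)
{
	int fd, res, e;

	fd = open(path, O_RDONLY);	/* stand-in for open_file_or_dir() */
	if (fd < 0) {
		fprintf(stderr, "ERROR: can't access '%s': %s\n",
			path, strerror(errno));
		return 1;
	}

	printf("register dedup on '%s'\n", path);
	res = ioctl(fd, BTRFS_IOC_DEDUP_REGISTER);
	e = errno;		/* save before close() can clobber it */
	close(fd);
	if (res < 0) {
		fprintf(stderr, "ERROR: unable to register dedup '%s': %s\n",
			path, strerror(e));
		return 1;
	}
	return 0;
}
```

A command wrapper would simply return dedup_register(argv[1]) after the argument check.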

Liu Bo | 14 May 02:29 2013

Re: [PATCH] Btrfs-progs: add dedup register

On Mon, May 13, 2013 at 05:55:10PM +0200, David Sterba wrote:
> On Thu, May 02, 2013 at 12:27:39AM +0800, Liu Bo wrote:
> > +static int cmd_dedup(int argc, char **argv)
> > +{
> > +	int 	fd, res, e;
> > +	char	*path;
> > +
> > +	if (check_argc_exact(argc, 2))
> > +		usage(cmd_dedup_usage);
> > +
> > +	path = argv[1];
> > +
> > +	fd = open_file_or_dir(path);
> > +	if (fd < 0) {
> > +		fprintf(stderr, "ERROR: can't access to '%s'\n", path);
> > +		return 12;
> 
> Please do not introduce the strange return values in new code.

Okay, I actually wasn't sure whether it was right at the time.

> 
> > +	}
> > +
> > +	printf("register dedup on '%s'\n", path);
> > +	res = ioctl(fd, BTRFS_IOC_DEDUP_REGISTER);
> > +	e = errno;
> > +	close(fd);
> > +	if( res < 0 ){
> > +		fprintf(stderr, "ERROR: unable to register dedup '%s' - %s\n", 

Josef Bacik | 1 May 19:37 2013

Re: [RFC PATCH v3 0/2] Online data deduplication

On Wed, May 01, 2013 at 10:27:36AM -0600, Liu Bo wrote:
> NOTE: This leads to a FORMAT CHANGE, DO NOT use it on real data!
> 
> Data deduplication is a specialized data compression technique for eliminating
> duplicate copies of repeating data.[1]
> 
> This patch set is also related to "Content based storage" in project ideas[2].
> 
> PATCH 1 is a hang fix when deduplication is on, but it's also useful with no
> deduplication in practice use.
> 
> For more implementation details, please refer to PATCH 2.
> 
> TODO:
> * a bit-to-bit comparison callback.
> 
> All comments are welcome!
> 
> [1]: http://en.wikipedia.org/wiki/Data_deduplication
> [2]: https://btrfs.wiki.kernel.org/index.php/Project_ideas#Content_based_storage
> 
> 
> v3:
>   * add COMPRESS support
>   * add a real ioctl to enable dedup feature
>   * change the maximum allowed dedup blocksize to 128k because of compressed
>     range limit
> v2:
>   * To avoid enlarging the file extent item's size, add another index key used
>     for freeing dedup extent.

Liu Bo | 3 May 09:54 2013

Re: [RFC PATCH v3 0/2] Online data deduplication

> You didn't use an INCOPMAT option for this so you need to deal with a user
> mounting the file system with an older kernel or even forgetting to use mount -o
> dedup.  Otherwise your dedup tree will become out of date and you could corrupt
> peoples data.  So if you aren't going to use an INCOMPAT flag you need to at
> least use a COMPAT flag so we know the option has been used at all and then you
> need to have a mechanism to know if you need to invalidate the hash tree.
> 
> Users are also going to make the mistake of thinking dedup will make their
> workload awesome, and when it doesn't they need a way to turn it off.  If you do
> an INCOMPAT option then you need to have a way to delete the hash tree and unset
> the INCOMPAT flag.  If you do the COMPAT route then you get this for free since
> the user just needs to stop using -o dedup, but you'll probably also want to
> provide a mechanism to delete the tree to free up space.  Thanks,
> 
> Josef

I made a few mistakes here. Yes, I should also provide a way to disable dedup,
and I'm going to use INCOMPAT.

But forgetting to use mount -o dedup will not make the dedup tree go out of
date, because the dedup tree is loaded whenever it exists, whether or not
'mount -o dedup' is used.
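The two points in this exchange can be sketched together: an INCOMPAT bit makes a kernel that does not understand dedup refuse the mount, so it cannot silently let the hash tree go stale, while a kernel that does understand it loads the dedup tree whenever one exists, independently of -o dedup. The flag bit value, the supported-feature mask, and the toy structs below are hypothetical; only the overall idea comes from the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bit; a real value would be assigned upstream. */
#define TOY_INCOMPAT_DEDUP	(1ULL << 20)

struct toy_super {
	uint64_t incompat_flags;   /* features a kernel must understand */
	bool     has_dedup_root;   /* a dedup tree exists on disk */
};

struct toy_mount {
	bool dedup_tree_loaded;
};

/*
 * Returns 0 on success, -1 if the filesystem uses an INCOMPAT feature
 * outside 'supported', the mounting kernel's feature mask.
 */
int toy_mount_fs(const struct toy_super *sb, uint64_t supported,
		 struct toy_mount *m)
{
	/* A kernel without dedup support refuses the mount here. */
	if (sb->incompat_flags & ~supported)
		return -1;

	/* Load the dedup tree whenever it exists; no -o dedup needed. */
	m->dedup_tree_loaded = sb->has_dedup_root;
	return 0;
}
```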

Thanks for the nice reminder, Josef :)

thanks,
liubo

