Markus Silpala | 12 Dec 18:19 2011

How to change existing data from one back-end to another

Greetings again.


Another likely-quick question after my weekend of doco-diving: is there any trick, or are there any steps needed, to change an existing cluster from one back-end to another? Judging by the wiki page for the Multi back-end, it would appear that one just changes the app.config or PUTs a change to bucket properties, bounces the node, and Riak converts all existing data automatically. Is it really that simple?

Are there any considerations around rolling updates vs taking downtime to update the whole cluster? Any concern about large data sets during the conversion? Is the node available during the conversion? Any gotchas around when 2i actually becomes usable in the updated bucket?

Okay—only the first question was likely to be quick. :-)

For us it's really unfortunate that we can't use 2i through a Multi back-end. We don't use either today. We may want to use Multi to cause certain buckets to expire their data sooner than the rest; but we very likely want to adopt 2i for certain cases where buckets are related and other cases where a simple equality- or range-based search will be needed. Choosing between the two is a real bummer.

Is 2i support through Multi in the plan for future releases?

Thanks again,

-Markus
_______________________________________________
riak-users mailing list
riak-users <at> lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Ryan Zezeski | 12 Dec 23:29 2011

Re: How to change existing data from one back-end to another

Markus,


Riak will _not_ automatically convert the data for you.  If you change the backend and bounce the node, that node will no longer see the data, and it has no notion of converting data from another backend.  Remember, you are simply changing a value in the config file.  Riak is not aware that you were once using another backend and that your intention is to migrate the data.  Riak does not pretend to be smart in that regard, and that's probably a good thing, as some people may change backends with the intention of _not_ migrating data.

What would happen is that the one node would be using a new backend with no data while the other nodes would still be using the old backend with its current data.  As data is read, Riak would notice the missing data on the new backend and perform read repair.  You could take advantage of this and "migrate" data by performing a streaming list keys + GET to read repair all data.  However, down that road lies madness.
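
For illustration only, here is a minimal Python sketch of what the "streaming list keys + GET" approach amounts to. It is not a recommended procedure. It assumes the HTTP key listing arrives as a run of concatenated JSON objects of the form {"keys": [...]} and that objects live under /buckets/<bucket>/keys/<key>; both the URL layout and the names iter_streamed_keys, object_url, and touch_all are assumptions made for this example, so check them against your Riak version. The network call is passed in as a function so the logic can be exercised without a cluster.

```python
import json
from typing import Callable, Iterable, Iterator
from urllib.parse import quote


def iter_streamed_keys(chunks: Iterable[str]) -> Iterator[str]:
    """Yield keys from a streamed key listing.

    Assumes the body is a run of concatenated JSON objects such as
    {"keys":["a","b"]}{"keys":["c"]}, possibly split at arbitrary points.
    """
    decoder = json.JSONDecoder()
    buf = ""
    for chunk in chunks:
        buf += chunk
        while True:
            buf = buf.lstrip()
            if not buf:
                break
            try:
                obj, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                break  # object is incomplete; wait for the next chunk
            buf = buf[end:]
            yield from obj.get("keys", [])


def object_url(base: str, bucket: str, key: str) -> str:
    """Build the URL for one object (assumed URL layout, see above)."""
    return f"{base}/buckets/{quote(bucket, safe='')}/keys/{quote(key, safe='')}"


def touch_all(chunks: Iterable[str], fetch: Callable[[str], object],
              base: str, bucket: str) -> int:
    """GET every listed key so Riak has a chance to read repair it.

    `fetch` is whatever performs the HTTP GET; its result is ignored.
    Returns the number of keys requested.
    """
    count = 0
    for key in iter_streamed_keys(chunks):
        fetch(object_url(base, bucket, key))
        count += 1
    return count
```

Against a real node, `chunks` would be the decoded body of the streaming key listing and `fetch` something like `urllib.request.urlopen`. Listing every key in a bucket is expensive on a live cluster, which is part of why this road is discouraged.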

I see two ways to go about this:

1) Do a rolling backup/stop/change config/start/restore.

2) Join new nodes using new backend, let claim and handoffs move data over, then leave nodes using old backend.

I like the first method because it doesn't require join/leave which can cause massive partition shifts with the current default claim algorithm.  The second method is easier to execute but requires at least one additional machine and use of the new claim algorithm.
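
The first method can be sketched as a per-node checklist. The Python below only builds and prints the steps (dry run by default); it assumes the usual `riak-admin backup <node> <cookie> <file> node` and `riak-admin restore <node> <cookie> <file>` command forms and that it is run on the host of the node being migrated. The node name, cookie, and file path are hypothetical placeholders, and the config edit is left as a manual step because the right `storage_backend` value depends on your setup.

```python
import subprocess
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    description: str
    argv: Optional[List[str]] = None  # None marks a step done by hand


def node_plan(node: str, cookie: str, backup_file: str) -> List[Step]:
    """Steps for migrating ONE node; repeat node by node across the cluster."""
    return [
        Step("back up this node's data",
             ["riak-admin", "backup", node, cookie, backup_file, "node"]),
        Step("stop the node", ["riak", "stop"]),
        # Manual: change storage_backend under riak_kv in app.config.
        Step("edit app.config to select the new backend"),
        Step("start the node on the new, empty backend", ["riak", "start"]),
        Step("restore the backup into the new backend",
             ["riak-admin", "restore", node, cookie, backup_file]),
    ]


def run(plan: List[Step], dry_run: bool = True) -> List[str]:
    """Execute (or, by default, just describe) each step in order."""
    log = []
    for step in plan:
        if step.argv is None:
            log.append(f"MANUAL: {step.description}")
        elif dry_run:
            log.append("DRY-RUN: " + " ".join(step.argv))
        else:
            subprocess.run(step.argv, check=True)  # abort on first failure
            log.append("RAN: " + " ".join(step.argv))
    return log


if __name__ == "__main__":
    # Hypothetical node name, cookie, and path; substitute your own.
    for line in run(node_plan("riak@10.0.0.1", "riak", "/var/backups/node1.bak")):
        print(line)
```

Finishing one node completely before touching the next keeps the other replicas available throughout, which is the point of doing this as a rolling change.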

There is an outstanding pull request for 2i/multi back-end support [1].

-Ryan

[1]: https://github.com/basho/riak_kv/pull/258

Markus Silpala | 20 Dec 02:59 2011

Re: How to change existing data from one back-end to another

Ryan,


That's actually quite a relief that Riak doesn't automagically convert; that would have been a frightening volume of work to keep so well hidden!

For that Option #1 when you describe doing a "restore," are you referring to the "riak-admin backup" and "riak-admin restore"? The reason I ask is that we've been avoiding the use of riak-admin backup because (IIRC) it didn't properly back up search index files (or something along those lines). Has that issue been resolved now to where we can and should rely on riak-admin to do our backups even if we use riak-search?

(Or did I just pick up the wrong pipe this morning before remembering that issue with the old riaksearch-admin?)

Thanks again,

-Markus

Ryan Zezeski | 23 Dec 03:34 2011

Re: How to change existing data from one back-end to another


Markus,

You're correct that backup/restore currently doesn't work with search.  You could still use that method, but you would have to reindex everything, which would require list keys.  In our next minor version method #2 will get better, but it will require you to upgrade.  Another method is to write some code to directly convert one backend's data to another.  Then you would take each node down one at a time, run the custom code to convert, switch backends in the config file, then restart the node.

-Ryan
