Conan Cook | 8 May 18:51 2012

Keyspace lost after restart

Hi Cassandra Folk,


We've experienced a problem a couple of times where Cassandra nodes lose a keyspace after a restart.  We've restarted 2 out of 3 nodes, and they have both experienced this problem; clearly we're doing something wrong, but we don't know what.  The data files are all still there, as before, but the node can't see the keyspace (we only have one).  nodetool still says that each node is responsible for 33% of the keys, but the disk usage has dropped to a tiny amount on the nodes that we've restarted.  I saw this:


Seems to be exactly our problem, but we have not modified the cassandra.yaml - we have overwritten it through an automated process, and that happened just before restarting, but the contents did not change.

Any ideas as to what might cause this, or how the keyspace can be restored (like I say, the data is all still in the data directory)?
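One rough way to confirm the symptom described above (data still on disk, keyspace missing from the schema) is to compare the directories under the data directory with the keyspaces the node reports. The sketch below is only an illustration: it assumes one directory per keyspace directly under the data directory, the function name `orphaned_keyspace_dirs` is hypothetical, and the list of known keyspaces has to be supplied by hand (for example, copied from the CLI output).

```python
import os


def orphaned_keyspace_dirs(data_dir, known_keyspaces, ignore=("system",)):
    """Return names of directories under data_dir that are not in known_keyspaces.

    Assumption: each keyspace has its own directory directly under data_dir.
    Names listed in `ignore` (the system keyspace by default) are skipped.
    """
    known = set(known_keyspaces) | set(ignore)
    # Only directories count; stray files in data_dir are ignored.
    on_disk = [
        name
        for name in sorted(os.listdir(data_dir))
        if os.path.isdir(os.path.join(data_dir, name))
    ]
    return [name for name in on_disk if name not in known]
```

For example, calling `orphaned_keyspace_dirs(path_to_data_dir, [])` on a node that reports no user keyspaces would list every non-system directory still present on disk. It does not restore anything; it only shows which keyspace directories the schema no longer accounts for.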

We're running in AWS.

Thanks,


Conan
Conan Cook | 9 May 15:04 2012

Re: Keyspace lost after restart

Sorry, forgot to mention we're running Cassandra 1.1.


Conan


aaron morton | 10 May 12:43 2012

Re: Keyspace lost after restart

Was this a schema that was created prior to 1.1 ?

What process are you using to create the schema ? 

Can you share the logs from system startup ? Up until it logs "Listening for thrift clients". (if they are long please link to them)

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton



Conan Cook | 10 May 18:15 2012

Re: Keyspace lost after restart

Hi Aaron,


Thanks for getting back to me!  Yes, I believe our keyspace was created prior to 1.1, and I think I also understand why you're asking that, having found this:


Here's our startup log:


There isn't much in there of interest however.  It may well be the case that we created our keyspace, dropped it, then created it again.  The dev responsible for setting it up is ill today, but I'll get back to you tomorrow with exact details of how it was originally created and whether we did definitely drop and re-create it.

Ta,

Conan





Conan Cook | 11 May 10:40 2012

Re: Keyspace lost after restart

Hi,


OK we're pretty sure we dropped and re-created the keyspace before restarting the Cassandra nodes during some testing (we've been migrating to a new cluster).  The keyspace was created via the cli:

create keyspace m7
  with placement_strategy = 'NetworkTopologyStrategy'
  and strategy_options = {us-east: 3}
  and durable_writes = true;

I'm pretty confident that it's a result of the issue I spotted before:


Does anyone know whether this also affected versions before 1.1.0?  If not then we can just roll back until there's a fix; we're not using our cluster in production so we can afford to just bin it all and load it again.  +1 for this being a major issue though: the fact that you can't see it until you restart a node makes it quite dangerous, and that node is lost when it occurs (I also haven't been able to restore the schema in any way).

Thanks very much,


Conan







Jeff Williams | 11 May 11:18 2012

Re: Keyspace lost after restart

Conan,

Good to see I'm not alone in this! I just set up a fresh test cluster. I first did a fresh install of 1.1.0 and was able to replicate the issue. I then did a fresh install using 1.0.10 and didn't see the issue. So it looks like rolling back to 1.0.10 could be the answer for now.

Jeff






Conan Cook | 11 May 11:25 2012

Re: Keyspace lost after restart

Hi Jeff,


Great!  We'll roll back for now, thanks for letting me know.

Conan
