Oleg Dulin | 10 Sep 19:37 2012

JVM 7, Cass 1.1.1 and G1 garbage collector

I am currently profiling a Cassandra 1.1.1 setup using G1 and JVM 7.

It is my feeble attempt to reduce Full GC pauses.

Has anyone had any experience with this? Has anyone tried it?

-- 
Regards,
Oleg Dulin
NYC Java Big Data Engineer
http://www.olegdulin.com/

Peter Schuller | 11 Sep 05:26 2012

Re: JVM 7, Cass 1.1.1 and G1 garbage collector

> I am currently profiling a Cassandra 1.1.1 setup using G1 and JVM 7.
>
> It is my feeble attempt to reduce Full GC pauses.
>
> Has anyone had any experience with this? Has anyone tried it?

Have tried; for some workloads it's looking promising. This is without
key cache and row cache and with a pretty large young gen.

The main thing you'll want to look for is whether your heap usage after
mixed-mode collections remains stable or keeps growing. The main issue
with G1 that causes fallbacks to full GC is regions becoming
effectively uncollectable due to high remembered set scanning costs
(driven by inter-region pointers). If you can avoid that, one might
hope to avoid full GCs altogether.

The jury is still out on my side; but like I said, I've seen promising
indications.
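
One way to make the "stable or keeps growing" check concrete is to fit a straight line through the heap occupancy observed after each mixed collection. This is only a minimal Python sketch: it assumes the post-collection numbers (in MB) have already been pulled out of the GC log, and both the sample values and the 1 MB-per-collection threshold are made up for illustration.

```python
def slope(samples):
    """Least-squares slope of samples against their index (MB per collection)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2.0
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def is_growing(samples, threshold_mb=1.0):
    """True if post-collection usage trends upward faster than the threshold."""
    return slope(samples) > threshold_mb


# Hypothetical heap usage in MB after successive mixed collections
# (illustrative values, not real measurements).
stable = [4100, 4080, 4120, 4090, 4110, 4085]
growing = [4100, 4180, 4260, 4330, 4420, 4500]

print("stable series growing?", is_growing(stable))
print("growing series growing?", is_growing(growing))
```

A steadily positive slope over many collections is the pattern to worry about; noise around a flat line is not.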

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Jonathan Ellis | 11 Sep 17:48 2012

Re: JVM 7, Cass 1.1.1 and G1 garbage collector

Relatedly, I'd love to learn how to reliably reproduce full GC pauses
on C* 1.1+.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Peter Schuller | 12 Sep 09:11 2012

Re: JVM 7, Cass 1.1.1 and G1 garbage collector

> Relatedly, I'd love to learn how to reliably reproduce full GC pauses
> on C* 1.1+.

Our full GCs are typically not very frequent: a few days or even weeks
in between, depending on the cluster. But it happens on several clusters;
I'm guessing most (but I haven't done a systematic analysis). The only
question is how often. But given the lack of handling of such failure
modes, the effect on clients is huge. I recommend data reads by default
to mitigate this and a slew of other sources of problems (and for
counter increments, we're rolling out least-active-request routing).

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Peter Schuller | 12 Sep 09:13 2012

Re: JVM 7, Cass 1.1.1 and G1 garbage collector

> Our full GCs are typically not very frequent: a few days or even weeks
> in between, depending on the cluster.

*PER NODE* that is. On a cluster of hundreds of nodes, that's pretty
often (and all it takes is a single node).
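
To put rough numbers on that: if each node hits a full GC about once every N days, independently of the others, a cluster of M nodes sees about M/N of them per day. A quick sketch in Python; the node count and interval below are illustrative assumptions, not measurements from any real cluster.

```python
def cluster_full_gcs_per_day(nodes, days_between_per_node):
    """Expected cluster-wide full GCs per day, assuming independent nodes."""
    return nodes / days_between_per_node


# Illustrative numbers only: 300 nodes, one full GC per node every 14 days.
rate = cluster_full_gcs_per_day(300, 14)
print(f"about {rate:.1f} full GCs per day across the cluster")
```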

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Edward Capriolo | 15 Sep 18:18 2012

Re: JVM 7, Cass 1.1.1 and G1 garbage collector

Generally, tuning the garbage collector is a waste of time. Just follow someone else's recommendation and use that.

The problem with tuning is that workloads change, and then you have to tune again and again. New garbage collectors come out, and you have to tune again and again. Someone at your company reads a blog about some new JVM and its awesomeness, and you tune again and again. Cassandra adds off-heap caching, and you tune again and again.

All this work takes a lot of time and usually results in negligible returns. Garbage collectors and tuning are not magic bullets.

Peter Schuller | 15 Sep 20:24 2012

Re: JVM 7, Cass 1.1.1 and G1 garbage collector

> Generally tuning the garbage collector is a waste of time.

Sorry, that's BS. It can be absolutely critical, when done right, and
only "useless" when done wrong. There's a spectrum in between.

> Just follow
> someone else's recommendation and use that.

No, don't.

Most recommendations out there are completely useless in the general
case because someone did some very specific benchmark under very
specific circumstances and then recommends some particular combination
of options. In order to understand whether a particular recommendation
applies to you, you need to know enough about your use-case that I
suspect you're better off just reading up on the available options and
figuring things out. Of course, randomly trying various different
settings to see which seems to work well may be realistic - but you
lose predictability (in the face of changing patterns of traffic for
example) if you don't know why it's behaving like it is.

If you care about GC related behavior you want to understand how the
application behaves, how the garbage collector behaves, what your
requirements are, and select settings based on those requirements and
how the application and GC behavior combine to produce emergent
behavior. The "best" GC options may vary *wildly* depending on the
nature of your cluster and your goals. There are also non-GC settings
(in the specific case of Cassandra) that affect the interaction with
the garbage collector, like whether you're using row/key caching, or
things like phi conviction threshold and/or timeouts. It's very hard
for anyone to give generalized recommendations. If it weren't,
Cassandra would ship with The One True set of settings that are always
the best and there would be no discussion.

It's very unfortunate that the state of GC in the freely available
JVMs is at this point, given that there exist known and working
algorithms (and at least one practical implementation) that mostly
avoid these problems. But it's the situation we're in. The only way
around it that I know of, if you're on HotSpot, is to have the
application behave in such a way that it avoids the causes of
unpredictable behavior w.r.t. GC by being careful about its memory
allocation and *retention* profile. For the specific case of avoiding
*ever* seeing a full GC, it gets even more complex.

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Edward Capriolo | 25 Sep 02:02 2012

Re: JVM 7, Cass 1.1.1 and G1 garbage collector

Haha Ok.
It is not a total waste, but practically your time is better spent in other places. The problem is that just about everything is a moving target: schema, request rate, hardware. Generally, tuning nudges a couple of variables in one direction or the other and you see some decent returns. But each nudge takes a restart and a warm-up period, and with how Cassandra distributes requests you likely have to flip several nodes or all of them before you can see the change! By the time you do that it's probably a different day or week. Essentially, finding out if one setting is better than the other is like a 3-day test in production.

Before C* I used to deal with this in Tomcat. Once in a while we would get a dev who had read some article about tuning, something about a new JVM or collector. With bright-eyed enthusiasm they would want to try tuning our current cluster. They would spend a couple of days, measure something, and say it was good: "lower memory usage". Meanwhile someone else would come to me and say "higher 95th percentile response time". More short pauses, fewer long pauses, great taste, less filling.

Most people just want to roflscale their Heroku cloud. Tuning stuff is sysadmin work, and the cloud has taught us that sysadmins are a needless waste of money.

Just kidding!

But I do believe the default Cassandra settings are reasonable, and I typically find that most people who look at tuning GC usually need more hardware and actually need to be tuning something somewhere else.

G1 is the perfect example of a time suck. It claims low pause latency for big heaps, and delivers something that the Cassandra community (and HBase as well) regards as working worse than CMS. If you spend 3 hours switching tuning knobs and analysing, that is 3 hours of your life you will never get back.

Better to let SUN and other people worry about tuning (at least from where I sit)

Peter Schuller | 25 Sep 06:22 2012

Re: JVM 7, Cass 1.1.1 and G1 garbage collector

> It is not a total waste, but practically your time is better spent in other
> places. The problem is just about everything is a moving target, schema,
> request rate, hardware. Generally tuning nudges a couple variables in one
> direction or the other and you see some decent returns. But each nudge takes
> a restart and a warm up period, and with how Cassandra distributes requests
> you likely have to flip several nodes or all of them before you can see the
> change! By the time you do that it's probably a different day or week.
> Essentially finding out if one setting is better than the other is like a 3
> day test in production.
>
> Before c* I used to deal with this in tomcat. Once in a while we would get a
> dev that read some article about tuning, something about a new jvm, or
> collector. With bright-eyed enthusiasm they would want to try tuning our
> current cluster. They spend a couple days and measure something and say it
> was good "lower memory usage". Meanwhile someone else would come to me and
> say "higher 95th response time". More short pauses, fewer long pauses, great
> taste, less filling.

That's why blind blackbox testing isn't the way to go. Understanding
what the application does, what the GC does, and the goals you have in
mind is more fruitful. For example, are you trying to improve p99?
Maybe you want to improve p999 at the cost of worse p99? What about
failure modes (non-happy cases)? Perhaps you don't care about
few-hundred-ms pauses but want to avoid full GCs? There are lots of
different goals one might have, and different workloads.
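
The p99-versus-p999 trade-off is easy to see with a toy calculation. The sketch below uses a plain nearest-rank percentile and two made-up latency profiles (hypothetical numbers, in ms): one with rare multi-second pauses, one with more frequent short pauses and no long ones.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100.0))
    return ordered[rank - 1]


# Hypothetical profile A: 10,000 requests, 20 of which hit a 2-second pause.
rare_long_pauses = [5] * 9980 + [2000] * 20
# Hypothetical profile B: no long pauses, but 3% of requests take 60 ms.
frequent_short_pauses = [5] * 9700 + [60] * 300

for name, data in [("A", rare_long_pauses), ("B", frequent_short_pauses)]:
    print(name, "p99 =", percentile(data, 99), "p999 =", percentile(data, 99.9))
```

With these invented numbers, profile A has the better p99 (5 ms against 60 ms) while profile B has the better p999 (60 ms against 2000 ms), so which one is "better" depends entirely on which goal you picked.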

Testing is key, but only in combination with some directed choice of
what to tweak. Especially since it's hard to test for the
non-happy cases (e.g., node takes a burst of traffic and starts
promoting everything into old-gen prior to processing a request,
resulting in a death spiral).

> G1 is the perfect example of a time suck. Claims low pause latency for big
> heaps, and delivers something regarded by the Cassandra community (and HBase
> as well) as working worse than CMS. If you spent 3 hours switching tuning
> knobs and analysing, that is 3 hours of your life you will never get back.

This is similar to saying that someone told you to switch to CMS (or,
use some particular flag, etc), you tried it, and it didn't have the
result you expected.

G1 and CMS have different trade-offs. Neither one will consistently
result in better latencies across the board. It's all about the
details.

> Better to let SUN and other people worry about tuning (at least from where I
> sit)

They're not tuning. They are providing very general purpose default
behavior, including things that make *no* sense at all with Cassandra.
For example, the default behavior with CMS is to try to make the
marking phase run as late as possible so that it finishes just prior
to heap exhaustion, in order to "optimize" for throughput; except
that's not a good idea for many cases because it exacerbates
fragmentation problems in old-gen by pushing usage very high
repeatedly, and it increases the chance of full GC because marking
started too late (even if you don't hit promotion failures due to
fragmentation). Sudden changes in workloads (e.g., compaction kicks
in) also make it harder for CMS's mark triggering heuristics to work
well.
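
A back-of-the-envelope model of why a late mark is risky: while the concurrent cycle runs, the application keeps promoting into old-gen, so the cycle has to begin while there is still enough free space to absorb that promotion. The Python sketch below assumes a constant promotion rate and a fixed cycle duration, which real workloads do not have, and every number in it is hypothetical. On HotSpot, the resulting percentage is the kind of value one would consider for -XX:CMSInitiatingOccupancyFraction (together with -XX:+UseCMSInitiatingOccupancyOnly).

```python
def latest_safe_start_fraction(old_gen_mb, promotion_mb_per_s, cycle_seconds):
    """Highest old-gen occupancy (0..1) at which a concurrent cycle can start
    and still finish before old-gen fills, under a constant promotion rate."""
    headroom_mb = promotion_mb_per_s * cycle_seconds
    return max(0.0, 1.0 - headroom_mb / old_gen_mb)


# Hypothetical node: 6 GB old-gen, 20-second concurrent cycle.
steady = latest_safe_start_fraction(6144, promotion_mb_per_s=30, cycle_seconds=20)
# Same node during a burst that (by assumption) triples the promotion rate.
burst = latest_safe_start_fraction(6144, promotion_mb_per_s=90, cycle_seconds=20)

print(f"steady load: start marking by {steady:.0%} occupancy")
print(f"during burst: start marking by {burst:.0%} occupancy")
```

A trigger point that is safe under steady load can be too late once the promotion rate jumps, which is the failure described above.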

As such, the default options for Cassandra use certain settings that
diverge from the default behavior of the JVM, because
Cassandra-in-general is a much more specific use-case than the
completely general target audience of the JVM. Similarly, a particular
cluster (with certain workloads/goals/etc) is a yet more specific
use-case than Cassandra-in-general and may be better served by
settings that differ from those of default Cassandra.

But, I certainly agree with this (which I think roughly matches what
you're saying): Don't randomly pick options someone claims are good in
a blog post and expect them to just make things better. If it were that
easy, it would be the default behavior for obvious reasons. The reason
it's not, is likely that it depends on the situation. Further, even if
you do play the lottery and win - if you don't know *why*, how are you
able to extrapolate the behavior of the system with slightly changed
workloads? It's very hard to blackbox-test GC settings, which is
probably why GC tuning can be perceived as a useless game of
whack-a-mole.

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Shahryar Sedghi | 11 Sep 18:04 2012

Re: JVM 7, Cass 1.1.1 and G1 garbage collector

I was able to run IBM Java 7 with Cassandra (I could not do it with 1.6 because of Snappy). It has a new garbage collection policy (called balanced) that is good for very large heap sizes (over 8 GB), documented here, and it looks promising for Cassandra. I have not tried it, but I would like to see how it performs in action.

Regards

Shahryar

Peter Schuller | 12 Sep 09:12 2012

Re: JVM 7, Cass 1.1.1 and G1 garbage collector

> I was able to run IBM Java 7 with Cassandra (could not do it with 1.6
> because of Snappy). It has a new garbage collection policy (called balanced)
> that is good for very large heap sizes (over 8 GB), documented here, and it
> looks promising for Cassandra. I have not tried it, but I would like to see
> how it performs in action.

FWIW, J9's "balanced" collector is very similar to G1 in its design.

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)

