Wil Schultz | 1 Jul 03:53 2012

[outages] Java apps around the globe are crashing...

Leap second bug. *sigh*

-wil
Darius Jahandarie | 1 Jul 04:20 2012

Re: [outages] Java apps around the globe are crashing...

On Sat, Jun 30, 2012 at 9:53 PM, Wil Schultz <wschultz@...> wrote:
> Leap second bug. *sigh*

However, it does not seem to be a Java bug -- so far, it looks like
something is causing futex() to time out, instead of telling the thread
to sleep [1], causing issues on anything that uses it (e.g., java,
chrome, mysql).

It's not clear exactly what variable (e.g., kernel version, distro)
causes boxes to go haywire. It may just be a race condition which some
people hit due to bad luck.

But it is certainly related to the leap second.

[1] https://lkml.org/lkml/2012/6/30/122

-- 
Darius Jahandarie
Geoffrey Mina | 1 Jul 04:43 2012

Re: [outages] Java apps around the globe are crashing...

Looks like upgrading the JVM probably won't be the answer. Anyone know of a fix and/or workaround?

Geoff Mina
Founder/CTO
Connect First Inc.
720.335.5924
888.410.3071
gmina@...

Sent from my iPhone

On Jun 30, 2012, at 8:25 PM, "Darius Jahandarie"
<djahandarie@...> wrote:

> On Sat, Jun 30, 2012 at 9:53 PM, Wil Schultz <wschultz@...> wrote:
>> Leap second bug. *sigh*
> 
> However, it does not seem to be a Java bug -- so far, it looks like
> something is causing futex() to time out, instead of telling the thread
> to sleep [1], causing issues on anything that uses it (e.g., java,
> chrome, mysql).
> 
> It's not clear exactly what variable (e.g., kernel version, distro)
> causes boxes to go haywire. It may just be a race condition which some
> people hit due to bad luck.
> 
> But it is certainly related to the leap second.
> 
> 
> [1] https://lkml.org/lkml/2012/6/30/122

Darius Jahandarie | 1 Jul 04:48 2012

Re: [outages] Java apps around the globe are crashing...

On Sat, Jun 30, 2012 at 10:43 PM, Geoffrey Mina <gmina@...> wrote:
> Looks like upgrading the JVM probably won't be the answer. Anyone know of a fix and/or workaround?

Yes, this seems to be the workaround:

/etc/init.d/ntp stop; date; date `date +"%m%d%H%M%C%y.%S"`; date;

Then restart the offending process. I think starting ntp back up is also fine.
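
For anyone unpicking the one-liner: it stops ntpd, prints the time, sets the clock to the value it already has, and prints the time again. The inner date call just builds the MMDDhhmmCCYY.ss argument that date accepts for setting the clock. A minimal sketch that only prints what would be run, plus a helper to apply it; the name apply_workaround is made up for illustration, and the init script path is the one quoted above and may differ on your distro:

```shell
#!/bin/sh
# Build the MMDDhhmmCCYY.ss string that the one-liner feeds back to date(1).
# Nothing is changed here; the command is only printed.
stamp=$(date +"%m%d%H%M%C%y.%S")
echo "would run: date $stamp"

# Hypothetical helper (not from the thread): apply the workaround for real.
# Needs root. It is defined here but deliberately not called.
apply_workaround() {
    /etc/init.d/ntp stop
    date "$(date +"%m%d%H%M%C%y.%S")"
}
```

Calling apply_workaround as root should be equivalent to the middle part of the command above; restart the affected processes afterwards.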

-- 
Darius Jahandarie
Geoffrey Mina | 1 Jul 04:46 2012

Re: [outages] Java apps around the globe are crashing...

Thanks. 

Geoff Mina
Founder/CTO
Connect First Inc.
720.335.5924
888.410.3071
gmina@...

Sent from my iPhone

On Jun 30, 2012, at 8:46 PM, "Darius Jahandarie"
<djahandarie@...> wrote:

> On Sat, Jun 30, 2012 at 10:43 PM, Geoffrey Mina <gmina@...> wrote:
>> Looks like upgrading the JVM probably won't be the answer. Anyone know of a fix and/or workaround?
> 
> Yes, this seems to be the workaround:
> 
> /etc/init.d/ntp stop; date; date `date +"%m%d%H%M%C%y.%S"`; date;
> 
> Then restart the offending process. I think starting ntp back up is also fine.
> 
> -- 
> Darius Jahandarie

Geoffrey Mina | 1 Jul 04:52 2012

Re: [outages] Java apps around the globe are crashing...

FYI. We have restarted our applications multiple times and it seems to have no impact. We just go back to
excessive load warnings. 

Geoff Mina
Founder/CTO
Connect First Inc.
720.335.5924
888.410.3071
gmina@...

Sent from my iPhone

On Jun 30, 2012, at 8:50 PM, "Geoffrey Mina" <gmina@...> wrote:

> Thanks. 
> 
> Geoff Mina
> Founder/CTO
> Connect First Inc.
> 720.335.5924
> 888.410.3071
> gmina@...
> 
> Sent from my iPhone
> 
> On Jun 30, 2012, at 8:46 PM, "Darius Jahandarie" <djahandarie@...> wrote:
> 
>> On Sat, Jun 30, 2012 at 10:43 PM, Geoffrey Mina <gmina@...> wrote:
>>> Looks like upgrading the JVM probably won't be the answer. Anyone know of a fix and/or workaround?

David Coulson | 1 Jul 04:56 2012

Re: [outages] Java apps around the globe are crashing...

My boxes were okay after a reboot - Cycling the JVM wasn't good enough

On 6/30/12 10:52 PM, Geoffrey Mina wrote:
> FYI. We have restarted our applications multiple times and it seems to have no impact. We just go back to
> excessive load warnings.
>
> Geoff Mina
> Founder/CTO
> Connect First Inc.
> 720.335.5924
> 888.410.3071
> gmina@...
>
> Sent from my iPhone
>
> On Jun 30, 2012, at 8:50 PM, "Geoffrey Mina" <gmina@...> wrote:
>
>> Thanks.
>>
>> Geoff Mina
>> Founder/CTO
>> Connect First Inc.
>> 720.335.5924
>> 888.410.3071
>> gmina@...
>>
>> Sent from my iPhone
>>
>> On Jun 30, 2012, at 8:46 PM, "Darius Jahandarie" <djahandarie@...> wrote:

Amanda Machutta | 1 Jul 04:53 2012

Re: [outages] Java apps around the globe are crashing...

This workaround works perfectly. Thanks.

Thanks,

Amanda MacHUTTA
---------------------------------
VP of Technology
Connect First Inc. 
CCI * VB * 
P: 678.905.0673
T: 888.410.3071
F: 678.265.1158
E: amachutta@...
---------------------------------
www.connectfirst.com

-----Original Message-----
From: outages-bounces@...
[mailto:outages-bounces@...] On Behalf Of Darius Jahandarie
Sent: Saturday, June 30, 2012 8:48 PM
To: Geoffrey Mina
Cc: outages@...
Subject: Re: [outages] Java apps around the globe are crashing...

On Sat, Jun 30, 2012 at 10:43 PM, Geoffrey Mina <gmina@...> wrote:
> Looks like upgrading the JVM probably won't be the answer. Anyone know of a fix and/or workaround?

Yes, this seems to be the workaround:

/etc/init.d/ntp stop; date; date `date +"%m%d%H%M%C%y.%S"`; date;

Jason Hellenthal | 1 Jul 05:23 2012

Re: [outages] Java apps around the globe are crashing...


Would have replied to a later message but none looked to be just perfect
so I'll reply here.

clock.nyc.he.net LOCAL(0)
clock.fmt.he.net LOCAL(0)
clock.sjc.he.net LOCAL(0)

All went to a LOCAL(0) refid sometime recently after being in .CDMA. for
the longest time.

Maybe this is a small part of it but thought I would share the note.

On Sat, Jun 30, 2012 at 06:53:53PM -0700, Wil Schultz wrote:
> Leap second bug. *sigh*
> 

-- 

 - (2^(N-1))
Colin Johnston | 1 Jul 05:31 2012

Re: [outages] Java apps around the globe are crashing...

all seemed to be ok via astaro vm
2012:07:01-00:59:59 VM406 kernel: Clock: inserting leap second 23:59:60 UTC

Colin

On 1 Jul 2012, at 04:23, Jason Hellenthal wrote:

> 
> Would have replied to a later message but none looked to be just perfect
> so I'll reply here.
> 
> clock.nyc.he.net LOCAL(0)
> clock.fmt.he.net LOCAL(0)
> clock.sjc.he.net LOCAL(0)
> 
> All went to a LOCAL(0) refid sometime recently after being in .CDMA. for
> the longest time.
> 
> Maybe this is a small part of it but thought I would share the note.
> 
> On Sat, Jun 30, 2012 at 06:53:53PM -0700, Wil Schultz wrote:
>> Leap second bug. *sigh*
>> 
> 
> 
> -- 
> 
> - (2^(N-1))

Colin Johnston | 1 Jul 10:09 2012

Re: [outages] Java apps around the globe are crashing...

I did, however, see some weird ntp error messages:

Jun 30 01:07:05 [192.168.0.1.128.94] 2012:06:30-01:07:05 ntpd[29409]: kernel time sync status change 0011
Jun 30 14:42:25 [192.168.0.1.128.94] 2012:06:30-14:42:25 ntpd[29409]: kernel time sync error 0011
Jun 30 18:10:21 [192.168.0.1.128.94] 2012:06:30-18:10:21 ntpd[29409]: kernel time sync error 0011
Jul  1 01:08:22 [192.168.0.1.128.95] 2012:07:01-01:08:22 ntpd[29409]: kernel time sync status change 0001

On 1 Jul 2012, at 04:31, Colin Johnston wrote:

> all seemed to be ok via astaro vm
> 2012:07:01-00:59:59 VM406 kernel: Clock: inserting leap second 23:59:60 UTC
> 
> 
> Colin
> 
> On 1 Jul 2012, at 04:23, Jason Hellenthal wrote:
> 
>> 
>> Would have replied to a later message but none looked to be just perfect
>> so I'll reply here.
>> 
>> clock.nyc.he.net LOCAL(0)
>> clock.fmt.he.net LOCAL(0)
>> clock.sjc.he.net LOCAL(0)
>> 
>> All went to a LOCAL(0) refid sometime recently after being in .CDMA. for
>> the longest time.
>> 
>> Maybe this is a small part of it but thought I would share the note.
>> 

Jeremy Chadwick | 1 Jul 11:48 2012

Re: [outages] Java apps around the globe are crashing...

I don't know what operating system you're running there, but I'm going
to assume some Linux distribution.

You also didn't provide the timezones those systems are in, so your
logging messages aren't as useful (to me) as they could be.  I'm going
to guess UTC+1 (keep reading for how I determined that :-) ):

I'll explain a bit more about this, but since I use FreeBSD the system
and behaviour may be a bit different.  I imagine the ntpd and kernel
time bits are similar/identical though.

This year's leap second occurred on 06/30 at 23:59:60 UTC or
thereabouts.  That means it works like so:

2012/06/30 23:59:59 UTC
2012/06/30 23:59:60 UTC -- leap second occurs here
2012/07/01 00:00:00 UTC
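
To see why that middle line is awkward for software: POSIX time counts exactly 86400 seconds per day, so 23:59:60 has no epoch value of its own. A quick check, assuming GNU date (-d @N is a GNU extension; FreeBSD's date takes -r N instead):

```shell
#!/bin/sh
# Epoch values on either side of the 2012 leap second (GNU date syntax assumed).
before=$(date -u -d @1341100799 +"%Y/%m/%d %H:%M:%S")
after=$(date -u -d @1341100800 +"%Y/%m/%d %H:%M:%S")
echo "1341100799 -> $before UTC"
echo "1341100800 -> $after UTC"
```

The two values are consecutive integers, yet they map to 23:59:59 and 00:00:00. The inserted second has to be absorbed by the kernel clock in some other way, such as repeating a second.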

My below logs are taken from two FreeBSD systems (I can check more if
need be), which are UTC-7 (PDT) timezone:

Jun 28 06:39:55 sys1 ntpd[1560]: kernel time sync status change 6001
Jun 29 17:24:31 sys1 ntpd[1560]: kernel time sync status change 2011
Jun 30 17:04:57 sys1 ntpd[1560]: kernel time sync status change 2001

Jun 29 06:33:38 sys2 ntpd[77470]: kernel time sync status change 2001
Jun 29 17:14:34 sys2 ntpd[77470]: kernel time sync status change 2011
Jun 30 05:59:24 sys2 ntpd[77470]: kernel time sync status change 6011
Jun 30 06:07:58 sys2 ntpd[77470]: kernel time sync status change 2011
Jun 30 17:01:31 sys2 ntpd[77470]: kernel time sync status change 2001
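
As a reading aid for the status values in these logs: they look like hexadecimal bit masks of the STA_* flags from <sys/timex.h>. A rough decoder sketch under that assumption; decode_status is a made-up name, and only a subset of the flags is listed:

```shell
#!/bin/sh
# Decode an ntpd "kernel time sync status" value (hex) into STA_* flag names.
# Assumed bit values from <sys/timex.h>: STA_PLL 0x0001, STA_INS 0x0010,
# STA_DEL 0x0020, STA_UNSYNC 0x0040, STA_NANO 0x2000, STA_MODE 0x4000.
decode_status() {
    value=$((0x$1))
    out=""
    # Each pair is decimal-bit:name, checked from lowest bit to highest.
    for pair in 1:STA_PLL 16:STA_INS 32:STA_DEL 64:STA_UNSYNC 8192:STA_NANO 16384:STA_MODE; do
        bit=${pair%%:*}
        name=${pair#*:}
        if [ $((value & bit)) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
    done
    echo "$out"
}

for code in 6001 2011 2001 0011; do
    echo "$code: $(decode_status "$code")"
done
```

Read this way, 2011 has STA_INS set (a leap second insertion is pending) and 2001 does not. The 2011 -> 2001 changes on both systems come a few minutes after 17:00 PDT on Jun 30, which is 00:00 UTC on Jul 1, consistent with the flag being cleared once the leap second had been inserted.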

