Willy Tarreau | 29 Mar 19:01 2013

possible crashes on linux with recent glibc

Hi,

Chris Allen and Jeff Zellner reported a similar issue at the same
time on two different versions: 1.4.20 and 1.5-dev17. The symptom
is always the same: haproxy suddenly started to crash under load
although it did not in the past.

When looking deeper into the traces and core files, it turned out that
both versions were built with TARGET=generic, so haproxy was using
select() to poll for new events.

The issue was tracked down to a recent update to glibc which now
verifies that the file descriptor number passed to FD_SET/FD_CLR/
FD_ISSET lies between 0 and FD_SETSIZE-1 (1023):

     http://repo.or.cz/w/glibc.git/commitdiff/a0f33f996

I believe it was merged into glibc 2.16 and backported to the glibc
2.15 shipped with Ubuntu 12.04.
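A minimal sketch of the range check that fortified glibc now applies, mirrored in application code so that an out-of-range descriptor is rejected with an error instead of aborting the process. The helper names checked_fd_set() and checked_fd_isset() are hypothetical; this is neither glibc's nor haproxy's code.

```c
#include <sys/select.h>

/* Hypothetical helper: add fd to *set only when 0 <= fd < FD_SETSIZE,
 * the range that fortified glibc builds enforce in FD_SET/FD_CLR/FD_ISSET.
 * Returns 0 on success, -1 if fd is out of range (set left untouched). */
static int checked_fd_set(int fd, fd_set *set)
{
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;
    FD_SET(fd, set);
    return 0;
}

/* Hypothetical helper: returns 1 if fd is in range and present in *set,
 * 0 otherwise (an out-of-range fd can never be stored in an fd_set). */
static int checked_fd_isset(int fd, fd_set *set)
{
    if (fd < 0 || fd >= FD_SETSIZE)
        return 0;
    return FD_ISSET(fd, set) ? 1 : 0;
}
```

This only turns a crash into an error the caller can handle; a program that really needs descriptors above 1023 still needs a different poller.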

This introduces a regression on something which has worked for ages and
was not clearly documented in the past. At least on Linux, Solaris,
FreeBSD and OpenBSD, I have successfully passed to select() FD sets which
were arrays of fd_set[] in order to support more than FD_SETSIZE fds.
I've been doing this for 18 years without issues, and older man pages did
not suggest that this was not the expected way to use them. Now we get
some warnings in updated Linux man pages:

       An  fd_set  is  a  fixed  size buffer. Executing FD_CLR or
       FD_SET with a value of fd that is negative or is equal  to
       or larger than FD_SETSIZE will result in undefined behavior.
[...]

Bryan Talbot | 2 Apr 04:11 2013

Re: possible crashes on linux with recent glibc

On Fri, Mar 29, 2013 at 11:01 AM, Willy Tarreau <w <at> 1wt.eu> wrote:
> Hi,
>
> For the medium term, I'm going to prepare the following changes:
>
>   - make poll() rely solely on bit fields without using FD_* macros
>   - add a start-up warning when select() is used with a maxconn leading
>     to more than FD_SETSIZE fds, followed by a runtime test to make it
>     crash in glibc while parsing the config if needed, instead of
>     reserving a Friday evening surprise for you.
>   - enable poll() by default in the generic target, as it's supported on
>     all platforms where haproxy is known to build
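The first item in the list above, keeping poller state in plain bit fields instead of going through the FD_* macros, could look roughly like the following sketch. The struct and function names are hypothetical and this is not haproxy's actual implementation; the point is that the bitmap is sized at run time from the number of descriptors needed, so glibc's FD_SETSIZE check never comes into play.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical fd bitmap that does not rely on FD_SETSIZE or the FD_*
 * macros: one bit per descriptor, sized at run time for max_fds fds. */
#define FDB_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

struct fd_bits {
    unsigned long *words;
    int max_fds;
};

/* Allocates a zeroed bitmap for fds 0..max_fds-1. Returns 0 or -1. */
static int fd_bits_init(struct fd_bits *b, int max_fds)
{
    size_t nwords;

    b->words = NULL;
    b->max_fds = 0;
    if (max_fds < 0)
        return -1;
    nwords = ((size_t)max_fds + FDB_WORD_BITS - 1) / FDB_WORD_BITS;
    b->words = calloc(nwords ? nwords : 1, sizeof(*b->words));
    if (!b->words)
        return -1;
    b->max_fds = max_fds;
    return 0;
}

static void fd_bits_free(struct fd_bits *b)
{
    free(b->words);
    b->words = NULL;
    b->max_fds = 0;
}

/* The three operations below return -1 for an fd outside 0..max_fds-1. */
static int fd_bits_set(struct fd_bits *b, int fd)
{
    if (fd < 0 || fd >= b->max_fds)
        return -1;
    b->words[fd / FDB_WORD_BITS] |= 1UL << (fd % FDB_WORD_BITS);
    return 0;
}

static int fd_bits_clr(struct fd_bits *b, int fd)
{
    if (fd < 0 || fd >= b->max_fds)
        return -1;
    b->words[fd / FDB_WORD_BITS] &= ~(1UL << (fd % FDB_WORD_BITS));
    return 0;
}

static int fd_bits_isset(const struct fd_bits *b, int fd)
{
    if (fd < 0 || fd >= b->max_fds)
        return -1;
    return (int)((b->words[fd / FDB_WORD_BITS] >> (fd % FDB_WORD_BITS)) & 1UL);
}
```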
 


haproxy built with MacPorts on OSX seems to support only select() and not poll(). I don't have any suggestions, but is this environment affected by your proposed changes?

I'm not running haproxy on OSX for anything other than localhost development, of course, but keeping it working on OSX would be great.


$> /opt/local/sbin/haproxy -vv
HA-Proxy version 1.4.22 2012/08/09
Copyright 2000-2012 Willy Tarreau <w <at> 1wt.eu>

Build options :
  TARGET  = osx
  CPU     = generic
  CC      = /usr/bin/clang -arch x86_64
  CFLAGS  = -O2 -g -fno-strict-aliasing
  OPTIONS = USE_LIBCRYPT=1 USE_REGPARM=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

Encrypted password support via crypt(3): yes

Available polling systems :
     select : pref=150,  test result OK
Total: 1 (1 usable), will use select.
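The start-up warning proposed in the quoted list would flag a setup like this one: select() is the only poller, and maxconn = 2000 implies roughly 2 x 2000 = 4000 sockets (each proxied connection uses a client-side and a server-side socket), far above FD_SETSIZE (1024). A minimal sketch of such a check follows; select_can_handle() and its extra parameter (for listeners, log sockets and so on) are hypothetical, not haproxy's code.

```c
#include <stdio.h>
#include <sys/select.h>

/* Hypothetical start-up check for the select() poller. Each proxied
 * connection needs about two sockets (client side and server side), plus
 * 'extra' descriptors for listeners, logs, etc. Returns 1 if every fd
 * would stay below FD_SETSIZE, otherwise prints a warning and returns 0. */
static int select_can_handle(int maxconn, int extra)
{
    long maxsock = (long)maxconn * 2 + extra;

    /* fds 0..maxsock-1 must all be <= FD_SETSIZE-1 */
    if (maxsock > (long)FD_SETSIZE) {
        fprintf(stderr,
                "[WARNING] select() poller: maxconn %d needs about %ld fds, "
                "more than FD_SETSIZE (%d); use poll() or lower maxconn.\n",
                maxconn, maxsock, (int)FD_SETSIZE);
        return 0;
    }
    return 1;
}
```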

 
Willy Tarreau | 2 Apr 08:00 2013

Re: possible crashes on linux with recent glibc

Hi Bryan,

On Mon, Apr 01, 2013 at 07:11:25PM -0700, Bryan Talbot wrote:
> haproxy built with macports on OSX seems to only have support for select()
> and not poll().  I don't have any suggestions but is this environment
> impacted by your proposed changes?

It's a Makefile issue: OSX supports select(), poll() and kqueue(). And BTW,
according to its man page, OSX is one of the systems where select() has
issues with fd >= 1024. In fact, all operating systems where haproxy may
be built support poll().
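Unlike select(), poll() takes an array of struct pollfd rather than a fixed-size bitmap, so a descriptor number of 1024 or above is not a problem in itself. A minimal sketch; wait_readable() is a hypothetical helper, not haproxy's poll() poller. It restarts the full timeout after EINTR, which is acceptable for a sketch but not for precise timing.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>   /* not needed by the helper; handy for pipe()/write() in callers */

/* Hypothetical helper: wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable (or hung up), 0 on timeout, -1 on error.
 * poll() has no FD_SETSIZE ceiling, so fd may be any valid descriptor. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int ret;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    if (ret <= 0)
        return ret;                       /* 0 = timeout, -1 = error */
    return (pfd.revents & (POLLIN | POLLHUP)) ? 1 : -1;  /* POLLERR/POLLNVAL */
}
```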

> Not running haproxy on osx for anything other than localhost development
> mode of course, but keeping it working on osx would be great.

You're right, I'm going to fix the makefile right now.

> $> /opt/local/sbin/haproxy -vv
> HA-Proxy version 1.4.22 2012/08/09
> Copyright 2000-2012 Willy Tarreau <w <at> 1wt.eu>
> 
> Build options :
>   TARGET  = osx
              ^^^
The issue is right here: the "osx" target is not defined, so no options
are set. First, I'll define such a target because it makes sense to have
it, and second, I'll enable POLL by default when the target is unknown.

And this way you'll get a better development platform :-)

Thanks,
Willy

Lukas Tribus | 5 Mar 19:38 2014

RE: possible crashes on linux with recent glibc

Hi Willy,

> Chris Allen and Jeff Zellner reported a similar issue at the same
> time on two different versions : 1.4.20 and 1.5-dev17. The symptom
> is always the same, haproxy suddenly started to crash under load
> while it did not in the past.
>
> When looking deeper into the traces and core files, it happens that
> both versions were built with TARGET=generic, so haproxy was using
> select() to poll for new events.
>
> The issue was tracked down to a recent update to glibc which now
> verifies that the file descriptor number passed to FD_SET/FD_CLR/
> FD_ISSET is comprised between 0 and FD_SETSIZE-1 (1023) :
>
> http://repo.or.cz/w/glibc.git/commitdiff/a0f33f996
>
> I believe it was merged into glibc 2.16 and backported in the glibc
> 2.15 as shipped with Ubuntu 12.04.

Sorry to wake up this year-old thread. I just hit a crash while playing
with older code and want to confirm that it's 'only' this particular
(known) problem I'm hitting, not a hidden bug (or whatever).

Your description matches my configuration (using select() with glibc
2.15 on Ubuntu, crashing under some load).

On the terminal I see the following (which is what confuses me a bit):
*** buffer overflow detected ***: ./haproxy terminated

and the backtrace looks like this:
(gdb) backtrace full
#0  0xb76e2424 in __kernel_vsyscall ()
No symbol table info available.
#1  0xb755b1df in raise () from /lib/i386-linux-gnu/libc.so.6
No symbol table info available.
#2  0xb755e825 in abort () from /lib/i386-linux-gnu/libc.so.6
No symbol table info available.
#3  0xb759839a in ?? () from /lib/i386-linux-gnu/libc.so.6
No symbol table info available.
#4  0xb76310e5 in __fortify_fail () from /lib/i386-linux-gnu/libc.so.6
No symbol table info available.
#5  0xb762feba in __chk_fail () from /lib/i386-linux-gnu/libc.so.6
No symbol table info available.
#6  0xb763107a in __fdelt_warn () from /lib/i386-linux-gnu/libc.so.6
No symbol table info available.
#7  0x0809ad3f in _do_poll (p=0x80ce0e0, exp=-1820950388) at src/ev_select.c:65

I'm quite sure it's exactly this problem, but I'd prefer to double-check
with you.

Thanks,

Lukas

Willy Tarreau | 6 Mar 08:28 2014

Re: possible crashes on linux with recent glibc

Hi Lukas,

On Wed, Mar 05, 2014 at 07:38:42PM +0100, Lukas Tribus wrote:
> Hi Willy,
> 
> 
> > Chris Allen and Jeff Zellner reported a similar issue at the same
> > time on two different versions : 1.4.20 and 1.5-dev17. The symptom
> > is always the same, haproxy suddenly started to crash under load
> > while it did not in the past.
> >
> > When looking deeper into the traces and core files, it happens that
> > both versions were built with TARGET=generic, so haproxy was using
> > select() to poll for new events.
> >
> > The issue was tracked down to a recent update to glibc which now
> > verifies that the file descriptor number passed to FD_SET/FD_CLR/
> > FD_ISSET is comprised between 0 and FD_SETSIZE-1 (1023) :
> >
> > http://repo.or.cz/w/glibc.git/commitdiff/a0f33f996
> >
> > I believe it was merged into glibc 2.16 and backported in the glibc
> > 2.15 as shipped with Ubuntu 12.04.
> 
> 
> Sorry to wakeup this one year old thread, I just hit a crash while
> playing with older code and want to confirm that its 'only' this
> particular (known) problem I'm hitting, not a hidden bug (or whatever).
> 
> Your description corresponds with my configuration (using select() with
> glibc 2.15 on ubuntu crashing with some load).
> 
> 
> On the terminal I see (which is what confuses a bit):
> *** buffer overflow detected ***: ./haproxy terminated
> 
> and the backtrace looks like this:
> (gdb) backtrace full
> #0  0xb76e2424 in __kernel_vsyscall ()
> No symbol table info available.
> #1  0xb755b1df in raise () from /lib/i386-linux-gnu/libc.so.6
> No symbol table info available.
> #2  0xb755e825 in abort () from /lib/i386-linux-gnu/libc.so.6
> No symbol table info available.
> #3  0xb759839a in ?? () from /lib/i386-linux-gnu/libc.so.6
> No symbol table info available.
> #4  0xb76310e5 in __fortify_fail () from /lib/i386-linux-gnu/libc.so.6
> No symbol table info available.
> #5  0xb762feba in __chk_fail () from /lib/i386-linux-gnu/libc.so.6
> No symbol table info available.
> #6  0xb763107a in __fdelt_warn () from /lib/i386-linux-gnu/libc.so.6
> No symbol table info available.
> #7  0x0809ad3f in _do_poll (p=0x80ce0e0, exp=-1820950388) at src/ev_select.c:65
> 
> 
> 
> I'm quite sure its exactly this problem, but I prefer to double check with
> you.

Yes, it's exactly the same trace I used to get when using select() with
file descriptors that were too large. I really think that this glibc
change will break a lot of software...

Regards,
Willy

Lukas Tribus | 6 Mar 09:54 2014

RE: possible crashes on linux with recent glibc

Hi Willy,

>> Your description corresponds with my configuration (using select() with
>> glibc 2.15 on ubuntu crashing with some load).
>>
>>
>> On the terminal I see (which is what confuses a bit):
>> *** buffer overflow detected ***: ./haproxy terminated
>>
>> and the backtrace looks like this:
>> (gdb) backtrace full
>> #0 0xb76e2424 in __kernel_vsyscall ()
>> No symbol table info available.
>> #1 0xb755b1df in raise () from /lib/i386-linux-gnu/libc.so.6
>> No symbol table info available.
>> #2 0xb755e825 in abort () from /lib/i386-linux-gnu/libc.so.6
>> No symbol table info available.
>> #3 0xb759839a in ?? () from /lib/i386-linux-gnu/libc.so.6
>> No symbol table info available.
>> #4 0xb76310e5 in __fortify_fail () from /lib/i386-linux-gnu/libc.so.6
>> No symbol table info available.
>> #5 0xb762feba in __chk_fail () from /lib/i386-linux-gnu/libc.so.6
>> No symbol table info available.
>> #6 0xb763107a in __fdelt_warn () from /lib/i386-linux-gnu/libc.so.6
>> No symbol table info available.
>> #7 0x0809ad3f in _do_poll (p=0x80ce0e0, exp=-1820950388) at src/ev_select.c:65
>>
>>
>>
>> I'm quite sure its exactly this problem, but I prefer to double check with
>> you.
>
> Yes it was the exact same trace I used to get when using select() with
> too large file descriptors. I really think that this glibc change will
> break a large number of software...

Thanks. Yes, indeed.

I wonder why it doesn't crash without compiler optimization (-O0) though.

Anyway, thanks for confirming the backtrace.

Regards,

Lukas

Willy Tarreau | 6 Mar 10:07 2014

Re: possible crashes on linux with recent glibc

Hi Lukas,

On Thu, Mar 06, 2014 at 09:54:44AM +0100, Lukas Tribus wrote:
> Thanks. Yes, indeed.
> 
> I wonder why it doesn't crash without compiler optimization (-O0) though.

I suspect that the FD_SET macros might be declared as functions instead
of macros and that they check the parameter before dereferencing the
array. That's just a guess.

Willy
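Another possibility worth checking, noted here as an assumption rather than something established in the thread: glibc only compiles in the checked FD_* path (the one that ends in the __fdelt_warn() frame of the backtrace above) when _FORTIFY_SOURCE is defined and the compiler is optimizing, because its <features.h> requires __OPTIMIZE__ for any fortify level. Assuming the distribution compiler defines _FORTIFY_SOURCE by default, as Ubuntu's gcc is configured to do, an -O0 build would then use the unchecked macros. The helper below (hypothetical name) reports which case a given translation unit falls into.

```c
/* Hypothetical helper: returns 1 if this translation unit was compiled
 * with glibc's fortify checks active. glibc only enables them when
 * _FORTIFY_SOURCE > 0 and the compiler defines __OPTIMIZE__ (i.e. -O1
 * or higher), so this returns 0 for an -O0 build. */
static int fortify_checks_active(void)
{
#if defined(_FORTIFY_SOURCE) && _FORTIFY_SOURCE > 0 && defined(__OPTIMIZE__)
    return 1;
#else
    return 0;
#endif
}
```

Calling it from the same file built once with -O0 and once with -O2 would show whether the checks are simply compiled out in the unoptimized build.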


Gmane