Chris | 19 Jul 20:43 2011

Re: [Check_mk (english)] multisite, nagstamon and mod_python busy loop

On Sat, Jul 16, 2011 at 11:18:46AM +0100, Phil Spencer wrote:
> I mailed this out in the mailing list about 2 weeks ago. I am also
> having the same issue. It seems as though it doesn't close the
> connection between apache/nagstamon properly, as it's always waiting
> for something, and then it spawns, literally, hundreds of apache
> processes. There is 1 byte waiting in the queue to be sent/received.
> Have a look at the apache processes (turn on server-info /
> server-status under apache config and you can see there are tonnes of
> processes in the wait state) and also check netstat for the
> send/receive queue (netstat -n)  and see there is one byte waiting.
> 
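The netstat check described above can be scripted. This is a minimal sketch, assuming Linux `netstat -n` output whose TCP rows read Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State. `stuck_connections` is a hypothetical helper, and the sample rows use made-up addresses rather than output from the affected servers.

```python
# Sample text in the layout printed by Linux `netstat -n` (addresses are made up).
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 192.0.2.10:80           192.0.2.20:51234        ESTABLISHED
tcp        1      0 192.0.2.10:80           192.0.2.21:40022        CLOSE_WAIT
tcp        0      1 192.0.2.10:80           192.0.2.22:40100        ESTABLISHED
"""


def stuck_connections(netstat_output):
    """Return (recv_q, send_q, local, remote, state) for TCP rows with a non-empty queue."""
    stuck = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines and non-TCP rows.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        try:
            recv_q, send_q = int(fields[1]), int(fields[2])
        except ValueError:
            continue
        if recv_q or send_q:
            stuck.append((recv_q, send_q, fields[3], fields[4], fields[5]))
    return stuck


for row in stuck_connections(SAMPLE):
    print(row)
```

Feeding it the real output of `netstat -n` instead of `SAMPLE` lists only the connections that still have bytes queued, which is the symptom reported here.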

Sorry for the late response; this got sorted into the wrong folder.

Sure enough, that is what I'm seeing too. Oddly, I can't reproduce it
consistently on the smaller server, but it happens every time on the big
server. It appears to be client-related, though: I can't trigger it when
loading the request via wget or a browser. I suppose I should ask about
it wherever nagstamon questions are asked.

Thanks for the reply!

Chris
Dave Cundiff | 16 Oct 02:43 2011

Re: [Check_mk (english)] multisite, nagstamon and mod_python busy loop

On Tue, Jul 19, 2011 at 2:43 PM, Chris <chris@...> wrote:
> On Sat, Jul 16, 2011 at 11:18:46AM +0100, Phil Spencer wrote:
>> I mailed this out in the mailing list about 2 weeks ago. I am also
>> having the same issue. It seems as though it doesn't close the
>> connection between apache/nagstamon properly, as it's always waiting
>> for something, and then it spawns, literally, hundreds of apache
>> processes. There is 1 byte waiting in the queue to be sent/received.
>> Have a look at the apache processes (turn on server-info /
>> server-status under apache config and you can see there are tonnes of
>> processes in the wait state) and also check netstat for the
>> send/receive queue (netstat -n)  and see there is one byte waiting.
>>
>
> Sorry for the late response; this got sorted into the wrong folder.
>
> Sure enough, that is what I'm seeing too. Oddly, I can't reproduce it
> consistently on the smaller server, but it happens every time on the big
> server. It appears to be client-related, though: I can't trigger it when
> loading the request via wget or a browser. I suppose I should ask about
> it wherever nagstamon questions are asked.
>
> Thanks for the reply!
>
> Chris


Lars Michelsen | 19 Oct 11:01 2011

Re: [Check_mk (english)] multisite, nagstamon and mod_python busy loop

Hello List,

Thanks for providing this information. I tried to reproduce the problem
but never saw hanging processes here with current versions.

Can you please provide some more information about your environment?

- Operating system / Distribution
- Apache version
- Python version
- mod_python versions
- Type of Nagios installation (OMD versions?)
- Number of hosts/services

Regards,
Lars

On 16/10/11 02:43, Dave Cundiff wrote:
> On Tue, Jul 19, 2011 at 2:43 PM, Chris <chris@...> wrote:
>> On Sat, Jul 16, 2011 at 11:18:46AM +0100, Phil Spencer wrote:
>>> I mailed this out in the mailing list about 2 weeks ago. I am also
>>> having the same issue. It seems as though it doesn't close the
>>> connection between apache/nagstamon properly, as it's always waiting
>>> for something, and then it spawns, literally, hundreds of apache
>>> processes. There is 1 byte waiting in the queue to be sent/received.
>>> Have a look at the apache processes (turn on server-info /
>>> server-status under apache config and you can see there are tonnes of
>>> processes in the wait state) and also check netstat for the
>>> send/receive queue (netstat -n)  and see there is one byte waiting.
>>>

Dave Cundiff | 19 Oct 19:02 2011

Re: [Check_mk (english)] multisite, nagstamon and mod_python busy loop

On Wed, Oct 19, 2011 at 5:01 AM, Lars Michelsen
<lm@...> wrote:
> Hello List,
>
> Thanks for providing this information. I tried to reproduce the problem
> but never saw hanging processes here with current versions.
>
> Can you please provide some more information about your environment?
>
> - Operating system / Distribution
CentOS release 5.6 (Final)

> - Apache version
Server version: Apache/2.2.21 (Unix)
Server built:   Sep 22 2011 17:34:16
Server's Module Magic Number: 20051115:30
Server loaded:  APR 1.4.5, APR-Util 1.3.12
Compiled using: APR 1.4.5, APR-Util 1.3.12
Architecture:   64-bit
Server MPM:     Prefork
  threaded:     no
    forked:     yes (variable process count)

> - Python version
2.4.3

> - mod_python versions
3.3.1

> - Type of Nagios installation (OMD versions?)

Lars Michelsen | 21 Oct 15:10 2011

Re: [Check_mk (english)] multisite, nagstamon and mod_python busy loop

Hello List,

I tried hard to reproduce the problem, without success.

I used a CentOS 5.6 installation similar to your setup, also with
multiple sites consolidated in one Multisite instance.

I used the current OMD version with Check_MK 1.1.12b1. Can you please
test with this version?

Regards,
Lars

On 19/10/11 19:02, Dave Cundiff wrote:
> On Wed, Oct 19, 2011 at 5:01 AM, Lars Michelsen <lm@...> wrote:
>> Hello List,
>>
>> Thanks for providing this information. I tried to reproduce the problem
>> but never saw hanging processes here with current versions.
>>
>> Can you please provide some more information about your environment?
>>
>> - Operating system / Distribution
> CentOS release 5.6 (Final)
>
>> - Apache version
> Server version: Apache/2.2.21 (Unix)
> Server built:   Sep 22 2011 17:34:16
> Server's Module Magic Number: 20051115:30

Dave Cundiff | 23 Oct 17:16 2011

Re: [Check_mk (english)] multisite, nagstamon and mod_python busy loop

I'll give it a try.

In the meantime I took backtraces of a few of the dead httpd processes in
case it helps. They all seem to be getting stuck in unicode functions. I
had tried restarting Apache before taking the backtraces, which is why all
these processes are trying unsuccessfully to shut down. kill -9 is the
only way to get them to die. They would normally be stuck around frames
#11 and #12 before receiving a kill signal.

Example #1:
#0  0x00002b419ef2dd01 in sem_wait () from /lib64/libpthread.so.0
#1  0x00002b41a8001e48 in PyThread_acquire_lock () from /etc/httpd/modules/mod_python.so
#2  0x00002b41a7f8d044 in get_interpreter (name=0x2b41b23d5390 "a2hosting.com") at mod_python.c:240
#3  0x00002b41a7f909d9 in python_cleanup (data=<value optimized out>) at mod_python.c:353
#4  0x00002b419ed0b69d in run_cleanups (cref=0x2b41b23e0288) at memory/unix/apr_pools.c:2346
#5  0x00002b419ed0c11e in apr_pool_destroy (pool=0x2b41b23e0268) at memory/unix/apr_pools.c:809
#6  0x00002b419ed0c10c in apr_pool_destroy (pool=0x2b41aecbc588) at memory/unix/apr_pools.c:806
#7  0x00002b419ed0c10c in apr_pool_destroy (pool=0x2b41aecba578) at memory/unix/apr_pools.c:806
#8  0x00002b419d66fb5e in clean_child_exit (code=0) at /home/brewbuilder/rpms/BUILD/httpd-2.2.21/server/mpm/prefork/prefork.c:196
#9  0x00002b419d66fb8b in just_die (sig=<value optimized out>) at /home/brewbuilder/rpms/BUILD/httpd-2.2.21/server/mpm/prefork/prefork.c:328
#10 <signal handler called>

Dave Cundiff | 23 Oct 17:44 2011

Re: [Check_mk (english)] multisite, nagstamon and mod_python busy loop

First I just want to say how unbelievably painless that upgrade was. :)

Unfortunately the issue still persists. The processes are getting
stuck in unicode functions. The backtrace below is what I generally get
from the stuck httpd. It sometimes does not make it as far as the memcpy.

Maybe a Python/mod_python issue? I'm assuming that since OMD is available
as a Red Hat RPM it just uses the stock python/mod_python?

#0  0x00002ae7a4f5d352 in memcpy () from /lib64/libc.so.6
#1  0x00002ae7adb717df in PyUnicodeUCS4_FromUnicode () from /etc/httpd/modules/mod_python.so
#2  0x00002ae7adb7f593 in PyEval_EvalFrame () from /etc/httpd/modules/mod_python.so
#3  0x00002ae7adb7fbf6 in PyEval_EvalFrame () from /etc/httpd/modules/mod_python.so
#4  0x00002ae7adb7fbf6 in PyEval_EvalFrame () from /etc/httpd/modules/mod_python.so
#5  0x00002ae7adb81075 in PyEval_EvalCodeEx () from /etc/httpd/modules/mod_python.so
#6  0x00002ae7adb7f7cf in PyEval_EvalFrame () from /etc/httpd/modules/mod_python.so
#7  0x00002ae7adb7fbf6 in PyEval_EvalFrame () from /etc/httpd/modules/mod_python.so
#8  0x00002ae7adb81075 in PyEval_EvalCodeEx () from /etc/httpd/modules/mod_python.so
#9  0x00002ae7adb7f7cf in PyEval_EvalFrame () from /etc/httpd/modules/mod_python.so
#10 0x00002ae7adb7fbf6 in PyEval_EvalFrame () from /etc/httpd/modules/mod_python.so

Dave Cundiff | 30 Oct 20:13 2011

Re: [Check_mk (english)] multisite, nagstamon and mod_python busy loop

Well, I did the footwork. :P I restored the views to default and found
your bug. It's in strip_tags of htmllib.py. I put a breakpoint in just
before the break (code included below) and it's never hit when the httpd
process gets stuck. The field it locks up on is always the icon field.
It may have something to do with the flap detection icon, but I'm not
100% sure. The hosts it stops outputting on always seem to have that
icon.

I restored the default views, removed just the icon field, and it now
works correctly as well. It seems the icon field can sometimes be
formatted in such a way that its tags cause strip_tags to never exit.

# remove all HTML-tags
def strip_tags(ht):
    while True:
        x = ht.find('<')
        if x == -1:
            pdb.set_trace()
            break
        y = ht.find('>')
        ht = ht[0:x] + ht[y+1:]
    return ht

(Pdb) n
> /usr/share/check_mk/web/htdocs/htmllib.py(153)strip_tags()
-> return ht
(Pdb)
--Return--
> /usr/share/check_mk/web/htdocs/htmllib.py(153)strip_tags()->u'Check_MK'
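A minimal sketch of one way the posted loop can fail to exit. It assumes the field contains either a '>' before the first '<' or a '<' with no closing '>'; neither is confirmed in this thread. `strip_tags_step` and `strip_tags_safe` are hypothetical names, not part of htmllib.py, and the guarded variant is only one possible remedy, not necessarily the fix Check_MK adopted.

```python
def strip_tags_step(ht):
    """One iteration of the loop body from the posted strip_tags.

    Returns None where the original loop would break.
    """
    x = ht.find('<')
    if x == -1:
        return None
    y = ht.find('>')  # searches from the start, so it can land before x or be -1
    return ht[0:x] + ht[y + 1:]


def strip_tags_safe(ht):
    """Remove <...> spans, stopping if a '<' has no later '>'."""
    while True:
        x = ht.find('<')
        if x == -1:
            break
        y = ht.find('>', x)  # only look to the right of the '<'
        if y == -1:
            break  # unterminated tag: leave the remainder untouched
        ht = ht[:x] + ht[y + 1:]
    return ht


# '>' before '<': every pass makes the string longer, so the loop never ends.
grown = strip_tags_step("a > b <c>")
print(len("a > b <c>"), len(grown))

# '<' without '>': the string comes back unchanged, so the loop never ends.
print(strip_tags_step("<img src='x'") == "<img src='x'")

print(strip_tags_safe("<b>Check_MK</b>"))
```

In the first case each pass copies an ever longer string, which would be consistent with the memcpy frames in the earlier backtrace, though that link is a guess.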