23 Jun 2011 19:56
23 Jun 2011 20:37
Re: dtrace script for CLOSE_WAIT leaks ?
On 06/23/11 13:56, Vladimir Kotal wrote:
>
> Hi all,
>
> Does anyone have a dtrace script for observing connection/socket leaks ?
> I.e. when application does not do shutdown()/close() on a socket which
> it previously accept()'ed ? Ideally with IP address/port pairs printed
> in the output.

Not sure what you would be looking for. You could trace connections as
they transition to CLOSE_WAIT, but that isn't what you want, is it? The
problem is not when they transition into CLOSE_WAIT, but when they fail
to transition out of CLOSE_WAIT. How could you trace that?

--
-- blu

Always code as if the guy who ends up maintaining your code will be a
violent psychopath who knows where you live. - Martin Golding
-----------------------------------------------------------------------|
Brian Utterback - Solaris RPE, Oracle Corporation.
Ph:603-262-3916, Em:brian.utterback@...
23 Jun 2011 22:48
Re: dtrace script for CLOSE_WAIT leaks ?
On 23/06/2011 18:56, Vladimir Kotal wrote:
>
> Hi all,
>
> Does anyone have a dtrace script for observing connection/socket leaks
> ? I.e. when application does not do shutdown()/close() on a socket
> which it previously accept()'ed ? Ideally with IP address/port pairs
> printed in the output.
>
I've looked to do this kind of thing with dtrace in the past, and
concluded that it couldn't be done (which could well be incorrect!). The
problem with leaked stuff is that whilst capturing data, you can pair up
and eliminate the symmetric events, but the stuff you're interested in
is the bit left over.
It's fairly trivial to pair up the opens and closes, something like this
(pseudo-d-code):
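transition-to-close-wait:entry
{
        /* remember that this port has entered CLOSE_WAIT */
        array[port] = 1;
}

delete-tcp-or-close-socket:entry
/array[port]/
{
        /* matching close seen: forget the port again */
        array[port] = 0;
}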
24 Jun 2011 15:12
Re: dtrace script for CLOSE_WAIT leaks ?
Assuming you could find a way to dump the array, doesn't this just give
you a list of ports whose connections are currently in CLOSE_WAIT?
Wouldn't netstat give you the same info?

Instead of setting the array value to 1, you could set it to the value
of walltimestamp. That way, when you dumped it out, you would have the
time it went into CLOSE_WAIT, which would give you an indication of
which ones were in the state the longest. I wonder if you could get an
aggregation to work here? Hmm.

On 06/23/11 16:48, Brian Ruthven wrote:
> On 23/06/2011 18:56, Vladimir Kotal wrote:
>>
>> Hi all,
>>
>> Does anyone have a dtrace script for observing connection/socket leaks
>> ? I.e. when application does not do shutdown()/close() on a socket
>> which it previously accept()'ed ? Ideally with IP address/port pairs
>> printed in the output.
>>
>
> I've looked to do this kind of thing with dtrace in the past, and
> concluded that it couldn't be done (which could well be incorrect!). The
> problem with leaked stuff is that whilst capturing data, you can pair up
> and eliminate the symmetric events, but the stuff you're interested in
> is the bit left over.
>
> It's fairly trivial to pair up the opens and closes, something like this
> (pseudo-d-code):
>
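
[Editor's sketch] The walltimestamp idea above can be modelled outside DTrace to show why it helps. This is only a rough Python sketch of the bookkeeping, not a DTrace script: the class name, the two event hooks, and the sample ports and addresses are all made up for illustration, and something else would still have to feed it the "entered CLOSE_WAIT" and "closed" events.

```python
import time


class CloseWaitTracker:
    """Remember when each connection entered CLOSE_WAIT; forget it on close."""

    def __init__(self):
        # (local_port, remote_addr, remote_port) -> time the state was entered
        self.entered = {}

    def on_close_wait(self, conn, now=None):
        # Keep the first timestamp if the same event is reported twice.
        self.entered.setdefault(conn, time.time() if now is None else now)

    def on_close(self, conn):
        # The symmetric event: the application closed the socket.
        self.entered.pop(conn, None)

    def leftovers(self, now=None, min_age=0.0):
        """Return (age_seconds, conn) pairs still in CLOSE_WAIT, oldest first."""
        now = time.time() if now is None else now
        aged = [(now - t, conn) for conn, t in self.entered.items()]
        return sorted((a for a in aged if a[0] >= min_age), reverse=True)


# Hypothetical events; the addresses and ports are invented.
tracker = CloseWaitTracker()
tracker.on_close_wait((80, "10.0.0.1", 40001), now=100.0)
tracker.on_close_wait((80, "10.0.0.2", 40002), now=130.0)
tracker.on_close_wait((80, "10.0.0.3", 40003), now=150.0)
tracker.on_close((80, "10.0.0.2", 40002))  # closed normally, so not a leak
for age, conn in tracker.leftovers(now=160.0, min_age=5.0):
    print(f"{age:6.1f}s  {conn}")
```

Unlike a plain netstat snapshot, the age column separates connections that have been stuck for a long time from ones that are merely passing through CLOSE_WAIT.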
24 Jun 2011 15:37
Re: dtrace script for CLOSE_WAIT leaks ?
Brian Utterback wrote:
> Assuming you could find a way to dump the array, doesn't this just give
> you a list of ports whose connections are currently in CLOSE_WAIT?
> Wouldn't netstat give you the same info?
>
> Instead of setting the array value to 1, you could set it to the value
> of walltimestamp. That way when you dumped it out, you would have the
> time it went into CLOSE_WAIT, which would give you an indication of
> which ones were in the state the longest. I wonder if you could get an
> aggregation to work here? Hmm.

In about 99 and 44/100ths percent of the cases I've looked at in the
past, what appears to be a "leak" is actually something exacerbated by
the OS.

What I usually see is that the application opens a socket (via socket()
or accept()), does some work, and then closes the socket normally.
Unbeknownst to the application, part of that "work" involved a fork(),
perhaps buried in a library somewhere. (The free fork() given out to
users of syslog() employing LOG_CONS was once a possible cause, but
there are others.)

The fork() logic duplicates all of the open file descriptors, and the
code calling fork() in this case doesn't "know" that there are
descriptors that it shouldn't be copying so it can't easily close them
afterwards. It's the new process -- possibly completely unknown to the
main application -- that's still holding the socket open, allowing it to
slip into CLOSE_WAIT state.

For that reason, I think any CLOSE_WAIT diagnostic function should at
least track the fork() descriptor duplication and allow you to trace
back to the application that "leaked" descriptors by way of creating new
processes.

(Would be nice to have something like z/OS's FCTLCLOFORK or the
sometimes-discussed Linux FD_DONTINHERIT flag.)
24 Jun 2011 17:50
Re: dtrace script for CLOSE_WAIT leaks ?
On Fri, Jun 24, 2011 at 8:37 AM, James Carlson <carlsonj@...> wrote:
> (Would be nice to have something like z/OS's FCTLCLOFORK or the
> sometimes-discussed Linux FD_DONTINHERIT flag.)

+1
24 Jun 2011 18:10
Re: dtrace script for CLOSE_WAIT leaks ?
On 06/24/11 09:37, James Carlson wrote:
> The fork() logic duplicates all of the open file descriptors, and the
> code calling fork() in this case doesn't "know" that there are
> descriptors that it shouldn't be copying so it can't easily close them
> afterwards. It's the new process -- possibly completely unknown to the
> main application -- that's still holding the socket open, allowing it to
> slip into CLOSE_WAIT state.
>
> For that reason, I think any CLOSE_WAIT diagnostic function should at
> least track the fork() descriptor duplication and allow you to trace
> back to the application that "leaked" descriptors by way of creating new
> processes.
>
> (Would be nice to have something like z/OS's FCTLCLOFORK or the
> sometimes-discussed Linux FD_DONTINHERIT flag.)
>

The most common cause of CLOSE_WAIT connections in my experience has
been middleware code that indeed forked a child that inherited the FD
when it was not intended, but which also forked a child that was
intended to get the FD. So at least in that case the FD_DONTINHERIT and
FCTLCLOFORK would not have helped.

Perhaps we should recommend the use of shutdown instead of close?

--
-- blu

Always code as if the guy who ends up maintaining your code will be a
violent psychopath who knows where you live. - Martin Golding
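
[Editor's sketch] The fork() inheritance problem described above is easy to reproduce in miniature. The Python sketch below is only an illustration under some assumptions: it uses a socketpair() as a stand-in for an accept()ed TCP socket, the function name and the 1-second timeout are arbitrary, and it needs a Unix-like system for os.fork(). It shows the mechanism (the parent's close() does not end the connection while a child still holds a copy); with a real TCP connection that lingering copy is what would leave the socket sitting in CLOSE_WAIT once the remote side closes.

```python
import os
import socket


def peer_sees_eof(keep_child_copy):
    """Return True if the peer sees EOF after the parent close()s its socket."""
    ours, peer = socket.socketpair()  # stand-in for an accept()ed TCP socket
    peer.settimeout(1.0)
    hold_r, hold_w = os.pipe()  # keeps the child alive until we are done
    pid = os.fork()
    if pid == 0:
        # Child: like a fork() buried in a library, it inherits every descriptor.
        try:
            os.close(hold_w)
            peer.close()
            if not keep_child_copy:
                ours.close()  # a well-behaved child drops what it does not need
            os.read(hold_r, 1)  # block until the parent closes hold_w
        finally:
            os._exit(0)
    os.close(hold_r)
    ours.close()  # the application closes the socket "normally"
    try:
        saw_eof = peer.recv(1) == b""
    except socket.timeout:
        saw_eof = False  # the child's copy is still holding the connection open
    os.close(hold_w)  # let the child exit
    os.waitpid(pid, 0)
    peer.close()
    return saw_eof


print("child closed its copy:", peer_sees_eof(keep_child_copy=False))
print("child kept its copy:  ", peer_sees_eof(keep_child_copy=True))
```

When the child keeps its copy, the peer should time out instead of seeing EOF, even though the application did everything right from its own point of view.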
24 Jun 2011 19:03
Re: dtrace script for CLOSE_WAIT leaks ?
Brian Utterback wrote:
> On 06/24/11 09:37, James Carlson wrote:
>> The fork() logic duplicates all of the open file descriptors, and the
>> code calling fork() in this case doesn't "know" that there are
>> descriptors that it shouldn't be copying so it can't easily close them
>> afterwards. It's the new process -- possibly completely unknown to the
>> main application -- that's still holding the socket open, allowing it to
>> slip into CLOSE_WAIT state.
>>
>> For that reason, I think any CLOSE_WAIT diagnostic function should at
>> least track the fork() descriptor duplication and allow you to trace
>> back to the application that "leaked" descriptors by way of creating new
>> processes.
>>
>> (Would be nice to have something like z/OS's FCTLCLOFORK or the
>> sometimes-discussed Linux FD_DONTINHERIT flag.)
>
> The most common cause of CLOSE_WAIT connections in my experience has
> been middleware code that indeed forked a child that inherited the FD
> when it was not intended, but which also forked a child that was
> intended to get the FD. So at least in that case the FD_DONTINHERIT and
> FCTLCLOFORK would not have helped.
>
> Perhaps we should recommend the use of shutdown instead of close?

You really need both as close is needed to reduce the file descriptors.

Rao.
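
[Editor's sketch] The point about needing both calls can be shown with a small Python sketch. The assumptions are the same kind as elsewhere in such toy examples: a socketpair() stands in for an accept()ed TCP socket, a dup()ed socket stands in for a copy inherited across fork(), and the function name is invented. shutdown() ends the connection for every copy of the descriptor, while only close() returns the descriptor itself to the process.

```python
import os
import socket


def shutdown_then_close():
    """Show what shutdown() and close() each accomplish on a shared socket."""
    ours, peer = socket.socketpair()  # stand-in for an accept()ed TCP socket
    peer.settimeout(1.0)
    inherited = ours.dup()  # stand-in for a copy held by a forked child

    ours.shutdown(socket.SHUT_RDWR)  # ends the connection for every copy
    peer_saw_eof = peer.recv(1) == b""  # EOF arrives despite the extra copy

    fd = ours.fileno()
    os.fstat(fd)  # still a valid descriptor after shutdown()
    ours.close()  # only close() gives the descriptor back
    try:
        os.fstat(fd)
        fd_released = False
    except OSError:
        fd_released = True

    inherited.close()
    peer.close()
    return peer_saw_eof, fd_released


print(shutdown_then_close())
```

So shutdown() addresses the lingering connection and close() addresses descriptor exhaustion; calling only one of them leaves the other problem in place.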