Bill Moran | 9 Oct 2007 16:16

Mysterious jail lockups


Has anyone else seen this?

The symptoms are a jail that has no processes in it, and thus cannot
be stopped/killed/whatever.  The only solution is to reboot the host system.
Trying to jexec into the jail results in an error, so new processes can't
be started therein.

It doesn't happen very often, and I've been unable to reproduce it on
demand.  What I'm looking for at this point are whether or not anyone
else has seen this, and advice on how to track it down/reproduce it, with
the eventual goal of fixing the problem.

It would be nice if there were a command, let's say "jkill" that killed
the _jail_.  There is a port called jkill that (allegedly) does this, but
looking at the Perl code, all it does is loop through a ps listing
killing off processes.  In the event of a jail with no processes, this
doesn't help any.
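
To illustrate why that approach does nothing here, below is a rough sketch
of the "loop through a ps listing" idea.  It is written in Python, not taken
from the jkill port, and it assumes ps output with a PID column and a JID
column (something shaped like `ps -ax -o pid,jid`).  The function names and
the sample listing are made up for illustration.

```python
import os
import signal


def pids_in_jail(ps_output, jid):
    """Return the PIDs whose JID column equals `jid`.

    Assumes `ps_output` is a header line followed by "PID JID" rows.
    """
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) == 2 and fields[1] == str(jid):
            pids.append(int(fields[0]))
    return pids


def kill_jail_processes(ps_output, jid, kill=os.kill):
    """Send SIGKILL to every listed process in `jid`; return the count."""
    targets = pids_in_jail(ps_output, jid)
    for pid in targets:
        kill(pid, signal.SIGKILL)
    return len(targets)


# Hypothetical listing: jail 3 has two processes, jail 7 has none.
SAMPLE_PS = """  PID JID
    1   0
  812   3
  845   3
"""
```

With the sample listing, asking for jail 7 yields an empty PID list, so the
loop body never runs and the jail itself is left exactly as it was.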

Theoretically, this would be some sort of kernel bug, whereby the
reference counter to the jail is not properly decremented when processes
die and thus the jail never shuts down.  Given the infrequency of the
occurrence and my inability to produce a reproducible case, I expect
it to be challenging to track down.
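
As a toy model of that theory (purely illustrative, and not how the kernel
actually represents a jail), the sketch below counts one reference per
attached process and tears the jail down when the count reaches zero.  The
class name, the methods and the `drop_ref` switch are all invented here; the
switch stands in for whatever code path might skip the decrement.

```python
class ToyJail:
    """Toy reference-counted jail; not the real kernel structure."""

    def __init__(self):
        self.refs = 0
        self.procs = set()
        self.destroyed = False

    def attach(self, pid):
        """A process enters the jail and takes one reference."""
        self.procs.add(pid)
        self.refs += 1

    def proc_exit(self, pid, drop_ref=True):
        """A process exits; drop_ref=False simulates the suspected bug."""
        self.procs.discard(pid)
        if drop_ref:
            self.refs -= 1
        if self.refs == 0:
            self.destroyed = True

    def is_stuck(self):
        """True when no processes remain but the jail was never torn down."""
        return not self.procs and not self.destroyed


# Normal case: every exit drops its reference, so the jail goes away.
healthy = ToyJail()
healthy.attach(100)
healthy.attach(101)
healthy.proc_exit(100)
healthy.proc_exit(101)

# Suspected case: one exit fails to drop its reference.
leaky = ToyJail()
leaky.attach(200)
leaky.attach(201)
leaky.proc_exit(200, drop_ref=False)
leaky.proc_exit(201)
```

In the second case the jail ends up with no processes and a nonzero count,
which matches the symptom described above: nothing left to kill, and
nothing that will ever bring the count to zero.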

Any advice?

-- 
Bill Moran
Collaborative Fusion Inc.

D Hill | 9 Oct 2007 21:39

Re: Mysterious jail lockups

On Tue, 9 Oct 2007 at 10:16 -0400, wmoran@... confabulated:

> Has anyone else seen this?
>
> The symptoms are a jail that has no processes in it, and thus can not
> be stopped/killed/whatever.  Only solution is to reboot the host system.
> Trying to jexec into the jail results in an error, so new processes can't
> be started therein.
>
> It doesn't happen very often, and I've been unable to reproduce it on
> demand.  What I'm looking for at this point are whether or not anyone
> else has seen this, and advice on how to track it down/reproduce it, with
> the eventual goal of fixing the problem.
>
> It would be nice if there were a command, let's say "jkill" that killed
> the _jail_.  There is a port called jkill that (allegedly) does this, but
> looking at the Perl code, all it does is loop through a ps listing
> killing off processes.  In the event of a jail with no processes, this
> doesn't help any.
>
> Theoretically, this would be some sort of kernel bug, whereby the
> reference counter to the jail is not properly decremented when processes
> die and thus the jail never shuts down.  Given the infrequency of the
> occurrence and my inability to produce a reproducible case, I expect
> it to be challenging to track down.
>
> Any advice?

Same thing seen here running:
