cinap_lenrek | 4 Aug 2012 06:00

bind /net

I recently helped debug a strange plan9 server problem. The
machine is a cpu/auth/file server that does basically everything:
serving http with rc-httpd, accepting mail, serving dns
and running a bunch of cronjobs doing various things. The
machine is quite busy.

It worked quite well for some time. Then it would stop
accepting cpu logins. The client's cpu process would just hang
there. Http would continue to serve fine for a while until
that stopped working too, and finally the machine would lock up
and reboot.

This happened about every 2 days.

After some time, we were able to get a picture of what seemed
to be going on.

There would be many processes blocked opening /mnt/factotum/rpc.
Trying to ls /mnt would hang the ls... The machine would slowly
accumulate locked up processes until it reached the 2k process
limit...

Problem was that factotum seemed busy in some auth protocol.
(this really sucks. factotum is mounted directly on /mnt instead
of /mnt/factotum and is single threaded, so when it's doing some
auth business, no one can walk /mnt... this can even cause
deadlock with authsrv, which tries to access /mnt/keys on the
same machine... but that's a different thing...)
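
To illustrate why one slow auth conversation can stall everything that touches /mnt, here is a toy model in Python. It is only a sketch and says nothing about how factotum is actually written: the function names, the request names and the durations are all made up for illustration. It compares when each request finishes if a single loop serves requests strictly in order, versus giving each request its own thread.

```python
# Toy model only, not factotum's real code. A request is (name, seconds of work).

def finish_times_single_threaded(requests):
    """One server loop handles requests strictly in arrival order."""
    clock = 0.0
    done = {}
    for name, work in requests:
        clock += work      # server is busy; every later request waits
        done[name] = clock
    return done

def finish_times_one_thread_per_request(requests):
    """Each request gets its own thread, so it only waits for its own work."""
    return {name: float(work) for name, work in requests}

# Hypothetical workload: a slow auth conversation, then a trivial walk of /mnt.
requests = [("auth rpc", 30.0), ("walk /mnt", 0.001)]

single = finish_times_single_threaded(requests)
threaded = finish_times_one_thread_per_request(requests)

print("walk /mnt, single threaded:", single["walk /mnt"])
print("walk /mnt, thread per request:", threaded["walk /mnt"])
```

In the single threaded case the walk cannot finish before the auth request ahead of it does, which matches the hanging ls described above. If the auth request never completes, neither does anything queued behind it.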

But there were no tcp567 or authsrv processes around (the machine
(Continue reading)

erik quanstrom | 4 Aug 2012 15:33

Re: bind /net

> Problem was that factotum seemed busy in some auth protocol.
> (this really sucks. factotum is mounted directly on /mnt instead
> of /mnt/factotum and is single threaded, so when it's doing some
> auth business, no one can walk /mnt... this can even cause
> deadlock with authsrv, which tries to access /mnt/keys on the
> same machine... but that's a different thing...)

the rsc factotum from 9atom is multithreaded.

- erik

cinap_lenrek | 5 Aug 2012 02:55

Re: bind /net

very good. i'll look into this. thanks :)

--
cinap

