4 Aug 2012 06:00
bind /net
<cinap_lenrek <at> gmx.de>
2012-08-04 04:00:40 GMT
Recently I helped debug a strange Plan 9 server problem. The machine is a cpu/auth/file server doing basically everything: serving http with rc-httpd, accepting mail, serving dns and running a bunch of cron jobs that do various things. The machine is quite busy.

It worked quite well for some time. Then it would stop accepting cpu logins; the client's cpu process would just hang there. Http would continue to serve fine for a while until that stopped working too, and finally the machine would lock up and reboot. This happened every 2 days or so.

After some time, we were able to get a picture of what seemed to be going on. There would be many processes blocked opening /mnt/factotum/rpc. Trying to ls /mnt would hang the ls... The machine would slowly accumulate locked-up processes until it reached the 2k process limit...

The problem was that factotum seemed busy in some auth protocol. (This really sucks. factotum is mounted directly on /mnt instead of /mnt/factotum and is single-threaded, so when it's doing some auth business, no one can walk /mnt... This can even cause a deadlock with authsrv, which tries to access /mnt/keys on the same machine... but that's a different thing...) But there were no tcp567 or authsrv processes around (the machine