Helge Bahmann | 5 Dec 09:37 2005

NFS over localhost?

Hello list!

Setup: One volume ("/export") exported to 192.168.1.1 via kernel-space
NFSv3 server; imported from the same machine 192.168.1.1 (at "/import");
this setup is only for configuration testing purposes.

Symptoms: Under heavy write load ("dd if=/dev/zero of=/import/x") both NFS
client and NFS server deadlock reproducibly, various kernels 2.6.8 -
2.6.15-rc4 affected, both SMP and non-SMP. All accesses to "/import" hang
indefinitely, sometimes also the mount-point "/export" becomes unusable;
both reiserfs and ext3 as back-end filesystems are affected. This
situation happens as the "free" memory (as reported by vmstat)
approaches zero, so it may possibly be an out-of-memory-thing.
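
One way to check that correlation is to sample the kernel's memory counters
while the dd runs. The following is only a sketch and not part of the
original report: it assumes a Linux /proc/meminfo with the MemFree, Dirty
and Writeback fields, and the helper names parse_meminfo and watch are made
up for this illustration.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: number} (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "Name: value" pair
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def watch(samples=10, interval=1.0, path="/proc/meminfo"):
    """Print MemFree, Dirty and Writeback once per interval (hypothetical helper)."""
    for _ in range(samples):
        with open(path) as f:
            info = parse_meminfo(f.read())
        print("free=%s dirty=%s writeback=%s" % (
            info.get("MemFree"), info.get("Dirty"), info.get("Writeback")))
        time.sleep(interval)
```

Calling watch() from a second terminal during the write test would show
whether the hang coincides with free memory running out while dirty and
writeback pages pile up.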

Question(s): Is this kind of deadlock "supposed" to happen? I mean, there
is no purpose in using NFS in this kind of way (except for configuration
testing), so I can imagine no one bothers to clean up the interaction
between NFS client and server in this scenario. However I am slightly
concerned that a similar scenario might be possible with our "real" NFS
server, as it can receive data faster over the network than it can write
out to the disks. Is this possible?

Any comments? Thanks and best regards
-- 
Helge Bahmann <hcb <at> chaoticmind.net>                     /| \__
The past: Smart users in front of dumb terminals       /_|____\
                                                     _/\ |   __)
$ ./configure                                        \\ \|__/__|
checking whether build environment is sane... yes     \\/___/ |
checking for AIX... no (we already did this)            |

Trond Myklebust | 5 Dec 14:13 2005

Re: NFS over localhost?

On Mon, 2005-12-05 at 09:37 +0100, Helge Bahmann wrote:

> Symptoms: Under heavy write load ("dd if=/dev/zero of=/import/x") both NFS
> client and NFS server deadlock reproducibly, various kernels 2.6.8 -
> 2.6.15-rc4 affected, both SMP and non-SMP. All accesses to "/import" hang
> indefinitely, sometimes also the mount-point "/export" becomes unusable;
> both reiserfs and ext3 as back-end filesystems are affected. This
> situation happens as the "free" memory (as reported by vmstat)
> approaches zero, so it may possibly be an out-of-memory-thing.
> 
> Question(s): Is this kind of deadlock "supposed" to happen? I mean, there
> is no purpose in using NFS in this kind of way (except for configuration
> testing), so I can imagine no one bothers to clean up the interaction
> between NFS client and server in this scenario. However I am slightly
> concerned that a similar scenario might be possible with our "real" NFS
> server, as it can receive data faster over the network than it can write
> out to the disks. Is this possible?

No, that kind of deadlock is not _supposed_ to happen, but it is very
hard to avoid. Doing loopback mounts will always confuse the VM.
The problem is that the client side can only deal with memory pressure
by writing out pages and then sending them to the server. Since the
server in this case is itself, that therefore actually increases the
memory pressure.
A setup in which client and server are not the same should be a lot more
stable, but there are scenarios where deadlocks can occur there too.
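
The feedback loop described above can be illustrated with a toy page-count
model. This is a deliberately crude sketch, not how the kernel VM works:
the function name simulate, the rates, and the rule that the server must
buffer a page locally before the client may release it are all assumptions
made for illustration, and the remote server is modeled as having unlimited
memory, which a real one does not.

```python
def simulate(total_pages, loopback, steps=1000, write_rate=64, flush_rate=16):
    """Return the step at which all progress stops, or None if it never does."""
    client_dirty = 0  # pages dirtied by the writer, not yet sent to the server
    server_dirty = 0  # pages received by the server, not yet on disk
    for step in range(steps):
        # On loopback the server's buffers come out of the same memory pool.
        free = total_pages - client_dirty - (server_dirty if loopback else 0)

        # The writer dirties pages as fast as free memory allows.
        new = min(write_rate, free)
        client_dirty += new
        free -= new

        # Client writeback: a page is released only once the server holds it.
        if loopback:
            sendable = min(client_dirty, free)  # server needs local free pages
        else:
            sendable = client_dirty  # remote memory is separate (assumed ample)
        client_dirty -= sendable
        server_dirty += sendable

        # The server commits some of its buffered pages to disk.
        flushed = min(server_dirty, flush_rate)
        server_dirty -= flushed

        if new == 0 and sendable == 0 and flushed == 0:
            return step  # nothing can move any more: the model has deadlocked
    return None
```

In this model, simulate(128, loopback=True) stalls after a handful of steps
because every page is dirty on the client side and the server has no memory
left to receive any of them, while simulate(128, loopback=False) keeps
running. It does not capture the cases where separate machines can deadlock
too.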

Note that 2.6.15-rc5 includes a fix that should help with one known
client-side out-of-memory scenario, but since that is more related to
shared mmap(), I doubt it will help you here.

Helge Bahmann | 5 Dec 16:41 2005

Re: NFS over localhost?

On Mon, 5 Dec 2005 Trond Myklebust wrote:

> No, that kind of deadlock is not _supposed_ to happen, but it is very
> hard to avoid. Doing loopback mounts will always confuse the VM.

okay I understand... "Don't do that" then, at least for the moment.

> A setup in which client and server are not the same should be a lot more
> stable, but there are scenarios where deadlocks can occur there too.

The way you phrased it really bothers me because it almost sounds like
"expect some completely random lockups and deal with them" :) Could you
narrow down a little bit more under what conditions such deadlocks can
occur?

e.g. something like "it can only happen if the clients are generating
writes faster than the server can commit to disk AND the server does have
insufficient RAM to handle a load spike" would be very reassuring.

Thanks and best regards

Trond Myklebust | 5 Dec 17:25 2005

Re: NFS over localhost?

On Mon, 2005-12-05 at 16:41 +0100, Helge Bahmann wrote:
> On Mon, 5 Dec 2005 Trond Myklebust wrote:
> 
> > No, that kind of deadlock is not _supposed_ to happen, but it is very
> > hard to avoid. Doing loopback mounts will always confuse the VM.
> 
> okay I understand... "Don't do that" then, at least for the moment.
> 
> > A setup in which client and server are not the same should be a lot more
> > stable, but there are scenarios where deadlocks can occur there too.
> 
> The way you phrased it really bothers me because it almost sounds like
> "expect some completely random lockups and deal with them" :) Could you
> narrow down a little bit more under what conditions such deadlocks can
> occur?
> 
> e.g. something like "it can only happen if the clients are generating
> writes faster than the server can commit to disk AND the server does have
> insufficient RAM to handle a load spike" would be very reassuring.

Problems should in principle be more or less independent of the speed at
which the client can generate writes. A modern CPU can probably write to
every page in memory before you've even got a reply from your server.

It is therefore more a question of "can I trick the VM into exhausting
some critical resource?". For NFS, the critical resources include things
like: memory for the networking stack, memory for the auxiliary daemons
rpc.gssd and rpc.idmapd, etc.

Most other things should be possible to treat by twiddling the knobs