Myklebust, Trond | 13 Aug 2012 18:58

Re: [PATCH] NFSv4.1: Remove a bogus BUG_ON() in nfs4_layoutreturn_done

On Mon, 2012-08-13 at 12:26 -0400, Trond Myklebust wrote:
> On Sun, 2012-08-12 at 20:36 +0300, Boaz Harrosh wrote:
> > On 08/09/2012 06:39 PM, Myklebust, Trond wrote:
> > > If the problem is that the DS is failing to respond, how does the client
> > > know that the in-flight I/O has ended?
> > 
> > For the client, the DS in question has timed out; we have reset
> > its session and closed its sockets. And all its RPC requests have
> > been, or are being, ended with a timeout error. So the timed-out
> > DS is a no-op. All its IO requests will end very soon, if not already.
> > 
> > A DS time-out is just a very valid and meaningful response, just like
> > an op-done-with-error. This was what Andy added to the RFC's errata,
> > which I agree with.
> > 
> > > 
> > > No. It is using the layoutreturn to tell the MDS to fence off I/O to a
> > > data server that is not responding. It isn't attempting to use the
> > > layout after the layoutreturn: 
> > 
> > > the whole point is that we are attempting
> > > write-through-MDS after the attempt to write through the DS timed out.
> > > 
> > 
> > Trond STOP!!! this is pure bullshit. You guys took the opportunity of
> > me being in Hospital, and the rest of the bunch not having a clue. And
> > snuck in a patch that is totally wrong for everyone, not taking care of
> > any other LD *crashes* . And especially when this patch is wrong even for
> > files layout.
> > 

