Robert Gordon | 1 Aug 2007 11:29

Re: Device stateids


Okay, well I'm willing to take a stab at tweaking
the proposal.

----

The NFS client maintains a mapping of device ids contained
in a layout, to the corresponding data server address.
The state of the mapping is represented by a single
recallable device stateid (devstateid). There is at most
one devstateid per { client ID, layout type } pair.

GETDEVICEINFO and GETDEVICELIST each return the devstateid.

The mappings stay in force until the devstateid is
recalled (via CB_RECALL) or lease expiration.

All layouts using the device IDs remain in force, but the
server MUST fence the client from accessing the affected
data servers until the client has obtained the re-mapped
device IDs via GETDEVICEINFO or GETDEVICELIST.
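
A minimal sketch of the client-side bookkeeping described above, assuming
one cache object per { client ID, layout type } pair.  The class and method
names (DeviceMap, install, recall, lookup) are hypothetical and do not come
from the proposal; a recall or lease expiration is modeled simply as
dropping the devstateid so that lookups fail until the mappings are
re-fetched.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DeviceMap:
    """Hypothetical client cache for one {client ID, layout type} pair."""
    devstateid: Optional[int] = None
    addresses: Dict[int, str] = field(default_factory=dict)

    def install(self, devstateid: int, addresses: Dict[int, str]) -> None:
        # Record what a GETDEVICEINFO / GETDEVICELIST style reply returned.
        self.devstateid = devstateid
        self.addresses = dict(addresses)

    def recall(self) -> None:
        # Recall or lease expiration: the mappings are no longer in force.
        self.devstateid = None
        self.addresses.clear()

    def lookup(self, device_id: int) -> str:
        # Refuse to hand out addresses without a valid devstateid.
        if self.devstateid is None:
            raise LookupError("no valid devstateid; re-fetch device mappings")
        return self.addresses[device_id]
```

The layouts themselves are not touched by recall(); only the device ID to
address translation becomes unavailable, which mirrors the "layouts remain
in force" wording.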

If the server supports CB_NOTIFY and the client has
requested device id notifications via GETDEVICEINFO
or GETDEVICELIST, the server can inform the client
of a mapping being added, modified, or deleted.

The CB_NOTIFY notifications are:

NOTIFY4_DELETE_DEVICE_ID

Black_David | 11 Aug 2007 18:50

Device stateids and fencing

This text:

> All layouts using the device IDs remain in force, but the
> server MUST fence the client from accessing the affected
> data servers until the client has obtained the re-mapped
> device IDs via GETDEVICEINFO or GETDEVICELIST.

Has the right intent, but is too conservative.  Likely
implementations for the block layout will make the client
responsible for some of the fencing (block layout has to
trust the client in situations where the other layouts
don't have to).

This text should say something like:

All layouts using the device IDs remain in force, but the
client MUST NOT access the affected data servers until the
client has obtained the re-mapped device IDs via
GETDEVICEINFO or GETDEVICELIST.  The client MAY be prevented
from accessing the affected data servers until this is done,
but this prevention (fencing) is OPTIONAL.

Thanks,
--David
----------------------------------------------------
David L. Black, Senior Technologist
EMC Corporation, 176 South St., Hopkinton, MA  01748
+1 (508) 293-7953             FAX: +1 (508) 293-7786
black_david <at> emc.com        Mobile: +1 (978) 394-7754
----------------------------------------------------

Noveck, Dave | 20 Aug 2007 17:25

RE: Device stateids and fencing

I have a great deal of concern about making this change.

The problem that requires data server fencing (at least in my view) is
that you can have situations in which the client can not be presumed to
be aware of the necessity for fencing.  It is not just that you are
worried about a rogue client which doesn't respect the requirements.  I
agree that if that were the case, then the argument that for blocks-type
layout, you are already subject to this issue would be convincing.  

But what we are talking about here happens in response to lease
expiration and to recall (which I interpret to mean when the server
*sends* a recall).  If you are going to make this the responsibility of
the client, how do you deal with cases where you cannot be assured that
the client has knowledge of the necessity for such fencing?  The fact
that lease expiration is one of the situations makes clear that we are
talking about situations in which reliable communication cannot be taken
for granted.

I think that we need a very clear story about how we can reliably (in
all cases) rely on the client to do this fencing before we contemplate
such a change.   

-----Original Message-----
From: Black_David <at> emc.com [mailto:Black_David <at> emc.com] 
Sent: Saturday, August 11, 2007 12:51 PM
To: rbg <at> openrbg.com; nfsv4 <at> ietf.org
Subject: [nfsv4] Device stateids and fencing

This text:


Black_David | 22 Aug 2007 02:01

RE: Device stateids and fencing

Dave,

> I have a great deal of concern about making this change.
>
> The problem that requires data server fencing (at least in my view) is
> that you can have situations in which the client can not be presumed
> to be aware of the necessity for fencing.  It is not just that you are
> worried about a rogue client which doesn't respect the requirements.
> I agree that if that were the case, then the argument that for
> blocks-type layout, you are already subject to this issue would be
> convincing.
>
> But what we are talking about here happens in response to lease
> expiration and to recall (which I interpret to mean when the server
> *sends* a recall).  If you are going to make this the responsibility
> of the client, how do you deal with cases where you cannot be assured
> that the client has knowledge of the necessity for such fencing?  The
> fact that lease expiration is one of the situations makes clear that
> we are talking about situations in which reliable communication cannot
> be taken for granted.

At least for lease expiration, I expect the block clients to track
the lease behavior and do the right thing.

> I think that we need a very clear story about how we can reliably (in
> all cases) rely on the client to do this fencing before we contemplate

Noveck, Dave | 22 Aug 2007 15:26

RE: Device stateids and fencing

> At least for lease expiration, I expect the block clients to track 
> the lease behavior and do the right thing.

Maybe that's what you expect but for this to work, the protocol has
to require that this implementation be done, and describe, at least
in general terms, how it is to be done.  Given transmission delays,
for the client to determine when it is possible that the server
may see a lease expiration, is not trivial.  

With your "MUST NOT", all clients would be forced to implement this and
I don't think that is reasonable if only blocks needs it.  I guess 
I would be OK, if we make this layout-type specific, and either 
describe this in the blocks document (if only blocks needs this),
or describe it in the base document and then say that it is only
required if you are using a particular layout type that requires
it (and the statement about whether a particular layout type 
requires this would be stated in the document for the layout type).

> For recall and the like, the server waits out the lease expiration, 
> and carries on, relying on the client to observe the lease expiration 
> restrictions.

But there isn't necessarily going to be a lease expiration, that a
client could see, unless the server is going to do something drastic,
like not respond to any COMPOUNDs while a recall is outstanding.
If you respond to requests, the client has to assume that the lease
is being renewed, as it would be expected to be.

Suppose the server does a recall, and the request gets stuck in a 
buggy router, for a minute.  Meanwhile the client is doing stuff,

Black_David | 23 Aug 2007 17:08

RE: Device stateids and fencing

Dave,

> > At least for lease expiration, I expect the block clients to track 
> > the lease behavior and do the right thing.
> 
> Maybe that's what you expect but for this to work, the protocol has
> to require that this implementation be done, and describe, at least
> in general terms, how it is to be done.  Given transmission delays,
> for the client to determine when it is possible that the server
> may see a lease expiration, is not trivial.

I may be operating under a false assumption.  I had assumed that
clients would track lease expirations and know when the lease
expired locally.  The server then tracks that against its own clock
and adds a generous allowance for clock skew (this is an error case,
it just has to work, it doesn't have to be fast).  If clients aren't
already doing this, then you definitely have a point about the
additional implementation work:

> With your "MUST NOT", all clients would be forced implement this and
> I don't think that is reasonable if only blocks needs it.  I guess 
> I would be OK, if we make this layout-type specific, and either 
> describe this in the blocks document (if only blocks needs this),
> or describe it in the base document and then say that it is only
> required if you are using a particular layout type that requires
> it (and the statement about whether a particular layout type 
> requires this would be stated in the document for the layout type).

My concerns are definitely block-layout-specific.  If the base
document could describe a) fencing and b) client cessation of

Noveck, Dave | 23 Aug 2007 21:48

RE: Device stateids and fencing

> > > At least for lease expiration, I expect the block clients to track
> > > the lease behavior and do the right thing.
> > 
> > Maybe that's what you expect but for this to work, the protocol has
> > to require that this implementation be done, and describe, at least
> > in general terms, how it is to be done.  Given transmission delays,
> > for the client to determine when it is possible that the server
> > may see a lease expiration, is not trivial.

> I may be operating under a false assumption.  I had assumed that
> clients would track lease expirations and know when the lease
> expired locally. 

I don't know what it really means for the lease to expire locally.
Let's say I have this process that wakes up every lease-time minus
five seconds and sends a message if one hasn't been sent in the 
interim.  So assume that this process never, when the delay 
terminates, finds that it has been awoken more than five seconds
too late then, is it right to say that the lease never expires
"locally"?  If so, so what?  The point is that it matters when the
server receives the message, and you don't know that.  You only
know it happens some time between when you send it and when you 
get the reply.   

> The server then tracks that against its own clock
> and adds a generous allowance for clock skew (this is an error case,
> it just has to work, it doesn't have to be fast).  

You add the generous allowance (five seconds in this case) to make

Black_David | 25 Aug 2007 00:58

Client fencing

Dave,

This discussion seems to have morphed into trying to solve the
wrong problem:

> Given that you have N asynchronous parallel messages and you know
> the send time and the reply time of each, how do you safely (false
> positives are OK but false negatives are not) determine whether the 
> server may have a gap of L seconds between receiving successive
> requests, with the only constraint being that each request is 
> received after it is sent (no FTL-moving servers :-), and before 
> the reply is received?
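
One conservative way to approach the quoted question, offered only as a
sketch: a request sent at s and answered at e was received somewhere in
[s, e], so the server is certain to consider the lease alive throughout
[e, s + L).  If those guaranteed windows chain together without a hole, no
lapse was possible between the earliest and the latest reply.  The function
name and the tuple layout are hypothetical, and the sketch assumes a single
client clock, negligible drift relative to the server, and that every
received request renews the lease for exactly L seconds.

```python
from typing import List, Tuple


def lease_may_have_lapsed(exchanges: List[Tuple[float, float]],
                          lease: float) -> bool:
    """exchanges holds (send_time, reply_time) pairs on the client clock.

    Returns True when a lapse cannot be ruled out (false positives are
    acceptable), False only when the guaranteed windows [reply, send+lease)
    cover every instant from the earliest reply to the latest reply.
    """
    if not exchanges:
        return True  # nothing confirmed, so nothing can be assured
    ordered = sorted(exchanges, key=lambda ex: ex[1])  # by reply time
    covered_until = ordered[0][0] + lease
    for send, reply in ordered:
        # A reply at or past the covered edge leaves a possible hole.
        if reply >= covered_until:
            return True
        covered_until = max(covered_until, send + lease)
    return False
```

The check is deliberately pessimistic: a round trip that itself takes L or
longer yields no guaranteed window at all and is reported as a possible
lapse.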

The problem is not that the client needs to know if/when the 
server thinks the lease expired.  The problem is the server
needs to know how long it has to wait to be assured that an
uncommunicative client has stopped using its layouts, making
unilateral reclaim of layout resources safe.  That's a different
problem, and it can be solved in the following fashion:
- After some period of time <X> of inability to talk to the
	server, the client stops using layouts.  The client uses
	a heartbeat operation as needed (e.g., every <X>/2) to
	stay inside of this.  The client side rule is simple:
	if <X> has elapsed since sending of an operation for which
	a server reply was received, then layouts cannot be used
	(if <X> is large enough that possible communication delays
	 are noise with respect to it, it's sufficient to track
	 when replies arrive from the server).  <X> should be set
	reasonably large - 5 seconds strikes me as too short, but
	I don't know what we use in practice off the top of my head.
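
The client-side rule in the item above can be sketched as follows.  This is
an illustration under stated assumptions, not part of the proposal: the
class name LayoutGuard and its methods are hypothetical, times are plain
seconds on the client clock, and the caller is assumed to pass in the send
time of each operation once its reply has arrived.

```python
from typing import Optional


class LayoutGuard:
    """Hypothetical tracker for the "<X> since last confirmed send" rule."""

    def __init__(self, x_seconds: float) -> None:
        self.x = x_seconds
        self.last_confirmed_send: Optional[float] = None

    def reply_received(self, send_time: float) -> None:
        # Remember the latest send time for which a reply was received.
        if (self.last_confirmed_send is None
                or send_time > self.last_confirmed_send):
            self.last_confirmed_send = send_time

    def may_use_layouts(self, now: float) -> bool:
        # Layouts cannot be used once <X> has elapsed since that send.
        if self.last_confirmed_send is None:
            return False
        return now - self.last_confirmed_send < self.x

    def heartbeat_due(self, now: float) -> bool:
        # Heartbeat every <X>/2, as the text suggests, to stay inside <X>.
        if self.last_confirmed_send is None:
            return True
        return now - self.last_confirmed_send >= self.x / 2
```

Measuring from the send time rather than the reply time is the cautious
choice, because the server may have processed the request at any point
between the two.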

Noveck, Dave | 27 Aug 2007 03:58

RE: Client fencing

> The problem is not that the client needs to know if/when the server
> thinks the lease expired.

I think it is, and trying to create a second problem that is almost the
same but different just adds to the complexity of the protocol and the
solution.

> The problem is the server needs to know how long it has to wait to be
> assured that an uncommunicative client has stopped using its layouts,
> making unilateral reclaim of layout resources safe.  That's a
> different problem,

No, it is the same problem.  Leases are the way v4 determines whether a
client is uncommunicative.  If you create a second concept of
uncommunicativeness that is basically the same except that it replaces
the lease time by some other value <X>, you just make things more
complicated, for no good reason. 

For the client to stop using its layouts, it must be able to determine
that it is uncommunicative, or more precisely it must be able to
determine that it is not assured that the server believes it to be
communicative.  One server judgment about client communicativeness is
governed by lease expiration.  When a lease expires, the server has
judged that the client is uncommunicative.  There is no point in adding
a second concept of uncommunicativeness and then having to deal with
clients that are <X>-uncommunicative and still have a valid lease, or
those that have a lease expired but are not <X>-uncommunicative.  That
makes my head hurt and I suspect that as people try to implement and

Benny Halevy | 30 Aug 2007 10:52

Re: RE: Client fencing

> I'd like to hear what other people have to say about this.  Not too many
> people have been part of the discussion.  Maybe they're all saying, "I
> wish the two Dave's would just shut up", but if so we'll hear that
> sooner or later.  I'd like to see if there is a group consensus about
> this (either about us shutting up or the underlying issue of what to do
> about this :-).  

Dave N. asked for more opinions so FWIW, here's mine. :)

Given the limitations of the block layout-type the server SHOULD fence off
clients from storage when it revokes the client's layout or deviceid state.
Specific layout types can and should require that as a MUST if possible.

The client SHOULD keep track of the lease expiration time and refrain from
directly accessing any storage device if the lease has not been successfully
renewed early enough to leave time for the I/O operation to be executed, given
the typical timing characteristics of the client and network implementation,
or if state revocation has been indicated in the SEQUENCE result flags.
[There are more details to this, as Dave N. mentioned; it should probably
be described in more detail in the spec.]

I'm not sure whether specific layout types can require that as a MUST, even
if the client can determine guaranteed deterministic bounds on the I/O
execution time: unless it has a strict real-time operating system,
I doubt it can deterministically bound the queuing time of requests
until they are dispatched on the storage network.

Therefore I agree with Dave N. that the server must be the authoritative entity
with regards to layout and device state revocation.


Noveck, Dave | 31 Aug 2007 17:44

RE: RE: Client fencing

> I still feel uncomfortable with letting the server reassign client 
> resources without fencing off clients 

As I understand it, your discomfort is due to the fact that you have no assurance that this approach can be
made to work reliably.  And that means not that you have individual working implementations, but that you
believe that you can define a protocol, which when correctly implemented, will prevent the corruption
you are worried about from ever happening.

On the other hand, we know the fencing approach would work and that is in line with all of the other pnfs
variants. 

> but I believe Dave B. that we're not making the current implementation any worse. 

I do also but I don't think that is the question.  Not requiring fencing seems that it would make the protocol
being specified worse (i.e. less reliable), compared to one that does require fencing.  If there is a
consensus that that gap can be closed so that we believe this change would not create a hole, then that would
be fine.

I'm not naïve.  I know that despite our best efforts including formal review, we may define things that have
holes in them.  We are doing our best to avoid them, mainly because dealing with holes found after the fact is
a real drag.  But we should not close our eyes to a hole that exists in a standards track protocol or knowingly
create one.

Here's where I think we should be.  In this I'm going to use SHOULD and MUST but since these statements are
about what the layout-specific protocols should or must do rather than implementations directly, I'm
not sure what verbiage is appropriate, but I hope I'm being clear in any case:

The protocol MUST ensure that when resources are transferred to another client, they are not used by the
client originally owning them and this must be ensured against any possible combination of partitions
and delays among all of the participants to the protocol (DS, MDS, client).

Benny Halevy | 2 Sep 2007 08:44

Re: RE: Client fencing

Noveck, Dave wrote:
>> I still feel uncomfortable with letting the server reassign client 
>> resources without fencing off clients 
> 
> As I understand it, your discomfort is due to the fact that you have no
> assurance that this approach can be made to work reliably.  And that means
> not that you have individual working implementations, but that you believe
> that you can define a protocol, which when correctly implemented, will
> prevent the corruption you are worried about from ever happening.
>
> On the other hand, we know the fencing approach would work and that is in
> line with all of the other pnfs variants.
>
>> but I believe Dave B. that we're not making the current implementation
>> any worse.
>
> I do also but I don't think that is the question.  Not requiring fencing
> seems that it would make the protocol being specified worse (i.e. less
> reliable), compared to one that does require fencing.  If there is a
> consensus that that gap can be closed so that we believe this change would
> not create a hole, then that would be fine.
>
> I'm not naïve.  I know that despite our best efforts including formal
> review, we may define things that have holes in them.  We are doing our
> best to avoid them, mainly because dealing with holes found after the fact
> is a real drag.  But we should not close our eyes to a hole that exists in
> a standards track protocol or knowingly create one.
>
> Here's where I think we should be.  In this I'm going to use SHOULD and
> MUST but since these statements are about what the layout-specific
> protocols should or must do rather than implementations directly, I'm not
> sure what verbiage is appropriate, but I hope I'm being clear in any case:
>
> The protocol MUST ensure that when resources are transferred to another
> client, they are not used by the client originally owning them and this
> must be ensured against any possible combination of partitions

Black_David | 11 Sep 2007 07:11

RE: RE: Client fencing

Getting back to this after having been diverted by other things going on ... 

> Noveck, Dave wrote:
> >> I still feel uncomfortable with letting the server reassign client 
> >> resources without fencing off clients 
> > 
> > As I understand it, your discomfort is due to the fact that 
> > you have no assurance that this approach can be made to work 
> > reliably.  And that means not that you have individual 
> > working implementations, but that you believe that you can 
> > define a protocol, which when correctly implemented, will 
> > prevent the corruption you are worried about from ever happening.
> > 
> > On the other hand, we know the fencing approach would work 
> > and that is in line with all of the other pnfs variants. 
> > 
> >> but I believe Dave B. that we're not making the current 
> >>implementation any worse. 
> > 
> > I do also but I don't think that is the question.  Not 
> > requiring fencing seems that it would make the protocol being 
> > specified worse (i.e. less reliable), compared to one that 
> > does require fencing.  If there is a consensus that that gap 
> > can be closed so that we believe this change would not create 
> > a hole, then that would be fine.
> > 
> > I'm not naïve.  I know that despite our best efforts 
> > including formal review, we may define things that have holes 
> > in them.  We are doing our best to avoid them, mainly because 
> > dealing with holes found after the fact is a real drag.  But 

Robert Gordon | 27 Aug 2007 08:21

Re: RE: Client fencing


Let's re-wind back to this text...

> This text:
>
>
>> All layouts using the device IDs remain in force, but the
>> server MUST fence the client from accessing the affected
>> data servers until the client has obtained the re-mapped
>> device IDs via GETDEVICEINFO or GETDEVICELIST.
>>
>
> Has the right intent, but is too conservative.  Likely
> implementations for the block layout will make the client
> responsible for some of the fencing (block layout has to
> trust the client in situations where the other layouts
> don't have to).
>
> This text should say something like:
>
> All layouts using the device IDs remain in force, but the
> client MUST NOT access the affected data servers until the
> client has obtained the re-mapped device IDs via
> GETDEVICEINFO or GETDEVICELIST.  The client MAY be prevented
> from accessing the affected data servers until this is done,
> but this prevention (fencing) is OPTIONAL.

I'm 'recalling' my previous "Yes, i prefer that.." comment.

I'm proposing that we state this :-

Everhart, Craig | 23 Aug 2007 23:17

RE: Device stateids and fencing

Even if the client is conservative, as David alludes, about the client's
understanding of when the server thinks its lease will expire, what
happens if the client issues an I/O to a storage device just before its
lease (conservatively) expires?  That I/O request could get delayed for
quite a long time before reaching the storage device.  This seems to be
the basic weakness in client-based fencing.  I don't think you can come
up with a hard bound for that "quite a long time" value.

		Craig

_______________________________________________
nfsv4 mailing list
nfsv4 <at> ietf.org
https://www1.ietf.org/mailman/listinfo/nfsv4

Black_David | 24 Aug 2007 21:44

RE: Device stateids and fencing

Let me start with Craig's message, then I'll get to Dave's ...

> Even if the client is conservative, as David alludes, about the
> client's understanding of when the server thinks its lease will
> expire, what happens if the client issues an I/O to a storage device
> just before its lease (conservatively) expires?  That I/O request
> could get delayed for quite a long time before reaching the storage
> device.  This seems to be the basic weakness in client-based fencing.
> I don't think you can come up with a hard bound for that "quite a
> long time" value.

The server needs to add in an allowance beyond when the lease should
have expired at the client.  As for the maximum time that an I/O can
be outstanding, there are practical limits.  In the blocks world, the
usual answer is 30 seconds, as that's the typical timeout used in SCSI
multi-path drivers for determining when to redrive an I/O for which
no response has been seen.  This is supported by engineering of the
infrastructure and devices - for example, Fibre Channel guarantees
to deliver or discard a frame in at most 10 seconds - while typical
delivery times are several orders of magnitude better, that is the
absolute maximum and is usually enforced by configuration limits.
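
As a worked illustration of the arithmetic in the paragraph above: the
server waits past the point where the lease should have expired at the
client, plus an allowance, plus the longest an I/O can stay outstanding.
The function name is hypothetical, splitting the wait into these three
terms is an assumption of this sketch, and only the 30-second default comes
from the text; the other values used below are made up.

```python
def earliest_safe_reclaim(lease_expiry_at_client: float,
                          allowance: float,
                          max_io_outstanding: float = 30.0) -> float:
    """Earliest time (seconds) at which unilateral reclaim is attempted.

    lease_expiry_at_client: when the lease should have expired at the client.
    allowance: extra margin the server adds beyond that point.
    max_io_outstanding: assumed upper bound on an in-flight I/O.
    """
    return lease_expiry_at_client + allowance + max_io_outstanding
```

With a lease that should expire at t = 90 s and a 10 s allowance, the
function returns 130 s.  The result is only as good as the bound on
outstanding I/O.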

Thanks,
--David
----------------------------------------------------
David L. Black, Senior Technologist

Everhart, Craig | 24 Aug 2007 23:32

RE: Device stateids and fencing

David,

Clearly, we're talking about the error-path scenario, not the typical
one, and just how bad it can get.  It's interesting to know that about
Fibre Channel.  I wonder if there are similar guarantees from other
schemes.  For example, I once heard a story (not from my current
employer) about debugging a problem in which a storage system had
delayed a write operation by tens of times more than what you quote as a
"usual answer"; presumably an awful lot of functionality had piled into
an enormous corner case.

I suppose there are diminishing returns in pNFS blocks clients making
larger and larger allowances, and the *possible* delays are inherent in
the blocks model.  Still, it would be attractive to have an end-to-end
case, particularly because we're mostly talking about some possibly
nasty corruption imposed by a stray write.  Individual disks can suffer
those with low, but noticeable, probability; but storage systems like to
defend against those things.  In this case, there really aren't any
options for doing so.  It may be the state of the practice, for better
or worse.

		Craig

> -----Original Message-----
> From: Black_David <at> emc.com [mailto:Black_David <at> emc.com] 
> Sent: Friday, August 24, 2007 3:44 PM
> To: Everhart, Craig; nfsv4 <at> ietf.org
> Subject: RE: [nfsv4] Device stateids and fencing
> 
> Let me start with Craig's message, then I'll get to Dave's ...

Robert Gordon | 13 Aug 2007 21:23

Re: Device stateids and fencing


On Aug 11, 2007, at 11:50 AM, Black_David <at> emc.com wrote:

> This text should say something like:
>
> All layouts using the device IDs remain in force, but the
> client MUST NOT access the affected data servers until the
> client has obtained the re-mapped device IDs via
> GETDEVICEINFO or GETDEVICELIST.  The client MAY be prevented
> from accessing the affected data servers until this is done,
> but this prevention (fencing) is OPTIONAL.

Yes, i prefer that..

Robert. 


Noveck, Dave | 1 Aug 2007 15:37

RE: Device stateids

> The mappings stay in force until the devstateid is recalled 
> (via CB_RECALL) or lease expiration. 

For delegations, for example, recall is only a notification, and you
have the delegation until you return it (or it is revoked).  Is there a
way to return devstateid's?  If not, and this happens purely through
recall, when?  Presumably at the point the server receives the response
to the recall.

The state management chapter defines a general framework for freeing of
stateid's.  In addition to loss due to lease expiration, there is client
reboot, ADMIN_REVOKED which I would assume also applies here, and the
non-return of recallable state (the error now is DELEG_REVOKED but maybe
the name should be changed).

-----Original Message-----
From: Robert Gordon [mailto:rbg <at> openrbg.com] 
Sent: Wednesday, August 01, 2007 5:29 AM
To: NFSv4
Subject: Re: [nfsv4] Device stateids

Okay, well I'm willing to take a stab at tweaking the proposal.

----

The NFS client maintains a mapping of device ids contained in a layout,
to the corresponding data server address.
The state of the mapping is represented by a single recallable device
stateid (devstateid). There is at most one devstateid per { client ID,
layout type } pair.

Robert Gordon | 3 Aug 2007 02:21

Re: Device stateids


On Aug 1, 2007, at 8:37 AM, Noveck, Dave wrote:

>> The mappings stay in force until the devstateid is recalled
>> (via CB_RECALL) or lease expiration.
>
> For delegations, for example, recall is only a notification, and you
> have the delegation until you return it (or it is revoked).

We could treat the device_mapping information as a file in that
the client holds a read delegation on that file, thus a delegation
stateid. It is just that the proposal called this special delegation
stateid the "device stateid".

> Is there a way to return devstateid's?

Since we treat the devstateid as a delegation stateid, then as the
proposal said we can use DELEGRETURN.

> If not, and this happens purely through recall, when?
> Presumably at the point the server receives the response
> to the recall.
> The state management chapter defines a general framework for  
> freeing of
> stateid's.  In addition to loss due to lease expiration, there is  
> client
> reboot, ADMIN_REVOKED which I would assume also applies here, and the
> non-return of recallable state (the error now is DELEG_REVOKED but  
> maybe
> the name should be changed).

Benny Halevy | 8 Aug 2007 17:58

Re: Device stateids

Sorry for taking so long to comment on that.
Anyway, my comments are inline below.

Robert Gordon wrote:
> On Aug 1, 2007, at 8:37 AM, Noveck, Dave wrote:
> 
>>> The mappings stay in force until the devstateid is recalled
>>> (via CB_RECALL) or lease expiration.
>> For delegations, for example, recall is only a notification, and you
>> have the delegation until you return it (or it is revoked).

right

> 
> We could treat the device_mapping information as a file in that
> the client holds a read delegation on that file, thus a delegation
> stateid. It is just that the proposal called this special delegation
> stateid the "device stateid".
> 
>> Is there a way to return devstateid's?
> 
> Since we treat the devstateid as a delegation stateid, then as the
> proposal said we can use DELEGRETURN.

Although this is technically possible I worry about using filehandles
to refer to something that is not a filesystem object as it may have a
profound effect on the protocol as we might need to specify for every use
case of a filehandle what to do about these "special" filehandles, what
error to return, etc.  Also, overloading the filehandle is very weird
for the server when decoding it as, for example, in the linux server

Robert Gordon | 9 Aug 2007 08:33

Re: Device stateids


In general i think this is good.

I've started to think that the SEQUENCE status flags could just be
two bits.

SEQ4_STATUS_DEVID_DELETED_ALL
	This flag stays on until the client issues a DELEGRETURN
	using the devstateid.

SEQ_STATUS_DEVID_CHANGED
	This flag stays on until a GETDEVICELIST or GETDEVICEINFO
	(if using CB_NOTIFY) gets the affected mappings.

Robert.

On Aug 8, 2007, at 10:58 AM, Benny Halevy wrote:

> The NFS client maintains a mapping of device ids contained
> in a layout, to the corresponding storage device address.
> The state of the mapping is represented by a single
> recallable device stateid (devstateid). There is at most
> one devstateid per { client ID, layout type } pair.
>
> GETDEVICEINFO and GETDEVICELIST each take the layout type and
> an optional devstateid and return the devstateid.  In case
> the client holds a valid devstateid it MUST use it for subsequent
> GETDEVICEINFO and GETDEVICELIST that update the device ID
> mappings.  The devstateid argument is invalid whenever the
> client has no valid devstateid (e.g. after reboot or after

Benny Halevy | 9 Aug 2007 09:22

Re: Device stateids

Robert Gordon wrote:
> In general i think this is good.
> 
> I've started to think that the SEQUENCE status flags could just be
> two bits.
> 
> SEQ4_STATUS_DEVID_DELETED_ALL
> 	This flag stays on until the client issues a DELEGRETURN
> 	using the devstateid.
> 
> SEQ_STATUS_DEVID_CHANGED
> 	This flag stay on until a GETDEVICELIST or GETDEVICEINFO
> 	(if using CB_NOTIFY) gets the affected mappings.

The DEVID_DELETED cases should be covered by CHANGED as the client will
have to revalidate all the device mappings anyway to see what actually changed
and should either not see the deleted deviceid returned by GETDEVICELIST
or get an error from GETDEVICEINFO for that deviceid.

The DEVID_ADDED case is covered if the client is doing GETDEVICELIST
and the server supports it.  If the server does not support GETDEVICELIST
it must not set SEQ_STATUS_DEVID_CHANGED for added devices as the client
has no way to get the new device ID.  These will be discovered only
when the client will get new layouts referring to the new devices
(which is perfectly fine)
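
The revalidation step described above, where the client works out what
actually changed after seeing the CHANGED flag, could look roughly like
this.  It is a sketch only: the function name is hypothetical, and both
arguments are assumed to be plain dictionaries from device ID to address,
the second one built from a fresh GETDEVICELIST result.

```python
from typing import Dict, List, Tuple


def diff_device_mappings(cached: Dict[int, str],
                         fresh: Dict[int, str]
                         ) -> Tuple[List[int], List[int], List[int]]:
    """Return (added, modified, deleted) device IDs, each sorted."""
    cached_ids, fresh_ids = set(cached), set(fresh)
    added = sorted(fresh_ids - cached_ids)
    deleted = sorted(cached_ids - fresh_ids)
    # IDs present in both lists but whose address no longer matches.
    modified = sorted(d for d in cached_ids & fresh_ids
                      if cached[d] != fresh[d])
    return added, modified, deleted
```

A device ID that is missing from the fresh list lands in the deleted group,
which is how a single CHANGED indication can also cover the deleted case.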

> 
> Robert.
> 
> On Aug 8, 2007, at 10:58 AM, Benny Halevy wrote:

Benny Halevy | 29 Oct 2007 16:45

Re: Device stateids

During the most recent bakeathon a concern was raised
about the global device ID namespace, and Marc Eshel from
IBM requested that we retain the ability to recall
devices per-fsid.  To satisfy this requirement as well
as the requirement for a simple global device ID namespace,
the updated proposal below introduces a device-set identifier
(devsetid) that qualifies the device ID.

This provides for a more flexible deviceid namespace topology that
can cover either a global deviceid namespace, per-fsid namespace, or
anything in between (e.g. when several filesystems reside on (and are
confined to) a set of devices, like in EMC or Panasas' cases).

The proposal also introduces new operations to manage device mapping
state rather than unnecessarily overloading existing operations.

Plus, I simplified the proposal by not requiring any new sequence flags,
as the server should use the proposed callback operation to recall
device mappings and use the SEQ4_STATUS_RECALLABLE_STATE_REVOKED and
SEQ4_STATUS_CB_PATH_DOWN sequence flags as appropriate, the same way it
may use them for other recallable state objects (locks and layouts).

One last change is providing for a "wildcard" value (0) for device ID
that specifies all the device IDs in a devset.  This means that
the server MUST NOT return a zero device ID in GETDEVICELIST
or in any layout and it must reject it as invalid in GETDEVICEINFO.
I guess that this won't be an issue for anyone; otherwise, please
holler.
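
To make the wildcard rule concrete, here is a small sketch. It assumes device
IDs are plain integers and that a devset is just a collection of them; the
helper names are made up for illustration and do not come from the proposal.

```python
WILDCARD_DEVICE_ID = 0  # per the proposal: 0 stands for "all device IDs in a devset"


def check_getdeviceinfo_arg(device_id):
    """Server side: GETDEVICEINFO must reject the wildcard as invalid."""
    if device_id == WILDCARD_DEVICE_ID:
        raise ValueError("device ID 0 is reserved as a wildcard")
    return device_id


def recall_targets(devset, device_id):
    """Return the device IDs in devset that a recall naming device_id covers."""
    ids = set(devset)
    if device_id == WILDCARD_DEVICE_ID:
        return ids                 # wildcard: every device in the set
    return ids & {device_id}       # otherwise at most the one named device
```

Reserving zero keeps the recall argument a single field, at the cost of
taking one value out of the usable ID space.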

Benny

Noveck, Dave | 31 Oct 2007 01:24

RE: Device stateids

> Plus, I simplified the proposal by not requiring any new sequence
> flags as the server should use the proposed callback operation to
> recall device mappings and use the
> SEQ4_STATUS_RECALLABLE_STATE_REVOKED and SEQ4_STATUS_CB_PATH_DOWN
> sequence flags as appropriate the same it may use them for other
> recallable state objects (locks and layouts).

As far as I can see, you don't use them the same way, and that is an issue.

> The state of the mapping is represented by a recallable device
> stateid (devstateid). There is at most one devstateid per { client ID,
> layout type, devsetid } nexus.

> A recommended filesystem attribute - devsetid is introduced.

Notice that this arrangement means that fsid A and fsid B share a
devsetid for layout type X iff they share it for layout type Y.
It seems that layout type is the one that should have the flexibility
to figure out how it arranges its devices.

I think a better model is to divide device id into major and minor
fields and then all devices sharing a major device value become a device
set.  I think this would give you a lot more flexibility.
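
As a rough picture of the major/minor idea, the sketch below groups device
IDs into device sets by their major value. The field names and the grouping
helper are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import NamedTuple


class DeviceId(NamedTuple):
    """A device ID split into two fields (names are illustrative)."""
    major: int
    minor: int


def device_sets(ids):
    """Group device IDs so that all IDs sharing a major value form one set."""
    sets = defaultdict(set)
    for dev in ids:
        sets[dev.major].add(dev)
    return dict(sets)
```

With this shape a server is free to use one major per filesystem, one major
for everything, or anything in between, without a separate devsetid.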

> DEVICERETURN returns a device ID mapping.  When called, this client 
> MUST guarantee not to make further use of that device ID mapping.

Do you really mean that?  Isn't this like a CLOSE if which if you used 

Benny Halevy | 31 Oct 2007 18:59

Re: Device stateids

Dave, thanks for the comments!
My answers inline below.

Benny

On Oct. 31, 2007, 2:24 +0200, "Noveck, Dave" <Dave.Noveck <at> netapp.com> wrote:
>> Plus, I simplified the proposal by not requiring any new sequence
>> flags as the server should use the proposed callback operation to
>> recall device mappings and use the
>> SEQ4_STATUS_RECALLABLE_STATE_REVOKED and SEQ4_STATUS_CB_PATH_DOWN
>> sequence flags as appropriate the same it may use them for other
>> recallable state objects (locks and layouts).
> 
> As far as I can see you don't you them the same and that is an issue. 

OK. We can define SEQ4_STATUS_DEVID_DELETED* in the spirit of the
last proposal [which I see is now part of draft-15, and which seems to
introduce inconsistencies with the side effects of LAYOUTRETURN
for RETURN_{FSID,ALL}, which return the respective device mappings
but do not change the device stateid].

> 
>> The state of the mapping is represented by a recallable device
>> stateid (devstateid). There is at most one devstateid per { client ID,
>> layout type, devsetid } nexus.
> 
>> A recommended filesystem attribute - devsetid is introduced.
> 

Benny Halevy | 5 Nov 2007 11:47

Re: Device stateids (v3)

I updated and hopefully simplified the proposal based on Dave's
comments.

- device ID major/minor is used instead of devsetid.
- CB_DEVICERECALL merged into CB_DEVICENOTIFY (now used
  for deletions, changes, and additions)
- removed DEVICERETURN
- defined new SEQ4_STATUS_DEVICE_STATE_REVOKED sequence flag.

Benny
--
1. Introduction

The NFS client maintains a mapping of device IDs contained in a layout
to the corresponding storage device addresses. Device IDs consist
of { di_major, di_minor }, allowing the server to partition the device
ID namespace along its file system architecture boundaries (e.g., each
di_major can be associated with a filesystem).

The state of the mapping is represented by a recallable device stateid
(devstateid). There is at most one devstateid per { client ID, layout
type } nexus.

GETDEVICELIST and GETDEVICEINFO are used to retrieve the device ID
mapping information.  CB_DEVICENOTIFY is a callback used to notify
the client of changes to the device ID mapping.
SEQ4_STATUS_DEVICE_STATE_REVOKED is a new sequence flag used to
notify the client that its device ID mapping state has been revoked
if the server failed to notify the client of changes to the device
ID mapping via CB_DEVICENOTIFY.
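
One way to picture the client-side state this introduction describes is the
sketch below. It assumes device IDs are (di_major, di_minor) tuples and uses
the strings "add", "change" and "delete" as stand-ins for the notification
types; the class and method names are hypothetical, not taken from the
proposal.

```python
class DeviceMap:
    """Client-side device ID mapping, keyed by (di_major, di_minor)."""

    def __init__(self):
        self.addr = {}  # (di_major, di_minor) -> storage device address

    def on_notify(self, kind, dev_id, address=None):
        """Apply one CB_DEVICENOTIFY-style event to the mapping."""
        if kind == "delete":
            self.addr.pop(dev_id, None)
        elif kind in ("add", "change"):
            self.addr[dev_id] = address
        else:
            raise ValueError(f"unknown notification kind: {kind!r}")

    def on_state_revoked(self):
        """Handle a revoked-state indication from the server.

        The server could not deliver its notifications, so nothing cached
        can be trusted; drop it all and re-fetch mappings as needed.
        """
        self.addr.clear()
```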

Robert Gordon | 5 Nov 2007 19:33

Re: Device stateids (v3)


On Nov 5, 2007, at 4:47 AM, Benny Halevy wrote:

> I updated and hopefully simplified the proposal based on Dave's
> comments.
>
> - device ID major/minor is used instead of devsetid.
> - CB_DEVICERECALL merged into CB_DEVICENOTIFY (now used
>  for deletions, changes, and additions)
> - removed DEVICERETURN
> - defined new SEQ4_STATUS_DEVICE_STATE_REVOKED sequence flag.
>
> Benny

So, in this proposal are you suggesting that CB_DEVICENOTIFY
is a mandatory operation to implement at the client?

Also, I had thought that the device ID moving from 32 bits to 64
was to accommodate major/minor partitioning; now we have a 128-bit
device ID? Is that really necessary?

Robert. 

_______________________________________________
nfsv4 mailing list
nfsv4 <at> ietf.org
https://www1.ietf.org/mailman/listinfo/nfsv4

Benny Halevy | 5 Nov 2007 19:55

Re: Device stateids (v3)

On Nov. 05, 2007, 20:33 +0200, Robert Gordon <rbg <at> openrbg.com> wrote:
> On Nov 5, 2007, at 4:47 AM, Benny Halevy wrote:
> 
>> I updated and hopefully simplified the proposal based on Dave's
>> comments.
>>
>> - device ID major/minor is used instead of devsetid.
>> - CB_DEVICERECALL merged into CB_DEVICENOTIFY (now used
>>  for deletions, changes, and additions)
>> - removed DEVICERETURN
>> - defined new SEQ4_STATUS_DEVICE_STATE_REVOKED sequence flag.
>>
>> Benny
> 
> So, in this proposal are you suggesting that CB_DEVICENOTIFY
> is a mandatory operation to implement at the client ?

Yes, pretty much as CB_DEVICERECALL was in the last proposal.
The client can ignore the ADDED notifications if it wants, though.

> 
> also, i had thought that the deviceid moving from 32 bits to 64
> was to accommodate maj/minor partitioning; now we have a 128 bit
> deviceid ?? is that really necessary ?

Not exactly.  The move to 64 bits was to accommodate the existing
64-bit device IDs we have implemented in panfs.  The 64-bit major
is to match fsid.major.  This shouldn't be a big deal for files,
as you have very few device IDs and only one in each layout.
For objects I thought of changing the pnfs-obj layout to have one major

Noveck, Dave | 5 Nov 2007 22:39

RE: Device stateids (v3)

> > So, in this proposal are you suggesting that CB_DEVICENOTIFY
> > is a mandatory operation to implement at the client ?
>
> yes, pretty much as CB_DEVICERECALL was in the last proposal.
> The client can ignore the ADDED notifications though if it wants.

pNFS is not a mandatory feature so I assume that a client can
return NFS4ERR_NOTSUPP.  Are we saying you MUST NOT do that if
you have done one of the pNFS operations?

-----Original Message-----
From: Benny Halevy [mailto:bhalevy <at> panasas.com] 
Sent: Monday, November 05, 2007 1:56 PM
To: Robert Gordon
Cc: Noveck, Dave; Marc Eshel; NFSv4
Subject: Re: [nfsv4] Device stateids (v3)

On Nov. 05, 2007, 20:33 +0200, Robert Gordon <rbg <at> openrbg.com> wrote:
> On Nov 5, 2007, at 4:47 AM, Benny Halevy wrote:
> 
>> I updated and hopefully simplified the proposal based on Dave's
>> comments.
>>
>> - device ID major/minor is used instead of devsetid.
>> - CB_DEVICERECALL merged into CB_DEVICENOTIFY (now used
>>  for deletions, changes, and additions)
>> - removed DEVICERETURN
>> - defined new SEQ4_STATUS_DEVICE_STATE_REVOKED sequence flag.
>>
>> Benny

Benny Halevy | 5 Nov 2007 22:57

Re: Device stateids (v3)

On Nov. 05, 2007, 23:39 +0200, "Noveck, Dave" <Dave.Noveck <at> netapp.com> wrote:
>>> So, in this proposal are you suggesting that CB_DEVICENOTIFY
>>> is a mandatory operation to implement at the client ?
>> yes, pretty much as CB_DEVICERECALL was in the last proposal.
>> The client can ignore the ADDED notifications though if it wants.
> 
> pNFS is not a mandatory feature so I assume that a client can
> return NFS4ERR_NOTSUPP.  Are we saying you MUST NOT do that if
> you have done one of the pNFS operations?

That makes sense, and the same goes for CB_LAYOUTRECALL.

For an MDS supporting pNFS, I'd say that the following operations must be implemented:
 GETDEVICEINFO, (GETDEVICELIST can remain optional)
 LAYOUTCOMMIT,
 LAYOUTGET,
 LAYOUTRETURN

For a client that supports pNFS and has issued one of the pNFS operations:
 CB_DEVICENOTIFY,
 CB_LAYOUTRECALL (if issued LAYOUTGET)
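
The client-side rule in the list above could be expressed as a small check
like the one below. It is only a sketch of the rule as proposed in this
message; the function name and the idea of tracking issued operations as
strings are assumptions.

```python
# pNFS operations named in this thread; issuing any of them marks the
# client as a pNFS client for the purpose of this rule.
PNFS_OPS = {
    "GETDEVICEINFO", "GETDEVICELIST",
    "LAYOUTCOMMIT", "LAYOUTGET", "LAYOUTRETURN",
}


def required_callbacks(issued_ops):
    """Return the callbacks a client must implement, given the ops it issued."""
    issued = set(issued_ops)
    required = set()
    if issued & PNFS_OPS:
        required.add("CB_DEVICENOTIFY")   # any pNFS use implies device notifications
    if "LAYOUTGET" in issued:
        required.add("CB_LAYOUTRECALL")   # only clients holding layouts can be recalled
    return required
```

A client that never issues a pNFS operation gets an empty set back, which
matches the point that it may answer these callbacks with NFS4ERR_NOTSUPP.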

> 
> 
> -----Original Message-----
> From: Benny Halevy [mailto:bhalevy <at> panasas.com] 
> Sent: Monday, November 05, 2007 1:56 PM
> To: Robert Gordon
> Cc: Noveck, Dave; Marc Eshel; NFSv4
> Subject: Re: [nfsv4] Device stateids (v3)

Robert Gordon | 6 Nov 2007 17:09

Re: Device stateids (v3)

On Nov 5, 2007, at 3:57 PM, Benny Halevy wrote:

> for a MDS supporting pNFS, I'd say that the following operations  
> must be implemented:
>  GETDEVICEINFO, (GETDEVICELIST can remain optional)

I'm generating a table that lists the mandatory / optional /
mandatory-not-to-implement disposition for each operation.

I was surprised to see Benny claim that GETDEVICELIST is optional;
since no one has objected, I'll assume it IS optional.

--
Robert...
"Don't Make Assumptions"


Marc Eshel | 6 Nov 2007 18:06

Re: Device stateids (v3)

Is it not enough to say that GETDEVICELIST can return an empty list, and
any server that doesn't want to implement it just returns an empty list?
Marc.

Robert Gordon <Robert.Gordon <at> Sun.COM> wrote on 11/06/2007 08:09:12 AM:

> On Nov 5, 2007, at 3:57 PM, Benny Halevy wrote:
> 
> > for a MDS supporting pNFS, I'd say that the following operations 
> > must be implemented:
> >  GETDEVICEINFO, (GETDEVICELIST can remain optional)
> 
> 
> I'm generating a table that lists the mandatory/optional/mandatory 
> not to implement
> disposition for each operation.
> 
> I was surprised to see benny claim that GETDEVICELIST is optional; 
> Since no
> one has objected .. then I'll assume it IS optional.
> 
> --
> Robert...
> "Don't Make Assumptions"
> 
> 
> 

Spencer Shepler | 6 Nov 2007 18:52

Re: Device stateids (v3)


On Nov 6, 2007, at 11:06 AM, Marc Eshel wrote:

> It is not enough to say that GETDEVICELIST can return an empty list  
> an any
> server that don't want to implement it just return an empty list?

No.  If the server is returning an empty list, then that means
something very different from the operation not being supported.
It is misleading to the client even if the words in the specification
were to say something to the contrary.

Spencer

> Marc.
>
> Robert Gordon <Robert.Gordon <at> Sun.COM> wrote on 11/06/2007 08:09:12 AM:
>
>> On Nov 5, 2007, at 3:57 PM, Benny Halevy wrote:
>>
>>> for a MDS supporting pNFS, I'd say that the following operations
>>> must be implemented:
>>>  GETDEVICEINFO, (GETDEVICELIST can remain optional)
>>
>>
>> I'm generating a table that lists the mandatory/optional/mandatory
>> not to implement
>> disposition for each operation.
>>
>> I was surprised to see benny claim that GETDEVICELIST is optional;

Benny Halevy | 6 Nov 2007 19:10

Re: Device stateids (v3)

On Nov. 06, 2007, 19:52 +0200, Spencer Shepler <Spencer.Shepler <at> Sun.COM> wrote:
> On Nov 6, 2007, at 11:06 AM, Marc Eshel wrote:
> 
>> It is not enough to say that GETDEVICELIST can return an empty list  
>> an any
>> server that don't want to implement it just return an empty list?
> 
> No.  If the server is returning an empty list than that means
> something very different than the operation is not supported.
> It is misleading to the client even if the words in the specification
> were to say something to the contrary

Agreed.

Benny

> 
> Spencer
> 
>> Marc.
>>
>> Robert Gordon <Robert.Gordon <at> Sun.COM> wrote on 11/06/2007 08:09:12 AM:
>>
>>> On Nov 5, 2007, at 3:57 PM, Benny Halevy wrote:
>>>
>>>> for a MDS supporting pNFS, I'd say that the following operations
>>>> must be implemented:
>>>>  GETDEVICEINFO, (GETDEVICELIST can remain optional)
>>>
>>> I'm generating a table that lists the mandatory/optional/mandatory

Marc Eshel | 6 Nov 2007 18:58

Re: Device stateids (v3)

Spencer.Shepler <at> Sun.COM wrote on 11/06/2007 09:52:11 AM:

> 
> On Nov 6, 2007, at 11:06 AM, Marc Eshel wrote:
> 
> > It is not enough to say that GETDEVICELIST can return an empty list 
> > an any
> > server that don't want to implement it just return an empty list?
> 
> No.  If the server is returning an empty list than that means
> something very different than the operation is not supported.
> It is misleading to the client even if the words in the specification
> were to say something to the contrary

How does it help the client to get "not supported" vs. an empty list? What
can the client do differently based on the two different responses?
Marc.

> 
> Spencer
> 
> > Marc.
> >
> > Robert Gordon <Robert.Gordon <at> Sun.COM> wrote on 11/06/2007 08:09:12 AM:
> >
> >> On Nov 5, 2007, at 3:57 PM, Benny Halevy wrote:
> >>
> >>> for a MDS supporting pNFS, I'd say that the following operations
> >>> must be implemented:
> >>>  GETDEVICEINFO, (GETDEVICELIST can remain optional)

Spencer Shepler | 6 Nov 2007 19:06

Re: Device stateids (v3)


On Nov 6, 2007, at 11:58 AM, Marc Eshel wrote:

> Spencer.Shepler <at> Sun.COM wrote on 11/06/2007 09:52:11 AM:
>
>>
>> On Nov 6, 2007, at 11:06 AM, Marc Eshel wrote:
>>
>>> It is not enough to say that GETDEVICELIST can return an empty list
>>> an any
>>> server that don't want to implement it just return an empty list?
>>
>> No.  If the server is returning an empty list than that means
>> something very different than the operation is not supported.
>> It is misleading to the client even if the words in the specification
>> were to say something to the contrary
>
> How does it help the client to get not supported vs. empty list ?  
> what can
> the client do differently based on the 2 different responses ?

An operation-level error of ENOTSUPP is very clear, whereas an empty list
may actually be a valid response: the server may support
the GETDEVICELIST operation but just not have any device IDs available
at that point in time.
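
The difference between the two responses can be illustrated from the client's
point of view. The sketch below is hypothetical: NotSupported stands in for
the operation-level error, getdevicelist is any callable that performs the
request, and the returned strings simply describe what a client might do next.

```python
class NotSupported(Exception):
    """Stands in for an operation-level 'not supported' error."""


def plan_device_discovery(getdevicelist):
    """Decide how to discover devices based on the GETDEVICELIST outcome."""
    try:
        devices = getdevicelist()
    except NotSupported:
        # Permanent condition: asking again will never help.
        return "use GETDEVICEINFO on demand; do not retry GETDEVICELIST"
    if not devices:
        # Supported, but nothing to report at this moment.
        return "no devices right now; GETDEVICELIST may be retried later"
    return f"prefetched {len(devices)} device(s)"
```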

Spencer

>>> Robert Gordon <Robert.Gordon <at> Sun.COM> wrote on 11/06/2007  
>>> 08:09:12 AM:

Noveck, Dave | 6 Nov 2007 17:17

RE: Device stateids (v3)

pNFS is an optional feature, so all the ops associated with it have to
be optional.  Maybe for things like this that are mandatory if you
implement a particular feature but the feature is optional, the
table should give the name of the particular feature, rather than
simply saying "optional".

-----Original Message-----
From: Robert Gordon [mailto:Robert.Gordon <at> Sun.COM] 
Sent: Tuesday, November 06, 2007 11:09 AM
To: NFSv4
Subject: Re: [nfsv4] Device stateids (v3)

On Nov 5, 2007, at 3:57 PM, Benny Halevy wrote:

> for a MDS supporting pNFS, I'd say that the following operations  
> must be implemented:
>  GETDEVICEINFO, (GETDEVICELIST can remain optional)

I'm generating a table that lists the mandatory/optional/mandatory  
not to implement
disposition for each operation.

I was surprised to see benny claim that GETDEVICELIST is optional;  
Since no
one has objected .. then I'll assume it IS optional.

--
Robert...
"Don't Make Assumptions"


Spencer Shepler | 6 Nov 2007 18:50

pNFS mandatory operations (was Re: Device stateids (v3))


On Nov 6, 2007, at 10:17 AM, Noveck, Dave wrote:

> pnfs is an optional feature so all the ops associated with have to
> be optional.  Maybe for things like this that are mandatory if you
> implement a particular feature but the feature is optional, the
> table should give the name of the particular feature, rather than
> simply saying "optional".

Yes, it should.  LAYOUTGET wouldn't be very helpful unless
LAYOUTRETURN were also implemented.

Without further justification to the contrary, I believe that
GETDEVICELIST should be mandatory to implement if the pNFS feature is
supported.

Spencer

>
> -----Original Message-----
> From: Robert Gordon [mailto:Robert.Gordon <at> Sun.COM]
> Sent: Tuesday, November 06, 2007 11:09 AM
> To: NFSv4
> Subject: Re: [nfsv4] Device stateids (v3)
>
>
> On Nov 5, 2007, at 3:57 PM, Benny Halevy wrote:
>
>> for a MDS supporting pNFS, I'd say that the following operations
>> must be implemented:

Benny Halevy | 6 Nov 2007 19:36

Re: pNFS mandatory operations (was Re: Device stateids (v3))

On Nov. 06, 2007, 19:50 +0200, Spencer Shepler <Spencer.Shepler <at> Sun.COM> wrote:
> On Nov 6, 2007, at 10:17 AM, Noveck, Dave wrote:
> 
>> pnfs is an optional feature so all the ops associated with have to
>> be optional.  Maybe for things like this that are mandatory if you
>> implement a particular feature but the feature is optional, the
>> table should give the name of the particular feature, rather than
>> simply saying "optional".
> 
> Yes, it should.  LAYOUTGET wouldn't be very helpful unless
> LAYOUTRETURN were also implemented.
> 
> Without further justification to the contrary, I believe that
> GETDEVICELIST should be mandatory to implement if the pNFS feature is
> supported.

Why?
Our server has no idea of a device list at the moment, but it knows how
to get the device info for each device ID it returns in a layout.
This must be sufficient for the client.

Benny

> 
> Spencer
> 
> 
> 
>> -----Original Message-----
>> From: Robert Gordon [mailto:Robert.Gordon <at> Sun.COM]

Spencer Shepler | 6 Nov 2007 19:51

Re: pNFS mandatory operations (was Re: Device stateids (v3))


On Nov 6, 2007, at 12:36 PM, Benny Halevy wrote:

> On Nov. 06, 2007, 19:50 +0200, Spencer Shepler  
> <Spencer.Shepler <at> Sun.COM> wrote:
>> On Nov 6, 2007, at 10:17 AM, Noveck, Dave wrote:
>>
>>> pnfs is an optional feature so all the ops associated with have to
>>> be optional.  Maybe for things like this that are mandatory if you
>>> implement a particular feature but the feature is optional, the
>>> table should give the name of the particular feature, rather than
>>> simply saying "optional".
>>
>> Yes, it should.  LAYOUTGET wouldn't be very helpful unless
>> LAYOUTRETURN were also implemented.
>>
>> Without further justification to the contrary, I believe that
>> GETDEVICELIST should be mandatory to implement if the pNFS feature is
>> supported.
>
> why?
> Our server has no idea of a device list at the moment but it knows
> to get the device info for each device Id it returns in a layout.
> This must be sufficient for the client.

You have provided the justification.  Your server will refuse to
or is incapable of implementing a mechanism to support device lists.

Spencer


Robert Gordon | 6 Nov 2007 17:26

Re: Device stateids (v3)


On Nov 6, 2007, at 10:17 AM, Noveck, Dave wrote:

> pnfs is an optional feature so all the ops associated with have to
> be optional.  Maybe for things like this that are mandatory if you
> implement a particular feature but the feature is optional, the
> table should give the name of the particular feature, rather than
> simply saying "optional".

I agree; the table will reflect this.

I think that Benny was saying that if you implement the
optional feature pNFS, the GETDEVICELIST operation would
be optional to implement.

--
Robert...
"Don't Make Assumptions"


Benny Halevy | 6 Nov 2007 19:08

Re: Device stateids (v3)

On Nov. 06, 2007, 18:26 +0200, Robert Gordon <Robert.Gordon <at> Sun.COM> wrote:
> On Nov 6, 2007, at 10:17 AM, Noveck, Dave wrote:
> 
>> pnfs is an optional feature so all the ops associated with have to
>> be optional.  Maybe for things like this that are mandatory if you
>> implement a particular feature but the feature is optional, the
>> table should give the name of the particular feature, rather than
>> simply saying "optional".
> 
> I agree, the table will reflect this.
> 
> I think that benny was saying that if you implement the
> optional feature pnfs, the GETDEVICELIST operation would
> be optional to implement.

Correct.
It is an optimization.  pNFS should work flawlessly just
by using GETDEVICEINFO on-demand whenever the client needs
to resolve a device ID that it hasn't seen before or that
fell out of its cache.
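
The on-demand pattern described here amounts to a small cache in front of
GETDEVICEINFO. The sketch below assumes a least-recently-used eviction policy
and a caller-supplied getdeviceinfo callable; both are illustrative choices,
not something the thread specifies.

```python
from collections import OrderedDict


class DeviceCache:
    """Resolve device IDs on demand, keeping only the most recent entries."""

    def __init__(self, getdeviceinfo, capacity):
        self._fetch = getdeviceinfo      # callable: device ID -> address
        self._capacity = capacity
        self._cache = OrderedDict()

    def resolve(self, device_id):
        if device_id in self._cache:
            self._cache.move_to_end(device_id)   # mark as recently used
            return self._cache[device_id]
        address = self._fetch(device_id)         # cache miss: ask the server
        self._cache[device_id] = address
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)      # evict the least recently used
        return address
```

A device ID that fell out of the cache is simply fetched again the next time
a layout refers to it, so no device list is needed for correctness.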

Benny

> 
> --
> Robert...
> "Don't Make Assumptions"


Mike Eisler | 6 Nov 2007 11:41

Re: Device stateids (v3)


--- Benny Halevy <bhalevy <at> panasas.com> wrote:

> for a client that supports pNFS and issued one of the pNFS
> operations:
>  CB_DEVICENOTIFY,
>  CB_LAYOUTRECALL (if issued LAYOUTGET)

Why can't CB_NOTIFY be used instead of CB_DEVICENOTIFY?


Benny Halevy | 6 Nov 2007 13:04

Re: Device stateids (v3)

Mike Eisler wrote:
> --- Benny Halevy <bhalevy <at> panasas.com> wrote:
>
>> for a client that supports pNFS and issued one of the pNFS
>> operations:
>>  CB_DEVICENOTIFY,
>>  CB_LAYOUTRECALL (if issued LAYOUTGET)
>
> Why can't CB_NOTIFY be used in instead of CB_DEVICENOTIFY?
>
Well, it can, but I think overloading a callback meant for directory
notifications doesn't make the client's life any easier. The fact that
device mappings resemble directory delegations, in that entries can be
added, changed, or deleted from the state, is insufficient IMO since the
actual notification contents are totally different.

Having a separate callback operation for device notifications, rather
than doing both device and directory notification using the same
callback, is simpler to understand and to implement, since the
respective xdr routine and the upper layer service wouldn't have
to fork on the notification type to see if they're dealing with a

Mike Eisler | 6 Nov 2007 13:44

Re: Device stateids (v3)


--- Benny Halevy <bhalevy <at> panasas.com> wrote:

> Mike Eisler wrote:
> > --- Benny Halevy <bhalevy <at> panasas.com> wrote:
> >
> >
> >   
> >> for a client that supports pNFS and issued one of the pNFS
> >> operations:
> >>  CB_DEVICENOTIFY,
> >>  CB_LAYOUTRECALL (if issued LAYOUTGET)
> >>     
> >
> > Why can't CB_NOTIFY be used in instead of CB_DEVICENOTIFY?
> >
> >   
> Well, it can, but I think overloading a callback meant for directory 

And in draft-15, it does.

This change was made in draft-15 based on a perceived rough
consensus reached months ago. I've read reports of complaints
during last month's bake-a-thon, but those complaints were never
aired on the WG list or in an official WG meeting.

There isn't a great deal of time left to make edits and we've got
substantive changes to make. Unfortunately, adding a new op
is something that we don't always get right the first time.


Benny Halevy | 6 Nov 2007 15:44

Re: Device stateids (v3)

Mike Eisler wrote:
> --- Benny Halevy <bhalevy <at> panasas.com> wrote:
>
>> Mike Eisler wrote:
>>> --- Benny Halevy <bhalevy <at> panasas.com> wrote:
>>>
>>>> for a client that supports pNFS and issued one of the pNFS
>>>> operations:
>>>>  CB_DEVICENOTIFY,
>>>>  CB_LAYOUTRECALL (if issued LAYOUTGET)
>>>
>>> Why can't CB_NOTIFY be used in instead of CB_DEVICENOTIFY?
>>>
>> Well, it can, but I think overloading a callback meant for directory
>
> And in draft-15, it does.
>
> This change was made in draft-15 based on a perceived rough
> consensus reach months ago. I've read reports of complaints
> during last months bake-a-thon, but those complaints were never
> aired on the WG list or in an official WG meeting.

Robert Gordon | 31 Oct 2007 22:44

Re: Device stateids


On Oct 31, 2007, at 12:59 PM, Benny Halevy wrote:

>>> The state of the mapping is represented by a recallable device
>>> stateid (devstateid). There is at most one devstateid per  
>>> { client ID,
>>
>>> layout type, devsetid } nexus.
>>
>>> A recommended filesystem attribute - devsetid is introduced.
>>
>> Notice that this arrangement means that fsid A and fsid B share a
>> devsetid or layout type X iff they share it for layout type Y.
>> It seems that layout type is the one that should have the flexibility
>> to figure out how it arranges its devices.
>>
>> I think a better model is to divide device id into major and minor
>> fields and then all devices sharing a major device value become a  
>> device
>> set.  I think this would give you a lot more flexibility.
>
> Yeah, I see your point. Good idea.
>

An alternative way of looking at this is to say a device stateid
represents a collection of device IDs. The server can then have the
flexibility to arrange the grouping how it likes. One server
implementation may wish to group things by MDS FSID while another
implementation may wish to group things by some other criteria or
just have one stateid that represents

Noveck, Dave | 29 Oct 2007 20:57

RE: Device stateids


I've been hearing a lot of talk about getting together a spec by 11/19
(final final deadline for next ietf) that would serve as the basis for a
group last-call, although I've been saying "how about 'group
penultimate-call'".  As part of that, I'm trying to get to closure on
the op to error mapping and I didn't expect to see a proposal at this
point that would add 3 new ops, and a whole bunch of errors to two
existing ops.  If this is really needed, then we'll do what we have to
do, but I'm not clear on the justification for something so different
from what had been previously agreed to. 

Is this stuff really a requirement, or is this a case of preferring to
have bells and whistles, that we can do without?

-----Original Message-----
From: Benny Halevy [mailto:bhalevy <at> panasas.com] 
Sent: Monday, October 29, 2007 11:45 AM
To: Robert Gordon; Marc Eshel
Cc: NFSv4
Subject: Re: [nfsv4] Device stateids

During the most recent bakeathon concern has been raised
about the global device ID namespace and Marc Eshel from
IBM requested that we retain the ability to recall
devices per-fsid.  To satisfy this requirement as well
as the requirement for a simple global device ID namespace
the updated proposal below introduces a device-set identifier
(devsetid) that qualifies the device ID.

This provides for a more flexible deviceid namespace topology that

Marc Eshel | 29 Oct 2007 22:28

RE: Device stateids

Yes, we all want to get to a final draft, but that cannot be an excuse to
selectively choose changes. The last set of changes to GETDEVICELIST and
LAYOUTGET were major changes that somehow got in very late, and at the
last bakeathon I did not find one person who was happy with those
changes, so I am not sure how the decisions are made. I explicitly
requested that the ability to recall devices per-fsid be maintained,
but somehow it was dropped without general agreement. So I think we
should consider this proposal or undo some changes that took away some
functionality that we had in previous drafts. In general, the approach on
the Linux server side is to allow the different implementations as much
flexibility as possible, and this proposal helps us to do that.
Marc.

"Noveck, Dave" <Dave.Noveck <at> netapp.com> wrote on 10/29/2007 12:57:44 PM:

> 
> I've been hearing a lot talk about getting together a spec by 11/19
> (final final deadline for next ietf) that would serve as the basis for a
> group last-call, although I've been saying "how about 'group
> penultimate-call'".  As part of that, I'm trying to get to closure on
> the op to error mapping and I didn't expect to see a proposal at this
> point that would add 3 new ops, and a whole bunch of errors to two
> existing ops.  If this is really needed, then we'll do what we have to
> do, but I'm not clear on the justification for something so different
> from what had been previously agreed to. 
> 
> Is this stuff really a requirement, or is this a case of preferring to
> have bells and whistles, that we can do without?
> 
> -----Original Message-----

dean hildebrand | 29 Oct 2007 21:47

Re: Device stateids

On 10/29/07, Noveck, Dave <Dave.Noveck <at> netapp.com> wrote:
>
> I've been hearing a lot talk about getting together a spec by 11/19
> (final final deadline for next ietf) that would serve as the basis for a
> group last-call, although I've been saying "how about 'group
> penultimate-call'".  As part of that, I'm trying to get to closure on
> the op to error mapping and I didn't expect to see a proposal at this
> point that would add 3 new ops, and a whole bunch of errors to two
> existing ops.  If this is really needed, then we'll do what we have to
> do, but I'm not clear on the justification for something so different
> from what had been previously agreed to.

I'm probably behind the times, but what is the previous agreement?
Leave draft 14 devices as is?  The changes suggested in the other post
included below Benny's?

Dean

>
> Is this stuff really a requirement, or is this a case of preferring to
> have bells and whistles, that we can do without?
>
> -----Original Message-----
> From: Benny Halevy [mailto:bhalevy <at> panasas.com]
> Sent: Monday, October 29, 2007 11:45 AM
> To: Robert Gordon; Marc Eshel
> Cc: NFSv4
> Subject: Re: [nfsv4] Device stateids
>
>

