IETF 56 1/2 NFS V4

Welcome, Note Well, and Agenda bashing (5 minutes) (beepy)
[See agenda and intro section of beepy slides]

Ceci n'est pas un papier bleu.

Started a little late because of parking situation. Continental breakfast and lunch. Bathrooms well located. Sign up sheets passed around.

Review current NFSv4 Charter and Milestones (10 minutes) (beepy)

We are still missing the approved wording on the RDMA/RDDP problem and requirements work in the published charter (though the milestones are there). beepy to follow up yet again. Milestones were in okay shape.

"RPC Numbering Authority Transfer to IANA"
http://www.ietf.org/internet-drafts/draft-ietf-nfsv4-rpc-iana-00.txt
Rob Thurlow (10 minutes)
[See IANA section in Thurlow slides]

Rob Thurlow started.

Sun transferring program # and authentication flavor # authority to IANA. Re-spin of RFC 1831. Tom Talpey asks about private registrations. Thurlow said IANA will pick up only number reservation, associated with company name - not responsible (as Sun was) for the .x file protocol specification.

"RPC: Remote Procedure Call Protocol Specification Version 2"
http://www.ietf.org/internet-drafts/draft-ietf-nfsv4-rfc1831bis-00.txt
Rob Thurlow (5 minutes)
[See RPC section in Thurlow slides]

Mike Eisler points out RPCSEC_GSS added error codes to RPC which should be folded into RFC 1831bis.

Minor versioning process
Rob Thurlow (15 minutes)
[See minor versioning section in Thurlow slides]

Rob Thurlow observes that there was some push back on re-charter and proposal to move forward with train model - concern of meeting milestones, gaining closure. There was some back and forth regarding other ways WGs deal with change (go dormant for a while then reestablish). Rob summarized what minor versioning allows.

We don't have consensus on minor versioning - but we have a proposal. Thurlow has a proposal for dates attached to Connectathon - but dates are not binding. How to move this forward? Brent Callaghan asked why the train model? Eisler and Dave Noveck say we need multiple minor versions moving forward. beepy observed that the "train model" vs. "functional release model" is philosophical for the first minor version - and was interested if the initial proposed dates were even sane. Brent remains skeptical. But Thurlow suggests that if features don't clear the hurdle they are dropped rather than dilating the schedule. Back and forth about how this will all work. beepy closed down discussion to take to alias.

"XDR: External Data Representation Standard"
http://www.ietf.org/internet-drafts/draft-ietf-nfsv4-rfc1832bis-01.txt
Mike Eisler (2 minutes)
[See XDR section in Mike Eisler slides]

Second time Eisler has talked about XDR draft. IANA considerations and ISOC copyright added, some useful comments in WG last call. Two week last call starting this Monday June 9.

"NFSv4.1: SECINFO Changes"
http://www.ietf.org/internet-drafts/draft-ietf-nfsv4-secinfo-00.txt
Mike Eisler (5 minutes)
[See SECINFO section in Mike Eisler slides]

Second time we talked about SECINFO. Request for enhancement of current SECINFO. Proposed for V4.1. beepy started debate about V4.1 mandatory feature addition - by accident. SECINFO: no recent changes, should be part of 4.1.

"CCM: The Credential Cache GSS Mechanism"
http://www.ietf.org/internet-drafts/draft-ietf-nfsv4-ccm-01.txt
Mike Eisler (15 minutes)
[See CCM section in Mike Eisler slides]

Mike Eisler summarized the motivation for a new security mechanism. See slides. Major motivation is putting something in place to leverage emerging off the shelf IPsec and TLS HW accelerators. The approach here is to use Kerberos for authentication of the session and IPsec for the ongoing steady state. Mike stated channel bindings are a way to prevent man in the middle attacks.

Mike Eisler has NetApp IPsec people reviewing draft. Comments came back on small noncompliance with GSSAPI, fix identified. Eisler wants to coordinate this with SECINFO minor version.

[Rob Thurlow's notes] CCM: reviewed draft. Q from Carl: performance issues proven? Yes, prior work proved integrity alone was a large scaling problem. Mike will point to or re-post IPsec underneath better than RPCSEC_GSS privacy because whole payload is protected. Changes from -00: have good work on channel bindings (prevents MITM attacks because signed info provable). Three flavors, -KEY, -ADDR, -NULL. Should be referred to for 4.1, but remain a separate draft because of other applicability. Need more work on channel bindings; need to talk with implementors of IPsec about this; fix Martin Rex's issue.

Break (15 minutes)

11:15 AM EDT Start

Useless poll: 12 people running Windows, 4 people running Linux, 3 others (Macs?). We ignored Talpey's poll of the people without computers.

"Server-to-Server Replication/Migration Protocol Design Principles"
http://www.ietf.org/internet-drafts/draft-ietf-nfsv4-repl-mig-design-00.txt
Rob Thurlow (2 minutes)
[See Replication/Migration section in Thurlow slides]

The presentation was non-controversial, restating what was on the alias already. Specification not ready for advancement. beepy has not delivered the requirements document.

"A Server-to-Server Replication/Migration Protocol"
http://www.ietf.org/internet-drafts/draft-ietf-nfsv4-repl-mig-proto-01.txt
Rob Thurlow (15 minutes)
[See Replication/Migration section in Thurlow slides]

December discussion was fruitful on draft. Now a 3 procedure RPC protocol. Nico Williams from Sun commented on Windows security issue with SNEGO. beepy asked why not allow client to recover as if reboot instead of state transfers. Eisler said Mike Kazar once commented that it was a mistake not to propagate lock state in DCE/DFS - it opens a window in which the client is unable to recover in the "grace period".

Peter Honeyman and CITI are working on some ideas for replication/migration features. Looking at prototypes and issues to move to what protocol. In response to beepy's question about interest and continuing work. Noveck will have NetApp solution on the back-end for testing the front end protocol support. Thurlow needs to think about commitment for implementation to respond to NFSERR_MOVED.

[From Rob Thurlow notes] Repl-mig work: need to find out if anyone will actually use this stuff. Qs: sync or async? A: protocol doesn't care (for now). Dave Noveck commits to providing a server that does something meaningful for repl-mig so clients can test.

[From Rob Thurlow notes] Suggestions from Travis Broughton: provide an escape of some kind to transfer bulk filesystem data without structure, e.g. let SNDR data flow through and then pick up the rest of the V4 state. Also: transfer filesystem share data so that it can propagate as well.

"NFS version 4 MIB for Server Implementations"
http://www.ietf.org/internet-drafts/draft-shepler-nfsv4-mib-00.txt
Spencer Shepler (10 minutes)
[Spencer Shepler slides]

Shepler has updated the old draft to the latest and covered its current content. Travis (from Intel) said per file system error rates would be useful for diagnostics. Eisler suggests per user ID and per client ID granularity.

[From Rob Thurlow] Spencer Shepler - MIB – needs more work, can make this good if we can figure out what requirements are. Should we expose stuff like stateids/clientids etc.?

Carl Burnett asked about CIM vs. SNMP support. Should this not be done here? beepy points out that SNMP MIB definition could act as the spec for what to monitor (manage). Shepler observes there is no client specification. Probably should be done.

What Brent Callaghan would prefer to see from a MIB is not just a long list of counters - but meaning and analysis suggestions of the counters. Brent wonders if we have enough information to do a good job here yet.

"NFS version 4 Directory Notifications"
http://www.ietf.org/internet-drafts/draft-shepler-nfsv4-dirnotify-00.txt
Dave Noveck, Andy Adamson (20 minutes)
[directory slides]

Adamson started mapping out discussion of directory delegations. Discussed notification vs. delegation. Notification is proactive from server to client, eliminating polling behaviour.

[From Rob Thurlow's notes] Dave Noveck/Andy Adamson directory notification – is this enough? Carl – negative entry caching was a big win, and requires a stronger guarantee than this. Negotiate what behaviour to expect? Unclear what attributes to cover; current draft keeps it simple. Renaming an issue. Clientids: I made the change, don't tell me about it – works with session stuff, also.

Carl Burnett found negative entry caching with strong consistency was more useful than positive caching for reliable miss detection in software build environments. Solves ENOENT problem. Back and forth discussion. Peter Honeyman had some questions.

Noveck and Adamson are looking for a simpler mechanism that gets us there fast. Brent said Java app startup has hundreds of opens with 50% ENOENT.
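
The interaction between negative entry caching and directory notifications can be sketched as follows. This is a minimal illustration, not the mechanism in the draft: the DirCache class, its notify_changed method, and the dictionary standing in for the server are all hypothetical names chosen for the example. It shows a repeated ENOENT lookup being answered locally, and a notification dropping the stale negative entry so the client does not miss a newly created file.

```python
class DirCache:
    """Client-side cache of lookups in one directory (illustrative only)."""

    def __init__(self, server_lookup):
        self._lookup = server_lookup   # callable: name -> handle, or None for ENOENT
        self._entries = {}             # name -> handle (positive) or None (negative)
        self.round_trips = 0           # how many lookups actually went to the server

    def lookup(self, name):
        if name in self._entries:      # cache hit, positive or negative
            return self._entries[name]
        self.round_trips += 1
        handle = self._lookup(name)
        self._entries[name] = handle   # remember the miss as well as the hit
        return handle

    def notify_changed(self, name):
        """Hypothetical server notification: an entry was added, removed, or renamed."""
        self._entries.pop(name, None)


server_dir = {"Main.class": "fh-1"}    # stand-in for the server's directory
cache = DirCache(server_dir.get)

first = cache.lookup("Util.class")     # miss goes to the server
second = cache.lookup("Util.class")    # answered from the negative entry
trips_before_create = cache.round_trips

server_dir["Util.class"] = "fh-2"      # another client creates the file
cache.notify_changed("Util.class")     # notification drops the negative entry
third = cache.lookup("Util.class")     # goes back to the server and finds it
```

Without the notification, the negative entry would keep returning ENOENT after the create, which is why Carl's point about needing a stronger guarantee matters for this kind of cache.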

Shepler optimized not notifying a client making the change - Noveck said it is a good fit for session stuff in Tom's RDDP draft.

1:00 PM EDT Start

NFSv4/RDDP requirements discussion
[See RDDP section of beepy slides]

beepy opened with some background and ground rules. The recent personal drafts are useful from IETF viewpoint only as input to the problem statement and requirements drafts we are chartered to consider (input into the RDDP working group). Can talk about these drafts, but need to meet some charter goals and re-charter to make official progress. Will send e-mail to get the req/prob docs put together. IETF not interested in things like Infiniband.

Review of RDDP motivation
Tom Talpey (10 minutes)
[See motivation section of Talpey slides]

Tom Talpey - reviewed benefits and history of RDMA; helps client the most. Point is to eliminate data copies via direct data placement. "If it hurts at 1 GbE, it is deadly at 10 GbE." Talked about how drafts relate and cover the problem. Have a generic ONC RPC usage doc, a generic NFS doc, and a V4 extensions doc with stuff beyond just RDMA (sessions).

RDMA helps the client the most - frees CPU up for use by application. Talpey described benefits and provided a layered architecture diagram.

"RDMA Transport for ONC RPC"
http://www.ietf.org/internet-drafts/draft-callaghan-rpcrdma-00.txt
Thomas Talpey, Brent Callaghan (20 minutes)
[Brent Callaghan slides]

Brent discussed draft and mapping RPC over RDMA. Said it is client performance of NFS vs. FibreChannel performance in the data center. Lots of good background as to how RDMA protocols would interact with NFS to lead into the open discussion for requirements. Did cover interaction of RPC and RDMA ops, and credits for buffer reservations.

Talpey talked about chunked and non-chunked reading/writing, which covered issues arising in full exploitation of an underlying RDMA transport.

"NFS Direct Data Placement"
http://www.ietf.org/internet-drafts/draft-callaghan-nfsdirect-00.txt
Thomas Talpey, Brent Callaghan (5 minutes)
[See RDDP section of Talpey slides]

Talpey explained the reasoning behind and content of the three drafts, which indicate interesting ways to divide up the "RDMA" problem when considering a protocol (or protocols) response. Tom further compared RDMA SEND, WRITE and READ usage in different cases.

"NFSv4 RDMA and Session Extensions"
http://www.ietf.org/internet-drafts/draft-talpey-nfsv4-rdma-sess-00.txt
Thomas Talpey, Spencer Shepler (20 minutes)
[See session extensions section of Talpey slides]

Optimize RDMA use and semantics via V4 minor revision. Again, these notions were fodder to drive requirements discussion.

Discussed channels, relationship to SCTP streams, match to the CCM proposal. Rebinding the back channel and delegation callback stuff.

Exactly-once semantics fall out of the RDMA design due to the bounded response cache size. This is applicable to non-RDMA transports (that is, Talpey observed that session extensions were primarily a transport proposal that happened to address RDMA as well). Credits exist in the RPC layer (response to Carl Burnett).
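
A minimal sketch of why a bounded response cache yields exactly-once behaviour follows. It is an illustration under assumptions, not the design in the draft: the SessionReplyCache class, the per-stream sequence number, and the rule that sequence numbers start at 1 are all hypothetical. The only ideas taken from the discussion are that storage is bounded by session bookkeeping and that a stream id selects a cache entry directly instead of an XID lookup.

```python
class SessionReplyCache:
    """Bounded duplicate-request cache: one cached reply per stream (illustrative)."""

    def __init__(self, num_streams):
        # Storage is fixed at session setup: num_streams replies, never more.
        self._streams = [{"seq": 0, "reply": None} for _ in range(num_streams)]
        self.executions = 0                     # how many operations really ran

    def handle(self, stream_id, seq, operation):
        entry = self._streams[stream_id]        # direct index, no XID hash lookup
        if seq == entry["seq"]:
            return entry["reply"]               # retransmission: replay, do not re-execute
        if seq != entry["seq"] + 1:
            raise ValueError("out-of-order request on stream %d" % stream_id)
        self.executions += 1
        reply = operation()
        entry["seq"], entry["reply"] = seq, reply   # overwrite the previous reply
        return reply


counter = {"value": 0}

def increment():                                # a non-idempotent operation
    counter["value"] += 1
    return counter["value"]

cache = SessionReplyCache(num_streams=2)
r1 = cache.handle(0, 1, increment)              # executes
r1_retry = cache.handle(0, 1, increment)        # retransmit, replayed from the cache
r2 = cache.handle(0, 2, increment)              # next request, executes
```

Because each stream holds only its latest reply, the server can size the cache when the session is created. In this sketch the client must not reuse a stream until it has seen the reply to the previous request on it.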

Discussion ensued regarding changes to the protocol (simplification) if given exactly-once semantics. Noveck again brought up optional behaviour requirement for minor versions. OPEN_CONFIRM and a couple of other ops are probably made obsolete by this. Noted issue.

Carl Burnett asked about "non-idempotent" vs. flow control and stream ids (an addition for the SESSION semantics). On Streamid - Talpey responded to questions regarding additional resources required to track - Talpey said more efficient than XID hashed lookup. Talpey was talking to Carl Burnett about the layering where channel rebinding may occur under a long lasting SESSION. See need for chaining concept to overcome COMPOUND op total length limitation. Talpey summarized the improved efficiency derived from SESSION.

Thurlow (not as co-chair) asked that non-RDMA stuff be factored out to improve V4.1 semantics.

Eisler observes that these do not represent requirements to be given to the RDDP working group.

Wittle touched on EOS and ordering as transport level requirements - to feed to RDDP. Thurlow says we need to de-assume and list expectations.

[Rob Thurlow's notes - decided to keep in toto - these are parallel to above] Tom Talpey - V4 RDMA & session extensions document. Split out sessions as V4.1 fodder now and de-emphasize RDMA language, and find a new home for some of the RDMA text. Gets around minor rev rules by adding a new op to put in every compound. Callbacks get better - no firewall issues. Synergy with CCM high. Channels would likely be streams within SCTP connection (Tom adds "is only one possibility, might also be multiple connections, if trunking is desired. The "likely" stems from operations and callback channels sharing a connection, for demux efficiency at the client."). Q: can we use same GSS context on the callback channel? A: likely, but we had better look at it. Can probably do something lighter than initial authentication. Exactly-once: currently can't maintain dupreq cache correctly because of unbounded storage requirements. Session resource bookkeeping exactly bounds the requirements, so we should be able to get this right. RDMA prompted the solution, but we get a nice benefit from it. Streamid points to a particular cache, in effect; gets rid of XID lookup; sessionid/channelid carried per-op via OPERATION_CONTROL. Q: isn't this a big change? So dramatic we should have done this first rather than 4.0? A: sure, if we'd thought about it in time. Q: doesn't this obsolete SETCLIENTID, SETCLIENTID_CONFIRM, OPEN_CONFIRM? A: Yup, should make that clear in the draft. Comment: re-factor into V4.1 session stuff as a WG draft with 90% less RDMA language, plus a RDMA personal draft. Capture: needs reqs for RDDP, need to tease them out of the current stuff, even if obvious and there already. Need to examine work on generic RPC/XDR and NFS specifics to see what comes out. Comment: need to separate out the dupreq cache and the other state. Q: multiple connections for separation of small ops vs. reads and writes? A: Sure, supported now. Q: separate idempotent and non-idempotent ops? A: Good idea, worth noting. Q: Is there more stuff to transfer in repl-mig due to the session extensions? A: yes.

Open requirements discussion (30 minutes)

beepy opened with some observations. "Gallia est omnis divisa in partes tres." (There were three individual drafts submitted.) The working group owns RPC and NFS etc. The proposals open possibility of "transparent" RDMA for things like NFS Version 3. Is that a requirement? How do people feel? Talpey spoke of minor version to gain things like SESSION semantics. And we are to provide our problem statement and requirements document to the RDDP working group.

Requirements discussion for 15 minutes.

There was a question from Eric Kustarz on a possible denial of service attack on the channel binding (?), where the server can notice binding attacks. Carl Burnett asked about "server revocation" capabilities - server can drop connections.

beepy repeatedly tried to up-rev the conversation to a problem and requirements discussion - and the group kept noodling deep in the lower layers. Sigh.

Noveck said something about caching and non-caching on multiple connections - to be put in draft.

Wittle observes that migration/replication proposal may require additional state.

beepy asked for people who want to sub-group on problem and requirements documents. (Brent Callaghan, Tom Talpey, Chet Juszczak and Mark Wittle volunteered to make first attack to prepare drafts by Vienna IETF meeting I-D cutoff).

BREAK (15 minutes)

2:45PM EDT Start

Owner/Group mappings in multiple security contexts
Nico Williams (30 minutes)
[Nico Williams slides]

NFS V4 defines an ACE with domain attribution - but Nico Williams observes that there seems to be no explicit requirement for multiple domain support in ACLs - though it certainly is highly desirable for multi-domain organizations. Proposal for a dynamic mapping of ACE principals.

Touched on use of LDAP and security principals and mappings.

Eisler and Nico Williams discussing NIS and UID mapping.

[Rob Thurlow's notes - again in toto] Mergers creating complex ID spaces are a pain, and V4's domain-qualified IDs are good, but need some other help. Need a way to support small flat ID spaces such as Posix UIDs and GIDs. Windows SIDs are structured as well, with a domain part. Problem comes about when we support both CIFS with SIDs and NFSv2/3 (flat) with v4. Proposes mapping on the fly via a new network service. When you see an ACE name, get a mapping and cache it; first time, you could get a dynamic new value. Reuse of IDs and ACE names is an issue - discusses restrictions. Renaming OK with aliases, otherwise don't. Service is at least per-domain, with mapping domain probably matching LDAP or DNS domain. Also talks about GSS-API principal mapping. Swiping idea from MS AD, could put list of ACE names in Kerberos V5 auth info you get with TGT, nice benefit. Wrote RPC protocol, not sure it can be implemented with LDAP - multiple names to one UID/GID permit canonicalization, but LDAP indices can't quite match, have to use aliases to get close. Used SECINFO rather than SNEGO due to a vulnerability in Microsoft's SNEGO (ick), since we can assume that the presence of this bug meant customers with MS involvement would dumb down their configs (sigh). Issue: with lookup by ID, MS AD needs to be configured to permit looking up persons by SID, not the default, implies the ability to enumerate all users in a domain. Next steps: LDAP schema for ACE name canonicalization, add pseudocode, add refs, fix typos. Q: how does this relate to NIS? A: with NFSv4 and cross-domain stuff, you need this on the client as well as on the server, and don't care anymore about that info in NIS. Q: Really? Don't we still need to sync? A: Only for pre-mapping service IDs? Mapping service should probably respect local DBs, but need not, says Nico. Nico will proceed, will also pitch to Kerb folks.
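
The "get a mapping and cache it; first time, you could get a dynamic new value" idea, together with aliases for canonicalization, can be sketched as below. This is an in-process stand-in for what the notes describe as a network service, so nothing here reflects the proposed RPC protocol: the IdMapper class, its method names, the starting UID of 100000, and the example principals are all hypothetical.

```python
class IdMapper:
    """Maps ACE names such as 'alice@east.example' to flat local UIDs (illustrative)."""

    def __init__(self, first_dynamic_uid=100000):
        self._uid_by_name = {}          # canonical ACE name -> UID
        self._name_by_uid = {}          # UID -> canonical ACE name
        self._aliases = {}              # alternate ACE name -> canonical ACE name
        self._next_uid = first_dynamic_uid

    def add_alias(self, alias, canonical):
        """Record that two names denote one principal (e.g. after a rename or merger)."""
        self._aliases[alias] = canonical

    def canonical(self, ace_name):
        return self._aliases.get(ace_name, ace_name)

    def uid_for(self, ace_name):
        name = self.canonical(ace_name)
        if name not in self._uid_by_name:
            # First sighting: hand out a fresh UID. UIDs are never reused here.
            uid = self._next_uid
            self._next_uid += 1
            self._uid_by_name[name] = uid
            self._name_by_uid[uid] = name
        return self._uid_by_name[name]

    def name_for(self, uid):
        return self._name_by_uid.get(uid)   # None if the UID was never assigned


mapper = IdMapper()
alice_uid = mapper.uid_for("alice@east.example")
mapper.add_alias("asmith@west.example", "alice@east.example")
alias_uid = mapper.uid_for("asmith@west.example")   # same principal, same UID
bob_uid = mapper.uid_for("bob@east.example")
```

Several names resolving to one UID is what makes the reverse lookup return a single canonical name, which is the property the notes say is awkward to express with LDAP indices.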

"A Namespace For NFS Version 4"
http://www.ietf.org/internet-drafts/draft-thurlow-nfsv4-namespace-00.txt
Rob Thurlow (15 minutes)

Name space - need to have replicated way to find designated root, then replicated way to find interior portions of the name space. Mechanism vs. conventions - enabling global name space and the way it appears.

Automounter is a weak mechanism leading to inconsistent (therefore non-global) usage.

Thurlow went over requirements for a useful name space.

Travis Broughton laid out requirement for local delegation of portions of the name space. And dynamic views being immediately visible in a distributed environment. We kept ratholing on Automounter. Simplified administration is another requirement.

But LDAP keeps coming up as a solution for driving the global name space.

The name space should allow for finding "more local" copies that provide more efficient access.

The name space move to the server side becomes more transactional says Carl Burnett. He agrees that closer server policies are good. He says it is no longer a mount/remount scenario - when data moves it is transparent - do not take data offline. Key objection to any "mount" based mechanism.

Carl Burnett and Travis Broughton held up the data space driven model of AFS for uniform name space construction.

Thurlow says two parts of problem. Find global root, find referrals within the space. Brent brought up the scalability of the root name space servers vs. the Active Directory approach by Microsoft for availability of name space in a large enterprise.

Thurlow is leery of usefulness of fs_locations usage. Honeyman and Noveck and Eisler said we can use fs_locations. Eisler worries that if we don't exploit 4.0 for name space, we may be looking at long delay for name space construction.

beepy asked the parties to chat about the issues off line and come back to working group (on fs_locations).

What about NFS Version 3, asks Brent - "Upgrade to V4." Brent Callaghan says unacceptable. Thurlow said you can back patch to V3 with parallel automounter maps.

"Mapping Between NFSv4 and Posix Draft ACLs"
http://www.ietf.org/internet-drafts/draft-eriksen-nfsv4-acl-01.txt
Marius Eriksen (15 minutes)
[Eriksen slides]

Marius Eriksen is talking about mapping Posix ACLs to V4 and being able to map back again with no lost semantics? This is not a solution to a general mapping of V4 ACLs to Posix ACLs.

Marius Eriksen described the Posix view of the world. And how it differed from the NFS V4 ACL evaluation model.

Comment on grammar vs. semantics and architecture for the mapping?

Eisler and Noveck raised issue of an arbitrary V4 or CIFS ACL via the Posix mapping to get an approximation. Having to do with Marius rejecting an unmappable ACL? Need a new error for this says Noveck.

Optimistic mode - Eisler doesn't want an error returned.

This is a personal draft at this moment - can we make it a working group document? Yes.

[Rob's notes] Differences in interpretation algorithm mean we have to emulate Posix ACEs with more than one NFSv4 ACE. Favorite quote: "This is not a good example because it's wrong." Q: what about translation in the other direction? One-to-many works for expression of value, but not for setting the ACL to a new value. Q: how do we handle issues like Posix letting owner change a read-only file? Do we need to return an approximation? Estimate high or low? Q: How do we handle ACLs we just can't map? Fudge, or return EINVAL? Q to audience: WG work item? A: sure.
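
One way to see why a single Posix ACL entry can need more than one NFSv4 ACE is the simplified model below. It is a sketch under strong assumptions and is not the algorithm in the Eriksen draft: it handles only named users and an EVERYONE@ entry, ignores owner and group classes and the Posix mask, and all function names are hypothetical. In the Posix model the first matching entry decides alone; in the NFSv4 model ALLOW bits accumulate across ACEs, so each Posix entry is emulated here by an ALLOW ACE followed by a DENY ACE for the bits it does not grant.

```python
ALL = frozenset("rwx")

def posix_check(acl, who, wanted):
    """Posix-draft style: the first entry matching `who` decides alone."""
    for principal, perms in acl:
        if principal in (who, "EVERYONE@"):
            return set(wanted) <= set(perms)
    return False

def nfs4_check(aces, who, wanted):
    """NFSv4 style: walk ACEs in order, accumulating ALLOW bits until satisfied or denied."""
    remaining = set(wanted)
    for kind, principal, perms in aces:
        if principal not in (who, "EVERYONE@"):
            continue
        if kind == "DENY" and remaining & set(perms):
            return False                 # a still-needed bit is explicitly denied
        if kind == "ALLOW":
            remaining -= set(perms)
        if not remaining:
            return True
    return False

def posix_to_nfs4(acl):
    """Emit ALLOW then DENY per Posix entry so later ACEs cannot add bits back."""
    aces = []
    for principal, perms in acl:
        aces.append(("ALLOW", principal, perms))
        denied = "".join(sorted(ALL - set(perms)))
        if denied:
            aces.append(("DENY", principal, denied))
    return aces


posix_acl = [("alice", "r"), ("EVERYONE@", "rw")]
naive = [("ALLOW", who, perms) for who, perms in posix_acl]   # one ACE per entry
mapped = posix_to_nfs4(posix_acl)

posix_says = posix_check(posix_acl, "alice", "w")     # alice's own entry has no w
naive_says = nfs4_check(naive, "alice", "w")          # EVERYONE@ leaks w to alice
mapped_says = nfs4_check(mapped, "alice", "w")        # DENY ACE blocks the leak
```

The one-ACE-per-entry translation grants alice write access through the EVERYONE@ ACE, while the ALLOW plus DENY pair reproduces the Posix answer.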

NFSv4 (RFC3530) interpretation discussions/clarifications
Spencer Shepler (30 minutes)

NFSv4 implementation and test plans for moving to Draft Standard (5 minutes)

A question arose as to how and why move RFC 3530 to Draft Standard.

beepy says the IETF "ships" protocol specifications - not implementations. Implementations are only used to demonstrate the state of a specification.

[Rob Thurlow's notes of beepy's exposition] Proposed standard needs no implementations, draft needs two independent, complete implementations that interoperate. Each feature (even optional) must have two implementations as well, or they will either be dropped from draft or will delay the draft. Also: all normative references must be at the level you wish to go to. Have to submit reports that prove it. Q: do we need two server and two client implementations for each feature? A: yes. Q: could we move some cruft to Informational to simplify things? A: Yes, e.g. C-bindings for GSS-API, which isn't a protocol. Q: Do products have to ship with features? A: No, they just have to have been created and tested.

END