The Sun Network Filesystem: Design, Implementation and Experiences
--- Sandberg, 1985.

Overview
--------

* NFS uses RPC and XDR to provide a system-independent protocol for
* accessing a remote filesystem. It uses a stateless, idempotent
* protocol to obviate the need for crash recovery. NFS is
* implemented in the kernel, and is transparent to existing
* applications; programs need do nothing different to access remote
* files.

- protocol stack: NFS / XDR / RPC / UDP / IP;

- XDR (eXternal Data Representation) is used to describe protocols in
  a machine- and system-independent way;

- NFS is implemented below the VFS/vnode interface;

- VFS defines the operations available on filesystems, while the
  vnode defines the operations on files, independently of the
  underlying filesystem;

- main design goal of NFS: provide a way of making remote files
  available without having to modify or relink existing programs;

- overall design goals of NFS:
  - machine and OS independence;
  - crash recovery;
  - transparent access (no special pathname parsing, libraries, or
    recompiling);
  - UNIX semantics maintained on UNIX clients;
  - reasonable performance;

- three major pieces: the protocol, the server side, and the client
  side.

NFS Protocol:
------------

- uses the Sun RPC mechanism;

- RPC helps simplify the definition, organization, and implementation
  of remote services;

- RPC calls are synchronous: the client blocks while the server
  processes the request;

- NFS uses a stateless protocol, to facilitate crash recovery;

- the parameters to each call contain all of the information
  necessary to complete the call;

- Sun's RPC is designed to be transport independent;

- since NFS is stateless, UDP losses are not a big problem (a lost
  request can simply be retried);

- an NFS file handle is provided by the server and used by the client
  to reference a file. The handle is opaque, i.e., the client never
  looks at its contents (a sketch of a possible handle layout appears
  after the client-side notes below);

- the paper shows a subset of the NFS procedures, including: lookup,
  create, remove, read, write, and getattr. It is interesting to note
  that read and write calls pass the file offset as a parameter; this
  is needed to make the operations idempotent, i.e., they may be
  repeated without unexpected results;

- the first remote file handle (for the root of a filesystem) is
  obtained by the client using a separate mount protocol;

- the reason for making mount a separate protocol is that it makes it
  easier to plug in different access control methods (such as using
  Kerberos for authentication);

- the mount protocol is the only place where pathnames are passed to
  the server.

Server side:
-----------

* as the protocol is stateless, the server must commit any modified
* data to stable storage before returning results; this may cause
* data blocks, indirect blocks, and i-node blocks to be modified and
* written back to disk;

- the file handle is composed of a filesystem ID, an i-node number,
  and a generation number;

* the i-node generation number is necessary because the system may
* decide to reuse a given i-node after a file is deleted. So, to
* differentiate between different files that happen to have the same
* i-node number, the generation number is incremented every time a
* given i-node number is freed;

Client side:
-----------

- provides an interface to NFS which is transparent to applications;

- instead of using a fixed hierarchy to identify remote filesystems,
  NFS does the binding at mount time, via the mount command; the
  disadvantage is that filesystems are not accessible before they
  are mounted.
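As a concrete illustration of the handle pieces described above,
here is a rough C sketch of what a server might pack into a file
handle. The struct and field names are invented (the paper gives no
layout); only the three logical fields come from the notes above,
and the 32-byte size is the fixed handle size of the v2 protocol.

    #include <stdint.h>

    #define NFS_FHSIZE 32              /* v2 handles: 32 opaque bytes */

    /* Hypothetical server-side view of a file handle.  The client
     * never interprets these fields; it just echoes the bytes back
     * to the server on every call. */
    struct svc_fh {
        uint32_t fsid;                 /* which exported filesystem */
        uint32_t inum;                 /* i-node number within it */
        uint32_t gen;                  /* generation number: bumped when
                                        * the i-node number is reused, so
                                        * a handle to a deleted file is
                                        * detected as stale */
        uint8_t  pad[NFS_FHSIZE - 12]; /* unused; opaque to the client */
    };

Because a handle plus an explicit offset fully identifies a read or
write, a request retried across a server crash and reboot means
exactly the same thing as the original.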
Filesystem interface:
--------------------

- NFS keeps the VFS interface; there is one VFS structure per mounted
  filesystem in the kernel, and one vnode structure for each open
  file;

- a root operation is provided in the VFS to return the root vnode of
  a mounted filesystem;

- pathname traversal is done in the kernel by breaking the path into
  components and doing a lookup call through the vnode for each
  component;

* the main reason for this is that any component of the path
* could be a mount point for another filesystem, and the
* mount information is kept above the vnode implementation
* level;

- a cache of directory vnodes may alleviate the cost of doing one
  remote lookup per pathname component;

* once RPC and the vnode kernel were in place, the implementation of
* NFS was simply a matter of writing the XDR routines for the NFS
* protocol, implementing the RPC server for the NFS procedures, and
* implementing a filesystem interface which translates vnode
* operations into NFS remote procedure calls;

- a hard-mounted filesystem will retry NFS requests forever if the
  server goes down; a soft-mounted one gives up after a while and
  returns an error.

Security:
--------

- in order to use UNIX authentication (uid, gid, etc.), the mapping
  from uid and gid to user must be the same on the server and the
  client. To achieve this, they used the Yellow Pages (YP) service to
  keep the password files consistent;

! also, it is not clear that a root user on one machine should be
! treated as root on another;

* NFS does not support remote file locking. Instead, they provide a
* separate, RPC-based file locking facility;

- since the server keeps no locks between requests, two clients
  writing to the same remote file may get intermixed data;

- unlike UNIX (which checks access permissions only when the file is
  opened), NFS checks them on every call;

- timestamps saved in different parts of the system may cause
  problems if the clock skew between client and server is large;

- read-ahead and write-behind caches are implemented in both clients
  and servers, to improve performance;

- a performance problem is that writes (about 5% of operations) are
  synchronous;

* at this point, the paper becomes a fight against AT&T's Remote
* Filesystem (RFS): they basically talk about all the advantages of
* NFS over RFS.

Security focus:
--------------

- returning to the root question raised above: the only implemented
  security is the mapping of root to nobody. All this does is prevent
  root on a client machine from accessing files that only root on the
  server can access; if any other uid can access a file on the
  server, root on the client can simply switch to that user and
  access the file.

  More recently, small improvements have been made. For example, most
  versions of mountd have an option to accept mount requests only
  from port numbers reserved for root (though it is often left
  unused, since it rejects valid mounts from some older systems, such
  as Ultrix). This prevents ordinary users on a machine that is
  allowed to mount a filesystem from discovering the root file handle
  of that filesystem.

  Also, very recently (I've only seen the Linux nfsd do this), nfsd
  will reject packets from clients that are not listed in the exports
  file; before this, it was assumed that if a client presented a
  valid file handle, the handle must have been handed out by mountd,
  so serving the request was considered safe.
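As a concrete (and purely illustrative) follow-up to the exports
discussion above, a Linux-style /etc/exports entry combining these
controls might look like the following; the path and host name are
made up:

    # /etc/exports -- illustrative entry only
    #   rw          allow reads and writes
    #   secure      require requests from a reserved port (< 1024),
    #               i.e., from a root-owned process on the client
    #   root_squash map uid 0 on the client to the anonymous user
    /export/home   trusted.example.com(rw,secure,root_squash)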