Distributed File Systems
------------------------

- properties to consider:
  - replication/availability (data vs. metadata);
  - consistency model;
  - fault tolerance;
  - location transparency (network transparency);
  - OS dependency;
  - transaction support;
  - scalability;
  - security;
  - concurrency control;

- LOCUS [Walker83]
  ----------------
- single tree-structured naming hierarchy;
- covers all objects in the filesystems of all machines;
- names are fully transparent;
- replication at the granularity of the entire directory;
- each logical filesystem (filegroup) may be stored (incompletely) in several
  physical containers;
- copies of a file are assigned the same file descriptor (global id);
- each filegroup has one or more storage sites (SSs) but only one current
  synchronization site (CSS); all open requests go through the CSS;
- scalability is an issue, but CSSs can be maintained on a per-filegroup basis;
- each file copy has a version;
- different synchronization policies are implementable by the CSS; it also
  stores the latest version vector;
- LOCUS uses shadow pages to provide atomicity;
- committed data lives on disk, while new info stays in memory;
- the CSS denies changes to two copies of a file simultaneously;
- an SS changing a file informs the other SSs and the CSS;
- support for transactions;

- NFS [Sandberg85]
  ----------------
- idempotent operations to avoid the need for crash recovery;
- machine and OS independence;
- network transparency; also implements the VFS interface;
- NFS uses a stateless protocol to facilitate crash recovery;
- the parameters to each call contain all of the information necessary to
  complete the call (e.g., for reads, the access position is sent in the
  request);
- no replication;
- no transaction facility; in fact, servers keep no locks between requests
  (which makes sense, as they are stateless); two clients writing to the same
  remote file may get intermixed data;
- changes made by one client may not show up in another client for up to
  30 seconds;
- scalability: does not scale well, as there is no replication;

- V System [Gray89]
  -----------------
- use of leases: a lease is a contract that gives its holder specified rights
  over property for a limited period of time. In the context of caches, a
  lease grants control over writes to the covered datum during the term of
  the lease;
- V System measurements show that short-term leases provide near-optimal
  efficiency;
  * when a client writes a datum, the server must defer the request until
    each leaseholder has granted approval or the term of its lease has
    expired (see the sketch after this section);
- fault tolerance is inversely proportional to the lease period;
- replication: it seems that there is only a centralized server, with other
  machines caching the file;
- scalability: very little, as no replication is used and the work relies on
  reliable, synchronous broadcast communications; it also needs loosely
  synchronized clocks;
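
To make the lease mechanism concrete, here is a minimal Python sketch; it is
my own illustration, not code from [Gray89], and all names (Lease, Client,
Server, relinquish) are made up.

    # A minimal sketch of lease-based cache consistency in the style of the
    # V system [Gray89]; my own illustration, all names are made up.
    import time

    LEASE_TERM = 10.0   # seconds; the paper reports short terms are near-optimal

    class Lease:
        def __init__(self, holder):
            self.holder = holder
            self.expires = time.time() + LEASE_TERM

        def valid(self):
            return time.time() < self.expires

    class Client:
        def __init__(self, name):
            self.name = name
            self.cache = {}            # datum -> locally cached value

        def relinquish(self, datum):
            # Server callback: drop the cached copy and approve the pending write.
            self.cache.pop(datum, None)
            return True

    class Server:
        def __init__(self):
            self.store = {}            # datum -> value
            self.leases = {}           # datum -> outstanding leases

        def read(self, client, datum):
            # Grant a lease so the client may cache the datum for LEASE_TERM seconds.
            self.leases.setdefault(datum, []).append(Lease(client))
            client.cache[datum] = self.store.get(datum)
            return client.cache[datum]

        def write(self, writer, datum, value):
            # Defer the write until each leaseholder has granted approval or
            # the term of its lease has expired.
            for lease in self.leases.get(datum, []):
                if lease.holder is writer or not lease.valid():
                    continue
                if not lease.holder.relinquish(datum):
                    time.sleep(max(0.0, lease.expires - time.time()))
            self.store[datum] = value
            self.leases[datum] = []    # readers must acquire fresh leases afterwards

For example, after server.read(c1, 'x'), a server.write(c2, 'x', 1) completes
only once c1 has relinquished its copy or its lease term has run out; longer
terms make a write wait longer for an unreachable leaseholder, matching the
note above that fault tolerance is inversely proportional to the lease period.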

- Harp [Liskov91]
  ---------------
- replication: yes; there are primary servers, backups, and witnesses;
- better performance than NFS, even with more communication, as a log is kept
  in memory and a UPS is used to ride out failures;
- Harp is not a file system. Rather, it's a replication extension written
  using the VFS interface. It strives to maintain Unix and NFS file system
  semantics;
- fault tolerance: the paper defines several mechanisms to survive crashes and
  network partitions; backups become primaries, witnesses become backups;
- consistency: modifications are relayed to and acknowledged by the backups
  before the primary returns success;
- concurrency control: the primary handles concurrency;
- scalability: the primary receives all operations and has to communicate with
  the backups, which, in my view, limits the scalability of the system;
- nothing about transaction support;

- Zebra [Hartman93]
  -----------------
- one-line summary: a distributed file system performing RAID-like striping in
  software on a per-client basis using LFS techniques;
- clients stripe file data across servers so that different pieces of data are
  stored on different servers; parity information is also stored in each
  stripe;
- instead of striping individual files, each client forms its new data for all
  files into a sequential log that it stripes across the storage servers; this
  allows even small files to benefit from striping;
- clients' logs do NOT contain attributes, directories, or other metadata;
  those are all kept at the file manager;
- stripes are immutable once they are complete;
- fault tolerance: because of the per-stripe parity, Zebra can continue
  operation while a server is down;
- consistency: the file manager keeps all the metadata, while the data is
  stored at the storage servers; the client requests block pointers from the
  file manager, then reads the data from the storage servers; cache
  consistency is maintained by the file manager;
- scaling: the performance of the file manager is a concern because it is a
  centralized resource;

- XFS [Anderson95]
  ----------------
- XFS is a serverless network file system. It decentralizes all the tasks
  normally assigned to a file server across a cluster of workstations --
  actually, the clients themselves;
- building blocks: RAID, LFS, Zebra;
- scalability: more scalable than Zebra, as the main contribution is to
  decentralize the file manager;

- AFS []
  ------

- Coda [Kistler91]
  ----------------
- replication: yes; caching is done for whole files (at clients);
- availability: server replication, plus client caching that allows
  disconnected operation;
- consistency: Coda takes an optimistic approach, which allows all updates and
  hopes no conflicts will ensue; when conflicts do arise, they are resolved
  upon reconnection;
- disconnected operation: emulate the server, log all file activity, be robust
  in the face of possible client crashes, accept some performance hit, and
  possibly do cleaning if the local disk fills up with emulation logs (see the
  sketch after this section);
- reconnection (`cache writeback'): replay the logs, merging conflicts
  manually if necessary;
- scalability: scales well, as files are replicated (consistently) at the
  servers. Conflicts shouldn't increase too much, as most files are not
  shared;
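
To make the disconnected-operation idea concrete, here is a minimal Python
sketch; it is my own illustration, not Coda's implementation, and the
version-number conflict check and all class names are assumptions.

    # A minimal sketch of optimistic disconnected operation in the spirit of
    # Coda [Kistler91]; my own illustration, all names and the version-number
    # conflict check are made up.

    class Server:
        def __init__(self):
            self.files = {}                    # path -> (version, data)

        def fetch(self, path):
            return self.files.get(path, (0, None))

        def store(self, path, data, base_version):
            version, _ = self.files.get(path, (0, None))
            if version != base_version:
                return False                   # concurrent update: conflict
            self.files[path] = (version + 1, data)
            return True

    class Client:
        def __init__(self, server):
            self.server = server
            self.cache = {}                    # path -> (version, data)
            self.log = []                      # replay log kept while disconnected

        def hoard(self, path):
            # While connected: cache the whole file (and its version).
            self.cache[path] = self.server.fetch(path)

        def write(self, path, data):
            # While disconnected: emulate the server and log the activity.
            # Repeated writes to the same path coalesce into one log record.
            base, _ = self.cache.get(path, (0, None))
            self.cache[path] = (base, data)
            self.log = [e for e in self.log if e[0] != path]
            self.log.append((path, data, base))

        def reintegrate(self):
            # Upon reconnection (`cache writeback'): replay the log; updates
            # whose base version no longer matches the server's are left for
            # manual conflict resolution.
            conflicts = []
            for path, data, base_version in self.log:
                if not self.server.store(path, data, base_version):
                    conflicts.append(path)
            self.log.clear()
            return conflicts

Here a conflict is just a stale base version; Coda's actual detection during
reintegration is more involved, but the flow -- hoard while connected, emulate
and log while disconnected, replay upon reconnection -- is the one described
above.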