
15-440 Distributed Systems, Lecture 7: Distributed File Systems

Outline
Why Distributed File Systems?
Basic mechanisms for building DFSs, using NFS and AFS as examples
Design choices and their implications: caching, consistency, naming, authentication and access control

andrew.cmu.edu
Let's start with a familiar example: Andrew
10,000s of people

10,000s of machines, terabytes of disk
Goal: have a consistent namespace for files across computers. Allow any authorized user to access their files from any computer.

Why DFSs are Useful

Data sharing among multiple users
User mobility
Location transparency
Backups and centralized management

What Distributed File Systems Provide
Access to data stored at servers using file system interfaces. What are the file system interfaces?

Open a file, check status of a file, close a file
Read data from a file
Write data to a file
Lock a file or part of a file
List files in a directory, create/delete a directory
Delete a file, rename a file, add a symlink to a file
etc.
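To make the target concrete, here is a minimal sketch of that interface using plain POSIX calls on a local file; a DFS must preserve exactly this behavior for remote files:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

int main(void) {
    /* The same calls must work whether the file is local or remote. */
    int fd = open("/tmp/example.txt", O_CREAT | O_RDWR, 0644);
    if (fd < 0) { perror("open"); return 1; }

    write(fd, "hello\n", 6);               /* write data to the file    */

    struct stat st;
    fstat(fd, &st);                        /* check status (attributes) */
    printf("size = %lld\n", (long long)st.st_size);

    lseek(fd, 0, SEEK_SET);                /* file position is per-open */
    char buf[16];
    ssize_t n = read(fd, buf, sizeof buf); /* read the data back        */
    printf("read %zd bytes\n", n);

    close(fd);
    return 0;
}
```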

Challenges
Remember our initial list of challenges...
Heterogeneity (lots of different computers & users)
Scale (10s of thousands of peeps!)
Security (my files! hands off!)
Failures
Concurrency
Oh no... we've got 'em all. How can we build this??
Just as important: non-challenges
Geographic distance and high latency: Andrew and AFS target the campus network, not the wide-area

Prioritized goals / Assumptions
Often very useful to have an explicit list of prioritized goals. Distributed filesystems almost always involve trade-offs.
Scale, scale, scale
User-centric workloads... how do users use files (vs. big programs)?
Most files are personally owned
Not too much concurrent access; a user is usually at only one or a few machines at a time
Sequential access is common; reads are much more common than writes
There is locality of reference (if you've edited a file recently, you're likely to edit it again)

Outline

Why Distributed File Systems?
Basic mechanisms for building DFSs, using NFS and AFS as examples
Design choices and their implications: caching, consistency, naming, authentication and access control

Components in a DFS Implementation
Client side: what has to happen to enable applications to access a remote file the same way a local file is accessed? Accessing remote files the same way as local files requires kernel support.
Communication layer: just TCP/IP, or a protocol at a higher level of abstraction?
Server side: how are requests from clients serviced?

VFS Interception

VFS provides pluggable file systems.
Standard flow of remote access (see the sketch below):
User process calls read()
Kernel dispatches to VOP_READ() in some VFS
nfs_read(): check the local cache; if needed, send an RPC to the remote NFS server and put the process to sleep
Server interaction is handled by a kernel process: retransmit if necessary, convert the RPC response to a file system buffer, store it in the local cache, wake up the user process
nfs_read() copies the bytes to user memory
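The same flow, pulled into user space as a sketch; the one-slot cache and the rpc_read() stub are hypothetical, since the real path lives inside the kernel's VFS layer:

```c
/* Sketch of the nfs_read() path described above.  cache_entry and
 * rpc_read() are invented for illustration, not real NFS client code. */
#include <stdio.h>
#include <string.h>

typedef struct { char data[8192]; size_t len; int valid; } cache_entry;
static cache_entry cache;   /* one-slot client-side cache */

/* Stub standing in for the RPC round trip to the NFS server
 * (send request, sleep, retransmit if necessary, get reply). */
static size_t rpc_read(char *buf, size_t n) {
    const char *reply = "remote file contents\n";
    size_t len = strlen(reply) < n ? strlen(reply) : n;
    memcpy(buf, reply, len);
    return len;
}

static size_t nfs_read(size_t offset, char *dst, size_t n) {
    if (!cache.valid) {                      /* check local cache    */
        cache.len = rpc_read(cache.data, sizeof cache.data);
        cache.valid = 1;                     /* store in local cache */
    }
    if (offset >= cache.len) return 0;
    if (n > cache.len - offset) n = cache.len - offset;
    memcpy(dst, cache.data + offset, n);     /* copy to user memory  */
    return n;
}

int main(void) {
    char buf[64];
    size_t n = nfs_read(0, buf, sizeof buf - 1);
    buf[n] = '\0';
    printf("%s", buf);   /* a second call would hit the cache */
    return 0;
}
```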

VFS Interception

A Simple Approach
Use RPC to forward every filesystem operation to the server; the server serializes all accesses, performs them, and sends back the result.
Great: same behavior as if both programs were running on the same local filesystem!
Bad: performance can stink. Latency of access to the remote server is often much higher than to local memory.
For the Andrew context: bad bad bad: the server would get hammered!

Lesson 1: needing to hit the server for every detail impairs performance and scalability.
Question 1: how can we avoid going to the server for everything? What can we avoid this for? What do we lose in the process?

NFS V2 Design
"Dumb", stateless servers with smart clients
Portable across different OSes
Low implementation cost
Small number of clients

Single administrative domain

Some NFS V2 RPC Calls
NFS RPCs use XDR over, e.g., TCP/IP:

  Proc.    Input args                     Results
  LOOKUP   dirfh, name                    status, fhandle, fattr
  READ     fhandle, offset, count         status, fattr, data
  CREATE   dirfh, name, fattr             status, fhandle, fattr
  WRITE    fhandle, offset, count, data   status, fattr

fhandle: 32-byte opaque data (64-byte in v3)
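Rendered as C structs, the READ call looks roughly like the sketch below. The field names and the fattr subset are illustrative paraphrases of the table above, not the literal XDR definitions from the NFS specification:

```c
/* Approximate shape of the NFS v2 READ request and reply, paraphrased
 * from the table above; real implementations use XDR-generated types. */
typedef struct { unsigned char data[32]; } fhandle;  /* opaque 32 bytes */

struct fattr { unsigned size, mtime; };  /* small subset of real fattr */

struct read_args {
    fhandle  file;    /* which file: no pathname, no open-file state */
    unsigned offset;  /* where to read: the client tracks position   */
    unsigned count;   /* how many bytes                              */
};

struct read_reply {
    int           status;
    struct fattr  attrs;       /* attributes piggybacked on the reply */
    unsigned      len;
    unsigned char data[8192];
};
```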

Server Side Example: mountd and nfsd
mountd: provides the initial file handle for the exported directory. The client issues an nfs_mount request to mountd; mountd checks whether the pathname is a directory and whether the directory should be exported to the client.
nfsd: answers the RPC calls, gets replies from the local file system, and sends replies via RPC. Usually listens on port 2049.
Both mountd and nfsd use the underlying RPC implementation.

NFS V2 Operations
V2: NULL, GETATTR, SETATTR

LOOKUP, READLINK, READ
CREATE, WRITE, REMOVE, RENAME
LINK, SYMLINK
READDIR, MKDIR, RMDIR
STATFS (get file system attributes)

NFS V3 and V4 Operations
V3 added: READDIRPLUS, COMMIT (server cache!), FSSTAT, FSINFO, PATHCONF
V4 added:

COMPOUND (bundle operations)
LOCK (server becomes more stateful!)
PUTROOTFH, PUTPUBFH (no separate MOUNT)
Better security and authentication
Very different from V2/V3: stateful

Operator Batching
Should each client/server interaction accomplish one file system operation or multiple operations? Advantage of batched operations?

How to define batched operations?
Examples of batched operators: NFS v3: READDIRPLUS; NFS v4: COMPOUND RPC calls

Remote Procedure Calls in NFS
[Figure: (a) reading data from a file in NFS version 3; (b) reading data using a compound procedure in version 4.]
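A back-of-the-envelope sketch of why batching matters, in C; the 1 ms round-trip figure and the three-operation sequence are assumed for illustration:

```c
/* Why COMPOUND helps: with a 1 ms round trip, three dependent
 * operations (e.g., LOOKUP, GETATTR, READ) cost three round trips
 * when sent one per RPC, but only one when bundled. */
#include <stdio.h>

#define RTT_MS 1.0

/* Stands in for one network round trip, however many ops it carries. */
static double rpc_round_trip(int n_ops) {
    (void)n_ops;
    return RTT_MS;
}

int main(void) {
    /* v2/v3 style: one operation per client/server interaction. */
    double separate = rpc_round_trip(1) + rpc_round_trip(1)
                    + rpc_round_trip(1);
    /* v4 style: one COMPOUND carrying all three operations.     */
    double batched = rpc_round_trip(3);

    printf("separate: %.1f ms, compound: %.1f ms\n", separate, batched);
    return 0;
}
```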

AFS Goals
Global distributed file system: one AFS, like one Internet. Why would you want more than one?
LARGE numbers of clients and servers: 1000 machines could cache a single file; most local, some (very) remote
Goal: O(0) work per client operation; O(1) may just be too expensive!

AFS Assumptions

Client machines are untrusted: they must prove they act for a specific user (secure RPC layer); anonymous access is system:anyuser
Client machines have disks(!!): they can cache whole files over long periods
Write/write and write/read sharing are rare: most files are updated by one user, on one machine

AFS Cell/Volume Architecture
Cells correspond to administrative groups; /afs/andrew.cmu.edu is a cell

Cells are broken into volumes (miniature file systems): one user's files, a project source tree, ... Typically stored on one server. The unit of disk quota administration and backup.
The client machine has a cell-server database: the protection server handles authentication; the volume location server maps volumes to servers.

Outline
Why Distributed File Systems?
Basic mechanisms for building DFSs

Using NFS and AFS as examples
Design choices and their implications: caching, consistency, naming, authentication and access control

Topic 1: Client-Side Caching
Huge parts of systems rely on two solutions to

every problem:
1. "All problems in computer science can be solved by adding another level of indirection. But that will usually create another problem." -- David Wheeler
2. Cache it!

Client-Side Caching
So, uh, what do we cache?
Read-only file data and directory data: easy
Data written by the client machine: when is data written to the server? What happens if the client machine goes down?
Data that is written by other machines: how do we know that the data has changed? How do we ensure data consistency?
Is there any pre-fetching?

And if we cache... doesn't that risk making things inconsistent?

Failures
Server crashes: data in memory but not on disk is lost. So... what if the client does seek(); /* SERVER CRASH */; read()? If the server maintains the file position, this will fail. Ditto for open(); /* SERVER CRASH */; read().
Lost messages: what if we lose the acknowledgement for delete("foo"), and in the meantime another client created foo anew?

Client crashes: might lose data in the client cache

Use of caching to reduce network load
[Figure: one client reads f1 via RPC, the server returns V1, and the client caches F1:V1; subsequent read(f1) calls are served from the cache. A second client issues write(f1), caching F1:V2 and writing through to the server (Write RPC, OK/ACK), so the server now holds F1:V2; its own read(f1) returns V2. The first client's cached V1 is now stale.]

Client Caching in NFS v2
Cache both clean and dirty file data and file attributes
File attributes in the client cache expire after 60 seconds (file data doesn't expire)
File data is checked against the modified-time in the file attributes (which could themselves be a cached copy), so changes made on one machine can take up to 60 seconds to be reflected on another machine
Dirty data are buffered on the client machine until file close or for up to 30 seconds; if the machine crashes before then, the changes are lost
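A sketch of that validation logic in C, assuming a hypothetical getattr_rpc() helper; only the 60-second attribute timeout comes from the description above:

```c
/* Sketch of NFS v2-style client cache validation; getattr_rpc() is a
 * stand-in for a GETATTR RPC, not real NFS client code. */
#include <stdbool.h>
#include <time.h>

#define ATTR_TIMEOUT 60   /* seconds */

struct cached_file {
    time_t attr_fetched;  /* when attributes were last fetched */
    time_t mtime;         /* server modified-time we last saw  */
};

/* Stub standing in for a GETATTR RPC that returns the server mtime. */
static time_t getattr_rpc(void) { return 0; }

/* Returns true if the cached data may be used without re-reading. */
static bool cache_is_fresh(struct cached_file *f) {
    if (time(NULL) - f->attr_fetched < ATTR_TIMEOUT)
        return true;                 /* within 60 s: trust blindly  */
    time_t server_mtime = getattr_rpc();   /* attributes expired    */
    f->attr_fetched = time(NULL);
    if (server_mtime == f->mtime)
        return true;                 /* unchanged on the server     */
    f->mtime = server_mtime;
    return false;                    /* changed: re-read the data   */
}

int main(void) { struct cached_file f = {0}; return !cache_is_fresh(&f); }
```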

Implication of NFS v2 Client Caching
Advantage: no network traffic if open/read/write/close can be done locally.
But: the data consistency guarantee is very poor. Simply unacceptable for some distributed applications; productivity apps tend to tolerate such loose consistency.
Generally, clients do not cache data on local disks.

NFS's Failure Handling: Stateless Server
Files are state, but...
Server exports files without creating extra state

No list of who has this file open (so a permission check happens on each operation on an open file!)
No pending transactions across a crash
Crash recovery is fast: reboot, and let clients figure out what happened
State stashed elsewhere: separate MOUNT protocol; separate NLM locking protocol
Stateless design: a stateless protocol means requests specify exact state. read() becomes read([position]): no seek on the server (see the sketch below).
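A minimal sketch of the stateless style using POSIX pread(), which names the offset explicitly the way an NFS READ RPC does (the file path is just an example):

```c
/* Stateful vs. stateless read.  With per-client server state, the
 * server remembers a file position that a crash destroys:
 *     seek(fd, 4096);       server stores position = 4096
 *     read(fd, buf, 100);   fails if the server rebooted
 * The NFS way: every request carries the exact state it needs. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fd = open("/etc/hostname", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char buf[100];
    /* pread() names the offset explicitly, like an NFS READ RPC
     * (fhandle, offset, count): no server-side position, so a
     * rebooted server can service it with no recovery protocol. */
    ssize_t n = pread(fd, buf, sizeof buf, 0);
    printf("read %zd bytes at explicit offset 0\n", n);

    close(fd);
    return 0;
}
```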

NFS's Failure Handling
Operations are idempotent. How can we ensure this? Unique IDs on files/directories: it's not delete("foo"), it's delete(1337f00f), where that ID won't be reused.
Write-through caching: when a file is closed, all modified blocks are sent to the server; close() does not return until the bytes are safely stored.
Close failures? Retry until things get through to the server, or return failure to the client. Most client apps can't handle failure of the close() call, so the usual option is to hang for a long time trying to contact the server.
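A toy sketch of why unique, never-reused IDs make retries safe; the ID table and the always-succeed return convention are invented for illustration:

```c
/* Delete-by-unique-ID is safe to retry; delete-by-name is not.  If a
 * client retransmits delete(0x1337f00f) after a lost ACK, the second
 * attempt finds nothing and is harmless; a retransmitted
 * delete("foo") could destroy a file another client just created. */
#include <stdio.h>

#define MAX_FILES 4
static unsigned long ids[MAX_FILES] = { 0x1337f00f, 0xdeadbeef, 0, 0 };

/* Deleting an already-deleted ID is a no-op: the retry is harmless. */
static int delete_by_id(unsigned long id) {
    for (int i = 0; i < MAX_FILES; i++)
        if (ids[i] == id) { ids[i] = 0; return 0; }
    return 0;   /* already gone */
}

int main(void) {
    delete_by_id(0x1337f00f);   /* original request                */
    delete_by_id(0x1337f00f);   /* retransmission after a lost ACK */
    puts("state is identical after one call or two");
    return 0;
}
```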

NFS Results
NFS provides transparent, remote file access. Simple, portable, really popular.
(It's gotten a little more complex over time, but...)
Weak consistency semantics
Requires hefty server resources to scale (write-through; server queried for lots of operations)

Let's look back at Andrew
NFS gets us partway there, but: it probably doesn't handle the scale (* you can buy huge NFS appliances today that will, but they're $$$-y), and it is very sensitive to network latency.
How can we improve this?
More aggressive caching (AFS caches on disk in addition to just in memory)
Prefetching (on open, AFS gets the entire file from the server, making later ops local & fast)

Remember: with traditional hard drives, large sequential reads are much faster than small random writes, so it's easier to support (client A: read whole file; client B: read whole file) than having them alternate. This improves scalability, particularly if the client is going to read the whole file eventually anyway.

Client Caching in AFS: Callbacks!
Clients register with the server that they have a copy of a file; the server tells them "Invalidate!" if the file changes. This trades state for improved consistency.
What if the server crashes? It loses all callback state! Reconstruct callback information from the clients: go ask everyone "who has which files cached?"
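A sketch of the server-side callback bookkeeping; the structures and names are hypothetical, not actual AFS code:

```c
/* AFS-style callback state on the server: for each file, remember
 * which clients cache it, and break callbacks when it is stored. */
#include <stdio.h>

#define MAX_CLIENTS 8

struct file_callbacks {
    const char *file;
    int         has_copy[MAX_CLIENTS];   /* which clients cache it */
};

static void send_break_callback(int client, const char *file) {
    printf("BreakCallBack -> client %d: invalidate %s\n", client, file);
}

/* A client fetches the file: register a callback promise for it. */
static void fetch(struct file_callbacks *fc, int client) {
    fc->has_copy[client] = 1;
}

/* A client stores (writes back) the file: revoke everyone else's copy. */
static void store(struct file_callbacks *fc, int writer) {
    for (int c = 0; c < MAX_CLIENTS; c++)
        if (c != writer && fc->has_copy[c]) {
            send_break_callback(c, fc->file);
            fc->has_copy[c] = 0;
        }
}

int main(void) {
    struct file_callbacks fc = { "/afs/andrew.cmu.edu/usr/alice/notes", {0} };
    fetch(&fc, 1);
    fetch(&fc, 2);
    store(&fc, 2);   /* client 2 closes a modified copy */
    return 0;
}
```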

AFS v2 RPC Procedures
Procedures that are not in NFS:
Fetch: return the status and optionally the data of a file or directory, and place a callback on it
RemoveCallBack: specify a file that the client has flushed from the local machine
BreakCallBack: from server to client, revoke the callback on a file or directory. What should the client do if a callback is revoked?
Store: store the status and optionally the data of a file
The rest are similar to NFS calls.

Topic 2: File Access Consistency

In a UNIX local file system, concurrent file reads and writes have sequential-consistency semantics: each file read/write from a user-level app is an atomic operation (the kernel locks the file vnode), and each file write is immediately visible to all file readers.
Neither NFS nor AFS provides such concurrency control. NFS: changes visible sometime within 30 seconds. AFS: session semantics for consistency.

Session Semantics in AFS v2
What it means: a file write is visible to processes on the same box

immediately, but not visible to processes on other machines until the file is closed.
When a file is closed, changes are visible to new opens, but are not visible to old opens.
All other file operations are visible everywhere immediately.
Implementation: dirty data are buffered at the client machine until file close, then flushed back to the server, which leads the server to send break callbacks to other clients.
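A toy in-memory model of these session semantics; the names are invented, and real AFS of course flushes over the network and breaks callbacks:

```c
/* AFS session semantics: a writer's changes become visible to *new*
 * opens only after close().  Toy single-process model. */
#include <stdio.h>
#include <string.h>

static char server_copy[32] = "v1";    /* authoritative copy   */

struct session { char local[32]; };    /* per-open cached copy */

static struct session afs_open(void) {
    struct session s;
    strcpy(s.local, server_copy);      /* whole-file fetch */
    return s;
}
static void afs_write(struct session *s, const char *d) {
    strcpy(s->local, d);               /* buffered at the client */
}
static void afs_close(struct session *s) {
    strcpy(server_copy, s->local);     /* flush; callbacks break */
}

int main(void) {
    struct session w = afs_open();     /* writer opens            */
    afs_write(&w, "v2");

    struct session r1 = afs_open();    /* opened before close: v1 */
    afs_close(&w);
    struct session r2 = afs_open();    /* opened after close: v2  */

    printf("old open sees %s, new open sees %s\n", r1.local, r2.local);
    return 0;
}
```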

AFS Write Policy
Writeback cache: the opposite of NFS's "every write is sacred".
Store chunks back to the server when the cache overflows, or on the last user's close()... or don't (if the client machine crashes).
Is writeback crazy? Write conflicts are assumed rare, and who wants to see a half-written file?

Results for AFS
Lower server load than NFS: more files cached on clients, and callbacks mean the server is not busy if files are read-only (the common case)

But maybe slower: access from the local disk is much slower than from another machine's memory over a LAN.
For both: the central server is a bottleneck (all reads and writes hit it at least once), is a single point of failure, and is costly to make fast, beefy, and reliable.

Topic 3: Name-Space Construction and Organization
NFS: per-client linkage. Server: export /root/fs1/. Client: mount server:/root/fs1 /fs1

AFS: global name space. The name space is organized into volumes; global directory /afs; /afs/cs.wisc.edu/vol1/; /afs/cs.stanford.edu/vol1/
Each file is identified as fid = <vol_id, vnode #, uniquifier>
All AFS servers keep a copy of the volume location database, which is a table of vol_id -> server_ip mappings
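A toy volume-location lookup; the table entries are made-up examples:

```c
/* Toy volume location database: vol_id -> server_ip, as described
 * above.  All entries are hypothetical. */
#include <stdio.h>

struct vldb_entry { unsigned long vol_id; const char *server_ip; };

static const struct vldb_entry vldb[] = {
    { 536870912, "128.2.10.1" },   /* made-up volume/server pairs */
    { 536870915, "128.2.10.2" },
};

static const char *locate_volume(unsigned long vol_id) {
    for (size_t i = 0; i < sizeof vldb / sizeof vldb[0]; i++)
        if (vldb[i].vol_id == vol_id) return vldb[i].server_ip;
    return NULL;
}

int main(void) {
    /* A fid names <vol_id, vnode #, uniquifier>; only vol_id matters
     * for finding the server.  Moving a volume means updating this
     * table, not every client's mounts: location transparency. */
    printf("volume 536870915 lives on %s\n", locate_volume(536870915));
    return 0;
}
```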

Implications for Location Transparency
NFS: no transparency. If a directory is moved from one server to another, the client must remount.
AFS: transparency. If a volume is moved from one server to another, only the volume location database on the servers needs to be updated.

Naming in NFS (1)
[Figure 11-11: mounting (part of) a remote file system in NFS.]
Naming in NFS (2)
[Figure.]

Topic 4: User Authentication and Access Control
User X logs onto workstation A and wants to access files on server B. How does A tell B who X is? Should B believe A?
Choices made in NFS V2:
All servers and all client workstations share the same <uid, gid> name space; A sends X's <uid, gid> to B. Problem: root access on any client workstation can lead to the creation of users with arbitrary <uid, gid>.
The server believes the client workstation unconditionally. Problem: if any client workstation is broken into, the protection of data on the server is lost;

<uid, gid> is sent in clear-text over the wire, and request packets can be faked easily.

User Authentication (cont'd)
How do we fix the problems in NFS v2?
Hack 1: root remapping -> strange behavior
Hack 2: UID remapping -> no user mobility
Real solution: use a centralized Authentication/Authorization/Access-control (AAA) system

A Better AAA System: Kerberos
Basic idea: shared secrets

The user proves to the KDC who he is; the KDC generates a shared secret between the client and the file server.
[Figure: the client asks the KDC for a ticket to the file server; the KDC generates a session key S, returns S encrypted with the client's key, and issues a ticket Kfs[S] (S encrypted with the file server's key) that the client presents to the file server.]
S is specific to the {client, fs} pair: a short-term session key with an expiration time (e.g., 8 hours).
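A toy sketch of that exchange, with XOR standing in for real encryption; actual Kerberos adds nonces, timestamps, and proper ciphers:

```c
/* Toy Kerberos exchange: the KDC generates a session key S and hands
 * back [S] under the client's key plus a ticket [S] under the file
 * server's key.  XOR is a stand-in for encryption, illustration only. */
#include <stdio.h>

typedef unsigned char key;

static key encrypt(key secret, key k) { return secret ^ k; }
static key decrypt(key blob,   key k) { return blob   ^ k; }

int main(void) {
    key k_client = 0x21, k_fs = 0x47;   /* long-term shared secrets  */
    key S = 0x5A;                       /* KDC-generated session key */

    /* KDC -> client: S sealed for the client, plus a ticket for fs. */
    key for_client = encrypt(S, k_client);
    key ticket     = encrypt(S, k_fs);

    /* Client recovers S; the file server recovers S from the ticket. */
    key s_client = decrypt(for_client, k_client);
    key s_fs     = decrypt(ticket,     k_fs);

    printf("client and fs now share S: %s\n",
           s_client == s_fs ? "yes" : "no");
    return 0;
}
```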

Today's bits
Distributed filesystems almost always involve a trade-off among consistency, performance, and scalability. We've learned a lot since NFS and AFS (and can implement faster, etc.), but the general lesson holds, especially in the wide-area.
We'll see a related trade-off, also involving consistency, in a while: the CAP trade-off (Consistency, Availability, Partition-resilience).

More bits
Client-side caching is a fundamental technique to improve scalability and performance, but it raises important questions of cache consistency.
Timeouts and callbacks are common methods for providing (some forms of) consistency.
AFS picked close-to-open consistency as a good balance of usability (the model seems intuitive to users), performance, etc. The AFS authors argued that apps with highly concurrent, shared access, like databases, needed a different model.

Failure Recovery in AFS & NFS
What if the file server fails? What if the client fails? What if both the server and the client fail?
Network partition: how to detect it? How to recover from it? Is there any way to ensure absolute consistency in the presence of a network partition? For reads? For writes?

What if all three fail: network partition, server, and client?

Key to Simple Failure Recovery
Try not to keep any state on the server.
If you must keep some state on the server: understand why and what state the server is keeping; understand the worst-case scenario of no state on the server and see whether there are still ways to meet the correctness goals; revert to this worst case in each combination of failure cases.
