CS6456: Graduate Operating Systems
Brad Campbell, bradjc@virginia.edu
https://www.cs.virginia.edu/~bjc8c/class/cs6456-f19/


Slides modified from CS162 at UCB

Binding Instructions and Data to Memory
[Figure: the process's view of memory is a small assembly program:
  data1:   dw    32
  start:   lw    r1, 0(data1)
           jal   checkit
  loop:    addi  r1, r1, -1
           bnz   r1, loop
  checkit: ...
bound to physical addresses: data1 at 0x0300 (00000020), code at 0x0900-0x090C (8C2000C0, 0C000280, 2021FFFF, 14200242). The instruction encodings embed these physical addresses]

Execute Second Instance of Program?
[Figure: physical memory from 0x0000 to 0xFFFF already holds App X at the program's addresses, so where does the second instance go? Binding it instead to 0x1300 and 0x1900-0x190C changes the instruction encodings (8C2004C0, 0C000680, 2021FFFF, 14200642), because addresses are baked into the binary]

When do we decide on addresses?
At compile time?

  Tricky: we don't know the state of physical memory when the program is executed
At load time?
  Scan through the binary and modify addresses
  Expensive: what if we have a large program?
  Still using physical addresses directly
At run time?
  Translation: modify addresses issued by the CPU on the fly

Uniprogramming
[Figure: OS at the top of the address space, the application below, spanning the valid 32-bit addresses 0x00000000-0xFFFFFFFF]
No translation, no protection
The application can access any physical address directly
The application always runs at the same location
Not just the illusion of a dedicated machine: it really is a dedicated machine!

Primitive Multiprogramming
No translation, no protection
Programs need to use different address ranges
Rely on the loader to modify addresses at execution time
Common on early operating systems
[Figure: OS, Application2 (at 0x00020000), and Application1 (at 0x00000000) stacked in physical memory]

Multiprogramming with Protection
[Figure: same layout, but Application2 runs with BaseAddr=0x20000 and BoundAddr=0x10000]
Hardware Support: Base and Bound Registers

Access outside of range: error! (unless in kernel mode)
The registers can be modified from kernel mode only
The kernel loads the register values from the PCB when a context switch occurs

Loading Executable Into Memory
[Figure: an executable (code, data, info) on a huge disk is copied into memory]
View so far: the OS copies each segment into memory, then sets up registers and jumps to the start location

New View: Create Address Space
[Figure: a per-process virtual address space (kernel, stack, heap, data, code) backed by a huge disk]

[Figure: the address space builds up over several animation frames: first the VAS regions backed entirely by disk, then a per-process page table mapping pages to user page frames in memory or leaving them absent]
One (very conservative) method: every page in the address space is backed by disk
  Just allocate space on disk, and let a page fault trigger a load into memory
Note that we do not need an extra copy of read-only data already contained in a file
  Executable code, memory-mapped files (soon)
The user page table maps the entire virtual address space
  One per process; it distinguishes present pages from absent pages
The OS needs to store the mapping from each virtual page to its disk location
  The disk is the backing store; memory acts as a cache

Provide Backing Store for VAS
[Figure: two address spaces (VAS 1 and VAS 2), each with kernel, stack, heap, data, and code regions; page tables PT 1 and PT 2 map pages either to user page frames in memory or to backing store on disk; the kernel's code & data are mapped into both]

A Page Fault
[Figure: the active process touches an absent page; its page table entry points to disk, not memory]

A Page Fault: Find and Start Load
[Figure: the handler finds the page's location on the backing store and starts the I/O to load it into a free user page frame]

A Page Fault: Switch During IO
[Figure: while the load is in flight, the OS switches to another process and its page table]

On Page Fault: Update PTE
[Figure: when the I/O completes, the faulting process's PTE is updated to point at the newly filled frame and marked present]

Eventually Reschedule Faulting Thread
[Figure: the faulting thread is made runnable again and retries the instruction, which now succeeds]

File IO
File IO so far: explicit transfer between buffers in the process address space and regions of a file
  Overhead: multiple copies in memory, syscalls
Alternative: map the file directly into an empty region of the process address space
  Implicitly page in the file when we read it
  Write to the file and eventually page it out
The executable file is treated this way when we execute a process!

Using Paging to mmap Files
[Figure: a load/store instruction's virtual address faults in the MMU; the page fault handler reads the file contents through the operating system, creates PT entries for the mapped region as backed by the file, and the instruction is retried]

mmap system call
mmap() a file into a region of the VAS
May map a specific region, or let the system find one for you

mmap Example

#include <sys/mman.h> /* also stdio.h, stdlib.h, string.h, fcntl.h, unistd.h */

int something = 162;

int main(int argc, char *argv[]) {
  int myfd;
  char *mfile;

  printf("Data at: %16lx\n", (long unsigned int) &something);
  printf("Heap at : %16lx\n", (long unsigned int) malloc(1));
  printf("Stack at: %16lx\n", (long unsigned int) &mfile);

  /* Open the file */
  myfd = open(argv[1], O_RDWR | O_CREAT, 0666);
  if (myfd < 0) { perror("open failed!"); exit(1); }

  /* map the file */
  mfile = mmap(0, 10000, PROT_READ|PROT_WRITE, MAP_FILE|MAP_SHARED, myfd, 0);
  if (mfile == MAP_FAILED) { perror("mmap failed"); exit(1); }

  printf("mmap at : %16lx\n", (long unsigned int) mfile);
  puts(mfile);
  strcpy(mfile+20, "Let's write over it");
  close(myfd);
  return 0;
}

$ cat test
This is line one
This is line two
This is line three
This is line four

$ ./mmap test
Data at: 105d63058
Heap at : 7f8a33c04b70
Stack at: 7fff59e9db10
mmap at : 105d97000
This is line one
This is line two
This is line three
This is line four

$ cat test
This is line one
ThiLet's write over its line three
This is line four

Sharing through Mapped Files
[Figure: VAS 1 and VAS 2, each spanning 0x000-0xFFF with instructions, data, heap, and stack regions; both map the same file in memory, giving the two processes a shared region]

[Figure continues: the OS occupies the same place in both address spaces]

32-bit x86 Linux Virtual Memory Layout
[Figure: the 32-bit x86 Linux process layout, with callouts for memory-mapped files, extra stacks for multithreaded processes, shared libraries, and some memory allocations]
ASLR: Address Space Layout Randomization
  Makes it harder to exploit bugs to compromise the system
The kernel is mapped into every process's address space
  Protection bits: those pages can't be accessed in user mode
  Why? Faster than changing the address space on every switch to kernel mode

Linux Virtual Memory Map

[Figure: 32-bit: 3GB of user addresses from 0x00000000, and 1GB of kernel addresses above 0xC0000000 containing a one-to-one map of the "bottom" 896MB of physical memory (higher physical addresses can be temporarily mapped). 64-bit: 128TiB of user addresses up to 0x00007FFFFFFFFFFF, a canonical hole of empty space, and 128TiB of kernel addresses from 0xFFFF800000000000 to 0xFFFFFFFFFFFFFFFF, including a 64TiB one-to-one map of physical memory]

January 2018 - Meltdown
Possible to inspect the contents of kernel memory if it's mapped into the address space (even as user!)
Fix: Kernel Page Table Isolation
  Use entirely different page tables in user mode vs. in kernel mode
  Problem: the address space changes whenever an interrupt or syscall occurs!
    Change page tables
    Flush the TLB unless it is tagged
  Reduced performance, depending on the syscall workload

Interprocess Communication
Thus far, we've said the following:
  Hard to cooperate across processes because they're inherently isolated (separate address spaces)

  But this is good for protection
Easy to cooperate among threads because they share an address space
  But this is bad for protection
  Have to use synchronization primitives like locks

Interprocess Communication
Allow two (or more) processes to exchange information with each other
Why use this approach rather than multithreading?
  Keep most of the benefits of process isolation
  Expose processes to each other only through a carefully structured interface
  Example: the Google Chrome design

We've Already Seen Some IPC
1. Two processes share a file (e.g., both mmap it)
  Needs some synchronization; must structure the file layout
  Con: still involves the entire kernel IO infrastructure
  What if the file is only temporary, needed only while the processes are running?
2. Create a pipe
  read and write on a pair of file descriptors
  Still limited in how much can be written at once

Another IPC Option
3. Open a socket to 127.0.0.1 (localhost)
  Nice if we ever want to deploy the process on a remote machine later on
  But lots of extra work: packet/header assembly, checksum calculation, TCP ACKs, etc.

UNIX Domain Sockets
Open a socket connection with a local process
Use the familiar write/read calls to communicate
But don't incur the usual overhead of networking
  Optimized for processes on the same machine

Using Unix Domain Sockets
Still need the same sequence of syscalls: socket, bind, listen, accept to act as a server
But the socket address now corresponds to an object in the local machine's filesystem
  Specify a path on bind
Why this approach?

The filesystem gives us a namespace: any process can specify the path on connect (it doesn't need to be a child of the server)
The filesystem enforces permissions

Using Unix Domain Sockets

#include <sys/socket.h>
#include <sys/un.h>
#include <string.h>

int fd;
struct sockaddr_un un;

un.sun_family = AF_UNIX;
strcpy(un.sun_path, "/home/oski/demo.socket");
fd = socket(AF_UNIX, SOCK_STREAM, 0);
bind(fd, (struct sockaddr*)&un, sizeof(un));

Many Other Forms of IPC
Named pipes (FIFOs): the pipe interface, but given a name in the file system (man 3 mkfifo)
Named semaphores (man sem_open)
Message queues (man 7 mq_overview)
And more

Summary
Alternate view of loading a program: allocate space on disk, load in pages only when needed
Memory-mapped IO: map the contents of a file into the virtual address space; store/load instead of read/write
Inter-process communication: structured sharing
  Pipes: read/write an ordered, in-memory buffer
  Unix domain sockets: avoid networking overhead

C Concurrency and Synchronization
Standard approach: use pthreads, and protect access to shared data structures (the shared-memory paradigm)
One pitfall: consistently unlocking a mutex

int Rtn() {
  lock.acquire();
  if (exception) {
    lock.release();
    return errCode;
  }
  lock.release();
  return OK;
}

Other Languages and Threading
Many other mainstream languages also focus on threads and shared memory
But they offer useful libraries and built-in features to make our lives easier
  Thread management libraries
  Thread pools
  Safer lock management
  Objects as monitors

C++ Lock Guards

#include <mutex>

int global_i = 0;
std::mutex global_mutex;

void safe_increment() {
  std::lock_guard<std::mutex> lock(global_mutex);
  global_i++;
  // Mutex released when 'lock' goes out of scope
}

Python with Keyword
More versatile than we'll show here (can also be used to close files, server connections, etc.)

with lock:          # Automatically calls acquire()
    some_var += 1
                    # release() called however we leave the block

Java Support for Synchronization

class Account {
  private int balance;

  public Account(int initialBalance) {
    balance = initialBalance;
  }
  public synchronized int getBalance() {
    return balance;
  }
  public synchronized void deposit(int amount) {
    balance += amount;
  }
}

Every Java object has an associated lock for synchronization:
  The lock is acquired on entry and released on exit from a synchronized method
  The lock is properly released if an exception occurs inside a synchronized method

Java Support for Synchronization
Along with a lock, every object has a single condition variable associated with it
  To wait inside a synchronized method: void wait(); void wait(long timeout);
  To signal while in a synchronized method: void notify(); void notifyAll();

Newish programming language: Go
"Goroutines": lightweight, user-level threads
Channels: named message queues for communication among threads
Key idea: prefer message passing over shared memory

Go
Why this approach?
  Efficiency of a shared address space
  Tolerates many threads in one program
  Passing data through channels: no need for explicit synchronization
  The sender passes ownership to the receiver

Go Channels

type hchan struct {
	qcount   uint           // total data in the queue
	dataqsiz uint           // size of the circular queue
	buf      unsafe.Pointer // array of dataqsiz elements
	elemsize uint16
	closed   uint32
	elemtype *_type         // element type
	sendx    uint           // send index
	recvx    uint           // receive index
	recvq    waitq          // list of recv waiters
	sendq    waitq          // list of send waiters
	lock     mutex
}

[Figure: buf is a circular buffer of dataqsiz elements, qcount of which are filled; recvx is the next slot to read from, sendx the next slot to write to]

Rules much like for pipes
Synchronization is handled for us
  Uses atomic ops or acquires the channel's mutex
Send:
  If recvq is non-empty, pass the value to the first waiter and wake it up
  Otherwise, append the element to the buffer at sendx
Send when the buffer is full (qcount == dataqsiz):
  Put the thread on sendq
  Schedule another thread
Recv:
  If the buffer is non-empty, read from recvx
  If a thread is on sendq, append its element to the buffer, remove it from the queue, and mark it ready again
Recv when the buffer is empty (qcount == 0):
  Put the thread on recvq
  Schedule another thread
Closing a channel is much like closing a pipe
  All waiting readers are woken up and "receive" the zero value
  All waiting writers are woken up, and a panic (~exception) occurs in each
What if we give a channel a buffer length of 0?
  A synchronous channel: send blocks until another thread receives

Remember this Slide?
[Figure: threading models: simple one-to-one, many-to-one, many-to-many]

User-Mode Threads: Problems
If one user-level thread blocks on a syscall, all user-level threads relying on the same kernel thread also block
The kernel cannot intelligently schedule threads it doesn't know about

What about multiple cores?
No pre-emption: a user-level thread must explicitly yield the CPU to allow someone else to run

Go User-Level Thread Scheduler
[Figure: a global run queue holds newly created goroutines; each OS thread (M) runs on its own CPU core and has its own local run queue]
Why this approach?
  One OS (kernel-supported) thread per CPU core allows a Go program to achieve parallelism, not just concurrency
    Fewer OS threads? Not utilizing all CPUs
    More OS threads? No additional benefit
    We'll see one exception to this involving syscalls
  Keeping a goroutine on the same OS thread: affinity, nice for caching and performance

Cooperative Scheduling
No pre-emption => goroutines must yield to allow another thread to run
The programmer does not need to do this explicitly
The Go runtime injects yields at safe points in execution:
  Sending/receiving with a channel
  Acquiring a mutex
  Making a function call
But your code can still tie up the scheduler in "tight loops" (Go's developers are working on this)

Dealing with Syscalls
[Figure: a running goroutine on OS thread M1, which owns the CPU core and a local run queue]
What if a goroutine wants to make a blocking syscall? Example: file I/O
While the syscall is blocking, allocate a new OS thread (M2)
  M1 is blocked by the kernel; M2 lets us continue using the CPU
When the syscall completes:
  Put the invoking goroutine back on the run queue
  Keep M1 around in a spare pool
  Swap it with M2 upon the next syscall; no need to pay the thread creation cost
