CDA-5155 Computer Architecture Principles Fall 2000


Multiprocessor Architectures

Review
Protocols: reliable and heterogeneous networking
Interconnect technologies/topologies: length, latency, diameter, blocking, deadlock, bisection bandwidth, overheads, routing, congestion, connectionless?
CPU interface to the memory hierarchy vs. to the network (SPEC)
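The interconnect metrics in the review (latency, bandwidth, overheads) are often combined in a first-order model: total latency = overhead + time of flight + message size / bandwidth, and effective bandwidth = message size / total latency. The sketch below assumes that simple model, takes MB as 10^6 bytes, and uses hypothetical function names and sample numbers chosen only to show how a fixed overhead caps the bandwidth a message actually sees.

```python
def total_latency_us(size_bytes, bandwidth_mb_s, overhead_us, flight_us=0.0):
    """First-order message latency in microseconds.

    overhead_us: combined sender + receiver overhead (hypothetical value).
    bandwidth_mb_s: raw link bandwidth in MB/s, i.e. bytes per microsecond.
    flight_us: time of flight through the interconnect.
    """
    transmission_us = size_bytes / bandwidth_mb_s  # 1 MB/s == 1 byte per microsecond
    return overhead_us + flight_us + transmission_us


def effective_bandwidth_mb_s(size_bytes, bandwidth_mb_s, overhead_us, flight_us=0.0):
    """Delivered bandwidth: message size divided by total latency."""
    latency = total_latency_us(size_bytes, bandwidth_mb_s, overhead_us, flight_us)
    return size_bytes / latency


if __name__ == "__main__":
    # Hypothetical link: 400 MB/s raw bandwidth and 20 us of software overhead.
    for size in (64, 1024, 65536, 1048576):
        eff = effective_bandwidth_mb_s(size, 400.0, 20.0)
        print(f"{size:>8} bytes -> {eff:6.1f} MB/s effective")
```

With these numbers a 64-byte message sees only about 3 MB/s, while a 1,048,576-byte message gets close to the 400 MB/s raw rate: for small transfers the fixed overhead, not the link, dominates.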

Standardization is key for LANs and WANs
Internetworking protocols are used as LAN protocols
Integrated circuits are revolutionizing both networks and processors
A switch is a specialized computer
Amdahl's law applies: raising network bandwidth helps little when overheads are high

Overview
High-performance computing
Parallelism
Taxonomy of multiprocessors
Programming models
Performance
ASCI (Accelerated Strategic Computing Initiative)

High Performance Computing
Hardware and software
El Dorado: "Attack of the killer micros"
Microprocessor: the most cost-effective processor
Dynamic supercomputer market
Timesharing workloads

Multiprocessor vs. high-performance uniprocessor

Performance and application domains
Throughput (multiprocessing workloads): timesharing, file, database, and web servers
Response time (parallel applications): a single complex problem
Computation/communication = f(number of processors, data size)

Parallelism
Two or more things that happen at the same time
Granularity: the size of the computations performed at the same time between synchronizations
Examples: carry-lookahead adder, pipelined processor, two-way superscalar processor, multiprocessor, COW (cluster of workstations)
Levels of parallelism: bit level, instruction level, thread level
Challenges (Amdahl's law): limited amount of parallelism in programs; high cost of communication

Parallel Computers
Parallel computer: a collection of processing elements that cooperate and communicate to solve large problems fast.
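Amdahl's law makes the first challenge quantitative: if a fraction f of a program's execution can run in parallel on p processors and the rest is sequential, speedup = 1 / ((1 - f) + f / p). A minimal sketch of the law and its inverse; the function names and example fractions are illustrative and not taken from the slides.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Speedup predicted by Amdahl's law for a fixed problem size."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be >= 1")
    sequential_fraction = 1.0 - parallel_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / processors)


def required_parallel_fraction(target_speedup: float, processors: int) -> float:
    """Parallel fraction f needed so that amdahl_speedup(f, processors) == target_speedup.

    Solves 1 / ((1 - f) + f / p) = S, giving f = (1 - 1/S) / (1 - 1/p).
    """
    if processors < 2:
        raise ValueError("need at least two processors for any speedup")
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / processors)


if __name__ == "__main__":
    # 95% parallel code: speedup saturates near 1 / 0.05 = 20.
    for p in (10, 100, 1000):
        print(p, round(amdahl_speedup(0.95, p), 1))  # 6.9, 16.8, 19.6
    # A speedup of 80 on 100 processors tolerates only ~0.25% sequential work.
    f = required_parallel_fraction(80, 100)
    print(f"sequential fraction: {1 - f:.4%}")
```

The second function shows why limited parallelism is a challenge: even a tiny sequential part caps the achievable speedup.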

Questions about parallel computers:
How large a collection?
How powerful are the processing elements?
How do they cooperate and communicate?
How are data transmitted?
What type of interconnection?
What are the HW and SW primitives for the programmer?
Does it translate into performance?

Taxonomy of Parallel Computers
Flynn: instruction (I) and data (D) streams

Shared Memory Model
Each processor can name every physical location in the machine via loads and stores
Data size: byte, word, ... or cache blocks
Process: a virtual address space (>= 1 thread of control)
Multiple processes can overlap (share), but ALL threads share a process address space
Writes to the shared address space by one thread are visible to reads by other threads
Usual model: shared code, private stack, some shared heap, some private heap
Performance: latency, bandwidth, and scalability when threads communicate?

Message Passing Model
Nodes: whole computers (CPU, RAM, I/O)
Communication: explicit I/O operations
Send (local buffer, remote process)
Recv (local buffer, remote process)
Synchronization: when the send completes, when the buffer is free, when the request is accepted
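To contrast the two programming models, the sketch below computes the same sum twice: once with threads that share an address space and update a shared result under a lock, and once with workers that share no data and exchange explicit messages through queues standing in for Send and Recv. It is only an illustration: Python threads and `queue.Queue` are assumptions standing in for real processors and a real network, and the function names are hypothetical.

```python
import queue
import threading


def shared_memory_sum(data, workers=4):
    """Shared memory: threads read `data` and update `total` in one address space."""
    total = [0]                  # shared heap object visible to all threads
    lock = threading.Lock()      # synchronization for the shared write

    def work(chunk):
        partial = sum(chunk)     # private computation on the thread's own stack
        with lock:               # make the read-modify-write atomic
            total[0] += partial  # write becomes visible to other threads' reads

    chunks = [data[i::workers] for i in range(workers)]
    threads = [threading.Thread(target=work, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]


def message_passing_sum(data, workers=4):
    """Message passing: nodes share nothing and communicate only by Send/Recv."""
    inboxes = [queue.Queue() for _ in range(workers)]
    results = queue.Queue()      # the coordinator's inbox

    def node(rank):
        chunk = inboxes[rank].get()        # Recv: blocks until the message arrives
        results.put((rank, sum(chunk)))    # Send the partial sum back

    threads = [threading.Thread(target=node, args=(r,)) for r in range(workers)]
    for t in threads:
        t.start()
    for r in range(workers):
        inboxes[r].put(list(data[r::workers]))  # Send a private copy of the chunk
    total = sum(results.get()[1] for _ in range(workers))  # Recv every reply
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    values = list(range(1, 101))
    print(shared_memory_sum(values), message_passing_sum(values))
```

In the shared-memory version communication is implicit (a store to `total`) and synchronization is explicit (the lock); in the message-passing version the blocking `get` provides both communication and synchronization.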

Necessary even for 1 processor

[Figure: Shared Memory; software stacks (application, language run-time system, operating system, hardware) on machine1 and machine2, shown in three configurations]

[Figures: Shared-Memory SIMD vector addition with 2 load pipes and 1 store pipe, or with 2 load/store pipes; Distributed Memory SIMD; Shared Memory UMA: bus-based SMP, crossbar-based SMP (Sun Enterprise 10000); NUMA: bus-based NUMA]

ASCI Program
Accelerated Strategic Computing Initiative
Big impulse to the HPC industry
Architecture: clusters of RISC-based SMP nodes
Goals (1995-2004):
1 Teraflops: Intel/Sandia ASCI Red
3 Teraflops: ASCI Blue (SGI/LANL Blue Mountain, IBM/LLNL Blue Pacific)
10 Teraflops: IBM/LLNL ASCI White
30 Teraflops: ?
100 Teraflops: ?

Intel/Sandia ASCI Red

Floor space: 160 m²
200-MHz Pentium Pro processors
Nodes: service, compute, I/O, and system
Six-link router chip (dimension-ordered, wormhole routing)
Link bandwidth: 400 MB/s (full duplex)

Top 500 HPC computers (Rmax and Rpeak in GFlops)
Manufacturer | Computer | Rmax | Site | Country | Year | Area | # Proc | Rpeak
IBM | ASCI White, SP Power3 375 MHz | 4938 | LLNL | USA | 2000 | Research, Energy | 8192 | 12288
Intel | ASCI Red | 2379 | SNL | USA | 1999 | Research | 9632 | 3207
IBM | ASCI Blue Pacific SST, IBM SP 604e | 2144 | LLNL | USA | 1999 | Research, Energy | 5808 | 3868
SGI | ASCI Blue Mountain | 1608 | LANL | USA | 1998 | Research | 6144 | 3072
IBM | SP Power3 375 MHz | 1417 | NAVOCEANO | USA | 2000 | Research, Aerospace | 1336 | 2004
IBM | SP Power3 375 MHz | 1179 | NCP | USA | 2000 | Research, Weather | 1104 | 1656
Hitachi | SR8000F1/112 | 1035 | LRM | Germany | 2000 | Academic | 112 | 1344
IBM | SP Power3 375 MHz 8 way | 929 | UCSD | USA | 2000 | Research | 1152 | 1728
Hitachi | SR8000F1/100 | 917 | HEARO | Japan | 2000 | Research | 100 | 1200
Cray Inc. | T3E1200 | 892 | Government | USA | 1998 | Classified | 1084 | 1300.8

[Charts: Top 500 breakdown by architecture, CPU processor type, customer, performance, and manufacturer]
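The ASCI Red router is described as using dimensional (dimension-ordered) wormhole routing. As a hedged sketch, assuming a 2-D mesh whose nodes are named by hypothetical (x, y) coordinates, dimension-order (XY) routing moves a packet all the way along x first and then along y. Because a packet never turns from y back to x, the channel dependences cannot form a cycle, which is why this scheme is deadlock-free on a mesh. The function only computes the hop sequence; it does not model wormhole flow control or the actual ASCI Red topology.

```python
from typing import List, Tuple

Node = Tuple[int, int]  # hypothetical (x, y) mesh coordinates


def xy_route(src: Node, dst: Node) -> List[Node]:
    """Dimension-order (XY) route on a 2-D mesh: correct x first, then y.

    Returns the nodes visited, including src and dst.
    """
    x, y = src
    path = [(x, y)]
    step_x = 1 if dst[0] > x else -1
    while x != dst[0]:           # first dimension: x
        x += step_x
        path.append((x, y))
    step_y = 1 if dst[1] > y else -1
    while y != dst[1]:           # second dimension: y
        y += step_y
        path.append((x, y))
    return path


if __name__ == "__main__":
    route = xy_route((0, 0), (2, 3))
    print(route)  # [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (2, 3)]
    print("hops:", len(route) - 1)  # Manhattan distance |dx| + |dy| = 5
```

The route is always minimal (its length equals the Manhattan distance) and fully determined by source and destination, which keeps the router logic simple at the cost of not adapting to congestion.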
