Processor Technology
John Gordon, Peter Oliver
e-Science Centre, RAL
October 2002
All details correct at time of writing (09/10/02)

Outline
- What is a CPU?
- Current technologies
  - CPU, memory and motherboards
  - Concentrate on x86 architectures: Intel, AMD
- Longer view
- Parallel CPUs
- Overview of other vendors: SGI, SUN, COMPAQ, HP, IBM

What is a CPU? (1)
A CPU comprises:
- Clock: up to 3GHz; controls how often an instruction can be performed
- Integer units: 1 to N; used to perform integer maths
- Floating-point units: 1 to N; 32/64-bit arithmetic
- Memory cache: L1, L2 and L3; instruction and data caches
  - L1: typically 8-64k
  - L2: 128-512k
  - L3: as large as 8MB
- Memory bus
  - Speed: 100MHz or higher
  - Width: what is transferred per cycle, typically 64 bits

What is a CPU? (2)
- Memory architecture
  - SDRAM: one fetch per bus cycle
  - DDR: two fetches per bus cycle
  - RDRAM
- Special units
  - SSE: single-precision (32-bit) SIMD units
  - SSE2: double-precision (64-bit) SIMD units
  - SIMD = single instruction, multiple data, e.g. A(1:100)=A(1:100)*0.08, i.e. multiply each array element by 0.08
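The SIMD example above can be modelled in a few lines. This is a hedged sketch in plain Python, not real vector code: it assumes the 128-bit XMM registers used by SSE and SSE2 (a detail not stated on the slide), so one packed instruction covers four single-precision or two double-precision elements, and it simply counts how many such instructions A(1:100)=A(1:100)*0.08 would need. The names `lanes` and `simd_scale` are hypothetical.

```python
# Hypothetical counting model of SIMD execution: plain Python, no real vector
# instructions (Python floats are always doubles; only the counting differs).
XMM_BITS = 128  # assumption: SSE/SSE2 operate on 128-bit XMM registers


def lanes(element_bits: int) -> int:
    """Number of elements of the given width that fit in one XMM register."""
    return XMM_BITS // element_bits


def simd_scale(a: list[float], factor: float, element_bits: int) -> tuple[list[float], int]:
    """Model A(1:n) = A(1:n) * factor as a sequence of packed multiplies.

    Returns the scaled copy and the number of packed instructions issued;
    each outer iteration stands for one SIMD multiply over `width` elements.
    """
    width = lanes(element_bits)
    out = list(a)
    instructions = 0
    for start in range(0, len(out), width):
        # One packed multiply handles up to `width` neighbours; the final
        # chunk may be partial when len(a) is not a multiple of `width`.
        for i in range(start, min(start + width, len(out))):
            out[i] *= factor
        instructions += 1
    return out, instructions


a = [1.0] * 100
_, sse_ops = simd_scale(a, 0.08, 32)   # SSE: 4 single-precision lanes
_, sse2_ops = simd_scale(a, 0.08, 64)  # SSE2: 2 double-precision lanes
print(f"scalar: {len(a)}, SSE: {sse_ops}, SSE2: {sse2_ops}")  # scalar: 100, SSE: 25, SSE2: 50
```

For 100 elements this gives 25 packed multiplies with SSE and 50 with SSE2, against 100 scalar multiplies; real speedups also depend on memory bandwidth and data alignment.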
Brief History

Name        | Date | Transistors | Microns | Clock speed | Data width           | MIPS
8080        | 1974 | 6,000       | 6       | 2 MHz       | 8 bits               | 0.64
8088        | 1979 | 29,000      | 3       | 5 MHz       | 16 bits (8-bit bus)  | 0.33
80286       | 1982 | 134,000     | 1.5     | 6 MHz       | 16 bits              | 1
80386       | 1985 | 275,000     | 1.5     | 16 MHz      | 32 bits              | 5
80486       | 1989 | 1,200,000   | 1       | 25 MHz      | 32 bits              | 20
Pentium     | 1993 | 3,100,000   | 0.8     | 60 MHz      | 32 bits (64-bit bus) | 100
Pentium II  | 1997 | 7,500,000   | 0.35    | 233 MHz     | 32 bits (64-bit bus) | ~300
Pentium III | 1999 | 9,500,000   | 0.25    | 450 MHz     | 32 bits (64-bit bus) | ~510
Pentium 4   | 2002 | 55,000,000  | 0.13    | 2.8 GHz     | 32 bits (64-bit bus) | ?
Athlon      | 2002 | 37,200,000  | 0.13    | 1.8 GHz     | 32 bits (64-bit bus) | ?

Differences between Intel and AMD

CPU       | Integer units | FP units  | Cache                       | Special
PIII      | 3             | 1         | L1 16I/16D, L2 512k         | SSE
Athlon    | 3             | 2 + store | L1 64I/64D, L2 256k         | SSE (3DNow!)
Opteron   | 3             | 2 + store | L1 64I/64D, L2 ~1-2MB?      | SSE, SSE2, 64-bit
PIV       | 2             | 1         | L1 12I/8D, L2 512k          | SSE, SSE2
Itanium 2 | 4             | 4         | L1 16I/16D, L2 256k, L3 3MB | EPIC, IA-64

Current Technologies - Intel (1)
Intel offerings: Celeron and PIII
- SSE: single-precision SIMD units
- BLAS libraries very fast (using ATLAS, http://www.netlib.org/atlas/)
  - 700Mflops DGEMM (70% of peak) for a 1GHz PIII (256k L2)
- PIII dropped
- Celeron moved to the PIV core as of 1.7GHz

Chip    | GHz   | Bus speed (MHz) | L1 cache           | L2 cache
Celeron | 1.2   | 100             | 32k (16k I, 16k D) | 256k
Celeron | 1.3   | 100             | 32k (16k I, 16k D) | 256k
Celeron | 1.8   | 400             | 20k (12k I, 8k D)  | 128k
PIII    | 1.13  | 133             | 32k (16k I, 16k D) | 512k
PIII    | 1.266 | 133             | 32k (16k I, 16k D) | 512k
PIII    | 1.4   | 133             | 32k (16k I, 16k D) | 512k

SPECint/SPECfp figures quoted for this group: 474, 301, 561, 611, 648, 377, 415, 437 (see http://www.specbench.org for the latest numbers)
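The "700Mflops DGEMM (70% of peak)" figure implies a theoretical peak of 1 Gflops, i.e. one double-precision flop per cycle at 1GHz. A minimal sketch of that arithmetic follows; the helper names are hypothetical, and the flops-per-cycle value is inferred from the slide's own numbers rather than taken from Intel documentation.

```python
def peak_gflops(clock_ghz: float, dp_flops_per_cycle: float) -> float:
    """Theoretical double-precision peak = clock rate x flops completed per cycle."""
    return clock_ghz * dp_flops_per_cycle


def implied_flops_per_cycle(sustained_gflops: float, fraction_of_peak: float,
                            clock_ghz: float) -> float:
    """Back out flops/cycle from a quoted 'X Gflops = Y% of peak' figure."""
    return sustained_gflops / fraction_of_peak / clock_ghz


# Figures quoted on the slide: 700 Mflops DGEMM, 70% of peak, 1 GHz PIII.
fpc = implied_flops_per_cycle(0.7, 0.70, 1.0)
print(round(fpc, 2))                    # 1.0 double-precision flop per cycle
print(round(peak_gflops(1.0, fpc), 2))  # 1.0 Gflops theoretical peak
```

Applied to the AthlonMP figure quoted for AMD (2.4 Gflops at 75% of peak on 1.6GHz), the same helper gives 2 flops per cycle.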
Current Technologies - Intel (2)
Intel offerings: PIV, Xeon, Itanium (IA-64)
- PIV
  - BLAS libraries very fast (using ATLAS)
  - SSE2: double-precision SIMD
  - 2.8 Gflops DGEMM for a 2.2GHz P4 (using SSE2)
- Xeon: PIV core with SMT (simultaneous multithreading)
- Itanium 2: EPIC
  - 3.5 Gflops DGEMM for a 1000MHz Itanium 2

Chip      | GHz  | Bus speed (MHz) | L1 cache           | L2 cache        | SPECint | SPECfp
P4        | 2.2  | 400 (64-bit)    | 20k (12k I, 8k D)  | 512k            | 746     | 659
P4        | 2.53 | 533 (64-bit)    | 20k (12k I, 8k D)  | 512k            | 896     | 861
P4        | 2.8  | 533 (64-bit)    | 20k (12k I, 8k D)  | 512k            | 976     | 915
Xeon      | 2.4  | 400 (64-bit)    | 20k (12k I, 8k D)  | 512k            | 824     | 803
Itanium 2 | 1    | 400 (128-bit)   | 32k (16k I, 16k D) | 256k (+3MB L3)  | 810     | 1356

- Itanium is prohibitively expensive

Current Technologies - AMD (1)
AMD offerings: Duron, AthlonXP and AthlonMP
- Duron (being phased out?)
- AthlonXP for single-CPU systems
- AthlonMP required for dual SMP
- SSE: single-precision SIMD units
- BLAS libraries very fast (using ATLAS)
  - 2.4 Gflops DGEMM (75% of peak) for a 1.6GHz AthlonMP

Chip     | GHz (rating)   | Bus speed (MHz) | L1 cache            | L2 cache | SPECint | SPECfp
Duron    | 1.2            | 200 (100*2)     | 128k (64k I, 64k D) | 64k      | 428     | 428
AthlonXP | 1.8 (2200)*    | 266 (133*2)     | 128k (64k I, 64k D) | 256k     | 738     | 624
AthlonXP | 2.133 (2600)*  | 266 (133*2)     | 128k (64k I, 64k D) | 256k     | 813     | 655
AthlonXP | 2.25 (2800)*   | 333 (166*2)     | 128k (64k I, 64k D) | 256k     | 898     | 782
AthlonMP | 1.67 (2000)**  | 266 (133*2)     | 128k (64k I, 64k D) | 256k     | 618     | 544
AthlonMP | 1.8 (2200)*    | 266 (133*2)     | 128k (64k I, 64k D) | 256k     | 699     | 592
* "Thoroughbred" core   ** "Palomino" core

Current Technologies - Motherboards
- PIII, PIV and AthlonMP available in dual form
- Both Xeons and AthlonMPs cost more
  - a 1.8GHz AthlonMP costs ~1.5x a 1.8GHz AthlonXP
  - a 2.4GHz Xeon costs ~1.5x a 2.4GHz PIV
  - a 2.4GHz Xeon costs 1.3x a 1.8GHz AthlonXP
- 64-bit/66MHz PCI for both
- Motherboard costs
  - AthlonMP: Tyan S2462UVM (SCSI, PCI, 100Mbit)
  - Xeon: Supermicro P4DP6 (SCSI, PCI-X, 100Mbit), ~x2
- 1U rack mount routine
- Blades becoming available for even higher density
- PCI-X 64-bit/133MHz
  - Very interesting for high-speed interconnects: Myrinet (www.myri.com), Wulfkit (www.wulfkit.com), Quadrics (www.quadrics.com)
- PIV quad motherboards
  - Expensive
  - Limited memory bandwidth (bus-based)

CPUs on the horizon - Intel (1)
Predicting Intel's future is very difficult.
- Celeron (P4 core): 1.7GHz, 1.9GHz, 2GHz (128k L2), Q3 and Q4; single CPU only?
- PIII: 1.4GHz probably the last CPU?
- PIV/Xeon
  - 3.06GHz (512k L2), 533MHz bus (4*133), November 2002
  - 3.2GHz (512k L2), 533MHz bus (4*133), Q1-2 2003
- XeonMP: high end, 256k L2, 1MB L3, 1.6GHz and 2GHz; systems with 4 or more processors
- PIV Prescott (crystal-ball gazing): 3.2GHz and 4.0GHz, 1MB L2, 666MHz bus; Q3 2003, Q4 2003

CPUs on the horizon - Intel (2)
- IA-64
  - Compiler choice critical
  - 32-bit x86 code supported, but how fast?
- McKinley (Itanium 2)
  - 1GHz, 1.5MB-3MB L3 cache
  - 400MHz bus (cf. Itanium 266MHz)
  - Very expensive: 10 times the cost of a PIV?
- Madison
  - 1.2/1.6GHz, >3MB L3 cache, 2H 2003

CPUs on the horizon - AMD (1)
- AMD Duron silently dropped?
- AMD AthlonXP and MP lines
  - 2800: 333MHz FSB, 256k L2 cache
  - 3000 and beyond: 333MHz FSB, 512k L2 cache (Barton), H1 2003

CPUs on the horizon - AMD (2)
- AMD Hammer series, Q4 2002 / Q1 2003
  - 64-bit x86 CPU that runs 32-bit x86 code natively
  - SSE and SSE2 SIMD units
  - AMD-8000 chipset (HyperTransport)
  - PCI-X (133MHz)

Parallel CPUs (1)
Parallel high-end CPUs: Itanium 2
- DMH (DDR Memory Hub)
- Good memory bandwidth (6.4GB/s)
- Poor scalability: all shared!
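The 6.4GB/s quoted for Itanium 2 follows from its bus figures in the Intel (2) table: a 128-bit (16-byte) bus at 400MHz. A small sketch of the peak-bandwidth arithmetic, using decimal GB (10^9 bytes); `peak_bandwidth_gbs` is a hypothetical helper, and it also shows the SDRAM/DDR distinction from "What is a CPU? (2)", where DDR doubles the fetches per bus cycle.

```python
def peak_bandwidth_gbs(bus_mhz: float, width_bits: int, transfers_per_cycle: int = 1) -> float:
    """Peak bus bandwidth in decimal GB/s.

    bus_mhz: bus clock as quoted on the slides
    width_bits: bits moved per transfer (64 for P4/Athlon buses, 128 for Itanium 2)
    transfers_per_cycle: 1 for SDRAM-style, 2 for DDR-style signalling
    """
    bytes_per_transfer = width_bits / 8
    return bus_mhz * 1e6 * transfers_per_cycle * bytes_per_transfer / 1e9


# Itanium 2: 400MHz, 128-bit bus, matching the 6.4GB/s DMH figure.
print(round(peak_bandwidth_gbs(400, 128), 3))
# 64-bit memory on a 133MHz bus: SDRAM (one fetch per cycle) vs DDR (two).
print(round(peak_bandwidth_gbs(133, 64, 1), 3), round(peak_bandwidth_gbs(133, 64, 2), 3))
```

The same function gives 0.528GB/s for 64-bit/66MHz PCI and 1.064GB/s for 64-bit/133MHz PCI-X; that doubling is one plausible reason PCI-X is attractive for high-speed interconnects.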
Parallel CPUs (2)
Parallel high-end CPUs: Hammer/Opteron
- 2-way: 5.4GB/s memory bandwidth
  - Not shared, therefore scales well
  - Ideal for memory-intensive calculations
  - cc-NUMA problems for Linux?
- 4-way

Motherboard Trends
- Previously we chose dual CPUs
  - Only low-end CPUs not supported (Celeron, Duron)
  - Boards not much dearer
  - Vague feeling that PP work would saturate the NIC and/or bus with >2 CPUs
- In future
  - Only top-end CPUs supported (e.g. Xeon)
  - Boards dearer
  - Need to monitor whether increased costs are still offset by increased density

Other vendors
There are still other vendors in the marketplace: SGI, SUN, COMPAQ, HP, IBM
- SGI: R14k 600MHz, SPECint 483, SPECfp 499
  - Not very fast, but large-scale cc-NUMA SMP systems (1024 processors)
  - Moving to IA-64
- SUN: UltraSPARC III 1050MHz, SPECint 537, SPECfp 701
  - Speed OK, but medium-sized SMP systems
- COMPAQ/HP: AlphaServer systems, EV68 1.25GHz, SPECint 928, SPECfp 1327
  - Fast CPU and systems, 1-32 CPU SMP
  - Moving to IA-64 with HP
- HP: 750MHz PA-8600, SPECint 569, SPECfp 526
  - Heavily involved with IA-64
  - 1GHz Itanium 2: SPECint 807, SPECfp 1356
- IBM: Power4, SPECint 804, SPECfp 1202

Summary
- Don't just judge on clock speed
- A long way from RISC
- Can we consider AMD for general-purpose user batch?
- Keep re-costing the optimal number of CPUs per box
- Keep watching blades
- Everything will be different tomorrow!
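To make "Don't just judge on clock speed" concrete, the sketch below divides some of the SPEC figures quoted in this talk by clock rate. SPEC-per-GHz is only a crude illustrative metric, not one defined by SPEC, and the choice of CPUs is arbitrary; `Cpu` and `per_ghz` are hypothetical names.

```python
from typing import NamedTuple


class Cpu(NamedTuple):
    name: str
    clock_ghz: float
    specint: int
    specfp: int


# SPEC figures as quoted in this talk (October 2002).
CPUS = [
    Cpu("Pentium 4 2.8GHz", 2.8, 976, 915),
    Cpu("AthlonXP 2.25GHz (2800)", 2.25, 898, 782),
    Cpu("Alpha EV68 1.25GHz", 1.25, 928, 1327),
    Cpu("Itanium 2 1GHz", 1.0, 810, 1356),
]


def per_ghz(cpu: Cpu) -> tuple[float, float]:
    """SPEC score per GHz: a rough proxy for work done per clock cycle."""
    return cpu.specint / cpu.clock_ghz, cpu.specfp / cpu.clock_ghz


# Rank by SPECfp per GHz, highest first.
for cpu in sorted(CPUS, key=lambda c: per_ghz(c)[1], reverse=True):
    int_rate, fp_rate = per_ghz(cpu)
    print(f"{cpu.name:26s} SPECint/GHz {int_rate:6.0f}  SPECfp/GHz {fp_rate:6.0f}")
```

The 2.8GHz P4 and the 1.25GHz Alpha EV68 have similar SPECint scores, yet per GHz the Alpha does more than twice the work; on raw SPECfp the 1GHz Itanium 2 leads all the CPUs quoted here.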