ISPD2005 Fast IntervalValued Statistical Interconnect Modeling And Reduction James D. Ma and Rob A. Rutenbar Dept of ECE, Carnegie Mellon University {jdma, [email protected]} Funded in part by C2S2, the MARCO Focus Center for Circuit & System Solutions New Battlefield: Manufacturing Variations CMOS scaling Good for speed Good for density BOX BOX BOX Bad for variation Bad for manufacturability Bad for predictability No longer realistic to regard device or interconnect as deterministic Continuous random distribution with complex correlations Slide 2 New Problem: Statistical Analysis Statistical static timing analysis

Propagate correlated normal distribution A limited number of operators: sum and maximum Statistical interconnect timing analysis Require a richer palette of computations Not easy to represent statistics and push them through model reduction algorithms max, max, ? ? Slide 3 Approaches to Statistical Interconnect Analysis Straight-forward Monte Carlo simulation Control-theoretic (model order reduction)

Repeat model reduction algorithms at the outermost loop General, accurate, but computationally expensive Based on perturbation theory [Liu-et al, DAC99] Multi-parameter moment matching [Daniel-et al, TCAD04] Circuit performance evaluation Low-order analytical delay formula [Agarwal-et al, DAC04] Asymptotic non-normal probability extraction [Li-et al, ICCAD04] Classical interval delay analysis [Harkness-Lopresti, TCAD92] Our interest new correlated interval technique for representing statistics inside algorithms Slide 4 New Interval Ideas Classical interval Define two end-points No inside information Unable to consider correlations lo -1

hi x 1 -2 xx -1 Affine interval Define central point and radius Keep source of uncertainties Handles correlations by uncertainty sharing -x2 -x1 1 x0 x1 x2 x = x0 + x1 1 + x2 2 R (xa) = | x1 | + | x2 | 2 Should be 0 ! x x = (x0 x0) + (x1 x1) 1 + (x2 x2) 2 = 0 Slide 5 Affine Arithmetic: An Overview x = x0 + xi i y = y0 + yi i

z = x0 + y0 + (xi + yi) i Results are still affine (accurate or conservative) z = x0 y0 + (x0 yi + y0 xi) i + (xi i) (xi i) Replace second-order terms with one new uncertainty term z = x0 y0 + (x0 yi + y0 xi) i + R (xy) Develop a library for most affine arithmetic operations More accurate or efficient approximations are also available Slide 6 From Intervals to Statistics Statistical assumption for the uncertainty symbols? Uniform distribution? Keep conservative bounds Not realistic for modeling manufacturing variations Choose normal distribution

= 0, 2 = 1 for each symbol i Probability not equal in the interval Model the central mass of the infinite, continuous distribution Essential assumption Mechanics of calculation for finite affine intervals are a reasonably good approximation of how statistics move through the same computations Slide 7 Putting Altogether: From Intervals to Algorithms Scalar-valued linear solve Backward substitution 3 0 0 0 4 2 3 0 4 x1 1 x2 = 10 16 x3 x1 5

x2 = 1 4 x3 Classical interval-valued linear solve Backward substitution Classical interval arithmetic [1 5] 0 [1 7] 0 [1 3] [1 5] 0 0 [6 2] x1 1 x2 = 10 16 x3 x1 x2 x3 = [0.3 55] [7.3 30] [8 2.7] Slide 8 From Intervals to Algorithms (Contd) Affine interval-valued linear solve 1 x1 3 + 1 + 2 0 4 1 + 23

x2 = 10 0 2 + 1 3 + 1 3 16 x3 0 0 4 + 1 3 x1 x2 = x3 Affine Classical 5 31 3 2 + 1.83 2.75 1 + 3.41 43 2.34 4 1 + 3 Sample matrix element intervals & scalar solve Cube for the range of x1, x2, x3 Polytope for the range of x1, x2, x3 Slide 9 Our New Approach: Affine Interval-Valued Statistical Interconnect Model Reduction [Ma-Rutenbar, ICCAD2004] Replace scalar computation with interval-valued computation by pushing intervals through chain of model reduction

Stop, and repeatedly sample a reduced set of intervals Continue with scalar-valued computation Obtain delay distribution Sampling Interval computation Reduced set of intervals Scalar computation delay Represent variational RLC elements as correlated intervals Slide 10 Interval Modeling of Interconnect Parameters Global variations inter-die Local variations intra-die Affect device and interconnect close to each other, in a similar way Linearized combination of global and local variations

Affine forms Affect all the device and interconnect, in a similar way R = R0 + (Ri i ) + (Rj j) + (Rk k) C = C0 + (Ci i ) + (Cj j) (Ck k) L = L0 + (Li i ) (Lj j) (Lk k) One variation source may contribute to multiple RLCs & lead to correlation Any variation can have positive or negative impact on RLC Slide 11 Interval-Valued AWE: 1st Generation LU decomposition Hankel matrix & vector Vandemonde matrix Solve for residues Poles/residues

Transient analysis Delay distribution Scalars Sampling Solve for poles Intervals MNA formulation Interval-valued MNA and LU for model reduction Interval-valued pole/residue analysis Mostly fundamental affine operations Compare intervals based on their central values Obtain a reduced, small set of interval poles and residues Sample and continue scalar transient analysis Monte Carlo sampling over this reduced model is very fast Similar approach for intervalvalued PRIMA Slide 12 Interval-Valued AWE: 2nd Generation 1st improvement Replace MNA formulation & LU decomposition with path-tracing for tree-structured circuits to compute interval-valued moments much

more efficiently 3 2 1 5 0 C 4 6 0 1 C 2 R 3 R 4 R C R 5 R 6 C 2nd improvement Stop interval-valued computation at moments, not poles/residues

Then switch to sampling and scalar-valued computation Slide 13 1st Improvement: Interval LU vs. Path-Tracing Path-tracing DC analyses for moments via depth-first search Tree topology does not change DFS only once Tracing order can be stored and remembered Interval estimation errors Like floating-point errors, but more macroscopic, not so easy to ignore The longer the chain of computation, the more errors LU decomposition range range Path-tracing Replace interval LU with interval path-tracing Reduce number of approximate affine operations significantly Improve greatly both efficiency and accuracy Slide 14

Sampling Moments Hankel matrix & vector & Vandemonde matrix Solve for poles/residues Poles/residues Transient analysis & delay distribution Scalars Tree & path-tracing Intervals Interval-Valued AWE: 2nd Generation A reduced, small set of interval moments via intervalvalued path-tracing Sample over moment intervals to produce a set of scalar moments Continue scalar computation, just like a standard AWE Monte Carlo sampling over the reduced model is very fast Similar approach for interval path-tracing-based PRIMA

Slide 15 2nd Improvement: AWE Interval/Scalar Tradeoff Pervasive interval computation Sampling Interval root finding Interval poles/residues Scalar delay Scalars Interval moments Intervals Interval MNA & LU Hybrid interval/scalar strategy Interval tree & path-tracing Interval moments Scalar root finding Scalar poles/residues Sampling 2nd generation Intervals 1st generation Scalars

Scalar delay Interval computation for large-scale near-linear model reduction Scalar sampling & small-scale nonlinear root finding Similar tradeoff for 2nd generation of interval-valued PRIMA Slide 16 Benchmarks 3 tree-structured RC(L) interconnects 6 21 variation symbols One global, shared by all RLCs Others local, shared by a cluster of nearby RLCs Relative of global / local vars

From 120 to 2400 elements Deterministic unit step input 1 2 3 4 5 20% / 10%, 10% / 20%, 5% / 30% Able to accommodate Any number of uncertainties, from most types of variation sources Any reasonable combinations of global / local variations Slide 17 2nd Generation: Implementation Interval arithmetic library and AWE/PRIMA in C/C++ Compare distribution of 50% delay Monte Carlo 2nd generation (statAWE/statPRIMA) vs. RICE 4/5 used in a simple Monte Carlo loop (RMC) Determine proper number of Monte Carlo samples using standard confidence interval techniques [Burch-et al, TVLSI93] Specify accuracy within 1%, with 99% confidence level ~ 3000 samples for each design combination

Sample RC(L)s Scalar AWE/PRIMA vs. RC(L) intervals Interval AWE/PRIMA Monte Carlo Sample intervals Slide 18 Pole Distribution At the end of 2nd generation interval AWE/PRIMA, an intervalvalued reduced model is obtained How well do the reduced interval model produce scalar poles? design0, 123 RLCs, 5% global variation, 30% local variation, 6 variation terms, 8th order AWE, distributions of 4 dominant poles on complex plane 10000 samples of RLCs 10000 samples in moment intervals Monte Carlo simulation Interval-valued estimation Slide 19 Accuracy & Efficiency Mean Delay Err Stdev Err Speed-up statAWE1.7% 1.8% 11

statPRIMA2.5% 2.6% 10 CPU time: 1 interval analysis 300 deterministic runs 25%Delay PDFs ex: 1275 RCs, 5% global, 30% local, 4 th order models 25% 20% 15% 10% 20% AWE Monte Carlo 15% 10% Interval 5% 5% 0% 0% delay PRIMA Monte Carlo Interval delay Slide 20 Interval/Scalar Tradeoff Compare 4 AWE

interval strategies Interval Path-tracing MNA & LU Moments I (2nd gen.) 3.5 3 2.5 2 1.5 1 Run time vs. mean error II IV I III 0.5 0 IV (1st gen.) Run time vs. std error II IV I III 3.5 3 2.5 2 1.5 1 0.5 0

0% log (run time) log (run time) Poles/residues III II 2% 4% 6% mean error 8% 10% 0% 2% 4% std error 6% 8% If ~510% error is OK, one can still use intervals pervasively 1st 2nd generation: ~10X less CPU, ~34X less %error Slide 21

Conclusions and Ongoing Work Affine interval model & statistical interpretation allow us to Improved 2nd generation Represent the essential mass of a random distribution Preserve 1st-order correlations among uncertainties Retarget classical model reduction to interval-valued computations Smarter interval linear solves and interval/scalar tradeoffs ~10X faster, and ~34X less %error Whats next? Works well for interconnect reduction but how general is the idea? Can we bring statistics into arbitrary CAD tools efficiently? In progress: interval-valued physics-based TCAD/DFM modeling Slide 22