Computational Performance Optimisation for Statistical Analysis of the Effect of Nano-CMOS Variability on Integrated Circuits

The intrinsic variability of nanoscale VLSI technology must be taken into account when analyzing circuit designs to predict likely yield. Monte Carlo (MC) and quasi-Monte Carlo (QMC) based statistical techniques do this by analysing many randomised or quasirandomised copies of circuits. The randomisation must model forms of variability that occur in nano-CMOS technology, including “atomistic” effects without intradie correlation and effects with intradie correlation between neighbouring devices. A major problem is the computational cost of carrying out sufficient analyses to produce statistically reliable results. The use of principal components analysis, behavioural modelling, and an implementation of “Statistical Blockade” (SB) is shown to be capable of achieving significant reduction in the computational costs. A computation time reduction of 98.7% was achieved for a commonly used asynchronous circuit element. Replacing MC by QMC analysis can achieve further computation reduction, and this is illustrated for more complex circuits, with the results being compared with those of transistor-level simulations. The “yield prediction” analysis of SRAM arrays is taken as a case study, where the arrays contain up to 1536 transistors modelled using parameters appropriate to 35 nm technology. Savings of up to 99.85% in computation time are reported.


Introduction
Anticipating the impact of device variability on performance is a critical aspect of design procedures for integrated circuits. With nanoscale technology, "intradie" variability, causing behavioural variation from device to device within each die, is an important consideration [1]. With some dimensions approaching atomic scales, intrinsic atomic-scale variations such as line edge roughness and dopant granularity have become the main sources of such variation [2, 3]. These "atomistic" variabilities are random in nature and result in "within-die" fluctuation which cannot be disregarded. Traditional variability analysis, based on "worst case" corner models [4] and guard bands for parameter variations, is likely to be very pessimistic in its estimations of the effects of the variability. Consequently, new circuit analysis techniques are needed which adopt a statistical treatment of the intradie variability of device performances. This has led to much interest in statistical analysis which can take into account parameter variation in all parts of a design in a single comprehensive computational procedure, allowing the impact on yield to be efficiently estimated.
Because of the high dimensionality of the parameter space, it is very difficult to derive analytical models of large-scale ICs for analysis. Sets of equations describing specific circuit performance parameters, like timing or yield, in terms of the huge number of parameters would be very difficult to derive analytically. Therefore, the use of conventional statistical methods, based on the analysis of such equations, has restricted applicability for the variability analysis of nanoscale ICs. With Monte Carlo (MC) analysis methods, the integrated circuits are simulated directly, and there is no need to derive differential equations that describe the dependency of the properties of interest on circuit parameters. The only requirement is that the circuit properties to be modelled are capable of being described by probability density functions (pdfs) that are dependent on the pdfs of circuit parameters and input variables. Monte Carlo analysis proceeds by generating a random sample of the value of each circuit parameter and each input variable, based on their known statistical properties. It then simulates the circuit thus obtained, using a finite difference analysis package, to compute a random sample of the circuit property of interest. The process is repeated many times to obtain a sequence of samples of the property of interest from which statistical models can be derived.
In nanoscale technology, "atomistic" variabilities are random and generally uncorrelated from device to device. According to Srivastava et al. [4], a critical form of variability is gate-length variability, which occurs "interdie" due to variation in exposure time and "intradie" because of other lithography effects [5]. Device threshold voltages will be dependent on gate lengths and channel doping concentration, and variation in these parameters will cause comparable amounts of independent interdie variability and spatially correlated intradie variability. Intradie variations will therefore have both spatially correlated and uncorrelated contributions. The effect of the correlation in "intradie" variation must be given consideration.
MC analyses are particularly suitable for nanoscale IC statistical simulation to achieve statistical estimates of properties of interest. General conclusions drawn from the analysis of many randomised samples of a circuit can be made representative of the effects of variability. To quantify these conclusions, statistical averages are produced. The statistical reliability of the conclusions generally improves as the number of circuit samples increases, though the rate of improvement can often be increased by carefully choosing the samples. MC techniques are especially useful for predicting the "yield" from many copies of the circuit when these are manufactured with typical accuracy and parameter variations.
A major problem is the computation required for carrying out sufficient analyses to produce statistically reliable averages. The introduction of intradie correlation into circuit models increases the computational complexity considerably. To simplify the computation, a form of Extreme Value Theory known as statistical blockade (SB) [6, 7] may be employed to identify "rare event" circuits whose responses fall in the "tails" of their pdfs and are therefore worth analyzing. These circuits are needed for circuit yield failure predictions. Circuits in the "body" of the distributions are not analysed. The use of "quasi-Monte Carlo (QMC)" analysis with "low-discrepancy sequences" achieves further computation reduction [8-10]. The use of "statistical behavioural circuit blocks (SBCB)" and "Principal Component Analysis (PCA)" to reduce the dimensionality of the analyses [8] also reduces computational complexity. PCA allows the effects of correlation within the parameters of each device and from device to device to be partitioned, thus reducing the sizes of the matrices which characterize the correlation and making them easier to process. Despite these economies, the statistical analysis of large and complicated ICs requires computational resources that can only be provided by parallel and/or distributed computing, for example, by the Condor framework [11, 12].

This paper contains seven sections. The first is an introduction, and Section 2 deals with the use of MC techniques for the statistical analysis of nanoscale technology with the introduction of correlation due to the intradie proximity of components. Section 3 discusses two methods of reducing the dimensionality of the input parameter space to achieve computational efficiency in MC simulations, PCA and SBCB. Section 4 introduces the concept of Extreme Value Theory (EVT) and explores the algorithm known as "statistical blockade" (SB) [11]. The potential for using this technique to achieve major computational savings is explored and illustrated by examples. Section 5 investigates the use of quasi-Monte Carlo (QMC) techniques and "low-discrepancy" sampling to achieve further efficiency improvements over what was achieved in earlier sections with Monte Carlo circuit simulation. The effect of using a "Sobol" low-discrepancy sequence generator to replace the uniformly distributed pseudorandom number generator previously used to produce the required Gaussian variation is discussed and illustrated by example. Section 6 evaluates the results obtained with VLSI SRAM circuits, and Section 7 presents conclusions. The use of parallel processing for efficiently undertaking the intensive computation required will be discussed, taking into account the intrinsically parallel nature of massive MC simulations.

MC Simulation for the Design of Nanoscale VLSI Circuits
2.1. Monte Carlo Methods. Monte Carlo methods use repeated pseudorandom sampling of the behaviour of mathematical equations, or real or simulated systems, to solve mathematical problems or to determine the properties of systems [13]. In this work, the systems are integrated circuits simulated using a finite difference circuit analysis technique, that is, the public licensed version of SPICE, NGSPICE [14]. The transient behaviour of sets of highly complex multidimensional equations describing integrated circuits may thus be analysed by repeated random sampling to produce observations to which statistical inference can be applied to obtain information about the equations or systems. Monte Carlo methods use pseudorandom processes implemented in software. The simulation is not required to be numerically identical to the true physical process, since the aim is to produce statistical results such as averages, expectations, and distributions rather than deterministic numerical measurements. The derivation of such results requires a "sampling" of the population of all possible modes of behaviour of the system. One of the most often quoted applications of Monte Carlo methods is the evaluation of multidimensional integrals [15] such as

I = ∫_R f(x) dx.

Here, x is the vector (x_1, x_2, x_3, ..., x_d), f is some function of the d variables of x, and R denotes the region of integration. f(x) may be evaluated at regularly spaced points or uniformly distributed random points in d-dimensional space as a means of evaluating the integral. The problem with regularly spaced points is that the number of them, say N, must increase exponentially with the dimension d if the error is not to increase exponentially with d. With regular spacing and a fixed value of N, the error is known to worsen as the d-th root of its one-dimensional order of magnitude [16]. The error for one-dimensional "trapezoidal rule" integration with N regularly spaced points can be shown to be proportional to 1/N^2, whereas the d-dimensional equivalent of the trapezoidal rule has an error proportional to 1/N^(2/d). This is "the curse of dimensionality" [17]. With regular sampling and fixed N, as d increases, each dimension must be sampled more and more sparsely and less and less efficiently, since more and more points will have the same value in a given dimension.
Monte Carlo sampling avoids the inefficiency of the rectangular grids created by regular sampling by using a purely random set of N points uniformly distributed over the d-dimensional region R. The advantages of the Monte Carlo method with randomly distributed points now become apparent. The Monte Carlo error has the property that, for high dimensions d, it is proportional to 1/√N, which means that, to reduce the error by a factor of 10, the sample size N must be increased by a factor of 100. This behaviour is independent of the dimension d when the input variables are statistically independent, which means that there is no correlation between them. It can be simply illustrated by comparing the performances of regular sampling and Monte Carlo integration for a multidimensional sine integrand. Figures 1(a), 1(b), and 1(c) show the error for d = 3, 4, and 5, respectively, for both regular sampling and uniformly distributed random sampling over a range of values of N. The random sampling has zero correlation from dimension to dimension and from sample to sample. The increasing advantage of independent random sampling over regular sampling as d increases is clear.
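The comparison can be sketched numerically. The snippet below is a minimal illustration (not the experiment behind Figure 1): it uses a separable cosine integrand, a hypothetical stand-in whose exact value is known in closed form, and compares a regular midpoint grid against Monte Carlo sampling at the same budget of N = 1024 points in d = 10 dimensions.

```python
import itertools
import math
import random

def f(x):
    # separable test integrand: the exact integral over [0, 1]^d is (sin 1)^d
    p = 1.0
    for xi in x:
        p *= math.cos(xi)
    return p

def regular_grid(d, n_per_dim):
    # midpoint rule on a regular grid: N = n_per_dim**d points in total
    pts = [(i + 0.5) / n_per_dim for i in range(n_per_dim)]
    total = sum(f(x) for x in itertools.product(pts, repeat=d))
    return total / n_per_dim ** d

def monte_carlo(d, n, rng):
    # N uniformly distributed random points; error ~ 1/sqrt(N) regardless of d
    return sum(f([rng.random() for _ in range(d)]) for _ in range(n)) / n

d = 10
exact = math.sin(1.0) ** d
rng = random.Random(42)
err_grid = abs(regular_grid(d, 2) - exact)       # 2**10 = 1024 grid points
err_mc = abs(monte_carlo(d, 1024, rng) - exact)  # same budget of 1024 samples
```

With only two grid points per dimension the regular grid samples each axis very coarsely, and its error exceeds the Monte Carlo error at the same total budget, illustrating the curse of dimensionality.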
Much research has been devoted to finding ways of decreasing the Monte Carlo error even further to make the technique still more efficient. One approach has been to use variance reduction techniques [18], which reduce the variance of f by transforming it to another function whose integral is the same. Another approach is to use "quasirandom" or "low-discrepancy" sampling of the space R. The use of such quasirandom sampling for numerical integration is referred to as "quasi-Monte Carlo" integration. Quasirandom sampling based on "scrambled nets" [18-22] has the property that, for "well-behaved" functions, the error becomes proportional to N^(-3/2) (log N)^((d-1)/2), which is much less than the 1/√N error of traditional Monte Carlo integration. Section 5 will discuss quasi-Monte Carlo methods. Other approaches use "recursive stratified sampling" [23], which breaks down a Monte Carlo calculation into a series of Monte Carlo subcalculations with feedback from each stage to decide how best to continue the series. Thus, the calculation is adapted to the application. In the case of integration, the region of integration is progressively divided into subvolumes to concentrate the sampling in regions where the variance of the function is largest or of the most interest. This makes the sampling more efficient. In the example of the multidimensional sine function considered above, there would be a higher probability of samples occurring at the edges of the function, where the rate of change is greater than in the central part. The well-known "MISER" algorithm of Press and Farrar [24] is based on this approach.
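The stratification idea can be illustrated with a deliberately simple one-dimensional sketch (the MISER algorithm itself is recursive and multidimensional): dividing [0, 1] into equal strata and drawing one sample per stratum spreads the samples evenly and sharply reduces the variance relative to plain Monte Carlo at the same sample count. The integrand and stratum layout here are illustrative choices, not taken from [23, 24].

```python
import math
import random

def plain_mc(f, n, rng):
    # ordinary Monte Carlo: n independent uniform samples on [0, 1]
    return sum(f(rng.random()) for _ in range(n)) / n

def stratified_mc(f, n, rng):
    # one uniform sample inside each of n equal-width strata of [0, 1]
    return sum(f((i + rng.random()) / n) for i in range(n)) / n

f = lambda x: math.sin(math.pi * x)
exact = 2.0 / math.pi   # the true value of the integral of sin(pi x) over [0, 1]

rng = random.Random(7)
trials = 200
rmse_plain = math.sqrt(sum((plain_mc(f, 64, rng) - exact) ** 2
                           for _ in range(trials)) / trials)
rmse_strat = math.sqrt(sum((stratified_mc(f, 64, rng) - exact) ** 2
                           for _ in range(trials)) / trials)
```

Over repeated trials, the stratified estimator's RMS error is many times smaller than that of plain Monte Carlo with the same 64 samples, which is the gain adaptive schemes such as MISER exploit recursively.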
The "Vegas" Monte Carlo approach of Lepage [25] is based on "importance sampling." The idea, when used for integration, is to use the pdf of the function to determine how sampling points can be concentrated in subspaces that produce values of the function that contribute most to the integral. Vegas has been amended to incorporate stratified sampling as well as importance sampling, and it also uses a form of variance reduction in subspaces where importance sampling is inappropriate because the sampling turns out to be too sparse [26]. From these well-known references and many others on Monte Carlo integration, it is clear that much improvement in efficiency can be gained over the original Monte Carlo approach.
The ideas proposed for integration have inspired similar ideas for efficiency improvement when Monte Carlo techniques are used for simulation, and these prove to be especially valuable for VLSI circuit simulation where the dimensionality and complexity are very high.

Monte Carlo Simulation.
Monte Carlo simulation is the application of Monte Carlo methods to study properties of systems having stochastic components. It uses repeated pseudorandom sampling of input variables to determine the behaviour of some physical system as characterized by a computer model. In this work, the physical system is an integrated circuit modelled by SPICE, the input variables are component values which vary due to the uncontrollability of the manufacturing effects referred to in the introduction, and the behaviour we are interested in may be the viability, or otherwise, of the circuit. With repeated sampling used to simulate the fabrication of batches of nominally identical integrated circuits with the specified component variation, the estimated probability of a circuit being viable, that is, that it works, can be considered an estimate of the expected "yield," that is, the percentage of working circuits within a batch. The criteria that determine viability are many, including correct logical operation, the power consumption, and the propagation delay in the whole circuit or parts of it.
As argued in [18], Monte Carlo simulation can be formulated in terms of integration. The direct simulation approach provides an intuitive way of defining the problem, while transforming it into an integration formulation can be useful when studying theoretical properties of the estimators obtained, especially when variance reduction or quasi-Monte Carlo techniques are used. A comparison of the direct view of simulation versus an integration formulation can be summarised as follows.
The direct simulation approach samples observations of the random vector, X, of inputs to and parameters of the simulated system and, for each such vector, calculates the outputs that are of interest. We thus obtain a random distribution of each output measurement, say g(X), which is of interest.
The integration formulation of simulation samples the "source of randomness" U as a vector of independent uniformly distributed random numbers. This is transformed into a vector of observations X with the appropriate multivariate distribution (e.g., Gaussian) and covariance matrix C by calculating

X = A F^{-1}(U),

where F^{-1} is the appropriate "inverse cumulative distribution function" (ICDF), applied element by element, and A is a matrix such that

A A^T = C.

The covariance matrix C expresses the interdependency that exists between different elements of X and may be derived from assumptions about the effects of proximity, as explained in chapter 2 of [4]. There are many ways of deriving A for a given covariance matrix C, the best known and fastest being Cholesky decomposition. The required matrix A is not unique, and an alternative is to calculate

A = [√λ_1 v_1, √λ_2 v_2, ..., √λ_n v_n],

where λ_1, λ_2, ..., λ_n are the eigenvalues and v_1, v_2, ..., v_n are the corresponding eigenvectors of C. C must be positive semidefinite to be a covariance matrix. V is the matrix of eigenvectors and Λ is the diagonal matrix of eigenvalues as supplied by the MATLAB function: [V, Λ] = eig(C).
In comparison to the Cholesky decomposition, the eigenvalue/eigenvector method is more intuitive, and a recent paper by Keiner and Waterhouse [27] gives a more efficient way of performing the same transformation.
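As a concrete sketch of the two factorisations (using a small, hypothetical covariance matrix), the following shows that both the Cholesky factor and the eigenvalue/eigenvector construction satisfy A A^T = C, and that either can turn ICDF-transformed uniforms into correlated observations:

```python
import numpy as np
from statistics import NormalDist

# hypothetical 3x3 covariance matrix C (symmetric, positive definite)
C = np.array([[4.0, 1.2, 0.8],
              [1.2, 2.0, 0.5],
              [0.8, 0.5, 1.5]])

# Cholesky decomposition: the best-known and fastest way to get A with A A^T = C
A_chol = np.linalg.cholesky(C)

# eigenvalue/eigenvector alternative: A = V * sqrt(Lambda)
lam, V = np.linalg.eigh(C)
A_eig = V @ np.diag(np.sqrt(lam))

# the source of randomness U: independent uniforms, mapped through the Gaussian ICDF
rng = np.random.default_rng(0)
U = rng.random((3, 50000))
Z = np.vectorize(NormalDist().inv_cdf)(U)  # element-wise F^{-1}(U)
X = A_chol @ Z                             # observations with covariance close to C
```

Both factors reproduce C exactly even though they differ from each other, confirming that A is not unique; the empirical covariance of the generated samples converges to C as the sample count grows.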
Calculating the distribution, between limits, of g(X(U)) = h(U) for an output measurement of interest, say g, can now be seen as a process of integration over the range of the multidimensional vector U. The integration formulation simply reexpresses the simulation in terms of input vectors, U, of uncorrelated uniformly distributed pseudorandom elements which are transformed into input vectors X intended to be representative of the observations expected of true inputs and parameters. This approach has the immediate advantage of being able to model correlation between elements, such as may occur intradie on integrated circuit devices or interdie from sample to sample within a batch of integrated circuits.
The direct approach is clearly applicable when actual component or parameter measurements are available, or when they have been synthesised, as for the transistor set provided by RandomSPICE [28]. The randomisation is then achieved by selecting randomly from the sets of parameters provided. Although a pseudorandom modelling process of physical effects was used to generate the RandomSPICE parameters, they could have been obtained by direct measurement of real devices. Then a model of the parameter variation is not needed, since the true natural physical variation, with its inherent distribution and correlation, is fed directly into the series of simulations.
However, even when real sets of parameters are available, instead of using them directly, it may be advantageous to produce a model of them as the transformation of a smaller set of independent uniformly distributed random variables. Then the integration formulation may be adopted, based on models derived from real data. The models may be derived by employing Principal Components Analysis (PCA) to extract a smaller set of statistically independent parameters that may be transformed back to the complete set with little distortion. The dimensionality may thus be reduced, and each of the independent parameters may be modelled as the transformation of a uniformly distributed random variable. If the independent variables may be considered Gaussian, it is straightforward to model each of them as a transformed uniform random variable. The transformation to Gaussian is achieved by the Gaussian ICDF. Transformations to independent random variables with distributions other than Gaussian may be achieved by different ICDFs. Multivariate versions of these functions allow correlation to be introduced. With these different distributions, essentially the same methodology for statistical delay estimation as is used in this work with Gaussian distributions, Pareto distributions, and low-discrepancy sequences remains valid.
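As a small sketch of this step (the distribution parameters are illustrative, not taken from the paper): the Gaussian ICDF is available directly in the standard library, while for a distribution such as the Pareto the ICDF can be obtained by inverting its CDF in closed form.

```python
import random
from statistics import NormalDist, mean

def gaussian_icdf(u, mu=0.0, sigma=1.0):
    # Gaussian ICDF: maps a uniform u in (0, 1) to a N(mu, sigma^2) sample
    return NormalDist(mu, sigma).inv_cdf(u)

def pareto_icdf(u, xm, a):
    # inverting the Pareto CDF F(x) = 1 - (xm/x)**a gives x = xm * (1-u)**(-1/a)
    return xm * (1.0 - u) ** (-1.0 / a)

rng = random.Random(3)
uniforms = [rng.random() for _ in range(100000)]

# the same uniform stream feeds either target distribution
gauss = [gaussian_icdf(u, 10.0, 2.0) for u in uniforms]    # illustrative mu, sigma
pareto = [pareto_icdf(u, 1.0, 3.0) for u in uniforms]      # illustrative xm, a
```

The sample means converge to the theoretical values (10 for the Gaussian; a·xm/(a-1) = 1.5 for this Pareto), confirming that each ICDF reshapes the uniform stream into the intended distribution.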
Apart from the likely reduction in dimensionality, this transformation of a direct simulation approach into an integration-like formulation allows a much larger set of randomised devices to be generated than is available in the original set. Hence more simulations may be run with different sets of parameters based on parameters obtained by measuring real devices.
Viewed in either formulation, Monte Carlo simulation samples the probability distribution of all the input variables and system parameters to produce many repeated versions of the system. These are in turn analysed to determine how certain key output measurements vary due to the input variability. A histogram of each key output measurement gives an estimate of its likely distribution, the estimate becoming more and more reliable as the number of simulations increases. Since the number of simulations must be restricted for practical reasons, the accuracy of these results is also limited by practicality.
Where valid assumptions can be made about the shape of the distribution, for example, that it is Gaussian or Chi-squared, a maximum likelihood fit to the histogram can be made on such assumptions. Such a fit would produce a value of mean and variance, thus allowing a pdf as shown in Figure 2 to be drawn. Assuming the measurement to be a delay that is required to be less than T for viability, and a Gaussian distribution, the yield of the circuit can now be estimated from the area under the pdf "tail" from T to infinity, which may be derived from the "complementary error function" evaluated at T.
The accuracy of this estimation will depend on the number of simulations, which must be limited. Unfortunately, the effect of the limitation will be most serious over the tails of the distribution, which are the part we are most interested in. For example, for a Gaussian pdf, the probability of being more than three standard deviations greater than the mean is P(3) = 0.5 × erfc(3/√2), where erfc is the traditional Gaussian "complementary error function." P(3) is about 0.0013, meaning that, if 1000 circuits are simulated, we can expect to find only about one value of the key output in this region. Many more than one value would be needed for reasonable accuracy. This illustrates the need for complexity reduction techniques when applying Monte Carlo simulation to integrated circuit variability, and such techniques will be considered in the next sections.
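The arithmetic is worth making concrete; a few lines reproduce the P(3) figure and the expected count of tail samples in 1000 simulations:

```python
import math

def gaussian_tail_prob(k):
    # probability of exceeding the mean by more than k standard deviations
    return 0.5 * math.erfc(k / math.sqrt(2.0))

p3 = gaussian_tail_prob(3.0)
expected_in_tail = 1000 * p3   # expected tail samples from 1000 simulated circuits
```

With p3 ≈ 0.00135, only about 1.3 of 1000 simulated circuits land beyond three sigma, which is far too few samples to characterise the tail reliably.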
In general, Monte Carlo methods proceed as follows.
(1) The characteristics of the input vectors are determined.
(2) Random vectors are generated with appropriate distributions and intercorrelation either directly or by transforming independent uniformly distributed random vectors.
(3) A deterministic computation is performed to simulate the behaviour of the system for each of the randomized input vectors.
(4) For each key output measurement, its pdf is estimated in the best way possible, given the inevitable limitations in the amount of data available.
(5) Deductions about the probability of certain events are made from the estimated pdf.

The procedure of an IC's Monte Carlo simulation is as follows.
(1) Find the statistical distribution for each parameter.
(2) Sample the statistical process to produce a value for each parameter.
(3) Parameterize one circuit and simulate it.
(4) Repeat for many copies of the circuit and obtain the statistical distribution of a specific measurement.
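A minimal sketch of this four-step loop is shown below. The "simulation" is a hypothetical RC-style delay formula standing in for a SPICE run, and the parameter distributions and viability threshold are invented for illustration.

```python
import random
from statistics import NormalDist, mean

def simulate_delay(r, c):
    # stand-in for a SPICE run: a hypothetical first-order RC delay model
    return 0.69 * r * c

rng = random.Random(11)
r_dist = NormalDist(1000.0, 50.0)   # step 1: assumed resistance distribution (ohms)
c_dist = NormalDist(1e-12, 5e-14)   # step 1: assumed capacitance distribution (farads)

delays = []
for _ in range(5000):                    # step 4: repeat for many circuit copies
    r = r_dist.inv_cdf(rng.random())     # step 2: sample each parameter
    c = c_dist.inv_cdf(rng.random())
    delays.append(simulate_delay(r, c))  # step 3: parameterize one circuit, simulate

t_max = 0.8e-9                           # illustrative viability criterion
yield_estimate = sum(d <= t_max for d in delays) / len(delays)
```

The resulting distribution of delays plays the role of the measurement histogram, and the fraction of copies meeting the delay criterion is the yield estimate.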

Introducing Intradie Correlation.
Correlation between device parameters may be due to the intradie proximity of components, or other causes. The device parameters may be direct physical quantities, for example, resistance or capacitance, or they may be coefficients of principal component vectors which are ultimately transformed back to physical quantities. Define a matrix P_mat of "column" vectors p_j, each containing n parameters:

P_mat = [p_1, p_2, ..., p_M].

Each column vector, p_j, of parameters is for a different randomised circuit copy. Within P_mat, there is a row-to-row correlation due to interdependencies between devices or the parameters of individual devices. To model intradie correlation between parameters due to proximity, let parameter i be defined for a point s_i = (x_i, y_i) on the chip. Many models of intradie correlation due to proximity have been proposed, including the widely known Pelgrom model [4] and two simpler approaches: "the exponential model" and the "Matern model," as analysed and compared by Hargreaves et al. [5]. The "exponential model" represents the correlation ρ(i, j) between parameters i and j as

ρ(i, j) = exp(-α d(s_i, s_j)),

where d(s_i, s_j) is the Euclidean distance between any two points s_i and s_j, in units of nanometers. This is assumed to be the Pearson correlation coefficient defined as

ρ(i, j) = cov(row_i, row_j) / (σ_i σ_j),

where σ_i and σ_j are standard deviations of parameters i and j, respectively, "cov" means covariance, and the rows are those of P_mat. A value for α may be calculated from measurements of actual device parameters, or manufacturer's data. Clearly ρ(i, i) = 1, and if it may be assumed that at a distance of D nm the correlation reduces to some small value, say ε, it follows that

α = -ln(ε) / D.

Assuming ε = 0.01 and D = 200, we find that α = 0.023.
Therefore, if we have a location s_i = (x_i, y_i) on the chip for each element i, a value of correlation ρ(i, j) between rows i and j may be modelled. Hence, we obtain a "row-to-row" (parameter-to-parameter) correlation matrix:

R = [ρ(i, j)],  i, j = 1, ..., n.

A similar matrix may be constructed for the more complex Matern model, which was found [1] to be more accurate. To provide an example for the simpler model, assume that there are four devices at s_1, s_2, s_3, and s_4, each with a parameter whose mean and standard deviation are μ and σ, respectively. We need to generate, for each Monte Carlo randomised circuit j, random variables p(1, j), p(2, j), p(3, j), and p(4, j) as the values of the four parameters to obtain a parameter vector p_j. To generate a pseudorandom variable for each device with the right amount of intradie correlation, we first need to deduce, from R, an appropriate row-to-row covariance matrix:

Σ = σ^2 R.

Then find matrix A such that A A^T = Σ using the "Cholesky Decomposition" or the "eigenvector-eigenvalue method" [27], which calculates A with (5).
Finally, generate a multivariate (4-element) correctly correlated random vector as follows:

p_j = μ + A z_j,

where z_j is a 4 by 1 vector of independent random or pseudorandom variables. This is generated from a vector u_j of uniformly distributed random or pseudorandom variables in the range 0 to 1 as follows:

z_j = F^{-1}(u_j),

where F^{-1} is the ICDF corresponding to the required pdf for the elements of z_j. We can do this repeatedly for each randomised circuit j, producing many column vectors p_j.
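The whole recipe, from device locations to correlated parameter vectors, can be sketched using only the standard library. The layout, μ, and σ below are invented for illustration; α = 0.023 follows from ε = 0.01 at D = 200 nm as above.

```python
import math
import random
from statistics import NormalDist

def exp_corr_matrix(locs, alpha):
    # exponential intradie model: rho(i, j) = exp(-alpha * Euclidean distance)
    n = len(locs)
    return [[math.exp(-alpha * math.dist(locs[i], locs[j])) for j in range(n)]
            for i in range(n)]

def cholesky(C):
    # lower-triangular A such that A A^T = C
    n = len(C)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(A[i][k] * A[j][k] for k in range(j))
            A[i][j] = math.sqrt(C[i][i] - s) if i == j else (C[i][j] - s) / A[j][j]
    return A

def sample_parameters(mu, sigma, A, rng):
    # u -> z = F^{-1}(u) -> p = mu + sigma * (A z): correlated Gaussian parameters
    icdf = NormalDist().inv_cdf
    z = [icdf(rng.random()) for _ in A]
    return [mu + sigma * sum(A[i][k] * z[k] for k in range(len(z)))
            for i in range(len(A))]

# hypothetical layout: four devices 100 nm apart on a line, alpha = 0.023
locs = [(0, 0), (100, 0), (200, 0), (300, 0)]
R = exp_corr_matrix(locs, 0.023)
A = cholesky(R)

rng = random.Random(1)
batch = [sample_parameters(1.0, 0.1, A, rng) for _ in range(20000)]  # mu=1, sigma=0.1
```

Over many sampled circuits, the empirical correlation between neighbouring devices approaches exp(-0.023 × 100) ≈ 0.10, as the exponential model prescribes for a 100 nm spacing.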
Introducing intradie variability into the multiple parameters of transistor devices is possible using the same approach as used for component values. In such applications, it is useful to partition a large correlation matrix into a number of smaller ones, each catering for one type of parameter or principal component. This is possible when it can be assumed that there is no correlation between the different types or principal components; for principal components this is guaranteed. Clearly, a different value of α can be used for each partition.
The example below demonstrates the effect of correlation on the statistical performance analysis of a binary full adder built from behavioural models of NAND gates. Consider the binary full adder shown in Figure 3, whose on-chip layout is assumed to be as shown in Figure 4.
With one delay parameter for each behavioural NAND gate [8, 9], as will be described in Section 3, there are nine parameters in total. Hence, there are just nine statistically varying parameters. Assuming the exponential model of intradie correlation with α = 0.007, the parameter-parameter correlation matrix R is found to be as in (17).
Repeating the same procedure with α = 10 (very low correlation) and 500 training circuits produces the histogram shown in Figure 6, whose mean is 8.647 ps and standard deviation is 0.318 ps. In comparison to the correlated case where α = 0.007, the mean remains approximately the same, but the standard deviation reduces from 0.345 ps to 0.318 ps, that is, by about 7.8%.

Dimension Reduction of MC Circuit Simulation
The fact that the error resulting from MC analyses is inversely proportional to the square root of the sample size does not mean that the same sample size is appropriate for any circuit, no matter how complicated it is. There are clear advantages in reducing the dimensionality of analysis problems, including the reduction of the computational complexity of the analyses. This section deals with two methods of reducing the dimensionality of Monte Carlo analysis. The first is Principal Components Analysis (PCA), which transforms the random variables required to characterize a circuit into a reduced number of statistically independent variables. PCA is also useful as a means of introducing intradie and interdie correlations. The second is the use of statistical behavioural circuit blocks (SBCB), which substitute functional but computationally simpler circuit models for device-level analogue subcircuits. The aim of this research is to apply them in new ways which may be suitable for inclusion in the ongoing NGSPICE open-source project [14].

Principal Component Analysis (PCA).
PCA is a technique for transforming samples of n variables into samples of a smaller number of variables by exploiting interdependency or correlation among the n variables. Referring to a set of M parameter vectors ("feature vectors") in an n-dimensional vector space, the transformation converts each parameter vector to a lower-dimensional vector in such a way that no information, or little information, is lost. This means that the original vectors, p_j for j = 1, ..., M, can be reconstructed from the transformed vectors either exactly or with a small error. The elements of the transformed (reduced-dimensional) vectors are the coefficients (or "loadings") of "principal component" (PC) vectors. PC vectors are eigenvectors of the n × n covariance matrix C, which are normalized and statistically independent.

Performing the Analysis.
Taking the summed squared differences between the elements of an original component ("feature") vector and a reconstructed one as a measure of the error or loss of information incurred by PCA: of all possible linear transformations to a lower-dimensional space, PCA is optimal in minimising this error over all vectors. Further, the PC vectors are conveniently ordered, in the sense that the first has the highest variance and accounts for as much variability as possible. Each succeeding PC vector has lower variance, and therefore less importance, but has the highest variance possible while being uncorrelated with all the previous ones.
PCA may be carried out by eigenvalue/eigenvector decomposition of the N × N covariance matrix C of the original data matrix, which is defined as

C = (1/M) Y Y^T, where Y = [y_{i,j}] and y_{i,j} = x_{i,j} − μ_i.

In this expression, x_{i,j} is the i-th element of vector x_j, with μ_i denoting the mean of x_{i,j} for each variable i over all M vectors, that is, the mean of row i of the matrix [x_{i,j}] for j = 1, 2, ..., M. Therefore, C is the covariance matrix of variations from the mean of each parameter and is unaffected by the means themselves. The means of columns are not subtracted from columns. If the eigenvectors of C are v_1, v_2, ..., v_N, with corresponding eigenvalues λ_1, λ_2, ..., λ_N, these are easily calculated by the MATLAB statement "[V, D] = eig(C)". This statement produces the matrix V, composed of all the eigenvectors as columns, and the matrix D, whose diagonal elements are λ_1, λ_2, ..., λ_N with all other elements being zero. It follows from the definitions of eigenvectors and eigenvalues (C v_i = λ_i v_i for all i) that C V = V D. Since the eigenvectors are all orthogonal to each other and normal (i.e., of unit length), V^T is always the inverse of V. Therefore,

C = V D V^T.

If the matrix Y is transformed to P as follows:

P = V^T Y,

then

Y = V P.

If any of the eigenvalues λ_1, λ_2, ..., λ_N are zero and we remove their corresponding eigenvectors from V and the corresponding rows of P, the remaining eigenvectors are principal components which can represent all the original data without any loss of accuracy: the product Y = V P will reconstruct the original vectors exactly from the nonsquare matrix P of reduced-dimensional PC coefficient (loading) vectors. Taking this idea further, we can remove rows of P together with their eigenvectors when the corresponding eigenvalues are small, on the grounds that, if they do not significantly affect C, they should not significantly affect Y either. It may be shown that C is always a positive semidefinite symmetric matrix (i.e., positive definite or singular); therefore, all eigenvalues are real and either positive or zero. The previous outline of PCA hides the obvious difficulty of deciding what error is incurred by removing components with nonzero eigenvalues which are considered small, and how small an eigenvalue must be to be considered negligible. Such considerations of PCA are application specific and best related to the specific objective and to how the error is to be quantified.
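As a minimal illustration of this decomposition (a hypothetical two-variable example, using the closed-form eigendecomposition of a 2 × 2 covariance matrix rather than MATLAB's eig), strongly correlated data can be represented by a single PC with negligible reconstruction error:

```python
import math
import random

random.seed(1)

# Two strongly correlated parameters: x2 ~ 2*x1 (hypothetical data).
M = 2000
x1 = [random.gauss(0.0, 1.0) for _ in range(M)]
x2 = [2.0 * a + random.gauss(0.0, 0.01) for a in x1]

# Mean-subtract each variable (the rows of the data matrix).
m1, m2 = sum(x1) / M, sum(x2) / M
y = [(a - m1, b - m2) for a, b in zip(x1, x2)]

# 2x2 covariance matrix C = (1/M) Y Y^T.
a = sum(u * u for u, _ in y) / M
b = sum(u * v for u, v in y) / M
c = sum(v * v for _, v in y) / M

# Closed-form eigenvalues of the symmetric matrix [[a, b], [b, c]].
centre, half = (a + c) / 2.0, math.hypot((a - c) / 2.0, b)
lam1, lam2 = centre + half, centre - half      # lam1 >= lam2 >= 0

# Normalised eigenvector for the dominant eigenvalue.
v = (b, lam1 - a)
n = math.hypot(*v)
v = (v[0] / n, v[1] / n)

# Project onto the single retained PC and reconstruct; the mean squared
# reconstruction error equals the discarded eigenvalue lam2.
mse = 0.0
for u, w in y:
    p = u * v[0] + w * v[1]                    # PC coefficient ("loading")
    ru, rw = p * v[0], p * v[1]                # reconstruction
    mse += (u - ru) ** 2 + (w - rw) ** 2
mse /= M

print(lam2 / lam1, mse)   # both should be very small
```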

Application of PCA for Modelling Statistical Variation.
Assume that we have a database of sets of randomised device parameters which have been published by a manufacturer or generated on the basis of theoretical modelling. Taking the RandomSPICE [28] Toshiba NMOS database as an example, for each device there is a set of 201 randomised parameter lists, each with 300 parameters. Many of the parameters are zero or fixed, but in general they need not be. We compute the 300 × 300 covariance matrix C for the parameter sets after subtracting the mean for each parameter. The 300 eigenvectors and eigenvalues of C are calculated, and then the original mean-subtracted data is transformed by projecting each vector onto the set of 300 eigenvectors. The first ten ordered eigenvalues are plotted in Figure 7 and are seen to fall off rapidly in magnitude after number six. It is therefore reasonable to delete all but the first six eigenvalues and eigenvectors to obtain the required reduced-dimensional PC coefficient vectors. Figure 8 shows how the mean squared difference between the original (mean-subtracted) data and the reconstructed data varies with the number of eigenvectors. This shows how the approximation improves with the number of eigenvectors and that the mean squared error falls to around 10^-20 when this number reaches about six. The modelling technique used by Glasgow University [29] explains this result. Similar results and graphs were obtained for the PMOS data provided by RandomSPICE. PCA often needs some consideration of numerical stability because of the wide variation of parameter values.

Application of PCA to Reduce Dimensionality in MC Simulation.
PCA clearly has value in reducing the dimensionality of the randomisation required for MC analysis. It is now possible to randomise the principal component coefficient vectors rather than the complete list of device parameters, and then to transform these back to a set of parameters for each device. An independent randomisation may clearly be performed for each device when the effects of interdie or intradie correlations do not need to be modelled.
However, the modelling of intradie variability is afforded in a convenient way by the use of PC coefficient vectors, and the correlation introduced by proximity on the die can be conveniently applied to these. Interdie variability can also be introduced in this way. The approach is to determine a set of PCs for each device model and then to introduce correlation into the corresponding PC coefficient vectors for each device within a circuit or subcircuit, according to the exponential model outlined previously. The correlation matrix should be partitioned into a number of smaller ones, each catering for one principal component, which will be independent of all the others. If the device model has K PCs, there will be K partitions, one for each PC. A different value of the correlation parameter α can be used for each partition.
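A sketch of this partitioned approach follows, assuming (for illustration only) the exponential correlation model ρ(d) = exp(−α·d) with hypothetical device positions on a line. Correlated PC coefficients for one partition are generated from independent Gaussian samples via a Cholesky factor of the correlation matrix:

```python
import math
import random

random.seed(2)

def cholesky(A):
    """Cholesky factor L (lower triangular) with A = L L^T."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

# Hypothetical positions of 4 devices along the die (arbitrary units).
pos = [0.0, 1.0, 2.0, 3.0]
alpha = 0.5                       # correlation parameter for this partition

# Correlation matrix for one PC partition: rho(d) = exp(-alpha * |d|).
R = [[math.exp(-alpha * abs(pi - pj)) for pj in pos] for pi in pos]
L = cholesky(R)

# Draw correlated PC coefficients: z = L g, with g i.i.d. N(0, 1).
def draw():
    g = [random.gauss(0.0, 1.0) for _ in pos]
    return [sum(L[i][k] * g[k] for k in range(i + 1)) for i in range(len(pos))]

samples = [draw() for _ in range(20000)]

# Empirical correlation between adjacent devices should approach exp(-alpha).
z0 = [s[0] for s in samples]
z1 = [s[1] for s in samples]
corr = sum(a * b for a, b in zip(z0, z1)) / len(samples)
print(corr, math.exp(-alpha))
```

Each of the K partitions would repeat this independently, possibly with a different α.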

Statistical Behavioural Circuit Blocks (SBCB).
A statistical behavioural circuit block (SBCB) is a behavioural model of a device, such as a transistor, or of a circuit building block, such as a gate or an adder. Such a block may be based on a combination of a look-up table and a simplified passive circuit. Its purpose is to model the most important aspects of the device's or circuit building block's behaviour, to an acceptable accuracy, with a relatively small number of parameters. An SBCB may be defined as a dependent voltage or current source combined with a linear time-invariant "tau" circuit. Such a combination can be optimised to match a required delay and switching waveshape, and this approach has been found to eliminate difficulties with the time-step adjustment algorithm that are sometimes encountered when using MC analysis with versions of SPICE.

Look-Up Tables and Tau Modelling.
SPICE provides versatile dependent voltage and current sources [30] which can model switching behaviour as a linear or nonlinear function of controlling-node voltages, branch currents, or time. These elements have many possible functions, including "behavioural" voltage or current sources [30] and ideal delay elements. Ideal delay elements could be very useful in behavioural simulation, but their use for MC analysis raises problems which will be discussed later. A 2-input NAND gate (y = a NAND b) may be defined using an "e-element" look-up table, as in Table 1. Figure 9 shows the response of this NAND gate to a voltage pulse with finite (10 ns) rise and fall times.
The use of such ideal voltage-dependent elements provides a good way to build up behavioural models suitable for augmenting with delay modelling for statistical timing analysis. It is also useful to employ voltage-controlled resistors to implement a switch-level MOSFET.
The tau model of a transistor has long been used as a simple behavioural model in many transistor optimisation tools for designing integrated circuits, such as TILOS [31], COP [32], and EPOXY [33]. It is commonly used for both synchronous and asynchronous circuits [34]. In its simplest form, each transistor is modelled as an ideal switch with appropriate on/off resistance, with source, gate, and drain capacitance introduced by discrete capacitors, each determined from the gate dimensions. The gate delays between inputs a and b and the output of a CMOS 2-input "pull-down" circuit may be modelled [34] by the RC circuit in Figure 10.
Expressions may be obtained for τ_a↑ and τ_b↑, the time constants of the response to rising a and b transitions, respectively. Look-up tables, as exemplified in Table 1, and "tau models", as in Figure 10, are very simple but individually have limitations. Look-up tables can model limiting and nonlinear "slew rates" but not delay, whereas tau models are either restrictive, if simple versions are used, or complicated, if more accurate ones are. Time constants produce characteristic exponentially rising or falling waveforms that may not be close to the true switching waveforms. However, combining these two techniques produces better behavioural models that are still quite simple. The tau model is used to produce the required delay, and the look-up table then modifies the waveshape with appropriately chosen entries.
If the output of a tau model is applied to the look-up table-defined e-element in Table 1, the result will be an output waveform affected both by delay and waveshaping according to the look-up table.
A curve fitting procedure can optimize the tau-model elements and the look-up table for the true waveshape produced by the device for which a behavioural model is required.
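A rough software analogue of this combination (a hypothetical first-order tau stage followed by a piecewise-linear look-up table, not the actual SPICE e-element, with illustrative numbers throughout) shows how the two stages contribute delay and waveshaping separately:

```python
import math

def lut(v, xs, ys):
    """Piecewise-linear look-up table (hypothetical waveshaping entries)."""
    if v <= xs[0]:
        return ys[0]
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if v <= x1:
            return y0 + (y1 - y0) * (v - x0) / (x1 - x0)
    return ys[-1]

tau = 10.0          # time constant in ps (illustrative value)
dt = 0.01           # simulation step in ps
xs = [0.0, 0.3, 0.5, 0.7, 1.0]
ys = [0.0, 0.1, 0.5, 0.9, 1.0]   # steeper middle: sharpened switching edge

# Step input at t = 0; the tau stage is a first-order lag, then the
# look-up table reshapes the exponential into a more realistic edge.
v, t, t50 = 0.0, 0.0, None
while t < 5 * tau:
    v += (dt / tau) * (1.0 - v)          # first-order ("tau") response to step
    out = lut(v, xs, ys)                  # waveshaping look-up table
    if t50 is None and out >= 0.5:
        t50 = t                           # 50% crossing of the shaped output
    t += dt

print(t50, tau * math.log(2.0))
```

Because this particular table maps 0.5 to 0.5, the 50% crossing stays near τ·ln 2; a curve-fitting procedure would adjust both the table entries and τ to match measured waveforms.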
This "tau and delay" SBCB model may be used for statistical static timing analysis. In some ways, it is similar to the "composite current source" (CCS) modelling technique produced by Synopsys [35] and the "effective current source model" (ECSM) provided by Cadence Design Systems Ltd [36]. CCS and ECSM use similar modelling techniques and are supported by comprehensive libraries of cell models and interconnect models [37], which are based on industry-supplied data. Some details of these models are commercial and not openly available. Similar approaches are the "blade and razor" model [38] and a patent by Cadence Design Systems [39]. There have been many other publications inspired by the original idea of using controlled current sources with passive delay circuits to model the behaviour of interconnections and cells with respect to their timing delay, noise generation, power consumption, and statistical variability characteristics [40][41][42][43].
The "tau and delay" modelling approach proposed previously uses voltage-controlled voltage sources (VCVS) rather than voltage- or time-dependent current sources. The choice was made arbitrarily, since we did not utilise any existing libraries, though the dominance of interconnection delay in submicron technologies offers some justification for our approach. It is argued [42] that nonlinearised Thevenin or voltage-source-based driver cell models cannot easily be adapted to accurately model the effect of loads with highly nonlinear characteristics. Current source modelling has therefore been generally preferred for modelling the nonlinear aspects of the input-output relationships of cells and their interconnections. However, it is often pointed out (e.g., [35]) that VCCS-based models can easily be converted to VCVS-based models and vice versa.
Our SBCB modelling approach was used to produce a single model of each cell type. Each SBCB is characterised by parameters which may be randomised to simulate the effect of statistical circuit variation. The statistical characteristics of the randomisation (distribution, mean, and standard deviation) were derived from transistor-level simulations of true devices. In principle, a different look-up table is needed for each value of delay, but the required adjustments were found to be very small.

Behavioural Models for MC Simulation.
SPICE offers a large number of different ways of defining and randomising behavioural models, and we have suggested yet another alternative. The most sophisticated version of SPICE, HSPICE, often does not specify how certain features are implemented, and the open-source NGSPICE, though based on the same long-established analysis engine, does not have many of these features. All dependent sources in HSPICE have "ideal" delay as an option, which is trivial to achieve in digital circuits but impossible to achieve exactly in analogue circuits. Modelling ideal delay in analogue circuits can cause difficulties with MC analysis, since randomising such delays can cause SPICE to make increasingly slow progress and eventually to "hang". Investigations revealed that, when this happened, it was not the models but the step-size selection that was to blame. If ideal delays are specified with high numerical precision, the relative timing of events on a single chip can appear to vary almost continuously. Coincident switching events, when randomised, may become different but very close together. The time-step adaptation algorithm will try to model the very small timing differences and thus generate exceedingly small time steps. Quantising the Monte Carlo variation eliminates this problem, which means that the randomisation should ideally be done with reference to the anticipated time-step size. However, the effect on the results of the statistical analysis must then be investigated.
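A minimal sketch of this quantisation (with hypothetical numbers: randomised Gaussian delays snapped to an assumed simulator time-step grid) shows that the delay statistics are barely disturbed:

```python
import random
import statistics

random.seed(3)

STEP = 0.5e-12     # assumed minimum simulator time step (0.5 ps, illustrative)

def quantised_delay(mean, sigma):
    """Draw a Gaussian delay, then snap it to the time-step grid so that
    randomisation cannot create arbitrarily close switching events."""
    raw = random.gauss(mean, sigma)
    return round(raw / STEP) * STEP

delays = [quantised_delay(30e-12, 1e-12) for _ in range(10000)]

# Every delay lies exactly on the grid; mean and spread are barely affected.
print(statistics.fmean(delays), statistics.stdev(delays))
```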
The modified tau-modelling approach has been adopted for the statistical behavioural circuit blocks to be described in the next subsection. Matching the MC randomisation of behavioural model parameters to the step-adaptation algorithm of SPICE is a matter deserving further investigation, as there may be great economies and insights to be gained.

Building Up SBCBs.
SBCBs are used to replace true or accurately modelled subblocks to reduce the dimensionality, and therefore the complexity, of the analysis. By analysing a representative random sample of fabricated gates, a delay distribution may be obtained whose statistical properties (pdf, mean, variance, etc.) can be used to define the SBCB.

Computation Reduction by Extreme Value Theory
Extreme value theory (EVT) [44] is a branch of statistics concerned with the estimation of probabilities which are "extreme" in the sense that the range of values of interest is many standard deviations from the mean of an assumed probability distribution. An algorithm known as "statistical blockade" (SB) [6,7] applies EVT to circuit analysis by eliminating, or "blocking out", randomised parameter vectors that are considered unlikely to produce circuits falling in the low-probability tails. The intention is that only the vectors likely to produce "rare events" are analysed. In our application, the rare events are the circuit yield failure predictions, which are extreme in the sense that they lie in the "tails" of Gaussian-like probability distributions for circuit quantities such as overall delay. Because they are designed to be rare, reliable estimates of these failures by conventional MC techniques require very large numbers of randomised input vectors.
The idea of SB is to concentrate on parameter vectors that are likely to generate the "rare events" of failing circuits and to block out, or disregard, those that are unlikely to produce such failures. Many input vectors are generated, but only the ones likely to produce "rare events" are simulated. This partial sampling of the performance distributions is the basis of EVT. The computation involved in introducing this bias and compensating for it is much less expensive than performing many uninteresting circuit simulations. The "blockade filter" is a standard classifier as used in machine learning and data mining. It is trained by simulating a relatively small "training set" of randomised circuits and is further refined as more and more simulations are carried out. Statistical blockade with this recursive updating is intended to make the estimation of rare-event statistics computationally feasible.
The implementation of SB is initiated by a "seed" netlist which specifies the basic circuit with its SBCB blocks. This netlist also specifies which parameters are to be randomised and the statistics (mean, standard deviation, etc.) of the required randomisation. The implementation can be divided into four parts.
(i) The training of an estimator for predicting circuit performance with minimal computation. This requires a sufficiently large training set of randomised circuits to be generated and analysed by SPICE. In our work, the coefficients of a linear estimator are calculated using the "pseudoinverse" approach, and there must be many more randomised circuits than parameters. The coefficients are computed to make the estimator minimise the sum of the squared differences between the estimated circuit output measurements and the "true" circuit output measurements, as obtained from SPICE, over the whole set of training circuits.

(ii) The generation of a much larger set of randomised versions of the circuit, and the use of a classifier to "block" the versions that are not likely to be within the tail. The classifier consists of the linear estimator followed by a "threshold" comparison with a "start of tail" parameter. Only the circuit copies which are estimated to fall within the tail are unblocked and submitted for analysis by simulation.

(iii) The "recursive" refinement of the linear estimator as more and more simulations are carried out. When a sufficient number of nonblocked "tail" circuits have been analysed, a second estimator is calculated using the "pseudoinverse" technique. The second estimator is more accurate than the original estimator within the tail and may be used for more accurate blocking. The use of recursion can move the defined "start of tail" parameter further from the mean: typically from two to three or four standard deviations. Through recursion, we can thus obtain more accuracy in more extreme parts of the tail.
(iv) The fitting of a Pareto distribution (PD) to the measurements obtained from the nonblocked ("statistical tail") versions of the circuit. This is necessary because, with SB, the non-tail circuits are blocked (not analysed), so we can no longer use Gaussian statistics. Also, very few measurements will occur in the "far tail", even when large numbers of circuits are generated. Fitting a PD to the rarely occurring "tail circuits" allows the prediction of likely yield without the very large number of circuit simulations that would be required with traditional MC analysis.
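The first two stages can be sketched as follows. Everything here is a toy stand-in: the "simulate" function replaces an expensive SPICE run with a hypothetical two-parameter delay model, and the least-squares estimator is fitted via the normal equations rather than an explicit pseudoinverse; the recursion and the Pareto fit are omitted:

```python
import random
import statistics

random.seed(4)

def simulate(p):
    """Stand-in for an expensive SPICE run: hypothetical delay model (ps)."""
    return 30.0 + 2.0 * p[0] - 1.5 * p[1] + random.gauss(0.0, 0.1)

def solve(A, b):
    """Gaussian elimination with partial pivoting for the normal equations."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

# (i) Training: simulate a modest set of randomised circuits and fit a
# least-squares linear estimator: delay ~ w0 + w1*p0 + w2*p1.
train = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
y = [simulate(p) for p in train]
phi = [(1.0, p[0], p[1]) for p in train]
ATA = [[sum(f[j] * f[k] for f in phi) for k in range(3)] for j in range(3)]
ATy = [sum(f[j] * yi for f, yi in zip(phi, y)) for j in range(3)]
w = solve(ATA, ATy)

mean_tr, sd_tr = statistics.fmean(y), statistics.stdev(y)
tail_start = mean_tr + 2.0 * sd_tr          # "start of tail" at 2 sigma

# (ii) Blockade: generate many more circuits but simulate only those whose
# predicted delay is near or beyond the tail (with a safety margin).
candidates = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(5000)]
unblocked = []
for p in candidates:
    pred = w[0] + w[1] * p[0] + w[2] * p[1]
    if pred >= tail_start - 0.5 * sd_tr:    # margin guards against misses
        unblocked.append((p, simulate(p)))

print(len(unblocked), "of", len(candidates), "circuits simulated")
```

Only a small fraction of the candidate circuits reach the simulator, which is the source of the computation savings reported below.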
To illustrate the computation time savings that may be achieved when synchronous and asynchronous circuits employing SBCB blocks are statistically analysed by MC techniques with SB, a frequently used handshaking component in the asynchronous control circuits produced by the BALSA design package [45], the C-element [46], was considered. The intention was to compare the speed and accuracy achievable with those of straightforward MC analysis. A binary full adder, using NAND gates as building blocks, and a 4-Phase Bundled Data Muller pipeline and a Muller "ring", each using the C-element as a building block, were also used as test circuits. The use of SB to analyse the switching delay of the output of a single C-element was found to reduce the computation time by about 98.5% when the start of the distribution tail was defined to be two standard deviations (2σ) from the mean.
The accuracy of the linear estimator obtained with the start of tail defined 2σ from the mean is illustrated by the scatter graph in Figure 11. It was obtained by applying SB to 1000 copies of the binary full adder circuit. The blue points represent the measurements which are in the tail and were not blocked.
To obtain accurate predictions of behaviour further from the mean, recursion was employed to refine the accuracy of the original estimator using the results of nonblocked simulations. Figure 12(b) shows the effect of recalculating the estimator from the tail points, shown in Figure 12(a), as identified by the original 2σ estimator. The experiment applies recursion to the tail points obtained from blocking 1000 copies of the 4-Phase Bundled Data Muller pipeline circuit shown in Figure 13.
The estimated failure probability distributions shown in Figures 15(a) and 15(b) were obtained for the behavioural model of a single C-element, as shown in Figure 14, with parameters for each behavioural gate model extracted from the transistor-level simulations referred to in Section 3.2.3. Figure 15(a) was obtained by traditional MC analysis and plots the yield failure probability against yield threshold, showing how the failure probability reduces as the permissible delay increases. The threshold value is measured in seconds relative to a reference delay 200 ps from the start of the analysis, where the trigger occurs at 100 ps from the start. The graphs show the distribution tails only, which are assumed to start at two standard deviations (i.e., 2 × 0.657 ps) from the mean, which is 29.7 ps relative to 200 ps. Therefore, the graphs show a time scale starting from 31 ps (relative to 200 ps) and extending over 5 standard deviations. The graph shown in Figure 15(b) was obtained from MC analysis with SB and a Pareto fit to the tail, assumed to start at two standard deviations from the mean, which was set to 31 ps. Figure 15(b) may be seen to be close to Figure 15(a), as obtained from traditional MC analysis with much greater computation. To assess the quality of fit, the two graphs are shown on the same axes in Figure 15(c).
It may be seen that the maximum difference in yield threshold delay between the two graphs, for any yield failure probability, is about 0.06 ps, which is about 0.1 standard deviations. A more useful measure of difference is the maximum difference in yield failure probability. This cannot be deduced accurately from the graphs, but a resampling of the data plotted in one of the two graphs (since the sampling instants are different) revealed that this maximum difference occurred at a yield threshold of 31.23 ps and is equal to a probability difference of 0.002. This represents a discrepancy of approximately 14.2% from the probability of 0.0129 predicted by the non-SB Monte Carlo simulation used as a reference. Thus, the maximum discrepancy in the yield failure probability is 14.2%, occurring at a yield threshold of 31.23 ps.
Since a possible source of this discrepancy is the quality and suitability of the Pareto fit, some investigations were carried out. It was observed that one source of the discrepancy was the difference in the mean and standard deviation of the Gaussian fits to the delay measurements produced, on the one hand, by the non-SB MC simulations ("meanNB" and "sigmaNB") and, on the other hand, by the training procedure ("meanTR" and "sigmaTR"). These are used to determine the "start of tail" parameter at two standard deviations from the mean. The more accurate estimates "meanNB" and "sigmaNB" are available for producing the comparisons, since a computationally expensive non-SB analysis will have been carried out for test purposes. In reality, however, only the less accurate "meanTR" and "sigmaTR" estimates (based on far fewer randomised circuits) will be available to the SB version, and these were therefore used in the comparison. As a test, "meanTR" and "sigmaTR" were replaced by "meanNB" and "sigmaNB", thus eliminating two sources of discrepancy and allowing the suitability of the Pareto fit to be seen more clearly in Figure 15(d). It was found that the two graphs did indeed become closer, the maximum discrepancy being 0.00061 in a non-SB yield probability measurement of 0.0136, that is, a 4.5% discrepancy. A conclusion from this investigation is that the Pareto fit is capable of giving a reasonable approximation to a Gaussian tail, incurring error likely to be less than that resulting from other statistical estimates. We also concluded that there is scope for increased accuracy in the SB estimations, for example, by updating the estimates of mean and standard deviation as the statistical analysis proceeds.
With a more accurate estimator, the "start of tail" parameter may then be redefined as two or even three standard deviations from the mean to obtain even greater time savings, since even fewer circuits need to be analysed. This increases the possibility of finding measurements yet further from the mean, that is, "rarer events", in reasonable computational time, and allows a yet more accurate estimation of the statistics of the "far tail". Table 2 summarises the computation time savings that were obtained by applying SB to the statistical variability analysis of three of the circuits mentioned above. The computation was carried out on a standard desktop PC with a dual-core 2.8 GHz Intel processor. A MATLAB program that harnesses an implementation of SPICE carried out the randomisation, the implementation of SB, and the statistical analysis. It may be seen that the most significant computation time saving, 98.7%, was achieved for a 4-Phase 3-stage Bundled Data Muller pipeline "ring", with the start of the tail defined at two standard deviations from the mean.

Computation Reduction by Quasi-Monte Carlo Techniques
Computation reduction was achieved, in earlier sections, with Monte Carlo circuit simulation. Low-discrepancy sampling is the basis of "quasi-Monte Carlo" (QMC) techniques as often applied to multidimensional integration; therefore, this approach to circuit simulation may be referred to as "quasi-Monte Carlo" simulation. QMC methods are modified Monte Carlo methods in which the input vectors are not totally random but are, to a degree, deterministic, in that they conform to "low-discrepancy sequences" [47,48]. A low-discrepancy sequence is a sequence of N-dimensional vectors which covers a finite space more uniformly than is achieved by N-dimensional vectors of independent uniformly distributed random elements. The discrepancy of a sequence of vectors is a measure of how the number of points they define in a multidimensional cube varies with the position and size of the cube. If the discrepancy is low, the same-sized cube will always contain approximately the same number of points wherever it is located, and the number of points will be proportional to the volume of the cube. A sequence of vectors of independent uniformly distributed random elements does not give low discrepancy when the dimensionality is high. It is known that the use of low-discrepancy vector sequences can achieve significant speed gains over standard Monte Carlo integration techniques by reducing the number of input vectors needed for a given accuracy [49]. Similar gains are anticipated when QMC is used for statistical circuit simulation.
As mentioned in Section 2.1, traditional Monte Carlo methods with statistically independent input vectors are argued to have a convergence rate proportional to n^(-1/2), where n is the number of sample vectors, independent of the dimensionality N. Quasi-Monte Carlo (QMC) methods are claimed to have a much better rate of convergence, approaching proportionality with n^(-1) in optimal cases [50]. A theoretical upper bound for the convergence rates of multivariate low-discrepancy sequences is reported to be proportional to (ln n)^N / n [5], where N is the number of dimensions. QMC performance therefore decreases with the dimensionality N.
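A small numerical illustration of these rates (a hypothetical one-dimensional integral, with a hand-rolled van der Corput sequence standing in for a low-discrepancy generator) compares the two estimates at the same sample count:

```python
import random

random.seed(5)

def vdc(n, base=2):
    """n-th van der Corput point: a simple one-dimensional
    low-discrepancy sequence (radical inverse in the given base)."""
    q, denom = 0.0, 1.0
    while n:
        denom *= base
        n, rem = divmod(n, base)
        q += rem / denom
    return q

# Estimate the integral of f(x) = x^2 over [0, 1]; the true value is 1/3.
f = lambda x: x * x
N = 1024

qmc_est = sum(f(vdc(i)) for i in range(1, N + 1)) / N
mc_est = sum(f(random.random()) for _ in range(N)) / N

true_value = 1.0 / 3.0
print(abs(qmc_est - true_value), abs(mc_est - true_value))
```

At this N, the QMC error is typically an order of magnitude or more below the MC error, consistent with the (ln n)/n versus n^(-1/2) rates quoted above.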
Uniformly distributed numbers in the interval (0, 1) can be generated as pseudorandom or quasirandom numbers, and the variables for all other distributions may be derived from these by means of the appropriate cumulative distribution function inversion. In practice, the range must be restricted from (0, 1) to (ε, 1 − ε) for some small positive ε; otherwise, Gaussian variables very far from the mean may occur with QMC and cause numerical instability. Examining a Gaussian pdf graph reveals that taking ε = 10^-6 or 10^-11 restricts the transformed variables to within about 5 or 7 standard deviations of the mean, respectively.
The MATLAB functions "haltonset" and "sobolset" are provided for constructing initial sequences of N-dimensional quasirandom vectors with the required properties. To avoid the undesirable effects of any correlations, especially in the initial segments, the sequence obtained can be required to skip, leap over, or scramble values as they are generated. Scrambling reduces correlations while also improving uniformity. These sequences use different prime bases to form successively finer uniform partitions of the unit interval in each dimension. Latin hypercube sequences, as used by SPICE, are generated by the "lhsdesign" function. Strictly, these are not LD sequences, but they nevertheless produce uniform samples of a sort.
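The whole pipeline — quasirandom uniforms, range restriction to (ε, 1 − ε), and inversion to Gaussian variables — can be sketched without any toolbox. A hand-rolled Halton generator stands in for MATLAB's "haltonset"; the bases, the skip count, and ε are illustrative choices:

```python
from statistics import NormalDist

def vdc(n, base):
    """Radical-inverse (van der Corput) value of index n in the given base."""
    q, denom = 0.0, 1.0
    while n:
        denom *= base
        n, rem = divmod(n, base)
        q += rem / denom
    return q

def halton(n_points, bases=(2, 3), skip=20):
    """Halton sequence: one prime base per dimension; 'skip' discards the
    initial segment, where correlations between dimensions are worst."""
    return [[vdc(i, b) for b in bases]
            for i in range(skip + 1, skip + n_points + 1)]

EPS = 1e-11   # restrict uniforms to (EPS, 1 - EPS): about 7 sigma at most
nd = NormalDist()

def to_gaussian(u):
    """Inverse-CDF transformation with the range restriction applied."""
    return nd.inv_cdf(min(max(u, EPS), 1.0 - EPS))

points = halton(1000)
gauss = [[to_gaussian(u) for u in p] for p in points]

# No sample can now fall further than about 7 standard deviations out.
worst = max(abs(g) for p in gauss for g in p)
print(worst)
```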
As suggested in this work, the idea is to use a low-discrepancy sequence generator to replace the uniform random number generator as the source of randomisation in both the training and the recursive SB phases of RandomLA, the MATLAB software developed in this research. The chosen LDS is "Sobol". First, we present an example comparing the effect of using QMC rather than MC for training the linear estimator. Then we investigate the effectiveness of QMC for MC simulation with and without Statistical Blockade.
To provide a comparison for training, an SRAM32×1 array circuit, described in more detail in Section 6, was taken as an example. The convergence of the linear estimator training using MC and QMC was analysed and compared.
As shown in Figure 16, the QMC training converges more quickly and smoothly to an estimator producing the minimum attainable "variance of difference", approximately 9 × 10^-31, than MC training with pseudorandom vectors, which is shown in Figure 16(a). The measure is simply the variance of the prediction error. With QMC training, the variance becomes close to optimum with about 300 training circuits, whereas the number needed for MC is about 700. The reason for this improved training is probably the more reliable coverage by QMC, for a given number of circuits (sample size), of the whole domain of parameters, which gives us more "tail" circuits even without the advantages of Statistical Blockade.
Figure 17 shows the yield predictions obtained by MC and QMC analysis, with and without Statistical Blockade and Pareto fitting, for the "binary full adder" circuit in Figure 3. There are 36 transistors, and each was randomised based on two PCA components, giving 72 parameters for the "transistor-level" statistical analysis. The nonblockade analysis was carried out, with MC randomisation only, on 3000 circuits, which took 3497.1 seconds. The results were taken as a benchmark for comparison with both MC-SB and QMC-SB, though a benchmark close to this could have been obtained with about 700 fewer circuits using nonblockade QMC.
Both MC and QMC Statistical Blockade were applied using 300 training circuits. The estimator order, as always, was equal to the number of parameters, that is, 72 in this case. The analysis time for recursive SB with MC and QMC was 146.6 s and 120.6 s, respectively, achieving close to 99% savings in each case, with statistical variation from run to run depending on how many circuits are blocked; it is not uncommon for QMC-SB to take longer than MC-SB when the same number of circuits is specified. Where the criterion is accuracy and reliability, QMC reduces the required sample size. For a given sample size, the advantages of QMC with SB over "non-SB" are not as striking as those of MC-SB over MC without SB. More analysis is needed on this matter. As in Section 5, the effects of the difference in mean and standard deviation produced by the non-SB simulations and the training procedure can be seen in Figure 17. If better estimates were available, the graphs would be closer. For QMC, the maximum discrepancy in yield failure estimates between SB and non-SB is a probability difference of about 0.003, or 0.3%. For MC the discrepancy was less, at about 0.1%.

Results and Evaluation with SRAM Arrays
6.1. Effect of Correlation on Failure Yield Analysis of an SRAM8 × 1 Array. The 8 × 1 SRAM array shown in Figure 18 was analysed to predict its yield, subject to a maximum delay constraint. All "Bitlines" were set to "1", the "bit" nodes were initialised to "0", and the "Wordline" inputs switched from "0" to "1" at a fixed point in time, causing the "bit" outputs to switch to "1" with random delay. A SPICE harness in MATLAB randomised the device parameters and made repeated calls to SPICE to analyse the effect of these parameter variations on the delays at the "bit" nodes.
There are 48 transistors within the circuit. Transistor-level simulations of the array were carried out to estimate the yield failure probability for different values of yield threshold when statistical variability is included in the model for each transistor. This was obtained by connecting the eight outputs from the array, all initialised to zero, to a behaviourally modelled NAND gate whose output switching delay was compared to the threshold. The statistics for the parameter variation were derived from analyses of the 35 nm transistor model data set provided by RandomSPICE. Two extreme cases were considered: firstly, where there is assumed to be strong intradie cell-to-cell correlation between randomised device parameters of a particular type, and secondly, where there is no intradie correlation between devices from cell to cell. A graph of yield failure probability against allowable delay threshold for the strong correlation case is shown in Figure 19. The "vth0" parameters of all six devices within each of the eight cells were randomised with correlation parameter α = 0.007.
As estimated by the MC runs with 3000 randomised circuits, the mean delay was found to be 16.7 ps from the input pulse edge, which occurs 20 ps after a timing reference. The standard deviation was found to be 0.376 ps. The graph in Figure 19 starts at the "start of tail", which is two standard deviations from the mean of the distribution, that is, at 37.5 ps from the timing reference.
The results obtained for the noncorrelation case are presented in Figure 20. The eight cells were randomised by independent variables with negligible correlation (correlation parameter set to 10). The mean was observed to be 40.1 ps from the timing reference (i.e., 20.1 ps after the input pulse edge), and the standard deviation was 0.263 ps, a reduction from the 0.376 ps of the strongly correlated case by a factor of 0.69. With independent Gaussian parameter variations, the distribution of the worst-case cell delay, which determines the yield failure probability, is no longer Gaussian: it becomes a type of "Gumbel" (extreme-value) distribution [51]. The results indicate that the standard deviation reduces by a factor of 0.69 and the mean increases by 1.436 standard deviations. Comparing the strongly correlated case (Figure 19) with the noncorrelated case (Figure 20), it can be seen that intradie correlation can have a significant effect on predictions of yield failure probability.
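The qualitative contrast between the two correlation extremes can be reproduced with a small numerical experiment: per circuit, the worst-case (maximum) delay of eight cells is taken, with the cell delays either fully shared (strong correlation) or independent. The single-cell mean and standard deviation below reuse the figures from the strongly correlated case purely for illustration; this is not the RandomSPICE model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_circuits, n_cells = 200_000, 8
mu, sigma = 36.7, 0.376  # ps; illustrative single-cell delay statistics

# Fully correlated extreme: every cell in a circuit shares the same deviation,
# so the worst-case delay distribution equals the single-cell Gaussian.
corr = mu + sigma * rng.standard_normal(n_circuits)

# Uncorrelated extreme: eight independent cell delays; the circuit delay is
# their maximum, which follows a Gumbel-like extreme-value distribution.
uncorr = (mu + sigma * rng.standard_normal((n_circuits, n_cells))).max(axis=1)

print(f"correlated   mean={corr.mean():.2f} std={corr.std():.3f}")
print(f"uncorrelated mean={uncorr.mean():.2f} std={uncorr.std():.3f}")
# The uncorrelated maximum shifts up by roughly 1.4 sigma and its standard
# deviation shrinks relative to the single-cell sigma, as in the text.
```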

6.2. Evaluation with an SRAM 32 × 8 Array. Thirty-two copies of the 8 × 1 array of Figure 18 were cascaded to construct the 32 × 8 array shown in Figure 21. There are 1536 transistors within the circuit. The SBCB model of this array was established by connecting together copies of the single-cell model. The simulations were undertaken for the strong correlation case only.
Figures 22 and 23 present the graphs obtained from the simulations; the run times are given in Table 3. The observed delay mean and standard deviation are very close to those in Figure 19 for the SRAM 8 × 1 array, as expected for the strongly correlated case with all cells in parallel. Analysis of the results revealed the following.
(a) SB with Pareto tail fitting is reasonably accurate and much faster than the "nonblockade" version.
(b) For the SB simulations with strong correlation, the number of wrong decisions not to block ("W") was always close to zero for QMC with Sobol points, while for MC it fluctuated between zero and about 20 when 3000 circuits were analysed. This indicates that the accuracy of the linear estimator was better for "Sobol" vector training with under 200 training circuits than for MC training with pseudorandom parameter vectors.
(c) For the noncorrelation examples, the estimator was much less accurate for both MC and QMC training, which caused many more wrong decisions not to block. The results of these wrong decisions are discarded by the Pareto tail-fitting procedure, with some loss of efficiency; this loss is explained by the behaviour of the linear estimator when adapting to the maximum-delay criterion.
(d) The use of QMC with "Sobol" vectors makes nonblockade simulation more efficient than MC, in that a given accuracy is achieved with fewer runs.
(e) QMC with SB compared with QMC alone offers further savings, but these remain to be fully analysed.
(f) When comparing overall time savings for MC with SB and QMC with SB, both the training and the analysis times must be taken into account. Fewer training circuits were required for QMC simulations than for MC to reach a given estimator accuracy.
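Observations (a)–(f) can be made concrete with a toy Statistical Blockade loop in Python: a linear least-squares estimator is trained on Sobol quasirandom parameter vectors (via scipy.stats.qmc), and circuits predicted to fall safely below a relaxed classification threshold are blocked. The synthetic delay function, its weights, and the thresholds below are illustrative assumptions, not the RandomLA implementation.

```python
import numpy as np
from scipy.stats import norm, qmc

rng = np.random.default_rng(1)
dim = 6  # e.g. one vth0 deviation per transistor in a single cell

def delay_ps(x):
    """Illustrative stand-in for a SPICE delay measurement (not a real model)."""
    w = np.array([0.9, 0.7, 0.5, 0.3, 0.2, 0.1])
    return 36.7 + x @ w + 0.02 * rng.standard_normal(len(x))

# --- Training phase: quasirandom Sobol points mapped to Gaussian deviations ---
u = qmc.Sobol(d=dim, scramble=True, seed=1).random(128)
x_train = 0.376 * norm.ppf(u)
y_train = delay_ps(x_train)

# Least-squares linear estimator of delay from the parameter vector.
A = np.hstack([x_train, np.ones((len(x_train), 1))])
coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)

# --- Blockade phase: simulate only circuits predicted to be near the tail ---
tail = np.quantile(y_train, 0.98)   # tail threshold from training data
guard = tail - y_train.std()        # relaxed threshold to limit missed tails

x_run = 0.376 * rng.standard_normal((3000, dim))
pred = np.hstack([x_run, np.ones((3000, 1))]) @ coef
unblocked = pred > guard            # only these would be sent to the simulator

y_true = delay_ps(x_run)            # ground truth (available here; not in SB)
wrong_not_to_block = int(np.sum((y_true > tail) & ~unblocked))
print(f"simulated {int(unblocked.sum())} of 3000, "
      f"wrong not to block: {wrong_not_to_block}")
```

With an accurate estimator and a guard band below the true tail threshold, the count of "wrong not to block" decisions stays at or near zero while most circuits are blocked, which is the source of the time savings in Table 3.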
If the delay distribution is Gaussian and the tail is assumed to start at two standard deviations above the mean, about 2.3% of circuits may be expected to remain unblocked. Therefore, out of 3000 randomly generated circuits, about 68 unblocked circuits should be observed, and about 11 should occur among the first 500. This observation suggests a simple adaptation mechanism for countering inaccuracy in the estimator: after a certain number of random circuits, say 500, have been generated, if the number of unblocked circuits differs significantly from the expected number, the tail threshold can be decreased or increased accordingly. The decision can be revisited later in the run, say after 1000 circuits, 2000 circuits, and so on. This adaptation was found to be useful in the noncorrelated examples presented in this section, where the accuracy of the estimator was lower than for the strongly correlated examples. Decreasing the threshold does not greatly affect the computation run time if the intention is to base the tail estimation on a specific number of circuits, say 2.3% of the total. This approach appears even more advantageous when higher deviations from the mean are to be examined, say three or more standard deviations. Instead of specifying a fixed number of randomised circuits, the simulations could be allowed to continue until enough unblocked circuits have been produced to allow a reliable estimation of the tail distribution.
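The adaptation mechanism described above might be sketched as a checkpointed adjustment rule; the expected tail fraction and checkpoint sizes follow the text, while the step size and tolerance are illustrative choices:

```python
def adapt_threshold(threshold, n_unblocked, n_generated, expected_frac=0.023,
                    step=0.1, tolerance=0.5):
    """Nudge the tail threshold when the observed unblocked rate at a
    checkpoint deviates strongly from the expected Gaussian tail fraction.

    threshold      -- current blockade threshold (e.g. a delay in ps)
    n_unblocked    -- circuits not blocked so far
    n_generated    -- circuits generated so far
    expected_frac  -- expected unblocked fraction (one-sided Gaussian tail
                      beyond 2 sigma is about 2.3%)
    step           -- threshold adjustment in the same units (illustrative)
    tolerance      -- relative deviation that triggers an adjustment
    """
    expected = expected_frac * n_generated
    if n_unblocked < (1 - tolerance) * expected:
        return threshold - step   # too few escape the blockade: loosen it
    if n_unblocked > (1 + tolerance) * expected:
        return threshold + step   # too many circuits simulated: tighten it
    return threshold

# Checkpoints as suggested in the text: after 500, 1000, 2000 circuits.
t = 37.5
t = adapt_threshold(t, n_unblocked=3, n_generated=500)  # far below expectation
print(t)  # threshold lowered by one step
```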
The timing results quoted in this section are for single-core, nondistributed computation. The RandomLA SPICE harness has been developed so that it may be run on multicore machines and on distributed frameworks such as Condor. Using parallel or distributed computing facilities can achieve large time savings. For example, rerunning the simulations in this section on a dual-core PC achieves a time saving very close to 50%, that is, a factor-of-two reduction in run time. Using Condor, the run time of the transistor-level simulations of the SRAM 32 × 8 array reduced from 47263.96 s (13.13 hours) to 720 s (12 minutes).

Conclusions
Monte Carlo (MC) analysis with analogue simulation is an effective tool for the statistical variability analysis of nanoscale IC designs, taking into account the effects of intradie correlation as modelled by well-known techniques. Computational complexity is a major problem, though a variety of adaptations to the standard MC approach can greatly reduce this complexity, as demonstrated by examples based on the simple "exponential" model of correlation due to proximity. These examples indicate that disregarding the effects of intradie correlation may give pessimistic estimates of yield. The results obtained from the simulations of SRAM arrays demonstrate the potential of RandomLA to achieve computation reduction for yield analysis with a delay specification. The RandomLA software is highly suitable for parallel and distributed implementations, which have already been shown to achieve large computation time savings.

Figure 2: Gaussian probability density function of a circuit propagation delay and the delay threshold.

Figure 7: Values of the first ten ordered eigenvalues for Toshiba NMOS data.

Figure 8: Mean square difference between original (mean-subtracted) data and PCA-approximated data as the number of eigenvectors increases.

Figure 11: Accuracy of linear estimator. Tail defined to start at 9 − 12. Classification errors out of 1000: wrong to block, 4; wrong not to block, 2.

Figure 16: Error analysis of linear estimators in RandomLA training phase: (a) MC simulation and (b) QMC simulation.


Figure 23: Comparison of yield analysis results of SB and non-SB for behavioural-level SRAM 32 × 8 simulations for the strongly correlated case: (a) MC and (b) QMC.

Table 2: Time saving illustrated by comparing simulations with SB to simulations without SB.

Table 3: Run times and time savings for MC/QMC simulations of the SRAM 32 × 8 array (strongly correlated).