Fundamental roles of extreme-value distributions in dielectric breakdown and memory applications (minimum-value versus maximum-value statistics)

In this paper, a thorough review of minimum- and maximum-value statistical distributions is provided. Using the Weibull model (statistics of minima) and the Gumbel model (statistics of maxima), along with the respective scaling properties of their scale-factors and distribution percentiles with device area (size), the application of these two types of extreme-value distributions to dielectric breakdown (BD) and memory operations is discussed. In the case of dielectric breakdown, the minimum-value distribution (the Weibull model) provides an indispensable tool for establishing a valid voltage/field acceleration model from an experimental perspective. On the other hand, the recent introduction of the maximum-value distribution (the Gumbel model) overcomes the shortcomings of the conventional practice of adopting the normal distribution to characterize memory functional operations, and provides much-needed mathematical rigor and physical insight, particularly for the rapidly growing field of resistive random-access memory devices.


1. Introduction
Extreme-value theory is a mathematical framework that describes statistical phenomena involving the extreme values of random variables, termed the statistics of extremes by Gumbel. 1) Unlike the widely used normal or lognormal distributions, which characterize statistical phenomena based on the central-limit theorem, the statistics of extreme values form a unique family of statistical distributions that can be either minimum-value or maximum-value distributions. 2) While extreme-value statistics have found a wide range of applications, such as in mechanical engineering, earth science, and radioactive emission as well as flood and seismic analysis, their fundamental role in microelectronics research and industry is often under-appreciated or poorly understood. It is a common misconception that a statistical distribution merely gives a mathematical description of repeated experimental measurements. In reality, the choice of distribution fundamentally reflects the physical underpinnings of one natural statistical process versus another. Moreover, extreme-value distributions alone are not adequate to capture experimental data involving different sample areas (or sizes). A spatial distribution for the occurrence of the random events of interest across the device area is often required for scaling and projection purposes. In this review paper, we examine the fundamental characteristics of both minimum-value and maximum-value distributions as well as the relevant scaling properties of the scale-factor and distribution percentile with device area. Then, we discuss the applications of the minimum-value distribution to dielectric breakdown. In particular, we show how the preservation of this basic extreme-value property of dielectric breakdown can provide an unbiased tool to distinguish one voltage (field) acceleration model from another, based on either experimental interpretation or theoretical formulation of such models.
Next, we discuss the recent advances in applying maximum extreme-value distributions to memory operations in Sect. 4. An important distinction between dielectric breakdown and memory operation is that dielectric BD refers to failure probability statistics 3) with logarithmic random variables, whereas resistive random-access memory (RRAM) conductance 4,5) and static random-access memory (SRAM) minimum failure voltage ($V_{min}$) distributions 6) require functional or reliability (survival) probability statistics with linear random variables.
2. Extreme-value statistics and scaling properties

2.1. Minimum extreme-value distribution
The Weibull distribution, or type 3 extreme-value distribution, possesses the unique minimum-value character. 1,2) It has been shown to be the most appropriate description 3,7-10) of the random nature of dielectric BD in the form of the cumulative distribution function (CDF) given by: 11)

$$F(x) = 1 - \exp\left[-\left(\frac{x}{\alpha}\right)^{\beta}\right], \tag{1}$$

where x is the statistical variable, which can be time-to-BD ($T_{BD}$) or voltage-to-BD ($V_{BD}$), α is the scale-factor at 63.2%, and β is the shape-factor (a.k.a. the Weibull slope). This CDF is commonly given in the Weibull plot as

$$W \equiv \ln\left[-\ln(1 - F)\right] = \beta \ln x - \beta \ln \alpha. \tag{2}$$

In order to connect the experimental data obtained from one area ($A_1$) to another ($A_2$), a statistical spatial distribution for the generated BD spots must be known. The Poisson distribution is a first-order choice since it assumes uncorrelated and randomly distributed BD spots across a dielectric area:

$$P_n = \frac{(DA)^n}{n!}\exp(-DA), \tag{3}$$

where $P_n$ is the probability of finding n BD spots in an area A and D is the areal density of BD spots. The functional probability function (or reliability function) for the formation of the first conducting filament as a result of BD is given by

$$R_{Poisson} = 1 - F_{Poisson} = \exp(-DA). \tag{4}$$

Thus, an area-scaling relation can be readily shown as:

$$\ln\left[-\ln(1 - F_{A_2})\right] = \ln\left[-\ln(1 - F_{A_1})\right] + \ln\left(\frac{A_2}{A_1}\right) \tag{5}$$

for any two areas, $A_1$ and $A_2$. The relation given in Eq. (5) is also known as Poisson area-scaling. Consequently, the scale-factor, α, inversely depends on area as

$$\frac{\alpha_{A_2}}{\alpha_{A_1}} = \left(\frac{A_1}{A_2}\right)^{1/\beta}. \tag{6}$$

This inverse relation is universally observed over all dielectric materials. The two respective relations in Eqs. (5) and (6)

2.2. Maximum extreme-value distribution
The type 1, or Gumbel, distribution is a maximum-value distribution 1,2) which also satisfies both requirements outlined above. The Gumbel CDF as the survival probability is given by

$$R(x) = \exp\left[-\exp\left(-\frac{x - \eta}{\sigma}\right)\right], \tag{7}$$

where x is the statistical variable, e.g. current or conductance, and η and σ are the scale-factor and shape-factor of the Gumbel model, respectively. Note that here we use the notation R to distinguish the functional or reliability probability function (sometimes also called the survival function) from the failure probability function, F. This is an important distinction between the Gumbel distribution and the Weibull distribution. 6) Therefore, the choice in practical applications between the Gumbel distribution and the Weibull distribution is not arbitrary but rather fundamental. It strongly depends on the nature of the physical process, which must warrant such a choice, as we will discuss later in the applications to dielectric BD and memory operations. The linearized form (G) of the Gumbel CDF, also known as the Gumbel function, is given by

$$G \equiv -\ln\left[-\ln R(x)\right] = \frac{x - \eta}{\sigma}. \tag{8}$$

Note that the scale-factor, η, corresponds to x at G = 0 or R = 36.8%. Analogous to dielectric BD, to establish a connection between device area and the random variable x, e.g. RRAM conductance, a spatial distribution of formed filaments is required. 4,5) In this case, we also adopt the Poisson distribution, to be compatible with the forming process, rather than a binomial distribution model. 6) Using Eqs. (3) and (7), it can be shown that the CDFs of two devices are related to their corresponding areas, $A_1$ and $A_2$, as follows:

$$R_{A_2}(x) = \left[R_{A_1}(x)\right]^{A_2/A_1}. \tag{9}$$

Alternatively, the area-dependence of the Gumbel functions is given as

$$G_{A_2} = G_{A_1} - \ln\left(\frac{A_2}{A_1}\right). \tag{10}$$

As a result, the relation between the scale-factors, $\eta_{A_1}$ and $\eta_{A_2}$, of any two devices with different areas ($A_1$ and $A_2$) is given below:

$$\eta_{A_2} = \eta_{A_1} + \sigma \ln\left(\frac{A_2}{A_1}\right). \tag{11}$$

3. Applications to dielectric breakdown

3.1. Experimental characterization and reliability projection
Historically, the lognormal distribution was widely considered the statistical model for describing dielectric breakdown for some 30-40 years prior to the advances in physics-based models 7-10) at the end of the 20th century. These groundbreaking works based on percolation theory provided the very foundation for the use of the Weibull statistical model for dielectric breakdown. In addition, direct experimental work using a large number of samples, up to ∼4000, unequivocally demonstrated that the Weibull statistical model is the appropriate model and thus rejected the lognormal distribution. 12) Dielectric breakdown (or loss of insulating property) occurs when defects generated by the applied stress form a percolating path between two electrodes. The shortest time or the lowest voltage required to form such a path is the extreme out of many nearly completed or unformed percolating paths. 7-10) In other words, the nearly formed paths would take longer to complete, even though they may not always be experimentally measurable. Thus, the statistics describing such a phenomenon follow the minimum-value distribution, a.k.a. the weakest-link statistics.
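The weakest-link picture can be illustrated numerically. The following is a minimal sketch, not taken from the cited works: it assumes that each device contains a number of independent candidate percolation paths whose completion times are Weibull-distributed, that the number of paths is proportional to area, and that the device $T_{BD}$ is the shortest completion time. The names `sample_tbd` and `t63`, and the values of `beta`, `alpha_path` and the path counts, are hypothetical. Under these assumptions the 63.2% point should shift by a factor of $10^{-1/\beta}$ for a tenfold increase in area.

```python
import random

def sample_tbd(n_paths, alpha_path, beta, rng):
    """Breakdown time of one device: the fastest of n_paths candidate paths."""
    # weibullvariate(scale, shape) draws one Weibull-distributed completion time.
    return min(rng.weibullvariate(alpha_path, beta) for _ in range(n_paths))

def t63(samples):
    """Empirical 63.2% point of a sample, used as the scale-factor estimate."""
    ordered = sorted(samples)
    index = max(int(round(0.632 * len(ordered))) - 1, 0)
    return ordered[index]

rng = random.Random(0)             # fixed seed for reproducibility
beta, alpha_path = 1.5, 1.0e4      # hypothetical shape- and scale-factors
n_devices = 2000

# A "small" device with 10 candidate paths and a 10x larger one with 100.
small = [sample_tbd(10, alpha_path, beta, rng) for _ in range(n_devices)]
large = [sample_tbd(100, alpha_path, beta, rng) for _ in range(n_devices)]

measured_ratio = t63(large) / t63(small)
expected_ratio = 10 ** (-1.0 / beta)   # inverse power-law area-dependence
print(f"measured T63 ratio {measured_ratio:.3f}, weakest-link expectation {expected_ratio:.3f}")
```

The minimum of n independent Weibull variables is again Weibull with the same slope, which is why the two ratios are expected to agree to within sampling error.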
One common application of the Weibull minimum-value model is to characterize the time-to-breakdown ($T_{BD}$) distributions and their area-dependence, as shown in Fig. 1(a). 12) Using Eq. (5), the distributions of different areas translated to the reference area clearly reveal a universal distribution, proving the validity of Eq. (5). Figure 1(b) illustrates that the area-dependence of the scale-factors (α) follows a power-law with an inverse area-dependence. Note the good agreement of the β values extracted using two different methods. 11) This inverse power-law area-dependence of Eq. (6) reflects the fundamental property of the minimum-value distribution. It is critical to note that if the translated distributions do not merge with sufficient overlap into a universal line as shown in Fig. 1(a), the use of this inverse power-law to extract the Weibull slope (β) is invalid, as we will discuss in Sect. 3.3. Figure 2 illustrates the reliability projection procedure from statistical BD data collected at high stress-voltages or temperatures with small devices to low operating-voltages or temperatures with a product chip of larger area. It is evident that the choice of distribution is the first requirement for performing such a reliability projection. The important role of the Weibull slope (β) can also be seen in the area- and percentile-extrapolation in Fig. 2. 11,13)

3.2. Identification and demonstration of correct voltage/field acceleration models
More importantly, the selection of an appropriate acceleration model has the most significant effect on the final projection results at the lower voltages of use conditions and has been the subject of intensive debate over many decades. The scaling push for higher transistor performance has led to a drastic reduction in dielectric thickness ($t_{diel}$) and the shrinking of the reliability margin. 13) Moreover, the introduction of low-κ dielectrics (SiCOH) in BEOL interconnects 14) and new middle-of-line (MOL) spacer materials such as SiBCN and SiOCN 15) raises further reliability concerns, as the BD properties of these new materials are not well known. To briefly review the $T_{BD}$ voltage (field) acceleration models, we summarize four models commonly discussed in the literature: the exponential law of field/voltage dependence, i.e. the E model, 16,17)

$$T_{BD} = T_{BD0}\exp(-\gamma_{E} E), \tag{12}$$

the exponential law of reciprocal field/voltage dependence, i.e. the 1/E model, 18,19)

$$T_{BD} = T_{BD0}\exp\left(\frac{\gamma_{1/E}}{E}\right), \tag{13}$$

the √E model, 20,21)

$$T_{BD} = T_{BD0}\exp\left(-\gamma_{\sqrt{E}}\sqrt{E}\right), \tag{14}$$

and the power-law voltage/field-dependence: 22,23)

$$T_{BD} = Z\,V^{-m}, \tag{15}$$

where $\gamma_{E}$, $\gamma_{1/E}$, $\gamma_{\sqrt{E}}$, and m represent the voltage or field acceleration factors for these four models. These models are widely used to fit experimental data with an additional parameter, the pre-factor, $T_{BD0}$ or Z in Eqs. (12)-(15).
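The practical weight of the model choice can be seen in a small numerical sketch. It assumes the usual functional forms of the four model families named above, each of which is a straight line in ln($T_{BD}$) against a different function of field: E, √E, ln E and 1/E. The two stress points and the use field are hypothetical values chosen for illustration, and `TRANSFORMS` and `extrapolate` are illustrative names.

```python
import math

# Each model is linear in ln(T_BD) versus g(E); only g differs between models.
TRANSFORMS = {
    "E": lambda e: e,
    "sqrt(E)": math.sqrt,
    "power-law": math.log,
    "1/E": lambda e: 1.0 / e,
}

def extrapolate(transform, stress_points, e_use):
    """Pass ln(T_BD) = a + b*g(E) through two stress points and evaluate it at e_use."""
    (e1, t1), (e2, t2) = stress_points
    g1, g2 = transform(e1), transform(e2)
    slope = (math.log(t2) - math.log(t1)) / (g2 - g1)
    intercept = math.log(t1) - slope * g1
    return math.exp(intercept + slope * transform(e_use))

# Hypothetical accelerated-stress data: (field in MV/cm, T_BD in s).
stress_points = [(12.0, 1.0), (10.0, 1.0e3)]
e_use = 5.0   # hypothetical use-condition field in MV/cm

projections = {name: extrapolate(g, stress_points, e_use) for name, g in TRANSFORMS.items()}
for name, t_use in projections.items():
    print(f"{name:>9s} model: projected T_BD at {e_use} MV/cm = {t_use:.2e} s")
```

All four models agree exactly at the two stress fields, yet their projections at the use field differ by many orders of magnitude, which is why an independent criterion for selecting the model is needed.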
A major limitation of dielectric stress for BD studies is that the experimental time window usually spans from 1 to ∼$10^4$ s in wafer-level testing. To expand beyond this conventional time window, several institutions 13,24-30) have carried out long-term stress of up to three years at relatively low stress-voltages. Nevertheless, although long-term stress over many years is certainly beneficial, its throughput is rather limited because of the long turn-around time. This exercise is clearly impractical for technology qualification and development, which involve the evaluation of multiple process recipes and material choices. Alternatively, without relying on direct long-term stress, the weakest-link principle (a.k.a. Poisson area-scaling), as the universal dielectric BD property, can be used as a criterion to judge the most appropriate acceleration model. This is referred to as the self-consistent acceleration Poisson statistics (SCAPS) method. 22,23) This method was originally developed to demonstrate the power-law voltage dependence as the most reliable description of the acceleration model in SiO2 gate dielectrics in the direct tunneling (DT) regime 22) and in the Fowler-Nordheim (FN) regime. 23) The application of this methodology to high-κ gate dielectrics 31) and MOL spacer materials 32) is shown in Fig. 3. This figure plots $T_{63}$ versus electric field for several different areas in the form of the four different models of Eqs. (12)-(15). First, parallel lines from fitting the experimental $T_{63}$ data indicate that the acceleration factors are area-independent, consistent with the weakest-link BD principle. Secondly, a systematic deviation from parallel lines means that the extrapolation lines of the $T_{BD}$ data from different areas would cross over at lower voltages.
This would mean that $T_{BD}$ of smaller areas would be shorter than that of larger areas beyond the cross-over point, an unphysical result that violates the weakest-link property for a given model. Figure 4 shows a comparison of the $T_{BD}$ voltage-acceleration power-law model for both thin and thick SiO2 stressed in the DT and FN regimes, using both long-term stress (up to 3 years) and the SCAPS methodology. It is evident that the normalized $T_{BD}$ data from different areas yield power-law exponents comparable to those of long-term stress, demonstrating the validity of the SCAPS methodology. Moreover, the normalized $T_{BD}$ data of the SCAPS methodology cover a similar BD time span of ⩾$10^7$ s without actually performing long-term stress. The results of these long-term stress tests for the demonstration of different voltage/field acceleration models are summarized in Table I. In addition, some researchers have also extended the time window to much shorter times, from milliseconds to nanoseconds, to verify the power-law model in dielectric BD 35-39) and in RRAM set operation. 40) These results are also included in Table I.
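The parallel-line criterion and the cross-over argument can be sketched as follows. This is only an illustration of the consistency check described above, not the SCAPS procedure of Refs. 22 and 23: it assumes a power-law form $T_{63} = Z V^{-m}$, and all numerical values, as well as the helper names `fit_line`, `power_law_fit` and `crossover_voltage`, are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def power_law_fit(voltages, t63_values):
    """Return (m, ln Z) for the assumed form T63 = Z * V**(-m)."""
    slope, intercept = fit_line([math.log(v) for v in voltages],
                                [math.log(t) for t in t63_values])
    return -slope, intercept

def crossover_voltage(fit_a, fit_b, tol=1e-6):
    """Voltage at which two fitted lines intersect, or None if they are parallel."""
    (m_a, ln_z_a), (m_b, ln_z_b) = fit_a, fit_b
    if abs(m_a - m_b) <= tol * max(abs(m_a), abs(m_b)):
        return None
    return math.exp((ln_z_a - ln_z_b) / (m_a - m_b))

voltages = [3.0, 3.2, 3.4]                  # hypothetical stress voltages (V)
beta, area_ratio = 1.5, 100.0               # hypothetical Weibull slope and area ratio
shift = area_ratio ** (-1.0 / beta)         # weakest-link T63 shift for the larger area

small_area = [math.exp(50.0) * v ** -40.0 for v in voltages]
# Case 1: the larger area follows the same exponent (area-independent acceleration).
consistent_large = [shift * t for t in small_area]
# Case 2: the larger area appears to have a steeper exponent (44 instead of 40).
inconsistent_large = [shift * math.exp(50.0) * 3.2 ** -40.0 * (v / 3.2) ** -44.0
                      for v in voltages]

fit_small = power_law_fit(voltages, small_area)
v_cross_ok = crossover_voltage(fit_small, power_law_fit(voltages, consistent_large))
v_cross_bad = crossover_voltage(fit_small, power_law_fit(voltages, inconsistent_large))
print("consistent case cross-over:", v_cross_ok)
print(f"inconsistent case cross-over at {v_cross_bad:.2f} V")
```

In the second case the two lines intersect below the stress window, so below that voltage the larger area would be projected to outlast the smaller one, which is the unphysical outcome described in the text.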
It is interesting to note that while the new spacer materials SiBCN and SiOCN follow the power-law model, the acceleration of Si3N4 dielectrics is best described by a 1/E model, an observation consistent with a previous publication using pulsed-voltage tests covering ten orders of magnitude in time. 41) We have also shown previously that BD data for SiO2 stressed in the FN regime follow a 1/E or 1/V model, 23) but the power-law model can provide a good approximate description of the 1/E model because the 1/E dependence arises from the field-dependence of the current. 23) Based on these results, we can rule out the E- and √E-models as appropriate acceleration models. Coupling the long-term stress data with the BD data obtained from the SCAPS method, we can clearly see in Fig. 3 that the power-law acceleration model or the 1/E model is the correct acceleration model for a wide range of dielectric materials. It is worthwhile to point out that when applying the SCAPS methodology to support a voltage or field acceleration model, the verification of Poisson area-scaling is also required, as properly done in the case of FEOL gate dielectrics 22,23) and MOL spacer dielectrics. 32)

Fig. 4. Comparison of experimental $T_{BD}$ data from long-term stress of up to 3 years with the experimental $T_{BD}$ data using different areas based on the SCAPS methodology as discussed in the text. In the case of the SCAPS methodology, the measured $T_{BD}$ data from different areas are normalized to a reference area of 0.5 μm² using Eq. (5). m represents the power-law exponent defined in Eq. (15). The data for the long-term stress of 2.2 nm and 6.2 nm are from Refs. 13 and 24, with device areas of 2.475 × $10^5$ μm² and $10^5$ μm², respectively.

3.3. Common misunderstandings and mistakes regarding the use of the Weibull model
Although the Weibull model has been widely accepted over the last twenty years, the incorrect use of this model to extract its parameters (α, β) is unfortunately widespread, especially in the reliability community. First of all, many researchers and engineers fail to recognize that, given a statistical model such as the Weibull model in Eq. (1), $T_{BD}$ is defined to be the sole random variable, without any additional sources of statistical variation. In reality, as in the case of BD, random changes in dielectric thickness ($t_{diel}$) from sample to sample can introduce an additional random variable, $t_{diel}$. This variation in $t_{diel}$ can become large enough to cause the BD distributions to violate the Poisson area-scaling relation in Eq. (5), particularly in back-end-of-line (BEOL) and middle-of-line (MOL) applications. Figure 5 displays the $t_{BD}$ distributions from different sizes of BEOL line-via structures with low-κ dielectrics. 42) In contrast to Fig. 1(a), Fig. 5(a) shows that the translated distributions fail to merge according to Eq. (5), with β values of 1.47 ± 0.27 from a direct fit of these three distributions. In contrast, the $T_{63}$-versus-area relation in Fig. 5(b) yields a fictitious "β" value of 3.19 ± 0.89. It is evident that these two β values strongly disagree with each other. This disagreement stems from the fact that these distributions are actually non-Weibull, with an extra random variable due to thickness variation, even though they may appear to fit straight lines well. The variability issue and its impact on dielectric BD have been extensively reviewed in Ref. 43.

Table I. Number of publications supporting the respective acceleration models. The number of data sets is given in parentheses. The data and the references considered here satisfy the criterion of a $T_{BD}$ span (in seconds) of ⩾7 decades.

As discussed above, thousands of samples are required to distinguish a Weibull distribution from other distributions. To avoid the use of thousands of samples, the application of Poisson area-scaling in Eq. (5) is required to verify whether the experimental data obey the weakest-link property, as in the Weibull model. Because of the violation of Poisson area-scaling seen in Fig. 5(a), the use of Eq. (6) to extract the β value is completely invalid, because the $t_{BD}$ distributions are non-Weibull even if they may appear to be Weibull-distributed. Unfortunately, this kind of practice, without carefully verifying whether the BD data follow Poisson area-scaling, is widespread in the reliability community and has led to much confusion and misunderstanding in the literature. For example, in the case of using the SCAPS methodology to support the E-model in BEOL low-κ dielectrics, 34) the authors failed to check whether their $T_{BD}$ data actually follow Poisson area-scaling. 34) Nevertheless, they used the β value incorrectly extracted from area-scaling, Eq. (6), in the application of the SCAPS methodology to their $T_{BD}$ data. Therefore, their conclusion claiming supportive evidence for the E-model is not warranted. 34)
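The disagreement between the two β estimates can be reproduced with synthetic data. The sketch below is an illustration under stated assumptions rather than a model of the data in Fig. 5: thickness variation is represented by a hypothetical lognormal sample-to-sample factor multiplying an otherwise ideal Weibull $T_{BD}$, and all parameter values and function names (`weibull_slope`, `area_scaling_slope`, `simulate`) are invented for the example.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def weibull_slope(samples):
    """Slope of a least-squares line on the Weibull plot (Benard median ranks)."""
    ordered = sorted(samples)
    n = len(ordered)
    xs = [math.log(t) for t in ordered]
    ys = [math.log(-math.log(1.0 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]
    return fit_line(xs, ys)[0]

def t63(samples):
    """Empirical 63.2% point of a sample."""
    ordered = sorted(samples)
    return ordered[max(int(round(0.632 * len(ordered))) - 1, 0)]

def area_scaling_slope(areas, samples_by_area):
    """Slope implied by the inverse power-law of T63 versus area."""
    xs = [math.log(a) for a in areas]
    ys = [math.log(t63(s)) for s in samples_by_area]
    return -1.0 / fit_line(xs, ys)[0]

def simulate(areas, beta, spread, rng, n=1000):
    """Weibull T_BD per area; spread > 0 adds a lognormal sample-to-sample factor."""
    data = []
    for area in areas:
        alpha = 1.0e3 * area ** (-1.0 / beta)
        data.append([rng.weibullvariate(alpha, beta)
                     * (rng.lognormvariate(0.0, spread) if spread > 0 else 1.0)
                     for _ in range(n)])
    return data

rng = random.Random(1)
areas = [1.0, 10.0, 100.0]
results = {}
for label, spread in (("ideal", 0.0), ("with spread", 0.6)):
    data = simulate(areas, 3.0, spread, rng)
    beta_direct = sum(weibull_slope(s) for s in data) / len(data)
    beta_area = area_scaling_slope(areas, data)
    results[label] = (beta_direct, beta_area)
    print(f"{label}: direct fit {beta_direct:.2f}, from T63-versus-area {beta_area:.2f}")
```

In the ideal case the two estimates agree, whereas the added spread lowers the directly fitted slope while leaving the $T_{63}$-versus-area slope essentially unchanged, so a comparison of the two serves as a quick screen for hidden variability before Eq. (6) is used.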

4. Applications to memory operations
RRAM technology has received a great deal of attention in research and development for neuromorphic and in-memory computing. In this section, after reviewing the RRAM forming process, we summarize the recent advances in applying the maximum extreme-value distribution to RRAM conductance 4,5) and SRAM $V_{min}$ statistical characterization. 6) Figure 6 shows the forming I-V curve of a typical RRAM device as well as its switching I-V characteristics. Switching between a low-resistance state (LRS) and a high-resistance state (HRS) is achieved using a procedure of reset and set operations, as shown in Fig. 6.

4.1. Two-stage forming process in RRAM devices
Analogous to dielectric breakdown, the forming process for the initial creation of a percolating filament can be carried out using either a ramp-voltage stress (RVS), as in Fig. 6, or a constant-voltage stress (CVS), as commonly performed in BD tests. 4) These two methods are shown to be equivalent if the results of CVS tests are properly converted to the voltage domain, or vice versa, using the appropriate conversion procedure. 44) Figure 7(a) shows that the forming-voltage distributions converted from CVS tests are in good agreement with the directly measured $V_{Forming}$, 4) whereas Fig. 7(b) shows that the $V_{Forming}$ distributions translated by applying Eq. (5) merge into a universal line.
One important question is whether the forming process in RRAM devices follows a two-stage process 4,45,46) similar to dielectric BD, as extensively reviewed in Ref. 43. Unlike the RVS method with its compressed voltage window, the CVS technique has the advantage of revealing the filament evolution process involving two stages, commonly known as the progressive BD phenomenon. Figure 8 displays the current time-evolution during a CVS test, 4) showing the initial current jumps, defined as nano-path nucleation or the 1st soft BD (SBD), in the 1st stage. The currents of these samples then continue to grow with substantial fluctuations and eventually reach the current compliance in the 2nd stage. Multiple additional paths or filaments are generated in the 2nd stage, continue to grow in competition, and eventually culminate in the completion of a switching filament as the current compliance is reached. Note that many incomplete paths/filaments remain as a result of the stress; we will discuss their impact on variability in the last section. The schematics in Fig. 9 illustrate the forming process: (a) the initial generation of the nano-conducting path (1st SBD), (b) the generation of multiple paths and their growth and conglomeration, and (c) the formation of the final conducting filament with many incomplete or nearly formed filaments.
The time to nano-path nucleation (1st SBD) and the time to filament forming (hard BD, HBD) are plotted in Fig. 10 for three different areas. These data are normalized to the smallest area according to Eq. (5). It is evident that the forming (HBD) distributions merge according to Poisson area-scaling. A common misunderstanding in the reliability community is that only the Weibull distribution follows Poisson area-scaling, i.e. the weakest-link characteristics of dielectric BD as governed by Eq. (5). It is worthwhile to point out that Eq. (5) can be derived directly from Eq. (4) without knowledge of the specific CDF form, be it the Weibull model or the Gumbel model. The merged universal distributions in Fig. 7(b) and in Fig. 10 clearly reveal some curvature at high percentiles, i.e. a non-Weibull distribution. This proves that Poisson area-scaling is not limited to the Weibull distribution but is also applicable to non-Weibull distributions, consistent with Eq. (5).
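That Poisson area-scaling does not require a Weibull form can be checked numerically. The sketch below assumes only that the failure probability has the Poisson form F = 1 − exp[−A·D(x)]; the particular D(x), a sum of two power laws, is a hypothetical non-Weibull choice, and the function names and area values are illustrative.

```python
import math

def defect_measure(x):
    """Hypothetical non-Weibull D(x): a sum of two power laws in arbitrary units."""
    return (x / 50.0) ** 1.2 + (x / 30.0) ** 4.0

def weibit(x, area):
    """W = ln[-ln(1 - F)] for F = 1 - exp(-area * D(x))."""
    failure = -math.expm1(-area * defect_measure(x))
    return math.log(-math.log1p(-failure))

area_1, area_2 = 1.0, 25.0
xs = [0.5, 1.0, 2.0, 5.0, 10.0, 20.0]

# Vertical offset between the two areas at each x on the Weibull plot.
shifts = [weibit(x, area_2) - weibit(x, area_1) for x in xs]

# Local slope of the reference distribution between neighbouring points.
local_slopes = [(weibit(b, area_1) - weibit(a, area_1)) / (math.log(b) - math.log(a))
                for a, b in zip(xs, xs[1:])]

print("offsets:", [round(s, 6) for s in shifts], "ln(A2/A1) =", round(math.log(area_2 / area_1), 6))
print("local slopes:", [round(s, 2) for s in local_slopes])
```

The offset equals ln($A_2$/$A_1$) at every x although the local slope changes along the curve, so the translated distributions merge even though no single Weibull slope describes them.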

4.2. RRAM switching conductance statistics
As discussed above for dielectric BD and the forming process, the first BD spot represents the completed conducting filament among many nearly complete filaments. Consequently, the conductance or current through this filament is the highest in comparison with the other filaments.
In this regard, the distribution of the measured conductance or current represents the statistics of the maximum values out of potentially completed filaments. This important aspect is in sharp contrast to dielectric BD, in which a minimum extreme-value statistical process controls the outcome of the competition among many potentially formed filaments. This consideration leads to two important requirements: first, the cumulative probability function corresponds to the functional or reliability (survival) probability distribution rather than the failure distribution as in the case of BD; second, the random variable enters on a linear rather than a logarithmic scale. Figure 11 shows typical reset I-V characteristics of a large number of RRAM devices in the LRS after the forming process. To demonstrate the applicability of the Gumbel statistical model as the maximum-value distribution, we plot the experimental distributions of RRAM conductance at V = −0.1 V in the Gumbel plot in Fig. 12(a). The distributions are also translated using Eq. (10) to verify Poisson area-scaling in the Gumbel plot, as shown in Fig. 12(b). These results clearly reveal that the translated distributions overlay well, as expected from Eq. (10).
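The construction of a Gumbel plot and its translation to a reference area can be sketched with synthetic data. The example assumes the maximum-value convention R(x) = exp{−exp[−(x − η)/σ]}, so that the Gumbel function is G = −ln(−ln R) and translating from an area A to a reference area adds ln(A/A_ref) to G. The sign convention, the parameter values, the restriction of the fit to the central 5-95% of each distribution, and the function names are all assumptions of this sketch.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def gumbel_points(samples, lo=0.05, hi=0.95):
    """(x, G) pairs with G = -ln(-ln R), using Benard median ranks for R."""
    ordered = sorted(samples)
    n = len(ordered)
    points = []
    for i, x in enumerate(ordered, start=1):
        r = (i - 0.3) / (n + 0.4)
        if lo <= r <= hi:                     # keep the central part only
            points.append((x, -math.log(-math.log(r))))
    return points

def translate(points, area, ref_area):
    """Shift G from `area` to `ref_area`: G_ref = G_area + ln(area / ref_area)."""
    shift = math.log(area / ref_area)
    return [(x, g + shift) for x, g in points]

rng = random.Random(2)
eta_ref, sigma, ref_area = 5.0, 0.5, 100.0    # hypothetical values, arbitrary units

merged = []
for area in (1.0, 10.0, 100.0):
    eta_area = eta_ref + sigma * math.log(area / ref_area)
    # Inverse-CDF sampling of a maximum-value Gumbel variable.
    samples = [eta_area - sigma * math.log(-math.log(rng.random())) for _ in range(500)]
    merged += translate(gumbel_points(samples), area, ref_area)

slope, intercept = fit_line([x for x, _ in merged], [g for _, g in merged])
sigma_fit, eta_fit = 1.0 / slope, -intercept / slope
print(f"merged fit: sigma = {sigma_fit:.3f}, eta at reference area = {eta_fit:.3f}")
```

If the data follow the assumed model, the translated points from all areas fall on one straight line whose slope and intercept return the shape- and scale-factors at the reference area.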
To further verify the validity of the Gumbel model in its application to RRAM conductance, we plot the scale-factor (η) as a function of area in Fig. 13. The shape-factor (σ) derived using Eq. (11) is compared with the merged distributions in Fig. 12(b). The results clearly demonstrate that the maximum-value distribution model captures the experimental data with reasonable agreement. These results support the validity of the Gumbel model as the maximum-value distribution for characterizing RRAM conductance or current statistics. Figure 13 also shows the forming voltage as a function of area in comparison with the area-dependence of the LRS conductance. It exhibits the remarkably distinct area-scaling of the respective scale-factors for the minimum-value distribution [Eq. (6)] and the maximum-value distribution [Eq. (11)]. This suggests that any arbitrary choice of distribution in place of the appropriate extreme-value statistical process can misguide research and technology development, with unwanted consequences.
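The contrast between the two kinds of scale-factor scaling can be made concrete with a short sketch. It assumes a power-law area-dependence for the Weibull scale-factor, α ∝ $A^{-1/\beta}$, and a logarithmic one for the Gumbel scale-factor, η = η_ref + σ ln(A/A_ref); the numerical values are hypothetical and noise-free, chosen only to show how β and σ follow from straight-line fits.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

areas = [1.0, 10.0, 100.0, 1000.0]            # relative areas (reference area = 1)
beta, alpha_ref = 1.5, 1.0e3                  # hypothetical Weibull parameters
sigma, eta_ref = 0.2, 1.0                     # hypothetical Gumbel parameters

alphas = [alpha_ref * a ** (-1.0 / beta) for a in areas]   # minimum-value scaling
etas = [eta_ref + sigma * math.log(a) for a in areas]      # maximum-value scaling

log_areas = [math.log(a) for a in areas]
# Weibull: ln(alpha) is linear in ln(A) with slope -1/beta.
beta_fit = -1.0 / fit_line(log_areas, [math.log(a) for a in alphas])[0]
# Gumbel: eta itself is linear in ln(A) with slope sigma.
sigma_fit = fit_line(log_areas, etas)[0]
print(f"beta from ln(alpha) versus ln(A): {beta_fit:.3f}")
print(f"sigma from eta versus ln(A): {sigma_fit:.3f}")
```

The minimum-value scale-factor is a straight line only on a log-log plot, whereas the maximum-value scale-factor is a straight line on a linear-log plot, which is one practical way to tell the two processes apart.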

4.3. Applications to SRAM $V_{min}$ statistics
Recently, the Gumbel model has been successfully applied to investigate the $V_{min}$ distribution in SRAM circuits. 6) The $V_{min}$ value is generally defined as the smallest supply voltage at which an SRAM array remains functional. At any voltage lower than this $V_{min}$ value, at least one cell of the SRAM array would fail by definition. Because an array fails as soon as its worst cell fails, the array $V_{min}$ is the largest of the minimum operating voltages of its cells. Therefore, the $V_{min}$ values measured from a population of SRAM arrays actually represent the maximum operational voltages from the viewpoint of SRAM functionality. This consideration leads to the realization that the $V_{min}$ issue is consistent with the property of maximum extreme-value statistics. 6) Figure 14 shows the cumulative distributions of SRAM $V_{min}$ measurements for two different array sizes as well as the normalized distributions according to Eq. (10). These results clearly demonstrate the applicability of the Gumbel model to characterizing SRAM $V_{min}$ distributions.

Figure caption (fragment): The reference area is the largest area of $10^4$ μm². $G_0$ is the quantum conductance of $7.75 \times 10^{-5}$ S.
The slight deviation of the V min distribution for 20 Mb SRAM arrays at lower percentiles from that of 1 Mb arrays is believed to be caused by the silicon process variation from wafer-to wafer-edge. 6) It is interesting to note Poisson area-scaling of Eqs. (5) and (10)

Implication on variability
It is difficult to compare directly the shape-factors (β versus σ) of the $V_{Forming}$ distributions and the LRS conductance distributions because the random variables are plotted on a log scale and a linear scale, respectively, as discussed above. Nevertheless, the results in Fig. 12(b) reveal that the spread of the LRS conductance data is much larger than that of the forming voltages in Fig. 7(b) and the time-to-forming in Fig. 10, particularly with respect to the respective model predictions. The deviations from the merged distributions are particularly evident in Fig. 12(b). This large difference stems from the inherent consequences of the minimum and maximum extreme-value distributions, which characterize the forming voltage and the RRAM conductance, respectively. For the forming voltage, its minimum extreme-value character inherently excludes any effect of nearly formed percolation paths, because these incomplete paths would require longer times or higher voltages to form, as illustrated in Fig. 9(c), even though they are not experimentally measurable owing to the termination of the stress. On the other hand, the maximum-value statistics merely guarantee that the minimum-valued conducting filament contributes the highest current to the conductance. This does not necessarily prevent contributions from other nearly formed or unformed percolating filaments to the current measurement. Therefore, in the case of forming-voltage statistics, the variability arises solely from sampling uncertainties if external variability sources, such as dielectric thickness variations, are excluded. In contrast, in the case of RRAM conductance statistics, nearly formed or unformed filaments can all contribute to the total current as additional, intrinsic variability sources even if external sources are absent.
The fact that the deviated conductance values at lower or higher percentiles are smaller than those of the model prediction may suggest strong contributions from nearly formed or unformed filaments, without any completed filaments, in these particular RRAM devices.
In the case of SRAM $V_{min}$ statistics, the inherent variation is small, as shown in Fig. 14, even though the distribution is controlled by a maximum-valued stochastic process. This is because the $V_{min}$ measurements correspond to the largest functional voltages while any smaller voltages are excluded, analogous to the forming process or dielectric breakdown. These physical considerations cannot be appreciated without a solid understanding of the role of a statistical distribution model, as discussed in the introduction.
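The maximum-value character of the array $V_{min}$ can be illustrated with a small simulation. It is a sketch under invented assumptions, not a model of the data in Fig. 14: each cell is given an independent, normally distributed minimum operating voltage (the mean and standard deviation in `CELL` are hypothetical), the array $V_{min}$ is the largest cell value, and array sizes of 1 Mb and 20 Mb are counted in bits.

```python
import random
from statistics import NormalDist, median, pstdev

# Hypothetical cell-level minimum operating voltage distribution (volts).
CELL = NormalDist(mu=0.60, sigma=0.03)

def array_vmin(n_cells, rng):
    """V_min of one array: the largest cell-level minimum operating voltage.

    The maximum of n_cells independent draws with CDF Phi has CDF Phi**n_cells,
    so it is sampled here by inverting Phi at u**(1/n_cells).
    """
    u = max(rng.random(), 1e-300)                 # avoid u == 0
    p = min(u ** (1.0 / n_cells), 1.0 - 2 ** -53)  # keep p strictly below 1
    return CELL.inv_cdf(p)

rng = random.Random(3)
n_arrays = 2000
small = [array_vmin(2 ** 20, rng) for _ in range(n_arrays)]        # "1 Mb" arrays
large = [array_vmin(20 * 2 ** 20, rng) for _ in range(n_arrays)]   # "20 Mb" arrays

# Functional probabilities at a probe voltage where half of the large arrays work.
v_probe = median(large)
r_small = sum(v <= v_probe for v in small) / n_arrays

print(f"median V_min: 1 Mb {median(small):.4f} V, 20 Mb {median(large):.4f} V")
print(f"spread of V_min: 1 Mb {pstdev(small):.4f} V, cell level {CELL.stdev:.4f} V")
print(f"R_small**20 = {r_small ** 20:.2f} (about 0.5 expected at the probe voltage)")
```

Under these assumptions the larger arrays show a higher $V_{min}$, the array-level spread is much narrower than the cell-level spread, and the functional probabilities of the two array sizes are related by a power equal to the size ratio, the same structure as Poisson area-scaling.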

Conclusions
We have reviewed the fundamentals of the Weibull and Gumbel statistical models (minimum-value versus maximum-value distributions) in connection with the area-scaling properties of their respective scale-factors and distribution percentiles. In applying the minimum-value distribution, such as the Weibull model, we show that the weakest-link character allows us to objectively demonstrate a voltage/field acceleration model. In characterizing the forming process in RRAM technology, we show that the minimum-value statistics can be either Weibull or non-Weibull, depending on the stage of the process, from nano-path nucleation to switching-filament formation. On the other hand, proper consideration of the physical random process of interest leads to a consistent and correct choice of the maximum-value distribution, the Gumbel model, for RRAM conductance and SRAM $V_{min}$ measurements, in excellent agreement with experimental data. These examples demonstrate the fundamental roles of extreme-value distributions in dielectric breakdown and memory applications, laying the foundation for future applications and paving the way to investigate the impact of variability encountered in realistic RRAM technology development.

Figure caption (fragment): ... Fig. 12(a). The fit line using Eq. (11) yields a σ value of 0.214 for the LRS conductance.