Modeling aspects to improve the solution of the inverse problem in scatterometry

The precise and accurate determination of critical dimensions (CDs) of photo masks 
and their uncertainties is relevant to the lithographic process. Scatterometry is a 
fast, non-destructive optical method for the indirect determination of 
geometry parameters of periodic surface structures from scattered light intensities. 
Shorter wavelengths like extreme ultraviolet (EUV) at 13.5 nm ensure that the measured 
light diffraction pattern has many higher diffraction orders and is sensitive to the 
structure details. 
We present a fast non-rigorous method for the analysis of stochastic line edge roughness with amplitudes 
in the range of a few nanometers, based on a 2D Fourier transform method. The mean scattering 
light efficiencies of rough line edges reveal an exponential decrease in terms of the diffraction orders 
and the standard deviation of the roughness amplitude. Former results obtained by rigorous finite 
element methods (FEM) are confirmed. The implicated extension of the mathematical model of scatterometry by an 
exponential damping factor is demonstrated by a maximum likelihood method used to reconstruct the geometrical 
parameters. 
Approximate uncertainties are determined by employing the Fisher information matrix and 
additionally a Monte Carlo method with a limited amount of samplings. It turns out 
that using incomplete mathematical models may lead to underestimated uncertainties 
calculated by the Fisher matrix approach and to substantially larger uncertainties 
for the Monte Carlo method.


1.
Introduction. The investigation of micro-or nanostructured surfaces regarding their structure geometries and dimensions can be performed in a rapid and non-destructive way by the measurement and analysis of light diffraction. Nonimaging metrology methods like scatterometry (cf. Fig. 1) are in contrast to optical microscopy non diffraction limited and grant access to the geometrical parameters of periodic structures (cf. the cross section in Fig. 2) like structure width (critical dimension CD), period (pitch), sidewall angle or height of trapezoidal bridges (lines) [36,26]. An important application of scatterometry is the evaluation of structure dimensions on photo masks and wafers in lithography [44,43]. In the semiconductor industry both the feature sizes and the required limit of measurement uncertainty decrease continuously. Besides conventional metrology techniques like atomic force, electron and optical microscopy, scatterometry is an important tool for the characterization of such structures (cf. e.g. [56,19]). However, scatterometry requires a-priori knowledge for an appropriate modeling of the light diffraction pattern and reconstruction of geometrical profiles. Typically, the surface structure is sought in a certain class of gratings described by a finite number of parameters, and these parameters are confined to certain boxes.
The conversion of measurement data into desired geometrical parameters depends crucially on a high precision rigorous modeling by Maxwell's equations [42,2,10] which reduce to the two-dimensional Helmholtz equation if geometry and material properties are invariant in one direction. The typical transmission conditions of electro-magnetic fields turn into continuity and jump conditions for the transverse field components, and the radiation conditions at infinity are well established. For the numerical solution, a lot of methods have been developed (cf. e.g. [38,51,37,30,31,9,29]). We use the finite element method (FEM) and truncate the infinite domain of computation to a finite one by coupling with boundary elements (cf. e.g. [54,3,13] and compare the alternative approach in e.g. [45]). To compute highly oscillatory fields, generalized finite element methods are available (cf. e.g. [27,34,7,12]).
Apart from the forward computations by solving the Helmholtz equation, the solution of the inverse problem, i.e., the reconstruction of the grating profiles and interfaces from measured diffraction data, is the true task in scatterometry. This problem is related to the optimal design of diffractive optics (cf. [53]) and is similar to the inverse problems for wave equations in other fields of applications (cf. [50,11,25]). Like many inverse problems, the inverse problem of scatterometry is ill-posed [50] and prior knowledge is required. A common approach for its regularization is to set up an equivalent low dimensional optimization problem with a weighted least squares function of the residuals that is minimized using iterative algorithms [20,1]. The prior knowledge in this case includes information not only about the geometrical profile of the investigated probe, but also knowledge of the variances of the measured data, i.e., the statistical error model of the measurements has to be declared.
Typically the surface structure is modeled by a certain class of gratings whose geometrical profile and distribution of material properties is defined by a small number of parameters. A symmetric polygonal domain composed of a finite number of trapezoidal layers with different materials is an example for such a grating class (cf. Fig. 2). The values of the parameters are assumed to be restricted to intervals according to the known quality of the manufacturing process. If the optimization problem is solved by minimizing the least square function, then the weight factors of the squared residuals are usually chosen as the reciprocal measurement variances. Thus they represent the prior knowledge of the underlying statistical error model of the measurement process.
In previous publications [20,21] of the authors and [41] the reconstructions have been obtained with fixed weight factors representing a relative measurement uncertainty of about 1-3% for EUV measurements. A comparison of the reconstructed profiles using EUV-scatterometry and the results obtained using atomic force and electron microscopy [39] has revealed that scatterometry can underestimate the sidewall angle, an important feature of the EUV-mask, by several degrees. Imperfect modeling is supposed to be one of the main reasons for this result [20,40,28]. In particular, to get more reliable simulations and reconstructions, effects like line edge roughness (LER) and thickness variations in the capping and the multilayer system beneath the absorber lines have to be taken into account. But the impact of the weighting factors themselves on the reconstructed profile parameters has to be considered too. A wrong assumption about the corresponding measurement uncertainties can lead to an unacceptable bias in the solution of the inverse problem.
To overcome these pitfalls efficiently, we have employed a maximum likelihood estimation (MLE) (cf. [35]) and have extended the likelihood function first of all by including the statistical error model of the measurements. In this approach the measurement uncertainties of the scatterometric data are described as a standard deviation function that depends only on two independent constants. Along with the geometrical parameters of the mask, these two constants are treated as unknown parameters too, i.e., they have to be determined during the optimization of the likelihood function. The results for the critical dimensions are substantially improved if a MLE method is used instead of the standard least squares approach (cf. [23] for the details).
Torcal-Mila et al. [52], and also Kato and Scholze [28] have suggested approximative analytical expressions for the impact of line edge roughness on the scattered efficiencies. They have applied Fraunhofer's diffraction method on gratings with randomly disturbed periodicity, i.e., the Fourier transform of the reflectivity function of perturbed binary gratings were investigated. They found damping of the mean efficiencies with increasing diffraction orders, which was confirmed by rigorous FEM simulations for a real EUV mask [18]. In these FEM investigations, large computational domains containing many line-space structures with stochastically chosen widths were used. Hence, these results were still obtained for a simplified model of rough line edges, i.e., without modeling the line edges as a stochastic process with a prescribed autocorrelation function analogously to what is often done in the metrology of rough geometries. There are several publications [4,32,33], where the modeling of line edge roughness as a stochastic process starts with an exponentially decaying autocorrelation function for the position p(r) of an edge point at distance r: p(r) = σ 2 e −(r/ξ) 2α where σ is the standard deviation, ξ is the linear correlation length, and α is a roughness exponent (also referred to as the Hurst exponent). Randomized line edge profiles can be generated by calculating or approximating the associated power spectrum density function PSD(r −1 ) belonging to the autocorrelation function p(r) and subsequently applying an inverse Fourier transform with a random phase uniformly distributed in the range of [0, 2π]. For instance, Bergner et al. [4] use a similar approach to generate rough line edge profiles of 2D-binary gratings to calculate its impact on the angular dependence of the specular, 0th order reflectance at wavelengths around 633 nm. Torcal-Milla et al. [52] have investigated gratings with rough edges in the visible optical range by applying the Rayleigh-Sommerfeld approach for near field simulations and the Fraunhofer approximation for the far field pattern. They found light intensities with exponential attenuation factors depending on σ and the diffraction order. The values for σ they applied were in the range of several µm and they used a simplified statistical approach without considering a roughness exponent α. Schuster et al. [49] have studied the impact of LER for silicon gratings on the basis of sinusoidal perturbations for the line positions with amplitudes in the range of 2-8 nm and for wave-lengths of 400 and 250 nm, respectively.
One goal of the present paper is to resolve the LER-induced correction to the mean efficiencies for higher diffraction orders on the basis of randomization using PSD functions for roughness amplitudes of only a few nanometers. In other words, LER is controlled by an autocorrelation function depending on a standard deviation σ, a correlation length ξ and a roughness exponent α. The key concept is to replace the time consuming rigorous simulation for the trapezoidal line-space structures by approximate computations for the scattering by arrays of slits. These slits are strips with rough boundary lines and the light intensity can be approximately computed by a fast 2D-Fourier transform of the light distribution of a binary source applying the Fraunhofer approximation.
There is another important aspect, namely, to supplement the evaluation model of EUV scatterometry by extending the likelihood function. Indeed, the reflectivity of an EUV mask is also essentially affected by the capping and multilayer system beneath the absorber lines [20,22]. The substrate system acts at the usually applied small angles of the incidental light as a mirror. Thickness variations in the substrate system cause significant variations in the reflected efficiencies of the EUV photo mask. To enhance our reconstruction model, we additionally consider the heights of the two capping layers and the thickness of the multilayer system as model parameters of the likelihood function, which are to be determined together with the critical dimensions of the mask.
In Section 2 the EUV measurement setup is described and typical cross sections of EUV photo masks are presented to illustrate the critical dimensions and further parameters of EUV masks. Then the mathematical model of scatterometry is discussed in Section 3 including important extensions concerning measurement errors and line edge roughness. In Section 4 further investigations to prove the exponential damping of the efficiencies in terms of the diffraction orders and the standard deviation of the line edge roughness are presented. Section 5 describes the extended MLE approach and the methods to estimate the uncertainties of the reconstructed parameters. The scatterometric reconstructions of the CDs of an EUV photo mask are presented in Section 6. Improvements and impacts on the uncertainties are demonstrated. Using simulated diffraction patterns as input, the inverse problem is solved with the maximum likelihood method. Results obtained with two different mathematical models, the first complete and the second incomplete, are compared. Uncertainties are determined with two different methods for each of these models. Section 7 closes the paper with a discussion of the results and the conclusions.
2. Measurement setups and typical profiles of EUV photo masks. For EUV scatterometry we use the standard scatterometry approach, i.e., non-specular diffracted light is measured for different wavelengths and directions of the incoming radiation. The measurements are carried out using the EUV reflectometer [46]. The principle scheme of scatterometry is illustrated in Fig. 1.
The reflectometer is operated at the soft X-ray radiometry beam line [48] in the synchrotron radiation laboratory of PTB at BESSY. The beam line provides monochromatized radiation in the spectral range from 0.7 nm to 35 nm, including the EUV spectral range around 13.5 nm. The beam size at the sample is about 1 mm in both directions and the divergence is below 0.5 mrad. The area of interest is placed in the incoming photon beam and either wavelength, angle of incidence or the angle of detection are scanned. It is also possible to measure 2-dimensional scatter distributions by moving the detector out of plane with regard to the direction of the specular reflection. In order to evaluate the quality of the manufacturing process, we may assume that our sample is a simple mask with a periodic or doubly periodic surface structure. Here we consider the case of an array of periodically allocated lines over a Bragg mirror in form of a multilayer system (cf. the line-space structure in Figs. 1 and 2). The main part of the diffractive field, corresponding to an incoming light ray of a direction perpendicular to the direction of lines, is the superposition of a finite number of rays, which are called diffractive modes and point into the plane perpendicular to the lines. For further processing we only used the measured diffraction efficiencies (light intensities) for the diffraction modes, no diffusely scattered radiation. The upper graphics in Fig. 1 shows the efficiencies depending on the index of the diffractive modes. The indexing is in correspondence with the angle of the direction of the outgoing diffractive modes. For the structures investigated, EUV scatterometry offers the advantage of working in the regime with the wavelength much shorter than the characteristic dimensions of the structures to be investigated (a few 100 nm). Therefore many diffraction orders can be measured that provide information on the higher harmonics in the spatial frequency range corresponding to smaller structure details. Usually these diffraction orders are in the range of about ±10 for EUV scatterometry, since in this sector their light intensities are significantly greater than the background noise of the measurement setup.
Typical EUV photo masks have several fields of periodic line-space structures, with different critical dimensions and pitch (period). The absorber stack of the mask consists of three layers, i.e., a thin anti reflection (ARC) layer (TaO) on top, the main EUV absorber layer (TaN) and a thin buffer layer (SiO 2 ) which protects the underlying multilayer reflective coating during the manufacturing process of the mask. The reflective multilayer coating beneath these line structures consists of 49 periodically repeated groups of molybdenum (Mo) and silicon (Si) layers. The multilayer is terminated at the surface with a slightly thicker Si-capping layer. In the open reflective areas (spaces between lines), this capping layer is also protected by a thin SiO 2 layer. Fig. 2 shows schematically this design together with some of the most important profile parameters like top-CD, bottom-CD and height of the absorber layer.
3. Mathematical model of scatterometry. Scatterometry is characterized by the propagation of electromagnetic waves through different materials. Maxwell's equations give a general description for the propagation of electromagnetic waves. In scatterometry it is common to use time-harmonic fields. Applied to our specific geometry, i.e. for geometries invariant in one direction (z-direction) and for incoming waves in directions perpendicular to the z-direction, the time-harmonic Maxwell equations reduce to the two dimensional Helmholtz equation Here, u is the transversal field component (z-component of the electric or magnetic field depending on the polarization) that oscillates in the z-direction and k (x, y) is the wave number that is assumed to be constant for areas filled with the same material. In particular k = k + and k = k − is constant for y sufficiently large and small, respectively. The incoming ray is modeled by a plane wave mode (x, y) → c exp(j k + D · (x, y) ) with a direction vector D and satisfies the quasi periodic boundary condition u(x+ d, y) = u(x, y) q for the constant q := exp(j k + D · (d, 0) ) and the period d of  the surface structure. Additionally to (1), the solution u has to satisfy the quasi periodic boundary condition and the radiation condition in the upper and lower half planes. The latter condition means, roughly speaking, that the diffracted field is a superposition of quasi periodic modes propagating into a direction away from the surface. The resulting boundary value problem is solved by a finite element method for elliptic partial differential equations (cf. e.g. [54,3,13]). We used the software package DIPOG 1 for our calculations. Now it is not hard to see that the solution of our boundary value problem in the upper and lower half plane is the superposition of a finite number of quasi periodic plane wave modes plus an evanescent wave decaying exponentially in the direction away from the surface. The plane wave modes are called diffractive modes and their indices (orders) are chosen in accordance to their direction. The zero index mode is the reflected and transmitted mode in the direction of the modes according to Snell's law for the reflection by a simple planar surface, and with growing order the angle between the plane wave direction of the mode and the surface normal increases. Now the efficiency of the diffractive mode is nothing else than the ratio of energy from the incident wave propagating into the plane wave mode divided by the incoming energy. Here energy means the amount of energy passing a unit area parallel to the surface plane per unit time. Hence, in the theoretical model energy is radiated only in a finite number of directions corresponding to quasi periodic plane wave modes. In practical measurements, the periodic structure of the grating leads to several reflection peaks that are related to different diffractive orders. Reflection efficiencies are given in turn by the ratio of the peak maximum to the intensity value of the incident beam.
Finally, the mathematical models for the measurement process consist of a nonlinear operator that maps the geometry parameters onto the aforementioned efficiencies. The unknown parameters are mapped into the measured efficiencies. The vector of parameters p := (p j ) N j=1 fixes the grating geometry and the solution of the boundary value problem for every set of parameter values gives a vector of efficiencies values f 1 (p), . . . , f M (p), defining point-wise the nonlinear operator ( Additionally, there is a noise on the measured efficiencies, which is assumed to be Gaussian distributed. Unfortunately, in realistic geometries the ideally periodic line-space structures are slightly perturbed. In previous studies with stochastic perturbations of absorber line shapes (line widths) and positions (x-coordinates of lines), it was shown that there is a significant systematic damping for the measured efficiencies especially of higher diffraction orders [28,52,15]. In a Fraunhofer approximation an analytical expression for the efficiencies f j of the perturbed masks was derived, i.e.
where k j = 2πn j /d, n j is the order of the diffractive mode and d the period of grating. The symbol σ r denotes the variance of the line edge position. In rigorous finite element simulations the expression for the analytical damping factor was confirmed [18]. Note, that these results were obtained by means of a simplified model of line roughness, i.e., by using computational FEM domains of large periods for the cross section of the EUV mask containing many line-space pairs with stochastically chosen widths, but still with straight edges. In the subsequent Sect. 4 we present a new alternative method for the analysis of stochastic line edge roughness described by power spectrum density functions. The boundary value problem is solved approximately in full 3D by a fast non-rigorous method.
Recently, investigations on the reflectivity of the EUV multilayer mirror have revealed high sensitivity of light diffracted efficiencies against multilayer variations. This comes from the fact that the multilayer of our geometry is adjusted to the wavelength of the incident beam (λ = 13.5 nm) and small deviations yield a strong loss of reflectivity [22]. The multilayer consists of 49 times four periodic layers as given in Fig. 2. To model multilayer roughness we de-tune the multilayer system (MLS) multiplying all the layer heights by a scaling factor κ, which is handled like the set of geometry parameters. The choice κ = 1 leaves the MLS unperturbed, whereas, for example, κ = 1.01 stretches the thickness of all layers of MLS by one percent. A similar high sensitivity is observed for thickness variations of the two capping layers covering the MLS. Therefore, we included the parameters h fc and h sc for the heights of the first (SiO 2 ) capping layer and the second (Si) capping layer as additional parameters in the mathematical model.
There are additional stochastic errors besides the influence of profile geometry, the absorber line roughness and MLS variations on the efficiencies. From experimental experience it is known that, during the time of a measurement, power fluctuations of the incident photon beam affects randomly the output signal. A second random contribution is given by the background noise of the measuring instrument itself. To summarize, our complete mathematical model for scatterometry measurements is given by where the error j is Gaussian distributed with a standard deviation according to The first term on the right-hand side models the influence of power fluctuations and the second the background noise. A simplified and incomplete model is obtained from Eq. (5) depends on the distance r = |y 1 − y 2 | only, i.e. x(y 1 , y 2 ) = x(r). Furthermore, we assume the exponentially decaying autocorrelation function where σ is the standard deviation of the edge positions, ξ is the linear correlation length along the line, and α is called roughness exponent. Recent publications, e.g. Mack and Bergner et al. [4,32,33], are starting with such an exponentially decaying function for modeling a stochastic process which ends up in randomized line edges or surfaces. Our investigations are addressing the 1D case of randomly rough lines. Randomized line edge profiles x are generated by calculating or approximating the associated power spectrum density function PSD(r −1 ) that belongs to the autocorrelation function x(r), and then applying to the calculated PSD a random complex exponential phase term, being uniformly distributed in the range of [0, 2π]. Subsequently, the inverse Fourier transform of that disturbed PSD provides a rough line edge profile x. By this means the rough line edge shown in Fig. 3 was generated for a standard deviation σ=1 nm, a roughness exponent α=0.5 and a correlation length ξ=10 nm. Fig. 4 demonstrates the impact of different roughness exponents α on the PSD and the generated rough lines for a correlation length ξ of 50 nm. The edges of the aperture slits are created by repeating this process independently from each other. Bergner et al. [4] use a similar approach to generate rough line edge profiles of 2D-binary gratings to calculate its impact on the angular dependence of the specular, 0th order reflectance at wavelengths around 633 nm.
To get a sufficient resolution, the pixel size (stepsize of discretization for the Fourier transform) is 0.1 nm in both directions of the 2D apertures broadening upon a total range of 1µm × 1µm. The investigated rough apertures are typically composed of four periodically arranged slits, i.e., the corresponding grating period is 250 nm and the mean width of the rough slits is 125 nm (cf. Fig. 5). There is no restriction to choose other line-to-space ratios (cf. Fig. 7a). We choose several slits per apertures to improve the spatial averaging of the irradiance pattern associated with the randomized aperture.

Fraunhofer approximation.
Fourier optics is well known for being a cornerstone for the analysis of imaging, diffraction, coherence and propagation through random media [16,14]. The mathematical description of the propagation of an optical field from one location, for example a diffractive aperture, to another is one of its most essential tasks. In general the propagation behavior of electro-magnetic Under these ideal conditions a monochromatic plane wave with wavenumber k and orthogonal incidence on a diffractive aperture is considered. Its propagating radiation in a parallel observation plane far away from the aperture can be elegantly expressed as the Fourier transform of the field distribution in the source plane applying the Fraunhofer approximation. According to Huygens' principle each point of the wave front in the source plane can be considered as the source of superimposing spherical waves generating secondary wave fronts. That is, the field distribution U 2 (x, y) in an observation plane parallel to the source plane (cf. Fig. 6) can be calculated by means of the field distribution U 1 (ξ, η) of the source plane: here given in Rayleigh-Sommerfeld's diffraction notation with Σ is the domain of the source plane. It can be shown very easily (cf. [55,8]) that for propagation distances that are very long compared to the size of the aperture, i.e., for distances z with z max U 2 (x, y) from equation (8) can be written as Equation (11) represents the Fraunhofer diffraction expression. It consists of multiplying the source field U 1 by the characteristic function of the slit domain Σ, of applying the Fourier transform together with the frequency variable substitution and of multiplying by a complex exponential function. Note that the multiplicative complex exponentials in front of the double integral disappear, provided only the irradiance is of interest. This is usually the case when calculating the Fraunhofer pattern of an aperture. We apply this Fraunhofer expression with a wavelength λ of 13.5 nm and a distance z of 1 m to calculate the irradiance pattern of the rough apertures and the corresponding unperturbed non-rough aperture in this paper. Fig. 7 shows an example for an aperture composed of four periodically arranged slits with a width of 200 nm and a period of 250 nm, i.e., the bridge-to-slit ratio is 1:4. The irradiance pattern along the x-direction at the central position y = 0 is shown for the unperturbed aperture. Note that depending on the bridge-slit ratio equivalent to the line-space ratio of 2D binary grating, the irradiance becomes zero for special diffraction orders. These diffraction orders are not considered for the evaluations and only those with values significantly greater than zero are selected. In the example of Fig. 7 this is the case for the orders n j = ±5.

4.3.
Irradiance pattern relative to those of 'non-rough' structures. The key for analyzing the impact of roughness is to compare the mean light intensities calculated for ensembles of rough apertures to that of the unperturbed non-rough aperture, i.e., the corresponding aperture whose slits are composed of straight lines. Any systematic impact on the mean efficiencies should then be identified obviously. In previous investigations that made use of computationally expensive FEM simulations for a 2D computational domain [18], this strategy has already been applied successfully. Note that the applied roughness model of these studies was very simple, i.e., only the center locations and widths of neighboring lines were stochastically chosen and unfortunately no perturbations along the line direction could be treated in the former, profile-oriented FEM approach.
The relative deviations of light intensities in dependence on the diffraction orders for several ensembles of rough apertures are depicted in Fig. 8. The three examples in the upper part of Fig. 8 differ from each other by an increasing standard deviation σ of the autocorrelation function (cf. Eq. (7)) from 2 nm to 5 nm used to generate the rough edges of the apertures. The roughness exponent α was fixed to 1.0 and the correlation length ξ was set to 10 nm. Each ensemble has eleven samples of rough apertures whose relative deviations are depicted as circle symbols. Their mean values are marked by diamond symbols. The bridge to slit ratio was 1:1, i.e., both parameters have a nominal value of 125 nm. Only the orders with intensities significantly greater than zero are considered.
A systematic nonlinear decrease of the mean efficiencies for higher diffraction orders along with slightly increasing variances is observed. This is established for different degrees of roughness expressed by the different σ values for three ensembles. Only a few ten samples for each ensemble are necessary to reveal a stable bias of the mean efficiencies.
Quite similar outcomes are obtained for the three ensembles generated with different correlation lengths ξ = [10, 60, 120] nm as shown in the lower part of Fig. 8. Here α was changed to 0.5 and σ was fixed to 3 nm. One can recognize that the correlation length ξ effects the variances around the mean values of the light intensities. They rise slightly in particular for higher diffraction orders. Nevertheless, the essential impact on the biased mean values is stemming from the imposed σ values and appears to follow an exponential function depending on the diffraction order and σ, just like the above-mentioned findings of the 2D FEM simulations.
Consequently, we conclude that the revealed attenuation of the mean light intensities with higher diffraction orders can be modeled by Here f nj ,ref denotes the light intensities of the unperturbed aperture at diffraction order n j and f nj ,pert the corresponding mean values of the generated ensemble of rough apertures, d is the period of the bridge-slit structure and σ r depicts that value for the standard deviation which represents the best-fit results for the mean normalized deviations applying equation (13). In fact, the solid lines in Figs. 8 are depicting these best-fit results for σ r obtained by minimizing the Euclidean norm Note that, according to our hypothesis (13), the evaluated σ r values are good estimations of the imposed standard deviations characterizing the generated ensembles of rough apertures. Fig. 9 summarizes these findings for many different samples of rough apertures representing 2D-binary gratings. For three different values of σ = [1, 2, 3] nm and nine values of the correlation length along the edges ξ = [5, 10, 20, 40, 60, 80, 100, 120, 150] nm, the bias of the mean efficiencies expressed by σ r according to Eq. (13) has been evaluated. The roughness exponent α was set to 1.0 and 0.5. Only a slight increase within a range of maximal 5% is found for the determined σ r compared to the imposed standard deviation σ of the associated rough ensemble.   . Evaluated values for σ r for ensembles of rough apertures (2D-binary gratings) generated with standard deviations σ, linear correlation lengths ξ and roughness exponents α; equation (13) was applied to fit the mean biased efficiencies.

4.4.
Comparison to previous 2D FEM approach. Finally, we can compare the results directly with the normalized deviations provided by the above-mentioned rigorous 2D-FEM simulations over randomly perturbed cross-sections of an EUV photo mask composed of different layers [18]. The chosen example has been calculated with 24 lines per computational domain overspreading the profile cross-section with a randomized distribution of line and space widths of the neighboring lines (cf. Fig. 3 a in [18]). The results shown in Fig. 10 have been obtained by adapting the bridge and slit widths in the 2D Fourier transform method to the corresponding values, i.e., to 93.33 nm for the bridges and 186.67 nm for the slits and choosing α=0.5 and ξ=10 nm . The imposed σ value for the edge roughness was 5.6 nm for both cases. The mean normalized deviations and the evaluated σ r values for the exponentially damped efficiencies show an excellent agreement. Note that for the rigorous FEM approach 1000 samples per ensemble were applied, whereas for the Fourier approach only 32 samples were calculated to get the depicted stable bias with relatively small variances. Different values of α and ξ have no significant impact on the comparison, i.e., good agreements are revealed as well. 5. Maximum likelihood approach and uncertainty estimations. The estimation of desired geometry parameter values from measured scattering data y j requires the solution of the inverse problem of Eq. (5). The inverse problem is typically solved by finding the parameters that minimize a weighted least squares objective function. The weighting factors in the least squares function are usually determined as the reciprocal squared measurement uncertainties of the scattering efficiencies. In other words, good estimators of the uncertainties are needed. However, in a recent publication it was shown that a wrong choice of weight factors leads to a bias in the reconstructed parameter values. The application of maximum likelihood estimation can resolve this problem by determining the relevant uncertainties from measurement data while the inverse problem is solved [23].
In the maximum likelihood method parameters are estimated by maximizing the likelihood function that is attributed to the considered problem. For our chosen error model, the likelihood function reads a · f j (p, σ r , κ, h fc , h sc ) The corresponding maximum likelihood estimator (MLE) is given bŷ where typically the negative logarithm of the likelihood function is minimized.
The optimization scheme provides the parameter setθ that contains the desired geometry parameter p, the roughness strength σ r , the MLS stretching factor κ and the noise parameter (a, b). Note that for the simplified model the parameters σ r = 0, κ = 1 and h fc , h sc are constant and do not occur in Eq. (14). A significant advantage of the complete model is the possibility to determine a realistic value of the roughness strength σ r from scatterometry measurements. In [22,24,47,17] the application of the proposed model to real measurement data is described and comparisons with the results of atomic force microscopy from the same EUV photo mask are given.
According to the "Guide to the expression of uncertainty in measurement" (GUM and GUM-Supplement 1) [5,6] uncertainties can be expressed by the propagation of distributions using Monte Carlo methods (MC). For the calculation of uncertainties within the MC method we draw samplings from normal distributed light diffracted efficiencies. Here, we used a = 0.03, b = 0.003 % and a normal distributed stretching factor κ with mean µ κ = 1 and standard deviation σ κ = 10 −3 . Furthermore, normal distributions with means µ h fc = 1.2 nm and µ hsc = 12.9 nm as well as standard deviations σ h fc = 0.01 nm and σ hsc = 0.5 nm are chosen for the height of the first and second capping layer, respectively. Using the complete model we drew 50 samples and calculated the standard deviation for each parameter that provides the uncertainties.
In a second approach we estimated the covariance matrix for each set of reconstructed parametersθ by the inverse F −1 of the negative second derivative of the logarithm of the likelihood function [35] (Fisher information matrix), i.e.
The standard deviation σ for the ith parameter reads 6. Results. In this paragraph we demonstrate how uncertainty estimations are affected when different mathematical models are applied to the same set of data.
For the investigations, we reconstructed the critical dimensions: top-CD, bottom-CD and height. We simulated 50 measurement data sets with the complete model described in the previous section. Each data set contains M =69 data points that consist of the -10th to 12th diffraction orders for three slightly different wavelengths. The incident photon beam was adjusted to a fixed angle of 6 • . Notice, the number of data points influences directly the magnitude of the parameter variances [21]. We applied the complete and the simplified model to solve the inverse problem for each data set according to the MLE. The standard deviation was determined from the 50 parameter estimates (uncertainty estimation for the MC method). On the other hand, we calculated the Fisher information (FI) matrix for every data set and determined the standard deviations for every geometry parameter according to Eq. (16). In Figure 11 results for the complete model are shown. To illustrate the difference in the quantification of uncertainty the identical parameter estimates for 50 different reconstructions are shown in both panels. The associated error bars show the 95% confidence interval derived from MC method (right panel) and FI method (left panel). For the MC method, the uncertainty is obtained as variance of the respective reconstruction parameters. Hence, all error bars have the same size. Uncertainties are small and consistent with the known reference values (unperturbed input parameter) for both methods. The FI method is able to provide the parameter uncertainties from only one data set. This makes the method computationally cheaper.
In Figure 12 we show results for the case when the simplified model is used to reconstruct the critical dimensions (geometry parameters) form simulated scatterometry data produced with the complete model. This corresponds to a case, where during the evaluation of experimental data relevant aspects are not included in the mathematical model. The parameter values are varying stronger for the simplified model than for the complete model. Due to possible multiple maxima of the likelihood function, problems with the minimizer in the optimization scheme can arise (this can be seen clearly for the height, lower left panel). The FI method locally measures the curvature of the likelihood function and estimates from these values the standard deviation. Therefore, the parameter variations can be higher than those suggested by the uncertainties of the FI method. The MC method automatically takes parameter variations into account and in the result the uncertainties become large. For a quantitative comparison we calculated the average of FI uncertainties and compare the values with MC uncertainties for every model (see,  Table 1). Again, both methods provide relatively small uncertainties for the complete model. When we use the simplified model for the evaluation the estimated uncertainties for both models are increasing. The MC method yields, however, substantially larger uncertainty estimates than the FI. As a consequence, the deviations of the reconstructed critical dimensions from the known reference value are inconsistent with the uncertainties estimated by the FI approach.
In principle, to reach accurate results from the MC method often a large amount of samples ( 10 6 ) is necessary. For computationally expensive systems, like the inverse problem of scatterometry, the GUM-Supplement 1 suggests lower sampling rates (between 50 and 100 samplings) [6] as feasible approximations. In Fig. 13, the convergence rates of the complete and the simplified model are shown for different geometry parameters. The estimated variances are sufficiently accurate for our qualitative comparison of the methods that was discussed in this paper.
7. Discussion and conclusions. We have proposed a 2D-Fourier transform method as a simple and efficient algorithm to complement previous investigations on the systematic impact of line edge roughness on light diffraction patterns of periodic line-space structures. In particular, this has been done in the EUV regime, where the light diffraction pattern is characterized by many significant wave modes with higher diffraction orders. The irradiance of illuminated rough apertures far away from the source plane is numerically calculated very efficiently as the 2D-Fourier transform of the light distribution in the aperture plane and then compared to those of the unperturbed, 'non-rough' aperture. Rough random boundaries are generated for the aperture slits by PSD functions ensuring realistic line edge profiles comparable to those of 3D AFM measurements along the sidewalls of the absorber lines. For apertures of a typical size of 1 µm in both directions and a pixel size of 0.1 nm, there is already a strong spatial averaging in each single sample of a rough aperture.
Consequently, a few samples (e.g. ten) are sufficient to determine a stable estimate for the bias of the mean values at different diffraction orders. Compared to the diffraction pattern of the unperturbed aperture, the mean efficiencies of the rough apertures show a systematic exponential decrease for higher diffraction orders. The standard deviation of the edge fluctuations σ r and the diffraction order n j govern the revealed exponential damping factor. In fact, this dependence has been investigated for ensembles of rough apertures with different values for the imposed standard deviation σ, for the linear correlation length ξ, and for the roughness exponent α. Compared to the σ used for the generation of our simulated geometry, only a slight increase within a range of 5% has been found for the σ r determined a posteriori by our damping formula. Thus former results on the damping, obtained by rigorous FEM computations, have been confirmed on the base of a much more realistic model of line edge roughness. In other words, the implicated model extension for scatterometry by a damping factor for the calculated efficiencies allows to determine the standard deviation σ r of line edge roughness along with the critical dimensions of the periodic surface structure. Furthermore, we have reconstructed grating profile parameters and evaluated their uncertainties from simulated scatterometry diffraction patterns. We investigated two different models with two independent methods for the evaluation of the approximate uncertainties. For the complete model, the uncertainties are always smaller and both methods yield similar results (FI-method, MC-method). With the Fisher information matrix approach, the uncertainties can be determined from a single reconstructed parameter set, while for the Monte Carlo method a sufficiently large number of sample reconstructions (here: 50) have to be carried out.
The simplified and incomplete model used in the second part of our study includes no information about roughness and variations of the multilayer, whose effects were both present in the model with which the simulated test data were produced. As a result, much larger uncertainties were obtained with the Monte Carlo method than for the complete model. The Fisher information matrix method, in contrast, underestimates the uncertainties. The small uncertainty values were clearly inconsistent with the much larger variance of the reconstruction results that were obtained from using the simplified model.
Altogether, the Monte Carlo method and the Fisher information matrix method give comparable results as long as the model captures all essential features of the measurement process. If the model itself is incomplete, the Fisher information matrix method leads to inconsistent results. For these reasons, the Monte-Carlo method should be employed to obtain reliable uncertainty estimates in scatterometry. The Monte Carlo method, however, is computationally quite expensive. Smart sampling methods of surrogate modeling may allow for more efficient Monte Carlo simulations in future.