Efficient uncertainty quantification of large two-dimensional optical systems with a parallelized stochastic Galerkin method

It is well-known that geometrical variations due to manufacturing tolerances can degrade the performance of optical devices. In recent literature, polynomial chaos expansion (PCE) methods were proposed to model this statistical behavior. Nonetheless, traditional PCE solvers require a lot of memory and their computational complexity leads to prohibitively long simulation times, making these methods non-tractable for large optical systems. The uncertainty quantification (UQ) of various types of large, two-dimensional lens systems is presented in this paper, leveraging a novel parallelized Multilevel Fast Multipole Method (MLFMM) based Stochastic Galerkin Method (SGM). It is demonstrated that this technique can handle large optical structures in reasonable time, e.g., a stochastic lens system with more than 10 million unknowns was solved in less than an hour by using 3 compute nodes. The SGM, which is an intrusive PCE method, guarantees the accuracy of the method. By conjunction with MLFMM, usage of a preconditioner and by constructing and implementing a parallelized algorithm, a high efficiency is achieved. This is demonstrated with parallel scalability graphs. The novel approach is illustrated for different types of lens system and numerical results are validated against a collocation method, which relies on reusing a traditional deterministic solver. The last example concerns a Cassegrain system with five random variables, for which a speed-up of more than 12× compared to a collocation method is achieved. © 2015 Optical Society of America OCIS codes: (000.5490) Probability theory, stochastic processes, and statistics; (050.1755) Computational electromagnetic methods; (260.2110) Electromagnetic optics; (080.3630) Lenses. References and links 1. X. Chen, M. Mohamed, Z. Li, L. Shang, and A. R. Mickelson, “Process variation in silicon photonic devices,” Appl. Opt. 52, 7638–7647 (2013). 2. D. B. Xiu and G. E. Karniadakis, “The Wiener-Askey polynomial chaos for stochastic differential equations,” SIAM J. Sci. Comput. 24, 619–644 (2002). 3. D. Vande Ginste, D. De Zutter, D. Deschrijver, T. Dhaene, P. Manfredi, and F. Canavero, “Stochastic modelingbased variability analysis of on-chip interconnects,” IEEE Trans. Compon., Packag., Manuf. Technol. 2, 1182– 1192 (2012). 4. T. El-Moselhy and L. Daniel, “Variation-aware stochastic extraction with large parameter dimensionality: review and comparison of state of the art intrusive and non-intrusive technique,” in 2011 12th International Symposium on Quality Electronic Design (ISQED 2011), 14-16 March 2011, Santa Clara, CA, USA, (2011), pp. 508–517. #250848 Received 30 Sep 2015; revised 6 Nov 2015; accepted 10 Nov 2015; published 17 Nov 2015 (C) 2015 OSA 30 Nov 2015 | Vol. 23, No. 24 | DOI:10.1364/OE.23.030833 | OPTICS EXPRESS 30833 5. C. Chauvière, J. S. Hesthaven, and L. C. Wilcox, “Efficient computation of RCS from scatterers of uncertain shapes,” IEEE Trans. Antennas Propag. 55, 1437–1448 (2007). 6. Z. Zubac, D. De Zutter, and D. Vande Ginste, “Scattering from two-dimensional objects of varying shape combining the Method of Moments with the Stochastic Galerkin Method,” IEEE Trans. Antennas Propag. 62, 4852– 4856 (2014). 7. Z. Zubac, D. De Zutter, and D. Vande Ginste, “Scattering from two-dimensional objects of varying shape combining the Multilevel Fast Multipole Method (MLFMM) with the Stochastic Galerkin Method (SGM),” IEEE Antennas Wireless Propag. Lett. 13, 1275–1278 (2014). 8. T.-W. Weng, Z. Zhang, Z. Su, Y. Marzouk, A. Melloni, and L. Daniel, “Uncertainty quantification of silicon photonic devices with correlated and non-Gaussian random parameters,” Opt. Express 23, 4242–4254 (2015). 9. R. F. Harrington, Field Computation by Moment Methods (IEEE Press, 1993). 10. A. Keese and H. G. Matthies, “Hierarchical parallelisation for the solution of stochastic finite element equations,” Comput. Struct. 83, 1033–1047 (2005). 11. J. Fostier and F. Olyslager, “An asynchronous parallel MLFMA for scattering at multiple dielectric objects,” IEEE Trans. Antennas Propag. 56, 2346–2355 (2008). 12. A. J. Poggio and E. K. Miller, “ Integral equation solutions for three dimensional scattering problems,” in Computer techniques for electromagnetics, R. Mittra, ed. (Pergamon Press, 1973), pp. 159–264. 13. F. Olyslager, D. De Zutter, and K. Blomme, “Rigorous analysis of the propagation characteristics of general lossless and lossy multiconductor transmission lines in multilayered media,” IEEE Trans. Microwave Theory Tech. 41, 79–88 (1993). 14. A. Biondi, D. Vande Ginste, D. De Zutter, P. Manfredi, and F. Canavero, “Variability analysis of interconnects terminated by general nonlinear loads,” IEEE Trans. Compon., Packag., Manuf. Technol. 3, 1244–1251 (2013). 15. P. Manfredi, D. Vande Ginste, D. De Zutter, and F. Canavero, “Uncertainty assessment of lossy and dispersive lines in spice-type environments,” IEEE Trans. Compon., Packag., Manuf. Technol. 3, 1252–1258 (2013). 16. W. C. Chew, J. M. Jin, E. Michielssen, and J. Song, Fast and Efficient Algorithms in Computational Electromagnetics (Artech House, 2001). 17. D. Vande Ginste, H. Rogier, F. Olyslager, and D. De Zutter, “A fast multipole method for layered media based on the application of perfectly matched layers–the 2-D case,” IEEE Trans. Antennas Propag. 52, 2631–2640 (2004). 18. D. Vande Ginste, E. Michielssen, F. Olyslager, and D. De Zutter, “An efficient perfectly matched layer based multilevel fast multipole algorithm for large planar microwave structures,” IEEE Trans. Antennas Propag. 54, 1538–1548 (2006). 19. D. Pissoort, E. Michielssen, D. Vande Ginste, and F. Olyslager, “Fast-multipole analysis of electromagnetic scattering by photonic crystal slabs,” J. Lightwave Technol. 25, 2847–2863 (2007). 20. B. Michiels, J. Fostier, I. Bogaert, and D. De Zutter, “Full-Wave Simulations of Electromagnetic Scattering Problems With Billions of Unknowns,” IEEE Trans. Antennas Propag. 63, 796–799 (2015). 21. R. W. Freund, “A transpose-free quasi-minimal residual algorithm for non-Hermitian linear systems,” SIAM J. Sci. Comput. 14, 470–482 (1993). 22. R. D. da Cunha and T. Hopkins, PIM 2.2 The Parallel Iterative Methods Package for Systems of Linear Equations User’s Guide (Fortran 77 version), UKC, University of Kent, Canterbury, UK. 23. Ö. Ergül and L. Gürel, “Efficient parallelization of the multilevel fast multipole algorithm for the solution of large-scale scattering problems,” IEEE Trans. Antennas Propag. 56, 2335–2345 (2008). 24. J. Fostier and F. Olyslager, “A provably scalable parallel multilevel fast multipole algorithm,” Electron. Lett. 44, 1111–1113 (2008). 25. D. Pissoort, E. Michielssen, D. Vande Ginste, and F. Olyslager, “Fast-multipole analysis of electromagnetic scattering by photonic crystal slabs,” J. Lightwave Technol. 25, 2847–2863 (2007). 26. M. F. Pellissetti and R. G. Ghanem, “Iterative solution of systems of linear equations arising in the context of stochastic finite elements,” Adv. Eng. Softw. 31, 607–616 (2000). 27. I. Ocket, B. Nauwelaers, J. Fostier, L. Meert, F. Olyslager, G. Koers, J. Stiens, R. Vounckx, and I. Jager, “Characterization of speckle/despeckling in active millimeter wave imaging systems using a first order 1.5D model,” in “Millimeter-wave and Terahertz Photonics”, D. Jäger and A. Stöhr, eds., Proc. SPIE 6194, 19409–19420 (2006). 28. J. Fostier and F. Olyslager, “An open-source implementation for full-wave 2D scattering by million-wavelengthsize objects,” IEEE Antennas Propag. Mag. 52, 23–34 (2010).


Introduction
Variability analysis and uncertainty quantification (UQ) have become a major concern during the design step of optical systems and components as manufacturing tolerances and process variations can have a dramatic influence on the performance [1].In particular, even small variations of geometrical dimensions or material properties affect the electromagnetic behavior of the structures under design.To model these variations, one may simply combine statistical analysis with traditional, deterministic, full-wave solvers.The widely used Monte Carlo (MC) technique is an example of such an approach, repeatedly solving a large number of deterministic problems (samples), leading to an easy to implement and robust analysis.Unfortunately, MC has a slow convergence rate, yielding a high computational cost.For large full-wave problems, as typically encountered during the UQ of optical lens systems, this method rapidly becomes non-tractable.To more efficiently assess variability, methods based on Polynomial Chaos Expansions (PCE) have been devised [2].The basic idea of PCE methods is that any variation of a set of (geometric or material) parameters can be represented as a linear combination of polynomials that depend on these input parameters.Subsequently, the pertinent system equations, e.g.Maxwell's equations, are solved, taking these polynomial variations into account.This leads to a stochastic description of the variability of the desired output parameters, which can for example be electric field strengths.Traditionally, PCE-based methods can be subdivided into two classes.First, the class of non-intrusive PCE methods, such as the Stochastic Collocation Method (SCM), relies, like the MC method, on reusing the deterministic code to solve the system equations.In contrast to MC, SCM chooses the samples in a more clever way, depending on the Probability Density Function (PDF) of the input parameters.Second, the class of intrusive PCE methods, such as the Stochastic Galerkin Method (SGM), requires a thorough modification of the solver that tackles the system equations.In literature, it is argued that the SGM often leads to better accuracy than the SCM.In the domain of electrical engineering, intrusive methods are already successfully applied in the variability analysis of on-chip interconnects [3,4] and scattering problems [5][6][7].Recently, in the domain of photonics, the non-intrusive SCM method has been applied for the UQ of a silicon-on-insulator based directional coupler [8].In this paper, the focus is on the UQ of large optical lens systems.Thereto, an intrusive full-wave SGM scheme is proposed.The full-wave character of the problem is described by means of a set of boundary integral equations (BIE) that are solved by means of the Method of Moments (MoM) [9].To expedite the computations, the Multilevel Fast Multipole Method (MLFMM) has been combined with a SGM-MoM solver.In [7] it was shown that such an approach leads to the traditional O(N log N) computational complexity, N being the number of unknowns, but scaled with a factor that depends on the number of polynomials.Whereas in [7] relatively small scattering problems were handled, here, we aim to model large optical setups.Therefore, we present the parallelization of the full-wave intrusive SGM-MLFMM solver and we propose an effective preconditioning scheme to further accelerate the computations.Parallelization of SGM applied to elliptic partial differential equations is reported in [10], but as explained further it is still prohibitively slow to deal with the optical systems presented in this paper.In [11], an asynchronous parallelization of MLFMM for deterministic structures consisting of multiple dielectric objects is described.To our best knowledge, however, present paper is the first that proposes the parallelization of a full-wave SGM-MLFMM solver for Maxwell's equations, capable of handling both dielectric and perfectly electric conducting (PEC) objects.This paper is organized as follows.In Section 2, we present the theoretical framework of the SGM-MLFMM paradigm for solving BIEs.Section 3 deals with the implementation of the algorithm with a focus on its parallelization and on the design of a preconditioner.In Section 4, we report simulation results for several large optical structures, such as lens systems.We validate our method by comparing the results with a traditional SCM.Finally, in Section 5, we give some concluding remarks.

Theoretical framework
We start with a very general description of the electromagnetic problem geometry under consideration, which allows deriving a rigorous theoretical framework.Consider two-dimensional (2D) dielectric objects with a refractive index n i , or equivalently, by means of their permittivity ε i and permeability µ i , and perfectly electrically conducting (PEC) objects, residing in free space (Fig. 1).The geometry of these objects is stochastically described by means of a set of M random variables (RV) that are collected in the random vector ξ ξ ξ = [ξ 1 ξ 2 ... ξ M ] with domain Ω.One object can depend on zero, one or more RVs.The objects are illuminated by an incoming transverse magnetic (TM) electromagnetic wave E i z .In the sequel, an exp( jωt) time dependence, with ω the angular frequency, is assumed and suppressed throughout the text.Starting from Maxwell's equations, the scattering problem is cast as a boundary integral equation (BIE), which for dielectric objects was first written down in [12].For conciseness, we limit the description to the case of a single object with contour C, for which the pertinent BIEs are given by [13]:

Perfect electrical conductor (PEC)
where G o represents the Green's function of the free space background medium with wavenumber k 0 given by: and G represents a similar Green's function for the i-th medium with wavenumber k i = k 0 n i , corresponding to the material the object is made of.The unknowns are the z-oriented electric field E z and the magnetic field H t tangential to the contour C. C + and C − represents the path of integration along the contour C when integrating just outside and just inside the object respectively.To solve the BIE, the contour is divided into a number of segments N seg .The unknown magnetic field H t is expanded into pulse basis functions b i (ρ ρ ρ ) defined over these segments, while the unknown electric field E z is expanded into triangular basis functions t i (ρ ρ ρ ) as follows: This discretization leads to a number of N = 2N seg scalar unknown expansion coefficients H t,i and E z,i .To create a traditional MoM linear system of equations, triangular testing functions t i (ρ ρ ρ) for H t and pulse testing functions b i (ρ ρ ρ) for E z are used.The resulting linear system of equations is written as: or and: or or where the integrations are done over line segments l i and l j .
When the geometry of the objects is stochastic, meaning the location of several segments of the contours may depend on one or more of the RVs, the resulting MoM system becomes stochastic and is in general written as: where the traditional N × N interaction matrix Z Z Z, the known right-hand side (RHS) N-vector V V V , and the vector I I I collecting the N unknown expansion coefficients, all become dependent on the random vector ξ ξ ξ .The reader is encouraged to consult [3,14,15] and the references therein to gain familiarity with the SGM in the domain of electrical engineering.Here, similarly as in [6], where the SGM was combined with the MoM for small scattering problems, we start from expansions of the stochastic quantities introduced in Eq. ( 13) into polynomial basis functions.These PCEs are given by: where φ k (ξ ξ ξ ) are multivariate polynomials, i.e. products of M univariate polynomials for the individual RVs ξ i chosen according to the Wiener-Askey scheme [2] and such that they are orthonormal with respect to the probability density function (PDF) W (ξ ξ ξ ) with domain Ω of the random vector ξ ξ ξ , as follows: where δ jk is the Kronecker δ and with the inner product defined as: The M univariate polynomials are multiplied following the total degree rule, i.e., so that sum of the orders is at most P. Given this total order P, the number of polynomials K + 1, as used in the PCEs Eqs. ( 14), ( 15) and ( 16), is determined as: The expansion coefficients in Eqs. ( 14) and ( 15) are found via projection: where X X X represents Z Z Z and V V V .Inserting PCE Eqs. ( 14), ( 15) and ( 16) into Eq.( 13) yields: Galerkin projection of both sides of Eq. ( 21) onto the orthogonal polynomial basis functions leads to: which, after using the orthonormal property given by Eq. ( 17), again results in an equivalent system of equations, in which the stochastic dependence is eliminated: and where the summation is taken for all non-zero values of γ klm .The linear system given by Eq. (23) shows that the traditional O(N 2 ) complexity for a standard deterministic MoM is now scaled with factor that corresponds to the number of nonzero values of γ klm and follows an O(K 1.5 ) scaling law of [7].Moreover, the total number of unknowns in ( 23) is actually N stoc = (K + 1)N.In conclusion, although SGM-MoM is accurate for small scattering problems, the approach rapidly becomes non-tractable for the variability analysis of a large electromagnetic structures.Therefore, in the next section, MLFMM will be introduced.This then further allows parallelization, which is necessary to expedite the computations.
3. Implementation of a parallel SGM-MLFMM solver with a preconditioner

SGM-MLFMM
A vast amount of literature is available describing the traditional, deterministic MLFMM scheme [16].This scheme has, e.g., also been applied to scattering from and radiation by intricate dielectric objects such as printed circuit board antennas [17,18] and to photonic crystal waveguides [19].The reader is encouraged to consult these references to gain familiarity with the MLFMM scheme.Here, for conciseness, we only repeat the gist of it and immediately introduce a stochastic variant.This algorithm allows improving the computational complexity of the matrix-vector product (MVP) during the iterative solution of Eq. ( 23) and avoiding the storage of The boundaries of all dielectric objects are discretized into N finite segments, corresponding to the pertinent BIE-MoM approach.Next, all these segments are recursively subdivided in an L-level MLFMM quad-tree.At each level l = 1, 2, . . ., L, the boxes are circumscribed by a hypothetical circle of radius R l .Any interaction between two segments in the BIE-MoM scheme, corresponding to one element of the matrix Z Z Z(ξ ξ ξ ), is readily rewritten as the interaction between many elementary line sources s i (ξ ξ ξ s ) with strength J s i (ξ ξ ξ ).In Fig. 2, one such single line source, located at ρ ρ ρ (ξ ξ ξ s ) and residing in source box B , and a single observer, located at ρ ρ ρ(ξ ξ ξ o ) and residing in source box B, is shown.Note that the locations ρ ρ ρ and ρ ρ ρ depend on mutually independent subsets ξ ξ ξ s and ξ ξ ξ o of the total set of RVs ξ ξ ξ , as in MLFMM algorithms the source and observation boxes must be separated sufficiently far from each other.The centers of the source and observation box are located at the deterministic locations ρ ρ ρ c s and ρ ρ ρ c o respectively.We invoke the well-known plane-wave decomposition of the pertinent Green's function G(ρ ρ ρ(ξ ξ ξ o ), ρ ρ ρ (ξ ξ ξ s )) of the background medium to rewrite the MVP as follows [16].First, during the so-called aggregation step, the radiation pattern of box B is sampled into 2Q + 1 outgoing plane waves (OPWs): where k(ϕ q ) = k(cos ϕ q x + sin ϕ q ŷ).The samples are taken at angles ϕ q = 2πq /(2Q + 1), q = −Q, . . ., Q, and the number of samples is typically chosen such that the radiation pattern is reconstructed with a desired number of digits of accuracy, denoted as d 0 .In a well-constructed MLFMM tree, any accuracy up to machine precision can be reached [16], provided Q is chosen to be: The stochastic nature of this aggregation step is indicated by the ξ ξ ξ dependence.In the proposed SGM-MLFMM approach, PCE expansion are again invoked: such that via projection, each PCE-coefficient of the OPWs is computed as: where A A A k are PCE coefficients of the aggregation matrix, defined as: and I I I l are PCE coefficients of the current density which contain PCE coefficients of the elementary current strengths J s i (ξ ξ ξ ).
Second, during the translation step, a deterministic and diagonal translation matrix T converts the PCE-coefficients of the OPWs about the center of box B to incoming plane waves (IPWs) about the center of box B, as follows: where the numbers T qq (k, |ρ cc so |, ϕ cc so ) represent the 2Q + 1 (non-zero) diagonal elements of the translation matrix, given by: s | the distance between the centers of the boxes and ϕ cc so the angle between vector ρ ρ ρ c o − ρ ρ ρ c s and the x-axis.
Third, during the disaggregation step, the IPWs are evaluated at the observation points.In case of a single, elementary line source at ρ ρ ρ (ξ ξ ξ s ), the field at a single observer at ρ ρ ρ(ξ ξ ξ o ) in box B is nothing else than the pertinent Green's function, now expanded using the plane-wave formalism, as follows: To obtain an efficient multilevel scheme, the interaction between boxes occurs at well-chosen levels in the MLFMM tree.Up-and downsampling of OPWs and IPWs happens via Fast Fourier Transforms (FFTs).
In the deterministic MLFMM simulation, the total cost of all steps is O(N log N).The aggregation step and disaggregation steps are calculated with cost O(N).In the SGM-MLFMM approach, as the aggregation and disaggregation steps now also depend on subsets of ξ ξ ξ , the computational cost increases up to O(KN) [7].Similar observations are valid for the memory complexity.

Parallelization
An efficient implementation of the deterministic MLFMM has allowed handling problems with up to one million of unknowns on a single workstation.Parallel implementations on several nodes has led to tackling deterministic problems with billions of unknowns [20].In our stochastic case, however, the total number of unknowns is not only determined by the spatial discretization into N segments, but also by the number of polynomials, i.e.K + 1.Therefore, the UQ of medium-and large-scale problems, such as the variability analysis of the optical lens systems presented in this paper, should be performed via a parallel algorithm, leveraging the computational and memory resources of every available computing node.
Parallel solvers require a parallel iterative method (e.g.TFQMR [21]) and a parallel algorithm to compute the MVP.For the former, libraries like PIM [22] can be readily applied.From SGM Eq. ( 12), a straightforward way to parallelize the MVP may be revealed, namely the distribution of the matrix-vector products Z Z Z k I I I l among several processes.Each process then computes only a subset of all required matrix-vector products Z Z Z k I I I l .This idea is applied in [10].Despite its simplicity, this approach suffers from the fact that a given product Z Z Z k I I I l in itself is not parallelized.Each such product can be handled by a single process, but as the dimensions of each matrix Z Z Z k correspond to the number of spatial discretization elements N, the simulation of large structures with high N is still prohibitively expensive.As opposed to this approach, we proprose a parallelization scheme where each product Z Z Z k I I I l in itself is parallelized among all of the processes.This scheme has the advantage that the geometrical discretization elements are distributed among the processes.We provide an overview of the underlying concepts and refer to the extensive literature that exists for parallel deterministic MLFMM algorithms where appropriate.The MLFMM tree is created for the complete electromagnetic structure under consideration.The boxes at a prespecified level-of-partitioning (LoP) in this tree are partitioned and assigned to different processes.If a certain box is a assigned to a certain process, the entire subtree of that box and all corresponding geometrical segments and unknown stochastic expansion coefficients (K + 1 coefficients per segment) are also assigned to that process.The partitioning is done in such way that approximately the same number of expansion coefficients is assigned to each process.To ensure a good spatial locality between boxes assigned to a given process, boxes are ordered according to a Hilbert space filling curve (SFC) prior to partitioning.Smaller dielectric regions will be partitioned among fewer processes whereas large dielectric regions (e.g. the background medium) will be partitioned among many processes.For a given LoP and all levels below, it holds that a box is assigned to only one single process.All K + 1 radiation patterns (both outgoing and incoming) in such a box are also assigned to that process and the process is responsible for their computation.For levels higher than the LoP, it becomes increasingly difficult to achieve a good load balance because the number of boxes decreases for higher levels.Nevertheless, the amount of required computations on those levels does not decrease because of the size of the radiation patterns increases with higher levels.To deal with this, we propose to extend the hierarchical partitioning strategy [23,24] developed for deterministic MLFMM to the stochastic case.As illustrated in Fig. 3, rather than assigning a box and its radiation patterns as a whole to a single process, boxes are shared between an increasing number of processes as the number of radiation pattern samples increases.Specifically, from the LoP onwards, for every next level, each of the K + 1 radiation pattern samples in a box are partitioned in an increasing number of 2, 4, 8, 16, ... etc. partitions.A process then holds only a single partition of the radiation pattern samples in its memory.It has been shown that, in the deterministic case, such a hierarchical partitioning scheme is able to effectively balance the load among processes [24].Processes are responsible for the computation of all radiation process 0 process 1 process 2 process 3 3. Organization of MLFMM boxes and partitioning scheme for an arbitrary structure.On the lowest level each box is handled by one process; on higher levels, K radiation samples are shared among all processes.OPW B i k is k-th PCE coefficient of the box on the level i.
pattern samples and expansion coefficients that are locally held in memory using the execution of the MLFMM algorithm.Some of these computations rely on data that is not locally stored and thus needs to be communicated through the interconnection network.As we are dealing with multiple dielectric objects, we opted for the asynchronous approach described in [11].At a given point in time, different processes can perform different kind of tasks: certain processes might be communicating while others are performing (different kinds of) calculations.All computations that need to be performed by a process are partitioned into work packages.These work packages are sorted in a priority queue.Some work packages depend on data to be received from other processes and can only be scheduled once these data have actually been received.Similarly, certain work packages result in (intermediate) data that needs to be sent to other processes.The priority queue ensures that work packages are handled in such order that the overall idle time of processes is minimized.Organization of these working packages and priorities of the operations are well described in [11].

Preconditioner
For any iterative process, the number of iterations needed to obtain a result with a predefined accuracy is an important factor.This number depends on the condition number of the system matrix [25] .By rewriting Eq. ( 23) in matrix form, it can be seen that the equivalent system matrix of the SGM possesses a block-structure [4].The zeroth-order PCE coefficient which corresponds to the mean of the Z Z Z(ξ ξ ξ ) matrix, i.e.Z Z Z 0 , is located on diagonal blocks.Higher PCE coefficients of the matrix only marginally contribute to these diagonal blocks.For small variations, which is the case in most practical applications, the equivalent SGM system matrix is thus block-diagonal dominant and a block-Jacobi preconditioner could be used [26].However, given our MLFMM-approach, the submatrix Z Z Z 0 is never stored.Moreover, calculation of its inverse for large structures would rapidly become prohibitively expensive.Therefore, a different type of preconditioner is proposed here.Within Z Z Z 0 , there are so-called near interactions that are calculated with a classical MoM approach.These interactions are found on and around the main diagonal of the Z Z Z 0 matrix.Our preconditioner is based on these "near" blocks.The size of these blocks, i.e. the number of interactions which are calculated in this classical MoM fashion, determines the efficiency of the preconditioner.On the one hand, if the blocks are large, then preconditioning becomes stronger, but the memory, setup time and time required for one MVP also increases.On the other hand, if these blocks are chosen small, then the preconditioner may become rather useless, especially for large electromagnetic structures.

Numerical results
All simulations were performed on a system supporting the Message Passing Interface (MPI), which was used to implement the parallel SGM-MLFMM solver.As indicated further, some simulations were performed on one node with multiple cores, while others were performed on several nodes connected by an InfiniBand network.The linear system of equations is solved with Parallel Iterative Methods (PIM) using TFQMR.

Validation example: lens system with translational variation
As a validation example, we consider a lens system of two circular lenses with a permittivity ε r = 4 and a PEC aperture, as shown in Fig. 4. The size of the lenses is 2000λ and 5000λ , respectively, and they are separated by a distance of 30000λ .The size of the gap in the PEC shield is 2500λ and the shield is placed at a distance of 20000λ from the leftmost lens.The structure is illuminated with a Gaussian beam of width 500λ impinging upon the center of the leftmost lens along the optical axis.A deterministic simulation of this system was presented in [27], where good accuracy between Gaussian-beam, 1.5D and 2D full-wave methods was reported.Moreover, the 2D full-wave method has been validated up to 100 million of unknowns for both dielectric and PEC cylinders for which analytical solutions exist [28].
30000λ The structure and the total field calculated in a region with a width of 40000λ and a height of 8000λ is presented in Fig. 5.One pixel corresponds to a cell of size 10λ × 10λ .A block-Jacobi preconditioner was used on blocks with dimension 16λ × 16λ .The iterative precision was set to 10 −3 .For this deterministic simulation, the number of unknowns is 705008 and the iterative precision is reached after N iter = 369 iterations.To induce variations, the y-coordinates of the positions of the lenses are now uniformly varied between −λ /20 and λ /20 of their nominal values.The uniform distribution is chosen as all realizations are equally probable and it also leads to the largest variation in the field values.Stochastic simulations are performed by two methods: (i) a robust and easy to implement SCM and (ii) the novel SGM-MLFMM, in order to validate the accuracy of the latter.As our results are validated with a non-intrusive SCM that merely reuses results from the deterministic 2D algorithm [27], the output variability of our novel stochastic SGM-MLFMM is expected to match well with Gaussian beam based methods.To present the influence of the variations, the average field density and its standard deviations around the two focal points (indicated with F 1 and F 2 in Fig. 4), i.e. in an area with a width of 2000λ and a height of 1000λ , are presented in Fig. 6.The influence of the geometrical variations are clearly appreciated from this figure.It is also observed that left from the PEC slit the variations are concentrated mainly around the focal point, i.e.where the field value reaches its maximum.At the right side of the PEC slit, the variations are more evenly distributed in the neighbourhood of the focal point.To give an indication of the variation, we mention that at the point where the standard deviation is maximal and equal 0.0828 V/m, the average field density is 1.5287 V/m, i.e. the output variation of the field is about 0.0828/1.5287≈ 5.4%.This location is near the right focal point, indicated with the red arrow in Fig. 4. To validate our method, its accuracy is compared to SCM, as presented in Table 1.The accuracy of both solvers is set to 10 −3 .Both methods converge to the same result for different polynomial orders and even for P = 1 we get accurate results.For all polynomial orders, the number of iterations is about the same as in the deterministic case.This is important, since one MVP is computationally expensive.For example, for P = 3, the total number of unknowns is N stoc = 7 050 080 and the number of iterations is 372.This means that our preconditioner is efficient for this stochastically varying structure.Note that for the SCM method, the number of iterations is equal to the average number of iterations for all considered realizations of the RVs.Regarding the computational cost, SGM shows a clear advantage over SCM.To make a fair comparison, both SGM and SCM simulations were performed on the same machine.In the case of SCM, a wrapper function sequentially executes the parallelized deterministic code for different realizations of a random vector.Using a single node containing two quad-core CPUs (8 cores in total) running at 2 GHz, where simulations are performed with 8 parallel processes, and for P = 2, the SGM-MLFMM computation takes about 5h, while the SCM simulation for all realizations takes about 8 h.For larger P, we need more nodes to get results in a reasonable time.To give an indication, running the SGM-MLFMM for this lens setup with P = 4 and N stoc = 10 575 120, takes about 1h, when using 3 nodes and 16 parallel processes per node.This simulation was performed on the Flemish Supercomputer Center (VSC) infrastructure where one node has a dual-socket octo-core CPU (16 CPU cores in total) running at 2.6 GHz.On the same machine we also performed a benchmark test for the MVPs for a varying number of processes and we calculated the speedup and parallel efficiency.The speedup is defined as the ratio of the runtime on a single process T 1 and the runtime T n using p processes: In the ideal case the speedup factor is equal to the number of processes that is used.The parallel efficiency η is the ratio of the speedup and the theoretical maximum speedup: The graphs for the scalability and speedup are presented in Fig. 7.All simulations were performed for the P = 2 case.It is visible that our algorithm scales very well with an increasing number of processes.The efficiency decreases due to the increasing amount of data that needs to be communicated.

Application example 1: Lens system with rotational variation
As a second example, we consider the same lens system as in Section 4.1, but now, the variations of the lenses' positions are induced by rotation in the (x, y)-plane of the lenses around their centers.The lenses' rotations are described by a uniform stochastic process with a maximum deviation of 1/60 of a degree.The average field pattern looks like in Fig. 6, but the field pattern for standard deviation is different and is shown in Fig. 8.We can observe that the standard deviation is now slightly smaller.However, the relative variation is higher as the maximal variations occurs at points where the field strength is low.The of iterations required to obtain an iterative precision 10 −3 is higher in this case, e.g. for P = 2, 1018 iterations are required.This is a consequence of the larger variations on of the field and the loss of symmetry with respect to the optical axis of the setup.Additionally, in Table 2, we provide results for the point close to second focal point at which the standard deviation is the highest.In this case, we can see that P = 3 is sufficient to obtain convergence and with P = 2 we get acceptable results.Due to the phase effect, P = 1 provides a slightly different result than for that obtained for the higher orders.Simulation with P = 3 and N stoc = 7 050 080 unknowns takes about 1h on 3 nodes with 16 parallel processes.
A typical quantity of interest describing such lens system is the local intensity, defined as: where c is speed of light in vacuum and n is the refractive index.To get a better insight in the variation of the local intensity when lenses are prone to the PDFs of the local intensity for the second focal point and the point with maximum variance are presented in Fig. 9.It is clearly visible that the local intensity in both observation points is considerably affected by the variation of the lenses.

Application example 2: Cassegrain antenna system with induced variations
As a final example, we consider a Cassegrain antenna system, with a 35-meter parabolic main reflector and a 6-meter hyperbolic sub-reflector.Additionally, system consists of one flat mirror one mirror.The structure is based on a similar problem described in [28].operating frequency is chosen to be GHz, so we perform one harmonic simulation at that frequency.The structure is excited with a Gaussian beam with a waist m impinging upon a coated lens with a 4 meter diameter.This is a lens with a relative permittivity of ε r = 4 and a λ /4 coating with a relative permittivity ε r = 2, such that reflections are largely eliminated.The uncertainty is induced by varying the relative positions of every part (lens, mirrors, one reflector, one sub-reflector) of the system.In particular, the y-coordinate of every part is again a random variable such that the relative variation w.r.t.its nominal center is uniformly distributed within the interval [−λ /20, λ /20[.In this way, we introduce 5 RVs that describe the stochastic nature of the geometry.In Fig. 10, the average field density of the Cassegrain antenna system is shown.The standard deviation of the total field for this structure is presented in Fig. 11.We can see that the highest variation of the field is induced around the focal points, but due to the phase effect variation, a standing wave pattern is also visible.The simulation is performed by using 4 nodes with 16 CPU cores per node (64 parallel processes in total).The block size of the preconditioner is chosen to be 16λ × 16λ .For P = 2, K + 1 = 21 and N stoc = 5 073 579, the system was solved after 2630 iterations and 2h 44 min.For P = 3, K + 1 = 56 and N stoc = 13 529 544, the system was solved after 2767 iterations and 7h 47 min.The setup time and time are negligible compared to the time.for a different block-Jacobi size of 32λ × 32λ , we get a better run time.For example, then, for P = 3, the system is solved in 3h and 50 min after 783 iteration with a negligible increase of the setup time.This shows that one should carefully select the preconditioner size.We also remark that the time needed to solve this problem using the same computational resources with the SCM and with P = 2 is about 30h, which again demonstrates the necessity of the advocated parallel SGM-MLFMM solver.

Conclusions
In this paper, we have described the parallelization of the MLFMM-based SGM solver.The paper provides a theoretical background together with the complexity issues of the algorithm.Since this complexity grows fast with the number of spatial and stochastic unknowns, the solver leverages parallelization which allows simulation of large optical structures.Moreover, to decrease the number of iterations in the iterative solver, a block-Jacobi preconditioner is proposed.It was shown that a carefully chosen size of the preconditioner can reduce the computational time by a factor of two.Compared to a more traditional SCM, the selected examples clearly demonstrate the effectiveness of our novel algorithm, with speed-up factors of more than 12×, still maintaining excellent accuracy.Moreover, they show the need for the advocated intrusive stochastic modeling algorithm when dealing with large-scale optical problems.

Fig. 1 .
Fig.1.Canonical problem geometry.Objects are described by their electrical properties µ i , ε i and their geometries are defined with contours C i .Stochastic variations of the geometry are introduced and indicated by means of a set of random variables ξ i , i = 1, ..., M.

Fig. 2 .
Fig. 2. Typical MLFMM constellation of a source box B and an observation box B.

Fig. 5 .
Fig. 5. Field density |E z |(V/m) for the deterministic simulation for the configuration of Fig. 4.
Fig. 6.Mean and standard deviation of the field density |E z |(V/m) around the focal points.

Fig. 7 .
Fig. 7. Speedup and parallel efficiency for a varying number of parallel processes.

2 Fig. 9 .
Fig. 9. PDF of the local intensity I (mW/m 2 ) for two points behind the rightmost lens.

Fig. 10 .
Fig. 10.The average total field density |E z |(V/m) the Cassegrain antenna system illuminated with a Gaussian beam incident from the bottom onto the coated lens.The results are obtained with a polynomial order P = 3.

Fig. 11 .Fig. 12 .
Fig. 11.The standard deviation of the total field density |E z |(V/m) of the Cassegrain antenna system illuminated with a Gaussian beam incident from the bottom onto the coated lens.The results are obtained with polynomial order P = 3.

Table 1 .
Mean and standard deviation for the point indicated on Fig. 4, i.e. the point close to the second focal point with maximum variance.

Table 2 .
Mean and standard deviation for the point close to the second focal point with maximum variance.