An Efficient Framework for Analysis of Wire-Grid Shielding Structures over a Broad Frequency Range

A computationally efficient MoM-based framework for broadband electromagnetic simulation of wire-grid shielding structures is presented in the paper. The broadband capability of the approach is attained by supporting MoM with an adaptive frequency sweep combined with rational interpolation of the observable implemented via the Stöer-Bulirsch algorithm. The performance increase is gained by employing CUDA-enabled CPU+GPU co-processing. For large-size problems exceeding the amount of memory available on the GPU device, a hybrid out-of-GPU-memory LU decomposition algorithm is employed. Demonstration examples are provided to illustrate the accuracy and high efficiency of the approach.


Introduction
The prediction of the performance of metallic shielding structures is an important topic in electromagnetic (EM) engineering, especially in electromagnetic compatibility. The general problem of EM field penetration through three-dimensional shields can only be solved by numerical techniques, and the availability of computationally efficient tools for this purpose is of continuing interest. In this paper, attention is focused on shielding structures in the form of wire grids (meshes). Structures of this kind may be found in a wide variety of practical situations, ranging from simple protective screens [1], [2] through convenient models of equipment cabinets [3] up to complex double-layer grid-like shields designed to reduce aggressive high-intensity EM effects caused by lightning strikes [4], [5]. The most suitable approach to the analysis of spatial grid-like structures composed of arbitrarily arranged conductors is the full-wave Method of Moments (MoM) formulated in the frequency domain (FD). When a wide-band response of a structure is required, the MoM solution must be constructed repeatedly at many frequencies, and this can take an unacceptably long time. Thus, techniques to minimize the computation time and reduce the overall computational cost of broadband simulation are of continuing interest. Available approaches may be broadly classified as model-driven and data-driven methods [6]. Model-driven methods are mostly aimed at reducing the complexity of the system model while preserving (to the extent possible) its input-output behavior. Representative examples of this approach are Model-Order Reduction (MOR) techniques relying on Krylov subspace methods, usually combined with Lanczos or Arnoldi iteration [7], [8], and Asymptotic Waveform Evaluation (AWE) together with its extension to a multipoint expansion technique referred to as Complex Frequency Hopping (CFH) [9]-[11]. In the IE-MoM context, both MOR and AWE require the MoM-generated system matrix to construct the reduced model or a Padé-type rational approximation to the solution of the integral equation, respectively. Incorporating the relevant procedures for computing the Krylov vectors or successive derivatives of the system matrix into existing MoM-based codes is not an easy task and usually requires a large amount of programming work. Therefore, alternative techniques that can be implemented without much effort are highly desirable. The expected features can be attributed to data-driven methods, which use only input-output data to construct an approximant to the system response. The most representative examples of the data-driven approach are the Cauchy method [12], [13] and the so-called Vector Fitting (VF) [14]. The principal advantage of this approach is that existing full-wave electromagnetic solvers can be used without modification.
In this paper, the data-driven approach is employed to create a framework for fast full-wave wideband analysis of wire-grid shielding structures. The framework is built around a derivative-free version of the rational-interpolation Cauchy method combined with a simple binary-search (bisection) algorithm for adaptive frequency sampling (AFS) over a broad frequency band [15]. The rational interpolation is implemented implicitly via the Stöer-Bulirsch (SB) algorithm [16], [17]. The framework incorporates a highly optimized in-house MoM-based solver implemented on a heterogeneous CPU/GPU platform. The relevant implementation issues are considered from the position of a user of a rather typical low-cost PC-style multi-core CPU-based workstation with a single GPU device. This perspective is justified by the fact that such platforms have recently gained popularity in scientific computing as an inexpensive heterogeneous massively parallel architecture available to the masses.

MoM Formulation
The approach employed in this study for the analysis of wire-grid structures adopts the frequency-domain mixed-potential electric field integral equation (EFIE) formulation developed in [18], [19] for electromagnetic scattering and radiation by arbitrary configurations of conducting bodies and wires. For completeness of presentation, the formulation is briefly outlined here. For convenience, we start with a thin perfectly conducting curved wire residing in a simple medium (ε, μ) subject to an incident (impressed) electromagnetic field (E^i, H^i) (the time-harmonic dependence exp(jωt) is assumed and suppressed). The EFIE for the total axial current I(l) on the wire follows directly from enforcing the boundary condition that the axial component of the total tangential electric field vanishes on the surface of the wire, namely

1_l · (jωA + ∇φ) = 1_l · E^i	(1)

where the conventional notation A for the magnetic vector potential and φ for the electric scalar potential is employed, and 1_l denotes the unit vector locally parallel to the wire axis. For the wire approximated by a series of broken line segments, the unknown current is expanded as a linear combination of one-dimensional RWG-type triangle functions

I(l) ≈ Σ_{n=1}^{N} I_n Λ_n(l)	(2)

with

Λ_n(l) = (l − l_{n−1})/(l_n − l_{n−1}) for l ∈ [l_{n−1}, l_n],  (l_{n+1} − l)/(l_{n+1} − l_n) for l ∈ [l_n, l_{n+1}],  0 otherwise	(3)

where l and r_i denote the arc length along the wire axis and the position vector of the i-th point subdividing the wire into segments, respectively. The EFIE-MoM procedure employing the RWG functions for both expansion and testing leads to the matrix equation

Z I = V	(4)

where Z is an N × N MoM-generated dense system matrix with complex entries given by

Z_mn = jω ∫ Λ_m(l) 1_l · A_n(r) dl − ∫ (dΛ_m(l)/dl) φ_n(r) dl,  m, n = 1, 2, ..., N	(5)

where A_n(r) and φ_n(r) denote the A and φ values, respectively, due to the n-th basis function at the observation point specified by r, with

A_n(r) = (μ/4π) ∫ Λ_n(l') 1_{l'} e^{−jkR}/R dl',  φ_n(r) = −(1/(4πjωε)) ∫ (dΛ_n(l')/dl') e^{−jkR}/R dl',  R = |r − r'|.

The unknown current expansion coefficients constitute a column vector I_{N×1}. The elements of the excitation vector V_{N×1} are weighted averages of the incident electric field over the sub-domains occupied by the weighting functions, and can be expressed as

V_m = ∫ Λ_m(l) 1_l · E^i(r) dl,  m = 1, 2, ..., N.	(6)

The current expansion coefficients obtained from the solution of (4) and introduced into (2) yield the approximate current distribution on the structure, and all electromagnetic properties of the latter can be evaluated from this current in the usual way. The overall procedure is readily extended to multi-wire structures with wire junctions.
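As an illustration of the expansion (2), the one-dimensional triangle function and the resulting current approximation can be sketched in a few lines of Python; the function and variable names here are illustrative only, not taken from the solver described in the paper.

```python
def triangle_basis(l, l_prev, l_mid, l_next):
    """Piecewise-linear (1-D RWG-type) triangle function attached to the
    interior subdivision point l_mid: rises linearly from 0 at l_prev to
    1 at l_mid, then falls back to 0 at l_next.  All arguments are
    arc-length coordinates along the wire axis."""
    if l_prev <= l <= l_mid:
        return (l - l_prev) / (l_mid - l_prev)
    if l_mid < l <= l_next:
        return (l_next - l) / (l_next - l_mid)
    return 0.0


def current(l, nodes, coeffs):
    """Axial current I(l) as the linear combination (2) of triangle
    functions; coeffs[n-1] is the expansion coefficient I_n of the basis
    function centered at nodes[n]."""
    total = 0.0
    for n, I_n in enumerate(coeffs, start=1):
        total += I_n * triangle_basis(l, nodes[n - 1], nodes[n], nodes[n + 1])
    return total
```

Each basis function overlaps only its two adjacent segments, which is what makes the MoM system matrix assembly segment-local on the testing side.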
The system matrix Z in (4) is frequency dependent and, therefore, the matrix equation must be set up and then solved repeatedly for each individual frequency f = ω/2π within a set of discrete frequencies of concern. Equation (4) can be solved by standard methods of linear algebra. In this paper, only the LU decomposition is considered in the CPU/GPU context, since this technique is widely used in MoM simulations.
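The per-frequency solve of (4) can be mirrored by a minimal pure-Python LU routine with partial pivoting; in the actual framework this role is played by optimized LAPACK/CULA kernels (e.g. zgesv), so the sketch below, with placeholder matrix entries rather than real MoM impedances, only illustrates the structure of the computation.

```python
def lu_solve(Z, V):
    """Solve Z I = V by in-place LU decomposition with partial pivoting.
    Z is a list of row-lists of complex numbers, V a list of complex
    excitation entries; returns the coefficient vector I."""
    n = len(Z)
    A = [row[:] for row in Z]          # work on copies
    b = V[:]
    for k in range(n):
        # partial pivoting: bring the largest |A[i][k]| onto the diagonal
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[p] = A[p], A[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):      # eliminate below the pivot
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
            b[i] -= m * b[k]
    x = [0j] * n                       # back substitution
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x
```

In the broadband sweep, a routine of this kind is invoked once per adaptively selected frequency sample.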

Adaptive Frequency Sampling
The adaptive frequency sampling (AFS) scheme employed in this study consists in performing repetitive bisection of each of the initially chosen frequency intervals until a specified convergence criterion is met for the observable H(f). The overall efficiency of the sampling process is improved through adaptive interpolation of H(f) by a rational function R_{L,M}(f) fitting the support points {f_i, H_i; i = 0, 1, 2, ..., N (= L + M)} selected in the interval-halving process. The R_{L,M} is constructed implicitly via a recursive tabular Neville-type algorithm developed by Stöer and Bulirsch (SB), which generates the so-called diagonal rational interpolant [16].
The AFS-SB process is controlled by error surrogates. For this purpose, two auxiliary fitting models, namely R_{L−1,M+1} and R_{L+1,M−1}, are constructed over the same set of support points as that for R_{L,M}. Then, the absolute values of the relative errors between R_{L−1,M+1} and R_{L,M}, and between R_{L+1,M−1} and R_{L,M}, are calculated at the midpoint f_m of the currently handled frequency subinterval. When one of the errors (or both of them) exceeds the assumed convergence tolerance, a new sample H_m is computed at f_m via MoM, the number of support points for the rational fitting model is thus increased by one, and the bisection process continues until the convergence criterion is met.
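A compact Python sketch of the two ingredients, the Stöer-Bulirsch recursion and the bisection-driven sampling loop, is given below. The error control here is simplified: instead of the R_{L−1,M+1}/R_{L+1,M−1} surrogates, the interpolant built from the already accepted samples is compared against the newly computed sample at each midpoint; that substitution is an assumption of this sketch, not the paper's exact scheme.

```python
def sb_eval(xs, ys, x):
    """Evaluate the diagonal rational interpolant through the support
    points (xs, ys) at x via the Stöer-Bulirsch Neville-type recursion
    R_{i,k}, with R_{i,0} = y_i and R_{i,-1} = 0."""
    n = len(xs)
    R_prev2 = [0j] * n                    # R_{i,k-2}
    R_prev = [complex(y) for y in ys]     # R_{i,k-1}
    for k in range(1, n):
        R_cur = [0j] * (n - k)
        for i in range(n - k):
            diff = R_prev[i + 1] - R_prev[i]
            prev = R_prev[i + 1] - R_prev2[i + 1]
            if diff == 0 or prev == 0:    # degenerate: carry value over
                R_cur[i] = R_prev[i + 1]
                continue
            denom = (x - xs[i]) / (x - xs[i + k]) * (1.0 - diff / prev) - 1.0
            R_cur[i] = R_prev[i + 1] + (diff / denom if denom != 0 else 0.0)
        R_prev2, R_prev = R_prev, R_cur
    return R_prev[0]


def afs_bisect(f, f_lo, f_hi, tol=0.01, max_samples=64):
    """Adaptive frequency sampling by interval bisection.  At each
    midpoint the rational fit through the accepted samples is compared
    with the 'exact' (MoM-supplied) sample f(fm); the interval is split
    further only where the relative error exceeds tol."""
    samples = {f_lo: f(f_lo), f_hi: f(f_hi)}
    work = [(f_lo, f_hi)]
    while work and len(samples) < max_samples:
        a, b = work.pop()
        fm = 0.5 * (a + b)
        xs = sorted(samples)
        ym = sb_eval(xs, [samples[x] for x in xs], fm)
        hm = f(fm)                        # stand-in for a MoM solve
        samples[fm] = hm
        if abs(ym - hm) > tol * abs(hm):  # not converged: keep halving
            work += [(a, fm), (fm, b)]
    return samples
```

Because every accepted sample is reused by the global interpolant, refinement in one subinterval also sharpens the fit elsewhere, which is what makes the nonuniform sampling so economical.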

Hybrid CPU/GPU Implementation
The implementation of the Method of Moments supported by the AFS-SB algorithm as outlined in Secs. 2 and 3 has two main components: 1) a block of procedures that perform recursive bisection of the frequency subintervals, compute the values of the rational fitting models at the frequency nodes, and control the convergence error, and 2) a MoM-based electromagnetic simulator (solver) providing numerically "exact" values of the observable.
The main program loop is executed on the CPU host. The host manages the AFS-SB algorithm, the GPU device memory allocation, and the host-device data transfers, and invokes the MoM solver. A noticeable feature of the latter is that the two most computationally intensive phases of MoM, that is, the assembly of the system matrix and the solution of the matrix equation, have been accelerated by employing the GPU device. The CPU sequential procedures for computing the elements of the system matrix have been mapped to the parallel GPU platform as described in [20]. To enable the solution of relatively large problems whose MoM-generated matrix exceeds the amount of memory available on the device, a hybrid out-of-GPU-memory CULA-panel-based LU decomposition algorithm has been implemented [21]. The algorithm proceeds iteratively through the following two distinct phases: i) panel factorization, and ii) update of the trailing submatrix. The panel is factorized on the CPU, while the update of the trailing submatrix is handled on both the CPU and the GPU using highly optimised MKL/CUDA kernels.
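The two-phase structure of the panel-based LU algorithm can be sketched in pure Python; plain loops stand in for the MKL/CUDA kernels and pivoting is omitted for brevity, so this is a structural illustration only, not the hybrid implementation of [21].

```python
def blocked_lu(A, nb):
    """Right-looking blocked LU factorization without pivoting.  A is
    overwritten so that its strict lower part holds L (unit diagonal
    implied) and its upper part holds U.  Each iteration factorizes an
    nb-wide panel (done on the CPU in the hybrid scheme) and then
    updates the trailing submatrix (the GEMM-rich phase that the hybrid
    scheme offloads to MKL/CUDA kernels)."""
    n = len(A)
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        # --- phase i: panel factorization (columns k0..k1-1) ---
        for k in range(k0, k1):
            for i in range(k + 1, n):
                A[i][k] /= A[k][k]
                for j in range(k + 1, k1):
                    A[i][j] -= A[i][k] * A[k][j]
        # --- triangular solve for the block row U[k0:k1, k1:] ---
        for k in range(k0, k1):
            for i in range(k + 1, k1):
                for j in range(k1, n):
                    A[i][j] -= A[i][k] * A[k][j]
        # --- phase ii: trailing-submatrix update (GEMM-like) ---
        for i in range(k1, n):
            for j in range(k1, n):
                for k in range(k0, k1):
                    A[i][j] -= A[i][k] * A[k][j]
    return A
```

Only one nb-wide panel plus the block row being updated needs to reside on the accelerator at a time, which is why this organization suits matrices larger than the GPU memory.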
The flowchart shown in Fig. 1 explains how the CPU/GPU computations are organized. The interested reader is referred to [20], [21] and [23] for more details.

Results
The methodology described in the previous sections has been implemented on the following two heterogeneous hardware systems (platforms): System A, based on the Intel quad-core i7 3820 processor running at 3.6 GHz with 64 GB of main memory and equipped with a single NVIDIA GTX 680 card with 4 GB of GPU memory running at 1006 MHz, and System B, based on the Intel eight-core Xeon E5-2630 processor running at 1.8 GHz with 128 GB of main memory and equipped with one Tesla K20Xm card with 5 GB of global memory running at 732 MHz.
The software tools used were BLAS from MKL 11.3, shipped with the Intel Composer XE 2017 beta suite, on the CPU side, and CULA R18 on the GPU side [24].

Performance of GPU-accelerated MoM
The Method-of-Moments solution to the integral equation involves three phases, namely, assembly of the system matrix, calculation of the excitation vector, and solution of the linear system. Except for very small problems in terms of the number of unknowns, the excitation-vector calculation time is negligible compared to the execution times of the remaining two phases and is therefore neglected in the performance considerations.
As a starting point for exhibiting the performance of the hybrid CPU/GPU MoM implementation, the graph at the top of Fig. 2 shows how the MoM matrix-fill time, the linear-system solution time, and the combined (total) execution time depend on the size N of the matrix for the four-core CPU implementation (System A). As expected, the linear-system solution time generally represents the dominant portion of the total execution time. A careful examination of the timing results reveals that the time of the matrix-fill phase, which scales as O(N²), prevails over the linear-system solution time only for rather small problems with N not exceeding about 3700. If the matrix-fill routine is ported from the CPU to the GPU using CUDA, and a GPU-accelerated version of the LAPACK zgesv() routine from CULA Tools is used for solving the linear system, the timing data for the two solution phases change as displayed in the center graph of Fig. 2.
To gain more detailed insight into the GPU acceleration effects, the measured speedup, defined as the quotient of the CPU and GPU runtimes, is given in the bottom graph. The shaded area indicates the region where matrices fit into GPU memory and can thus be solved using CULA. As can be seen from the relevant curves, the matrix-fill process is about 5.9 times faster on the GPU than on the CPU. For larger matrices (N ≥ 9200), the matrix-assembly time drops below 10 % of the combined runtime and becomes insignificant compared to the linear-system solution time as the problem size increases. The results clearly indicate that even with an inexpensive consumer graphics card (GTX 680), speedups of about 5.9×, 1.6× and 1.75× can be achieved for the fill, solution and total execution times, respectively. This is particularly worth noting for large problems exceeding the amount of memory available on the device.

Fig. 2. System A: measured matrix-assembly (fill) time, linear-system solution time, and combined runtime against matrix size for the reference four-core CPU implementation (top), the four-core/single-GPU GTX 680 implementation (center), and the resulting speedup for the GPU-accelerated implementation (bottom).
For System B, the speedup curves remain constant for problem sizes N > 11000, and the speedup figures are about 4.4×, 3.6× and 3.7× for the fill, solution and total times, respectively.
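The crossover behavior noted above, an O(N²) fill phase overtaken by the O(N³) solve phase, can be made concrete with a toy cost model; the per-element costs a and b below are hypothetical placeholders chosen so that the crossover lands near the N ≈ 3700 reported for System A, not values measured on either system.

```python
def crossover(a, b, n_max=100_000):
    """Smallest matrix size N at which the solve cost exceeds the fill
    cost under the toy model t_fill = a*N**2, t_solve = b*N**3.
    Analytically this happens just above N = a/b."""
    for n in range(1, n_max):
        if b * n ** 3 > a * n ** 2:
            return n
    return None
```

With the hypothetical constants a = 1e-7 s and b = 2.7e-11 s, the solve phase starts to dominate just above N = a/b ≈ 3704; beyond that point, accelerating the fill alone cannot improve the total runtime much, which motivates the GPU-accelerated LU solve.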

Wire-grid Shields
To demonstrate the potential and efficiency of the proposed approach in solving practically oriented problems, a lightning protection system (LPS) directly hit by lightning is considered first. The LPS, serving to shield a building from lightning electromagnetic pulse (LEMP) effects, has the form of a grounded wire-mesh cage (see Fig. 3a). A perfectly electrically conducting (PEC) ground plane is assumed and taken into account by the method of images. The dimensions of the building (length x width x height) are 40 m x 40 m x 10 m [25]. The LPS is assumed to be made of perfectly conducting wires of radius 4 mm forming a square mesh of width 1 m. For readability, a coarser mesh is shown in Fig. 3a. It is assumed that the lightning strikes a corner of the building, and the lightning channel is represented by a vertical lossy monopole antenna with distributed loading [26]. To be specific, the monopole, of height 2 km and radius 5 cm, is uniformly loaded with a series inductance of 4.5 µH/m and a series resistance of 1 Ω/m [25]. The antenna is fed at its base by a delta-gap unit-voltage source.
Knowledge of the currents and voltages induced in electrical circuits inside the protected volume is crucial for the design of an LPS. To mimic the situation of interest, a large loop formed by two horizontal parallel wires of radius 0.89 mm spaced 2 m apart, with the lower wire laid at a height of 0.5 m above the ground plane, is placed in the volume protected by the shield (see Fig. 3b, c). The length of each wire is 20√2 ≈ 28.3 m. The loop is short-circuited at one end, while the other end is loaded by a lumped resistance R = 30 Ω and grounded below the resistor. The entire structure (LPS, lightning channel, loop) is discretized into linear segments of 1 m in length. The total number of wire segments is 8371, and the number of unknowns (basis functions) associated with the structure is 11571. Within the study aimed at predicting the electromagnetic environment in the volume protected by the wire-grid LPS, a partial goal has been to determine the absolute value of the loop current/current transmittance

T_I(jω) = I / I_g

where I is the current flowing through the 30 Ω resistance and I_g is the current at the lightning channel base, that is, at the feed-point of the equivalent monopole antenna.
Figure 4 shows the requested transmittance in the frequency range from 1 kHz to 20 MHz. The results provided by the GPU-accelerated MoM combined with the AFS-SB frequency sweep (MoM-SB/GPU) are compared in the figure with those obtained from the commercial full-wave electromagnetic simulator FEKO [28]. For readability, the transmittance is presented in both linear and logarithmic scales. As can be seen, the two sets of data compare favourably. The AFS-SB MoM computations were performed for the frequency range from 1 kHz to 20001 kHz with the initial frequency step Δf = 1 MHz and the convergence tolerance δ = 1 %. The algorithm generated 107 non-uniformly spaced frequency samples of T_I(jω), with the frequency step locally decreased to 15.625 kHz. To achieve the same convergence level with uniform sampling, 1281 samples taken with a frequency step of 15.625 kHz would be required. Hence, adaptive sampling yields a noticeable reduction in the number of samples. The total execution time required for the reconstruction of the transmittance T_I(jω) was about 80 min for System A and 18 min for System B.

The next demonstration example consists in evaluating the electric-field shielding efficiency (SE) of a wire-cage shield with a slit. The dimensions of the cage shown in Fig. 5 are 0.16 m x 0.2 m x 0.2 m. The radius of the perfectly electrically conducting grid wires is 0.5 mm, and the width of the square grid is 0.5 cm. A slit 10 cm wide and 2 cm high is located on one side of the cage (see Fig. 5). The cage consists of 16504 wire segments, and the total number of unknowns associated with it is 24744. The cage is assumed to be illuminated by a vertically polarized plane electromagnetic wave of unit electric field intensity (E = 1 V/m) incident from the direction specified by the spherical angles θ = 90° and φ = 0°. Figure 6 shows the shielding efficiency of the cage with the slit computed for the dominant component of the electric field E_z at the center of the cage, i.e., at the point (x, y, z) = (0, 0, 0.1) m. The computations were performed for the frequency range from 10 MHz to 4000 MHz with the initial frequency step Δf = 210 MHz and the convergence tolerance δ = 1 %. For the investigated structure, the AFS-SB algorithm generated 72 non-uniformly spaced frequency samples of SE, with the frequency step locally decreased to 13.125 MHz. Using these samples as support points, the SE values were calculated every 13.125 MHz in the frequency range of interest employing the rational interpolant. To validate the MoM-SB/GPU results, FEKO was run at selected test frequencies, and both sets of data are shown in Fig. 6 for comparison. Again, the MoM-SB/GPU and FEKO results compare very well. The total execution time needed to compute the 72 samples required for rational interpolation of the SE frequency run is about 6 h and 1 h 20 min for systems A and B, respectively. Achieving the same convergence level with uniform sampling would require 305 samples taken with a frequency spacing of 13.125 MHz. The effect of reducing the number of required EM simulations is backed up within the considered framework by another effect, i.e., hardware acceleration.

The last and most challenging example involves a wire-cage shield with a rectangular aperture. Inside the cage, there is a wire (of radius 0.89 mm) running along the cage length. The wire is placed 6 cm above the bottom of the cage, midway between its sides, and is terminated at both ends with 50 Ω loads (see Fig. 7a) [27]. The dimensions of the cage are 0.36 m x 0.4 m x 0.6 m, and its square mesh (1 cm x 1 cm) is assumed to be made of perfectly conducting wire segments of radius 1 mm. The aperture, 12 cm wide and 5 cm high, is located on the top of the cage at a distance of 3 cm from the edge (see Fig. 7b). The model consists of 23898 wire segments, and the total number of unknowns associated with the model is 35877. The structure is illuminated by a vertically polarized plane electromagnetic wave of unit electric field intensity (E = 1 V/m) incident from the direction specified by the spherical angles θ = 45° and φ = 45°, and the task is to find the induced signal on the load placed at (x, y, z) = (0.18 m, 0.4 m, 0.06 m) over a decade frequency bandwidth from 200 MHz to 2000 MHz.
Figure 8 shows the requested frequency response in terms of the current magnitude in the load. Since the response exhibits sharp resonant peaks due to resonances of the cavity and the wire, the frequency range from 1200 MHz to 1400 MHz is enlarged in Fig. 9, where the current amplitude and phase are shown separately. As can be seen from the figure, the waveforms obtained via the AFS-SB MoM approach agree very well with the reference data from FEKO; the sharp peaks in the frequency response are captured and resolved. The response was computed from 150 frequency samples generated by the AFS-SB algorithm with the convergence tolerance δ = 1 %, the samples being spaced nonuniformly with the frequency step varying between the initial step of 200 MHz and 390.625 kHz. The total wall-clock simulation times are about 34 h 4 min and 7 h 25 min for systems A and B, respectively.

Conclusion
A computationally efficient MoM-based framework for broadband EM simulation of wire-grid shielding structures has been demonstrated in the paper. The broadband capability of the approach is attained by supporting MoM with a simple binary-search procedure combined with rational interpolation of the observable implemented through the Stöer-Bulirsch algorithm. The latter indirectly provides a single high-order diagonal rational interpolant over the whole frequency band of interest. The method offers the possibility of controlling a priori the convergence tolerance level and thus the accuracy of the final results. As a data-driven technique, the AFS-SB algorithm can be easily interfaced with existing frequency-domain simulators. A significant performance increase of the whole application is gained by employing CUDA-enabled CPU+GPU co-processing, that is, by porting the computationally intensive parts of the relevant code onto the GPU. The use of a hybrid out-of-GPU-memory panel-based LU decomposition algorithm extends the LU decomposition capability to matrices whose size is limited only by the amount of memory (RAM) available on the host. The overall approach presented in this paper constitutes an example of the integration of advanced computational electromagnetics techniques with the modern architecture of a rather typical low-cost PC-style workstation for the purpose of increasing the efficiency of an application-oriented EM simulator.

Fig. 1. Flowchart of the GPU-accelerated version of the MoM.

Fig. 3. Model of the LPS hit by lightning (a) and arrangement of the protected circuit (b, c).

Fig. 7. Wire-grid cage with slit and wire midway between the sides (a), top view of the cage with slit arrangement (b).

Fig. 8. Amplitude of the current response of the cavity with rectangular aperture. The frequency response over the bandwidth from 200 MHz to 2000 MHz was computed from 150 frequency samples generated by the AFS-SB algorithm with the convergence parameter δ = 1 %. The samples were spaced nonuniformly, with the frequency step varying between the initial step of 200 MHz and 390.625 kHz. The total wall-clock simulation times are about 34 h 4 min and 7 h 25 min for systems A and B, respectively.

Fig. 9. Amplitude and phase of the current response of the cavity with rectangular aperture for the frequency band 1200-1400 MHz.