Lightweight Machine-Learning Model for Efficient Design of Graphene-Based Microwave Metasurfaces for Versatile Absorption Performance

Graphene, as a widely used nanomaterial, has shown great flexibility in designing optically transparent microwave metasurfaces with broadband absorption. However, the design of graphene-based microwave metasurfaces relies on cumbersome parameter sweeping as well as the expertise of researchers. In this paper, we propose a machine-learning network that enables the forward prediction of reflection spectra and the inverse design of versatile microwave absorbers. Techniques such as input normalization and transposed convolution layers are introduced in the machine-learning network to make the model lightweight and efficient. In particular, the tunable conductivity of graphene provides a new degree of freedom in the intelligent design of metasurfaces. An inverse design system based on an optimization method is proposed for the versatile design of microwave absorbers. Representative cases are demonstrated, showing very promising performance in satisfying various absorption requirements. The proposed machine-learning network has significant potential for the intelligent design of graphene-based metasurfaces for various microwave applications.


Introduction
Metasurfaces, composed of periodic or quasi-periodic two-dimensional (2D) arrays of subwavelength units, have emerged as one of the most thriving types of artificial electromagnetic surfaces, owing to their fascinating and tailorable electromagnetic properties [1,2]. In contrast to traditional bulk metamaterials [3-6], metasurfaces are extremely thin, which enables the engineering of electromagnetic waves in phase, amplitude, and polarization through a compact and easily fabricated system, providing great freedom in manipulating light-matter interactions at the subwavelength scale [7,8]. Such promising approaches have proven feasible in numerous applications, from basic devices such as holograms [9], electromagnetic absorbers [10], and polarizers [11], to more complex systems for information encryption [12,13], signal processing [14], and intelligent recognition [15].
Microwave absorption is one of the most important applications of metasurfaces and is extremely useful in various engineering contexts [16,17]. Compared with conventional microwave-absorbing materials or devices, metasurface absorbers [18-20] can provide designable bandwidth, ultra-thin thickness, and angular robustness. Combining nanomaterials and metasurfaces provides a brand-new solution for excellent microwave absorption performance with optical transparency [21,22]. Recent advances in the study of 2D materials, particularly graphene, provide a novel viewpoint for the active control of electromagnetic waves throughout a wide spectrum [23,24]. Graphene possesses remarkable physical properties including monoatomic thickness, optical transparency, and unique electrical tunability attributable to its gapless and symmetrical band structure [25,26]. Notably, the electrostatic control of carrier concentration in graphene allows the dynamic manipulation of electromagnetic waves by adjusting graphene's Fermi energy [27,28]. For example, graphene has been experimentally implemented for microwave absorbers [21,29].

Graphene-Based Metasurface Absorber Model
The microwave metasurface absorber designed in this article consists of patterned graphene sandwich structures [30]. The top layer of the absorber is a graphene sandwich structure on a polyethylene glycol terephthalate (PET) substrate with a dielectric constant of 3. A thin ITO ground layer forms the bottom layer. All of these materials are optically transparent, so the resulting metasurface is optically transparent as well. The structure exploits graphene's dynamic conductivity under different bias voltages and can be used for tunable broadband absorption. The graphene layer is modeled as an infinitesimally thin resistive surface characterized by a sheet resistance R_g given by the well-established Kubo formula [42]. The bias voltage directly changes the sheet resistance of the patterned graphene layer, which dynamically changes the absorption performance in different frequency ranges. The sheet resistance of the graphene layer can be simplified following [43], where e is the electron charge, and ħ and k_B are the reduced Planck and Boltzmann constants, respectively. ω is the angular operating frequency, T is the temperature (room temperature here), and E_F is the Fermi energy of graphene, which is set by the external bias voltage. The phenomenological scattering rate is 1/(2τ), where τ is the electron-phonon relaxation time. We consider T = 300 K and τ = 0.2 ps. The electromagnetic performance of such a metasurface absorber critically relies on the patterned graphene layer [26,44]. By modifying the geometry of the patterned graphene layer and the sheet resistance of graphene, versatile absorption performances can be obtained. However, the conventional design of metasurface structures relies on massive numerical simulations to compute the electromagnetic response of different parameter combinations, which is time-consuming and redundant.
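The simplified sheet-resistance expression is not reproduced in this text. As a rough illustration of how R_g depends on the Fermi energy, the sketch below evaluates the widely used intraband Kubo conductivity with the stated T = 300 K and τ = 0.2 ps. The specific formula, the function name `graphene_sheet_impedance`, and the sample Fermi energies are assumptions made for illustration and may differ from the expression in [43].

```python
import math

E_CHARGE = 1.602176634e-19   # electron charge, C
HBAR = 1.054571817e-34       # reduced Planck constant, J*s
K_B = 1.380649e-23           # Boltzmann constant, J/K


def graphene_sheet_impedance(e_fermi_ev, freq_hz, temp_k=300.0, tau_s=0.2e-12):
    """Sheet impedance (ohm/sq) from the intraband Kubo conductivity.

    ASSUMPTION: textbook intraband form, not necessarily the paper's equation:
        sigma = -1j*e^2*kB*T / (pi*hbar^2*(omega - 2j*Gamma))
                * (EF/(kB*T) + 2*ln(exp(-EF/(kB*T)) + 1)),  Gamma = 1/(2*tau)
    """
    omega = 2.0 * math.pi * freq_hz
    gamma = 1.0 / (2.0 * tau_s)          # phenomenological scattering rate
    kt = K_B * temp_k
    ef = e_fermi_ev * E_CHARGE           # convert eV to joules
    occupancy = ef / kt + 2.0 * math.log(math.exp(-ef / kt) + 1.0)
    sigma = (-1j * E_CHARGE ** 2 * kt * occupancy
             / (math.pi * HBAR ** 2 * (omega - 2j * gamma)))
    return 1.0 / sigma


# Illustrative Fermi energies (hypothetical values) evaluated at 10 GHz.
for ef in (0.1, 0.2, 0.4):
    z = graphene_sheet_impedance(ef, 10e9)
    print(f"E_F = {ef:.1f} eV -> R_g ~ {z.real:.0f} ohm/sq")
```

Under this assumed form, the real part of the sheet impedance falls as the Fermi energy rises, which is the mechanism by which the bias voltage tunes the absorption.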
In this paper, we utilize neural networks and suitable machine-learning techniques to propose an efficient, user-friendly, and high-performance design system for graphene-based metasurface absorbers.

Machine-Learning Model
We propose a machine-learning prototype that uses an MLP network and a transposed convolution technique to realize the fast prediction of reflection coefficients in the range of 6-20 GHz from a combination of several design parameters. In our work, a combination of 5 parameters is used as the input and the reflection coefficients are inferred as the output of the machine-learning model. There are 4 geometrical parameters, p, d, l, and h: p is the period of the meta unit, d is the length of the middle square hole, l is the length of graphene along the y-axis, and h is the thickness of the PET substrate, as shown in Figure 1. The tunable sheet resistance R_g of the graphene layer is the fifth parameter considered in the machine-learning model. Samples in the dataset are uniformly distributed over a reasonable range of the linear space $\mathbb{R}^5$, constrained so that the parameters do not violate the topology of the unit cell and remain physically consistent. Moreover, all parameters are normalized to [0,1] before being fed into the model, as Equation (2) illustrates:
$$\tilde{s} = \frac{s}{s_{\max} - s_{\min}} \qquad (2)$$
where $s$ is the value of any of the 5 parameters, $\tilde{s}$ is the normalized value, $s_{\max}$ is the maximum value of this parameter, and $s_{\min}$ is the minimum value. Normalization reduces the impact of numerical differences between parameters and makes the training process more stable and effective. It therefore helps ensure the strong performance of the forward prediction network when the 5 normalized parameters are taken as input.
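The scaling step can be sketched as follows. The first function follows Equation (2) as printed, s divided by (s_max - s_min); the second is the common min-max variant, included only for comparison and not taken from the paper. The function names and the numeric values in the usage lines are hypothetical.

```python
def normalize(value, s_min, s_max):
    """Scale one parameter as printed in Equation (2): s / (s_max - s_min)."""
    if s_max <= s_min:
        raise ValueError("s_max must be greater than s_min")
    return value / (s_max - s_min)


def min_max(value, s_min, s_max):
    """Common min-max variant (an assumption, not Equation (2))."""
    if s_max <= s_min:
        raise ValueError("s_max must be greater than s_min")
    return (value - s_min) / (s_max - s_min)


# Hypothetical parameter value and range, for illustration only.
print(normalize(3.0, 1.0, 5.0))
print(min_max(3.0, 1.0, 5.0))
```

The two forms coincide when s_min = 0. Otherwise only the min-max variant maps the range exactly onto [0, 1], while the printed form maps it onto an interval of unit width shifted by s_min / (s_max - s_min).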
The convolution techniques adopted in CNNs are used in many fields such as digital image and voice processing [45,46]. They combine information from local fields during learning and work as feature-extraction tools. The convolution operation decreases the spatial dimensions and produces an increasingly abstract representation of the input image deeper in the network [47]. Recent research into the inverse design of metasurfaces uses 1D convolution to extract spectrum features and build an inverse model that predicts design parameters directly [48]. Here, we add transposed convolution techniques to our forward prediction network to better serve the inverse design system. Transposed convolution is also known as deconvolution, although that name is not appropriate: deconvolution implies removing the effect of a convolution, which is not what we aim to achieve. It is used as an efficient upsampling tool in modern image semantic segmentation [49,50] and super-resolution algorithms [51]. It can also be used to observe the feature-learning performance of intermediate convolution layers, and is mostly used in image processing and pattern recognition. In our work, since the network model is ultimately used for inverse design, transposed convolution is used to upsample the reshaped features from the hidden layers and reconstruct the reflective spectrum, which improves the inverse design to a certain extent.
That strategy also helps improve the prediction performance near some boundaries of the input sampling space and reduces the number of trainable parameters of the machine-learning model. That is, by introducing transposed convolution layers, we make the model more lightweight and easier to train without performance degradation. The architecture of our deep-learning network is shown in Figure 2. There are two fully connected layers with 100 neurons and 700 neurons in the linear block and three 1D transposed convolution layers in the transposed convolution block. Neurons in the linear block are activated by the Leaky ReLU (rectified linear unit) activation function, while those in the transposed convolution block are activated by the ReLU activation function.
The total number of trainable parameters in our model is 82,250, which is significantly fewer than in models from recent research on the forward prediction of light spectra by AI. Training this network amounts to optimizing a loss function. The mean squared error (MSE) loss is a simple and suitable loss function for our network; in its definition, ŷ_i denotes the output of the network when the input is x_i, and n denotes the number of training samples. We call this machine-learning model a forward prediction network (FPN), which can perform accurate predictions in place of numerical electromagnetic simulations. The training progress is discussed in Section 3.2. The FPN learns the underlying electromagnetic behavior from data and gradually comes to reproduce the calculation correctly.
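To illustrate how three 1D transposed convolution layers can upsample hidden features to a 281-point spectrum, the sketch below implements a single-channel transposed convolution and tracks the output length through a stack of layers. The kernel sizes, strides, padding, and the starting length of 35 are hypothetical values chosen only so that the chain ends at 281 points; the settings of the actual network are not given in the text.

```python
def conv_transpose1d(x, kernel, stride=1, padding=0):
    """Single-channel 1D transposed convolution on plain Python lists."""
    full_len = (len(x) - 1) * stride + len(kernel)
    out = [0.0] * full_len
    for i, xi in enumerate(x):
        # Each input sample stamps a scaled copy of the kernel into the output.
        for k, wk in enumerate(kernel):
            out[i * stride + k] += xi * wk
    # Padding trims `padding` samples from each end of the full output.
    return out[padding:full_len - padding]


def output_length(length_in, kernel_size, stride=1, padding=0):
    """Output length of a transposed convolution (dilation 1, no output padding)."""
    return (length_in - 1) * stride + kernel_size - 2 * padding


# HYPOTHETICAL layer settings, not taken from the paper.
LAYERS = [
    {"kernel_size": 4, "stride": 2, "padding": 1},
    {"kernel_size": 4, "stride": 2, "padding": 1},
    {"kernel_size": 3, "stride": 2, "padding": 0},
]

length = 35  # e.g. 700 hidden features reshaped to 20 channels x 35 samples (assumed)
lengths = [length]
for cfg in LAYERS:
    length = output_length(length, **cfg)
    lengths.append(length)
print(lengths)
```

Because each layer roughly doubles the sequence length with only a few kernel weights, the upsampling stage needs far fewer parameters than a dense layer mapping 700 features directly to 281 outputs.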
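A minimal sketch of the MSE loss is given below. It averages the squared differences over all samples and all spectrum points; this element-wise averaging is an assumption that mirrors the default behavior of common toolbox implementations, and the function name `mse_loss` is hypothetical.

```python
def mse_loss(predictions, targets):
    """Mean of squared differences over all samples and spectrum points."""
    total = 0.0
    count = 0
    for y_hat, y in zip(predictions, targets):
        for a, b in zip(y_hat, y):
            total += (a - b) ** 2
            count += 1
    return total / count


# One sample with a two-point "spectrum" (illustrative values).
print(mse_loss([[1.0, 2.0]], [[0.0, 0.0]]))
```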

Inverse Design System
For inverse design, the FPN model can be seen as a black-box function. That is, the trained FPN model is used as a function y = F(x) defined on the input space X. F is continuous for x ∈ X; the input space is 5-dimensional and the output space is 281-dimensional. The partial derivatives ∂y_i/∂x_j exist and can be calculated for every i = 1, 2, ..., 281 and j = 1, 2, ..., 5. Thus, F is first-order differentiable. F(x) is an abstract function, unlike a sine or polynomial function that can be written in closed form. We only need to know the input and output, while the relationship between them is learned during the machine-learning training process, which is also called the "knowledge" of AI. We can utilize this "knowledge" in further research without needing an explicit expression for it. The Jacobian matrix of F(x) for any x ∈ X can be easily calculated numerically by back propagation using a machine-learning toolbox. Designing a specific absorber means imposing pre-defined requirements on the absorption spectrum. These can equally be expressed on the reflective spectrum, since the transmitted wave is negligible in our design. In absorber design, we typically want the absorption band to be as wide as possible, or the absorption to be as strong as possible. Based on such requirements, we can set optimization goals on the reflective spectrum. The optimization goal L is defined over S, the set of frequency points in the designed absorption range, in terms of F(x)_i, the i-th element of the output vector y = F(x), and w_i, the optimization weight of F(x)_i, which can fine-tune the optimization results. The optimization problem is then to minimize L over x ∈ X, where X is the input space of our FPN.
Therefore, fulfilling the absorber design requirements turns into solving a constrained optimization problem over a first-order differentiable continuous function. Any algorithm from the conventional convex optimization field, such as steepest descent, conjugate gradient, the Lagrange multiplier method, or the penalty function method, can make our inverse design system work. Since the Jacobian matrix of F(x) can be obtained directly from the machine-learning toolbox, the optimization process can also be performed effectively within a machine-learning prototype.
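The displayed equations of this section are not reproduced in this text. One form consistent with the symbols defined here is sketched below as an assumption; in particular, the weighted-sum expression for L and the masked weight vector $\tilde{w}$ are illustrative choices and may differ from the original expressions.

```latex
\begin{align}
  y &= F(x), \qquad F : X \subset \mathbb{R}^{5} \to \mathbb{R}^{281}, \\
  L(x) &= \sum_{i \in S} w_i \, F(x)_i, \\
  x^{*} &= \arg\min_{x \in X} L(x), \\
  \nabla_x L(x) &= J_F(x)^{\top} \tilde{w},
  \qquad \tilde{w}_i =
  \begin{cases}
    w_i, & i \in S, \\
    0,   & i \notin S,
  \end{cases}
\end{align}
```

Here $J_F(x)$ is the 281 × 5 Jacobian matrix with entries $\partial y_i / \partial x_j$, so the gradient of L with respect to the 5 design parameters is a weighted sum of the Jacobian rows belonging to the target band S.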
The absorption in the frequency range S can reach the optimized state by minimizing L.
The working mechanism is shown in Figure 3, where ∇_x L(x) is computed by the chain rule. To implement the inverse design system, we define a new machine-learning model with nearly the same structure as the forward prediction model. Unlike the FPN, the new model takes no external input: its first layer plays the role of the model input but is trainable. After the first layer, the network architecture is the same as that of our pre-trained FPN model. We then fix the weights and biases of the new model, except for those of the first layer, to the values of the pre-trained FPN model, so that these parts work as the black-box function F(x). During training, the parameters of the first layer are optimized by our optimization method while the other parameters of the new machine-learning model remain unchanged.
In building the inverse design system, we exploit the differentiability of the machine-learning model and use the model as a black-box function. CST generates data through complex numerical calculations, while the machine-learning model can mine the implicit knowledge contained in the generated data. Therefore, for this graphene-based metasurface absorber, the principle of the numerical calculation can be concisely and accurately represented by machine learning, which is crucial to establishing a fast and effective inverse design system. The insights gained from machine learning have great potential to expand to other nanomaterial applications.
Figure 3. Illustration of the optimization process of the inverse design system. Before training begins, an initial seed x_0 is generated randomly. In iterative step k, the loss L and its gradient with respect to x, g_k = ∇_x L(x), are first computed through the pre-trained model. ĝ_k is derived by the adaptive subgradient method [52] from the gradients g_k of all iterations up to the k-th to determine the descending direction d_k, which has the same dimension as x. λ is the descending step of each iteration, which is a scalar in (0, 1). x is updated with the iterative paradigm. After a fixed number of iterative steps, the optimized result is obtained. Combinations of optimized parameters for any design requirement can be obtained within seconds.
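The optimization loop can be sketched in plain Python as follows. A small analytic function stands in for the frozen, pre-trained FPN, and its Jacobian is written by hand where a machine-learning toolbox would use back propagation. The toy model, the weighted-sum loss, the step size, and the clipping of x to [0, 1] as a way of keeping x inside X are all assumptions made for illustration; only the adaptive subgradient (Adagrad) update follows the description of the optimization process.

```python
import math
import random


def toy_model(x):
    """Stand-in for the frozen FPN: maps 2 inputs to 3 'spectrum' values."""
    return [(x[0] - 0.3) ** 2 + 0.1 * x[1],
            (x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2,
            0.1 * x[0] + (x[1] - 0.7) ** 2]


def toy_jacobian(x):
    """Analytic Jacobian dF_i/dx_j of toy_model (autograd would supply this)."""
    return [[2 * (x[0] - 0.3), 0.1],
            [2 * (x[0] - 0.3), 2 * (x[1] - 0.7)],
            [0.1, 2 * (x[1] - 0.7)]]


def loss_and_grad(x, weights):
    """Weighted sum over the target band S (the keys of `weights`) and its gradient."""
    y = toy_model(x)
    jac = toy_jacobian(x)
    loss = sum(w * y[i] for i, w in weights.items())
    grad = [sum(w * jac[i][j] for i, w in weights.items()) for j in range(len(x))]
    return loss, grad


def inverse_design(weights, steps=500, lr=0.05, seed=0):
    """Optimize the input x with Adagrad while the model itself stays fixed."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(2)]      # random initial seed x_0
    accum = [0.0, 0.0]                        # running sum of squared gradients
    for _ in range(steps):
        _, grad = loss_and_grad(x, weights)
        for j, g in enumerate(grad):
            accum[j] += g * g
            step = lr * g / (math.sqrt(accum[j]) + 1e-8)
            x[j] = min(1.0, max(0.0, x[j] - step))   # keep x inside [0, 1]
    return x, loss_and_grad(x, weights)[0]


# Target band S = {1} with unit weight (hypothetical choice).
best_x, best_loss = inverse_design({1: 1.0})
print(best_x, best_loss)
```

Only the input vector is updated here, which mirrors the idea of a trainable first layer followed by frozen layers: the model parameters never change, and the gradient flows back to the design parameters alone.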

Dataset Collection
To train the forward prediction neural network, we first need a dataset of our graphene-based metasurface that samples the parameter space adequately. Simulations for 7000 combinations of parameters are conducted by Matlab-CST co-simulation. The metasurface is first modeled and the simulation conditions are set in the commercial software CST Microwave Studio. In the simulations, periodic boundary conditions are set along the x and y directions and Floquet port excitation is applied along the z direction. To generate the data, parameter combinations are first uniformly sampled in Matlab. The built-in Visual Basic interface of CST is called from Matlab to change the parameters that need to vary. In each iteration, CST Microwave Studio fetches one parameter combination from Matlab, runs the corresponding numerical simulation, and passes the calculated spectrum data back to Matlab. The parameter values are uniformly sampled from a constrained space, which keeps the topology valid and the structure physically consistent. In this way, 7000 data pairs, each containing a parameter combination and its spectrum, are generated and organized into the dataset. The sampling process and the range of parameters are shown in Figure 4 and Table 1.
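Uniform sampling from a constrained space can be sketched with rejection sampling, as below. The parameter ranges and the topology check are hypothetical placeholders, since the values of Table 1 and the exact constraints are not reproduced in this text, and the original sampling is performed in Matlab rather than Python.

```python
import random

# HYPOTHETICAL ranges (not the values of Table 1).
RANGES = {
    "p": (10.0, 20.0),      # period of the meta unit, mm
    "d": (1.0, 10.0),       # length of the middle square hole, mm
    "l": (5.0, 20.0),       # length of graphene along y, mm
    "h": (1.0, 5.0),        # PET substrate thickness, mm
    "Rg": (100.0, 1000.0),  # graphene sheet resistance, ohm/sq
}


def is_valid(sample):
    """Hypothetical topology check: hole fits in graphene, graphene fits in cell."""
    return sample["d"] < sample["l"] <= sample["p"]


def sample_parameters(n, seed=0):
    """Draw n parameter combinations uniformly from the constrained region."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        candidate = {name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}
        if is_valid(candidate):   # discard combinations that break the topology
            samples.append(candidate)
    return samples


dataset_inputs = sample_parameters(5)
for row in dataset_inputs:
    print({name: round(value, 2) for name, value in row.items()})
```

Each accepted combination would then be passed to the electromagnetic solver, and the returned spectrum stored alongside it as one data pair.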

Performance of Forward Prediction
The hyperparameters for training the models are shown in Table 2; they are not heavily optimized but achieve our objectives. Training is performed on the online platform Kaggle with GPU acceleration by an Nvidia Tesla P100. This GPU has the NVIDIA Pascal architecture, which is optimized to support deep-learning applications. Figure 5a shows the loss on the training and validation data during the training process, for the cases with and without normalization of the input parameters. It can be seen that the training loss with normalization converges to below 10⁻⁵, one order of magnitude lower than that without normalization. The validation loss with normalization converges to 3 × 10⁻⁵, much lower than the 1.37 × 10⁻⁴ obtained without normalization. Moreover, both the validation and the training loss with normalization drop very quickly in the first 1000 epochs, much faster than those without normalization. This comparison shows that normalizing the input parameters speeds up the convergence of the loss and makes the test error significantly lower. The average percent error for each spectrum is defined as the difference between the prediction of the FPN model and the ground truth from CST simulation, divided by the latter. Figure 5b shows the average percent error with and without normalization. It can be seen that the error for each spectrum point is less than 1% after 1000 epochs of training with normalization, which shows the high prediction accuracy of our FPN models.
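The average percent error described above can be sketched as follows. Taking absolute values and averaging over the spectrum points are assumptions, since the text only states that the difference is divided by the ground truth; the function name is hypothetical.

```python
def average_percent_error(predicted, truth):
    """Mean of |predicted - truth| / |truth| over spectrum points, in percent."""
    if len(predicted) != len(truth):
        raise ValueError("spectra must have the same number of points")
    ratios = [abs(p - t) / abs(t) for p, t in zip(predicted, truth)]
    return 100.0 * sum(ratios) / len(ratios)


# Two-point example: each prediction is off by 1% of the ground truth.
print(average_percent_error([1.01, 0.99], [1.0, 1.0]))
```

A relative metric of this kind becomes large wherever the ground truth is close to zero, so it is best read together with the MSE loss.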

Table 2. Hyperparameters for training the models (only one row is recoverable). Loss function: MSELoss.
To better show the performance of our network architecture, we build an MLP model for comparison. The MLP model has two hidden layers, the same as our FPN, with the last hidden layer fully connected to a 281-dimensional output vector. The numbers of nodes in the four layers (including input and output) are 5, 100, 700, and 281. Therefore, the total number of trainable parameters is 268,281, about 3.3 times that of our FPN model. We also define a measurable criterion to evaluate the performance of different models more visually. The distance D measures the error of a single prediction of the model. Figure 5b shows the prediction of the reflective spectrum by models with and without transposed convolution layers. It is clear that, in some extreme situations, the distance of our FPN remains quite low while the MLP performs poorly at some frequencies.
The model trains quickly, requiring less than 25 min to achieve very promising accuracy for the forward prediction. The forward-prediction performance on examples not used for model training is shown in Figure 5e,f. Our FPN can run thousands of simulations in milliseconds, and its results agree closely with those of CST simulation. Owing to its concise and lightweight architecture and its high prediction precision, our model can be trained quickly and efficiently. With small additional datasets collected in other frequency regimes for specified geometries, the model could also be extended to the terahertz or infrared regimes by transfer learning, which is worth further investigation.
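The parameter count of the comparison MLP follows directly from its layer sizes, as the short check below shows. The FPN count of 82,250 is taken from the text; the helper name `dense_parameters` is hypothetical.

```python
def dense_parameters(layer_sizes):
    """Weights plus biases of a stack of fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


mlp_params = dense_parameters([5, 100, 700, 281])
fpn_params = 82_250  # reported size of the FPN
print(mlp_params, round(mlp_params / fpn_params, 2))
```

Most of the MLP parameters sit in the final 700-to-281 dense layer (196,981 of them), which is the part the transposed convolution block replaces.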

Results of Inverse Design System
The absorptivity A(ω) of the proposed metasurface can be calculated by A(ω) = 1 − |r(ω)|² − |t(ω)|². Here, r(ω) and t(ω) are the reflection and transmission coefficients, with t(ω) being negligibly low.
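Since the spectra are usually read in decibels, the sketch below converts a reflection level in dB into absorptivity using the standard relation A = 1 − |r|² − |t|², with the transmission neglected by default. The function name and its optional `transmission_db` argument are hypothetical.

```python
def absorptivity_from_db(reflection_db, transmission_db=None):
    """Absorptivity from reflection (and optional transmission) levels in dB."""
    reflectance = 10.0 ** (reflection_db / 10.0)      # |r|^2 from 20*log10|r|
    if transmission_db is None:
        transmittance = 0.0                           # ground layer blocks transmission
    else:
        transmittance = 10.0 ** (transmission_db / 10.0)
    return 1.0 - reflectance - transmittance


for level in (-10.0, -20.0, -50.0):
    print(level, absorptivity_from_db(level))
```

With negligible transmission, −10 dB of reflection corresponds to 90% absorption, −20 dB to 99%, and −50 dB to 99.999%, which is how the dB levels in the reflection spectra map onto the absorptivity figures quoted for the design cases.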
As a first example, we set the targeted frequency at around 10 GHz to find the highest possible absorptivity, as shown in Case 1 of Figure 6. It is seen that the reflectance can be optimized to a minimum of −50 dB at 10 GHz, giving a peak absorption over 99.99%. As a second example, the absorption frequency range is targeted as 9-14 GHz, as shown in Case 2 of Figure 6. After optimization, a wide-band absorption spectrum is obtained with more than 99% absorptivity over the entire 9-14 GHz band. We can also set two separate targeted frequencies as the optimization regions to seek a metasurface absorber with dual-band absorption. As shown in Case 3 of Figure 6, there is nearly 99.9% absorption at the dual peaks at 10 GHz and 16 GHz. In Case 4 of Figure 6, we show an ultra-wide-band absorption optimization, where over 90% absorption has been achieved within the frequency band 7.55-18.8 GHz, covering both the X-band and the Ku-band.
Figure 6 (caption fragment). Cases 5-8: inverse design with the thickness of substrate h fixed at 3.5 mm. Cases 9-12: inverse design with the sheet resistance of graphene R_g fixed at 250 Ω. The colored area is the optimization area. FPN indicates the optimized reflection spectra given by the inverse design system, while CST represents the reflection spectra from CST simulations with the optimized parameter combinations.
Figure 6 shows the reflective spectra of the FPN prediction and the CST simulation results based on the optimized parameter combinations. The corresponding parameter combinations are shown in Table 3, verifying the effectiveness and performance of our machine-learning-model-based inverse design of versatile metasurface absorbers. These cases are chosen from arbitrarily given requirements, which reveals the broad applicability of our inverse design system.
In most situations, dielectric substrates come in standard thicknesses. Substrates with non-standard thicknesses are hard to fabricate or costly. If, for example, we fix the thickness h at 3.5 mm, the design system can still give promising results. In Figure 6, Cases 5-8 show the optimization of the remaining four design parameters to achieve the required absorption performance. It is seen that most requirements can still be satisfied well. Since the substrate thickness h = 3.5 mm is relatively large and clearly affects the resonance frequency, the absorption frequency bands in Cases 7 and 8 show a slight red shift compared with those in Cases 3 and 4.
On the other hand, since tunable patterned graphene is more difficult to fabricate than static (non-tunable) graphene, it is also meaningful to examine the performance of our inverse design system with a fixed sheet resistance of graphene. Here, assuming the graphene layer has a fixed sheet resistance of 250 Ω, different absorbers can still be designed well. Cases 9-12 of Figure 6 show the inverse design results obtained by optimizing the four geometrical parameters while fixing the sheet resistance of graphene. We can see that the absorption performances in Cases 9-12 are very similar to those of Cases 1-4, with little degradation in absorptivity. All these results indicate that our inverse design system has good design flexibility. The satisfactory performance of our model indicates its potential in other nanomaterial applications. For a new material system, a reasonable parameter-sampling space should first be specified based on engineering expertise. Then, similar training procedures and the same way of building the inverse design system can be applied, as such a model is scalable to other material systems.
Table 3. Parameter combinations for the cases in Figure 6.

Conclusions
In this work, we proposed a novel machine-learning-model-based inverse-design system for designing graphene-based metasurface absorbers with versatile absorption performance. Transposed convolution layers were introduced in our forward-prediction architecture to reduce the model size and improve performance. With the key parameters of the metasurface normalized as input, the forward-prediction model can quickly predict the reflective spectra of the absorbers with high accuracy compared with numerical simulations. Based on the well-trained machine-learning model, we built an inverse-design system to optimize versatile absorption performance. Given the optimization goal for specified absorption frequencies, the system can find the optimized results in the sampling space within seconds. The insights gathered from this paper could help with the intelligent design of other types of graphene-based metasurfaces or devices.