Fast Inverse Design of Nanophotonics using Differential Evolution and Back-Propagation

Deep learning has been used as a new approach for the forward simulation and inverse design of nanophotonic structures. It greatly reduces the time required for optical simulation and enables the use of the back-propagation (BP) algorithm to optimize design parameters. However, BP is very sensitive to the initial values and, for some initial values, struggles to converge to the optimal value. In this research, we propose a hybrid optimization strategy that combines differential evolution (DE) with the BP algorithm for the inverse design of multilayer nanofilm structures. The proposed method effectively utilizes the global parallel exploration capability of DE and the local exploitation capability of gradient descent based on BP. It alleviates the sensitivity of the BP algorithm to the initial values and effectively compensates for the slower convergence of DE. The results suggest that the hybrid DE-BP algorithm can greatly speed up the inverse design of multilayer nanofilms and can search a larger parameter space that even exceeds the parameter range of the training dataset used to train the forward prediction neural networks.


Introduction
Light-matter interactions at the nanoscale have many unique properties and offer a wide range of applications in different areas. The inverse design of nanostructures to achieve specific optical properties has always been an important research topic in this field. There are mainly two traditional ways to design nanophotonic devices: intuition-based approaches [1][2][3] and simulation-driven optimization [4][5][6][7][8]. The intuition-based method relies on prior knowledge to adjust the parameters of nanophotonic devices to match the desired performance; it faces considerable challenges when the parameters are too complex to be analyzed intuitively. In contrast, simulation-driven optimization is a more intelligent design approach, because it relies not on prior knowledge but on an optimization algorithm to search for the optimal solution. However, with the increasing complexity of nanostructures, their simulation and optimization become time-consuming and computationally expensive.
In recent years, data-driven deep learning (DL) has been introduced as a new approach for the inverse design of nanophotonics [9][10][11][12][13]. For example, Peurifoy et al. demonstrated that neural networks (NNs) are significantly faster than traditional numerical simulations in optical forward computation and can handle more complex problems through the back-propagation (BP) algorithm [9]. In addition, Yungui et al. applied such a method to the design of a multilayered spatial optical differentiator [13]. Moreover, Dianjing et al. proposed a tandem network to address the non-uniqueness problem, in which the same spectrum can be produced by many different structural parameters [10].
It is well known that the inverse design of nanophotonics is an ill-posed inverse problem. Without additional information, the solution to this problem is unstable or undetermined. Regularization methods are usually required to obtain stable solutions. The tandem network is a regularization method that constrains the inverse design network to output only one set of design parameters corresponding to the target spectrum, but this regularization cannot guarantee that such a set of design parameters is the optimal design point [10,14]. The BP algorithm optimizes the input parameters according to the obtained gradient, so the resulting design parameters are likely to fall into saddle points and local optima not far from the initial structure [15,16]. This means that the solution space searched by the BP algorithm is limited, and the regularization comes from this limitation of the solution space. Therefore, the choice of the initial structure is the key to successful inverse design of nanophotonics through the BP algorithm. In short, although DL-based methods have made great progress in the inverse design of nanophotonics, the disadvantages mentioned above remain to be resolved.
In this paper, we adopt differential evolution (DE) combined with the BP algorithm [17] for the inverse design of multilayer nanofilm structures. The DE algorithm, proposed by Storn and Price [18], belongs to the class of evolutionary algorithms. It is one of the most competitive evolutionary algorithms for finding the global optimum in a continuous parameter space [19]. We first use DE to globally explore the continuous structural parameters of the multilayer nanofilms, and then use the BP algorithm to explore the neighborhood of the candidates found by DE. The combination of the two methods effectively compensates for the slower convergence of DE and the sensitivity of the BP algorithm to the initial values.

Inverse design of multilayer nanofilms with DE-BP method
The whole design process is divided into two main steps. First, an NN is trained for optical forward computation. Next, we use DE combined with BP (DE-BP) to optimize the design parameters. Compared with traditional numerical methods (e.g., the finite-difference time-domain method [20], the finite element method [21] or the transfer matrix method [22,23]), a trained forward prediction NN (FPN) computes optical properties faster [9]. As shown in Fig. 1, the FPN is used for forward optical calculations. The input of the FPN is the thickness of the nanofilm layers, expressed by a set of parameters D = [d1, d2, …, d16]. The output is 201 spectral sampling points between 400 and 800 nanometers, represented by a vector S = [S1, S2, …, S200, S201]. Here we consider a 16-layer nanofilm consisting of alternating layers of Al2O3 and Si3N4.
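To make the forward mapping from D to S concrete, the following is a minimal sketch of a normal-incidence transfer matrix calculation, one of the traditional methods named above. It is not the solver or the FPN used in this work. The constant refractive indices, the air ambient and exit media, the layer ordering (Al2O3 first) and all function names are assumptions made for illustration, and thicknesses are given in nanometers rather than normalized units.

```python
import math

# Assumed, dispersion-free refractive indices (rough visible-range values).
N_AL2O3 = 1.77
N_SI3N4 = 2.02
N_AMBIENT = 1.0    # assumed incident medium (air)
N_SUBSTRATE = 1.0  # assumed exit medium (air)

# 201 sampling points between 400 nm and 800 nm, i.e. a 2 nm step.
WAVELENGTHS_NM = [400 + 2 * i for i in range(201)]


def transmission(thicknesses_nm, wavelength_nm):
    """Normal-incidence transmittance of a lossless multilayer stack."""
    # Accumulate the characteristic matrix, starting from the identity.
    m11, m12, m21, m22 = 1.0 + 0j, 0j, 0j, 1.0 + 0j
    for j, d in enumerate(thicknesses_nm):
        n = N_AL2O3 if j % 2 == 0 else N_SI3N4  # alternating materials
        delta = 2.0 * math.pi * n * d / wavelength_nm  # phase thickness
        cos_d, sin_d = math.cos(delta), math.sin(delta)
        a11, a12, a21, a22 = cos_d, 1j * sin_d / n, 1j * n * sin_d, cos_d
        m11, m12, m21, m22 = (
            m11 * a11 + m12 * a21,
            m11 * a12 + m12 * a22,
            m21 * a11 + m22 * a21,
            m21 * a12 + m22 * a22,
        )
    b_field = m11 + m12 * N_SUBSTRATE
    c_field = m21 + m22 * N_SUBSTRATE
    denom = abs(N_AMBIENT * b_field + c_field) ** 2
    return 4.0 * N_AMBIENT * N_SUBSTRATE / denom


def spectrum(thicknesses_nm):
    """Map a thickness vector D to a 201-point transmission vector S."""
    return [transmission(thicknesses_nm, w) for w in WAVELENGTHS_NM]


if __name__ == "__main__":
    s = spectrum([50.0] * 16)  # hypothetical design: 16 layers of 50 nm
    print(len(s), min(s), max(s))
```

An FPN replaces the per-wavelength matrix products with a single network evaluation, which is also differentiable with respect to D.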
To train an FPN to approximate the forward optical calculation, we first generate 155000 samples using the S4 software [24], where the maximum thickness of each layer is limited to 100 nm. It is convenient to choose scale-invariant units in electromagnetic problems. The characteristic length scale in this research is set to one micron (1000 nm). That is, all lengths are normalized by one micron; for example, the maximum thickness is 100/1000 = 0.1 units. We selected 5000 of these samples as the test dataset, and the remaining samples were randomly divided into training (90%) and validation (10%) datasets. Next, we train the FPN using the training dataset. The validation dataset is used for a preliminary evaluation of the FPN and for adjusting its hyperparameters. The test dataset is used for the final evaluation of the generalization ability of the trained model. Although generating a large amount of training data is time-consuming, it is a one-time cost, and the data can be reused in future inverse designs. Once the neural network is trained, it can accurately predict the transmission coefficient of the multilayer film (see Supplement A for the detailed training process and results of the FPN).
Fig. 1. A flowchart of the forward calculation and inverse design method. The thickness of each layer of the nanofilm serves as the input to the FPN, and the discretized transmission spectrum serves as its output.
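The dataset bookkeeping in this paragraph can be sketched as follows. The function names, the fixed seed and the random choice of the test samples are assumptions for illustration; the text states only the sample counts and the 90%/10% split of the remainder.

```python
import random

CHARACTERISTIC_LENGTH_NM = 1000.0  # one micron
N_SAMPLES = 155_000
N_TEST = 5_000
TRAIN_FRACTION = 0.9


def normalize(thickness_nm):
    """Express a length in units of the characteristic length scale."""
    return thickness_nm / CHARACTERISTIC_LENGTH_NM


def split_indices(n_samples, n_test, train_fraction, seed=0):
    """Return (train, validation, test) index lists."""
    rng = random.Random(seed)  # assumed seed, for reproducibility only
    indices = list(range(n_samples))
    rng.shuffle(indices)
    test = indices[:n_test]
    rest = indices[n_test:]
    n_train = round(train_fraction * len(rest))
    return rest[:n_train], rest[n_train:], test


train_idx, val_idx, test_idx = split_indices(N_SAMPLES, N_TEST, TRAIN_FRACTION)

if __name__ == "__main__":
    print(normalize(100.0))  # maximum layer thickness in normalized units
    print(len(train_idx), len(val_idx), len(test_idx))
```

With these numbers, 150000 samples remain after the test set is removed, giving 135000 training and 15000 validation samples.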
For inverse design, one usually starts with a given set of design parameters, calculates their optical response through electromagnetic simulation or an FPN model, and then updates the design parameters according to the difference between the output response and the target response. This process is performed iteratively by an optimization method [5][6][7][9][18][25][26][27][28]. In this work, we demonstrate the use of DE-BP as a design method to optimize multilayer nanofilms. DE is used for a global search of the whole feasible structural parameter space, and BP is used for local search to improve the convergence rate. DE is a simple yet powerful population-based global search algorithm for solving optimization problems over continuous spaces. However, DE has two main defects: premature convergence and stagnation [19,29,30]. Previous improvements mainly focus on four aspects: control parameter settings [31][32][33], strategy selection [34][35][36], population topology [37,38] and hybridization with other optimization algorithms [39,40]. In this paper, we propose a new evolutionary strategy that addresses premature convergence by deleting individuals similar to the best one in the current population and replacing them with newly generated ones. To reduce the risk of stagnation, we choose a relatively large population and a random crossover rate [30,31]. To improve the convergence rate, BP is adopted for local search. For complex high-dimensional problems, we show that BP converges faster than DE in local search (see Supplement B for details).
The whole optimization process is shown in Fig. 2. The first stage of the proposed method is an improved DE algorithm. The first population is initialized with 16-dimensional vectors, each of which represents the thicknesses of the 16 nanofilm layers. The vectors are sampled uniformly at random from 0.001-0.1, which correspond to the minimum and maximum film thickness, respectively. The thicknesses have been normalized by the characteristic length scale of 1000 nm. In each iteration, we delete individuals that are close to the best individual of the current population, based on the Euclidean distance between them. This prevents all individuals from converging in one direction. A predetermined threshold is applied to the objective function: when the objective function value of an individual falls below this threshold, we consider that a near-optimal solution has been found and save it to a text file. After the differential operation, if the minimum objective value of the current generation is less than that of the previous generation, the DE is considered to have found another individual that may produce a near-optimal solution. To avoid triggering on slight perturbations of individuals and systematic error, the required difference between these two minimum values is set to 0.01. When the DE algorithm finds such a suitable individual, that individual is further optimized by the BP algorithm using the Adam optimizer. The cost function for this optimization is based on the error sum between the spectrum predicted by the FPN and the target spectrum.
The maximum generation number is set to 100. Our method performs parallel searches over these generations. A set of design parameters is saved to a text file when the error sum between its predicted spectrum and the target spectrum is less than 0.1. The learning rate of the Adam optimizer is set to 0.001, and the number of iterations of each local optimization is set to 100. Fig. 3 shows the transmission spectra of several of the solutions, calculated with the S4 software. See Supplement D for the parameter values of these solutions and further examples.
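The two-stage procedure can be illustrated with the minimal sketch below. It is a simplified stand-in, not the implementation used in this work. A toy quadratic cost replaces the FPN-based spectral error, and the population size, mutation scale, distance threshold, both tolerances and the number of generations are hypothetical values chosen for the toy problem. Only the bounds 0.001-0.1, the Adam learning rate of 0.001 and the 100 local iterations come from the text.

```python
import math
import random

DIM, LOW, HIGH = 16, 0.001, 0.1  # 16 layers, normalized thickness bounds
IMPROVE_TOL = 1e-4  # hypothetical: minimum DE improvement that triggers BP
SAVE_TOL = 1e-4     # hypothetical: cost below which a solution is stored
rng = random.Random(0)
TARGET = [rng.uniform(LOW, HIGH) for _ in range(DIM)]  # hypothetical optimum


def cost(x):
    # Toy stand-in for the FPN-based error between predicted and target spectra.
    return sum((a - b) ** 2 for a, b in zip(x, TARGET))


def grad(x):
    # Analytic gradient of the toy cost; with an FPN this would come from BP.
    return [2.0 * (a - b) for a, b in zip(x, TARGET)]


def random_individual():
    return [rng.uniform(LOW, HIGH) for _ in range(DIM)]


def de_generation(pop, f_scale=0.5, min_dist=0.02):
    cr = rng.random()  # random crossover rate, redrawn every generation
    new_pop = []
    for i, x in enumerate(pop):
        a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
        j_rand = rng.randrange(DIM)
        trial = []
        for j in range(DIM):
            if rng.random() < cr or j == j_rand:
                v = a[j] + f_scale * (b[j] - c[j])  # rand/1 mutation
                trial.append(min(max(v, LOW), HIGH))  # clip to the bounds
            else:
                trial.append(x[j])
        new_pop.append(trial if cost(trial) <= cost(x) else x)
    best = min(new_pop, key=cost)
    # Replace individuals too close to the best one with fresh random ones.
    survivors = [
        p if p is best or math.dist(p, best) >= min_dist else random_individual()
        for p in new_pop
    ]
    return survivors, best


def adam_refine(x, steps=100, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    x, best = list(x), list(x)
    m, v = [0.0] * DIM, [0.0] * DIM
    for t in range(1, steps + 1):
        g = grad(x)
        for j in range(DIM):
            m[j] = b1 * m[j] + (1 - b1) * g[j]
            v[j] = b2 * v[j] + (1 - b2) * g[j] ** 2
            m_hat, v_hat = m[j] / (1 - b1 ** t), v[j] / (1 - b2 ** t)
            x[j] -= lr * m_hat / (math.sqrt(v_hat) + eps)
        if cost(x) < cost(best):  # keep the best iterate seen so far
            best = list(x)
    return best


pop = [random_individual() for _ in range(40)]
initial_cost = min(cost(p) for p in pop)
solutions, previous = [], initial_cost
for _ in range(30):
    pop, best = de_generation(pop)
    if previous - cost(best) > IMPROVE_TOL:  # DE found a promising individual
        refined = adam_refine(best)
        if cost(refined) < SAVE_TOL:
            solutions.append(refined)
    previous = cost(best)

if __name__ == "__main__":
    print(len(solutions), cost(best))
```

Keeping the best individual while replacing its near neighbours preserves the best cost found so far and forces the rest of the population to keep exploring other regions.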

Performance comparison of BP and DE-BP
One disadvantage of using BP for inverse design is that it is very sensitive to the initial values. For some initial values, the BP algorithm cannot converge to the optimal solution. To find an optimal solution using the BP algorithm, a large number of different initial values must be tried [15]. This is time-consuming in complex high-dimensional parameter spaces with a small number of near-optimal solutions. To illustrate this problem more vividly, we use a simple function of two variables (the so-called peaks function) to demonstrate the performance of BP with different learning rates (see Supplement E for details). Even in this simple task, BP cannot guarantee that the optimization path converges to the global optimum from every initial position.
Combining the DE global search algorithm with the BP algorithm alleviates the sensitivity of BP to the initial values. As shown in Fig. 4, we compare the time consumption of BP and DE-BP for the inverse design of nanofilms with different numbers of layers. For each structure type, we selected 50 samples from its test dataset for inverse design. The results in the figure show that the maximum and average running times required by DE-BP are significantly less than those of BP, especially for structures with more layers. The minimum running time of both methods is within a few seconds, and BP has a shorter average minimum running time (Fig. 4(c)). This means that BP converges faster from a suitable initial value. However, the choice of initial value in an inverse design is blind. The shorter maximum and average running times of DE-BP indicate that DE-BP is more robust for inverse design.
To further compare the robustness of the BP and DE-BP algorithms, we perform inverse design with initial values sampled from a Gaussian distribution instead of uniform random sampling. The mean is set to match the target value and the standard deviation is set to 0.001-1, which corresponds to 1 nm-1000 nm because all lengths have been normalized by the characteristic length scale of 1000 nm. This parameter space is larger than the uniform random sampling range of 0.001-0.1 and also exceeds the parameter range of the training dataset that is used to train the FPN.
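The sensitivity to the starting point can be reproduced with the short sketch below. It assumes that the peaks function is the standard two-variable test function of that name, and it uses plain gradient descent with a finite-difference gradient instead of the setup of Supplement E. The learning rate, step count and starting points are illustrative choices.

```python
import math


def peaks(x, y):
    """Standard two-variable 'peaks' test function (assumed definition)."""
    return (
        3 * (1 - x) ** 2 * math.exp(-x ** 2 - (y + 1) ** 2)
        - 10 * (x / 5 - x ** 3 - y ** 5) * math.exp(-x ** 2 - y ** 2)
        - math.exp(-(x + 1) ** 2 - y ** 2) / 3
    )


def gradient(x, y, h=1e-6):
    """Central-difference approximation of the gradient."""
    gx = (peaks(x + h, y) - peaks(x - h, y)) / (2 * h)
    gy = (peaks(x, y + h) - peaks(x, y - h)) / (2 * h)
    return gx, gy


def descend(x, y, lr=0.01, steps=2000):
    """Plain gradient descent from (x, y); returns the end point and value."""
    for _ in range(steps):
        gx, gy = gradient(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y, peaks(x, y)


if __name__ == "__main__":
    # Two hypothetical starting points lying in different basins.
    for start in [(0.2, -1.5), (-1.3, 0.2)]:
        x, y, f = descend(*start)
        print(start, "->", round(x, 3), round(y, 3), round(f, 3))
```

The first start descends into the deepest basin of the function, while the second settles in a shallower local minimum, so the final value depends on where the search begins.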

Conclusion
In conclusion, we proposed a hybrid optimization method combining DE and BP for the inverse design of multilayer nanophotonic structures. This method effectively utilizes the global parallel exploration capability of DE and the local exploitation capability of gradient descent based on BP. It effectively compensates for the slower convergence of DE and the sensitivity of the BP algorithm to the initial values. In addition, it effectively reduces the possibility of premature convergence and stagnation of DE. The results suggest that the hybrid DE-BP algorithm can greatly speed up the inverse design of multilayer nanofilms and can search a larger parameter space.

Figures
Fig. 1. A flowchart of the forward calculation and inverse design method. The thickness of each layer of the nanofilm serves as the input to the FPN, and the discretized transmission spectrum serves as its output.
Fig. 2. Flowchart of the proposed DE-BP algorithm.
Fig. 3. The test result of the DE-BP method. The target spectrum is randomly selected from the test dataset, and solutions 1, 2 and 3 are three near-optimal solutions among the design results.

Fig. 4. Comparison of running time between BP and DE-BP for inverse design of multilayer nanofilms. The scale is logarithmic for (a) average running time and (b) maximum running time.

Fig. 5. Comparison of average running time between BP and DE-BP for initial-value distributions with different standard deviations. Examples are selected from multilayer nanofilms with (a) 4 layers, (b) 10 layers and (c) 20 layers, respectively.
