
1 Introduction

High-power laser facilities for inertial confinement fusion (ICF), such as the National Ignition Facility (NIF) [1], the Laser Megajoule (LMJ) [2] and the Shenguang-III (SG-III) laser facility [3], are ultimately limited in operation by laser-induced damage (LID) of their final optics. Research in recent years has shown that, once initiated, LID sites on the input surface tend to grow linearly with the number of laser shots [4], while those on the exit surface tend to grow exponentially [5]. LID therefore needs to be detected in time to avoid irreparable damage to the final optics. The most convenient approach is to set up an inspection instrument based on machine learning at the center of the ICF target chamber. In the interval between two laser shots, the instrument must complete the online damage inspection of 432 final optics in 48 final optics assemblies (FOA), including image acquisition, image processing and damage analysis. The machine learning model used in the instrument is trained offline on an LID image dataset; during online inspection, only a single forward pass is computed. That is, with the help of the trained model, the instrument can acquire images and detect LID in the acquired images simultaneously and online.

In recent years, machine learning has been widely used in ICF experiments, mainly to solve problems that are difficult to address with traditional methods. Scientists at Lawrence Livermore National Laboratory (LLNL) have done a great deal of valuable research [6]. Abdulla et al. used an ensemble of decision trees (EDT) to identify HR-type false damage sites among candidate sites with 99.8% accuracy [7], which substantially reduces the interference of false damage with the inspection result. Carr et al. also used an EDT to distinguish input- and exit-surface true damage sites with 95% accuracy. Liao et al. used logistic regression to predict damage growth under different laser parameters (such as cumulative fluence, total growth factor, shot number, previous size, current size and local fluence) [8]. They found that machine learning can produce more accurate predictions than Monte Carlo simulations. Kegelmeyer et al. developed the Avatar Machine Learning Suite of Tools to optimize Blockers, which are used to temporarily shadow identified damage sites from high-power laser exposure [9]. These are the works conducted by LLNL scientists in the field of online damage inspection for the NIF. At present, however, online damage inspection based on machine learning for the SG-III laser facility is still in its infancy. The imaging technology used in our inspection instrument is inhomogeneous internal reflection illumination, which is quite different from the homogeneous internal reflection illumination used at the NIF, so directly applying the existing NIF methods to our experiments is unlikely to achieve the same accuracy. In addition, deep learning relies heavily on big labeled datasets, often requiring tens of thousands to millions of labeled samples, and it is difficult to obtain such a large number of samples for online damage inspection. The current common practice in this field is therefore to use manual feature extraction on the damage sites instead of a neural network's feature learning, thereby reducing the depth of the network and the number of training samples required.

In this paper, we present an online damage inspection method and its experimental system, which solve three problems: classification of true and false LID, classification of input- and exit-surface LID, and size measurement of LID. This fills the gap in online damage inspection for the large-aperture final optics of the SG-III laser facility. The method improves inspection efficiency and accuracy, which is of practical significance for maintaining the load capacity of a high-power laser facility.

2 Theoretical Model

2.1 Classification Method

Machine learning is an effective method for handling complex classification problems. In this paper, we use a kernel-based extreme learning machine (K-ELM) to solve the automatic classification problem for true and false damage. The K-ELM classification model is as follows [10]:

$$\begin{aligned} f(\mathbf{{x}}) = {\left[ {\begin{array}{*{20}{c}} {K(\mathbf{{x}},{\mathbf{{x}}_1})}\\ \vdots \\ {K(\mathbf{{x}},{\mathbf{{x}}_M})} \end{array}} \right] ^\mathrm{{T}}}{\left( {\frac{\mathbf{{I}}}{C} + {{\varvec{\Omega }}_{train}}} \right) ^{ - 1}}{} \mathbf{{T}} \end{aligned}$$
(1)

where \(K(\mathbf{{x}},{\mathbf{{x}}_i})\) is the kernel function, \(\mathbf{{x}} = [{x^{(1)}},...,{x^{(16)}}]\) is the input sample to be classified, \({\mathbf{{x}}_i} = [x_i^{(1)},...,x_i^{(16)}]\) (i = 1, ..., M) is a training sample, M is the number of training samples, I is the identity matrix, C is a constant, and \({{\varvec{\Omega }}_{train}}\) is the kernel matrix composed of the training samples, \({\left( {{\varOmega _{train}}} \right) _{i,j}} = K({\mathbf{{x}}_i},{\mathbf{{x}}_j})\) \((\textit{i}, \textit{j} = 1,..., \textit{M})\). \(\mathbf{{T}} = {[{y_1},...,{y_M}]^\mathrm{{T}}}\) is a column vector composed of the class labels of the training samples. In our experiment, \(K(\mathbf{{x}},{\mathbf{{x}}_i}) = \exp ( - \gamma {\left\| {\mathbf{{x}} - {\mathbf{{x}}_i}} \right\| ^2})\), where \(\gamma \) is a constant.
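As a concrete illustration, a minimal NumPy sketch of the K-ELM training and prediction steps in Eq. (1) could look as follows. This is only a sketch under the definitions above, not the implementation used in our instrument; the function names are hypothetical, and C and \(\gamma \) are user-specified.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian kernel K(a, b) = exp(-gamma * ||a - b||^2) for all row pairs."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

def kelm_train(X_train, T, C, gamma):
    """Precompute (I/C + Omega_train)^{-1} T from Eq. (1)."""
    M = X_train.shape[0]
    omega = rbf_kernel(X_train, X_train, gamma)        # (Omega_train)_{ij} = K(x_i, x_j)
    return np.linalg.solve(np.eye(M) / C + omega, T)

def kelm_predict(x, X_train, alpha, gamma):
    """f(x) = [K(x, x_1), ..., K(x, x_M)] (I/C + Omega_train)^{-1} T."""
    return rbf_kernel(np.atleast_2d(x), X_train, gamma) @ alpha
```

For the true/false classification in Sect. 3.2, T holds the labels \({y_i} \in \{-1, 1\}\) and the predicted class is the sign of \(f(\mathbf{{x}})\).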

We propose an autoencoder-based extreme learning machine (A-ELM) to solve the automatic classification problem for input- and exit-surface damage sites. The A-ELM consists of two parts: unsupervised feature encoding (a sparse autoencoder) and supervised feature classification (an ELM), as shown in Fig. 1.

Fig. 1. The overall framework of A-ELM.

This is a four-layer neural network, where \(n_i\) (\({i} = 0, 1, 2, 3\)) is the number of neurons in the corresponding layer, \({\mathbf{{W}}^{[i]}}\) (\({i} = 1, 2, 3\)) are the connection weights, and \({\mathbf{{b}}^{[i]}}\) (\({i} = 0, 1, 2, 3\)) are the bias vectors. The biases in the input layer and the output layer are both zero, \({\mathbf{{b}}^{[0]}} = {\mathbf{{b}}^{[3]}} = \mathbf{{0}}\). For a damage site X in the image, we use \(n_{0}\) operators \(\mathbf{{f}} = [{f_1},{f_2},...,{f_{{n_0}}}]\) to extract \(n_{0}\) features: \(\mathbf{{x}} = {[{x^{(1)}},...,{x^{({n_0})}}]^\mathrm{{T}}}\), \({x^{(i)}} = {f_i}(\mathbf{{X}})\), \({i} = 1, 2, ..., {n_0}\). The network outputs only two results, so \(n_{3} = 2\). We use \({\mathbf{{x}}^{[i]}}\) and \({\mathbf{{h}}^{[i]}}\) (\({i} = 0, 1, 2, 3\)) to denote the input and output data of the corresponding layer, respectively; here \(\mathbf{{x}} = {\mathbf{{x}}^{[0]}}\) and \({\mathbf{{h}}^{[3]}} = \mathbf{{\hat{t}}} = {[{\hat{t}_1},{\hat{t}_2}]^\mathrm{{T}}}\). The forward propagation process is as follows:

$$\begin{aligned} \left\{ \begin{array}{l} {\mathbf{{h}}^{[0]}} = \mathbf{{x}} = \mathbf{{f}}(\mathbf{{X}})\\ {\mathbf{{h}}^{[1]}} = {f^{[1]}}({\mathbf{{W}}^{[1]}}{\mathbf{{h}}^{[0]}} + {\mathbf{{b}}^{[1]}})\\ {\mathbf{{h}}^{[2]}} = {f^{[2]}}({\mathbf{{W}}^{[2]}}{\mathbf{{h}}^{[1]}} + {\mathbf{{b}}^{[2]}})\\ \mathbf{{\hat{t}}} = {\mathbf{{W}}^{[3]}}{\mathbf{{h}}^{[2]}} \end{array} \right. \end{aligned}$$
(2)

where \({f^{[k]}}( \cdot )\) is the activation function in hidden layer k (\({k} = 1, 2\)). The activation functions can be, but are not limited to, the sigmoid function, the tanh function and rectified linear units (ReLU) [10].
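For clarity, the forward pass of Eq. (2) amounts to a few lines of NumPy. The sketch below assumes sigmoid activations in both hidden layers, which is one of the admissible choices above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def aelm_forward(x, W1, b1, W2, b2, W3):
    """Forward propagation of Eq. (2); x = f(X) is the n0-dim feature vector."""
    h1 = sigmoid(W1 @ x + b1)   # hidden layer 1: sparse-autoencoder encoding
    h2 = sigmoid(W2 @ h1 + b2)  # hidden layer 2: random ELM features
    return W3 @ h2              # linear output layer, two scores (n3 = 2)
```

The predicted surface class corresponds to the larger of the two output scores \({\hat{t}_1}\) and \({\hat{t}_2}\).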

For the sparse autoencoder, the decoded data \(\mathbf{{\hat{x}}}\) is required to restore the original data x. The decoded data can be written as \(\mathbf{{\hat{x}}} = g({\mathbf{{W}}^{[1]\mathrm{{T}}}}{\mathbf{{h}}^{[1]}} + {\mathbf{{b}}^{[1]}})\), where \(g( \cdot )\) is the decoding function, which is also an activation function. To simplify the calculation, we set \({\mathbf{{b}}^{[1]}} = \mathbf{{1}}\), so the encoding output of hidden layer 1 can be written as \({\mathbf{{h}}^{[1]}} = {f^{[1]}}({\mathbf{{W}}^{[1]}}{} \mathbf{{x}} + {\mathbf{{b}}^{[1]}})\). The loss function of the reconstruction error is defined as \({L_{loss}} = (1/M)\sum \nolimits _{i = 1}^M {||} {\mathbf{{x}}_i} - {{\mathbf{{\hat{x}}}}_i}|{|^2}\). We use an \(L_{2}\) norm regularization term to prevent overfitting: \({\varOmega _{weights}} = (1/2)||{\mathbf{{W}}^{[1]}}|{|^2}\). Since sparse autoencoders are typically used to learn features for classification [11], and in order to discover interesting structure in the input data, we impose a sparsity constraint (sparsity regularization) on hidden layer 1. We choose the Kullback-Leibler divergence as the sparsity regularization term:

$$\begin{aligned} {\varOmega _{sparsity}} = \sum \limits _{i = 1}^{{n_1}} {\left[ {\rho \log \left( {\frac{\rho }{{{{\hat{\rho }}_i}}}} \right) + (1 - \rho )\log \left( {\frac{{1 - \rho }}{{1 - {{\hat{\rho }}_i}}}} \right) } \right] } \end{aligned}$$
(3)

where \(\rho \) is a sparsity parameter, typically a small value close to zero (such as \(\rho = 0.05\)), and \({\hat{\rho }_i}\) is the average activation of hidden neuron i over the training set, defined as \({{\hat{\rho }}_i} = (1/M)\sum \nolimits _{j = 1}^M {f(\mathbf{{w}}_i^{\mathrm{{[1]T}}}{\mathbf{{x}}_j} + b_i^{[1]})} \). We define the cost function for training the sparse autoencoder as follows:

$$\begin{aligned} {J_{\mathrm{{cost}}}} = {L_{loss}} + \alpha \cdot {\varOmega _{weights}} + \beta \cdot {\varOmega _{sparsity}} \end{aligned}$$
(4)

where \(\alpha \) is the coefficient of the \(L_{2}\) regularization term and \(\beta \) is the coefficient of the sparsity regularization term; both are user-specified parameters. The hidden weight \({\mathbf{{W}}^{[1]}}\) can be obtained by solving the following optimization problem:

$$\begin{aligned} {\mathbf{{W}}^{[1]*}} = \mathop {\arg \min }\limits _{{\mathbf{{W}}^{[1]}}} \left\{ {{J_{\mathrm{{cost}}}}} \right\} \end{aligned}$$
(5)

Eq. (5) can be solved by the fast iterative shrinkage-thresholding algorithm (FISTA) or the conjugate gradient algorithm [12, 13].
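To make the three terms of Eq. (4) concrete, a sketch of the cost evaluation is given below. It assumes sigmoid encoding and decoding with tied weights and \({\mathbf{{b}}^{[1]}} = \mathbf{{1}}\), as in the text; the default values of \(\rho \), \(\alpha \) and \(\beta \) are those used later in Sect. 3.3, and a generic optimizer (or FISTA) would minimize this cost over \({\mathbf{{W}}^{[1]}}\).

```python
import numpy as np

def sae_cost(W1, X, rho=0.05, alpha=0.001, beta=0.56):
    """J_cost of Eq. (4) for a batch X of shape (M, n0), with b^[1] = 1."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    M = X.shape[0]
    H = sigmoid(X @ W1.T + 1.0)                    # encodings h^[1], shape (M, n1)
    X_hat = sigmoid(H @ W1 + 1.0)                  # tied-weight decodings
    L_loss = np.sum((X - X_hat) ** 2) / M          # reconstruction error
    O_weights = 0.5 * np.sum(W1 ** 2)              # L2 regularization term
    rho_hat = H.mean(axis=0)                       # average activations rho_hat_i
    O_sparsity = np.sum(rho * np.log(rho / rho_hat)
                        + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return L_loss + alpha * O_weights + beta * O_sparsity
```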

The output of the autoencoder is used as the input of the ELM. According to the theory of Huang et al., the hidden neuron parameters (\({\mathbf{{W}}^{[2]}},{\mathbf{{b}}^{[2]}}\)) of the ELM can be randomly generated in a typical implementation [14, 15]. Thus the weight \({\mathbf{{W}}^{[2]}}\) and bias \({\mathbf{{b}}^{[2]}}\) are given randomly:

$$\begin{aligned} \left\{ \begin{array}{l} {\mathbf{{W}}^{[2]}} = rand({n_2},{n_1}),\mathrm{{ }}\\ {\mathbf{{b}}^{[2]}} = rand({n_2},1),\mathrm{{ }} \end{array} \right. \begin{array}{*{20}{c}} {\mathrm{{and}}\; {-} 1 \le w_{ij}^{[2]} \le 1}\\ {\mathrm{{and}}\; {-} 1 \le b_i^{[2]} \le 1} \end{array} \end{aligned}$$
(6)

where \(i = 1,2,...,{n_2}\) and \(j = 1,2,...,{n_1}\). We use \(\mathbf{{T}} = [{\mathbf{{t}}_1},...,{\mathbf{{t}}_M}]\) to denote the target matrix of the training data, where \({\mathbf{{t}}_i} = {[{t_{i,1}},{t_{i,2}}]^\mathrm{{T}}}\). The output data from hidden layer 2 is \({\mathbf{{H}}^{[2]}} = [\mathbf{{h}}_1^{[2]},...,\mathbf{{h}}_M^{[2]}]\). The hidden weight \({\mathbf{{W}}^{[3]}}\) can be obtained by solving the following optimization problem:

$$\begin{aligned} {\mathbf{{W}}^{[3]*}} = \mathop {\arg \min }\limits _{{\mathbf{{W}}^{[3]}}} \left\{ {\frac{1}{2}{{\left\| {{\mathbf{{W}}^{[3]}}} \right\| }^2} + \frac{C}{2}{{\left\| {{\mathbf{{W}}^{[3]}}{\mathbf{{H}}^{[2]}} - \mathbf{{T}}} \right\| }^2}} \right\} \end{aligned}$$
(7)

where C is a user-specified parameter that provides a tradeoff between the distance of the separating margin and the training error. Huang et al. have proved that the stable solution of Eq. (7) is [14,15,16]:

$$\begin{aligned} {\mathbf{{W}}^{[3]*}} = {\left[ {{\mathbf{{H}}^{\mathrm{{[2]T}}}}{{\left( {\frac{\mathbf{{I}}}{C} + {\mathbf{{H}}^{\mathrm{{[2]}}}}{\mathbf{{H}}^{\mathrm{{[2]T}}}}} \right) }^{ - 1}}{} \mathbf{{T}}} \right] ^\mathrm{{T}}} \end{aligned}$$
(8)
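Putting Eqs. (6) and (8) together, the supervised ELM stage can be sketched as follows. We write the ridge solution in the orientation \({\mathbf{{W}}^{[3]}}{\mathbf{{H}}^{[2]}} \approx \mathbf{{T}}\); the sigmoid activation and the fixed random seed are illustrative assumptions.

```python
import numpy as np

def elm_fit(H1, T, n2, C, seed=0):
    """H1: (n1, M) autoencoder outputs for M samples; T: (2, M) target matrix.

    Draws the random hidden parameters of Eq. (6) and solves Eq. (7)
    in closed form for the output weight W3, cf. Eq. (8)."""
    rng = np.random.default_rng(seed)
    n1 = H1.shape[0]
    W2 = rng.uniform(-1.0, 1.0, size=(n2, n1))      # Eq. (6)
    b2 = rng.uniform(-1.0, 1.0, size=(n2, 1))
    H2 = 1.0 / (1.0 + np.exp(-(W2 @ H1 + b2)))      # hidden layer 2 outputs
    W3 = np.linalg.solve(np.eye(n2) / C + H2 @ H2.T, H2 @ T.T).T
    return W2, b2, W3
```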

2.2 Regression Method

We propose a hierarchical kernel extreme learning machine (HK-ELM) to solve the size measurement problem for LID. The HK-ELM is a novel method consisting of two parts: unsupervised multilayer feature encoding (an ELM sparse autoencoder) and supervised feature regression (K-ELM) [17,18,19,20], as shown in Fig. 2.

Fig. 2. The overall framework of HK-ELM.

For an LID site X, we use \(\mathbf{{f}} = [{f_1},{f_2},...,{f_{25}}]\) to extract 25 features; thus \(\mathbf{{x}} = \mathbf f (\mathbf{{X}}) = [{x^{(1)}},...,{x^{(25)}}]\), \({x^{(i)}} = {f_i}(\mathbf{{X}})\), \({i} = 1, ..., 25\). \({\mathbf{{x}}^{[i]}}\) and \({\mathbf{{h}}^{[i]}}\) denote the input and output data of the i-th layer, respectively, and \(n_{i}\) denotes the number of neurons in the i-th layer, \(i = 0, 1, ..., N+2\); \(\mathbf{{x}} = {\mathbf{{x}}^{[0]}}\). The weight \({{\varvec{\beta }}^{[i]}}\) can be obtained by solving the following optimization problem (\({i} = 0, 1, ..., N\)):

$$\begin{aligned} {{\varvec{\beta }}^{[i]*}} = \mathop {\arg \min }\limits _{{{\varvec{\beta }}^{[i]}}} \left\{ {\sum \limits _{j = 1}^M {{{\left\| {{\mathbf {\hat{x}}}_j^{[i]} - {\mathbf {x}}_j^{[i]}} \right\| }^2}} + {\lambda ^{[i]}}\left\| {{{\varvec{\beta }}^{[i]}}} \right\| } \right\} \end{aligned}$$
(9)

where M is the number of training samples, \({f^{[i]}}( \cdot )\) is the activation function in the i-th layer (\(i = 0, 1, ..., N\)), \({\mathbf {h}}_j^{[0]} = {f^{[0]}}({\mathbf {x}}_j^{[0]})\), \({\mathbf {x}}_j^{[i]} = {\mathbf {h}}_j^{[i - 1]}{\varvec{\beta }}_j^{[i - 1]}\), \({\mathbf {h}}_j^{[i]} = {f^{[i]}}({\mathbf {x}}_j^{[i]})\), and \({\mathbf {\hat{x}}}_j^{[i]} = {g^{[i]}}({{\mathbf {h}}^{[i]}}{{\varvec{\beta }}^{[i]T}})\) (\(i = 1, 2, ..., N\); \(j = 1, 2, ..., M\)). \({g^{[i]}}( \cdot )\) is the decoding function in the i-th layer, which is also an activation function. \({\lambda ^{[i]}}\) (\(i = 0, 1, ..., N\)) is the coefficient of the \(L_{1}\) norm regularization term and is a user-specified parameter. Eq. (9) can be solved by FISTA [12]. According to Eq. (1), the output of the kernel layer is \({{\mathbf {h}}^{[N + 1]}} = [K({\mathbf {z}},{{\mathbf {z}}_1}),...,K({\mathbf {z}},{{\mathbf {z}}_M})]\), where \({\mathbf {z}} = {{\mathbf {x}}^{[N + 1]}} = {{\mathbf {h}}^{[N]}}{{\varvec{\beta }}^{[N]}}\) is the vector input to the K-ELM and \({{\mathbf {z}}_i} = {\mathbf {x}}_i^{[N + 1]} = {\mathbf {h}}_i^{[N]}{\varvec{\beta }}_i^{[N]}\) is the output of the training sample \({{\mathbf {x}}_i}\) (\(i = 1, 2, ..., M\)) after passing through the ELM sparse autoencoder. \(K({\mathbf {z}},{{\mathbf {z}}_i}) = \exp ( - \gamma {\left\| {{\mathbf {z}} - {{\mathbf {z}}_i}} \right\| ^2})\) is the kernel function of the neurons in the kernel layer. The output weight \({{\varvec{\beta }}^{[N + 1]}}\) is

$$\begin{aligned} {{\varvec{\beta }}^{[N+1]}} = {\left( {\frac{{\mathbf {I}}}{C} + {{{\varvec{\Omega }}}_{train}}} \right) ^{ - 1}}{\mathbf {T}} \end{aligned}$$
(10)

where I is the identity matrix and C is a constant. \({{\varvec{\Omega }}_{train}}\) is the kernel matrix, whose elements are \({\left( {{\varOmega _{train}}} \right) _{i,j}} = K({{\mathbf {z}}_i},{{\mathbf {z}}_j})\) (\(i, j = 1, 2, ..., M\)), where M is the number of training samples. \({\mathbf {T}} = {[{y_1}, {y_2}, ..., {y_M}]^\mathrm{{T}}}\) is a column vector composed of the regression labels of the training samples.

There is no activation function in the neurons of the output layer. Finally, the output scalar \(\hat{y} = {{\mathbf {x}}^{[N + 2]}} \in {{\mathbf {R}}^{1 \times 1}}\) is

$$\begin{aligned} {\hat{y}} = {{\mathbf {h}}^{[N + 1]}}{{\varvec{\beta }}^{[N+1]}} \end{aligned}$$
(11)
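As with Eq. (1), the supervised part of the HK-ELM reduces to a kernel ridge solve over the encoded vectors. The sketch below covers Eqs. (10) and (11) only; training the ELM sparse autoencoder that produces the z vectors (Eq. (9)) is omitted, and the function names are ours.

```python
import numpy as np

def rbf(A, B, gamma):
    """K(z, z_i) = exp(-gamma * ||z - z_i||^2) for all row pairs."""
    d = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * d)

def hkelm_head_fit(Z, y, C, gamma):
    """Eq. (10): beta^[N+1] = (I/C + Omega_train)^{-1} T, where the rows of Z
    are the encoded training vectors z_i and y holds the measured LID sizes."""
    M = Z.shape[0]
    return np.linalg.solve(np.eye(M) / C + rbf(Z, Z, gamma), y)

def hkelm_head_predict(z, Z, beta, gamma):
    """Eq. (11): y_hat = [K(z, z_1), ..., K(z, z_M)] beta^[N+1]."""
    return float(rbf(np.atleast_2d(z), Z, gamma) @ beta)
```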

3 Experiment

3.1 Final Optics Damage Inspection (FODI) for SG-III Facility

We developed the FODI system for online damage inspection. The FODI system acquires online images in a vacuum target chamber, as shown in Fig. 3. The distance between the imaging and posture adjustment system (IPAS) and the final optics in an FOA is 3.7–5.1 m. The IPAS carries a FODI camera. Each FOA contains 9 large-aperture final optics, each with an aperture of 430 mm \( \times \) 430 mm. The resolution of the FODI camera is about 110 \({{\upmu }}\)m at a 3.7 m working distance and 140 \({{\upmu }}\)m at a 5.1 m working distance. The CCD image format is 4872 \(\times \) 3248 pixels at 16 bits, and the pixel size is 7.4 \({{\upmu }}\)m. Since we are concerned with LID between 100 \({{\upmu }}\)m and 500 \({{\upmu }}\)m, the FODI online image is a low-resolution image of the LID. The vacuum target chamber is a sphere with a diameter of 6 m, to which the 48 FOA are connected. The positioning system moves the IPAS to the target chamber center; the IPAS adjusts the posture of the FODI camera to aim it at the inspected optic; only the light source of the inspected optic is turned on; after the online image of the inspected optic is captured, the data-processing system uses the machine learning algorithms to analyze the damage sites in the image, and the results are stored in the database. The master control system makes the entire process execute automatically.

Fig. 3. (a) Structure diagram of the FODI system. (b) The IPAS in the SG-III laser facility.

We mark all candidate sites in the FODI online image using the LASNR algorithm [19] and characterize each candidate site with a feature vector \(\mathbf{{x}} = [{x^{(1)}},...,{x^{(m)}}]\); the meaning of each attribute \({x^{(i)}}\) is given in Table 1.

Table 1. The 25 attributes (Attrs) associated with each damage site.

In the following experiments, the training and testing LID samples are taken from online images acquired by the FODI system. After these online images are collected, the inspected optics are removed from the SG-III facility and placed under a microscope. We use the microscope to obtain the labels of the LID samples, such as their type and size. If, after learning on the training LID samples, the FODI system still performs well on the testing LID samples, it can perform online inspection of other unlabeled LID on the final optics.

3.2 Classification of True and False LID

Due to the presence of stray light, a significant amount of noise appears in FODI images in addition to true damage sites, as shown in Fig. 4; these noise sites are referred to as false damage sites.

Fig. 4. True and false damage sites in an SG-III FODI online image.

These candidate sites can generally be divided into the following categories: damage sites, hardware reflections (HR), damaged CCD pixels (DC), reflections of a damage site (RD), and attachments (Att) [20]. Damage sites are also called true damage sites or true sites; the others are called false damage sites or false sites. We characterize each candidate site with a feature vector \(\mathbf{{x}} = [{x^{(1)}},...,{x^{(16)}}]\), where the meaning of \({x^{(i)}}\) is given in Table 1. We use \({y_i} = -1\) to denote the label of a false site and \({y_i} = 1\) to denote the label of a true site. Our training and testing samples include true sites and all types of false sites, with damage sizes in the range 50–200 \({{\upmu }}\)m. For comparison, we test the accuracy of the EDT with the 12 features proposed in reference [7] (denoted EDT1) and the EDT with the 16 features proposed in this paper (denoted EDT2). We also provide the classification results obtained with an error backpropagation neural network (BPNN) and a support vector machine (SVM) in Table 2.

Table 2. Testing results of different classifiers (T: true sites, F: false sites).

Table 2 shows that the testing accuracy of the K-ELM is the highest among these classifiers. The training speed of the K-ELM is also the fastest, and its testing speed is only slightly lower than that of the SVM. Overall, the K-ELM offers the best performance for practical application.

3.3 Classification of Input and Exit Surface LID

Each true site in a FODI online image is characterized by a feature vector \(\mathbf{{x}} = [{x^{(1)}},...,{x^{(16)}}]\). We use \({y_i} = -1\) to denote the label of the input surface and \({y_i} = 1\) to denote the label of the exit surface. The numbers of training and testing samples are 1527 and 1466, respectively: there are 635 input-surface LID and 892 exit-surface LID in the training set, and 613 input-surface LID and 853 exit-surface LID in the testing set. The LID size range is 50–1200 \({{\upmu }}\)m. In the experiments, the parameters of the A-ELM are set to \(\alpha = 0.001\), \(\beta = 0.56\), \(\rho = 0.05\) and \({C} = 4.75\). The performance evaluation of EDT2 and A-ELM is shown in Table 3, where \(ACC_{train}\) is the training accuracy, \(ACC_{test}\) is the testing accuracy, std is the standard deviation, n is the number of decision trees, and \({n_1}\) and \({n_2}\) are the numbers of neurons in hidden layer 1 and hidden layer 2, respectively.

Table 3. Performance evaluation between EDT2 and A-ELM.
Fig. 5. Comparison between the radiometric method and HK-ELM. (a) Predicted sizes calculated by the radiometric method. (b) Predicted sizes calculated by HK-ELM. (c) The MRE of the radiometric method. (d) The MRE of HK-ELM.

Table 3 shows that, judging by the difference between \(ACC_{train}\) and \(ACC_{test}\), the generalization ability of the A-ELM is stronger than that of EDT2. In terms of testing accuracy (\(ACC_{test} \pm std\) and Max \(ACC_{test}\)), the A-ELM is about 2% higher than EDT2.

3.4 Size Measurement of LID

In our experiment, a total of 450 samples were randomly selected on the inspected optics to form a data set \(T = \{ ({\mathbf{{x}}_i},{y_i})|{\mathbf{{x}}_i} \in {\mathbf{{R}}^{25}},{y_i} \in R,i = 1,...,P\} \), where \({y_i}\) is the actual size of the i-th LID measured by the microscope, \({P} = 450\), and the size range is 50–750 \({{\upmu }}\)m. We randomly divided the data set T into two parts: a training set \({T_{train}} = \{ ({\mathbf{{x}}_i},{y_i})|{\mathbf{{x}}_i} \in {\mathbf{{R}}^{25}},{y_i} \in R,i = 1,...,M\} \) and a testing set \({T_{test}} = \{ ({\mathbf{{x}}_i},{y_i})|{\mathbf{{x}}_i} \in {\mathbf{{R}}^{25}},{y_i} \in R,i = 1,...,N\} \), with \({M} = {N} = {P}/2\). The numbers of neurons in each layer are \({n_0} = 25\), \({n_1} = 240\), \({n_2} = 240\), \({n_3} = 500\), \({n_4} = 445\), and \({n_5} = 1\). We choose the tanh function as the activation function in the ELM sparse autoencoder and the Gaussian kernel as the kernel function in the kernel layer. The user-specified parameters are set as follows: \({\lambda ^{[i]}} = 1 \times {10^{ - 3}}\) (\(i = 0, 1, 2, 3\)), \(C = 1\), and \(\sigma = 1.49\). The performance comparison between the radiometric method and HK-ELM on the testing samples is shown in Fig. 5. The radiometric method was proposed by LLNL scientists to calculate the size of LID in FODI images [21].

Figures 5(a) and (b) show that the predicted sizes calculated by the radiometric method deviate from the actual sizes more than those calculated by HK-ELM. Figures 5(c) and (d) show that the radiometric method has a larger MRE than HK-ELM. For LID smaller than the FODI resolution, HK-ELM can still measure sizes below the imaging resolution, which meets the technical requirements for precision measurement of LID size in large-aperture final optics under inhomogeneous illumination.
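For reference, assuming the MRE plotted in Fig. 5 denotes the mean relative error of the predicted sizes (our reading; the metric is not defined explicitly above), it can be computed as:

```python
import numpy as np

def mean_relative_error(y_pred, y_true):
    """MRE = (1/N) * sum_i |y_hat_i - y_i| / y_i over the testing samples."""
    return float(np.mean(np.abs(y_pred - y_true) / y_true))
```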

4 Conclusion

The machine learning method proposed in this paper solves three problems of online damage inspection in large-aperture final optics: classification of true and false LID, classification of input- and exit-surface LID, and size measurement of LID. The method is suitable for machine learning on small sample sets, which is of practical significance because it is difficult to collect a large number of labeled samples for online damage inspection of large-aperture final optics. The experimental results show that the proposed method achieves satisfactory results on small samples.