Optimization of the thermophysical properties of thermal barrier coating materials based on the GA-SVR machine learning method: illustrated with the ZrO2-doped DyTaO4 system

Reducing the thermal conductivity and increasing the thermal expansion coefficient of ceramic thermal barrier coating (TBC) materials are critical issues for their application. Synthesizing samples of different compositions and measuring their thermal conductivity by traditional experimental approaches is time-consuming and expensive, and most classic and empirical models are inefficient and inaccurate when used to predict the thermophysical properties of TBC materials. In this work, we exploit a Genetic Algorithm-Support Vector Regression (GA-SVR) machine learning model to study these thermophysical properties, taking as an example the potential TBC material ZrO2-doped DyTaO4; DyTaO4 has the lowest thermal conductivity in the rare earth tantalate RETaO4 system. The correlation coefficient (R²) and the mean square error (MSE) are used to evaluate the accuracy and reliability of the model. The model achieves high correlation coefficients for thermal conductivity and thermal expansion coefficient (99.8% and 99.9%, respectively), with MSE values of 0.00052 and 0.00019, respectively. The ZrO2 doping concentration was optimized to 0.085–0.095, which further reduces the thermal conductivity while increasing the thermal expansion. This model provides an accurate and reliable option for designing ceramic thermal barrier coating materials.


Introduction
In previous decades, much effort has been devoted to optimizing and fabricating oxide ceramic thermal barrier coatings (TBCs) with low thermal conductivity and a suitable thermal expansion coefficient [1][2][3][4]. Currently, 7-8 wt% yttria-stabilized zirconia (7-8YSZ) is used for high-temperature TBCs. Although 7-8YSZ offers numerous advantages, such as low thermal conductivity, a high melting point, an appropriate thermal expansion coefficient, and chemical inertness [5], it undergoes a phase transition [6] from the metastable tetragonal phase (t′) to the monoclinic phase (m) when the operating temperature exceeds 1200°C, which accelerates coating failure. It is therefore important to develop new TBC materials with better high-temperature thermophysical properties than YSZ. Wang et al [7][8][9] investigated the thermal properties of the rare earth tantalates RETaO4 (RE = Nd, Eu, Gd, Dy, Er, Yb, Lu) and found that RETaO4 ceramics have relatively high toughness and lower thermal conductivity than 7-8YSZ; among them, DyTaO4 has the lowest thermal conductivity. Moreover, the thermal conductivity of rare earth tantalates can be reduced further through doping, substitution, and alloying.
Besides thermal conductivity, the thermal expansion coefficient is one of the most important properties of thermal barrier coatings. However, the traditional experimental approach of synthesizing samples of different compositions and measuring their thermal conductivity and thermal expansion consumes a great deal of time and money. Theoretical calculation is another way to predict these properties. Giardino et al [10][11][12] used density functional perturbation theory (DFPT) to calculate the phonons and obtain the second-order force constants, and then solved the Boltzmann transport equation (BTE) to predict the thermal conductivity. A third approach integrates the autocorrelation function of the microscopic heat flux according to the Green-Kubo formula [13], using equilibrium molecular dynamics simulations to obtain the thermal conductivity. However, high-order scattering processes cause large fluctuations of the autocorrelation function, slow convergence, and large numerical noise, which lower the accuracy of the calculated thermal conductivity. Calculating thermal expansion additionally requires accounting for anharmonicity, at a high computational cost. For semi-empirical models, the parameters are obtained by fitting, so their predictive ability is limited.
Recently, predicting compositions and properties with machine learning methods, including support vector machines, artificial neural networks, and random forests, has become increasingly popular. Wang et al [14] proposed 3-input and 4-input artificial neural network (ANN) models to predict the thermal conductivity of oxide-water nanofluids. Niu et al [15] established three thermal conductivity prediction models based on different machine learning methods (support vector machine, random forest, and kernel ridge regression) to identify descriptors of thermal conductivity. Alade et al [16] constructed a support vector regression (SVR) model that predicts the enhanced thermal conductivity of metal and metal oxide nanofluids. However, the prediction performance of such models has not been optimal because of the limited data size. Owing to various interfering factors, these models also tend to converge to local optima, and with small samples of low accuracy they overfit easily [17]. SVR can achieve high-dimensional, nonlinear prediction from a small number of samples and overcomes shortcomings of neural networks such as falling into local optima, over-fitting, and reliance on experience in sample processing [18]. The predictive performance of an SVR model depends on its parameters (the penalty parameter C and the kernel parameter g). When SVR is used alone, these parameters are selected blindly during training and must be adjusted repeatedly by trial and error. This is time-consuming, makes the optimal parameters difficult to find, and often leaves the model over-fitted or under-fitted, resulting in insufficient prediction accuracy and generalization.
Because of the good global optimization characteristics of the genetic algorithm (GA), it can be used to select the best model parameters and thereby improve prediction accuracy and generalization. Taking into account the limited data available in materials science research, we propose a genetic algorithm-SVR (GA-SVR) prediction model. We use the GA to optimize the SVR parameters, establish a thermal conductivity prediction model and a thermal expansion coefficient prediction model, and use them to investigate the effects of the ZrO2 doping concentration on the thermal conductivity and thermal expansion coefficient of DyTaO4.

Brief description of the machine learning techniques
2.1. SVR
The framework of SVR was developed by Vapnik in 1995 [19]. It extends support vector machines (SVM) to regression problems. Based on statistical learning theory, SVM controls the capacity of the learning machine by maximizing the classification margin, thereby implementing the principle of structural risk minimization. SVM also uses a kernel function to map samples from the input space into a high-dimensional feature space [20]. Given a training set $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, a nonlinear mapping is used to map the data into a high-dimensional feature space, which transforms the nonlinear regression into a linear regression problem in that space:

$$f(x) = w^{\mathrm{T}}\varphi(x) + b \qquad (1)$$

In formula (1), $\varphi(x)$ is the nonlinear transformation that maps the sample points into the high-dimensional space, $w$ is the weight vector, and $b$ is the threshold.
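Introducing the non-negative slack variables $\xi_i$ and $\xi_i^*$ and taking the fitting error into account, the linear regression estimate is obtained from the optimization problem

$$\min_{w,\,b,\,\xi,\,\xi^*}\ \frac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^*\right)$$
$$\text{s.t.}\quad y_i - w^{\mathrm{T}}\varphi(x_i) - b \le \varepsilon + \xi_i,\qquad w^{\mathrm{T}}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^*,\qquad \xi_i,\ \xi_i^* \ge 0,\quad i = 1, \ldots, n \qquad (2)$$

Here $\sum_{i=1}^{n}(\xi_i + \xi_i^*)$ is the loss on the training set, and $C$ is the penalty factor, which sets the trade-off between the degree of fit of the function and the tolerance for deviations larger than $\varepsilon$, the parameter of the $\varepsilon$-insensitive loss function.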
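By introducing a kernel function

$$K(x_i, x_j) = \varphi(x_i)^{\mathrm{T}}\varphi(x_j) \qquad (3)$$

there is no need to know the explicit form of the nonlinear mapping from the low-dimensional input space to the high-dimensional feature space: the kernel function evaluated in the original space performs the calculations required in the high-dimensional space. Solving the dual of the above optimization problem with the Lagrange multipliers $\alpha_i$ and $\alpha_i^*$ then yields the decision function

$$f(x) = \sum_{i=1}^{n}\left(\alpha_i - \alpha_i^*\right)K(x_i, x) + b \qquad (4)$$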
Commonly used kernel functions include the polynomial kernel, the radial basis function (RBF) kernel, and the sigmoid kernel. Because the dataset and the number of input features are small, we choose the RBF kernel [21], which has few parameters and is convenient to use. Its expression is

$$K(x, x_i) = \exp\left(-\frac{\lVert x - x_i\rVert^2}{\sigma^2}\right) \qquad (5)$$

In formula (5), $\sigma$ is the kernel parameter; for simplicity of calculation, let $g = 1/\sigma^2$. The prediction performance of the SVR model depends on the choice of its model parameters (penalty factor C, kernel parameter g, insensitive loss parameter ε, etc.) [22]. This article mainly optimizes the penalty factor C and the kernel parameter g of the SVR model.
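To make formulas (4) and (5) concrete, the following Python sketch evaluates the RBF kernel in the form exp(-g‖x − x_i‖²) with g = 1/σ², and the resulting SVR decision function. It assumes that the support vectors, the coefficient differences α_i − α_i* and the threshold b have already been produced by an SVR solver; the function names `rbf_kernel` and `svr_predict` are illustrative and not taken from this study.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], xi: Sequence[float], g: float) -> float:
    """RBF kernel of formula (5), written with g = 1/sigma^2: exp(-g * ||x - xi||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-g * sq_dist)


def svr_predict(
    x: Sequence[float],
    support_vectors: Sequence[Sequence[float]],
    dual_coefs: Sequence[float],
    b: float,
    g: float,
) -> float:
    """Decision function of formula (4): sum_i (alpha_i - alpha_i*) K(x_i, x) + b.

    dual_coefs[i] holds the difference alpha_i - alpha_i* for support vector i.
    These coefficients and b must come from an SVR solver; they are not fitted here.
    """
    return sum(
        coef * rbf_kernel(sv, x, g) for sv, coef in zip(support_vectors, dual_coefs)
    ) + b
```

In common SVR libraries the same quantities appear under other names: LIBSVM's `-c`, `-g` and `-p` options and scikit-learn's `SVR(C=..., gamma=..., epsilon=...)` correspond to C, g and ε here, with the RBF kernel defined as exp(-g‖x − x_i‖²).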

GA
The genetic algorithm (GA) originated from computer simulation studies of biological systems. It is a stochastic global search and optimization method that mimics natural biological evolution. It is an efficient, parallel, global search method that automatically acquires and accumulates knowledge about the search space during the search and adaptively controls the search process to obtain the best solution. It is characterized by good parallelism and robustness [23].
The basic steps of the GA are as follows.
(1) Population initialization. Choose a coding scheme and randomly generate a certain number of individuals in the solution space to form the initial population.
(2) Population evaluation. Decode each individual in the population, calculate its fitness function value, and keep the best individual in the current population as the best solution found so far.
(3) Selection. According to their fitness, individuals with high fitness are selected from the current population by roulette-wheel or expected-value selection.
(4) Crossover. The individuals selected in the previous step undergo crossover with probability Pc, using single-point, multi-point, or another crossover scheme to generate new individuals.
(5) Mutation. With probability Pm, single-point or multi-point mutation is applied to some genes of an individual.
(6) Termination check. If the termination condition is met, the algorithm stops; otherwise it returns to step (2).
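The six steps above can be sketched in Python as a binary-coded GA. This is a generic illustration rather than the implementation used in this study: the population size, Pc, Pm and generation count are placeholder values, the fitness function must return non-negative values for roulette-wheel selection to work, and copying the best individual into the next generation (elitism) is an assumed design choice.

```python
import random
from typing import Callable, List, Tuple

Chromosome = List[int]


def roulette_select(pop: List[Chromosome], fits: List[float], rng: random.Random) -> Chromosome:
    """Step (3): pick one individual with probability proportional to its fitness."""
    total = sum(fits)
    if total <= 0:  # degenerate case: fall back to a uniform choice
        return rng.choice(pop)[:]
    r = rng.uniform(0, total)
    acc = 0.0
    for ind, f in zip(pop, fits):
        acc += f
        if acc >= r:
            return ind[:]
    return pop[-1][:]  # guard against floating-point round-off


def single_point_crossover(
    a: Chromosome, b: Chromosome, pc: float, rng: random.Random
) -> Tuple[Chromosome, Chromosome]:
    """Step (4): with probability pc, swap the tails after a random cut point."""
    if len(a) > 1 and rng.random() < pc:
        cut = rng.randint(1, len(a) - 1)
        return a[:cut] + b[cut:], b[:cut] + a[cut:]
    return a[:], b[:]


def bit_flip(ind: Chromosome, pm: float, rng: random.Random) -> Chromosome:
    """Step (5): flip each gene (0 -> 1, 1 -> 0) independently with probability pm."""
    return [1 - gene if rng.random() < pm else gene for gene in ind]


def genetic_algorithm(
    fitness: Callable[[Chromosome], float],
    n_bits: int,
    pop_size: int = 20,
    pc: float = 0.8,
    pm: float = 0.05,
    max_gen: int = 50,
    seed: int = 0,
) -> Tuple[Chromosome, float, List[float]]:
    rng = random.Random(seed)
    # Step (1): random initial population of binary chromosomes.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best, best_fit = pop[0][:], float("-inf")
    history: List[float] = []
    for _ in range(max_gen):  # Step (6): stop after a fixed number of generations.
        fits = [fitness(ind) for ind in pop]  # Step (2): evaluate the population.
        for ind, f in zip(pop, fits):
            if f > best_fit:
                best, best_fit = ind[:], f
        history.append(best_fit)
        parents = [roulette_select(pop, fits, rng) for _ in range(pop_size)]
        children: List[Chromosome] = []
        for i in range(0, pop_size - 1, 2):
            c1, c2 = single_point_crossover(parents[i], parents[i + 1], pc, rng)
            children += [bit_flip(c1, pm, rng), bit_flip(c2, pm, rng)]
        # Elitism (assumed): carry the best individual into the next generation.
        pop = [best[:]] + children[: pop_size - 1]
    return best, best_fit, history


# Usage on a toy problem: maximize the number of 1-bits in a 16-bit chromosome.
best, best_fit, history = genetic_algorithm(lambda c: float(sum(c)), n_bits=16, max_gen=30)
```

Because the best individual is tracked and carried forward, the best fitness recorded per generation can never decrease, which is the shape of a GA convergence curve such as the one plotted in figure 6.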

Description of dataset
The data used in this study were obtained from experimental measurements [24]. Point defects were introduced into the DyTaO4 lattice by alloying with ZrO2, giving Dy1−xTa1−xZr2xO4 solid solution ceramics (x = 0, 0.015, 0.03, 0.045, 0.06, and 0.075). As shown in figure 1, Zr atoms were doped onto the original Dy sites. The experimentally measured thermal conductivity and thermal expansion coefficient were organized into a matrix of compositions, temperatures, and corresponding properties to construct the prediction models of thermal conductivity and thermal expansion coefficient. These models are used to narrow down the optimal ZrO2 doping range that reduces thermal conductivity and increases the thermal expansion coefficient. (The model inputs are the concentration ratios of the elements, so the Zr concentration is used directly to represent the concentration of doped ZrO2.) The data comprise 54 samples for thermal conductivity and 66 samples for thermal expansion coefficient. The inputs and outputs of the two prediction models are shown in table 1 (available online at stacks.iop.org/MRX/8/125503/mmedia).
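The mapping from the nominal composition x to the Zr content used as model input can be checked with a short Python sketch. It assumes that the element amounts per formula unit of Dy1−xTa1−xZr2xO4 are used directly as concentrations and that temperature is appended as a further input; the actual input columns are those of table 1, so the layout returned by the hypothetical `feature_row` helper is an assumption.

```python
from typing import Dict, List

# Nominal compositions x of Dy(1-x)Ta(1-x)Zr(2x)O4 synthesized in [24].
EXPERIMENTAL_X = [0.0, 0.015, 0.03, 0.045, 0.06, 0.075]


def composition(x: float) -> Dict[str, float]:
    """Element amounts per formula unit of Dy(1-x)Ta(1-x)Zr(2x)O4."""
    if not 0.0 <= x < 1.0:
        raise ValueError("x must lie in [0, 1)")
    return {"Dy": 1.0 - x, "Ta": 1.0 - x, "Zr": 2.0 * x, "O": 4.0}


def feature_row(x: float, temperature_c: float) -> List[float]:
    """Hypothetical model input: element amounts followed by temperature (assumed layout)."""
    c = composition(x)
    return [c["Dy"], c["Ta"], c["Zr"], c["O"], temperature_c]


zr_levels = [round(composition(x)["Zr"], 3) for x in EXPERIMENTAL_X]
print(zr_levels)  # [0.0, 0.03, 0.06, 0.09, 0.12, 0.15]
```

Because the Zr content is 2x, the six synthesized compositions correspond to Zr = 0 to 0.15 in steps of 0.03, which is the Zr range scanned by the prediction models below.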

Computational methodology
All calculations in this study were performed on a computer. As highlighted in section 3.1, the data used in this study fall into two categories, thermal conductivity and thermal expansion coefficient, containing 54 and 66 samples, respectively. Each category was partitioned into two portions: a training set used to train the GA-SVR prediction model and a test set used to verify its performance. For thermal conductivity, the training set contains 40 samples and the test set 14 samples; for the thermal expansion coefficient, the training set contains 50 samples and the test set 16 samples. The prediction accuracy of the models was assessed with the correlation coefficient (R²) and the mean square error (MSE).
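The evaluation protocol can be sketched in Python as follows. The text calls R² the correlation coefficient without giving its formula, so both common definitions are shown: the squared Pearson correlation (the "squared correlation coefficient" that LIBSVM reports for regression) and the coefficient of determination 1 − SS_res/SS_tot. The seeded random split is also an assumption, since only the set sizes are stated.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean square error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def squared_correlation(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Squared Pearson correlation between measured and predicted values."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov ** 2 / (var_t * var_p)


def coefficient_of_determination(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """1 - SS_res / SS_tot."""
    mt = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mt) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def train_test_split(samples: Sequence[T], n_train: int, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle indices with a fixed seed, then take the first n_train as training data."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    return [samples[i] for i in idx[:n_train]], [samples[i] for i in idx[n_train:]]


y, p = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
print(mse(y, p))                           # 7.5
print(squared_correlation(y, p))           # 1.0
print(coefficient_of_determination(y, p))  # -5.0
```

The two R² definitions coincide for a least-squares linear fit with an intercept but can differ sharply otherwise: in the example, predictions perfectly correlated with the targets but off by a factor of two give a squared correlation of 1 and a coefficient of determination of −5.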

Optimization strategy
The optimization strategy used to select the optimum SVR parameters is shown in figure 2. We use the GA to optimize two parameters of the SVR model, the penalty factor C and the RBF kernel parameter g, and then use the optimal parameters to construct the thermal conductivity and thermal expansion coefficient prediction models. The specific parameter settings of the GA search
(3) Selection uses roulette-wheel selection; that is, the greater the fitness of a chromosome, the greater its probability of being selected.
(4) Crossover uses single-point crossover; that is, a crossover point is randomly selected in the chromosomes, and the two parent chromosomes exchange their segments on one side of that point to produce new individuals.
(5) Mutation. Because binary coding is used, the mutated gene is selected randomly: a gene of 0 mutates to 1, and a gene of 1 mutates to 0.

good fit with the measured values. As shown in figure 4(a), the correlation coefficient R² of the training set is 0.99758 and the mean square error (MSE) is 0.00052. The closer R² is to 1, the better the model fits the data, and the smaller the MSE, the smaller the error between the predicted and actual values. The trained model therefore fits the experimental thermal conductivity data very well. The trained SVR model was then used to predict the samples of the test set. As shown in figure 4(

The thermal conductivity of DyTaO4 ceramics is related not only to its intrinsic structure but also to the Zr4+ ions introduced as point defects, which enhance phonon scattering [25][26][27]. It is therefore meaningful to explore the influence of the doped ZrO2 concentration on the thermal conductivity of DyTaO4 ceramics. The predicted results in figure 5 show that the lowest thermal conductivity is obtained at Zr concentrations of 0.085-0.095. Because of the experimental cost, the optimal concentration could not be determined more precisely by experiment. We therefore used the trained model, built on the original experimental data, to predict a Zr concentration range with lower thermal conductivity than the experimentally measured samples. As figure 5(a) shows, the thermal conductivity of DyTaO4 is lowest at a Zr concentration of 0.09, and our previous experiments concluded from the results in figure 5(a) that the ZrO2 concentration range giving the lowest thermal conductivity of DyTaO4 ceramics is 0.08 to 0.10. We took Zr concentrations at intervals of 0.005 over the range 0-0.15, giving a total of 248 sets of data, and used the thermal conductivity prediction model to predict their thermal conductivity.
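The sizes of the prediction grids can be reproduced with a small Python sketch. Scanning Zr from 0 to 0.15 in steps of 0.005 gives 31 concentrations, and 248 = 31 × 8 while 341 = 31 × 11 (the size of the thermal expansion scan). This would be consistent with temperatures at 100 °C intervals over 100-800 °C and 200-1200 °C; these temperature lists are inferred from the counts and are not stated in the text.

```python
from itertools import product
from typing import Iterable, List, Tuple


def prediction_grid(
    temps: Iterable[float], zr_max: float = 0.15, zr_step: float = 0.005
) -> List[Tuple[float, float]]:
    """All (Zr concentration, temperature) pairs to be fed to a trained model."""
    n_steps = round(zr_max / zr_step)
    # Build concentrations from integer multiples to avoid accumulating float error.
    zr_values = [round(i * zr_step, 4) for i in range(n_steps + 1)]
    return list(product(zr_values, temps))


# Assumed temperature lists, inferred from the reported grid sizes.
tc_grid = prediction_grid(range(100, 801, 100))    # thermal conductivity scan
tec_grid = prediction_grid(range(200, 1201, 100))  # thermal expansion scan
print(len(tc_grid), len(tec_grid))  # 248 341
```

Each (Zr, temperature) pair would still have to be converted into the model's input format (table 1) before prediction.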
As shown in figure 5(b), we plotted a contour (cloud) map of the predicted thermal conductivity as a function of Zr concentration and temperature. Figure 5(b) shows that as the Zr concentration increases, the thermal conductivity first increases and then decreases, reaching its minimum near a Zr concentration of 0.09. Figures 5(c)-(e) show three sets of predicted thermal conductivity values. As shown in figure 5(c), in the temperature range of 100°C-900°C the thermal conductivity at Zr concentrations of 0.08 and 0.10 is higher than at 0.09, which is consistent with the conclusion of the original experiments. Figure 5(d) shows that the thermal conductivity at ZrO2 concentrations of 0.085 and 0.095 is also higher than at 0.09. Figure 5

Prediction model of thermal expansion coefficient
As shown in figure 6, the GA essentially reaches the optimum after 18 iterations and terminates after 100 iterations. After GA optimization, C = 98.2598 and g = 0.6376, with a mean square error (MSE) of 0.0026. The optimized parameters are substituted into the SVR to train the model. Figure 7 shows the thermal expansion coefficient prediction model: figure 7(a) is the prediction result for the training set and figure 7(b) that for the test set. As shown in figure 7, the correlation coefficients R² of the training and test sets are 0.99907 and 0.99566, both close to 1, and the MSEs are 0.00019 and 0.0013, respectively. The trained thermal expansion coefficient prediction model therefore has good prediction accuracy and fits the data well.
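One way to couple the GA with the SVR is to split a binary chromosome into two fields, decode them linearly into C and g, and score each individual by a cross-validated MSE. The sketch below assumes 20 bits per parameter and search bounds of 0.1-100 for C and 0.01-1000 for g (the reported C = 98.2598 and g = 0.6376 lie inside them); these bounds and the 1/(1 + MSE) fitness transform are hypothetical choices, and `cv_mse` stands for any routine that trains an RBF-SVR with the given C and g and returns its cross-validation MSE.

```python
from typing import Callable, Sequence, Tuple

N_BITS = 20               # bits per parameter (assumed)
C_RANGE = (0.1, 100.0)    # hypothetical search bounds for the penalty factor C
G_RANGE = (0.01, 1000.0)  # hypothetical search bounds for the kernel parameter g


def decode_bits(bits: Sequence[int], low: float, high: float) -> float:
    """Map a binary string linearly onto [low, high]."""
    value = int("".join(str(b) for b in bits), 2)
    return low + (high - low) * value / (2 ** len(bits) - 1)


def decode_svr_params(chromosome: Sequence[int]) -> Tuple[float, float]:
    """The first N_BITS genes encode C, the next N_BITS genes encode g."""
    c = decode_bits(chromosome[:N_BITS], *C_RANGE)
    g = decode_bits(chromosome[N_BITS:2 * N_BITS], *G_RANGE)
    return c, g


def svr_fitness(chromosome: Sequence[int], cv_mse: Callable[[float, float], float]) -> float:
    """Larger is better: a smaller cross-validated MSE gives a larger fitness."""
    c, g = decode_svr_params(chromosome)
    return 1.0 / (1.0 + cv_mse(c, g))
```

The 1/(1 + MSE) transform keeps the fitness positive and bounded by 1, as required by roulette-wheel selection, while preserving the ranking of candidate (C, g) pairs by their MSE.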
We used the trained model to predict the thermal expansion coefficient for different Zr concentrations in order to analyze the effect of Zr doping and to find the ZrO2 concentration range that gives the highest thermal expansion coefficient of DyTaO4 ceramics between 200 ℃ and 1200 ℃. The predicted results in figure 8 show that the highest thermal expansion is obtained at Zr concentrations of 0.11-0.13.
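Locating the optimum from a predicted grid can be sketched as a search over the scanned concentrations. Here each concentration is scored by its prediction averaged over temperature, which is only one possible criterion (the text reads the optimum from contour maps), and `toy_tec` is an invented stand-in with its maximum placed at Zr = 0.12, not the trained GA-SVR model.

```python
from statistics import mean
from typing import Callable, Iterable, Sequence


def best_concentration(
    predict: Callable[[float, float], float],
    zr_values: Sequence[float],
    temps: Iterable[float],
    maximize: bool,
) -> float:
    """Return the Zr value whose temperature-averaged prediction is largest (or smallest)."""
    temps = list(temps)

    def score(zr: float) -> float:
        return mean(predict(zr, t) for t in temps)

    return max(zr_values, key=score) if maximize else min(zr_values, key=score)


def toy_tec(zr: float, t: float) -> float:
    """Hypothetical smooth stand-in for the trained thermal expansion model."""
    return 10.0 + 0.001 * t - 50.0 * (zr - 0.12) ** 2


zr_values = [round(i * 0.005, 4) for i in range(31)]
print(best_concentration(toy_tec, zr_values, range(200, 1201, 100), maximize=True))  # 0.12
```

Averaging ranks concentrations by their overall level; when curves cross, a per-temperature comparison can give a different answer, which is why the final choice should also be checked against the full contour map.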
The experimental thermal expansion results are shown in figure 8(a); according to these data, the thermal expansion coefficient is largest at a Zr concentration of 0.12. We took Zr concentrations at intervals of 0.005 over the range 0-0.15, giving a total of 341 sets of data, and used the thermal expansion coefficient prediction model to predict their thermal expansion coefficients. As shown in figure 8(b), we plotted a contour (cloud) map of the predicted thermal expansion coefficient corresponding to the

The interaction of thermal conductivity and thermal expansion
For a ceramic thermal barrier coating to bond well with the metal substrate and provide good thermal insulation, it must have low thermal conductivity and a suitable thermal expansion coefficient (generally a high thermal expansion coefficient that matches the metallic bond layer). To address this requirement, we used the prediction models above to carry out two exploratory experiments. Experiment one: within the known optimal ZrO2 concentration range for increasing the thermal expansion coefficient of DyTaO4 ceramics (0.10 to 0.14), find the best ZrO2 concentration range for reducing their thermal conductivity. Experiment two: within the best ZrO2 concentration range for reducing the thermal conductivity of DyTaO4 ceramics (0.085-0.095), find the best ZrO2 concentration for increasing their thermal expansion coefficient.

were selected as input. The trained thermal expansion coefficient prediction model was then used to predict the thermal expansion coefficient, in order to find, among these low-thermal-conductivity compositions, the ZrO2 concentration that maximizes the thermal expansion coefficient of DyTaO4 ceramics. In figure 9, the curves for 0.085, 0.09, and 0.095 are the predicted thermal expansion coefficients, and the thermal expansion coefficient is highest at a Zr doping concentration of 0.095. Thus, within the optimal ZrO2 concentration range for reducing the thermal conductivity of DyTaO4 ceramics, DyTaO4 ceramics have the largest thermal expansion coefficient at a Zr concentration of 0.095.

Optimization of the thermal expansion coefficient at the Zr concentration range with the lowest thermal conductivity
In the second experiment, within the optimal ZrO2 concentration range for increasing the thermal expansion coefficient of DyTaO4 ceramics, the compositions corresponding to the five Zr concentrations 0.11, 0.115, 0.12, 0.125, and 0.13 were selected as input, and the trained thermal conductivity prediction model was used to predict their thermal conductivity in order to find the ZrO2 doping concentration that gives DyTaO4 ceramics low thermal conductivity. Figure 10 shows the predicted thermal conductivity curves for these five concentrations. The analysis indicates that, within the optimal ZrO2 concentration range for increasing the thermal expansion coefficient of DyTaO4 ceramics, DyTaO4 ceramics have the lowest thermal conductivity at a Zr concentration of 0.11.
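Both exploratory experiments restrict the search to a short candidate list and compare the predicted curves over temperature. A minimal Python sketch of that comparison follows, assuming a candidate is accepted only if it is best at every temperature (`None` signals crossing curves). `toy_tec` and `toy_kappa` are invented stand-ins whose extrema were deliberately placed at Zr = 0.12 and 0.09 to mimic the trends described above; they are not the trained models, and their output reproduces the reported choices only by construction.

```python
from typing import Callable, Iterable, Optional, Sequence


def uniformly_best(
    candidates: Sequence[float],
    predict: Callable[[float, float], float],
    temps: Iterable[float],
    maximize: bool,
) -> Optional[float]:
    """Return the candidate that is best at every temperature, or None if curves cross."""
    pick = max if maximize else min
    winners = set()
    for t in temps:
        values = {zr: predict(zr, t) for zr in candidates}
        winners.add(pick(values, key=values.get))
    return winners.pop() if len(winners) == 1 else None


def toy_tec(zr: float, t: float) -> float:
    """Hypothetical thermal expansion stand-in, peak at Zr = 0.12."""
    return 10.0 + 0.001 * t - 50.0 * (zr - 0.12) ** 2


def toy_kappa(zr: float, t: float) -> float:
    """Hypothetical thermal conductivity stand-in, minimum at Zr = 0.09."""
    return 2.0 - 0.0005 * t + 30.0 * (zr - 0.09) ** 2


# Highest expansion within the low-conductivity range.
print(uniformly_best([0.085, 0.09, 0.095], toy_tec, range(200, 1201, 100), maximize=True))  # 0.095
# Lowest conductivity within the high-expansion range.
print(uniformly_best([0.11, 0.115, 0.12, 0.125, 0.13], toy_kappa, range(100, 801, 100), maximize=False))  # 0.11
```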

Conclusion
In summary, the GA-SVR model proved efficient and accurate in predicting the thermal conductivity and thermal expansion coefficient of Dy1−xTa1−xZr2xO4. Exploring the effects of the doped ZrO2 concentration on the thermophysical properties of DyTaO4 ceramics is of great significance. The established GA-SVR prediction models for thermal conductivity and thermal expansion coefficient agree reliably and accurately with the experimental results.
Consequently, we draw the following conclusions:
(1) In the range of 100 ℃-800 ℃, the ZrO2 concentration range giving the lowest thermal conductivity of DyTaO4 ceramics is narrowed to 0.085-0.095.
(2) In the range of 200 ℃-1200 ℃, the ZrO2 concentration range giving the highest thermal expansion coefficient of DyTaO4 ceramics is 0.11-0.13.