Machine-Learning-Assisted the Design of Resin Matrix Composites Coating with Ablation Resistance

Traditional experimental methods for designing and developing new materials, especially coating materials, are costly and yield little. With the assistance of machine learning, however, the performance of a specific coating can be predicted without preparation or simulation, which makes material design more efficient. In this study, machine learning was introduced to assist the design of resin matrix composites coating with ablation resistance. A method for structuring engineering data in the materials field was proposed. Based on this method, data from laboratory records and Lange's Handbook of Chemistry were collated into one operational database. All 190 sets of data were used to train an artificial neural network (ANN) regression model that predicts the back-surface temperature of the substrate for a specific coating under a given ablation condition. The mean absolute percentage error (MAPE) of the model is 7%. In view of the characteristics of the material database, a feature engineering method that combines the Pearson correlation coefficient and the random forest (RF) algorithm was applied to identify the main controlling factors of the service performance of the coatings.


Introduction
Materials are an important basis for the development of human society. Each major breakthrough in industry and manufacturing is inseparable from the renewal of materials. As an applied science integrating physics, chemistry, and metallurgy, materials science has relied heavily on large numbers of experiments and on accumulated experience. With the development of life sciences, environmental sciences, aerospace, chemical engineering, and materials science itself, the demand for new advanced materials has become unprecedentedly urgent [1]. The traditional trial-and-error method, which repeats the cycle of preparation, testing, and analysis, is time-consuming and cannot meet the requirements of modern society. By comparison, computer-assisted design of new materials is more efficient and more theoretically predictive, which makes it an effective supplement to the traditional method.
Computer-assisted design of new materials can be achieved by different methods. Since the 20th century, first-principles calculations, density functional theory, molecular dynamics [2][3][4], and other theories have been greatly developed, and they can simulate the relationship between the composition, structure, and properties of materials at different scales. However, these methods still face great difficulties in cross-scale design.
Another approach, one of the latest and most powerful, is artificial intelligence. It combines database technology with machine learning algorithms (such as the artificial neural network and the support vector machine) to optimize material preparation parameters and properties through data mining [5,6]. To date, machine learning methods have achieved a great deal in the field of material design. Oliynyk et al. trained a machine learning model to distinguish Heusler from non-Heusler alloys with an accuracy of 0.94, which is faster and more accurate than human experts [7]. Raccuglia et al. trained a machine learning model to predict the experimental results of vanadium selenite crystallization in MOF material synthesis [8]. Their experimental results show that the machine learning model predicts the conditions for product formation with a success rate as high as 89%, far superior to the traditional synthetic strategy. Segler et al. used Monte Carlo tree search and retrosynthetic analysis to increase the synthesis rate of organic compounds to 30 times that of the traditional route [9].
Resin matrix composites coating is an important functional coating. Recent research shows that this kind of coating has broad application prospects in the field of thermal protection. It is prepared using a high-char-yield organic material as the binder together with fillers such as metals and ceramics. In a high-temperature environment, the coating consumes heat through the physical and chemical changes of its components [10][11][12][13][14]. However, designing resin matrix composites coating by traditional methods means repeatedly testing combinations of the components by experiment, a process that is extremely complicated and resource-intensive. A more efficient way of finding the optimal composition ratio is needed. To the best of the authors' knowledge, machine-learning-assisted design of functional coatings has not been reported before. In this paper, a data structuring method specific to the material design field is studied. The applicability of machine learning algorithms to materials science databases is analyzed through validation experiments. Prior knowledge of materials science is introduced into the optimization of the prediction model obtained by machine learning.

Data preparation
In machine learning, the quality of the data determines the upper limit of model accuracy [15]. Materials data obtained from engineering practice, however, come from various sources and in different formats. It is therefore important to design features reasonably and to integrate all the data into one standardized database.
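As an illustration of what such feature design could look like, the sketch below condenses the property values of the individual components of one coating into a fixed set of features, using the statistics that the remark column of Table 1 and Table 2 calls description modes A (maximum, minimum, arithmetic mean, geometric mean) and A+ (the same plus the mass-weighted mean). The function name `describe`, the dictionary keys, and the numerical values are hypothetical and are not taken from the original database.

```python
import math

def describe(values, mass_fractions=None):
    """Summarise one property over all components of a coating.

    values: property value of each component (must be > 0 so that the
            geometric mean is defined).
    mass_fractions: optional mass fraction of each component. When given,
            the mass-weighted mean is added (description mode A+).
    """
    n = len(values)
    features = {
        "max": max(values),
        "min": min(values),
        "arithmetic_mean": sum(values) / n,
        "geometric_mean": math.exp(sum(math.log(v) for v in values) / n),
    }
    if mass_fractions is not None:
        total = sum(mass_fractions)
        features["mass_weighted_mean"] = (
            sum(v * w for v, w in zip(values, mass_fractions)) / total
        )
    return features

# Hypothetical melting points (K) of three fillers and their mass fractions.
melting_points = [1000.0, 2000.0, 4000.0]
fractions = [0.5, 0.3, 0.2]
print(describe(melting_points, fractions))
```

Whatever the number of components in a formulation, every record ends up with the same columns, which is what a standardized database needs.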
The ablation condition and the back-surface temperature of the substrate are commonly used as the main parameters to characterize the service performance of resin matrix composites coating in thermal protection applications. In addition, parameters describing the ways in which energy is consumed during ablation were chosen as features (see Figure 1). During laser ablation, the laser keeps heating the material through opto-thermal conversion, which can lead to failure of the material. The resin matrix composites coating can effectively protect the substrate through the following designed functions. The organic components decompose and form a residual char layer, which covers the surface of the substrate and helps insulate heat. The inorganic components in the coating not only insulate heat but also carry heat away through their physical changes when the temperature reaches the critical point. Components in the coating may also undergo chemical reactions during ablation, which carry heat away and generate a high-melting-point phase. The new high-melting-point phase covers the substrate surface and further insulates heat.

According to the processes occurring during laser ablation, the features and target listed in Table 1 and Table 2 were selected. In the remark, the leftmost letter gives the data type: c means continuous data and l means logistic data. The middle letter gives the data source: e means collected from experiment and h means taken from Lange's Handbook of Chemistry. The rightmost symbol gives the description mode: A means that the maximum, minimum, arithmetic mean, and geometric mean are used to describe the feature; for A+, the mass-weighted mean is used as well.

The linear regression model takes the form

$$f(x) = \omega^{T} x + b \qquad (1)$$

where ω and b are the learning objectives, learned by minimizing the loss function. Although the linear model has the advantages of strong interpretability and convenient operation, it easily over-fits in practical applications.
Introducing regularization into the loss function can alleviate over-fitting. When the penalty term takes the L1-norm, the model is Lasso regression; when the penalty term takes the L2-norm, it is Ridge regression [16]. From the Bayesian perspective [17], the target is not regarded as a single value to be estimated but is assumed to be drawn from a normal distribution. Likewise, the parameter ω of the linear regression model is drawn from a probability distribution. Based on this idea, the Bayesian regression model was developed to determine the posterior distribution of the model parameters.
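To make the effect of the two penalty terms concrete, the following minimal sketch fits a model with a single feature, for which closed-form solutions exist. It assumes a squared-error loss with an unpenalised intercept, and writes the penalty as `lam * w**2` (Ridge) or `lam * abs(w)` (Lasso). These choices, the function names, and the sample data are assumptions made for illustration and may differ from the formulation used in the study.

```python
def center(values):
    """Return the mean-centered values together with their mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values], mean

def fit_one_feature(x, y, penalty="none", lam=0.0):
    """Fit y ~ w * x + b for a single feature.

    Minimises sum((y_i - w * x_i - b) ** 2) plus
      lam * w ** 2   for penalty="ridge" (squared L2-norm), or
      lam * abs(w)   for penalty="lasso" (L1-norm).
    The intercept b is not penalised, so b = mean(y) - w * mean(x).
    """
    xc, x_mean = center(x)
    yc, y_mean = center(y)
    sxy = sum(u * v for u, v in zip(xc, yc))
    sxx = sum(u * u for u in xc)
    if penalty == "ridge":
        # The penalty enlarges the denominator and shrinks w smoothly.
        w = sxy / (sxx + lam)
    elif penalty == "lasso":
        # Soft-thresholding: shrink towards zero and stop exactly at zero.
        shrunk = max(abs(sxy) - lam / 2.0, 0.0)
        w = (shrunk if sxy >= 0 else -shrunk) / sxx
    else:
        w = sxy / sxx
    b = y_mean - w * x_mean
    return w, b

# Hypothetical data lying exactly on y = 2x + 1.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
for name, lam in [("none", 0.0), ("ridge", 5.0), ("lasso", 4.0), ("lasso", 40.0)]:
    print(name, lam, fit_one_feature(x, y, penalty=name, lam=lam))
```

Both penalties pull the slope towards zero. With a sufficiently large `lam`, the Lasso sets it to exactly zero, whereas Ridge only shrinks it.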
In addition, researchers developed the k-nearest neighbor (KNN) algorithm from the idea that "birds of a feather flock together". In this algorithm, the mean target value of the k points closest to an unknown point is used as the predicted target value of that point. Here k is the learning objective [18].
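The KNN prediction rule described above can be sketched in a few lines. The Euclidean distance is assumed here because the text does not name a distance measure, and the function name `knn_predict` and the sample points are hypothetical.

```python
import math

def knn_predict(train_x, train_y, query, k):
    """Predict the target of `query` as the mean target of its k nearest
    training points, using the Euclidean distance (an assumption)."""
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Hypothetical two-feature samples and their targets.
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_y = [10.0, 20.0, 30.0, 100.0]
print(knn_predict(train_x, train_y, (0.1, 0.1), k=3))
```

In practice the features would usually be scaled to comparable ranges first, since a distance-based rule is otherwise dominated by the feature with the largest numerical spread.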

Decision tree.
Compared with other algorithms, the decision tree (DT) regression model is the only one that can simulate the human decision-making process and provide a set of if-then decision rules. It divides the feature space into several units and takes the average target value of all samples in each unit as the output value of that unit [20]. A new data point is assigned to a unit according to its features, and the corresponding output value is obtained. The feature space is partitioned by minimizing the squared error between the real and predicted target values of all known samples in each unit.

The ANN (Figure 2) is iteratively updated by back propagation and the gradient descent method. Thanks to its hidden layers, an ANN can in theory fit any linear and, especially, nonlinear relationship with arbitrary precision [19].
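The squared-error partitioning rule of the decision tree described in this section can be illustrated for a single feature and a single split. The sketch below tries every threshold between neighbouring feature values, keeps the one with the smallest total squared error, and predicts with the mean target of the unit a new point falls into. A full regression tree repeats this step recursively over all features. The function names and the data are hypothetical.

```python
def squared_error(targets):
    """Sum of squared deviations of `targets` from their mean."""
    if not targets:
        return 0.0
    mean = sum(targets) / len(targets)
    return sum((t - mean) ** 2 for t in targets)

def best_split(xs, ys):
    """Return (threshold, error) of the split on one feature that minimises
    the total squared error of the two resulting units."""
    pairs = sorted(zip(xs, ys))
    best = (None, squared_error(ys))  # no split at all
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        error = squared_error(left) + squared_error(right)
        if error < best[1]:
            best = (threshold, error)
    return best

def predict(xs, ys, threshold, x_new):
    """Mean target of the unit (side of the threshold) that x_new falls into."""
    unit = [y for x, y in zip(xs, ys) if (x <= threshold) == (x_new <= threshold)]
    return sum(unit) / len(unit)

# Hypothetical samples forming two clearly separated groups.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
threshold, error = best_split(xs, ys)
print(threshold, error, predict(xs, ys, threshold, 2.5))
```

The resulting threshold reads directly as an if-then rule, which is the interpretability advantage mentioned in the text.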

Model Evaluation
An ideal regression model is expected not only to fit the known features and their targets but also to predict unknown targets from known features; that is, good generalization ability is required. Machine learning therefore generally divides the whole data set into two parts, one for training the model and the other for testing it. The error of the model on the testing set is taken as the evaluation result. In this study, MAPE is taken as the loss function.

During training, the prediction MAPE in the testing set initially decreased. However, after 500 epochs, the prediction MAPE in the training set shows an upward trend; that is, the ANN model learned the variation pattern of the data itself rather than the underlying rules of materials science. Regularization was applied to solve this problem: a certain proportion of the neurons in each layer was deactivated during training (dropout). This prevents the model from over-fitting through over-reliance on local features [24]. As shown in Figure 6(b), the MAPE in the testing set decreased to 7%.
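Two ingredients of this section lend themselves to a short sketch: the MAPE used as loss and evaluation metric, and the deactivation of a proportion of neurons during training. The dropout variant shown rescales the surviving outputs ("inverted dropout"), which is a common convention assumed here and is not stated in the text. The rate of 0.5, the function names, and all numbers are hypothetical.

```python
import random

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (y_true must be non-zero)."""
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)

def dropout(activations, rate, rng):
    """Deactivate each neuron output with probability `rate` (0 <= rate < 1).

    Surviving outputs are divided by (1 - rate) so that the expected
    output of the layer stays unchanged (inverted dropout, an assumption).
    Only applied during training; at test time all neurons stay active.
    """
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

# Hypothetical measured and predicted back-surface temperatures.
measured = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
print(round(mape(measured, predicted), 2))

rng = random.Random(0)  # fixed seed for a reproducible illustration
print(dropout([1.0, 1.0, 1.0, 1.0, 1.0, 1.0], rate=0.5, rng=rng))
```

Because a different random subset of neurons is silenced at every training step, no single neuron can carry the prediction alone, which is the mechanism behind the reduced over-reliance on local features.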

Main controlling factors identification
The result of main controlling factor identification based on the CIFS method is shown in Table 4. When facing complex problems, people usually adopt the method of controlling variables. In the same way, the ranking of the main controlling factors can serve as a reference when designing resin matrix composites coating with specific service performance. Considering the 10 main controlling factors rather than the whole feature space makes decoupling much more convenient, which makes precise design of the coating composition possible.
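The details of the CIFS procedure are not reproduced in this section, so the following is only one plausible way of combining its two ingredients, the Pearson correlation coefficient and random forest importance, and should not be read as the authors' implementation. It assumes that importance scores from an already trained random forest are available as input, walks through the features from most to least important, and skips any feature that is strongly correlated with one already kept. The threshold of 0.9, the function names, the feature names, and all values are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences.
    Neither sequence may be constant (the denominator would be zero)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def rank_main_factors(columns, importances, max_corr=0.9, top_n=10):
    """columns: {feature name: list of values}.
    importances: {feature name: score}, assumed to come from a trained
    random forest (not computed here).
    Keeps a feature only if its |Pearson r| with every feature already
    kept is at most max_corr, and stops after top_n features.
    """
    ranked = sorted(importances, key=importances.get, reverse=True)
    kept = []
    for name in ranked:
        if all(abs(pearson(columns[name], columns[k])) <= max_corr for k in kept):
            kept.append(name)
        if len(kept) == top_n:
            break
    return kept

# Hypothetical feature columns; the second one duplicates the first in
# different units and should therefore be dropped as redundant.
columns = {
    "organic_mass_ratio": [0.2, 0.4, 0.6, 0.8],
    "organic_mass_percent": [20.0, 40.0, 60.0, 80.0],
    "laser_power": [3.0, 1.0, 4.0, 2.0],
}
importances = {
    "organic_mass_ratio": 0.5,
    "organic_mass_percent": 0.3,
    "laser_power": 0.2,
}
print(rank_main_factors(columns, importances))
```

The output is an ordered short list of weakly correlated features, which corresponds to the kind of ranked, decoupled set of main controlling factors discussed above.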

Conclusion
A novel method with engineering significance for structuring the database was proposed. The ablation condition and the macroscopic properties associated with the energy consumption ways of the resin matrix composites were selected as the features, and the back-surface temperature of the substrate after ablation was taken as the target. The prediction MAPE was kept as low as 7% by a regularized back propagation ANN. Finally, 10 main controlling factors were screened and ranked by the CIFS method. The total mass ratio of organic components was found to be the most significant parameter for the ablation resistance coating. The selected features should be helpful for the design of further new kinds of coating.