Prediction of mechanical properties of Mg-rare earth alloys by machine learning

In this work, the quantitative relationship among the composition, processing history and mechanical properties of magnesium-rare earth alloys was established by machine learning (ML). Based on the support vector regression (SVR) algorithm, ML models were trained on 310 sets of data to predict ultimate tensile strength (UTS), yield strength (YS) and elongation (EL) with good accuracy. To verify the general applicability of the models, new data were collected from the literature and the ML models were used to predict the corresponding mechanical properties. The mean absolute percentage errors (MAPE) of UTS, YS and EL predicted by the SVR models are 9%, 12% and 36%, respectively. The reasons for the deviation of the predicted results were also analyzed. The effects of rare earth elements on UTS, YS and EL were analyzed with the SVR models. The established ML models were then used to recommend the composition and processing history of new magnesium-rare earth alloys with high mechanical properties.


Introduction
Magnesium (Mg) alloys are attractive structural materials for the defense, aerospace and automotive industries because of their low density, good casting properties and high specific strength. The mechanical properties of Mg alloys are closely related to their composition and processing technology. The addition of rare earth elements to Mg alloys can refine the microstructure and improve the mechanical properties, and Mg-rare earth alloys have therefore attracted increasing attention. The rare earth elements commonly used to improve the mechanical properties of Mg alloys include Gd, Y, Nd and Ce. Y and Gd can significantly improve the mechanical properties of Mg alloys: Gao et al [1] showed that Y and Gd increase the yield strength (YS) more effectively than Al and Zn. Arrabal et al [2] showed that Nd can change the fibrous structure of AZ91 Mg alloy and refine the β phase; from the point of view of corrosion, the best Nd content is 0.5%-20%. Ce has low solid solubility, and the YS and ultimate tensile strength (UTS) of Mg alloys increase with increasing Ce content. However, too much Ce impairs the ductility of the alloy [3]. The addition of rare earth elements greatly improves the strength of Mg alloys but also reduces their plasticity. Therefore, improving the plasticity without reducing the strength is an important development direction for Mg alloys.
In the past, Mg alloys with excellent mechanical properties were obtained through extensive experimentation. This process costs a great deal of time and money, and the results are often worse than expected. Machine learning (ML) offers a way to change this situation, and more and more researchers are exploring the use of ML to accelerate the development of new products. Examples include medical imaging [4] and the prediction and design of new materials [5][6][7][8][9][10]. Recently, researchers have used ML to explore the corrosion rate [11,12] and YS [13], to predict and design superalloys [14,15], and to predict the phase formation [16] and elastic properties [17] of alloys. Konstantin et al [18] proposed a material prediction method that uses ML interatomic potentials to approximate the quantum mechanical energy and an active learning algorithm to automatically select the optimal training data set, which significantly reduces the number of Density Functional Theory calculations required. Wu et al [19] used artificial neural networks to recommend new titanium alloys with Young's modulus below 50 GPa. Liu et al [20] used ML methods to recommend novel Mg alloys with high hardness. Gao et al [21] used ML methods for the composition design of Al-Zn-Mg-Cu alloys, optimizing the corrosion cracking resistance. Li et al [22] used ML methods to optimize the composition of Al-Zn-Mg-Cu alloys and recommended an Al-Zn-Mg-Cu alloy with a high UTS of 952 MPa and an elongation (EL) of 6.3%.
In this study, ML models were used to predict the UTS, YS and EL of Mg-rare earth alloys. The dataset contains 310 sets of data related to Mg-rare earth alloys, and a total of 12 attributes are used to describe the composition and processing technology of each alloy. The support vector regression (SVR) model was found to give the best predictions of UTS, YS and EL. During training, the data were divided into a training set and a test set. To evaluate the performance of the models more thoroughly, new data were collected and their experimental values were compared with the values predicted by the models. The effects of rare earth elements on the mechanical properties of Mg-rare earth alloys were analyzed with the established models. The models were then used to recommend the composition and processing technology of new Mg-rare earth alloys with high mechanical properties.

Data collection and analysis
The mechanical properties of Mg-rare earth alloys are related not only to the type and content of rare earth elements in the alloy but also to the processing history. The elements considered in our study are Zn, Gd, Y, Nd, Ce and Zr. At first, only the type and content of the alloying elements were considered as factors influencing the mechanical properties, but this gave poor predictions. After further study of the factors affecting the mechanical properties of Mg alloys, the extrusion ratio, extrusion time, solution temperature, solution time, aging temperature and aging time were also taken into account. Initially, a large amount of literature was collected, but the alloys varied in their rare earth elements and the processing history was not always completely recorded. To make the trained neural network more reliable, the following principles of data selection were adopted.
(1) Only complete records are used; data with incomplete input or output values are excluded.
(2) If an alloy does not contain a given element, the content of that element is set to 0. Extrusion, solution treatment and aging are optional processing steps; if a record does not report one of them, the corresponding attributes are set to zero. This is reasonable because an attribute with a value of zero does not affect the activation value of the next layer.
(3) Alloys manufactured or processed by special methods, such as severe plastic deformation, rapid solidification, powder metallurgy, spray forming and multistage heat treatments, are excluded from the data set.
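The zero-filling rule in principle (2) can be sketched as follows. This is only a minimal illustration: the key names, the `encode` helper and the example record are hypothetical, and the values are made up rather than taken from the data set.

```python
# Order of the 12 input attributes: 6 elements and 6 processing parameters.
# The key names are hypothetical; the text only lists the quantities.
FEATURES = [
    "Zn", "Gd", "Y", "Nd", "Ce", "Zr",
    "extrusion_ratio", "extrusion_time",
    "solution_temperature", "solution_time",
    "aging_temperature", "aging_time",
]

def encode(record):
    """Return a 12-value feature vector; absent elements or steps become 0."""
    unknown = set(record) - set(FEATURES)
    if unknown:
        raise KeyError(f"unexpected attributes: {sorted(unknown)}")
    return [float(record.get(name, 0.0)) for name in FEATURES]

# Illustrative record (made-up values): an aged alloy with no
# extrusion or solution treatment reported.
example = {"Gd": 10.0, "Y": 3.0, "Zr": 0.5,
           "aging_temperature": 225.0, "aging_time": 16.0}
vector = encode(example)
```

Rejecting unknown keys is a design choice of this sketch: it catches misspelled attribute names that would otherwise be silently encoded as zero.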
Based on the above principles, 310 sets of data were collected from 24 publications, all of which concern Mg-rare earth alloys, and the processing history was carefully recorded. The complete training data are provided in the Data Statement.
In this study, the input layer of the machine learning model contains 12 variables that describe the composition and processing of each data item. The output layer contains three variables: UTS, YS and EL. Table 1 shows the ranges of the 12 input variables and the three output variables. The Zn content is mainly distributed between 1% and 6%. The Gd content mainly falls in two ranges, from 1% to 3% and from 8% to 12%. The Y content is mainly between 1% and 6%, and the Nd content is mainly between 1.5% and 2.5%. The Zr content is mainly distributed around 0.5%, and the Ce content is mainly between 1% and 2%.
The influence of each single attribute on UTS, YS and EL was evaluated by calculating the Pearson correlation coefficient, and the corresponding heat map is plotted in figure 2. The darker the color of a square, the higher the correlation between the attribute and the output. Some conclusions can be drawn from figure 2. For example, the Zn content has a great influence on UTS and YS, but its influence on EL is very weak. Of the 12 inputs, extrusion was the most effective for increasing EL.
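For reference, the Pearson coefficient between one input attribute and one output can be computed as in the sketch below. The function name and the sample numbers are illustrative assumptions, not data from this study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0.0 or norm_y == 0.0:
        # A constant column has no defined correlation; 0 is reported here.
        return 0.0
    return cov / (norm_x * norm_y)

# Made-up values: one attribute column and one property column.
zn_content = [0.0, 1.0, 2.0, 3.0, 4.0]
uts = [200.0, 210.0, 220.0, 230.0, 240.0]
r = pearson(zn_content, uts)  # perfectly linear toy data, so r is close to 1
```

A heat map such as the one described for figure 2 would repeat this calculation for each of the 12 inputs against each of the three outputs.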

Machine learning algorithms and model training
Machine learning includes many kinds of algorithms, such as SVR, Extreme Learning Machine (ELM), Elman neural network, Radial Basis Function (RBF), Back Propagation (BP) and so on. Not all algorithms are suitable for the discovery of new materials, so finding an appropriate algorithm for the problem at hand is itself a task to be solved. Twelve descriptors with the strongest correlation with the mechanical properties of Mg-rare earth alloys were selected as input variables to train ML models for predicting UTS, YS and EL, respectively. Table 1 describes the input and output variables in detail. The generalization ability of the different models was compared by 10-fold cross-validation, and the optimal model was selected.
To prevent order-of-magnitude differences between the collected variables from degrading the predictions, the data must be normalized, that is, all data are mapped to numbers in the range (−1, 1). This is a commonly used preprocessing step before applying machine learning methods. The normalization is given by equation (1), in which y is the returned (normalized) value, max is the maximum and min is the minimum of the variable.
Support vector regression (SVR) is a nonlinear model extended from the support vector machine model [47]. SVR can effectively solve nonlinear problems by mapping input samples from a low-dimensional space to a high-dimensional space. By introducing slack variables into the objective function, the structural risk is minimized and the generalization ability is enhanced, which gives SVR advantages on small, high-dimensional data sets.
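The normalization step can be illustrated with the short sketch below. Equation (1) is not reproduced in this text, so the sketch assumes the common min-max mapping y = 2(x − min)/(max − min) − 1, which sends min to −1 and max to 1; the function names and sample values are illustrative.

```python
def fit_minmax(column):
    """Return the (min, max) pair of one variable, taken from the training data."""
    return min(column), max(column)

def scale(x, lo, hi):
    """Map x from [lo, hi] to [-1, 1] (assumed form of the normalization)."""
    if hi == lo:
        return 0.0  # constant variable: no spread to normalize
    return 2.0 * (x - lo) / (hi - lo) - 1.0

def unscale(y, lo, hi):
    """Inverse mapping, used to convert a normalized prediction back."""
    return (y + 1.0) * (hi - lo) / 2.0 + lo

# Made-up solution temperatures in degrees Celsius.
temperatures = [0.0, 400.0, 450.0, 500.0, 520.0]
lo, hi = fit_minmax(temperatures)
scaled = [scale(t, lo, hi) for t in temperatures]
```

The minimum and maximum should be taken from the training data only and then reused for test data and new alloys, so that all inputs share one scale.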
Optimizing the hyperparameters c and g is important for SVR model training. In this paper, a genetic algorithm is used to optimize c and g of the SVR model. By comparing the test-set error obtained with different combinations of c and g, the optimal combination is selected, as shown in table 2.
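A genetic search over two hyperparameters can be sketched as follows. This is not the authors' implementation. It assumes that c and g are the penalty parameter and the kernel parameter of the SVR (as in the LIBSVM options `-c` and `-g`), searches them on a log10 scale, and replaces the real test-set error with a hypothetical `toy_error` function so that the example runs without a trained model.

```python
import random

def genetic_search(error, bounds, pop_size=30, generations=40, seed=0):
    """Minimize error(genes) inside box bounds with a simple real-coded GA."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=error)
        next_pop = [ranked[0][:], ranked[1][:]]  # elitism: keep the two best
        while len(next_pop) < pop_size:
            # Tournament selection: best of three random individuals, twice.
            parent_a = min(rng.sample(ranked, 3), key=error)
            parent_b = min(rng.sample(ranked, 3), key=error)
            child = []
            for (lo, hi), a, b in zip(bounds, parent_a, parent_b):
                w = rng.random()
                gene = w * a + (1.0 - w) * b  # blend crossover
                if rng.random() < 0.2:        # Gaussian mutation
                    gene += rng.gauss(0.0, 0.1 * (hi - lo))
                child.append(max(lo, min(hi, gene)))  # stay inside the bounds
            next_pop.append(child)
        pop = next_pop
    return min(pop, key=error)

def toy_error(genes):
    """Hypothetical stand-in for the SVR test-set error (minimum at c=10, g=0.01)."""
    log_c, log_g = genes
    return (log_c - 1.0) ** 2 + (log_g + 2.0) ** 2

# Search log10(c) in [-2, 3] and log10(g) in [-4, 1].
best = genetic_search(toy_error, bounds=[(-2.0, 3.0), (-4.0, 1.0)])
best_c, best_g = 10.0 ** best[0], 10.0 ** best[1]
```

In practice `toy_error` would be replaced by a function that trains an SVR with the candidate (c, g) and returns its validation error. Scoring candidates by cross-validation error rather than by the final test-set error avoids tuning the hyperparameters to the test data.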

Comparing and selecting models
The root mean square error (RMSE), mean absolute percentage error (MAPE) and R-square (R²) are commonly used to evaluate the quality of machine learning models. RMSE measures the absolute magnitude of the deviation between the predicted and experimental values, and MAPE measures the relative magnitude of the deviation. In equations (3) and (4), n is the total number of samples, y_i is the experimental value and ŷ_i is the regression fitted value. The data set was divided into 10 groups, and the generalization ability of the five models was evaluated by 10-fold cross-validation. In each iteration of the 10-fold cross-validation, 9 groups of samples were used as training data and the remaining group was used as test data for model checking. The overall performance of each model is the average over all 10 iterations. Figure 3 shows the RMSE, MAPE and R² of the test set when the ML models predicted UTS (a), YS (b) and EL (c). The lower the RMSE and MAPE and the higher the R², the better the prediction performance of the ML model.
Figure 4(a) shows the UTS predicted by the SVR model and figure 4(b) shows the YS predicted by the SVR model. The SVR predictions of UTS and YS almost coincide with the line y=x. Notably, good fitting results are obtained not only for the training set but also for the test set. The model can predict the YS and UTS of Mg-rare earth alloys with high accuracy, which provides useful guidance for developers of Mg-rare earth alloys. Figure 4(c) shows the EL predicted by the SVR model. The EL predictions, although roughly linear, do not follow the line y=x closely. A possible reason is that the most important factor affecting EL is not among the inputs. In future work, more factors must be considered in order to predict EL better.
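The three metrics and the 10-fold split can be written out as in the sketch below. Equations (3) and (4) are not reproduced in this text, so the standard definitions of RMSE, MAPE and the coefficient of determination are assumed; the function names and the numbers in the example are illustrative.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (experimental values must be nonzero)."""
    n = len(y_true)
    return 100.0 / n * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    indices = list(range(n_samples))
    for fold in range(k):
        test = indices[fold::k]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test

# Made-up experimental and predicted strengths in MPa.
experimental = [100.0, 200.0]
predicted = [110.0, 180.0]
scores = (rmse(experimental, predicted),
          mape(experimental, predicted),
          r_squared(experimental, predicted))
folds = list(k_fold_indices(310, k=10))
```

With 310 samples, each of the 10 folds holds 31 test samples and 279 training samples. The sketch splits the samples in their stored order; shuffling the indices first is advisable when records from the same publication are stored next to each other.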

Reliability and general applicability of the models
In the previous discussion, the test sets were all randomly selected from the total data set. Although the test sets did not participate in the training of the model, they came from the same publications as the training data. Many factors affect the mechanical properties of an alloy during fabrication, and different experimental environments and fabrication methods will change the result. To test the generality of the model, 8 new sets of data were taken from three publications [48][49][50]. The composition and processing history of the alloys in the new data set are shown in table 3. The new data were selected to focus on Mg-Zn and Mg-Gd alloys, including the more common ternary, quaternary and quinary alloys. The new data do not duplicate the previous data.
For alloys with different compositions and processing histories, the SVR model accurately predicted the UTS and YS of the new experimental data. As can be seen from figure 5(c), when the SVR model is used to predict the EL of the new experimental data, the predicted values of two groups of data differ greatly from the experimental values. Through analysis and comparison, two reasons are identified. First, the larger the amount of training data, the more accurate the predicted value, and a large part of this error is due to a lack of training data. Second, the machine learning model predicts well for data close to the training set but poorly in unknown regions far from the training set.

Element analysis and composition optimization affecting mechanical properties of alloys
The influence of rare earth elements on the UTS, YS and EL of the alloy was analyzed with the SVR models. To examine the influence of each element, the content of one element was varied at a time while the contents of the other elements were kept at their average values. Thousands of predictions were output by the SVR models to fit each curve. Because the solid solubility of different elements in Mg alloys differs, the variation range of each element was kept within the range of the collected data: Zn: 0%-8%, Gd: 0%-15%, Zr: 0%-1%, Y: 0%-15%, Nd: 0%-3%, Ce: 0%-2.1%.
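The one-element-at-a-time analysis can be sketched as follows. The `toy_predict` function is a hypothetical stand-in for a trained SVR model and the baseline values are made up; only the procedure (sweep one input across its range, hold the others at fixed averages) follows the description above.

```python
def sweep_element(predict, baseline, name, lo, hi, steps=100):
    """Vary one input from lo to hi while all other inputs stay at baseline."""
    curve = []
    for k in range(steps + 1):
        value = lo + (hi - lo) * k / steps
        inputs = dict(baseline)  # copy, so the baseline itself is not modified
        inputs[name] = value
        curve.append((value, predict(inputs)))
    return curve

def toy_predict(inputs):
    """Hypothetical placeholder for a trained model's UTS prediction (MPa)."""
    return 300.0 - 2.0 * (inputs["Gd"] - 8.0) ** 2 + 5.0 * inputs["Y"]

# Illustrative average contents (wt%), not the averages of the real data set.
baseline = {"Zn": 2.0, "Gd": 5.0, "Y": 3.0}

# Sweep Gd over the range used in the text, 0% to 15%.
curve = sweep_element(toy_predict, baseline, "Gd", 0.0, 15.0, steps=150)
best_value, best_prediction = max(curve, key=lambda point: point[1])
```

The content that maximizes the predicted property along such a curve corresponds to the single-element optimum read off the curve. Because the other inputs are frozen at their averages, interactions between elements are not captured by this kind of sweep.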
It can be seen from figure 6 that an appropriate addition of Y and Gd can increase the UTS and YS of Mg-rare earth alloys, whereas excessive addition reduces UTS and YS. An appropriate addition of Gd, Y and Ce can increase the EL of Mg-rare earth alloys. For each element there is an optimal content. As can be seen from figure 6(a), the highest UTS is obtained at contents of Zn (2.4%), Gd (8.3%), Zr (0.2%), Y (6.7%), Nd (0.32%) and Ce (0.4%). As can be seen from figure 6(b), the highest YS is obtained at contents of Zn (7.25%), Gd (6.6%), Zr (0.78%), Y (7.63%), Nd (1.17%) and Ce (1.77%). The mechanical properties of Mg-rare earth alloys are not determined by a single element, and different elements may interact, so it is necessary to study the influence of two or more elements simultaneously. The three elements with the highest contents, Zn, Gd and Y, were selected. As shown in figure 7, the variation of the UTS, YS and EL of Mg-rare earth alloys was studied when two of the three elements Zn, Gd and Y were changed. The variation ranges of Zn, Gd and Y are 0%-8%, 0%-12% and 0%-8%, respectively. It can be seen from figure 7 that the trained SVR model reflects well the influence of changes in the alloying elements on the mechanical properties of Mg-rare earth alloys. The addition of rare earth elements can significantly improve the UTS and YS of Mg alloys. There is an optimal ratio of the two elements, as shown in figure 6. It can also be seen from figure 7 that the UTS and YS increase gradually as the contents of the rare earth elements Gd and Y increase. Increasing the Zn content decreases EL but is beneficial for YS.
Using the model to recommend Mg-rare earth alloys with high mechanical properties
Once ML models that can accurately predict the mechanical properties of Mg-rare earth alloys had been established, a genetic algorithm was used with the trained SVR models to find new Mg-rare earth alloys with higher mechanical properties. To make the recommended alloys more reasonable, boundary conditions were set for the genetic algorithm so that the element contents and processing values of all recommended alloys did not exceed the range of the training data. Considering the solid solubility and economic cost of the elements in magnesium alloys, the genetic algorithm was run under the constraint (Zn+Y+Gd) < 20 wt%, searching thousands of candidate alloys with the SVR models to find the best one. The recommended new alloys are compared with the alloys in the database, as shown in figure 8 and figure 9. The composition and processing technology of the new alloys and of the database alloys with higher mechanical properties are listed in detail in the tables. Figure 9 shows the YS and EL values of all alloys in the data set. EL gradually decreases as YS increases, so YS and EL cannot both be high at the same time, and it makes sense to increase YS while keeping EL as large as possible. Figure 9 marks three recommended new alloys that have higher EL than database alloys with the same YS; their composition and processing history are shown in table 7. Among all the data used to train the machine learning models, the maximum value of UTS is 542 MPa. Using the SVR model, three groups of Mg-rare earth alloys with UTS higher than 542 MPa were recommended, with EL guaranteed to be higher than 8.5%. The SVR model for predicting EL was used to recommend three further groups of alloys with EL higher than 16%. In the collected data of Mg-rare earth alloys with high mechanical properties, Zn, Gd and Y are the most frequently added elements.
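The effect of the boundary conditions and of the (Zn+Y+Gd) < 20 wt% constraint can be illustrated with the sketch below. To keep it short, a plain random search with rejection of infeasible candidates is used in place of the genetic algorithm, and `toy_predict` is a hypothetical placeholder for the trained SVR model. The element ranges are those quoted earlier for the collected data; processing parameters are omitted.

```python
import random

# Element ranges in wt%, as quoted for the collected data.
BOUNDS = {"Zn": (0.0, 8.0), "Gd": (0.0, 15.0), "Y": (0.0, 15.0)}
MAX_TOTAL = 20.0  # constraint: Zn + Y + Gd < 20 wt%

def is_feasible(alloy):
    """Check the box bounds and the total-content constraint."""
    in_bounds = all(lo <= alloy[name] <= hi for name, (lo, hi) in BOUNDS.items())
    return in_bounds and sum(alloy[name] for name in BOUNDS) < MAX_TOTAL

def constrained_search(predict, n_candidates=5000, seed=0):
    """Random search: sample candidates, drop infeasible ones, keep the best."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(n_candidates):
        alloy = {name: rng.uniform(lo, hi) for name, (lo, hi) in BOUNDS.items()}
        if not is_feasible(alloy):
            continue
        score = predict(alloy)
        if score > best_score:
            best, best_score = alloy, score
    return best, best_score

def toy_predict(alloy):
    """Hypothetical placeholder for the trained model (predicted UTS in MPa)."""
    return 200.0 + 10.0 * alloy["Gd"] + 8.0 * alloy["Y"] + 3.0 * alloy["Zn"]

best, score = constrained_search(toy_predict)
```

A genetic algorithm would replace the independent sampling with selection, crossover and mutation, but the feasibility check plays the same role: any candidate outside the training range or above the total-content limit is never recommended.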
Figure 8 shows the positions of the new alloys relative to the collected alloys, where the size of each dot represents the value of the mechanical property of the alloy. Table 4 shows the composition and processing history of the new alloys with high UTS recommended by the ML model. As can be seen from figure 8(a), the new alloys with higher UTS contain higher Zn, Y and Gd contents. It can be seen from figure 8(b) that the Zn+Y+Gd content of the newly recommended alloys with higher UTS does not exceed the maximum value of the training data, which supports the reliability of the recommendation. Table 5 shows the composition and processing history of the new alloys with high YS recommended by the ML model. As can be seen from figure 8(c), the SVR model found a better combination of Zn, Y and Gd contents and thus a new alloy with higher YS. As can be seen from figure 8(d), the Zn+Y+Gd content of the new alloy with higher YS recommended by the SVR model is less than the maximum value of the training data, so a better alloy is obtained with less rare earth addition. Table 6 shows the composition and processing history of the new alloys with high EL recommended by the ML model. As shown in figure 8(e), an alloy with higher EL can be obtained by adding a small amount of rare earth elements, whereas too much rare earth addition decreases EL. As shown in figure 8(f), the Zn+Y+Gd content of the recommended new alloys with higher EL is not high. The recommended alloys lie in a region of composition space that has not been tested before but is not far from the data range of previous experiments.

Conclusion
In this study, machine learning-based models were established to predict the mechanical properties of Mg-rare earth alloys, taking into account their composition and processing history. 310 sets of data were collected from published references; 280 of them were used as the training set and 30 as the test set. By comparing the MAPE and RMSE of the test set, the SVR model was selected to predict the mechanical properties of the alloys. To further test the predictive power of the model, 8 sets of new data were collected from the literature and the established model was used to predict their mechanical properties. The mechanical properties are well reproduced by the model. The MAPE of UTS, YS and EL predicted by the SVR model are 9%, 12% and 36%, respectively. The influence of rare earth elements on the mechanical properties of Mg-rare earth alloys was analyzed with the SVR model, and the relationship between rare earth elements and mechanical properties was described. Finally, the established model was used to recommend new Mg-rare earth alloys with relatively high UTS, YS and EL.

Mg-12Gd-0.5Zr-3Y 0 0 0 0 230 4 432.9 420.9 5.4
a The experimental data of the alloy were taken from [35]. b The experimental data of the alloy were taken from [40].