A Machine Learning-Based Model for Predicting Atmospheric Corrosion Rate of Carbon Steel

The purpose of this study is to develop a practical artificial neural network (ANN) model for predicting the atmospheric corrosion rate of carbon steel. A set of 240 data samples, collected from atmospheric corrosion experiments under tropical climate conditions, is used to develop the ANN model. Seven meteorological and chemical factors of corrosion, namely, the average temperature, the average relative humidity, the total rainfall, the time of wetness, the hours of sunshine, the average chloride ion concentration, and the average sulfur dioxide deposition rate, are used as input variables, and the atmospheric corrosion rate of carbon steel is the output variable. An optimal ANN model, with a coefficient of determination of 0.999 and a root mean square error of 0.281 mg/(m²·month), is retained to predict the corrosion rate. A sensitivity analysis shows that the rainfall and the hours of sunshine are the most influential parameters in predicting the atmospheric corrosion rate, whereas the average chloride ion concentration, the average temperature, and the time of wetness have less influence on it. An ANN-based formula that accommodates all input parameters is then proposed to estimate the atmospheric corrosion rate of carbon steel. Finally, a graphical user interface is developed for calculating the atmospheric corrosion rate of carbon steel in tropical climate conditions.


Introduction
Atmospheric corrosion is a complex, nonlinear electrochemical phenomenon that depends mostly on external factors and material properties. Evaluating the influence of these parameters on the degradation of materials is challenging, particularly for structures exposed to various climatic conditions.
Atmospheric corrosion data can be obtained reliably from field measurements. Nevertheless, open problems remain regarding the mechanism of atmospheric corrosion and the effects of environmental parameters on it. Among these, the potential interaction between pollutants and meteorological parameters is one of the critical issues. A closer examination of these problems would provide a better understanding of the atmospheric corrosion process.
In the last few decades, atmospheric corrosion has attracted researchers around the world. Kallias et al. [1] proposed a deterioration model and assessed metallic bridges affected by atmospheric corrosion. Several studies investigated the atmospheric corrosion of metals considering multiple environmental factors [2][3][4]. They demonstrated that atmospheric pollutants, namely, sulfur dioxide in urban and industrial atmospheres and chloride in marine atmospheres, significantly affected the corrosion rate of metals. The effects of relative humidity on atmospheric corrosion were evaluated in some studies [5][6][7][8][9], while the influence of temperature was demonstrated in the work of Kong et al. [10]. They showed that the corrosion rate of materials increased with temperature and relative humidity. A multiscale model for predicting atmospheric corrosion under Australian conditions, accounting for marine aerosols, was proposed by Cole et al. [9]. In addition, the effects of rainfall on the atmospheric corrosion rate were investigated in several studies [11,12]. However, several studies pointed out that chloride ions (Cl⁻) from the sea and sulfur dioxide (SO₂) are the most important atmospheric corrosive agents [13][14][15][16].
A model for predicting atmospheric corrosion that accounts for exposure time, relative humidity, temperature, time of wetness, and pollutant concentration was proposed by Tidblad [17]. The quantitative relationships between environmental factors and the corrosion process were described using the basic linear model [18,19], the basic log-linear model [19][20][21], and dose-response functions [22][23][24]. Empirical equations for calculating the atmospheric corrosion rate were also proposed [19,21,25]. However, these equations consider only a few input parameters, namely, the sulfur dioxide deposition rate, chloride, and time of wetness, whereas atmospheric corrosion is controlled by many external factors and pollution parameters such as humidity, temperature, and pollutants. In addition, these atmospheric corrosion models are valid only for specific local geographical conditions and are no longer applicable when those conditions change. Therefore, a more general model covering various environmental factors is still needed for predicting the atmospheric corrosion rate of carbon steel.
Artificial intelligence (AI) models have been commonly used to predict the corrosion behavior of steel structures. Seghier et al. [26] estimated the maximum pitting corrosion depth in oil and gas pipelines using support vector regression (SVR) combined with optimization techniques such as the genetic algorithm, particle swarm optimization, and the firefly algorithm (FFA). They demonstrated that the SVR-FFA model performed better than the other models considered. In another study, Seghier et al. [27] applied various data-driven models, namely, artificial neural networks (ANNs), M5 tree, multivariate adaptive regression splines, locally weighted polynomials, kriging, and extreme learning machines, to calculate the maximum pitting corrosion depth of pipelines. Recently, a hybrid soft computing model, the multilayer perceptron-marine predators algorithm, was proposed for predicting the corrosion rate of suspension bridge cables [28]. Diao et al. [29] developed corrosion rate models for low-alloy steels using the random forest and gradient boosting decision tree algorithms.
ANNs, among the most powerful machine learning algorithms, have been widely applied in metal science and atmospheric corrosion owing to their advantages [30][31][32][33][34]. Their typical benefits are as follows: (1) an ANN is a nonlinear model that is easier to use and understand than statistical methods, and (2) ANNs can model physical phenomena in complex systems without requiring explicit mathematical representations. These studies showed that neural network models gave reliable predictions with small errors and large coefficient of determination (R²) values. ANN models were effective in various investigations such as thickness prediction of sherardizing coatings [31], corrosion of metals in an equatorial climate [33], and corrosion of copper in Valparaiso (Chile) [34]. In particular, ANNs were used to predict the corrosion penetration or the corrosion rate of carbon steel from input parameters such as humidity, temperature, time of wetness, precipitation, sulfur dioxide (SO₂) concentration, and chloride (Cl⁻) deposition rate [30,35]. These ANN models predicted the corrosion rate accurately, with R² values of 0.90 in the work of Pintos et al. [35] and 0.998 in the work of Díaz and López [30]. However, the database used by Pintos et al. [35] was measured in the Ibero-American region, and the effects of temperature and hours of sunshine were not considered by Díaz and López [30]. Moreover, ultraviolet light has been reported to activate the metal surface, leading to earlier initiation and a faster rate of corrosion [36]. Therefore, the influence of sunshine hours, an important meteorological parameter, on the corrosion rate needs to be considered in the prediction model. Finally, no ANN-based explicit formulas or practical tools have been proposed so far for applying ANN models to real engineering problems.
The purpose of this study is to develop a practical ANN model that can be readily applied to predict the atmospheric corrosion rate (K) of carbon steel. A total of 240 experimental data samples are used to establish the ANN model. Seven external factors, namely, the average temperature (T), average relative humidity (RH), total rainfall (Rf), time of wetness (TOW), hours of sunshine (HoS), Cl⁻, and SO₂ deposition rate, are considered as input variables. The performance of the ANN model is compared with that of three existing empirical formulas and three regression models. Moreover, the influence of each input variable on the predicted corrosion rate is investigated. Finally, an ANN-based equation and a graphical user interface (GUI) tool are established to predict the atmospheric corrosion rate of carbon steel.

Data Collection
A set of 240 measured data samples of atmospheric corrosion under tropical climate conditions in Vietnam was used to build the ANN model. These data were provided in a report of the Center for Material Failure Analysis [37] and were recorded over 2 years. Seven parameters, namely, T, RH, Rf, TOW, HoS, Cl⁻, and SO₂, were used as input parameters. The atmospheric corrosion rate was measured from the weight loss of carbon steel samples. The relationship between the corrosion rate (K) and the weight loss is expressed by the following equation [38]:

$$K = \frac{m_1 - m_2}{S\,t}$$

where $m_1$ and $m_2$ are the weights of the samples before and after corrosion, respectively; $S$ is the area of the sample surface; and $t$ is the exposure time. The statistical properties of the input parameters are summarized in Table 1. The database used in this study mostly covers the tropical monsoon climate, where steel is most susceptible to corrosion. Figure 1 presents the histograms of the data samples.

In addition, the relationships between the atmospheric corrosion rate of carbon steel and the seven input parameters are represented by the correlation matrix in Figure 2. Some parameters are strongly correlated, such as RH and TOW, or T and HoS, because high relative humidity is accompanied by a long time of wetness, and temperature is associated with sunshine hours. In contrast, some parameters are poorly correlated, such as Rf and SO₂, or TOW and SO₂, since they are not physically connected. Moreover, the correlation between each individual input parameter and the output, K, is weak.
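As a minimal sketch of the weight-loss relation, the hypothetical helper `corrosion_rate` below computes K from the two sample masses, the exposed area S, and the exposure time t. The function name, the input checks, and the sample numbers (grams, square metres, months) are illustrative assumptions, not values from the report [37]; K simply carries whatever units the inputs are given in.

```python
def corrosion_rate(m1: float, m2: float, area: float, time: float) -> float:
    """Weight-loss corrosion rate K = (m1 - m2) / (S * t).

    m1, m2 -- sample mass before and after exposure
    area   -- exposed surface area S
    time   -- exposure time t
    The unit of K is mass / (area * time) in whatever units are supplied.
    """
    if area <= 0 or time <= 0:
        raise ValueError("area and exposure time must be positive")
    if m2 > m1:
        raise ValueError("mass after exposure should not exceed mass before exposure")
    return (m1 - m2) / (area * time)


if __name__ == "__main__":
    # Hypothetical coupon: 4 g mass loss over 0.015 m^2 during 12 months.
    k = corrosion_rate(m1=52.0, m2=48.0, area=0.015, time=12.0)
    print(f"K = {k:.2f} g/(m^2*month)")
```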

Existing Equations for Predicting the Atmospheric Corrosion Rate.
In addition to the regression models, existing formulas for calculating the corrosion rate are considered in this study. Three typical formulas proposed in previous studies [19,21,25] were used to obtain the atmospheric corrosion rate of carbon steel. Table 2 summarizes these empirical equations.

Regression Models.
We also fitted regression models to calculate the corrosion rate from the database. Regression is normally employed to define a relationship between variables. The regression model can be expressed by a general linear least-squares model as follows:

$$Y = a_0 + a_1 z_1 + a_2 z_2 + \cdots + a_m z_m + e$$

or, in matrix form,

$$[Y] = [Z][A] + [E]$$

where $z_1, z_2, \ldots, z_m$ are basis functions; $a_0, a_1, a_2, \ldots, a_m$ are the regression coefficients; $e$ is the residual, and $[E]$ is the residual matrix [39]; $[Z]$ is the input parameter matrix; and $[A]$ is the coefficient matrix, determined by minimizing the mean squared difference between the regression values and the experimental data. The least-squares estimate of $[A]$, denoted $\hat{A}$, is determined according to [40] as follows:

$$\hat{A} = \left([Z]^{T}[Z]\right)^{-1}[Z]^{T}[Y]$$

where $[Z]^{T}$ is the transpose of $[Z]$. This study used linear, quadratic, and quadratic-with-mixed-terms regression models. The forms and coefficients of the three regression models are summarized in Table 3.
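The least-squares fit can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' code: the names `basis`, `solve`, and `fit_least_squares` are hypothetical, and the sketch solves the normal equations $([Z]^T[Z])\hat{A} = [Z]^T[Y]$ by Gaussian elimination instead of forming the inverse explicitly, which gives the same $\hat{A}$ with better numerical behavior. With seven inputs, the linear, quadratic, and mixed-term models have 8, 15, and 36 coefficients, respectively.

```python
from itertools import combinations


def basis(x, kind="linear"):
    """Basis functions z_1..z_m for one sample x (the constant a_0 is added separately)."""
    z = list(x)
    if kind in ("quadratic", "mixed"):
        z += [xi * xi for xi in x]                     # squared terms a_ii X_i^2
    if kind == "mixed":
        z += [x[i] * x[j] for i, j in combinations(range(len(x)), 2)]  # X_i X_j
    return z


def solve(a, b):
    """Solve the square system a·u = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular normal matrix: add data or drop terms")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    u = [0.0] * n
    for r in range(n - 1, -1, -1):                     # back substitution
        u[r] = (m[r][n] - sum(m[r][c] * u[c] for c in range(r + 1, n))) / m[r][r]
    return u


def fit_least_squares(xs, ys, kind="linear"):
    """Return A_hat = (Z^T Z)^-1 Z^T Y, with a_0 first."""
    z = [[1.0] + basis(x, kind) for x in xs]           # design matrix [Z]
    p = len(z[0])
    ztz = [[sum(row[i] * row[j] for row in z) for j in range(p)] for i in range(p)]
    zty = [sum(row[i] * y for row, y in zip(z, ys)) for i in range(p)]
    return solve(ztz, zty)


if __name__ == "__main__":
    # Toy data generated from Y = 1 + 2 X1 - 0.5 X2 (not the corrosion database).
    xs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (1.0, 3.0)]
    ys = [1.0 + 2.0 * a - 0.5 * b for a, b in xs]
    print(fit_least_squares(xs, ys))
```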

Proposed ANN Model.
ANNs can handle various tasks such as regression analysis, classification, and data processing [41,42]. Neurons are the smallest units of an ANN model. An ANN model comprises (1) an input layer, which contains the input parameters, (2) one or more hidden layers, and (3) an output layer, which holds the output result. Each neuron transmits signals to other neurons based on the signals it receives and is connected to them through synaptic connections whose values are weighted. The signals transmitted through the network are strengthened or dampened by these weights. In addition, each neuron has a bias and an activation function [43]. The input signal of a neuron is represented by a vector $\mathbf{x} = [x_1, x_2, \ldots, x_d]$, and the weighted sum of the input vector, $z \in \mathbb{R}$, is given by equation (7):

$$z = \mathbf{w}^{T}\mathbf{x} + b = \sum_{i=1}^{d} w_i x_i + b$$
where $\mathbf{w} = [w_1, w_2, \ldots, w_d] \in \mathbb{R}^d$ is the weight vector and $b$ is the bias. The activation function determines how the weighted sum of the inputs is transformed into the output of a node or of the nodes in a layer. Bounded activation functions also map the output of any input into the range [−1, 1] or [0, 1]. The choice of activation function depends on the purpose of the problem; for example, sigmoid and tanh functions can be used in the hidden layers of recurrent neural networks [44]. Since this study addresses a prediction problem, the hyperbolic tangent sigmoid function (tansig) and a linear activation function (purelin) are employed, as expressed by equations (5) and (6):

$$\mathrm{tansig}(z) = \frac{2}{1 + e^{-2z}} - 1$$

$$\mathrm{purelin}(z) = z$$

The tansig function is used in the hidden layer, while the purelin function is used in the output layer.
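A single neuron with these activation functions can be sketched as below. The helper names are hypothetical; `tansig` follows the formula $2/(1+e^{-2z}) - 1$, which is mathematically identical to $\tanh(z)$, and clips $z$ to $[-50, 50]$ only to keep `math.exp` from overflowing (the result is already ±1.0 in double precision there).

```python
import math


def tansig(z: float) -> float:
    """Hyperbolic tangent sigmoid 2 / (1 + exp(-2z)) - 1, equal to tanh(z)."""
    # Beyond |z| = 50 the result is already +/-1.0 in double precision,
    # so clipping only prevents math.exp from overflowing for very negative z.
    z = max(-50.0, min(50.0, z))
    return 2.0 / (1.0 + math.exp(-2.0 * z)) - 1.0


def purelin(z: float) -> float:
    """Linear transfer function used in the output layer."""
    return z


def neuron(x, w, b, activation):
    """Weighted sum z = w·x + b passed through the given activation function."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same dimension d")
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return activation(z)


if __name__ == "__main__":
    # Hypothetical inputs, weights, and bias for one hidden neuron.
    print(neuron([0.2, -0.4], [0.8, 0.3], 0.1, tansig))
```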
Table 3 (forms of the regression models):
Quadratic order (MLR2): $a_0 + a_1X_1 + \cdots + a_7X_7 + a_{11}X_1^2 + a_{22}X_2^2 + \cdots + a_{77}X_7^2$
Quadratic with mixed terms (MLR3): $a_0 + a_1X_1 + \cdots + a_7X_7 + a_{11}X_1^2 + a_{22}X_2^2 + \cdots + a_{77}X_7^2 + a_{12}X_1X_2 + \cdots + a_{67}X_6X_7$

According to Golafshani and Ashour [48], the database must be normalized to the range [−1, 1] before training. The input variables are normalized using the following expression:

$$x_n = \frac{2\,(x - x_{\min})}{x_{\max} - x_{\min}} - 1$$

where $x$ is the input variable considered, $x_n$ is the normalized variable, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum of the variable in the dataset, respectively.
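The min-max scaling to [−1, 1] and its inverse can be sketched as follows. The function names and the temperature range used in the example are hypothetical; the mapping itself is the same one MATLAB's `mapminmax` applies with its default output range.

```python
def normalize(x: float, x_min: float, x_max: float) -> float:
    """Map x from [x_min, x_max] to [-1, 1]: x_n = 2 (x - x_min) / (x_max - x_min) - 1."""
    if x_max <= x_min:
        raise ValueError("x_max must exceed x_min")
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0


def denormalize(x_n: float, x_min: float, x_max: float) -> float:
    """Inverse mapping: x = x_min + (x_n + 1) (x_max - x_min) / 2."""
    return x_min + (x_n + 1.0) * (x_max - x_min) / 2.0


if __name__ == "__main__":
    # Hypothetical temperature range of 24-30 degrees C (not Table 1 values).
    t_n = normalize(27.0, 24.0, 30.0)
    print(t_n, denormalize(t_n, 24.0, 30.0))
```

Because the network works entirely on scaled values, the same bounds must be reused when scaling new inputs and when converting the network output back to physical units.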
For the proposed ANN model, the seven parameters T, RH, Rf, TOW, HoS, Cl⁻, and SO₂ are the input variables, and the atmospheric corrosion rate of carbon steel, K, is the output variable. The ANN model is trained in the following two steps:

Step 1 (feedforward). The input signals enter the input layer and are transferred through the weighted connections and the hidden layer to the output layer, yielding the predicted result.

Step 2 (back-propagation). The prediction error, measured by the mean square error (MSE), is minimized by iterating until convergence, which yields an optimal model. The MSE is calculated using the following equation:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(p_i - t_i\right)^2$$

where $n$ is the number of training samples, and $p_i$ and $t_i$ are the predicted and target values of the $i$-th sample, respectively.
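Step 1 and the MSE for the 7-10-1 architecture used here can be sketched as follows. This is a hypothetical illustration: the weights come from a seeded random initialization (an untrained network), and the Levenberg-Marquardt weight updates of Step 2 are not implemented. The network has 7 × 10 + 10 + 10 + 1 = 91 weights and biases.

```python
import math
import random

N_INPUTS, N_HIDDEN = 7, 10


def forward(x_n, w1, b1, w2, b2):
    """Feedforward pass: tansig (= tanh) hidden layer, purelin output neuron.

    x_n: normalized inputs (length 7); w1: 10 x 7 hidden weights; b1: 10 hidden
    biases; w2: 10 output weights; b2: output bias. Returns the normalized output.
    """
    hidden = [math.tanh(sum(w * x for w, x in zip(row, x_n)) + bj)
              for row, bj in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def mse(predicted, target):
    """Mean square error between predictions p_i and targets t_i."""
    if len(predicted) != len(target) or not target:
        raise ValueError("need equally sized, non-empty sequences")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def init_params(seed=0):
    """Hypothetical random initialization (untrained weights)."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
    b1 = [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)]
    b2 = rng.uniform(-0.5, 0.5)
    return w1, b1, w2, b2


if __name__ == "__main__":
    params = init_params()
    samples = [[0.1 * k - 0.3] * N_INPUTS for k in range(5)]  # toy normalized inputs
    targets = [0.0, 0.1, 0.2, 0.3, 0.4]                       # toy normalized targets
    print(mse([forward(x, *params) for x in samples], targets))
```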
Overfitting occurs when a model adapts so closely to the training data that it cannot predict unseen data samples well; such a model fails to predict outputs for data outside the training set, which reduces its accuracy and biases its predictions. To prevent this problem, regularization is employed, modifying the error function as follows [47,49]:

$$\mathrm{MSE}_{\mathrm{reg}} = \gamma\,\mathrm{MSE} + (1 - \gamma)\,\mathrm{MSWB}$$

where $\gamma$ is the performance ratio and MSWB is the mean of the squared network weights and biases:

$$\mathrm{MSWB} = \frac{1}{N}\sum_{j=1}^{N} w_j^2$$

with $N$ the total number of weights and biases and $w_j$ the $j$-th of them.

To optimize the predictive performance, an efficient ANN model was determined by a trial-and-error process. Various ANN architectures were tested, with the training ratio varying from 0.6 to 0.85 and a wide range of neuron numbers in the hidden layer; only one hidden layer was used in all tested models. The Levenberg-Marquardt (i.e., damped least-squares) algorithm was used to adjust the weights and biases of the ANN models [50]. This algorithm is well suited to nonlinear least-squares problems, is robust, and converges rapidly [51], and it has been widely used in previous studies [43,46,47,[52][53][54][55]. The ANN models were assessed using two indicators, R² and MSE, and the model with the largest R² and the smallest MSE after training was chosen as the optimum. The dataset was split into 70%, 15%, and 15% for training, testing, and validation, respectively. The number of neurons in the hidden layer is an important factor in training the ANN model, so the best model for the experimental data was identified by a sensitivity analysis in which the number of hidden neurons was varied from 5 to 21.
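The regularized error function can be sketched as follows, assuming the form $\gamma\,\mathrm{MSE} + (1-\gamma)\,\mathrm{MSWB}$ with MSWB taken as the mean of the squared weights and biases. The function name `msereg` and the toy values are hypothetical; setting $\gamma = 1$ recovers the plain MSE, and smaller $\gamma$ penalizes large weights more strongly, which tends to give a smoother, less overfitted response.

```python
def msereg(errors, weights_and_biases, gamma):
    """Regularized error: gamma * MSE + (1 - gamma) * MSWB."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma (performance ratio) must lie in [0, 1]")
    if not errors or not weights_and_biases:
        raise ValueError("need non-empty errors and parameters")
    mse = sum(e * e for e in errors) / len(errors)
    mswb = sum(w * w for w in weights_and_biases) / len(weights_and_biases)
    return gamma * mse + (1.0 - gamma) * mswb


if __name__ == "__main__":
    # Hypothetical prediction errors and network parameters.
    errs = [0.02, -0.01, 0.03]
    params = [0.4, -1.2, 0.8, 0.05]
    for g in (1.0, 0.9, 0.5):
        print(g, msereg(errs, params, g))
```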
After this sensitivity analysis, the model with 10 neurons in the hidden layer was chosen as the best, as illustrated in Figure 3. Figure 4 shows the structure of the proposed ANN model, in which the seven neurons of the input layer represent the seven input variables (Table 1), and the single neuron of the output layer represents the atmospheric corrosion rate of carbon steel. The ANN model was developed and evaluated in MATLAB [56].

ANN Model Performance.
The performance of the proposed ANN model is shown in Figure 5, in which the MSE for training, validation, and testing decreases as the number of epochs increases. The best validation performance, an MSE of 1.7814 × 10⁻³, was reached at the 4th epoch. This small squared error indicates that the ANN model was well trained. Figure 6 shows the regression plots of the developed ANN model, in which the outputs and targets match closely. The R² values for training, testing, validation, and all data are 0.9998, 0.9998, 0.9999, and 0.9998, respectively. All R² values are close to unity, indicating that the proposed ANN model performs well; that is, it predicts the atmospheric corrosion rate of carbon steel reliably. The errors were small, mostly below 0.08, which again demonstrates that the ANN model determined the atmospheric corrosion rate of carbon steel accurately. Although the ANN performance was checked against a validation dataset, cross-validation should also be considered; however, owing to limitations of the developed algorithm, cross-validation was not performed in this study.

Comparison between the Developed ANN Model and Existing Formulas.
The results obtained from the ANN model were compared with those of the three regression models presented in Table 3 and the existing formulas in Table 2. To evaluate the performance of all predictive models, four indicators were used, namely, the root mean square error (RMSE), the mean absolute percentage error (MAPE), the coefficient of determination (R²), and the Pearson correlation coefficient (r):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(t_i - o_i\right)^2}$$

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{t_i - o_i}{t_i}\right|$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(t_i - o_i\right)^2}{\sum_{i=1}^{n}\left(t_i - \bar{t}\right)^2}$$

$$r = \frac{\sum_{i=1}^{n}\left(t_i - \bar{t}\right)\left(o_i - \bar{o}\right)}{\sqrt{\sum_{i=1}^{n}\left(t_i - \bar{t}\right)^2\sum_{i=1}^{n}\left(o_i - \bar{o}\right)^2}}$$

where $t_i$ and $o_i$ are the target and output of the $i$-th sample, respectively; $\bar{t}$ and $\bar{o}$ are their means; and $n$ is the number of samples. RMSE and MAPE measure the mean error, whereas R² and r measure the explained variation and the linear correlation between predicted and actual data, respectively. Higher values of R² and r and lower values of RMSE and MAPE indicate better performance; for a perfect model, R² and r equal 1.0 and the errors are zero.

Figure 11 shows the statistical indicators of the various predictive models. The ANN model has the smallest RMSE and MAPE and the largest R² and r, followed by the quadratic regression models; in other words, the ANN model predicts the corrosion rate of carbon steel better than the other models. The overall performance of all predictive models is illustrated in Figure 12. Again, the proposed ANN model has the smallest standard deviation, followed by the regression models and the predictive models proposed by Knotkova et al. [19], Roberge et al. [21], and ISO and MICAT [25]. Details of the calculated results are given in Table 4. Table 5 shows the statistics of the ratios of predicted to measured values for the predictive models: the mean ratio of the ANN model is 1.0002, essentially unity, and its standard deviation is the lowest among all models.
These results confirm that, among the models considered, the ANN model is the most accurate and reliable option for predicting the corrosion rate of carbon steel.
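The four indicators and the ratio statistics of Table 5 can be sketched as below, assuming the conventional definitions ($R^2 = 1 - SS_{res}/SS_{tot}$, MAPE in percent, sample standard deviation of the ratios); some studies report $R^2$ as the square of $r$ instead, which can differ for biased models. The function names are hypothetical.

```python
import math
import statistics


def rmse(t, o):
    """Root mean square error between targets t and outputs o."""
    return math.sqrt(sum((ti - oi) ** 2 for ti, oi in zip(t, o)) / len(t))


def mape(t, o):
    """Mean absolute percentage error in percent (targets must be non-zero)."""
    return 100.0 * sum(abs((ti - oi) / ti) for ti, oi in zip(t, o)) / len(t)


def r_squared(t, o):
    """Coefficient of determination 1 - SS_res / SS_tot."""
    t_mean = statistics.fmean(t)
    ss_res = sum((ti - oi) ** 2 for ti, oi in zip(t, o))
    ss_tot = sum((ti - t_mean) ** 2 for ti in t)
    return 1.0 - ss_res / ss_tot


def pearson_r(t, o):
    """Pearson linear correlation coefficient between targets and outputs."""
    tm, om = statistics.fmean(t), statistics.fmean(o)
    cov = sum((ti - tm) * (oi - om) for ti, oi in zip(t, o))
    return cov / math.sqrt(sum((ti - tm) ** 2 for ti in t) * sum((oi - om) ** 2 for oi in o))


def ratio_stats(t, o):
    """Mean and sample standard deviation of predicted/measured ratios."""
    ratios = [oi / ti for ti, oi in zip(t, o)]
    return statistics.fmean(ratios), statistics.stdev(ratios)


if __name__ == "__main__":
    # Hypothetical measured and predicted K values.
    measured = [20.6, 21.4, 22.9, 24.1, 25.0]
    predicted = [20.7, 21.3, 22.8, 24.3, 24.9]
    print(rmse(measured, predicted), mape(measured, predicted))
    print(r_squared(measured, predicted), pearson_r(measured, predicted))
    print(ratio_stats(measured, predicted))
```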

Evaluation of the Effects of Input Parameters
A parametric study was carried out with the developed ANN model to evaluate the influence of the input parameters on the atmospheric corrosion rate of carbon steel. To account for the interaction of multiple parameters on the calculated K, the variable under consideration was varied from its lowest to its highest value, while the other variables were set in turn to different levels. The letters L, ML, M, MH, and H in Table 6 denote the lowest, middle-low, mean, middle-high, and highest values, respectively. The resulting variations in the predicted K were then quantified. Figure 13 shows the effect of the average temperature (X₁) on the atmospheric corrosion rate of carbon steel, K, with the other parameters varied in turn to capture the interaction between T and the other variables. Increasing the average temperature increased the atmospheric corrosion rate: when T increased by a factor of 1.5, K increased by 10%. This observation can be attributed to the fact that higher temperatures intensify the chemical reactions, which may accelerate the corrosion of carbon steel.

Figure 10: Validation data performance.
Advances in Materials Science and Engineering
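The one-variable sweep described above can be sketched as follows. This is a hypothetical illustration: `predict` stands in for the trained ANN (here a toy linear function), and the level values are invented rather than taken from Table 6. For each setting of the other inputs, the considered input steps through L, ML, M, MH, and H, and the percentage change of K relative to the L level is reported.

```python
LEVELS = ("L", "ML", "M", "MH", "H")


def sweep(predict, levels, var, others_level):
    """Vary input `var` over L..H with every other input fixed at `others_level`.

    predict -- callable mapping a dict of inputs to K (stand-in for the ANN)
    levels  -- {name: {"L": ..., "ML": ..., "M": ..., "MH": ..., "H": ...}}
    Returns a list of (level, K, percent change of K relative to level L).
    """
    results = []
    base_k = None
    for lv in LEVELS:
        x = {name: lvls[others_level] for name, lvls in levels.items()}
        x[var] = levels[var][lv]
        k = predict(x)
        if base_k is None:
            base_k = k
        results.append((lv, k, 100.0 * (k - base_k) / base_k))
    return results


if __name__ == "__main__":
    # Hypothetical levels and a toy predictor, not the trained model.
    levels = {"T": {"L": 1.0, "ML": 2.0, "M": 3.0, "MH": 4.0, "H": 5.0},
              "RH": {"L": 10.0, "ML": 20.0, "M": 30.0, "MH": 40.0, "H": 50.0}}
    toy = lambda x: 2.0 * x["T"] + 0.5 * x["RH"]
    for others in LEVELS:
        print(others, sweep(toy, levels, "T", others)[-1])
```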

Effect of the Average Relative Humidity.
The effect of the average relative humidity (X₂) on the atmospheric corrosion rate of carbon steel is shown in Figure 14. The corrosion rate increased gradually with the average relative humidity in most cases; however, it was not affected by relative humidity at low temperature, short time of wetness, low SO₂ level, and short sunshine duration.

Effect of the Time of Wetness.
The time of wetness (X₃) depends on the temperature, humidity, total rainfall, and hours of sunshine; here, TOW was determined following the recommendation of Tidblad and Mikhailov [57]. The effect of TOW on the atmospheric corrosion rate of carbon steel is shown in Figure 15. As with T, K increased with TOW, which is consistent with a previous study [35].

Effect of the Average Chloride.
The effect of the average chloride (X₄) on the atmospheric corrosion rate of carbon steel is shown in Figure 16. The corrosion rate increased with Cl⁻. This can be attributed to chloride ions damaging the passive film of steel as they compete with hydrogen and oxygen ions during adsorption, which leads to pitting corrosion [58].

Figure 17 shows the effect of the average sulfur dioxide deposition rate (X₅) on the atmospheric corrosion rate of carbon steel, which increased with the SO₂ deposition rate. This is because SO₂ reacts to form H₂SO₄ in the atmosphere or on the surface of carbon steel; combined with high humidity or wetness, the damage caused by SO₂ can be considerable [59].

Figure 18 shows the influence of the total rainfall (X₆) on the atmospheric corrosion rate. The atmospheric corrosion rate of carbon steel increased as Rf varied from its minimum to its maximum value; K increased by 6% when the total rainfall increased sixfold. Because annual rainfall is high in tropical climate regions, the effect of rainfall on the corrosion rate needs to be considered, as also pointed out in previous studies [30,60].

Figure 19 shows the effect of the hours of sunshine (X₇) on the atmospheric corrosion rate of carbon steel. The atmospheric corrosion rate decreased as HoS increased. This is probably because sunshine hours are strongly negatively correlated with relative humidity and time of wetness, as shown in Figure 2. Moreover, the corrosion of carbon steel in tropical regions results from a complex combination of chemical and physical conditions.
Figure 20 shows the sensitivity of the atmospheric corrosion rate of carbon steel to the input variables; the K values in this figure were obtained at the upper bound (i.e., maximum) of each input parameter. Rainfall was the most influential parameter in predicting the atmospheric corrosion rate, followed by the time of wetness, the average temperature, the average sulfur dioxide deposition rate, the average chloride, and the average relative humidity, whereas sunshine duration affected the atmospheric corrosion rate negatively.
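An upper-bound sensitivity ranking of this kind can be sketched as follows. The text does not state where the other inputs were held, so this sketch assumes they stay at their means; `predict`, the input names, and the toy coefficients are hypothetical. Each input alone is raised to its maximum, and the inputs are ranked by the magnitude of the resulting change in K, keeping the sign so that negative effects such as sunshine duration remain visible.

```python
def upper_bound_sensitivity(predict, means, maxima):
    """Change in K when each input alone is raised from its mean to its maximum.

    Returns (name, delta_K) pairs sorted by |delta_K|, largest first.
    Assumption: all other inputs are held at their means.
    """
    k_ref = predict(means)
    effects = {}
    for name in means:
        x = dict(means)
        x[name] = maxima[name]
        effects[name] = predict(x) - k_ref
    return sorted(effects.items(), key=lambda kv: abs(kv[1]), reverse=True)


if __name__ == "__main__":
    # Toy predictor with a negative sunshine effect (not the trained ANN).
    toy = lambda x: 20.0 + 0.004 * x["Rf"] + 0.001 * x["TOW"] - 0.002 * x["HoS"]
    means = {"Rf": 150.0, "TOW": 400.0, "HoS": 150.0}
    maxima = {"Rf": 600.0, "TOW": 700.0, "HoS": 280.0}
    print(upper_bound_sensitivity(toy, means, maxima))
```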

ANN Model-Based Equation.
As analyzed above, the proposed ANN model predicts the atmospheric corrosion rate of carbon steel accurately. For explicit use in practical problems, an ANN-based formula is needed. Taking K as the output response and following the procedures presented in the previous sections, an explicit formulation of K was obtained directly from the developed ANN model using its activation functions, weights, biases, and normalization factors:

$$K = 20.429 + 2.419\,(K_N + 1) \tag{12}$$

where $K_N$ is the normalized atmospheric corrosion rate of carbon steel. The form of equation (12) follows from the denormalization of equation (3): 20.429 is the minimum atmospheric corrosion rate in the database, and 2.419 is half the difference between the maximum and minimum atmospheric corrosion rates in the database (Table 1). The normalized value $K_N$ is the output of the trained network for the normalized inputs:

$$K_N = b_2 + \sum_{j=1}^{10} w_{2,j}\,\mathrm{tansig}\!\left(\sum_{i=1}^{7} w_{1,ji}\,X_{i,n} + b_{1,j}\right)$$

where $X_{i,n}$ are the normalized inputs, and the weights and biases are listed in Table 7.
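Equation (12) and the network form of $K_N$ can be evaluated as in the sketch below. The constants 20.429 and 2.419 come from the text, but the weights, biases, and input bounds of Table 7 and Table 1 are not reproduced here, so `explicit_k` takes them as arguments; all function and parameter names are hypothetical.

```python
import math

K_MIN = 20.429         # minimum K in the database (from the text)
K_HALF_RANGE = 2.419   # (K_max - K_min) / 2 (from the text)


def k_from_normalized(k_n: float) -> float:
    """Denormalize the network output: K = 20.429 + 2.419 (K_N + 1)."""
    return K_MIN + K_HALF_RANGE * (k_n + 1.0)


def explicit_k(x, x_min, x_max, w1, b1, w2, b2):
    """Evaluate K for raw inputs X1..X7.

    x_min, x_max -- per-input normalization bounds (Table 1, not reproduced here)
    w1 (10 x 7), b1 (10), w2 (10), b2 -- weights and biases (Table 7, not reproduced)
    """
    x_n = [2.0 * (xi - lo) / (hi - lo) - 1.0 for xi, lo, hi in zip(x, x_min, x_max)]
    k_n = b2 + sum(w2j * math.tanh(sum(w * xn for w, xn in zip(row, x_n)) + b1j)
                   for row, b1j, w2j in zip(w1, b1, w2))   # tanh == tansig
    return k_from_normalized(k_n)


if __name__ == "__main__":
    # K spans [20.429, 25.267] as K_N spans [-1, 1].
    print(k_from_normalized(-1.0), k_from_normalized(1.0))
```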

ANN Interactive Graphical User Interface (GUI).
In this study, a practical GUI tool was constructed in MATLAB [56] to simplify the calculation of the atmospheric corrosion rate of carbon steel, as shown in Figure 21. The seven input parameters, X₁ to X₇, are entered as the input signal, and the ten neurons of the hidden layer are also shown in Figure 21. The tool is convenient to use: after entering all input parameters, users obtain the output by clicking the "Start Predict" button, and the result is returned in less than one second. Since the GUI tool is built on the proposed ANN model, its prediction accuracy is that verified in the previous section. The GUI tool is freely available at https://github.com/duyduan1304/GUI_corrosionrate. Because the ANN model cannot reliably extrapolate, the input values should be restricted to the range between the minimum and maximum of the database used. Extending the coverage of the ANN model would require data collected over a wider range of conditions.
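Because the model should not be used for extrapolation, a tool built on it would check every input against the database range before predicting. A minimal sketch of such a guard follows; the function name and the bounds in the example are hypothetical and are not taken from Table 1 or from the published GUI.

```python
def check_ranges(inputs, bounds):
    """Return the names of inputs lying outside the [min, max] range of the database."""
    out_of_range = []
    for name, value in inputs.items():
        lo, hi = bounds[name]
        if not lo <= value <= hi:
            out_of_range.append(name)
    return out_of_range


if __name__ == "__main__":
    # Hypothetical bounds; a real tool would load the database minima and maxima.
    bounds = {"T": (20.0, 29.0), "RH": (60.0, 90.0)}
    bad = check_ranges({"T": 30.0, "RH": 75.0}, bounds)
    if bad:
        print("Outside the training range, prediction not reliable:", bad)
```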

Conclusions
A practical ANN model was developed to predict the atmospheric corrosion rate of carbon steel based on a set of 240 experimental data samples. The results of the proposed model were compared with those of three regression models and three existing formulas, and a series of parametric studies was performed to evaluate the effects of the input parameters on the atmospheric corrosion rate. The following conclusions are drawn:
(1) The developed ANN model predicted the atmospheric corrosion rate of carbon steel more accurately than the regression models and existing equations, as verified by the statistical indicators RMSE, MAPE, R², and r.
(2) Rainfall and hours of sunshine were the most influential parameters in predicting the atmospheric corrosion rate, whereas the average chloride ion concentration, the average temperature, and the time of wetness had less influence.
(3) An ANN model-based formula that considers all seven input parameters was proposed to calculate the atmospheric corrosion rate of carbon steel.
(4) A graphical user interface tool was developed that can easily be applied to simplify the prediction of the atmospheric corrosion rate of carbon steel.

Data Availability
All the data supporting the key findings of this paper are presented in the figures and tables of the article. Requests for other data will be considered by the corresponding author.

Conflicts of Interest
The authors declare that they have no conflicts of interest.