Prediction Model of Strip Crown in Hot Rolling Process Based on Machine Learning and Industrial Data

Abstract: The strip crown in hot rolling has the characteristics of multivariability, strong coupling, and nonlinearity, and it is difficult to describe accurately using a traditional mechanism model. In this paper, based on the industrial data of a hot continuous rolling field, the modeling dataset of a strip crown prediction model is constructed through the collection and collation of on-site data. According to classical strip crown control theory, the important process parameters that affect the strip crown are determined as input variables for the data-driven model. New intelligent strip crown prediction models integrating the shape control mechanism model, artificial intelligence algorithms, and production data are constructed using four machine learning algorithms: XGBoost, Random Forest (RF), Support Vector Machine (SVM), and Multi-Layer Perceptron (MLP). The overall performance of the models is evaluated using error indicators such as Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE). The results show that, for the test set, the determination coefficient (R²) of the strip crown model based on the XGBoost algorithm reached 0.971, and the three error indexes are at the lowest level, meaning that this model has the best overall generalization performance and can realize accurate prediction of the outlet strip crown in the hot rolling process. The research results can promote the application of industrial data and machine learning modeling to the actual strip shape control process of hot rolling, and also have important practical value for the intelligent manufacturing of the whole steel production process.


Introduction
The production of hot-rolled strip occupies an important position in the modern iron and steel industry. Strip shape, mainly comprising crown and flatness, is a key quality indicator in the production of hot-rolled strip, and strip shape control remains a challenge in the production process [1,2]. Although scholars and enterprises have conducted a large amount of research on this issue, poor strip shape is still a prominent problem in hot-rolled strip production. A large number of thin strip coils are rolled with product defects, such as mid wave, side wave, quarter wave, wedge shape, and unqualified crown, which seriously affect the yield of hot-rolled strip and bring considerable economic losses to the enterprise. Poor shape quality of hot-rolled strip not only affects the smooth progress of the rolling process itself, but also adversely affects subsequent processes, such as cold rolling and shearing [3]. This requires accurate control of the shape accuracy of hot-rolled strip. The preset model in the shape control system calculates the adjustment amount of each relevant shape adjustment mechanism using a mathematical model based on relevant parameters, such as the thickness, width, material, and reduction of the strip.
In addition, machine learning also has a wide range of applications in the prediction of key parameters in the rolling process and process optimization. Nandan et al. [15] proposed a multi-objective optimal control strategy based on genetic algorithms and applied it to the identification of hot-rolled strip shape parameters and the setting of rolling schedules, which successfully obtained the optimal setting values for the crown and flatness, and also improved the control accuracy of strip shape. Chakraborti et al. [16] used genetic algorithms and ant colony algorithms to optimize the crown of hot-rolled strip, demonstrating the practicality of evolutionary algorithms in optimizing rolling process parameters. John et al. [17] directly used a combination of neural networks and genetic algorithms to establish a relationship model between input parameters and strip shape to predict the minimum strip shape value of hot-rolled strip. Liu et al. [18] established a transfer matrix between the flatness error characteristic parameter and the flatness adjustment parameter using a genetic algorithm to optimize the BP neural network method, and successfully applied the transfer matrix to the flatness adjustment mechanism of a 900 mm six-high HC rolling mill. Peng et al. [19] proposed a new method for recognizing the shape pattern of cold-rolled strip and verified it on an 8000 kN HC mill. The results show that the method can effectively reduce the shape deviation of the strip. Yang et al. [20] established an intelligent collaborative control model for cold-rolled strip flatness control mechanisms based on the combination of a flatness control matrix and differential algorithm optimization extreme learning machine (ELM). Zhang et al. have, respectively, used support vector machines (SVMs) [21], T-S cloud inference neural networks [22], PID neural networks [23], and radial basis networks [24] to study the problem of strip shape pattern recognition in the cold rolling process. 
Simulation results show that the various proposed models can identify common strip shape defects with high accuracy. Deng et al. [25,26] constructed a hybrid model based on the combination of hot strip rolling production data and deep learning networks to predict strip outlet crown, achieving an absolute error of less than 5 µm for 97.04% of the predicted data in the modeling data. Song et al. [27] used machine learning algorithms to establish an accurate prediction model for the strip crown of hot rolling. The model prediction results showed that 97.83% of the data had a difference of less than 4 µm between the actual value and the predicted value. From the above analysis, it can be seen that combining big data technology with artificial intelligence modeling methods is a new trend in further improving the accuracy of shape control in the rolling process [28,29]. Based on the massive industrial data accumulated in the process of hot strip rolling, this paper establishes an accurate prediction model for strip crown through in-depth mining of industrial data, combined with advanced machine learning algorithms, and then constructs a new intelligent shape control preset model that deeply integrates the crown control mechanism model, artificial intelligence algorithms, and production data. In this way, it addresses the difficulty of further improving shape control accuracy when relying on traditional mechanism models, achieves the goal of effectively improving flatness control accuracy, and obtains high-quality strip steel with good flatness.
The paper is organized as follows. The definition of the strip crown in hot rolling and the influence factors of the strip crown are briefly introduced in Section 2. In Section 3, the process of collecting and processing modeling datasets and the selection of model input variables are described in detail. Section 4 briefly introduces the four machine learning algorithms used for modeling and determines the main parameters of each model. The discussion of the strip crown forecasting results is described explicitly in Section 5, and Section 6 concludes this paper.

Definition of Strip Crown
Strip shape intuitively refers to the degree of warpage of the strip, and essentially refers to the distribution of residual stress inside the strip. The measurement of strip shape usually includes both longitudinal and transverse indicators. The longitudinal direction is represented by flatness, which refers to the flat degree of the strip along the length direction. The transverse direction is expressed by crown, referring to the cross-sectional shape of the strip. The standard definition of strip crown is the difference between the center thickness of the strip and the specified edge thickness. To eliminate the effect of strip edge thinning, the edge reference point is usually located 40 mm from the strip edge. The schematic diagram of the crown definition is shown in Figure 1. Equation (1) is as follows:

C = h_c − (h_i + h_i′)/2 (1)

where h_c is the thickness at the center of the strip, h_i is the thickness 40 mm from the edge on the operation side, and h_i′ is the thickness 40 mm from the edge on the drive side.
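As a minimal illustration, Equation (1) can be evaluated directly; the thickness values below are hypothetical, chosen only to show the unit handling (thicknesses in mm, crown reported in µm):

```python
def strip_crown(h_c, h_i_os, h_i_ds):
    """Crown per Equation (1): center thickness minus the mean of the
    two edge-reference thicknesses measured 40 mm from each edge."""
    return h_c - (h_i_os + h_i_ds) / 2.0

# Hypothetical thickness measurements in mm
c_mm = strip_crown(3.550, 3.510, 3.514)
# Crown is conventionally reported in micrometres
c_um = c_mm * 1000.0
```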

Influence Factors of Strip Crown
The crown of the strip can be seen as an image of the cross-sectional shape of the roll gap at the exit of the deformation zone. Therefore, all factors that can affect the cross-sectional shape of the roll gap at the exit of the deformation zone are factors that affect the strip crown. The strip width, rolling force, work roll diameter, backup roll diameter, crown of the work roll, entrance strip crown, and roll bending force are the main factors affecting the strip crown [30,31]. The interrelationship of the various influencing factors is shown in Figure 2. The impact of each factor is different, and under certain conditions a particular factor may have a relatively significant effect. As can be seen from Figure 2, the influencing factors affecting the strip outlet crown exhibit nonlinear and strongly coupled characteristics.

Data Collection
The process layout of the hot continuous rolling production line is shown in Figure 3. Generally, the hot continuous rolling production line consists of a heating furnace, roughing mill, finishing mill, laminar cooling device, and coiling. The roughing mill unit includes a high-pressure water descaler, a vertical mill (E1), and a roughing mill (R1). The vertical mill (E1) controls the width of the strip, and the roughing mill controls the thickness of the strip when it enters the finishing mill unit. The semi-finished products from the rough rolling area are sent to the finishing mill area through the roller table. The finishing mill unit includes a high-pressure water descaler, a flying shear, and a seven-stand finishing mill (F1~F7). After the strip comes out of the finishing mill, it enters the laminar cooling device, which controls the cooling speed and final coiling temperature of the strip. The strip is rolled into coils by a coiler. During the actual production process, the automatic control system monitors and records the relevant data of each piece of equipment and strip steel in real time. The data recorded in the database is divided into original data, calculated value data, and actual value data.

The original data includes strip steel grade, material, slab size, and finished product size. These data are necessary parameters for setting calculation and automatic control. The calculated value data includes various process parameters and model calculation data during the production process, which are important for the accuracy and quality of the finished product and for the analysis of the process. The actual value data includes the measured data of various pieces of testing equipment during the production process, which is an important basis for adaptive learning and the dynamic correction of the rolling mathematical model. The data collection process is shown in Figure 4.

Modeling Datasets

Data Preprocessing
The data were collected on a real 1780 mm hot strip mill production line of HBIS Group Co., Ltd., which is located in Hebei Province, China. Pre-processing of the collected data included deleting sample data with missing values and eliminating outlier data. After data processing, 1809 pieces of strip steel sample data were finally obtained to form the modeling dataset. Part of the modeling data is shown in Table 1. The modeling dataset is divided into a training set and a test set based on sampling balance. The training set proportion was 70%, and the test set proportion was 30%. In order to eliminate the impact of large differences in magnitude between the different dimensions of the sample data on model accuracy during modeling, the data of all input variables were normalized [32,33]. The formula for the normalization processing is as follows:

x_i* = (x_i − min(x_i)) / (max(x_i) − min(x_i))

where x_i and x_i* are the original and normalized values of the ith data point, and max(x_i) and min(x_i) are the maximum and minimum values of the data sequence, respectively.
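The split and scaling steps described above can be sketched with scikit-learn. The random arrays below are placeholders standing in for the real 1809-sample, 79-feature modeling dataset, which is not publicly available:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(1809, 79))  # placeholder for the 79 process input variables
y = rng.normal(size=1809)        # placeholder strip-crown targets

# 70% / 30% train/test split, as in the paper
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)

# Fit min-max scaling on the training set only, then apply to both sets,
# implementing x* = (x - min(x)) / (max(x) - min(x)) per feature
scaler = MinMaxScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)
```

Fitting the scaler on the training set alone avoids leaking test-set statistics into the model.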

Determination of the Input and Output Parameters of the Models
Based on the analysis of the influencing factors of the strip outlet crown in strip shape control theory, the input variables of the machine learning model were determined. These input variables include strip width (W), slab thickness (H), exit thickness (H1~H7), entrance temperature (T1~T7), exit temperature (t1~t7), rolling force (F1~F7), rolling speed (V1~V7), strip yield strength (Q1~Q7), bending force (W1~W7), roll shifting (S1~S7), roll diameter (D1~D7), roll thermal expansion (C1~C7), and roll wear (M1~M7), where the subscripts 1~7 denote the seven finishing stands. The specific input variables are shown in Figure 5. The strip crown is used as the model output variable.
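Counting the variables listed above confirms the model's input dimension: two strip-level variables (W, H) plus eleven per-stand parameter groups over the seven finishing stands gives 2 + 11 × 7 = 79 inputs, matching the input-layer size used later. A short sketch using the paper's own group letters:

```python
# Per-stand parameter groups, one value per finishing stand F1-F7:
# exit thickness H, entrance temp T, exit temp t, rolling force F,
# speed V, yield strength Q, bending force W, roll shifting S,
# roll diameter D, thermal expansion C, roll wear M
per_stand_groups = ['H', 'T', 't', 'F', 'V', 'Q', 'W', 'S', 'D', 'C', 'M']

# Strip-level variables (width W, slab thickness H) plus all stand variables
features = ['W', 'H'] + [f'{g}{i}' for g in per_stand_groups for i in range(1, 8)]
n_features = len(features)  # 2 + 11 * 7 = 79
```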

Experimental
ANNs are among the most classic machine learning algorithms, widely used in various fields due to their strong nonlinear fitting ability and their capacity to map arbitrary nonlinear relationships. Compared with ANNs, SVMs have a more solid mathematical foundation and can effectively construct models from high-dimensional data under limited-sample conditions. Individual ANNs and SVMs are typical individual learners. The method of constructing and combining multiple individual learners to complete a learning task is called ensemble learning. Random forest (RF) and XGBoost represent two different ensemble learning strategies. RF is an ensemble method based on the idea of bagging, which obtains multiple parallel individual learners through resampling and then averages their outputs as the final model result. XGBoost is an ensemble method based on the idea of boosting, which continuously adjusts the weights of individual learners according to their error rates during training, so that individual learners with low error probability obtain larger weights. These four models are among the most representative machine learning methods. Therefore, these four methods are used to establish strip crown prediction models based on the dataset in this article and to conduct a comparative study to obtain the best strip crown prediction model.

MLP-Based Method
The multilayer perceptron (MLP) is a type of ANN. An important feature of the MLP is that it has multiple neural layers. The MLP model consists of an input layer, hidden layers, and an output layer, with each layer consisting of multiple neurons [34,35]. According to the modeling dataset, the number of neurons in the input layer of the constructed MLP crown prediction model is equal to the number of input feature variables in the dataset; therefore, the number of neurons in the input layer is 79. The number of neurons in the output layer is equal to the number of output variables, i.e., 1. After testing the hidden-layer activation functions 'identity', 'logistic', 'tanh', and 'relu', it was found that the 'logistic' function gives the best prediction accuracy. In addition, the number of neurons in the hidden layer [36] and the regularization term coefficient are important parameters that affect the accuracy of the MLP model. The MLP model is built with 'sklearn.neural_network.MLPRegressor' in Python, using 'GridSearchCV' to optimize the number of hidden-layer neurons and the regularization coefficient ('alpha'). The candidate values for the number of neurons in the hidden layer are [80, 100, 200], and those for 'alpha' are [0.01, 0.1, 0.5]. Under the dataset in this article, after optimizing the model parameters, the final parameters of the MLP regression model are shown in Table 2.
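The grid search described above can be sketched as follows. Since the industrial dataset is not available, `make_regression` data stands in for the scaled 79-feature rolling data, and `max_iter` is an added assumption so the small example converges; the candidate grids are the ones reported in the text:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

# Placeholder regression data standing in for the scaled rolling dataset
X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

# Candidate values from the paper: hidden sizes [80, 100, 200], alpha [0.01, 0.1, 0.5]
param_grid = {
    'hidden_layer_sizes': [(80,), (100,), (200,)],
    'alpha': [0.01, 0.1, 0.5],
}
grid = GridSearchCV(
    MLPRegressor(activation='logistic', max_iter=500, random_state=0),
    param_grid,
    cv=3,
)
grid.fit(X, y)
best = grid.best_params_  # parameter combination with the best CV score
```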

SVM-Based Method
The SVM is a machine learning algorithm based on statistical learning theory, and it is an approximate implementation of the structural risk minimization principle [37,38]. It is suitable for small-sample data. Through a nonlinear mapping, the input variables are mapped into a high-dimensional space. An SVM has the advantages of fast solution speed and strong generalization ability, and it is widely used in many fields. Support vector regression can ultimately be transformed into the following dual optimization form:

max over (α, α*): −(1/2) Σ_{i=1}^{n} Σ_{j=1}^{n} (α_i − α_i*)(α_j − α_j*) K(x_i, x_j) − ε Σ_{i=1}^{n} (α_i + α_i*) + Σ_{i=1}^{n} y_i (α_i − α_i*)

The constraints are as follows:

Σ_{i=1}^{n} (α_i − α_i*) = 0, 0 ≤ α_i, α_i* ≤ C

where n is the number of samples; α_i and α_i* are the Lagrange multipliers; ε is the width of the insensitive loss zone; and K(x_i, x_j) is the kernel function. Commonly used kernel functions include the linear kernel, polynomial kernel, and radial basis function (RBF). After comparison on this dataset, this paper uses the polynomial kernel as the kernel function, expressed as follows:

K(x_i, x_j) = (a x_i^T x_j + 1)^d

where a is the kernel function parameter and d is the polynomial degree. The degree of the polynomial kernel 'degree', the penalty factor 'C', and the kernel coefficient 'gamma' have important impacts on the prediction accuracy of the model. The SVM model is built with 'sklearn.svm.SVR' in Python, using 'GridSearchCV' to optimize 'degree', 'C', and 'gamma'. The candidate values of 'degree' are [1, 2, 3], of 'C' are [20, 30, 40], and of 'gamma' are [0.1, 0.2, 0.23]. After testing the kernel function types 'linear', 'poly', 'rbf', and 'sigmoid', it was found that the 'poly' kernel gives the best prediction accuracy. Under the dataset in this article, the parameters of the SVM finally determined after parameter optimization are shown in Table 3.
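The SVR grid search described above can be sketched in the same way; as before, `make_regression` data is a placeholder for the real modeling dataset, while the candidate grids match those in the text:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Placeholder data standing in for the scaled rolling dataset
X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

# Candidate values from the paper for degree, C, and gamma
param_grid = {
    'degree': [1, 2, 3],
    'C': [20, 30, 40],
    'gamma': [0.1, 0.2, 0.23],
}
grid = GridSearchCV(SVR(kernel='poly'), param_grid, cv=3)
grid.fit(X, y)
best = grid.best_params_
```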

RF-Based Method
RF is a parallel ensemble learning algorithm based on decision trees. Compared with individual base learners, it introduces randomness into both sample selection and feature selection through the bagging and random subspace ideas [39,40], enhancing the generalization ability of the model. Following the idea of bagging, the RF model obtains m sampled training sets, each with the same capacity as the original training set, through m rounds of independent random sampling with replacement, and then uses these sampled training sets to train m corresponding base learners. Due to the independence of the sampling, each sampled training set differs from the original training set and from the other sampled training sets, which helps avoid local optimal solutions from the perspective of training sample selection and also ensures a low correlation between the decision subtrees.
The number of base learners 'n_estimators', the maximum depth of the decision trees 'max_depth', and the number of features considered when branching 'max_features' in the RF model have important impacts on the prediction accuracy of the model. The RF model is built with 'sklearn.ensemble.RandomForestRegressor' in Python, using 'GridSearchCV' to optimize 'n_estimators', 'max_depth', and 'max_features'. The candidate values of 'n_estimators' are [60, 80, 100], of 'max_depth' are [10, 30, 50], and of 'max_features' are [6, 9, 12]. Under the dataset in this article, the parameters of the RF finally determined after parameter optimization are shown in Table 4.
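The RF grid search can be sketched analogously; the synthetic data below is a placeholder (with 15 features so that every `max_features` candidate is valid), and the candidate grids are those reported in the text:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Placeholder data; 15 features so max_features up to 12 is admissible
X, y = make_regression(n_samples=200, n_features=15, noise=5.0, random_state=0)

# Candidate values from the paper
param_grid = {
    'n_estimators': [60, 80, 100],
    'max_depth': [10, 30, 50],
    'max_features': [6, 9, 12],
}
grid = GridSearchCV(RandomForestRegressor(random_state=0), param_grid, cv=3)
grid.fit(X, y)
best = grid.best_params_
```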

XGBoost-Based Method
XGBoost is a flexible, efficient, and scalable distributed boosting algorithm proposed by Chen et al. [41-43], based on the Gradient Boosting Decision Tree (GBDT). XGBoost adopts the idea of boosting ensemble learning to combine multiple decision trees, achieving better results and a more generalized combined model. XGBoost is composed of multiple decision trees. Each new decision tree learns the residual between the target value and the current predicted value, where the predicted value is the sum of the predictions of all previous decision trees. After the training of all decision trees is completed, a joint decision is made: the prediction values obtained on each tree are accumulated to give the final prediction result. During the training phase, each new tree is trained on the basis of the trees already built. Each decision tree is a weak learner, and boosting upgrades the ensemble of weak learners into a strong learner. To avoid overfitting and enhance generalization, XGBoost adds regularization terms to the loss function of the GBDT model. Traditional GBDT uses a first-order Taylor expansion of the loss function, fitting the negative gradient in place of the residual. XGBoost instead uses a second-order Taylor expansion, exploiting second-order derivative information about the gradient direction to improve the accuracy of the model.
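The residual-fitting scheme described above can be illustrated with plain decision trees. This is a generic boosting sketch, not the exact XGBoost procedure (no regularization or second-order terms); the learning rate, tree depth, and toy data are illustrative assumptions:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)  # noisy toy target

# Each new tree fits the residual between the target and the
# accumulated predictions of all previous trees.
pred = np.zeros_like(y)
learning_rate = 0.3
for _ in range(50):
    tree = DecisionTreeRegressor(max_depth=3, random_state=0)
    tree.fit(X, y - pred)                       # fit the current residual
    pred += learning_rate * tree.predict(X)     # accumulate shrunken prediction

mse_boost = float(np.mean((y - pred) ** 2))     # training MSE of the ensemble
```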
The loss function of the XGBoost algorithm is defined as follows:

Obj^(t) = Σ_{i=1}^{n} l(y_i, ŷ_i^(t)) + Ω(f_t), with Ω(f_t) = γT + (λ/2) Σ_{j=1}^{T} ω_j²

where l(y_i, ŷ_i^(t)) is the prediction loss of the ith sample at the tth iteration; Ω(f_t) is the regularization term; ω_j is the score of the jth leaf node; T is the number of leaf nodes; γ is the coefficient penalizing the number of leaf nodes; and λ is the coefficient of the L2 regularization term over the weights of all leaf nodes.
The second-order Taylor expansion of the loss function of the XGBoost algorithm is as follows:

Obj^(t) ≈ Σ_{i=1}^{n} [ l(y_i, ŷ_i^(t−1)) + g_i f_t(x_i) + (1/2) h_i f_t²(x_i) ] + Ω(f_t)

where x_i is the ith sample, and g_i and h_i are the first-order and second-order derivative terms of the loss function, respectively. The number of base learners 'n_estimators', the maximum depth of the decision trees 'max_depth', and the 'learning_rate' in the XGBoost model have important impacts on the prediction accuracy of the model. The XGBoost model is built with 'xgboost.XGBRegressor' in Python, using 'GridSearchCV' to optimize 'n_estimators', 'max_depth', and 'learning_rate'. The candidate values of 'n_estimators' are [60, 80, 100], of 'max_depth' are [10, 30, 50], and of 'learning_rate' are [0.1, 0.2, 0.3]. Under the dataset in this article, the parameters of the XGBoost model finally determined after parameter optimization are shown in Table 5.

Results and Discussion
The experiments are carried out on a computer running 64-bit Windows 10 with a 3.0 GHz processor and 16 GB of memory. The algorithms are implemented in the Python language. This research adopts R², MAE, MAPE, and RMSE as criteria for assessing the prediction performance of the machine learning models. The performance criteria are calculated as follows:

R² = 1 − Σ_{i=1}^{n} (y_i − y_i*)² / Σ_{i=1}^{n} (y_i − ȳ)²
MAE = (1/n) Σ_{i=1}^{n} |y_i − y_i*|
MAPE = (100%/n) Σ_{i=1}^{n} |(y_i − y_i*)/y_i|
RMSE = sqrt( (1/n) Σ_{i=1}^{n} (y_i − y_i*)² )

where n denotes the number of sample data; y_i and y_i* are the measured value and the predicted value of the ith sample, respectively; and ȳ is the mean of the measured values.
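The four criteria above map directly onto scikit-learn's metric functions (note that `mean_absolute_percentage_error` returns a fraction rather than a percentage); the small arrays below are hypothetical crown values in µm, used only to exercise the formulas:

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    mean_absolute_percentage_error,
    mean_squared_error,
    r2_score,
)

# Hypothetical measured and predicted crown values (um)
y_true = np.array([40.0, 45.0, 50.0, 55.0])
y_pred = np.array([41.0, 44.0, 51.0, 54.0])

r2 = r2_score(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
mape = mean_absolute_percentage_error(y_true, y_pred)  # fraction, not percent
rmse = float(np.sqrt(mean_squared_error(y_true, y_pred)))
```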
The modeling dataset is divided into a training set and a test set. Four machine learning algorithms, namely MLP, SVM, RF, and XGBoost, are used to train the strip crown prediction model for hot strip rolling based on the same training set data, and the performance of the established models are tested using the same test set. The regression effects of the four prediction models are shown in Figure 6. According to Figure 6 In order to visually display the comparison between the predicted values and the actual values of the four crown prediction models, Figure 7 is drawn to express the proximity of the predicted values to the actual values. Due to the large amount of modeling sample data, it is not possible to display the situation of all samples in one graph. There- In order to visually display the comparison between the predicted values and the actual values of the four crown prediction models, Figure 7 is drawn to express the proximity of the predicted values to the actual values. Due to the large amount of modeling sample data, it is not possible to display the situation of all samples in one graph. Therefore, the first 100 samples from the training set and the test set are taken for plotting. As can be seen from Figure 7a, in the training set, the predicted values of each model have a greater degree of consistency with the actual values, because all the samples in the training set participated in the training of the model during the modeling process. On the contrary, there are many samples in the test set that have significant errors with the actual values, especially in the prediction results of the MLP and SVM models, where a large number of samples seriously deviate from the actual values of the corresponding crown. There are two reasons for this result. Firstly, compared to the training set, the test set samples did not participate in the model construction, resulting in a lower accuracy for its predicted values than the training set sample. 
In addition, because the generalization performance of the MLP and SVM models is inferior to that of the XGBoost and RF models, more of their test-set predictions deviate seriously from the actual values.
From the perspective of quantitative analysis, the absolute error between the predicted value and the actual value of each model on the training set and test set is statistically analyzed, taking an absolute error of the predicted crown value of less than 4 µm as the reference standard. The results are shown in Table 6. From the table, it can be seen that the absolute errors of the XGBoost model's predictions on the training set are all less than 4 µm, and the proportion of test-set samples with an absolute error of less than 4 µm reaches 96.13%, with only a small percentage exceeding 4 µm. The sample proportions of the other models within this error range are smaller than that of the XGBoost model, in the order, from largest to smallest, of RF, SVM, and MLP. The absolute error frequency distribution histograms and corresponding normal distribution curves of the four models' predictions are shown in Figure 8. It can be seen from Figure 8 that, compared with the other three prediction models, the prediction errors of the XGBoost model are more concentrated around zero; as the absolute error increases, the frequency gradually decreases and presents a normal distribution. This demonstrates that the XGBoost prediction model has a good prediction effect. In order to evaluate the generalization performance more comprehensively and quantitatively, RMSE, MAE, and MAPE are used as error indicators to analyze the above four models.
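The within-tolerance statistic used in Table 6 is simply the fraction of samples whose absolute error falls below 4 µm. A minimal sketch (the arrays below are hypothetical placeholder crown values, not data from the paper):

```python
import numpy as np

def within_tolerance_ratio(y_true, y_pred, tol_um=4.0):
    """Percentage of samples whose absolute prediction error is below tol_um (µm)."""
    abs_err = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.mean(abs_err < tol_um) * 100.0

# Hypothetical example: measured and predicted crown values in µm.
y_true = np.array([40.0, 42.0, 38.0, 45.0, 41.0])
y_pred = np.array([41.0, 47.0, 38.5, 44.0, 42.5])
ratio = within_tolerance_ratio(y_true, y_pred)  # 4 of the 5 errors are below 4 µm
```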
Table 7 lists the calculated values of the three error indicators for each model, and Figure 9 shows the error distribution histogram drawn according to the calculation results. Comparing the MLP, RF, SVM, and XGBoost models, the results show that the prediction accuracy of the XGBoost model is significantly better than that of the other models. On the training set, as presented in Figure 9a, the RMSE, MAE, and MAPE of the XGBoost model for predicting the hot strip crown are all smaller than those of the other models. For a data-driven model, the error on the test set reflects the generalization performance of the established model. Figure 9b presents the test-set errors, where the corresponding values are 5.308 and 6.582, respectively. The calculation results of the three errors show the same pattern: the XGBoost model achieves the best prediction accuracy on the test set while maintaining the best training effect. The reason for the above results is that the MLP model uses the BP algorithm for training.
The traditional BP algorithm is a local search optimization method: the network weights are gradually adjusted along the direction of local improvement, which can cause the algorithm to fall into a local extremum [44] and often leads the model into an over-fitting state. Moreover, the training time of the model increases exponentially with the number of hidden layers and the number of neurons in each hidden layer. The time spent on model training is shown in Figure 10.
Compared with the other three models, the MLP has the longest training time, which seriously affects its training efficiency. The SVM algorithm uses quadratic programming to solve for the support vectors, which involves the computation of an m-order matrix; therefore, when the matrix order is large, generalization performance decreases and a large amount of machine memory and computing time is consumed. This also explains why the two algorithms in Figure 6b,d have large absolute errors on the test set. RF and XGBoost both belong to the ensemble learning family: ensemble learning integrates multiple models through a certain strategy and improves decision accuracy through group decision-making. The training of the RF model can be highly parallelized, making it fast and efficient, as Figure 10 shows. The XGBoost model shows better comprehensive performance than RF. Based on the above analysis, it can be concluded that the XGBoost model is the most suitable for establishing a prediction model of strip crown in hot rolling on the dataset of this research.
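A training-time comparison of the kind shown in Figure 10 can be reproduced with a simple wall-clock timer. The sketch below uses synthetic data and illustrative hyperparameters; absolute timings depend on hardware, so only the measurement pattern is shown:

```python
import time
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))   # stand-in process parameters
y = X.sum(axis=1)               # stand-in crown target

def train_seconds(model, X, y):
    """Return the wall-clock seconds taken by model.fit."""
    start = time.perf_counter()
    model.fit(X, y)
    return time.perf_counter() - start

t_rf = train_seconds(RandomForestRegressor(n_estimators=100, random_state=0), X, y)
t_mlp = train_seconds(MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=1000,
                                   random_state=0), X, y)
```

Averaging several repeated runs would give a more stable comparison than a single measurement.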

Conclusions
In this paper, some new data-driven strip crown prediction models integrating the shape control mechanism model, artificial intelligence algorithms, and production data are constructed using the XGBoost, RF, SVM, and MLP algorithms. Through the analysis and evaluation of the prediction results of strip crown for each model, the following main conclusions can be drawn. Using four machine learning algorithms and combining industrial data, the prediction models for hot-rolled strip crown are constructed. Under the same dataset, the XGBoost model has the highest coefficient of determination (R^2) for the prediction results, reaching 0.971 on the test set, and the MLP model has the lowest coefficient of determination for the prediction results, which is 0.860 on the test set. The RF and SVM models are between the XGBoost model and MLP model, with 0.945 and 0.909 on the test set, respectively. The comprehensive performance of the four crown prediction models is evaluated using MAE, RMSE, and MAPE. The results show that the prediction model based on the XGBoost algorithm has the smallest errors under the same modeling dataset, showing the best prediction performance and the best generalization performance. Based on these advantages, the combination of the XGBoost algorithm and industrial data can be used to effectively predict the strip crown. By generating corresponding training data during the rolling process, these data-driven prediction methods can easily be extended to predict and optimize other parameters. The research in this paper provides a new method for solving complex industrial problems with multiple variables, strong coupling, and nonlinearity that cannot be handled by traditional mathematical models, and also provides technical support for the effective utilization of massive data and shape control in hot strip rolling.
Author Contributions: Z.W. contributed to the conception of the study, design of experiments, analysis, and manuscript preparation; Y.H. helped perform the analysis with constructive discussions; Y.L. contributed to writing, review, and editing; T.W.'s contribution was investigation, supervision, and
