Endpoint prediction of BOF steelmaking based on state-of-the-art machine learning and deep learning algorithms

Abstract: To enhance efficiency and sustainability, this work made technical preparations for eliminating the Temperature, Sample, Oxygen (TSO) test of the basic oxygen furnace (BOF) steelmaking process. Utilizing data from 13,528 heats and state-of-the-art (SOTA) machine learning (ML) and deep learning algorithms, data-driven models with different types of inputs were developed, marking the first use of time series data (off-gas profiles and blowing practice related curves) for BOF steelmaking's endpoint prediction; the tabular features were expanded to 45. The prediction targets are the molten steel's concentrations of phosphorus (Endpoint [P], %) and carbon (Endpoint [C], %), and its temperature (Endpoint-Temp, °C). The optimal models for each target were implemented at a Hesteel Group BOF steelmaking facility. Initially, SOTA ML models (XGBoost, LightGBM, CatBoost, TabNet) were employed to predict Endpoint [P]/[C]/Temp with tabular data. The best mean absolute errors (MAE) achieved were 2.276 × 10⁻³% (CatBoost), 6.916 × 10⁻³% (CatBoost), and 7.955°C (LightGBM), respectively, surpassing the conventional models' performance. The prediction MAEs of the conventional models with the same inputs for Endpoint [P]/[C]/Temp were 3.158 × 10⁻³%, 7.534 × 10⁻³%, and 9.150°C (back propagation neural network) and 2.710 × 10⁻³%, 7.316 × 10⁻³%, and 8.310°C (support vector regression). Subsequently, predictions were made using SOTA time series analysis models (1D ResCNN, TCN, OmniScaleCNN, eXplainable Convolutional neural network (XCM), Time-Series Transformer, LSTM-FCN, D-linear) with the original time series data and SOTA image analysis models (Pre-activation ResNet, DenseNet, DLA, Dual path networks (DPN), GoogleNet, Vision Transformer) with resized time series data. Finally, the concat-model and paral-model architectures were designed for making predictions with both tabular data and time series data.
It was determined that the concat-model with TCN and ResCNN as the backbone exhibited the highest accuracy. Its MAEs for predicting Endpoint [P]/[C]/Temp reach 2.153 × 10⁻³%, 6.413 × 10⁻³%, and 5.780°C, respectively, with field test MAEs of 2.394 × 10⁻³%, 6.231 × 10⁻³%, and 7.679°C. Detailed results of the importance analysis for tabular data and time series are provided.


Introduction
Basic oxygen furnace (BOF) steelmaking stands as the predominant method globally, constituting 60% of steel production by 2000. In this high-temperature chemical process, it is paramount to precisely control the temperature and the contents of critical chemical elements (such as carbon and phosphorus) of the molten steel at the endpoint. This is achieved by blowing oxygen into a blend of scrap, hot metal, and additives. The endpoint measurement in BOF steelmaking serves as a direct indicator of molten steel quality, which is vital for subsequent processes. Currently, the Temperature, Sample, Oxygen (TSO) test is widely utilized as the endpoint measurement method. In this test, the temperature and oxygen content of the molten steel are directly measured by a sub-lance at the end of blowing, and steel samples are taken for further laboratory chemical testing. While this method is quite efficient for endpoint measurement, it exhibits several drawbacks: (1) Delayed chemistry results: crucial chemistry values require 3-5 min of laboratory testing, rendering them too late to determine the steel quality and guide the steelmaking process. (2) Additional cost: the sub-lance's sampler is disposable, and the maintenance of the sub-lance and its cooling system needs continuous investment, resulting in high operating costs, pollution, and carbon emissions. (3) Inadequate testing success rate: the success rate of TSO measurements is less than 90%.
The identified drawbacks present substantial challenges in achieving timely endpoint control and full automation of the BOF steelmaking process, as well as in optimizing costs and sustainability. Accurately predicting endpoint values could obviate the corresponding sampling and measuring processes, thereby enhancing steelmaking efficiency, endpoint control, and sustainability. This study implemented the following measures for predicting the targets (Endpoint [P] (%), Endpoint [C] (%), and Endpoint-Temp (°C)): (1) Large-scale data collected from 13,528 heats was utilized, with the tabular features expanded to 45. Additionally, time series data that can reflect the entire steelmaking process was introduced as input of a data-driven endpoint prediction model for the first time. (2) State-of-the-art (SOTA) machine learning (ML) and deep learning (DL) models were used for prediction. In more detail, the main contents of this study are as follows: (1) Historical data from 5,562 heats were selected for the prediction of Endpoint [P] and Endpoint [C], while data from 7,060 heats were utilized for predicting Endpoint-Temp, all derived from the raw data of 13,528 heats.
Comprehensive feature engineering and importance analysis were conducted for both time series and tabular data. (2) Predictions were conducted using SOTA machine learning (ML) models (XGBoost, LightGBM, CatBoost, TabNet) along with 45 tabular features comprising information on hot metal, scrap, additives, preset values, and blowing practice. Additionally, a back propagation neural network (BP-NN) and support vector regression (SVR) were implemented, with their performance serving as a control. (3) Off-gas and blowing practice-related time series data were introduced as input for the data-driven endpoint prediction model for the first time. SOTA DL models for time series analysis (1D ResCNN, TCN, OmniScaleCNN, eXplainable Convolutional neural network (XCM), Time-Series Transformer (TST), Long short-term memory-fully convolutional network (LSTM-FCN), D-linear) and image processing (Pre-activation ResNet, DenseNet, DLA, Dual path networks (DPN), GoogleNet, Vision Transformer) were implemented to make predictions using original and resized time series data, respectively.
(4) Two mixed-input models, named the concat-model and the paral-model, were proposed to enhance prediction accuracy. These models utilize both tabular data and normalized time series as inputs. Their performance was evaluated using various DL backbones (1D ResCNN, TCN, OmniScaleCNN, XCM, TST, LSTM-FCN, D-linear), and comparisons were made to determine the most effective configuration. (5) An online prediction human-machine interface (HMI) was developed, incorporating the best-performing model for each target. This HMI was deployed in practical field production, and field test results from 300 heats were recorded. The interface design and detailed field test results are elaborated upon.

Related literature review
For endpoint prediction in BOF steelmaking, the following approaches have been adopted in previous studies: 1. Utilizing theoretical models based on material balance, heat balance, thermodynamics, and kinetics. This theoretical modeling approach idealizes the steelmaking process, resulting in poor predictive performance. For example, Wang et al. [1] proposed a model based on oxygen balance, whose hit rate was only 62% with an average error within 3%. 2. Utilizing tabular data and either an original or modified BP-NN. The models' inputs consist of static information such as molten iron, scrap, and additives. For example, with tabular inputs, He and Zhang [2] proposed a BP-NN modified with principal component analysis (PCA-BP), and Wang et al. [3] proposed a BP-NN modified with a genetic algorithm (GA-BP). Compared to theoretical models, these models exhibit a significant improvement, achieving hit rates of 84-90%. There are more similar studies [4][5][6][7][8][9][10]. These models only considered initial conditions and static process data, leading to lower robustness. 3. Utilizing tabular data and either an original or modified support vector machine (SVM). The advantages and disadvantages of this approach are similar to those of using BP-NNs. Gao et al. [11] proposed a k-nearest neighbor-based weighted twin SVR, and there are more similar studies [8,12,13]. The hit rate was 86-93%. 4. Utilizing ML models with flame-related data. Shao et al. [14] proposed a model based on SVM and flame radiation, and Zhou et al. [15] proposed a model based on SVM and flame spectrum. This method is used in various studies [15][16][17]; its accuracy is similar to that of using tabular data, but it requires additional equipment, and training directly on flame images would require a larger dataset.
Through the related literature review and the survey [17][18][19][20][21][22], it has been found that the accuracy of the ML models used in previous studies has been surpassed by more advanced models. Furthermore, the data utilized in these studies are often static or represent a specific moment, thus failing to cover the entire steelmaking process. In this study, time series data covering the entire steelmaking process were introduced, and SOTA ML and DL algorithms were employed for prediction.

Data description and training process
The dataset was divided into three subsets: 70% for the training set, 20% for the validation set, and the remaining 10% for the test set. The data utilized for prediction comprise three categories: time series data, tabular data, and mixed data. Since the data from the BOF steelmaking plant used three significant figures, all the data in this work adopt three significant figures. The individual descriptions are provided as follows:

Time series data
A time series consists of a sequence of random variables arranged chronologically. In a two-dimensional context, it typically represents the curves of the target process, sampled at a specified rate with uniform time intervals. The time series data for TSO prediction consists of seven curves: off-gas total flow (Gas-Flow), lance height (Lc-Height), cumulative oxygen consumption (O2-Blow), and the gas percentages of carbon monoxide (GP-CO), carbon dioxide (GP-CO2), oxygen (GP-O2), and hydrogen (GP-H2). Off-gas refers to the reaction gas produced during BOF steelmaking. The off-gas profile is the collective term for the percentage curves of the four gases, which sensitively reflect the steelmaking reactions. Following the sampling sequence of the BOF system, the time interval between time steps is 1 s. As the steelmaking time varies for each heat, all time series were zero-padded to 1,024 time steps. An example of an original time series is illustrated in Figure 1.
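The zero-padding (and the channel-wise standardization applied later, before the time series models) can be sketched as follows. This is a minimal numpy sketch; the function name, the choice of computing statistics after padding, and the random example data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def pad_and_standardize(series, target_len=1024):
    """Zero-pad a (channels, time) series to target_len steps,
    then standardize each channel to zero mean / unit variance."""
    n_ch, n_t = series.shape
    padded = np.zeros((n_ch, target_len), dtype=float)
    padded[:, :n_t] = series
    mean = padded.mean(axis=1, keepdims=True)
    std = padded.std(axis=1, keepdims=True) + 1e-8  # avoid division by zero
    return (padded - mean) / std

# e.g. a heat lasting 900 s with 7 curves
# (Gas-Flow, Lc-Height, O2-Blow, GP-CO, GP-CO2, GP-O2, GP-H2)
heat = np.random.rand(7, 900)
x = pad_and_standardize(heat)
```

In practice, standardization statistics would typically come from the training set only; the per-sample version above just illustrates the shape of the transformation.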

Tabular data
The columns in tabular data define the features (tabular features), while the rows represent the values of each heat for these features.

Mixed data
Mixed data contains both tabular and time series data.

Targets and error metrics
The targets are the molten steel's contents of phosphorus (Endpoint [P] (%)) and carbon (Endpoint [C] (%)), and its temperature (Endpoint-Temp (°C)) at the endpoint. The target values in the dataset are measured by the TSO test. The distributions and relationships of the targets are shown in Figure 2. Combinatorial metrics were used to evaluate the models.
The basic metrics adopted were
\[ \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|Y_i - \hat{Y}_i\right|, \]
where MSE is the mean square error; MAE is the mean absolute error; Y_i is the true value of the target with index i in dataset X; Ŷ_i is the predicted value of Y_i; and n is the number of samples contained in X.
Traditional methods of assessing hit rates based on uniform standards are not applicable due to significant differences in the endpoints of various steel grades. In this study, the determination of the hit range is based on the target value. In on-site production, a statistical hit rate of ±10, ±15, and ±20% around the target value is commonly applied for chemical tests; for temperature, these ranges are ±10, ±15, and ±20°C. So, the following metrics were also used to assist the evaluation: (1) The proportion of predicted values distributed within ±10, ±15, and ±20% of the true value of Endpoint [P] and Endpoint [C].
(2) The proportion of predicted values of Endpoint-Temp distributed within ±10, ±15, and ±20°C of the true value.
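The error metrics and hit-rate definitions above can be made concrete as follows (a minimal sketch; function names and the example values are illustrative, not from the paper's dataset):

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error."""
    return np.mean(np.abs(y - y_hat))

def mse(y, y_hat):
    """Mean square error."""
    return np.mean((y - y_hat) ** 2)

def hit_rate_relative(y, y_hat, pct):
    """Share of predictions within +/- pct% of the true value
    (the hit-rate style used for Endpoint [P] and Endpoint [C])."""
    return np.mean(np.abs(y_hat - y) <= np.abs(y) * pct / 100.0)

def hit_rate_absolute(y, y_hat, tol):
    """Share of predictions within +/- tol of the true value
    (the hit-rate style used for Endpoint-Temp, tol in deg C)."""
    return np.mean(np.abs(y_hat - y) <= tol)

# illustrative temperatures (deg C) and phosphorus contents (%)
y = np.array([1650.0, 1660.0, 1645.0, 1680.0])
y_hat = np.array([1655.0, 1652.0, 1646.0, 1692.0])
p = np.array([0.012, 0.020, 0.015])
p_hat = np.array([0.013, 0.017, 0.0152])

print(mae(y, y_hat))                    # -> 6.5
print(hit_rate_absolute(y, y_hat, 10))  # -> 0.75
```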

Hyper-parameters tuning
The models were auto-tuned with Bayesian optimization for 2,000 trials. Then, manual tuning was used to further adjust the hyper-parameters. The hyper-parameter boundaries and optimum hyper-parameters for the ML models and the best models are detailed in the appendix.
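The paper does not name the tuning library. As an illustrative stand-in, the sketch below runs a plain random search over a hyper-parameter space; Bayesian optimization (e.g. a TPE sampler) follows the same objective/trial pattern but chooses the next trial's parameters from past results instead of uniformly. The objective, parameter names, and search bounds here are hypothetical placeholders.

```python
import random

def objective(params):
    # placeholder: in practice, train a model with `params`
    # and return its validation MAE
    return (params["lr"] - 0.05) ** 2 + 0.1 / params["n_trees"]

space = {"lr": (1e-3, 0.3), "n_trees": (100, 2000)}

random.seed(0)
best_params, best_score = None, float("inf")
for trial in range(200):  # the paper ran 2,000 Bayesian trials
    params = {
        "lr": random.uniform(*space["lr"]),
        "n_trees": random.randint(*space["n_trees"]),
    }
    score = objective(params)
    if score < best_score:
        best_params, best_score = params, score
```

The best trial's parameters would then be refined manually, as described above.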

ML with tabular data
The following ML models were employed to predict endpoint values using tabular data; this is a data mining process whose inputs are the tabular features.
XGBoost [23]. XGBoost is an ML algorithm rooted in the gradient boosting framework, employing CART decision trees as its base learner. It enhances performance by integrating a second-order Taylor series expansion and regularization into the loss function, optimizing it using the Newton-Raphson method rather than gradient descent. Furthermore, it implements techniques such as shrinkage, column subsampling, missing-value handling, and feature block arrangement to improve predictive accuracy. Its structure is shown in Figure 3. Its regularization helps simplify models and prevent overfitting. However, it suffers from an abundance of algorithm parameters and is not suitable for handling ultrahigh-dimensional feature data.
LightGBM [24]. LightGBM is an ML algorithm rooted in the gradient boosting framework. It introduces the histogram algorithm, which discretizes continuous float eigenvalues into k integers and constructs a histogram of width k. This histogram accumulates the necessary statistics in one traversal of the dataset, and an optimal segmentation point is then determined from the histogram's discrete values. LightGBM employs a leaf-wise strategy for splitting, prioritizing leaves with the highest split gain, thereby achieving superior precision compared to the level-wise approach. It has the advantages of low memory occupation and higher accuracy, but its deeper trees are easier to overfit. Its structure is shown in Figure 4.
CatBoost [25]. CatBoost is an ML algorithm built upon the gradient boosting framework, utilizing oblivious trees as its base learner. It incorporates techniques such as null-value processing, ordered target statistics encoding, and feature combinations to handle categorical features effectively. To address prediction shift, CatBoost employs an ordered boosting algorithm. Notably, it conducts categorical feature processing during training rather than as a pre-processing step. During training, CatBoost computes both random permutations for target statistics (σ_cat) and random permutations for ordered boosting (σ_boost). It has high accuracy and low requirements for data preprocessing, but needs more memory. Its structure is shown in Figure 5.
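XGBoost, LightGBM, and CatBoost all build on the same gradient boosting idea: each new tree fits the residual (the negative gradient of the squared-error loss) of the current ensemble, scaled by a shrinkage factor. The toy sketch below illustrates that mechanism with one-feature decision stumps; it is not any of the production libraries, and the step-function example data is purely illustrative.

```python
import numpy as np

def fit_stump(x, residual):
    """Find the single-feature split minimizing squared error."""
    best = None
    for t in np.unique(x)[:-1]:
        left, right = residual[x <= t], residual[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, lv, rv = best
    return lambda q: np.where(q <= t, lv, rv)

def gradient_boost(x, y, n_rounds=50, lr=0.1):
    pred = np.full_like(y, y.mean(), dtype=float)
    stumps = []
    for _ in range(n_rounds):
        residual = y - pred          # negative gradient of 1/2 (y - pred)^2
        stump = fit_stump(x, residual)
        pred += lr * stump(x)        # shrinkage, as with XGBoost's eta
        stumps.append(stump)
    return y.mean(), stumps

x = np.linspace(0, 10, 200)
y = np.where(x < 5, 1.0, 3.0)        # toy step-shaped target
base, stumps = gradient_boost(x, y)
pred = base + 0.1 * sum(s(x) for s in stumps)
err = np.mean(np.abs(y - pred))
print(err)                            # -> ~0.005 (residual shrinks by (1 - lr) per round)
```

The libraries above differ in how the trees are grown (histogram bins, leaf-wise growth, oblivious trees, ordered boosting), but the boosting loop itself is this one.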
TabNet [26]. TabNet is a neural network-based ML algorithm that employs a sequential attention mechanism to mimic decision tree algorithms. Sequential attention consists of two crucial steps: the Attentive Transformer identifies important features, and the Feature Transformer converts feature values into a feature map. Both transformers use as their base learner a feature block composed of a fully connected layer, a batch normalization layer, and a gated linear unit activation function in sequence. Sparse regularization is utilized in the loss function. It is well suited to fine-tuning and transfer learning but is prone to overfitting. Its structure is shown in Figure 6.
Importance analysis. Aside from data mining, a further analysis of feature importance in tabular data was conducted based on the average scaled importance values assigned to each feature by all the models above. This analysis combines these importance values with the slope sign derived from linear regression results. The culmination of this analysis is presented in Section 5.

Time series analysis models with time series data
After padding and channel-wise standardization, the time series is input into the following models to predict TSO values. The input time series has dimensions of 7 channels and 1,024 time steps. The following are SOTA models for time series classification and prediction tasks (collectively referred to as time series analysis). They efficiently extract features from time series data and make accurate predictions; their outputs are flattened and projected to single values. A brief overview of each model follows. 1D ResCNN (ResCNN) [27]. ResCNN is a convolutional neural network (CNN) using residual connections. To enhance the robustness of the model, the residual block is fixed to the first three convolutional layers and different activation functions are adopted in different layers. To guard against overfitting, global average pooling is applied instead of a fully connected layer.
TCN [28]. TCN is a model based on one-dimensional (1D) CNNs. Its main characteristics are causal convolution, dilated convolution, and residual connections. Causal convolution ensures that the output at each time step depends only on previous time steps, while dilated convolution enlarges the limited receptive field of causal convolution.
OmniScaleCNN (OS-CNN) [29]. OS-CNN is a model based on 1D CNNs. The Omni-Scale Block was proposed to increase the 1D CNN receptive field and improve model robustness. This block automatically adjusts the kernel size and receptive field according to different time series to achieve the highest accuracy.
XCM [30]. XCM is a compact model based on 1D CNNs. It accurately identifies and utilizes important time steps via Gradient-weighted Class Activation Mapping, a post hoc, model-specific explainability method.
TST [31]. TST is based on the self-attention mechanism. Its base learner is a self-attention layer consisting of a multi-head attention network, a feedforward network, and residual connections. The TST framework introduces an unsupervised pre-training regimen that provides significant performance benefits.
LSTM-FCN [32]. LSTM-FCN uses both an LSTM module and an FCN module; the added LSTM module enhances sensitivity to temporal order. The FCN module contains 1D CNN layers with batch normalization, and the LSTM branch applies a dimension shuffle followed by dropout. The outputs of the FCN and LSTM are concatenated as the final output. D-linear [33]. The main feature extraction mechanism of the D-linear model is a linear layer network. It decomposes historical time series data into trend and remainder components with linear layer networks. Positional encoding is introduced during the decomposition to preserve ordering information. In this study, D-Linear-I was utilized to address underfitting resulting from weight sharing between different features, ensuring each feature has its own independent linear layer.
The structures of the time series analysis models are shown in Figure 7. Importance analysis of time series. After TSO prediction, to further study the relationship between TSO values and time series data, a single-layer gated recurrent unit (GRU) network based on the attention mechanism was developed. The hidden state of the last time step was taken as the Query, and the hidden state of each time step was taken as the Key and Value for the dot product according to equation (1) [30]. By extracting the dot-product results after the Softmax operation, the weights (importance) of each time step can be obtained. With another single-layer attention-based GRU applied to the permuted time series, the weights of each channel can also be obtained.
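At its core, this attention-weight extraction is a softmax over Query-Key dot products. The numpy sketch below illustrates the step; the GRU is omitted and random hidden states stand in for the real ones, and the hidden size and scaling are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, H = 1024, 64                       # time steps, hidden size
hidden = rng.normal(size=(T, H))      # per-step GRU hidden states (stand-in)

query = hidden[-1]                    # hidden state of the last time step
scores = hidden @ query / np.sqrt(H)  # scaled dot product with each Key
weights = np.exp(scores - scores.max())
weights /= weights.sum()              # softmax -> importance of each time step

context = weights @ hidden            # attention-weighted Value summary
```

Running the same mechanism on the permuted series (channels on the "time" axis) yields per-channel weights in the same way.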

Image analysis models with time series data
Each time series tensor, consisting of 1,024 time steps and 7 channels, was resized to a 32 × 32 image tensor with 7 channels to facilitate higher-dimensional feature extraction. This transformation converted endpoint prediction into an image regression task. SOTA image classification and regression models (collectively referred to as image analysis models) were modified for image regression; their outputs were flattened and projected to single values. Pre-activation ResNet [34]. Pre-activation ResNet is founded upon CNNs, emphasizing direct transmission of feature information between modules. In its architecture, the conventional residual block is replaced with two sequentially arranged modules comprising batch normalization, a ReLU function, and a convolutional layer. This substitution alleviates the training complexity of deep networks while preserving the model's parameter capacity.
DenseNet [35]. DenseNet is a CNN architecture designed to address gradient vanishing. In the DenseNet structure, each layer's input is derived from the outputs of all preceding layers, establishing direct connections from early layers to later ones. Batch normalization and ReLU activation functions are employed between convolutional layers. This design promotes information flow throughout the network, facilitating efficient gradient propagation and enhancing training performance.
Deep layer aggregation (DLA) [36]. DLA serves as a foundational structure for consolidating deep networks. It encompasses two distinct types of aggregation: iterative deep aggregation (IDA) and hierarchical deep aggregation (HDA). In IDA, the aggregation node's input comprises the output of the current stage and the previous node. In HDA, the aggregation node's input consists of the outputs of blocks and the preceding node, while the aggregation node's output is also fed into the subsequent set of blocks. In this study, DLA, based on two-dimensional convolutional networks, utilizes IDA to amalgamate stages and HDA to consolidate blocks within each stage.
Dual path networks (DPN) [37]. DPN is a model that integrates key features of the ResNet and DenseNet architectures. The practical implementation of DPN adopts ResNeXt, leveraging group convolution, instead of ResNet as the primary component. This modification effectively enhances the learning capacity of each block while mitigating the rapid expansion of DenseNet channels. Subsequently, slice and concatenate layers are applied to introduce additional DenseNet pathways, and split and element-wise addition operations are performed on the outputs of both model components.
Vision transformer (ViT) [39]. Vision transformer is rooted in the transformer architecture. It introduces a multi-head attention mechanism utilizing the dot product, where each transformer layer comprises layer normalization, a feedforward network, and multi-head attention in sequence, with residual connections. ViT begins by splitting an image (C × H × W) into patches and linearly projecting each patch to an embedding of dimension D. Following patch embedding and position embedding, the resulting token sequence is forwarded to the transformer encoder to produce the final output.
The schematic diagram of the image regression process and the structures of SOTA backbones used are depicted in Figure 8.
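Since 1,024 = 32 × 32, the resizing step at the start of this subsection amounts to folding each channel's time axis into a square grid. The sketch below shows the simplest row-major fold; the paper does not specify the exact resizing scheme, so this reshape and the random example data are illustrative assumptions.

```python
import numpy as np

series = np.random.rand(7, 1024)   # 7 channels x 1,024 padded time steps
image = series.reshape(7, 32, 32)  # -> 7-channel 32 x 32 "image"

# row-major fold: pixel (r, c) of a channel holds time step r * 32 + c
assert image[0, 3, 5] == series[0, 3 * 32 + 5]
```

The 7-channel image tensor is then fed to the image backbones exactly as a multi-channel picture would be.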

Concat-model and paral-model with mixed data
To further explore methods for optimizing the utilization of available data and enhancing prediction accuracy, two DL architectures were designed to make predictions with both tabular and time series data. Their architectures are shown in Figure 9.
The first architecture is named the concat-model. It embeds and reshapes the tabular data, then concatenates it with the time series data along the channel dimension to form a new sequence. This new sequence is input into time series analysis models for prediction.
The second architecture is dubbed the paral-model. It employs time series analysis models to handle the time series data, while simultaneously utilizing a tabular data processing network composed of fully connected layers with dropout. The outputs of the two networks are added, flattened, and projected as the final output.
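At the tensor level, the two architectures differ only in where the tabular branch joins the time series branch. The shape-only numpy sketch below makes this concrete; the embedding sizes, the number of pseudo-channels, and the random stand-ins for learned layers are illustrative assumptions, not the paper's actual dimensions.

```python
import numpy as np

n_tab, n_ch, T = 45, 7, 1024
tabular = np.random.rand(n_tab)
series = np.random.rand(n_ch, T)

# concat-model: embed tabular -> reshape to extra channels -> channel-wise concat
emb = np.random.rand(2 * T, n_tab) @ tabular     # stand-in for a learned embedding
extra_channels = emb.reshape(2, T)               # 2 new pseudo-channels
concat_input = np.concatenate([series, extra_channels], axis=0)
# concat_input (9 x 1024) is fed to the time series backbone

# paral-model: two branches processed separately, outputs added at the end
ts_out = series.mean(axis=1)                     # stand-in for backbone features
tab_out = np.random.rand(n_ch, n_tab) @ tabular  # stand-in for the FC branch
paral_out = ts_out + tab_out                     # element-wise addition, then projection
```

The concat-model fuses the inputs before the backbone (early fusion), whereas the paral-model fuses branch outputs afterward (late fusion), which matches the fusion-timing discussion in the results.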

Results and analysis

Results of tabular feature importance
Figure 10 shows the tabular features and their feature importance in the different tasks. The features related to hot metal and oxygen blowing are more important in each task, while the features related to additives are of lower importance.
Time series data from 300 heats were randomly chosen for channel and time step importance analysis; Figure 11 shows the distribution of importance across channels and time steps in the different prediction tasks. For Endpoint-Temp, mid-stage time steps are more important, while the GP-CO, GP-CO2, and GP-O2 channels exhibit lower importance for prediction. For each target, the blowing practice related curves are more important.

Results of ML with tabular data
The prediction results on the test set are shown in Table 2. Conventional BP-NN and SVR are also considered, and their performance is listed for comparison. The best values are given in bold.
Table 2 indicates that the targets exhibit high predictability with tabular data. Across the same task, the discrepancy in accuracy among models was minimal.

Results of time series analysis models with time series data
The performance of time series analysis models with different SOTA backbones on the test set is presented in Table 3; the models are named after their backbones. The best values are given in bold. The training strategies are described in the appendix. According to Table 3, approximately 70% of the predicted values for Endpoint [C] and Endpoint [P] fell within ±20% of the actual values. For Endpoint-Temp, nearly 90% of the predictions were within ±20°C of the true values. These results demonstrate a strong correlation between time series data and TSO values. OS-CNN exhibited the highest accuracy for predicting Endpoint [P] and Endpoint-Temp, while ResCNN performed best for predicting Endpoint [C]. Compared to ML models, using only time series data yields lower accuracy for predicting Endpoint [P] and Endpoint-Temp but higher accuracy for predicting Endpoint [C].

Results of image analysis models with time series data
The performance of image analysis models utilizing different SOTA backbones is depicted in Table 4. Model names are the same as the backbones' names, and the best values are given in bold. Table 4 indicates that the image analysis models did not enhance forecasting accuracy: their best prediction performance is similar to that of the time series models. However, in practical scenarios, after hyper-parameter tuning, image analysis models require more parameters and are more prone to overfitting. Therefore, if only time series data are available for forecasting, time series DL models are recommended.

Results of concat-model and paral-model with mixed data
The predictions of the concat-model and paral-model with different backbones are shown in Tables 5 and 6; the best values are given in bold. They demonstrate that using mixed data with DL models can enhance the prediction accuracy of Endpoint [P]/[C]/Temp compared to using only time series data, especially for Endpoint-Temp. The concat-model outperforms the paral-model, achieving higher accuracy and reducing MAE by 4-10%. These findings suggest that a DL model using composite inputs may perform better than a DL model using only one of the inputs. At the same time, the way features are merged also affects prediction efficiency: the earlier the feature fusion is performed, the higher the fusion efficiency may be. The results also show that a single network can efficiently extract information from merged features.
The visualized best prediction results on the test set for each task are shown in Figure 12. The models are the concat-model with a TCN backbone (for Endpoint [P]) and with a ResCNN backbone (for Endpoint [C]/Endpoint-Temp).
According to all the above results, the concat-models with TCN and ResCNN as backbones are the best models. Five-fold cross-validation was used to strengthen the reliability of the model evaluations and further optimize the hyper-parameters. Specifically, the training set is mixed with the validation set and the order is randomly shuffled. For a given set of model hyper-parameters, the 0-20, 20-40, 40-60, 60-80, and 80-100% slices of this new dataset are used in turn for validation, with the remaining data used for training. The average of the results of these five training runs is used to select the best hyper-parameters. The performance of the models with the best hyper-parameters on the test set is shown in Table 7.
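The fold construction described above can be sketched as follows (a minimal stdlib sketch; the function name and seed are illustrative):

```python
import random

def five_fold_indices(n, seed=0):
    """Shuffle indices, then yield (val_idx, train_idx) pairs for the
    0-20, 20-40, 40-60, 60-80, and 80-100% validation slices."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    fold = n // 5
    for k in range(5):
        val = idx[k * fold:(k + 1) * fold] if k < 4 else idx[4 * fold:]
        val_set = set(val)
        train = [i for i in idx if i not in val_set]
        yield val, train

n = 1000  # e.g. size of the merged training + validation set
for val, train in five_fold_indices(n):
    assert len(val) + len(train) == n
    assert not set(val) & set(train)
```

Each hyper-parameter candidate is trained five times on these splits and scored by the average validation result.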
After five-fold cross-validation, the performance of the models was further improved. The hyper-parameter adjustment is also informed by all of the training and validation data, so that the evaluation has stronger confidence.

Results of comparison with other public works
The MAE and MSE of BOF endpoint prediction obtained in previous studies are juxtaposed with those derived from this work in Table 8, despite differences in data quality, objectives, application scenarios, BOF vessel structure, and steelmaking processes across these studies compared to the scenario addressed in this work.

Discussion on the potential for overfitting
The models developed in this work employ many tabular features. To investigate whether so many tabular features have led to overfitting, specific experiments were conducted. In the experiments, the hyper-parameters of the best models were kept, and the number of tabular features was changed to 25 (by removing all features of additives). For predicting Endpoint [P] and Endpoint [C], the resulting differences are 2 × 10⁻⁴-3 × 10⁻⁴% (MAE) and 3 × 10⁻⁶-1 × 10⁻⁵% (MSE); for predicting Endpoint-Temp, they are 0.2-0.5°C. The results show that adding or removing original features does not lead to overfitting, but removing key features causes underfitting. Removing features has less impact on the multi-input DL models.
To further explore whether the complex models used in this work induced overfitting, additional experiments were conducted under the condition of using the original input features; the results are shown in Table 10.
According to Table 10, the performance of the best models on the training set and the test set is similar. However, starting from the original models, a large reduction in model complexity leads to underfitting, while a significant increase in model complexity may lead to serious overfitting. The overfitting caused by increased model complexity is even more serious for the DL models.

Significance of predictions
1. Optimizing the steelmaking process. (1) The models allow for continuous output of predictions. In the later stages of smelting, once additive charging has been completed, the model's predictions depend solely on the time series, and the oxygen blowing volume is updated every second. With such dynamic inputs, the models can generate a decarburization/temperature-rise curve to dynamically control the endpoint (the "Dynamic" mode of the prediction software). (2) The importance analysis based on the developed models has demonstrated the significance of the various inputs, enabling targeted process adjustments based on their importance. For instance, in controlling the endpoint temperature, attention should be paid to the amount of Newman ore added according to the steel grade and process requirements. There are also scenarios where tabular data cannot be utilized due to manual input. In such cases, if oxygen blowing-related curves and gas composition data from coal gas recovery are available, endpoint prediction can still be achieved.

Results of field application
The best models have been applied in practical field production. The interface of the Python-based online prediction software is shown in Figure 13. It receives data from a dynamic database and makes predictions with the best model for each task. It has two prediction modes: in the dynamic mode, it automatically predicts each target every second; in the static mode, it only makes a prediction when the "Predict" button is pressed. When the "Tabular" button is selected, the ML models are used for prediction with tabular data; when the "Time Series" button is selected, the concat-model is adopted.
In actual field production, the model prediction results of 300 heats were recorded. The MSE, MAE, and running chart of the predicted results are shown in Figure 14. The MSE for predicting Endpoint [P]/[C]/Temp are 9.419 × 10⁻⁶%, 6.857 × 10⁻⁵%, and 104.021°C, and the MAE are 2.394 × 10⁻³%, 6.231 × 10⁻³%, and 7.679°C. Under field uncertainty, the prediction accuracy of Endpoint [P] and Endpoint-Temp is only slightly lower than that on the test set, and the prediction of Endpoint [C] is even better than on the test set. This proves that the models have good robustness and generalization ability.
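The field metrics above follow the standard MAE/MSE definitions; a minimal sketch (with made-up toy temperatures, not the recorded field data) is:

```python
def mae(y_true, y_pred):
    """Mean absolute error over paired observations."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error over paired observations."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy example with two hypothetical endpoint temperatures (°C).
y_true = [1600.0, 1620.0]
y_pred = [1605.0, 1612.0]
print(mae(y_true, y_pred), mse(y_true, y_pred))
```

Applying the same two functions to the 300 recorded heats yields the figures reported for each target.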

Conclusion
This study concentrated on predicting the endpoint chemistry and temperature (Endpoint [P]/[C]/Temp) of BOF steelmaking using SOTA ML (XGBoost, LightGBM, Catboost, TabNet) and DL models, coupled with varied inputs gathered from 13,528 heats. It provides the potential to eliminate the TSO test. The best models were implemented in practical field production, and the prediction results of 300 heats were recorded. By integrating the testing and field application results, the following conclusions were drawn.

6. Limitations of the study and directions for future research: Due to constraints in project planning and computational resources, this study proposed several generalized robust models. In the future, it is advisable to establish specific prediction models for the main products based on different steel grades and their process characteristics, aiming to maximize prediction accuracy.

Figure 1 :
Figure 1: An example of an un-padded time series. (a) Off-gas profile; (b) curves of oxygen lance height, accumulative oxygen consumption, and off-gas flow rate.

Figure 2 :
Figure 2: The statistical distributions and relationships of the targets.

Figure 3 :
Figure 3: Structure of XGBoost; f_K is the loss function in the kth tree.

Figure 5 :
Figure 5: Structure of CatBoost; N is the number of samples.

Figure 7 :
Figure 7: The structure of all the time series analysis models. The outputs of all models are flattened and processed by a fully connected layer with dropout to produce a single output, and training is conducted using the root mean squared error (RMSE) loss function. Conv1d is a 1D convolutional network, Conv2d is a two-dimensional convolutional network, and k is the number of split tokens of inputs and latent. The structures are (a) 1D ResCNN; (b) TCN, B₁ is the number of shown blocks; (c) OS-CNN; (d) XCM; (e) TST, z and u are processed tokens, and B₂ is the number of transformer encoders; (f) LSTM-FCN; and (g) D-linear.

Figure 8 :
Figure 8: Diagram of the image regression process and the structures of SOTA backbones. (a) Diagram of the image regression process. The structures are (b) Pre-activation ResNet; (c) DenseNet; (d) DLA; (e) DPN; (f) GoogleNet; and (g) VIT.

Figure 9 :
Figure 9: Architectures of (a) concat-model and (b) paral-model; FC is a fully connected layer. To keep the same complexity, the tabular data are embedded in the same pattern.

The number of tabular features was reduced to 25 (Model-25), 36 (by removing the physical and chemical features of hot metal, Model-36), and 37 (by merging the features of additives before and after the TSC test, Model-37). Using the best ML and DL models (concat-model) in each task with the same training and tuning process, the performance disparity between the training and testing datasets has been documented in Table 9. The MAE/MSE (test − train) is the difference between the MAE/MSE on the test set and on the training set. According to Table 9, taking MAE and MSE as criteria, for predicting Endpoint [P], when removing tabular features, the differences between the training set and the test set of all models are distributed within 2 × 10⁻⁴–2.5 × 10⁻⁴% (MAE) and 1.5 × 10⁻⁶–6 × 10⁻⁶% (MSE). For predicting Endpoint [C],
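The test-minus-train criterion used above is a simple per-metric subtraction; a minimal sketch (metric names and numbers are illustrative, not the Table 9 values) is:

```python
def generalization_gap(train_metrics, test_metrics):
    """Test-minus-train differences for each metric, the criterion
    reported in Table 9 (a sketch; keys are illustrative)."""
    return {k: test_metrics[k] - train_metrics[k] for k in train_metrics}

# Hypothetical Endpoint [P] metrics for one reduced-feature model.
gap = generalization_gap({"mae": 2.0e-3, "mse": 8.0e-6},
                         {"mae": 2.2e-3, "mse": 9.5e-6})
print(gap)
```

A gap that stays small across Model-25/36/37 indicates the reduced feature sets do not induce extra overfitting.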

Figure 12 :
Figure 12: Running chart of prediction results of best models.

Figure 13 :
Figure 13: Interface of the prediction software.

Figure 14 :
Figure 14: Field application results of the best models.

Table 1 displays the features of tabular data and their distribution. It consists of features of the endpoint targets, information of hot metal, scraps, additives, blowing practices, and Temperature, Sample, Carbon (TSC) test-related features. The pre-set oxygen blown values came from the second calculation.

Table 1 :
Descriptive statistics of tabular data

The importance analysis results of the time series data demonstrate that the time series data exhibit comparable distributions of time step and channel importance for predicting Endpoint [P] and Endpoint [C]. Later-stage time steps and the channels of O₂-blow, flow, and CO are more crucial for predicting Endpoint [P]/[C].
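A channel-importance figure like the one described above can be obtained by permutation: shuffle one channel across heats and measure how much the error worsens. The sketch below uses a toy predictor and made-up data; the channel names and model are assumptions, not the paper's implementation.

```python
import random

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def channel_importance(predict, X, y, channel, trials=5, seed=0):
    """Permutation importance of one time-series channel: average MAE
    worsening when that channel is shuffled across samples (sketch)."""
    rng = random.Random(seed)
    base = mae(y, [predict(x) for x in X])
    worsening = 0.0
    for _ in range(trials):
        perm = [x[channel] for x in X]
        rng.shuffle(perm)
        X_perm = [{**x, channel: p} for x, p in zip(X, perm)]
        worsening += mae(y, [predict(x) for x in X_perm]) - base
    return worsening / trials

# Toy predictor that only reads the last value of the "CO" channel (assumption).
predict = lambda x: x["CO"][-1]
X = [{"CO": [0.1, 0.2], "flow": [1.0, 1.0]},
     {"CO": [0.3, 0.4], "flow": [1.0, 1.0]}]
y = [0.2, 0.4]
print(channel_importance(predict, X, y, "flow"))  # unused channel -> 0.0
```

Channels whose shuffling barely moves the MAE, like "flow" here, are unimportant to the fitted model.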

Table 2 :
Performance of ML models with tabular data in different tasks

Table 3 :
Performance of time series analysis models with time series data in different tasks

Table 4 :
Performance of image analysis models with time series data for different tasks. Bold values represent the best values for each task.

Table 5 :
Performance of concat-model with different backbones in different tasks

Table 6 :
Performance of paral-model with different backbones in different tasks

Although the conventional models were outperformed in this study, they still hold significance for reference and comparison purposes. The results are shown in Table 2.

2. Reduce resource consumption. By eliminating the TSO test, each heat can directly save $20–30 in disposable sampler consumption. Along with savings in cooling and driving energy consumption, even if only 50% of heats cancel the TSO test, each steel plant can save nearly half a million dollars annually.

Table 8 :
Results of comparison with other public works

Table 7 :
Results of cross validation

Table 9 :
The performance of models with different model complexity

Table 10 :
Results of changing the best models' complexities

1. The best prediction MAE of the SOTA ML models with 45 tabular features (features of hot metal, scraps, additives, and so on) for Endpoint [P]/[C]/Temp were 2.276 × 10⁻³% (Catboost), 6.916 × 10⁻³% (Catboost), and 7.955°C (LightGBM), respectively, which exceeded the performance of the conventional BP-NN and SVR models.
2. The time series data (off-gas profile and blowing practice related curves) have strong relationships with the targets. The targets can be predicted with only the original time series data and SOTA DL time series analysis models (1D ResCNN, TCN, OS-CNN, XCM, TST, LSTM-FCN, D-linear). For predicting Endpoint [C], the best accuracy (MAE = 6.791 × 10⁻³%, ResCNN) was better than that of the SOTA ML models with tabular features, but for predicting Endpoint [P] (MAE = 2.499 × 10⁻³%, OS-CNN) and Endpoint-Temp (MAE = 10.169°C, OS-CNN), the accuracies were lower. Using resized time series data and DL image analysis models (Pre-activation ResNet, DenseNet, DLA, DPN, GoogleNet, VIT) did not improve the accuracy.
3. Making predictions with both time series data and tabular data using the designed concat-model and paral-model led to a higher prediction accuracy. The concat-model with the backbone of TCN (MAE = 2.153 × 10⁻³% for Endpoint [P]) and ResCNN (MAE = 6.413 × 10⁻³% for Endpoint [C] and MAE = 5.780°C for Endpoint-Temp) performed the best, achieving the highest accuracies for each target in this study.
4. The best models (concat-model with backbones of TCN and ResCNN) have been applied in field production. The field test results showed that the prediction MAE for Endpoint [P]/[C]/Temp were 2.394 × 10⁻³%, 6.231 × 10⁻³%, and 7.679°C, respectively. The accuracy of predicting Endpoint [C] was better than on the test set, and the accuracies of predicting Endpoint [P]/Temp were near those of the test set. The results served as evidence of the developed models' strong robustness and generalization capability.
5. The off-gas related curves held less significance compared to those representing blowing practices for all targets. The onset and final stages of the curves influenced the prediction of Endpoint [P] and Endpoint [C] greatly, while the middle stage of the curves was more important for predicting Endpoint-Temp.