BOF steelmaking endpoint carbon content and temperature soft sensor model based on supervised weighted local structure preserving projection

Abstract: Endpoint control is a pivotal determinant of steel quality. However, the data derived from the BOF steelmaking process are high-dimensional, with intricate nonlinear relationships between variables and diverse working conditions. Traditional dimension reduction does not fully use the non-local structural information within manifold shapes. To address these challenges, this article introduces a novel approach termed supervised weighted local structure preserving projection. The method dynamically incorporates label information using sparse representation and constructs weighted submanifolds to mitigate the influence of irrelevant labels. Subsequently, trend match is employed to establish same-distribution datasets for each submanifold. Global and local initial neighborhood maps are then constructed, and non-local relations are extracted from the submanifolds by analyzing manifold curvature. This process eliminates interference from non-nearest-neighbor points on the manifold while preserving the local geometric structure, enabling adaptive adjustment of the neighborhood parameters. The proposed method enhances the adaptability of the model to changing working conditions and improves overall performance. Carbon content prediction achieves an accuracy rate of 82.50% within a controlled error range of ±0.02%, and temperature prediction achieves an accuracy rate of 79.00% within a controlled error range of ±10°C.


Introduction
The basic oxygen furnace (BOF) steelmaking method employs high-purity oxygen as an oxidant, harnessing both the physical heat generated by molten iron and the chemical heat from oxidation to fulfill steelmaking requirements [1]. BOF has emerged as the predominant steelmaking process due to its high capacity, cost-effectiveness, rapid equipment deployment, and superior product quality. During BOF steelmaking, oxygen is introduced into molten iron, followed by slagging operations and a sequence of physicochemical reactions, culminating in the attainment of the desired steel composition and temperature, termed the endpoint. Efficient and environmentally friendly production necessitates precise endpoint control, with carbon content and temperature serving as pivotal indicators of steel quality. Endpoint control methods in BOF steelmaking encompass both contact measurement and non-contact indirect observation techniques. Due to cost considerations, manual observation remains prevalent, with operators relying on visual cues from the BOF flame, albeit susceptible to factors like operator skill and experience. In recent years, the development of distributed control systems has facilitated data acquisition and analysis, enhancing precision and minimizing human subjectivity in process monitoring.
Advancements in computer technology have enabled the flexible application of soft sensor methods across diverse industrial processes, optimizing production costs while ensuring stability and effectively addressing the difficulty traditional methods have in measuring target variables continuously and accurately [2][3][4].
However, the dynamic nature of steel production, combined with varying raw material quality, presents challenges. Fluctuations in scrap iron quality and the inherent complexity of process mechanisms result in nonlinear characteristics, necessitating adaptive operating strategies [5,6]. Variations in impurity content, chemical composition, and melting points of raw materials directly influence furnace reactions, leading to changes in working conditions that necessitate adjustments to parameters such as oxygen pressure and gas-blowing volume. Consequently, constructing a static single model with high prediction accuracy becomes challenging due to disparities between furnace samples and the nonlinear nature of chemical processes [7].
To address the aforementioned challenges, this study employs a localized modeling approach to capture the dynamic fluctuations within process data. Just-in-time learning (JITL), representing a standard framework in local modeling, employs similarity metrics to select the most relevant historical samples for query samples from a historical database. It then constructs a local model for output prediction and adapts the model in real time as new query samples arise [8]. Yu [8] introduced the JITL based on data-driven (JITL-DD) strategy for process monitoring, entailing the construction of a nonlinear model and residual calculation to derive monitoring outcomes. Yang et al. [9] argued that a time series consists of a potential trend and a rapidly changing sequence of stochastic residuals. Building on this, the present study integrates the JITL concept with a trend match (TM) module, dynamically constructing a similar-trend dataset based on sample trend characteristics.
JITL modeling performance is intricately linked to the similarity metrics. However, in the context of BOF steelmaking, which involves multiple input variables, the model's data point requirements increase with dimension, leading to longer algorithm runtimes [10]. Moreover, metric effectiveness diminishes with rising data dimension, and the accuracy of DD soft sensors relies heavily on the similarity of sample distributions in training and test sets [7]. To address this issue, a dimension reduction model is introduced to simplify data complexity. Manifold learning aims to preserve the spatial geometric structure of the dataset, with classical algorithms such as Laplacian Eigenmap (LE) utilized to maintain local neighborhood structures by minimizing distances between projections of neighboring data points. The locality-preserving nature of the LE algorithm renders it relatively robust to outliers and noise, thereby emphasizing natural data clustering [11]. Previous research by Liu et al. [12] demonstrated the utility of the LE algorithm in hyperspectral image dimension reduction, highlighting enhanced resolution and correlation representation post-reduction. Similarly, Zhang et al. [11] applied LE for nonlinear dimension reduction in near-infrared spectral data, resulting in higher prediction accuracy and adept handling of nonlinear variable relationships.
However, the direct application of the LE algorithm to BOF process data is hindered by inherent variability resulting from process perturbations, control differences, sensor drift, equipment degradation, and environmental fluctuations. Even under identical raw material input and operating conditions, variability across different furnaces impedes complete production replication [13]. As a distance-based analysis method, K-nearest neighbor (KNN) uses Euclidean distance to determine the nearest neighbors, and the LE algorithm uses KNN to construct the adjacency matrix. Consequently, direct application of LE to BOF process data yields suboptimal performance, with local relationships among original data points compromised in low-dimensional embeddings. Li and Liu [14] proposed an adaptive LE algorithm that dynamically adjusts neighborhood parameters based on sampling density and manifold curvature, enhancing adaptability and robustness. Moreover, LE, as an unsupervised algorithm, overlooks label information, measuring similarity solely based on label-independent data characteristics [15]. Kanghua and Chunheng [16] imposed strict constraints on KNN graph construction, limiting neighboring points to those within the same category. However, in sparsely sampled scenarios, this method fails to adequately explore manifold geometry [17]. Additionally, the BOF steelmaking process necessitates distinguishing production data with diverse characteristics, such as mean, variance, and correlation, corresponding to different production requirements [18]. Li et al. [15] assumed that data located on a single manifold share the same label, while data belonging to different manifolds are labeled differently; for example, one person's face images are considered to lie on one manifold, while another person's face images are distributed on another manifold. Raducanu and Dornaika [19] introduced label information to split the Laplacian graph into intraclass and interclass graphs, using the average distance between each sample and all other samples as a threshold for determining class membership. However, this approach overlooks non-local relationships within the same labeled data and fails to analyze relationships between the corresponding manifolds of differently labeled data [20]. Despite considering label differences, it does not fully capture similarity within class-labeled data or dissimilarity between classes.
Drawing on the preceding analysis, this article introduces a novel supervised weighted local structure preserving projection (SWLSPP) algorithm designed to extract the labels most relevant to the current query sample via sparse representation (SR). Weights are then assigned to these labels based on their degree of relevance. Leveraging these relevant labels, TM is executed after the pertinent submanifold is extracted. This process constructs a uniformly and densely distributed dataset with similar data characteristics, thereby facilitating the construction of global and local initial neighborhood graphs. Subsequently, the non-local characteristics of the manifolds are elucidated through curvature analysis, enabling the removal of non-nearest neighbors on the manifolds to optimize neighbor selection. The resulting optimized graphs serve as inputs for the dimension reduction model, thereby resolving the issues of the original LE algorithm, namely the tendency to include non-nearest neighbors on the manifolds during neighborhood matrix construction and the lack of label information for weight assignment.
The contributions of this study can be summarized as follows:

1. In response to the absence of label information in the original LE algorithm, the frequent fluctuations in BOF conditions, and the differences in data characteristics across batches, an adaptive supervised neighborhood graph construction strategy is proposed. This strategy extracts the submanifolds relevant to new query samples based on their characteristics, eliminates the influence of irrelevant samples through TM, and retains data exhibiting similar trends to construct the initial neighborhood graph.

2. To address the issue of including non-nearest neighbors in the construction of the adjacency matrix, a manifold curvature analysis (MCA) strategy is proposed. This strategy adaptively optimizes the neighborhood parameters to accurately preserve local geometric structures.

3. An experimental investigation using actual BOF steelmaking process data is conducted to validate the effectiveness of the proposed method through ablation experiments and comparisons with other soft sensor methods.
The remainder of the article is structured as follows: Section 2 provides a review of relevant knowledge pertaining to the proposed method and its underlying motivation. Section 3 elucidates the structural arrangement of the proposed method, offering a detailed exposition of its algorithmic components. Section 4 delineates the soft sensor model for BOF steelmaking process data based on SWLSPP. Section 5 presents experimental results and analyses. Finally, Section 6 offers conclusions.

Relevant theories and research motivations
Section 2 delineates the components of the proposed algorithm and its rationale. It describes how SR and LE work, their mechanisms, the motivation for introducing them, and how they operate together within the proposed framework.

Sparse representation (SR)
The SR posits that a given sample can be expressed as a linear combination of other samples, where the sparse coefficients denote the extent to which the other samples contribute to reconstructing the given sample. Higher coefficients indicate stronger similarity. Consequently, the reconstruction coefficients serve as a measure of sample similarity [21]. Additionally, SR offers the benefits of few parameters and robustness to data noise. The SR process is shown in Figure 1.
The darker the color of f in the figure, the larger the coefficient and the greater the contribution of the corresponding sample in the dictionary X^T to reconstructing the given sample x_t. Given a sample set X = [x_1, x_2, …, x_n] ∈ R^(m×n), where m is the sample dimension and n is the total number of samples, there exist a query sample x_t and the pre-given sample set X. It is customary to refer to the matrix X as an over-complete dictionary and each x_k as an atom. Finding the sparsest representation of x_t using the dictionary X requires minimizing the objective function

min_f ‖f‖_0  s.t.  x_t ≈ Xf,  (1)

where f is the sparse coefficient vector and ‖f‖_0 is the l_0-norm, which denotes the number p of nonzero entries in the vector f; x_t can thus be represented linearly by the atoms satisfying x_t ≈ Xf.
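Because the l_0 objective in equation (1) is NP-hard in general, it is usually approximated greedily. The sketch below uses orthogonal matching pursuit as an illustrative stand-in (the article does not prescribe a particular solver) to obtain a sparse coefficient vector f for a query sample x_t:

```python
import numpy as np

def omp(X, x_t, n_nonzero):
    """Greedy orthogonal matching pursuit: approximates the l0 problem
    min ||f||_0 s.t. x_t ~ X f by selecting one atom per iteration."""
    residual = x_t.copy()
    support = []
    f = np.zeros(X.shape[1])
    for _ in range(n_nonzero):
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(X.T @ residual)))
        if j not in support:
            support.append(j)
        # least-squares refit of the coefficients on the chosen atoms
        coef, *_ = np.linalg.lstsq(X[:, support], x_t, rcond=None)
        f[:] = 0.0
        f[support] = coef
        residual = x_t - X @ f
    return f
```

The nonzero entries of the returned f then indicate which historical samples contribute to reconstructing x_t, matching the role of the sparse coefficients in Figure 1.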

Laplacian eigenmap (LE)
LE, as one of the representative nonlinear manifold learning methods, finds the low-dimensional embedding of the original data by preserving the similarity relationship between local points, which can better explain the geometric structure of nonlinear manifolds [15,19].
The steps for realization are as follows.
(1) Create a nearest neighbor graph G using data points as nodes, with edges generated by the k-NN method.

(2) Create a weight matrix W. Using a heat kernel, compute the edge weights w_ij:

w_ij = exp(−‖x_i − x_j‖² / σ) if x_i and x_j are neighbors; w_ij = 0 if x_i and x_j are not neighbors, where σ ∈ R.

(3) Feature mapping. To find the low-dimensional embedding, the eigenvalues and corresponding eigenvectors are calculated. Eigenvalue decomposition is applied to the Laplacian matrix L = D − W, i.e., Ly = λDy, where D is a diagonal matrix satisfying D_ii = Σ_j w_ij, and λ and y are the generalized eigenvalues and eigenvectors, respectively. The coordinates of the reconstructed space are the eigenvectors associated with the d smallest non-zero eigenvalues (d ≪ M, where d is the number of low dimensions and M is the number of dimensions of the original high-dimensional space).
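The three steps above (kNN graph, heat-kernel weights, generalized eigenproblem Ly = λDy) can be sketched numerically as follows; the parameter defaults are illustrative, not the article's:

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmap(X, k=5, sigma=1.0, d=2):
    """Minimal LE sketch: kNN graph -> heat-kernel weight matrix W ->
    generalized eigenproblem L y = lambda D y with L = D - W."""
    n = X.shape[0]
    # pairwise squared Euclidean distances
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(sq[i])[1:k + 1]          # skip the point itself
        W[i, nbrs] = np.exp(-sq[i, nbrs] / sigma)  # heat-kernel weights
    W = np.maximum(W, W.T)                         # symmetrize the graph
    D = np.diag(W.sum(1))
    L = D - W
    # generalized eigenvectors; drop the trivial constant eigenvector
    vals, vecs = eigh(L, D)
    return vecs[:, 1:d + 1]
```

The columns of the returned array are the low-dimensional coordinates associated with the d smallest non-zero eigenvalues.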
The traditional LE algorithm is only suitable for uniformly distributed flat manifolds; for sparsely distributed or highly distorted manifolds [14], it easily and mistakenly includes non-nearest-neighbor points on the manifolds in the adjacency graphs, which incorrectly extracts the local geometric structures and interferes with regression modeling. Meanwhile, the traditional LE algorithm, as an unsupervised dimension reduction method, makes no use of label information, so differences between labels are not reflected in the weight assignment when constructing the neighborhood graph.

Supervised weighted local structure preserving projection (SWLSPP)
The LE algorithm is a locality-preserving manifold learning approach designed to uncover a low-dimensional representation of high-dimensional data by maintaining the local geometry among the original data points [15]. However, numerous experiments have demonstrated that the efficacy of dimension reduction heavily relies on accurately setting the neighborhood size. Furthermore, LE fails to leverage labeled data information. When the original LE algorithm is used to construct the nearest-neighbor graph, samples with disparate labels may exhibit shorter Euclidean distances than those with identical labels, making points with distinct labels indistinguishable in the dimension reduction outcomes [15].
To tackle the aforementioned challenges, this article introduces an SWLSPP algorithm for BOF steelmaking production process data. Its objective is to derive a low-dimensional embedding of the high-dimensional data, which serves as input for the regression model. A diagram of the SWLSPP algorithm is shown in Figure 2, where the left part of the diagram corresponds to Sections 3.1 and 3.2, and the right part corresponds to Section 3.3.

Supervised sparse weights (SSW)
Supervisory information significantly influences feature extraction [15]. Utilizing labels allows for the categorization of data with the same or similar labels, facilitating the extraction of differentiation information concealed within the original data. While the original LE algorithm can extract the low-dimensional embedding of high-dimensional data if all data points are situated on a single continuous manifold, BOF steel production process data typically exhibit various operational conditions, resulting in complex nearest-neighbor relationships among data clusters. Assuming that data with identical or similar labels reside on distinct manifolds, the dataset manifests a multi-manifold structure, necessitating the extraction of information from the corresponding manifold for data with each different label [15,19].

Although raw LE adeptly captures and preserves local manifold shape information within the dataset, it falls short in segregating differently labeled data. To address this limitation, this section introduces an SSW module. Using label information, this module divides the historical dataset into relevantly and irrelevantly labeled subsets based on the characteristics of the current query sample. Subsequently, it quantifies the relevance of each labeled set by assigning weights, thereby emphasizing label information.

Definition 1. Supervised sparse weighting module (SSW)
Step 1: The current query sample x_t is represented as a linear combination of the historical sample set X = [x_1, x_2, …, x_n] ∈ R^(m×n) (m is the sample dimension and n is the total number of samples) by SR, from which the sparse coefficient f is obtained. The larger a coefficient, the greater the similarity of the query sample x_t to the corresponding historical sample, so the sparse coefficient f is regarded as a similarity metric between x_t and the historical sample set X. The sparse coefficient f is calculated from equation (1).

Step 2: The labels of the historical samples corresponding to the indices of the non-zero terms in the sparse coefficient f are extracted, where p is the number of non-zero terms in f.

Step 3: Removing duplicate labels gives the set of related labels l_label = {l_1, …, l_q}, q ≤ p.

Step 4: The accumulated non-zero values of the sparse coefficients corresponding to each related label are used as the degree of relevance W of the different labels for the current query sample x_t.
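Steps 2-4 amount to grouping the non-zero sparse coefficients by the label of the corresponding historical sample and accumulating them into label weights. A small sketch (normalizing the weights to sum to 1 is an assumption of this sketch, not stated in the definition):

```python
import numpy as np

def label_weights(f, labels):
    """SSW Steps 2-4: keep the non-zero sparse coefficients, group them
    by the label of the corresponding historical sample, and sum them
    into per-label relevance weights."""
    nz = np.flatnonzero(f)
    related = {}
    for idx in nz:
        related[labels[idx]] = related.get(labels[idx], 0.0) + abs(f[idx])
    total = sum(related.values())
    # normalize so the relevance weights of the related labels sum to 1
    return {lab: w / total for lab, w in related.items()}
```

Labels that receive no non-zero coefficient simply do not appear in the result, which is how irrelevant labels are excluded from subsequent processing.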
An SSW module is employed to disentangle historical sample sets from the influence of irrelevantly labeled data clusters, particularly in scenarios with complex distributions and multiple working conditions. This module quantifies the degree of association among relevantly labeled datasets, thereby mitigating the interference of irrelevant information in subsequent processing.

Trend match (TM)
While process data from distinct stages may exhibit varying numerical performance due to differences in the initial conditions of the BOF and in raw material batches, it is theoretically expected that the autocorrelation and inter-correlation among process variables within each stage adhere to intrinsic mechanistic relationships [22]. In the context of regression tasks, capturing similar-trend samples is crucial for preserving their shared characteristics and production trends.

To address this, we introduce the TM module. This module identifies historical data with trends analogous to the query sample x_t within the datasets associated with the different relevant labels, by assessing the synchronization of change trends between samples, the similarity of underlying shapes, and comparisons of numerical values. Modeling and prediction are then conducted on the basis of these evaluations. By analyzing and extracting similar trends, we can better understand and leverage their common characteristics, thereby enhancing the accuracy and reliability of the soft sensor model (Figure 3).

Definition 2. Trend match module (TM).
Step 1: A quantitative evaluation of the degree of synchronization of the change trends and the degree of similarity of the base shapes between the historical samples and the query sample x_t is made based on the Pearson correlation coefficient, calculated as

ρ(x_i, x_t) = cov(x_i, x_t) / (σ_{x_i} σ_{x_t}),

where cov(x_i, x_t) is the covariance between x_i and x_t, x_i is the ith historical sample, x_t is the query sample, and σ_* is the corresponding standard deviation. The customized synchronization degree threshold C is determined as the average correlation over the current sub-TM:

C = (1/N) Σ_{i=1}^{N} ρ(x_i, x_t),

where N is the number of historical samples involved in the current sub-TM.

Step 2: The historical samples with ρ(x_i, x_t) ≥ C form the initial filtered dataset. Considering the effect of different raw material ratios and furnace parameters on the process data values, the difference in numerical performance is calculated on the basis of the initial filtration, and samples whose numerical difference from x_t is less than the threshold δ_1 are retained to form the filtered dataset X_1, where δ_1 is the TM threshold and its value is determined experimentally. This yields the TM result, i.e., the dataset X_PF, which contains the samples in the historical dataset that have the same characteristics and trends as x_t, reflecting similar physical properties, chemical compositions, furnace conditions, etc.
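The two filtering stages can be sketched as follows; treating C as the mean correlation and measuring the numerical difference as a mean absolute difference are assumptions of this sketch:

```python
import numpy as np

def trend_match(X_hist, x_t, delta1):
    """TM sketch (Definition 2): keep historical samples whose trend
    correlation with the query is at least the average correlation C,
    then keep those whose numerical difference is below delta1."""
    # Step 1: Pearson correlation of each historical sample with x_t
    rho = np.array([np.corrcoef(x, x_t)[0, 1] for x in X_hist])
    C = rho.mean()                          # synchronization threshold
    stage1 = X_hist[rho >= C]
    # Step 2: numerical-difference filter (mean absolute difference)
    diff = np.abs(stage1 - x_t).mean(axis=1)
    return stage1[diff < delta1]
```

The returned rows play the role of X_PF: historical furnaces whose trends and numerical levels both match the query.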

Manifold curvature analysis and adjacency matrix construction (MCA & AMC)
Traditional LE algorithms hinge on the local neighborhood settings accurately reflecting the internal manifold structure [14]. Oversized parameters are prone to eliminating small-scale manifold structures, leading to short-circuiting, while undersized parameters may cause manifold splitting [23]. The k-NN often generates distorted neighborhood structures for sparse and noisy data [23], exacerbating the short-circuiting phenomenon. Short-circuiting refers to the proximity of folded surfaces on the manifold, which causes the search for nearest neighbors to include points from different folded surfaces rather than neighboring points on the manifold, thereby necessitating neighborhood optimization. As depicted in Figure 4, the red circle illustrates the neighborhood range constructed by the traditional distance metric, wherein point a serves as the origin and points c and d from the other folded plane are erroneously included. Even when employing geodesic distance alone as the distance metric, depicted in the gray section of the figure, point c, which does not reside in the same local linear plane, may still be considered a neighboring point. Hence, both metrics must collaborate in analyzing the curvature of the manifold in high-dimensional space to accurately construct neighborhoods. Given the non-uniform distribution of BOF steelmaking process data, the traditional k-NN can construct oversized neighborhoods in sparse regions. To address this, we introduce the MCA module in this section. This module incorporates geodesic distance proximity and filters out non-nearest neighbors after computing the curvature of local manifold shapes, enabling adaptive adjustment of neighborhood parameters to enhance model performance. Subsequently, the corresponding weights from the SSW module for each labeled dataset are merged and applied to each local neighborhood matrix. These matrices are then fused with the global neighborhood map to derive the fused neighborhood map for participation in dimension reduction.

Definition 3. MCA module
Step 1: The k-neighborhood of each point in X_PF is computed using Euclidean distances, forming the initial neighborhood graph.

Step 2: Solve for the geodesic distance d_g(x_i, x_j) between any two points on the initial neighborhood graph. If x_i and x_j are close neighbors of each other, the geodesic distance equals the Euclidean distance, d_g(x_i, x_j) = d_e(x_i, x_j), as shown in equation (6), where N(i) is the neighborhood of x_i; otherwise, the geodesic distance is approximated by the shortest path through intermediate points x_h, i.e., d_g(x_i, x_j) = min_h {d_g(x_i, x_h) + d_g(x_h, x_j)}.

Step 3: Using the intermediate points x_h, iteratively compute d_g(x_i, x_j) for all pairs of points.

Step 4: Adjust the neighborhood parameters based on the manifold curvature. Calculate the ratio of the Euclidean distance to the geodesic distance between x_i and each of its neighboring points as the manifold curvature:

λ = d_e(x_i, x_j) / d_g(x_i, x_j).

When λ is greater than the curvature analysis threshold δ_2, x_i and x_j are considered to be on the same sub-manifold surface; otherwise, they are considered to lie on different folded surfaces and not to be manifold nearest neighbors, and x_j is removed from the neighborhood N(i). This achieves the curvature analysis of the manifolds and the optimization of the neighborhood parameters, yielding the optimized curvature-analyzed neighborhood graph G_MCA.
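One plausible reading of Definition 3 is sketched below. Geodesic distances are approximated by shortest paths on a conservative k0-NN subgraph, and the larger candidate neighborhoods of size k are then pruned by the ratio d_e/d_g; the split between a trusted k0 graph and a larger candidate neighborhood is an assumption of this sketch (for an edge used in the geodesic graph itself the ratio is identically 1):

```python
import numpy as np

def mca_filter(X, k=5, k0=2, delta2=0.9):
    """MCA sketch: candidate k-NN neighbors whose curvature ratio
    d_e/d_g does not exceed delta2 are treated as lying on a different
    folded surface and removed from the neighborhood."""
    n = len(X)
    d_e = np.sqrt(((X[:, None] - X[None]) ** 2).sum(-1))
    order = np.argsort(d_e, axis=1)
    # geodesic graph built from the k0 closest (most trusted) neighbors
    d_g = np.full((n, n), np.inf)
    np.fill_diagonal(d_g, 0.0)
    for i in range(n):
        for j in order[i, 1:k0 + 1]:
            d_g[i, j] = d_g[j, i] = d_e[i, j]
    for h in range(n):            # Floyd-Warshall shortest paths
        d_g = np.minimum(d_g, d_g[:, [h]] + d_g[[h], :])
    # curvature-ratio filter on the larger candidate neighborhoods
    nbrs = {}
    for i in range(n):
        cand = order[i, 1:k + 1]
        nbrs[i] = {int(j) for j in cand if d_e[i, j] / d_g[i, j] > delta2}
    return nbrs
```

On two parallel "folds" that are close in Euclidean distance but far (or disconnected) along the graph, the ratio collapses toward zero for cross-fold candidates, so they are pruned while same-fold neighbors (ratio near 1) are kept.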
Definition 4. Adjacency matrix construction module (AMC)

Step 1: The MCA module is applied to each related labeled dataset X_PF_i, and the corresponding label weight obtained from the SSW module is applied to the resulting neighborhood graph G_XPFi. After that, the graphs G_XPFi are sequentially connected head to tail in block-diagonal form and fused into a locally optimized neighborhood graph G_L.
Step 2: The different labeled sets X_PF_i (i = 1, 2, …, q) obtained from Definitions 1 and 2 are sequentially merged to obtain the global similarity trend dataset, which serves as the input to the MCA module to obtain the globally optimized neighborhood graph G_G.
Step 3: The globally optimized neighborhood graph G_G and the locally optimized neighborhood graph G_L are fused, i.e., G = G_G + G_L. The resulting fused neighborhood graph G is used as the input to the subsequent dimension reduction model.

Soft sensor modeling based on SWLSPP

The data from BOF steelmaking processes exhibit high dimension, multiple working conditions, and uneven distribution. Given the limited performance of traditional models, this part proposes an SWLSPP soft sensor model for predicting the endpoint carbon content and temperature of BOF steelmaking, built upon the traditional LE algorithm. The overall soft sensor modeling steps are delineated as follows:

Step 1: Utilize SR to identify the most relevant labels and their corresponding weights for the query samples, thereby eliminating interference from irrelevant labels.
Step 2: Apply TM within each valid label's data set to extract samples akin to the working conditions of the query samples, filtering them for participation in modeling.
Step 3: Based on the TM outcomes, introduce the MCA module to adaptively optimize neighborhood parameters. This facilitates the construction of both the globally optimized neighborhood graph G_G and the locally optimized neighborhood graph G_L.
Step 4: Generate predictions for the endpoint carbon content and temperature of the current query sample.
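The graph fusion of Definition 4 (weighting each label's local graph, block-diagonal concatenation into G_L, then G = G_G + G_L) can be sketched as follows; the assumption that samples are ordered label by label, so that G_G aligns with the block-diagonal layout, is ours:

```python
import numpy as np
from scipy.linalg import block_diag

def fuse_graphs(local_graphs, label_w, G_G):
    """AMC sketch: apply each label's SSW weight to its local
    neighborhood graph, connect the weighted graphs head to tail in
    block-diagonal form (G_L), and add the global graph (G = G_G + G_L)."""
    weighted = [w * G for w, G in zip(label_w, local_graphs)]
    G_L = block_diag(*weighted)
    return G_G + G_L
```

The fused matrix G then serves as the adjacency input to the dimension reduction step.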
The algorithm pseudo-code is shown below. Meanwhile, the experiments were run in MATLAB 2021a on a computer equipped with an Intel(R) Core(TM) i7-12700H CPU @ 2.3 GHz and 16 GB RAM.

Experiments and analyses
To validate the efficacy of the proposed method, simulation experiments are conducted using real BOF steelmaking process data, following the modeling process outlined in Section 4.

Evaluation indicators
In order to evaluate the prediction performance, regression accuracy (RA), root mean square error (RMSE), and mean absolute percentage error (MAPE) are used as metrics in this article. Among these, RA is the most important; it indicates the prediction accuracy of the model on the test set within the allowable error. The formula for RA is

RA = (1/N_test) Σ_{i=1}^{N_test} f(ỹ_i, y_i) × 100%,

where N_test is the number of test samples, ỹ_i and y_i are the predicted and actual values of the ith output endpoint, respectively, and f(ỹ_i, y_i) is the conditional function, which outputs 1 (a prediction hit) if the difference between the two values is within the tolerable error range Te. RMSE and MAPE are used to assess the regression model performance; the smaller the value, the better the model performance:

RMSE = sqrt( (1/N_test) Σ_{i=1}^{N_test} (ỹ_i − y_i)² ),

MAPE = (100%/N_test) Σ_{i=1}^{N_test} |ỹ_i − y_i| / |y_i|.
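The three metrics translate directly into code; a minimal sketch, where the tolerance argument plays the role of Te (±0.02% for carbon, ±10°C for temperature in this article):

```python
import numpy as np

def ra(y_pred, y_true, tol):
    """Regression accuracy: percentage of predictions within +/- tol."""
    return 100.0 * np.mean(np.abs(y_pred - y_true) <= tol)

def rmse(y_pred, y_true):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def mape(y_pred, y_true):
    """Mean absolute percentage error, in percent."""
    return 100.0 * np.mean(np.abs(y_pred - y_true) / np.abs(y_true))
```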

Data introduction and parameter setting
The experimental data were obtained from actual BOF steelmaking production data at a steel plant, with the target (output) variables being the endpoint carbon content and temperature. These samples were collected under different production conditions owing to changes in raw material origins leading to quality differences between batches, adjustments in raw material ratios and furnace parameters, and changes in production conditions. First, the original dataset is checked for missing data and outliers, and the corresponding samples are removed. Then, min-max scaling maps each variable to the range [0, 1] using its minimum and maximum values, as shown in equation (14):

v'_i = (v_i − V_min) / (V_max − V_min),  (14)

where V_max and V_min are the maximum and minimum values of each variable in the original data, respectively, and v'_i is the normalized value of the variable in the range [0, 1]. Normalization is a crucial and effective preprocessing step for distance-based algorithms; it equalizes the influence and importance of feature scales and thus prevents features with wide ranges or higher magnitudes from having a disproportionate influence on learning [26]. A total of 3,000 furnace samples under normal operating conditions are collected, randomly divided into 2,800 furnaces for the training set and 200 furnaces for the test set. As the raw process data contain a large number of redundant features, 33-dimensional raw features are selected as input variables for the model according to feature importance, based on previous research [24,25] (Table 1).
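Equation (14) applied column-wise to a data matrix can be sketched as follows (assuming no column is constant, so the denominator is non-zero):

```python
import numpy as np

def min_max_scale(V):
    """Column-wise min-max normalization to [0, 1], equation (14)."""
    V = np.asarray(V, dtype=float)
    v_min = V.min(axis=0)
    v_max = V.max(axis=0)
    return (V - v_min) / (v_max - v_min)
```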
Table 2 shows the optimal parameter values corresponding to the experiments.

Impact of TM thresholds and curvature analysis thresholds
The TM threshold δ 1 and the curvature analysis threshold δ 2 can have a great influence on the prediction accuracy.
For the prediction of carbon content, the value interval of δ_1 is set to [0.0008, 0.003] with a step size of 0.0001, and the value interval of δ_2 is set to [0.86, 0.95] with a step size of 0.01; the effects of different threshold values on the results are analyzed, and the results are shown in Figure 5. As can be seen from the figure, the minimum value of RMSE occurs at δ_1 = 0.001 and δ_2 = 0.9, where the RA reaches its highest value. The graph shows a clear inverse relationship between RMSE and RA, which indicates that the model predictions fit the real values well and that the model performs well.
For the prediction of temperature, the interval of δ_1 is set to [0.0004, 0.001] with a step size of 0.0001, and the interval of δ_2 is set to [0.87, 0.94] with a step size of 0.01; the effects of different threshold values on the results are analyzed, and the results are shown in Figure 6. As can be seen from the figure, the minimum value of RMSE occurs at δ_1 = 0.0006 and δ_2 = 0.88, where the RA reaches its highest value. In Figure 6, the RMSE and RA trends are similar in the beginning stage and then gradually show an inverse relationship as tuning proceeds. This is because unreasonable parameter settings in the beginning stage cause the model to underfit and fail to fit the data well; with tuning, the model gradually improves.

Impact of the number of nearest neighbor points
The LE algorithm is used as a local dimension reduction model, and the local geometry of the manifold shape is usually constructed based on the k-NN [15]. For carbon content prediction, experiments were carried out in the interval [16, 30] to determine the number of nearest neighbor points k, and for temperature prediction, in the interval [10, 24], both with a step size of 1. It was verified experimentally that, under the optimal settings of the other parameters, the best carbon content and temperature prediction performance was achieved with k = 24 and k = 11, respectively. Figures 7-10 show some of the experimental results, and Table 3 summarizes them; the larger the sphere volume, the higher the RA generated by the corresponding parameters. From Figures 7 and 8, it can be seen that, as the number of nearest neighbor points k is adjusted, the RA of the model for carbon content prediction first increases and then decreases, reaching a peak at k = 24. This shows that when k = 24, the dimension reduction results retain sufficient data structure information, so that the model can better capture the neighbor relationships between data points. As the k value continues to increase, the performance of the model begins to decline: a larger nearest-neighbor setting may increase sensitivity to noise, so that the extracted low-dimensional embedding is strongly affected by noise.
As Figures 9 and 10 show, as the number of nearest neighbor points k increases, the irrelevant information included in the construction of the adjacency matrix gradually grows, and the RA tends to decrease significantly. With more neighboring points, the model searches for adjacent points more widely in the high-dimensional space, so that mutually distant points participate in constructing the adjacency matrix; this introduces irrelevant information and reduces the RA.
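The k-NN graph construction and embedding at the heart of LE can be sketched as follows. This is a minimal textbook implementation of unnormalized Laplacian Eigenmaps with a binary k-NN graph, not the paper's exact code; the toy data and sweep range are assumptions for illustration, and in practice each embedding would be scored by the downstream regressor's RA.

```python
import numpy as np


def laplacian_eigenmaps(X, k, d):
    """Minimal unnormalized Laplacian Eigenmaps: build a symmetric
    binary k-NN graph, form L = D - W, and embed with the eigenvectors
    belonging to the d smallest nonzero eigenvalues."""
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist[i])[1:k + 1]   # skip the point itself
        W[i, nbrs] = 1.0
    W = np.maximum(W, W.T)                     # symmetrize the graph
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)                # eigenvalues ascending
    return vecs[:, 1:d + 1]                    # drop the constant eigenvector


# Sweep k over the carbon-content interval [16, 30] on toy data.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))
embeddings = {k: laplacian_eigenmaps(X, k, d=2) for k in range(16, 31)}
```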

Impact of low dimension
Accurate setting of the low dimension can effectively retain the spatial geometric information of the data and improve the performance of the dimension reduction model. Experiments were carried out in the interval [6, 16] with a step size of 1 to determine the low dimension d. It was experimentally verified that, with the optimal settings of the other parameters, the best prediction of carbon content and temperature was achieved with d = 6 and d = 7, respectively (Figure 11). Some of the experimental results are shown in Figures 12 and 13, where the larger the volume of the sphere, the higher the regression accuracy generated by the corresponding parameters (Table 4).
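The sweep over the low dimension d can be sketched as below. This is a hypothetical stand-in: PCA via SVD replaces the paper's LE-based reduction, and a closed-form ridge regressor replaces the actual soft sensor, purely to show the shape of the tuning loop over d ∈ [6, 16].

```python
import numpy as np


def rmse_for_dim(X, y, d, lam=1e-2):
    """Embed X into d dimensions (PCA via SVD, a stand-in for the
    paper's LE-based reduction) and score a closed-form ridge
    regressor fitted on the embedding."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:d].T                                # d-dim embedding
    w = np.linalg.solve(Z.T @ Z + lam * np.eye(d), Z.T @ y)
    return float(np.sqrt(np.mean((Z @ w - y) ** 2)))


# Sweep d over the interval [6, 16] with a step size of 1 and keep
# the dimension with the lowest RMSE on toy data.
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 20))
y = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=80)
scores = {d: rmse_for_dim(X, y, d) for d in range(6, 17)}
best_d = min(scores, key=scores.get)
```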

Ablation experiment
Ablation experiments are conducted to validate the effectiveness of the proposed method. As depicted in Figures 13 and 14, the SSW_MCA&AMC configuration lacks the TM module to filter out samples with significant discrepancies. Consequently, the model extracts information with greater interference, resulting in the poorest fitting performance observed in Figure 14(b), particularly evident as larger errors at jumps in the true values.
Conversely, TM_MCA&AMC effectively eliminates irrelevant information but fails to distinguish and analyze the submanifolds. It relies on the spatial geometric information of the overall dataset, neglecting the internal geometric structure unique to the submanifold of each differently labeled subset, a distinction crucial for a comprehensive understanding of the dataset.
Meanwhile, SSW_TM_AMC augments information by incorporating features from the former two configurations. However, it overlooks the removal of non-nearest-neighbor points within neighborhoods, adversely impacting the model's performance.

Experimental comparison results with other algorithms and analysis
Bold values indicate that the proposed method has the highest prediction accuracy compared with the other algorithms. CJS-SLLE and S-LE2 effectively enhance model performance, improving carbon content prediction by 20 and 23% and temperature prediction by 20 and 20.5%, respectively. However, neglecting difference information still limits the improvement in fitting degree, as shown in Figure 16(b) and (d).
(3) For the S-LE1 algorithm, the average distance between each sample and all samples is used as the threshold for dividing the set of similar nearest neighbors from the set of dissimilar nearest neighbors when constructing the nearest neighbor graph. This does not account well for the similarity between samples within a class and the differences between dissimilar samples, increases the computational cost, and, because the algorithm targets classification problems, it performs poorly in regression tasks. Compared with S-LE2, S-LE1 lacks consideration of folded surfaces and manifold curvature in the high-dimensional space and includes non-nearest-neighbor points when constructing the nearest neighbor graph, which disperses the denser regions of the original data into multiple parts and affects the accuracy of the dimension reduction results. On the BOF steelmaking process data, S-LE1 improves on the original LE algorithm by 16% and 11.5% when predicting carbon content and temperature, respectively.
(4) According to Figures 15 and 16, comparing the real and predicted values of each soft sensor model shows that the SWLSPP model has the smallest error between real and predicted values, reflecting the effectiveness of the soft sensor modeling and realizing real-time, accurate prediction of the carbon content and temperature at the endpoint of BOF steelmaking.
(5) Table 7 shows the endpoint prediction results of the proposed SWLSPP algorithm and the comparative algorithms on the 78 newly obtained furnace samples. Due to differences in data distribution between the new furnace dataset and the original training and test sets, the regression accuracies of some algorithms are reduced by 1.73-2.46% compared with Table 6. Meanwhile, the new dataset has a smaller number of samples, which may prevent a full evaluation of the model's performance.
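The mean-distance threshold rule attributed to S-LE1 in point (3) can be sketched as follows. This is a small illustrative implementation of that one rule under the description above; the function name and toy data are assumptions, not the original authors' code.

```python
import numpy as np


def split_neighbors_by_mean_distance(X, i):
    """S-LE1-style split: the mean distance from sample i to all
    other samples is the threshold separating its set of 'similar'
    nearest neighbors from the 'dissimilar' ones."""
    d = np.linalg.norm(X - X[i], axis=1)
    mask = np.arange(len(X)) != i            # exclude the sample itself
    thresh = d[mask].mean()
    similar = np.where(mask & (d <= thresh))[0]
    dissimilar = np.where(mask & (d > thresh))[0]
    return similar, dissimilar


# Toy check: every other sample lands in exactly one of the two sets.
rng = np.random.default_rng(2)
X = rng.normal(size=(30, 5))
sim, dis = split_neighbors_by_mean_distance(X, 0)
```

Note that because the threshold is a global mean rather than a class-aware quantity, distant points from other regions of the manifold can still enter the "similar" set, which is precisely the weakness point (3) identifies.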

Conclusion
Addressing the challenges posed by the high dimension and varying working conditions of BOF steelmaking process data, where traditional static models often struggle to adapt to changing conditions, this article proposes a soft sensor model based on SWLSPP for predicting carbon content and temperature at the endpoint of BOF steelmaking. By integrating label information into the dimension reduction process, the SWLSPP model mitigates interference from irrelevant data and precisely preserves local spatial structures through analysis of non-local relationships within the datasets corresponding to different labels. Through comprehensive modeling and simulation of BOF steelmaking process data, the proposed SWLSPP model undergoes ablation experiments and comparative analysis against alternative dimension reduction soft sensor algorithms. Evaluation across various criteria on the carbon content and temperature datasets demonstrates the superior performance of the proposed algorithm. With the carbon content prediction error controlled within ±0.02%, the accuracy reaches 82.50%; with the temperature prediction error controlled within ±10°C, the accuracy reaches 79.00%.

Figure 1 :
Figure 1: Schematic diagram of the SR.

Figure 3 :
Figure 3: Schematic of SSW and TM.

Step 1 :
For the current query sample $x_t$, from Definitions 1 and 2, we obtain the differently labeled sample clusters $X_{PF_i}$ $(i = 1, 2, \cdots, q)$ and the corresponding weights $W_i$. The weights are applied to $G_{X_{PF_i}}$, respectively, i.e.,

Figure 7 :
Figure 7: Effect of different numbers of nearest neighbor points k on the model. The larger the volume of the sphere in the figure, the higher the accuracy of the regression (carbon content).

Figure 8 :
Figure 8: Effect of different numbers of nearest neighbor points k on the model (carbon content).

Figure 9 :
Figure 9: Effect of different numbers of nearest neighbor points k on the model.The larger the volume of the sphere in the figure, the higher the accuracy of the regression (temperature).
The regression model remains unchanged throughout the experiments, with model prediction performance contingent on the TM efficacy, neighborhood parameter optimization, and integration of SSWs. To assess the efficacy of the SWLSPP model, it undergoes ablation along the following lines: (1) ablation of SSWs, using TM and MCA & AMC, which eliminates the interference of widely varying samples and of non-nearest neighbors on the manifold but lacks the use of label information and the differentiation between sets of differently labeled data; this experiment is denoted TM_MCA & AMC (Table 5). (2) Ablation of TM: although the SSW module introduces label information, it lacks a mechanism to remove interference from samples of irrelevant working conditions; this experiment is denoted SSWs_MCA & AMC. (3) Ablation of MCA: the screening of non-nearest-neighbor points on the manifold is omitted; this experiment is denoted SSWs_TM_AMC. Subsequent experiments are based on production process data, with the model parameters of each method adjusted to their optimal values.

Figure 11 :
Figure 11: Effect of different low dimension d on the model.The larger the volume of the sphere in the figure, the higher the accuracy of the regression (carbon content).

Figure 10 :
Figure 10: Effect of different numbers of nearest neighbor points k on the model (temperature).

Figure 12 :
Figure 12: Effect of different low dimension d on the model.The larger the volume of the sphere in the figure, the higher the accuracy of the regression (temperature).

Figure 14 :
Figure 14: Comparison of results from ablation experiments (carbon content on the left, temperature on the right).(a) TM_MCA&AMC, (b) SSW_MCA&AMC, and (c) SSW_TM_AMC.

Figure 15 :
Figure 15: Comparison results with other algorithms: (a) carbon content and (b) temperature.

Table 1 :
Detailed information on the data set of the BOF steel production process

Table 2 :
Experimental parameter settings

Table 3 :
Effect of different numbers of nearest neighbor points k and low dimensionality d on RA

Table 4 :
Effect of different low dimension d on RA

Table 5 :
Comparison of carbon content and temperature prediction results of the ablation experiments. Bold values indicate that the proposed method has the highest prediction accuracy compared with the other configurations.

Table 6 :
Comparison of carbon content and temperature prediction results of the comparison experiments