Improving Stockline Detection of Radar Sensor Array Systems in Blast Furnaces Using a Novel Encoder–Decoder Architecture

The stockline, which describes the measured depth of the blast furnace (BF) burden surface over time, is important for operators executing an optimized charging operation. In the harsh BF environment, noise interference and aberrant measurements are the main challenges of stockline detection. In this paper, a novel encoder–decoder architecture consisting of a convolutional neural network (CNN) and a long short-term memory (LSTM) network is proposed, which suppresses noise interference, classifies distorted signals, and regresses the stockline in a learned manner. By leveraging the LSTM, we are able to model a longer history of measurements for robust stockline tracking. Compared to traditional hand-crafted denoising, considerable time and effort can be saved. Experiments are conducted on an actual eight-radar array system in a blast furnace, and the effectiveness of the proposed method is demonstrated on real recorded data.


Introduction
Blast furnaces (BFs) are the key reactors of iron and steel smelting and consume about 70% of the energy (i.e., coal, electricity, fuel oil, and natural gas) in the steel-making process [1,2]. In iron-making, solid raw materials, e.g., iron ore, coke, and limestone, are violently burned and consumed over time, and the charging operation needs to be executed by accurately estimating the current depth of the burden surface. Burden surface monitoring is crucial to ensuring high-quality steel production, as it affects the optimization of charging operations and the utilization ratio of heat and chemical energy [3].
The measurement of the BF burden surface is a longstanding and challenging task because of the harsh in-furnace environment, which is lightless, high-pressure, high-dust, high-humidity, and extremely high in temperature [4]. Being contact-free, highly precise, and highly penetrating, the frequency-modulated continuous wave (FMCW) radar has become widely used as a pointwise method for locating the burden level in real time [5]. Figure 1 shows the employed eight-radar array system, in which the radar sensors are placed with consideration of the practical industrial field. The burden surface is bilaterally symmetrical because of the uniform rotation of the charging chute, so only half of the surface needs to be measured. The trajectory of the measuring points taken by a radar over time is termed the stockline and reflects the depth of the burden surface at every timestamp. By improving the precision of stockline detection, the operator can optimize the burden distribution in a way that is conducive to a stable iron-making process. Noise interference is a crucial issue in the BF radar system. The radar signals usually suffer from heavy low-frequency noise due to strong electromagnetic scattering. At times, the shadows of the rotating chute and falling materials, natural deviations, instrument errors, spurious behaviors, and other unexpected interferences produce distorted signals and make stockline detection more challenging.
In the study of BFs, a variety of algorithms for processing noisy data have been developed so far, covering principal component analysis [6], support vector machines (SVM) [7,8], neural network models [9], extreme learning machines [10], etc. Previous works acknowledge that the noise occurring within the BF reactor is still an extremely complex issue. The existing works on stockline detection are reviewed in [5,11-15]. Generally, peak searching (PS) is used to extract the stockline, which corresponds to the maximum amplitude component in the signal spectrum [12]. In actual BF environments, the collected signals are usually corrupted with noise that buries the target features under spurious peaks and brings outliers to the stockline. A simple approach to eliminating the influence of noise is threshold clipping, of which the adaptive threshold method is an example [13], where the components outside the thresholds are simply discarded. Effective filtering methods are being developed to exclude the empirical interval of the noise distribution, e.g., the infinite impulse response (IIR) filter used in [14] and the windowed finite impulse response (FIR) filter used in [16]. Ongoing efforts are being made to improve noise robustness by taking historical observations into consideration, e.g., Kalman tracking methods [17,18]. Several stockline smoothing methods have been presented to reduce noise fluctuation, e.g., mean shift and spectrum averaging [13]. The CLEAN and clustering algorithms, as reported in [5], can omit false noisy targets on the burden surface but are unsuitable for continuous detection. Traditional noise abatement requires domain expertise to construct the feature selector, involving, for example, threshold selection or noise filtering, which limits further performance improvement. Another bottleneck is the limited stockline tracking capability.
Recently, deep neural networks have made considerable progress on diverse kinds of data processing, such as image, video, speech, and text [19]. The convolutional neural network (CNN) is widely believed to have a powerful learning ability for feature selection and extraction. In addition, with the advantage of long-range memory, the long short-term memory (LSTM) network is broadly used in time-series data modeling. An increasing number of hybrid architectures that combine the CNN and LSTM as an encoder-decoder pair have achieved great success in promoting long-term information learning, such as image captioning [20], speech signal processing [21,22], and sensory signal estimation [23].
In this paper, we propose a novel encoder-decoder architecture for effective stockline detection. The encoder is a one-dimensional convolutional neural network (1D-CNN), and the decoder is a cascaded multi-layer LSTM network. Between the encoder and decoder, a binary classifier is constructed to mitigate the negative impacts of distorted signals. The hybrid architecture has an excellent anti-noise learning ability and a long-range stockline tracking ability. Our contributions can be summarized as follows:
• We present a novel encoder-decoder architecture to improve stockline detection, which learns the desired features from noisy data adaptively. This saves time and effort compared to traditional hand-crafted denoising.
• We present an effective stockline tracking strategy by leveraging the LSTM network to model longer-range historical signals. A longer tracking capability brings better robustness to noise randomness.
• We validate the method on actual industrial BF data. In particular, the experiments are carried out on an intact multi-radar scenario rather than a single-radar scenario.
The rest of the paper is organized as follows. In Section 2, the issues of stockline detection and the necessity of the encoder-decoder architecture are described. The proposed algorithm and the loss function are explained in Section 3. In Section 4, we conduct experiments on actual BF data collected from the eight-radar array system. Section 5 concludes this work.

Issue Description and Necessity of Encoder-Decoder Architecture
The signals are collected individually and sequentially from the different radars. They are 1024-dimensional vectors quantized by a 10-bit analog-to-digital converter. Some examples of the input signals are shown in Figure 2. Stocklines are fluctuating curves that reflect the changing depth of the burden surface, as shown in the radar temporal-frequency spectrum in Figure 3, where the vertical coordinates have been converted from the beat frequency f (in Hz) to the measuring distance d (in m) using the velocity of electromagnetic waves, $c = 3 \times 10^{8}$ m/s, and the radar bandwidth, B = 1.64 GHz. More details about the FMCW radar can be found in [24]. Several typical noisy conditions are depicted in Figure 3. Figure 3A shows a condition in which the stockline can be detected. In contrast, the stockline in Figure 3B appears to be buried under strong low-frequency noise (the luminous yellow band above the stockline). In Figure 3C, the stockline is discontinuous, with numerous absent measurements caused by the interruption of distorted signals. In general, the data collected close to the furnace center suffer heavier noise than those collected close to the furnace wall.
Figure 3. (A-C) are from radar #2, #4, and #6, respectively. The horizontal axis represents the number of time series signals, and the vertical axis represents the distance between the radar sensor and the measured point.
As stated in [2,7], the unknown statistical properties of the actual BF noise are a persistent obstacle to noise abatement. Unlike traditional methods, which perform heuristic, repetitive, and knowledge-based data filtering for a multi-radar system, the proposed method circumvents the noise interference in a learning fashion, significantly reducing the time and effort of hand-engineered data filtering. The front CNN extracts representative features from the raw signals; the middle classifier separates out the spurious distorted signals; the trailing LSTM captures the dependencies of the time series measurements and decodes the features. In this way, an encoder-decoder (CNN-LSTM) backbone architecture tailored to stockline detection is presented.
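The conversion from beat frequency to distance can be illustrated with a minimal sketch. It uses the textbook linear-sweep FMCW relation d = c f T_s / (2B) and the range resolution c / (2B); the sweep duration T_s is not given in the text, so the value of `SWEEP_TIME` below is purely hypothetical, and the relation is not necessarily the exact conversion used for Figure 3.

```python
C = 3.0e8            # velocity of electromagnetic waves in m/s (from the text)
B = 1.64e9           # radar bandwidth in Hz (from the text)
SWEEP_TIME = 1.0e-3  # HYPOTHETICAL sweep duration in s; not stated in the text


def range_resolution(bandwidth_hz: float = B) -> float:
    """Range resolution of an FMCW radar, c / (2B), in metres."""
    return C / (2.0 * bandwidth_hz)


def beat_to_distance(beat_hz: float, sweep_time_s: float = SWEEP_TIME) -> float:
    """Textbook linear-sweep FMCW relation d = c * f * T_s / (2B), in metres."""
    return C * beat_hz * sweep_time_s / (2.0 * B)


if __name__ == "__main__":
    print(f"range resolution: {range_resolution():.4f} m")
    print(f"distance for a 20 kHz beat: {beat_to_distance(20e3):.3f} m")
```

With the stated bandwidth, the range resolution is roughly 9 cm regardless of the assumed sweep duration; only the beat-to-distance scale depends on the hypothetical `SWEEP_TIME`.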

Architecture
The proposed architecture is shown in Figure 4. It consists of four operations: CNN feature extraction, radar identification (ID) embedding, distorted signal separation, and temporal feature decoding.
Mathematically, let $X \in \mathbb{R}^{C \times D}$ be the input of a convolutional layer, with C channels and D dimensions, and let $W \in \mathbb{R}^{K \times C \times L}$ be the weight, where K is the number of convolutional kernels and L is the kernel length. The 1D-convolutional calculation can be formulated as
$$Y = W * X, \quad Y \in \mathbb{R}^{K \times \lceil D/s \rceil},$$
where Y is the layer output, the symbol * stands for the convolutional operation, s is the sliding stride, and $\lceil \cdot \rceil$ is the ceiling function. We apply batch normalization (BN) and a nonlinearity after each convolutional layer, as shown in Figure 5b. BN is widely used in neural networks and can speed up convergence and slightly improve performance [25]. Assuming x is the input of BN, we have
$$\mathrm{BN}(x) = \gamma \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} + \beta,$$
where γ is a scaling factor and β is a shifting factor. Note that β also acts as the bias term of the convolutional layer. E[·] is the mean, and Var[·] is the variance. $\epsilon$ is a small positive constant that prevents division by zero, e.g., $\epsilon = 10^{-5}$. The leaky rectified linear unit (LRelu) is adopted as the nonlinearity [26]. It is
$$f(x) = \begin{cases} x, & x \geq 0 \\ \alpha x, & x < 0 \end{cases}$$
where f(·) represents the LRelu function and α is a constant, e.g., α = 0.2. Max-pooling layers with a stride of 2 are used in the encoder. After encoding by the CNN, an m-dimensional feature is extracted. m is a tunable hyperparameter, and m = 64 is used.
ID Embedding. The characteristics of the noisy signal data differ from radar to radar. For multi-radar data, we embed the radar ID information into the feature F to slightly improve performance. The radar ID is encoded using one-hot coding. Formally, given R radars, the ID of the r-th radar is denoted by $I^{(r)} = (I_1, \cdots, I_R)$, where $I_i = 1$ if $i = r$ and $I_i = 0$ otherwise. $F^{(r)}$ and $I^{(r)}$ are concatenated along the feature dimension, where the superscript denotes the signal derived from the r-th radar.
Distorted Signal Separation. We construct a binary classifier using a fully connected layer to separate the normal signals from the distorted signals before feeding them into the decoder. The fully connected layer maps the feature into a V-class decision space (V = 2 for the 2-class task). Let $W \in \mathbb{R}^{V \times |F|}$ be the weight and $b = (b_1, \cdots, b_V) \in \mathbb{R}^{V}$ the bias. We have
$$p_v = \frac{\exp(z_v)}{\sum_{u=1}^{V} \exp(z_u)}, \quad z_v = \sum_{j=1}^{|F|} W_{vj} F^{(r)}_j + b_v,$$
where $F^{(r)}_j$ is the j-th element of $F^{(r)}$, $|F|$ is the dimensionality of $F^{(r)}$, and $p = (p_1, \cdots, p_V)$ is the predicted class probability given by the softmax function. For the binary classification, the expectation is $p^* = (1, 0)$ for normal signals and $p^* = (0, 1)$ for distorted signals. The distorted signals carry confusing information and do not make any meaningful contribution to stockline estimation. We mask them with zeros to reduce their negative impact on the next calculation step.
LSTM Decoder. LSTM is an improved variant of recurrent neural networks (RNNs) [27,28]. Let T be the tracking length of the stockline, comprising T − 1 historical signals and one current signal. Let the subscript t be the index of the T sequential signals that are fed into the LSTM, so t = 1, 2, · · · , T. LSTM uses a gate mechanism to control the flow of context information, which comprises the input gate $i_t$, the forget gate $f_t$, and the output gate $o_t$. Its inner gate mechanism is shown in Figure 6. The cell state $c_t$ stores the information of each time step. The input gate determines whether to add new information to $c_t$, and the forget gate selectively forgets uninteresting previous information held in $c_t$; together they control the update of the cell state $c_t$. The output gate controls the emission from $c_t$ to the hidden state $h_t$. $h_t$ is mapped to the decoder output $y_t = U_t h_t + b_t$, where $U_t$ is the weight matrix and $b_t$ is the bias. The output $y_T$ of the LSTM corresponds to one point of the stockline.
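One encoder stage as described above (convolution, BN, LRelu, max-pooling) can be sketched in plain Python. This is a minimal illustration under stated assumptions rather than the authors' implementation: zero padding is chosen so that the output length is ⌈D/s⌉, BN statistics are taken over the positions of a single channel instead of a mini-batch, and the pooling window of 2 is assumed (only the stride of 2 is given in the text). All function names are hypothetical.

```python
import math


def conv1d_same(x, w, stride=1):
    """1D convolution (cross-correlation, as in deep learning frameworks).

    x: C channels, each a list of length D.
    w: K kernels, each holding C lists of length L.
    Zero padding is chosen so that the output length is ceil(D / stride).
    """
    C, D = len(x), len(x[0])
    L = len(w[0][0])
    out_len = math.ceil(D / stride)
    pad_total = max((out_len - 1) * stride + L - D, 0)
    pad_left = pad_total // 2
    xp = [[0.0] * pad_left + list(ch) + [0.0] * (pad_total - pad_left) for ch in x]
    return [[sum(kernel[c][l] * xp[c][i * stride + l]
                 for c in range(C) for l in range(L))
             for i in range(out_len)]
            for kernel in w]


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """gamma * (x - E[x]) / sqrt(Var[x] + eps) + beta over the given values."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


def leaky_relu(v, alpha=0.2):
    """LRelu: identity for v >= 0, alpha * v otherwise."""
    return v if v >= 0 else alpha * v


def max_pool(values, size=2, stride=2):
    """Max-pooling; the window size of 2 is an assumption."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, stride)]


def encoder_block(x, w, stride=1):
    """Convolution -> BN -> LRelu -> max-pooling for one encoder stage."""
    y = conv1d_same(x, w, stride)
    y = [batch_norm(ch) for ch in y]
    y = [[leaky_relu(v) for v in ch] for ch in y]
    return [max_pool(ch) for ch in y]


if __name__ == "__main__":
    signal = [[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]]  # C = 1, D = 8
    kernels = [[[1.0, 0.0, -1.0]]]                       # K = 1, L = 3
    print(encoder_block(signal, kernels))
```

Stacking several such stages halves the feature length at each pooling step, which is how a 1024-dimensional signal can be reduced toward a compact feature vector.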
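The ID embedding and the distorted signal separation can be sketched together as follows. This is a minimal sketch with hypothetical function names; in particular, it assumes that a signal is masked when the predicted probability of the distorted class exceeds that of the normal class, a decision rule the text does not spell out.

```python
import math


def one_hot(r, num_radars):
    """One-hot ID of the r-th radar (1-based) among num_radars radars."""
    return [1.0 if i == r else 0.0 for i in range(1, num_radars + 1)]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def classify(feature, weights, bias):
    """Fully connected layer followed by softmax; returns (p_normal, p_distorted)."""
    z = [sum(w_j * f_j for w_j, f_j in zip(row, feature)) + b
         for row, b in zip(weights, bias)]
    return softmax(z)


def embed_and_mask(feature, r, num_radars, weights, bias):
    """Concatenate the radar ID, classify, and zero out distorted signals."""
    f = list(feature) + one_hot(r, num_radars)
    p = classify(f, weights, bias)
    is_normal = p[0] >= p[1]  # ASSUMED decision rule: pick the likelier class
    return (f if is_normal else [0.0] * len(f)), p


if __name__ == "__main__":
    feature = [1.0, 2.0]                     # toy 2-dimensional CNN feature
    weights = [[1.0, 0.0, 0.0, 0.0],         # toy weights, V = 2 rows
               [0.0, 0.0, 0.0, 0.0]]
    masked, p = embed_and_mask(feature, r=1, num_radars=2,
                               weights=weights, bias=[0.0, 0.0])
    print(masked, p)
```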
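The gate mechanism can be made concrete with a single-layer LSTM step in plain Python. This is a sketch of the standard LSTM formulation, not the paper's three-layer decoder; the parameter dictionary, its key names, and the shared output weights `U` and `b` (the text indexes them by t) are simplifying assumptions.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, v, b):
    """Matrix-vector product plus bias, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(x, h_prev, c_prev, params):
    """One standard LSTM step; every gate sees the concatenation [h_prev, x]."""
    z = list(h_prev) + list(x)
    i = [sigmoid(v) for v in affine(params["Wi"], z, params["bi"])]    # input gate
    f = [sigmoid(v) for v in affine(params["Wf"], z, params["bf"])]    # forget gate
    o = [sigmoid(v) for v in affine(params["Wo"], z, params["bo"])]    # output gate
    g = [math.tanh(v) for v in affine(params["Wg"], z, params["bg"])]  # candidate
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


def decode(sequence, params, U, b, hidden_size):
    """Run T steps from zero-initialized states; the last output is y_T."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    outputs = []
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
        outputs.append(sum(u * hv for u, hv in zip(U, h)) + b)
    return outputs


if __name__ == "__main__":
    hidden, inputs = 2, 3
    W = [[0.1] * (hidden + inputs) for _ in range(hidden)]
    params = {"Wi": W, "Wf": W, "Wo": W, "Wg": W,
              "bi": [0.0] * hidden, "bf": [0.0] * hidden,
              "bo": [0.0] * hidden, "bg": [0.0] * hidden}
    sequence = [[0.2, 0.4, 0.6]] * 4  # T = 4 toy feature vectors
    print(decode(sequence, params, U=[1.0, 1.0], b=0.0, hidden_size=hidden)[-1])
```

Setting the forget gate close to one and the input gate close to zero leaves the cell state unchanged, which is the mechanism that lets the decoder carry information across many of the T time steps.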

Loss Function
Classification loss. The classifier is trained by minimizing the cross-entropy loss, which can be formulated as
$$L_{cls} = -\sum_{v=1}^{V} p^*_v \log p_v.$$
Regression loss. The output values $\{y_t\}_{t=1}^{T}$ are regressed to the target values $\{y^*_t\}_{t=1}^{T}$ by minimizing their squared errors $(y_t - y^*_t)^2$, weighted over the T time steps by a discount factor ρ; ρ = 0.5 is used. The resulting regression loss is denoted by $L_{reg}$.
Joint loss. Training is a joint task of classification and regression. The general way of constructing a joint loss is a linear weighted sum of the subtask losses. The total loss can be written as
$$L = \lambda_1 L_{reg} + \lambda_2 L_{cls} + \lambda_3 L_{norm}(\theta),$$
where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the weight factors of the individual terms and $L_{norm}(\theta) = \|\theta\|$ is the L2 norm loss. When $\lambda_1 : \lambda_2$ is simply preset to 1 : 1 in our experiments, the classification loss tends to dominate the training process and degrade the regression performance. Thus, we use homoscedastic uncertainty to search for suitable weight factors that balance the two subtasks [29,30]. The joint loss is rewritten as
$$L = \frac{1}{2\sigma_1^2} L_{reg} + \frac{1}{\sigma_2^2} L_{cls} + \log \sigma_1 \sigma_2 + \lambda_3 L_{norm}(\theta), \tag{11}$$
where $\sigma_1$ and $\sigma_2$ are two learnable parameters that stand for the observed variances of the subtasks. The bound term $\log \sigma_1 \sigma_2$ discourages the variances from increasing too much. This corresponds to $\lambda_1 = 1/(2\sigma_1^2)$ and $\lambda_2 = 1/\sigma_2^2$, with $\lambda_3 = 5 \times 10^{4}$. In the implementation, the network is trained to predict the log variance because it is more numerically stable; for instance, letting $s_1 = \log \sigma_1^2$, we use $\sqrt{e^{s_1}}$ to replace $\sigma_1$ in (11), and likewise for $\sigma_2$.
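The log-variance parametrization can be checked numerically with a short sketch. It assumes the uncertainty-weighted form in which the regression loss is scaled by 1/(2σ₁²) and the classification loss by 1/σ₂², with s = log σ²; this assignment is an assumption, chosen because it reproduces the ratio λ₁ : λ₂ = 29.87 : 1.40 reported for s₁ = −4.09 and s₂ = −0.34 in the experiment setup. The L2 norm term is omitted, and the function names are hypothetical.

```python
import math


def cross_entropy(p, p_star, eps=1e-12):
    """Cross-entropy between predicted probabilities p and the target p_star."""
    return -sum(t * math.log(max(q, eps)) for q, t in zip(p, p_star))


def uncertainty_weights(s1, s2):
    """Loss weights implied by the log variances s = log(sigma^2)."""
    w_reg = 0.5 * math.exp(-s1)  # 1 / (2 * sigma_1^2), ASSUMED regression weight
    w_cls = math.exp(-s2)        # 1 / sigma_2^2, ASSUMED classification weight
    return w_reg, w_cls


def joint_loss(l_reg, l_cls, s1, s2):
    """Uncertainty-weighted joint loss without the L2 norm term."""
    w_reg, w_cls = uncertainty_weights(s1, s2)
    bound = 0.5 * (s1 + s2)      # log(sigma_1 * sigma_2) written in terms of s
    return w_reg * l_reg + w_cls * l_cls + bound


if __name__ == "__main__":
    w_reg, w_cls = uncertainty_weights(-4.09, -0.34)
    print(f"lambda_1 : lambda_2 = {w_reg:.2f} : {w_cls:.2f}")
```

Learning s instead of σ keeps the weights positive for any real value of s and avoids dividing by a variance that approaches zero.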

Experiment
In this section, the experiments are carried out and the experimental setups are provided in detail. The experimental results consist of four parts: performance, scientificity of the architecture, the effect of the tracking strategy, and the running time.

Experiment Setup
Dataset. There are 504,488 signals collected from the industrial BF eight-radar array system, as listed in Table 1. The data are divided into a training set, a validation set, and a testing set in a ratio of 3:1:1. The training set is used to train the model, the validation set is used to validate the performance and tune the hyperparameters, and the testing set is used to test the performance. Unless otherwise stated, the results are from the testing set. The radar signals are normalized to [0, 1] in order to eliminate the effect of the amplitude.
Hyperparameters. We adopt an encoder with 5 convolutional layers and a decoder with 3 hidden layers, 72 time steps (i.e., T = 72), and 128 hidden nodes. The convolutional layers and fully connected layers are initialized with zero bias and Gaussian weights within (−0.1, 0.1). The LSTM cells are zero-initialized. We use the Adam solver [31], an improved stochastic gradient descent algorithm, to optimize the model. A dynamic learning rate is used: it starts at 0.001 and decays by a factor of 0.98 every 100 iterations. The model is trained for 20 epochs with a batch size of 120. We apply dropout with a 50% dropout rate to the LSTM as regularization [28,32]. When training is completed, the adaptive parameters $s_1$ and $s_2$ level off at −4.09 and −0.34, respectively; in other words, $\lambda_1 : \lambda_2 = 29.87 : 1.40$.
Evaluation. We use the mean absolute error (MAE) and the root mean square error (RMSE) to evaluate the detection performance. Formally, the MAE is defined as
$$\mathrm{MAE} = \frac{1}{N} \sum_{n=1}^{N} |y_n - y^*_n|,$$
and the RMSE is defined as
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{n=1}^{N} (y_n - y^*_n)^2},$$
where N is the number of evaluated samples, $y_n$ is the estimated value, and $y^*_n$ is the target value. We use accuracy, precision, recall, and F1 score to evaluate the classification performance. Accuracy indicates the proportion of correctly classified samples among all samples. Precision indicates the proportion of true positive samples (TPs) among the true positive and false positive samples (FPs), that is,
$$\mathrm{precision} = \frac{TP}{TP + FP}.$$
Recall indicates the proportion of true positive samples among the true positive and false negative samples (FNs), that is,
$$\mathrm{recall} = \frac{TP}{TP + FN}. \tag{15}$$
The F1 score is the harmonic mean of precision and recall.
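The evaluation metrics can be computed as in the following minimal sketch, which uses the standard definitions. Which class counts as "positive" is not stated in the text, so the `positive` argument and the label encoding are assumptions; the function names are hypothetical.

```python
import math


def mae(y, y_star):
    """Mean absolute error between estimates y and targets y_star."""
    return sum(abs(a - b) for a, b in zip(y, y_star)) / len(y)


def rmse(y, y_star):
    """Root mean square error between estimates y and targets y_star."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_star)) / len(y))


def classification_scores(pred, truth, positive=1):
    """Return (accuracy, precision, recall, F1) for binary labels."""
    pairs = list(zip(pred, truth))
    tp = sum(p == positive and t == positive for p, t in pairs)
    fp = sum(p == positive and t != positive for p, t in pairs)
    fn = sum(p != positive and t == positive for p, t in pairs)
    tn = sum(p != positive and t != positive for p, t in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1


if __name__ == "__main__":
    print(mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
    print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
    print(classification_scores([1, 1, 0, 0], [1, 0, 1, 0]))
```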

Results
Different experiments were conducted as shown below. The proposed method was implemented on the TensorFlow framework [33].
Performance. The traditional peak searching method with pass-band FIR filters and Kalman filters was implemented for comparison. The settings of the FIR filters are shown in Table 2, and the Kalman algorithm is introduced in [34]. In addition, a single CNN model and a single LSTM model were constructed as further baselines; their structures were identical to the corresponding parts of the hybrid model.
The estimated stocklines of the proposed method and the FIR-Kalman methods are displayed in Figure 7. The regression curves of the proposed method exhibit less fluctuation around the expectation curves. In contrast, the filter-based approaches tend to be influenced by interferences.
As shown in Tables 3 and 4, with an MAE of 0.0432 and an RMSE of 0.0581, the presented architecture is capable of learning features from noisy data and suppressing noise interference. The proposed method significantly outperforms the traditional experience-dependent denoising methods. The traditional method without denoising is extremely sensitive to heavy noise and performs poorly. In contrast, the proposed method is more efficient and robust. The classifier performance is shown in Table 5. With scores of 95.90%, 96.00%, 99.48%, and 97.68% for accuracy, precision, recall, and F1 score, respectively, it classifies the distorted signals well.
Scientificity of architecture. A series of experiments were carried out to verify the soundness of the architecture. First, we examine the validity of the CNN-LSTM mixed architecture. Two simpler baseline architectures are provided, i.e., a single CNN architecture and a single LSTM architecture, as shown in Tables 3 and 4. Both detached architectures perform worse than the proposed mixed architecture because each loses the advantages of the other. The single CNN architecture performs weakly and is even worse than the traditional denoising methods. For the single LSTM architecture, we observe that the LSTM has trouble learning features from noisy signals without the help of the CNN encoder.
Secondly, we conduct experiments to examine the complexity of the encoder and decoder. The impacts of picking a different number of CNN layers and a different number of LSTM layers are shown in Figure 8a,b, respectively. Based on the experiments, we choose a 5-layer CNN and a 3-layer LSTM as the encoder and decoder for optimal performance. The experimental results also indicate that a deep architecture is not always necessary.
Thirdly, the stockline tracking length T is examined. The experiments are shown in Figure 8c; as T increases, the errors generally decrease. Due to the limited memory of the LSTM network, a tracking length of T > 72 brings only a marginal improvement in performance but conspicuously increases the computational time, so T = 72 is used.
Effect of tracking strategy. To further validate the effect of the stockline tracking ability provided by the LSTM, a snapshot-based model is designed by setting the time step T = 1, which means that the network sees no historical signals. We compare its performance with that of the tracking-based model (i.e., T = 72) under the same training settings. The results are shown in Figure 9, where the black dotted line stands for the expectation; the closer a distribution lies to the black line, the better the performance. The snapshot-based model, with an MAE of 0.1019 and an RMSE of 0.1854, is more sensitive to interruptive noise and is even worse than the FIR-Kalman approach. As shown in Figure 8c, a longer tracking capability has a positive effect on performance. The proposed method takes a longer range of the previous stockline into consideration, whereas the Kalman tracking approach makes use of only one previous moment. An effective tracking strategy brings better tolerance to noise randomness and disturbances.
Running time. In addition, with an average computational time of 0.522 ms, the proposed architecture is lightweight and fully meets the real-time requirements of the industrial process.

Conclusions
In this paper, we have successfully developed a hybrid CNN-LSTM architecture for the challenging stockline detection problem. Its effectiveness has been demonstrated by experiments on BF eight-radar array data. The success of the proposed method is attributable to three reasons: its effective learning ability, its ability to classify distorted signals, and its excellent stockline tracking ability.
In the industrial process, most of the monitoring data or variables have close time-series dependencies. The general contribution of this work is that we improve the long-range history learning by leveraging the novel LSTM network. It is a very promising direction to achieve more accurate and robust industrial control, if we can make the most of the context relationship of successive measurements.
In future work, we will dedicate our efforts to the image reconstruction of BF burden surfaces. The inner transparency of the neural network methodology will be investigated, which will allow us to have an in-depth qualitative or quantitative analysis of noise influences.