Segmentation Improved Label Propagation for Semi-Supervised Anomaly Detection in Fused Magnesia Furnace Process

This article presents a semi-supervised solution to the practical task of identifying the semi-molten working condition of a fused magnesia furnace. A classifier is trained under the semi-supervised learning framework using partially labeled video data, and it works online to detect the semi-molten working condition from the monitoring video of the furnace. First, a deep auto-encoder extracts features from the video. Next, the feature time series is segmented into subsequences through a bottom-up algorithm which maximizes the interior data correlation. Then, a graph-based label propagation algorithm iteratively trains an LSTM classifier using the subsequences as training samples, improving its accuracy. The main contribution of this work is a novel segmentation algorithm for sequential data which remarkably improves classification accuracy. The advantages of the proposed method are demonstrated by experiments and comparative analysis on industrial data.


I. INTRODUCTION
Fused magnesia is an industrial material that finds applications in high-end optical equipment, nuclear reactors and rocket nozzles [1]. It is produced at temperatures above the fusion point of magnesium oxide (2800 °C) using electric arc furnaces (EAF). The industrial fusing process is referred to as the fused magnesia furnace (FMF) process. The FMF process typically lasts 12 hours, after which the molten block is cooled down to obtain the magnesia crystal. FIGURE 1 shows the fused magnesia production process.
Semi-molten is an abnormal working condition that occurs occasionally during the FMF process. Under semi-molten, an overheated region develops inside the molten pool, and its temperature will eventually exceed the resistance limit of the furnace lining. The precise mechanism of semi-molten development is still unclear [2]. It is commonly believed that semi-molten is related to impurities such as SiO₂, CaO and Fe₂O₃ in the magnesite powder, which make the electrical resistance of the molten pool lower than the normal value. The resulting excessive electrode current produces a high-intensity arc, causing a significant temperature increase in the furnace. It is vital to detect semi-molten in its early development to prevent burn-through of the furnace wall. Missed or delayed detection of semi-molten results in leakage of high-temperature molten ore, causing production halts and substantial economic loss. It may even endanger the safety of operators at the scene.
In practice, early warning of semi-molten relies largely on human operators patrolling the workshop. Operators estimate the working condition by inspecting the flame at the furnace mouth based on their experience, because flame characteristics such as shape, brightness, color and flashing patterns are closely correlated with the working condition. However, inexperience or an unstable mental state may affect operators' judgment, leading to missed or false detections. Moreover, the on-site environment, with intense light, high temperature, dust and noise, is hostile to humans, and continuous inspection involves high labor intensity. Therefore, automatic identification of semi-molten is urgently needed by FMF enterprises to ensure production safety and to free operators from the burdensome patrolling and inspection tasks.
Currently, there are two approaches to detecting the semi-molten working condition of FMFs. The first identifies the working condition from the three-phase electrode currents of the FMF. For example, Wu et al. [2] established a rule-based technique to classify the working condition using features extracted from the tracking error of the electrode current, the current change rate, and the current change time. Zhang [3] proposed a working-condition monitoring method based on subspace separation. The electrode-current approach is intuitively sensible in that the local electrical resistance changes remarkably under the semi-molten condition. However, the electrode current alone is not a sufficient source of features to distinguish semi-molten, because other factors, such as refilling the furnace, may cause similar fluctuation patterns in the current.
The other approach uses a machine-vision classifier trained with features extracted from the FMF monitoring video. It is inspired by the fact that experienced operators can tell a developing semi-molten state solely by ''watching the flame''. This indicates that the image of the furnace-mouth flame contains rich features related to the working condition of the FMF process. To this end, Guo et al. [4] established a semi-molten classifier using stochastic configuration networks [5] trained on historical monitoring video images. Lu et al. [6] established a classifier based on RGB and infrared images using a deep convolutional network, and employed generative adversarial networks (GAN) to synthesize extra samples to overcome the shortage of samples.
Compared with the electrode-current approach, the machine-vision approach is much more promising because the visual patterns of the furnace flame are sensitive and unique to the semi-molten working condition. However, current machine-vision detection techniques are still far from practical because of low accuracy. There are two main challenges. First, the flame of an FMF is highly dynamic: the shape, brightness and color of the flame change constantly due to unmeasurable disturbances. It is not possible to accurately distinguish semi-molten from a single image or a few images; the temporal patterns of the video must also be considered. Second, historical monitoring video can only be labeled manually by experienced operators. Labeling a large amount of video footage by hand is costly and impractical, so the problem has to be tackled with a very limited number of labeled samples.
In view of the above problems, this article proposes a machine-vision solution under the framework of semi-supervised learning. Semi-supervised learning (SSL) is a machine learning paradigm which exploits a large number of unlabeled data with the assistance of a small number of labeled data [7]. Using abundant unlabeled data, which are often easy to obtain, to improve classifier performance is key to the success of many practical tasks. To address the dynamics in the FMF monitoring video, we use multivariate time-series subsequences as samples in classifier training. SSL classification of time series is especially challenging due to the dynamic characteristics of time-series data [8], [9]. The subject has attracted increasing attention, driven by applications such as electroencephalography (EEG) pattern recognition, human dynamic behavior and facial expression recognition, and automated trading (see the review in [9]).
In this article, we introduce the development of the semi-molten working condition detection solution. We make three main contributions: 1) We develop an end-to-end solution to the practical task of abnormal condition detection in an industrial setting. The proposed solution addresses the problem of insufficient labeled data by integrating a semi-supervised learning approach with a time-series segmentation method, and it is tested on a real industrial dataset. 2) We propose a dynamic multivariate segmentation method to adaptively partition the feature sequence extracted from video into homogeneous subsequences. Studies on machine learning often assume that samples are well prepared; in practical tasks, however, sample quality is often the key to success. Segmenting the video sequence into homogeneous subsequences lets the condition model better learn the temporal patterns in the time series extracted from the video, and we will show that this preprocessing treatment significantly improves classification performance. 3) We employ a graph-based label propagation algorithm to train the condition classifier based on the Long Short-Term Memory (LSTM) model. When dealing with sequential data, the common approach converts it to high-dimensional static data and uses the Euclidean distance measure, which loses the temporal information. We propose a new measure of sequence distance, combining dynamic predictability and static reconstruction error in the construction of the affinity graph, which is the key to addressing the dynamics in the FMF flame video.
The rest of the paper is organized as follows. Section 2 defines the FMF anomaly detection problem and briefly reviews related works. Section 3 gives an overview of the proposed method. Sections 4 and 5 elaborate the preprocessing and classifier training methods, which are the main contributions of this work. Section 6 presents the evaluation results, and Section 7 concludes the paper.

II. FMF ANOMALY DETECTION AND RELATED WORKS
A. PROBLEM DESCRIPTION
The goal of the proposed method is to distinguish the abnormal working condition, semi-molten, from the other working conditions based on the monitoring video. FIGURE 2 illustrates image samples of the four major FMF working conditions: smelting preparation, ore feeding, smelting and semi-molten. Each working condition possesses visually distinguishable features. The semi-molten condition can only occur during the smelting process. When semi-molten is developing, the flame at the furnace mouth becomes less stable and the furnace sometimes splashes sparks. The furnace wall may also show a brighter area because of overheating, as shown in FIGURE 2-(d). However, this feature is not necessarily observed if the overheating bubbles are not near the furnace wall.

B. RELATED WORKS
The key problem in the FMF anomaly detection task is semi-supervised classification of time series. An effective framework for SSL of time series is self-training [9], which repeatedly uses supervised learning to predict unlabeled samples and then retrains the supervised classifier on the extended sample set [10]. Self-training assumes that high-confidence predictions tend to be correct, which implies well-separated classes (the cluster assumption) [7]. Therefore, clustering-and-labeling is a common approach under the self-training framework. For example, Marussy and Busa [11] proposed an SSL classification algorithm which uses hierarchical clustering of all the data, with the labeled samples as root nodes, to establish a minimum spanning tree over all samples; the 1-NN algorithm then clusters each unlabeled sample according to its distance to the root nodes. The distance between time-series samples can be measured by Dynamic Time Warping (DTW) or its variants [12], [13]. There are also similarity measures tailored to specific SSL tasks, such as the maximal diagonal line of cross recurrence quantification analysis (MDL-CRQA) [14].
The method used in this work is most closely related to the graph-based label propagation algorithm, one of the prominent methods based on the cluster assumption [15]. The main idea is to construct a graph whose vertices are observations and whose edges represent the similarity between observations. Graph-based label propagation spreads labels smoothly because nearest neighbors on the graph receive the same prediction. For example, the authors of [16] propose a probabilistic graphical model for label propagation in video sequences using an EM-based algorithm. Wang and Tsotsos [17] extend label propagation to handle multi-class problems.
Segmentation partitions a given sequence into subsequences that are internally homogeneous [18]. Proper segmentation of sequential data is key for semi-supervised time-series classification [19] and has been applied to anomaly detection [20], [21]. In semi-supervised learning, segmentation can serve as a data pre-treatment tool for representation reduction and feature extraction. Most basic segmentation approaches are rooted in the idea of piecewise linear approximation (PLA) [22]; examples include the principal component analysis (PCA) based approach [23] and the dynamic PCA (DPCA) based approach [24]. For the clustering-and-labeling SSL approach, segmentation and classification are closely related problems; for example, in [25] the authors cast video segmentation as a semi-supervised learning problem.

III. PROPOSED METHOD OVERVIEW
A. DEFINITION
Let V = {v_1, . . . , v_N} denote the historical FMF monitoring video used for training, where v_i is the i-th frame image and N is the total number of images. Define the feature-space mapping φ : V → F, where F = {x_1, . . . , x_N} ∈ R^(m×N) and x_i is the m-dimensional feature vector extracted from v_i. This forms a multivariate feature-vector sequence. In this article, we use sequences as the atomic sample data for training. By performing segmentation on F we obtain the training set X = {X_1, . . . , X_q}, where each X_i is a subsequence of m-dimensional feature vectors; note that the subsequences are not of equal length. The label matrix Y contains labels for the first l samples, while the remaining samples are unlabeled.

B. THE PROPOSED METHOD
To train a classifier using the partially labeled dataset X, the proposed method consists of two parts: preprocessing and classifier training. The overall process is shown in FIGURE 3.
Preprocessing constructs the dataset of multivariate time-series subsequences used as samples. First, the input video image sequence is compressed by a sparse auto-encoder (SAE) [26] to obtain the multivariate feature sequence F. Then, the proposed multivariate time-series segmentation algorithm segments the feature sequence into subsequences by recursively merging neighboring subsequences in a bottom-up procedure. The merging cost is defined as a weighted combination of the dynamic latent model prediction accuracy and the principal component analysis reconstruction error.
The second part trains an LSTM classifier for detecting the semi-molten working condition. The classifier is trained under the framework of graph-based label propagation. First, we construct a graph whose nodes are data points and whose edges are similarities between points. Next, the network parameters of the LSTM model are trained using the initially available labeled samples. Then, each unlabeled sample is assigned a pseudo label computed as a combination of the initial label information and the labels received from its neighbors, weighted by the corresponding edges. The network is then retrained using both the labeled and pseudo-labeled datasets.

IV. SEQUENCE SEGMENTATION BASED ON LINEAR DYNAMIC PREDICTABILITY
A. SEQUENCE SEGMENTATION PROBLEM
Human operators learn to recognize the semi-molten working condition by watching the dynamic pattern of the furnace flame, and human experts are more comfortable labeling the working condition from video rather than single images. To develop a machine detector that mimics this cognitive process, we use video sequences as the basic training samples. This requires a segmentation operation on the multivariate feature time series extracted from the video.
Given a sequence T = {x_i ∈ R^m | 1 ≤ i ≤ N}, the segmentation problem is to construct q non-overlapping subsequences S_j = {x_i | a_j ≤ i ≤ b_j}, j = 1, . . . , q, which together cover T. The objective of segmentation is to make each subsequence homogeneous according to a given criterion, expressed as a cost function cost(·) which maps the criterion to a scalar.
The global objective of a given segmentation scheme is to minimize the total cost over all subsequences, i.e., min Σ_{j=1}^{q} cost(S_j). Since the subsequences are used as training samples for a higher-level classifier, it is desirable to keep their structure as simple as possible. We therefore seek a segmentation in which each subsequence can be linearly expressed. Considering that the multivariate feature sequence shows both cross-correlation and auto-correlation, we use the dynamic latent variable (DLV) modeling method to approximate the multivariate sequence. The segmentation objective is then to minimize the overall averaged approximation error.
1) DYNAMIC LATENT VARIABLE MODELING
The latent score of each feature vector is extracted as t_i = x_i^T w, where w is the weight vector. The dynamics in the latent variable can further be expressed as a linear combination of the past s scores:

t̂_k = β_1 t_{k−1} + β_2 t_{k−2} + · · · + β_s t_{k−s},

where β = [β_1, . . . , β_s]^T. Dong and Qin [27] proposed the dynamic inner principal component analysis (DiPCA) algorithm, which learns w and β simultaneously by maximizing the covariance between the current score and its prediction from past scores:

max_{w,β} Σ_{k=s+1}^{N} t_k t̂_k, subject to ‖w‖ = 1, ‖β‖ = 1.

The DiPCA algorithm optimizes this objective through an iterative process; details can be found in [27].

2) SEGMENTATION USING DYNAMIC PREDICTABILITY
In [28], we proposed a multivariate time-series segmentation algorithm that realizes the ''bottom-up'' approach [29]. It starts with the sequence partitioned into the maximum allowed number of subsequences. In each iteration it merges the pair of adjacent subsequences that maximizes the objective function, repeating until the termination condition is satisfied. The key of the segmentation algorithm is the definition of the merging cost. The main idea is to learn a DLV model of each subsequence using the DiPCA algorithm and define the merging cost as the mutual predictability between the subsequence models. For new data x_{s+1}, form the data matrix X_{s+1} as in eq. (6). The estimated latent scores are calculated as T̂_{s+1} = X_{s+1} w, with the score prediction obtained from the AR coefficients β estimated by the least squares method. The prediction accuracy (PA) of the scores is measured by the cosine similarity between T_{s+1} and the prediction T̂_{s+1}:

PA = T_{s+1}^T T̂_{s+1} / (‖T_{s+1}‖ ‖T̂_{s+1}‖).     (8)
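As a concrete illustration, the score prediction and its cosine-similarity accuracy can be sketched as follows. This is a minimal sketch assuming the latent scores and the AR coefficients β are already available; the function names are ours, not from [27] or [28].

```python
import numpy as np

def ar_predict(scores, beta):
    """One-step-ahead prediction of latent scores from the past s scores:
    t_hat[k] = beta_1 * t[k-1] + ... + beta_s * t[k-s]."""
    s = len(beta)
    return np.array([scores[k - s:k] @ beta[::-1] for k in range(s, len(scores))])

def prediction_accuracy(actual, predicted):
    """Cosine similarity between the actual and predicted score sequences."""
    den = np.linalg.norm(actual) * np.linalg.norm(predicted)
    return (actual @ predicted) / den if den > 0 else 0.0
```

For a perfectly predictable AR(1) score sequence the prediction accuracy equals 1, which is the upper bound of the measure.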
Eq. (8) can be used to calculate the merging cost of a pair of adjacent subsequences S_i and S_{i+1}, defined as the prediction accuracy obtained when the DLV model learned on one subsequence is applied to the merged subsequence S_i ∪ S_{i+1} (eq. (9)): the higher the mutual predictability, the more homogeneous the merged subsequence and the more favorable the merge.

3) AN EXTENDED SEGMENTATION ALGORITHM
One limitation of the segmentation algorithm proposed in [28] is that it only considers dynamic predictability. The cost function of eq. (9) uses the dynamic latent variables, which mostly capture the dynamic auto-correlation. Since the features extracted from the furnace flame image are highly correlated, the cross-correlation structure must also be taken into account. The authors of [27] suggested that DiPCA can be used as a dynamic whitening filter, which indicates that the prediction error can be considered as the residual after filtering off the auto-correlation in the original data. For X_{s+1}, the dynamic prediction given a learned DiPCA model is X̂_{s+1} (eq. (10)), and the prediction error is E_{s+1} = X_{s+1} − X̂_{s+1} (eq. (11)). We extract the static latent variables by applying PCA to the prediction error, obtaining scores T_r with loadings P_r. Following the idea of eq. (9), the merging cost of the static part is defined by the similarity between E_{s+1}^{i∪i+1} and its reconstruction Ê_{s+1}^{i∪i+1} = T_r P_r^T (eq. (12)). Finally, a coefficient λ ∈ [0, 1] controls the weights of the dynamic and static parts, giving the final merging cost

S(i, i+1) = λ cost_d(i, i+1) + (1 − λ) cost_s(i, i+1).     (13)

The segmentation algorithm is listed in the following.
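Before listing the full algorithm, the static part can be sketched as below. The sketch fits PCA loadings on one residual matrix and scores how well they reconstruct another; the cosine score over flattened matrices is our stand-in for eq. (12), and all names are illustrative.

```python
import numpy as np

def static_similarity(E_model, E_target, n_components):
    """Fit PCA loadings P_r on the prediction residual of one subsequence,
    reconstruct the residual of the other as T_r P_r^T, and score the
    reconstruction by cosine similarity of the flattened matrices."""
    # principal directions of the model residual (rows are samples)
    _, _, Vt = np.linalg.svd(E_model - E_model.mean(axis=0), full_matrices=False)
    P = Vt[:n_components].T                # loadings, shape (m, r)
    E_hat = E_target @ P @ P.T             # reconstruction via projection
    den = np.linalg.norm(E_target) * np.linalg.norm(E_hat)
    return float(np.sum(E_target * E_hat)) / den if den > 0 else 0.0
```

When the target residual lies entirely in the principal subspace of the model residual, the similarity reaches its maximum of 1.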

Algorithm 1 Multivariate Sequence Segmentation
1: Partition T into k equal-length subsequences
2: while k > K_target do
3:   for i = 1 to k − 1 do
4:     Calculate merging cost S(i, i + 1) using eq. (13)
5:   end for
6:   Merge the two adjacent subsequences having the largest merging cost
7:   k ← k − 1
8: end while
9: for each subsequence whose length > κ do
10:   Equally divide the subsequence into pieces of length κ
11:   Discard the remainder if its length is smaller than κ_min
12: end for
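The merging loop of steps 1-8 can be sketched as the following plain-Python skeleton. The merge score is passed in as a callable over (start, end) index ranges; in the actual algorithm it would be S(i, i+1) of eq. (13). This is a simplified sketch, not the authors' implementation.

```python
def bottom_up_segment(n_points, k_init, k_target, merge_score):
    """Bottom-up segmentation: start from k_init equal-length segments and
    repeatedly merge the adjacent pair with the highest merge_score
    (i.e. the most mutually predictable pair) until k_target remain."""
    bounds = [round(i * n_points / k_init) for i in range(k_init + 1)]
    segments = [(bounds[i], bounds[i + 1]) for i in range(k_init)]
    while len(segments) > k_target:
        scores = [merge_score(segments[i], segments[i + 1])
                  for i in range(len(segments) - 1)]
        i = max(range(len(scores)), key=scores.__getitem__)
        # replace the chosen pair with its union
        segments[i:i + 2] = [(segments[i][0], segments[i + 1][1])]
    return segments
```

With a toy score that favors merging segments of similar mean, two homogeneous halves of a signal are recovered as two segments.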

V. LABEL PROPAGATION BASED SEMI-SUPERVISED LEARNING
In this section, we develop a semi-supervised learning strategy under the framework of label propagation. The central idea is to assign pseudo labels to unlabeled samples and train the classifier using all labeled and pseudo-labeled data. The overall learning process alternately carries out the label propagation step and the training step [15].

A. LABEL PROPAGATION
The label propagation operation is performed using the graph-based method introduced in [15]. Given the training sample set X, we first construct the affinity matrix W, whose diagonal elements are zero and whose off-diagonal element w_ij (i ≠ j) represents the similarity of samples x_i and x_j. Then we calculate the symmetrically normalized matrix L = D^(−1/2) W D^(−1/2), where D is the diagonal degree matrix of W. Let Ŷ be the label matrix estimated by the current classifier network trained on the available labeled samples. The label spreading procedure proposed in [15] uses the iteration

Ŷ(t+1) = αLŶ(t) + (1 − α)Ŷ(0),     (14)

which converges to

Ŷ* = (1 − α)(I − αL)^(−1) Ŷ(0).     (15)

Instead of evaluating eq. (15) directly, we solve the following linear system using the conjugate gradient method, which has been shown [30] to be more efficient.
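The closed-form solve behind eqs. (15)-(16) can be sketched as below. This is a dense toy version that uses a direct solver in place of conjugate gradient; the function name and the α default are illustrative.

```python
import numpy as np

def label_spreading(W, Y0, alpha=0.9):
    """Label spreading of Zhou et al.: solve (I - alpha*L) Z = Y0 with
    L = D^{-1/2} W D^{-1/2}, then predict each node's class by argmax."""
    d = W.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(d), 0.0)
    L = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # normalized affinity
    n = W.shape[0]
    Z = np.linalg.solve(np.eye(n) - alpha * L, Y0)     # direct solve for the toy
    return Z.argmax(axis=1)
```

On a graph with two disconnected cliques and one labeled node per clique, the labels spread to every node of the corresponding clique.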
(I − αL)Ẑ = Ŷ(0).     (16)

Note that in this article each training sample is a multivariate sequence, so it is not appropriate to use the Euclidean distance to define similarity. Instead, we reuse the idea developed in the segmentation section to construct the affinity matrix. The merging cost function of eq. (13) is the accuracy of using the model trained on one sequence to predict the other, so by definition it can serve as a similarity measure between two sequences. To make it symmetric, we define the similarity measure

S̃(i, j) = (S(i, j) + S(j, i)) / 2.     (17)

The affinity matrix is then defined as

w_ij = S̃(i, j) if x_j ∈ NN_k(x_i), and 0 otherwise,     (18)

where NN_k(x_i) is the set of k nearest neighbors of x_i in X, introduced to make the affinity matrix sparse [31].

B. CLASSIFIER TRAINING
The network model consists of a sequence input layer, an LSTM layer, a fully connected layer and a softmax layer. We use a weighted loss function for the semi-supervised training process, briefly introduced here. The loss for classifier training is

L(θ) = Σ_{x_i ∈ labeled} L_s(f_θ(x_i), y_i) + Σ_{x_i ∈ pseudo-labeled} w_i L_s(f_θ(x_i), ŷ_i),

where L_s denotes the supervised loss of the network f_θ with parameters θ, computed over the labeled sample set Y and the pseudo-labeled sample set Ŷ. The weight w_i of training sample x_i measures the uncertainty of its pseudo label using entropy [31]. The main training process repeats two steps for T epochs: 1) propagate labels over the graph to update the pseudo labels; 2) train the network f_θ using X with Y_L ∪ Ŷ. It is worth noting that calculating S̃ does not require learning model parameters for each sequence again, as was done in the segmentation step; it suffices to cache the learned models of eq. (13) and use them to compute the prediction accuracy. This significantly reduces the computational demand of the label propagation step.
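The entropy-based weighting can be sketched as below. The exact form 1 − H(p)/log C is our plausible reading of the entropy weighting in [31], not necessarily the authors' formula.

```python
import math

def pseudo_label_weight(probs, eps=1e-12):
    """Weight a pseudo-labeled sample by 1 - H(p)/log(C): confident
    (low-entropy) class distributions get weight near 1, maximally
    uncertain (uniform) distributions get weight near 0."""
    c = len(probs)
    h = -sum(p * math.log(p + eps) for p in probs)   # Shannon entropy
    return 1.0 - h / math.log(c)
```

A one-hot prediction thus contributes with full weight, while a uniform prediction contributes essentially nothing to the loss.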

VI. EVALUATION WITH INDUSTRIAL DATA
A. DATASET PREPARATION
The data used for evaluating the proposed method were collected from the fused magnesia processing plant of Xinfazhan Ltd. of China. The original data were obtained with an industrial video surveillance camera system recording RGB video at a resolution of 1.3 megapixels (1280×1024) and a frame rate of 25 frames per second. In total, 22 hours of video were selected for training and evaluation. We invited experienced operators of the FMF plant to label all of the video. Seven clips were identified as semi-molten, and the operators marked the beginning and the end of each occurrence.
Next, we extract the flame region near the furnace mouth as the ROI, of dimension 1024×512×3, because the visual information of the semi-molten condition concentrates in the flame area. These images are resized to 128×64×3 and fed into the SAE, which consists of 6 pooling layers, each connected to a convolution layer that smooths the image. Each frame of the encoded ROI image is then unrolled into a 64-dimensional feature vector.
The segmentation algorithm segments the feature-vector sequence extracted by the SAE into subsequences. Segmentation is performed on the sequence clipped by a moving window of size 2000, and the algorithm terminates when the smallest subsequence is shorter than 250 samples, which corresponds to a clip length of 10 seconds. FIGURE 4 shows an example of the segmented subsequences. Each of the stacked subfigures represents one feature; only some of the features are displayed. The red vertical lines on the time axis are the division points found by the segmentation algorithm. Subsequences longer than 250 are equally divided into pieces of length 250, discarding fragments shorter than 50; these cuts are the green vertical lines in the figure. The short subsequence starting at 648 and ending at 686 is discarded because it is shorter than the predefined threshold of 50. As a result, the example in the figure yields 9 valid subsequences. After segmentation, the whole training set includes 4320 labeled sequences, of which 330 are labeled as the semi-molten working condition.
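The post-processing just described (cutting long subsequences into κ-length pieces and dropping short remainders, steps 9-12 of Algorithm 1) can be sketched as:

```python
def split_long_segments(segments, kappa=250, kappa_min=50):
    """Cut every (start, end) segment longer than kappa into kappa-length
    pieces and keep a trailing fragment only if it is at least kappa_min."""
    out = []
    for start, end in segments:
        if end - start <= kappa:
            out.append((start, end))
            continue
        pos = start
        while end - pos >= kappa:
            out.append((pos, pos + kappa))
            pos += kappa
        if end - pos >= kappa_min:       # keep remainder only if long enough
            out.append((pos, end))
    return out
```

With κ = 250 and κ_min = 50, a 600-frame segment yields two full pieces plus a kept 100-frame remainder, while a 260-frame segment yields one piece and a discarded 10-frame fragment.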

B. EVALUATION RESULTS AND COMPARISON
For the network model we use a single-layer LSTM, where the step length is 125, the input dimension is 64, the number of hidden units is 32 and the output dimension is 2. The gradient descent optimizer is used to train the LSTM at a learning rate of 0.01, and training is carried out for 300 epochs. We start training with 90% of the training samples unlabeled. FIGURE 5 shows the prediction accuracy on the unlabeled samples during training, sampled every 30 epochs. The overall prediction accuracy exceeds 80% after 150 epochs and reaches 85.8% after 300 epochs. The trained classifier is tested on the test dataset of 1296 samples, and the confusion matrix is shown in Table 1.
Since the main idea of this article is to use dynamic segmentation to build the training set for the label propagation algorithm, we evaluate the effect of introducing segmentation through comparison studies. In the following, the proposed method is abbreviated as DynSeqLP (Dynamic Segmentation enhanced Label Propagation).
We implemented 5 other methods for comparison. The first is the fully supervised method, which uses only labeled samples without label propagation and serves as the baseline. The remaining methods are divided into 2 groups. The first group includes some of the most commonly used semi-supervised learning methods: S3VM, self-training, co-training and entropy regularization (ER) [10]. Note that the methods in this group train classifiers directly on the encoded feature vectors without performing segmentation. The second group trains LSTM classifiers using sequences as training samples; it includes the proposed method and a fixed-length segmentation based label propagation method. The latter constructs training samples by directly dividing the feature sequence into subsequences of equal length κ, the minimal subsequence length; we refer to it as FixSeqLP in the following. For the segmentation operation, we evaluate the performance under different subsequence lengths. All classification accuracy results are listed in Table 2.

C. DISCUSSIONS
1) EXPLANATION OF THE RESULTS
We can see from Table 2 that the proposed method significantly outperforms the comparison methods. Several reasons contribute to the improvement. First, the ''patterns'' in the furnace flame related to the semi-molten condition manifest over continuous video. The poor performance of the algorithms in the first group is because they are trained on single images and thus cannot learn the temporal structure in the video sequences. In the proposed method, the segmentation operation can be considered a pre-clustering of image subsequences with similar temporal structure. This raises the granularity of the subsequent label propagation from image to video, so that the LSTM model can effectively extract the temporal ''patterns'' present in the video, improving classification accuracy.
Second, the significant improvement over the FixSeqLP method is due to the segmentation operation, which ensures that each subsequence contains a single working condition. Without dynamic segmentation, samples constructed by arbitrarily dividing the sequence may contain multiple working conditions within one subsequence, making the training of the LSTM network difficult to converge and its accuracy poor.
Finally, we emphasize that the observed performance improvement is largely due to the adaptive segmentation method proposed in this article. The comparison methods in group I share a common limitation: they rely strongly on the assumption that the unlabeled samples form well-separated clusters. When using features of single images, the highly noisy nature of the FMF flame image makes the samples difficult to cluster, which degrades the classification performance of those methods. Using uniformly constructed sequences as training samples does not notably alleviate the problem, because it may mix conditions within one sequence, as illustrated by the FixSeqLP method.

2) PARAMETERS
The parameters of the proposed approach include those of the label propagation algorithm and those of the segmentation algorithm. The former are discussed in [15], [31]; here we discuss only the latter. Segmentation has two parameters: the minimal subsequence length κ and the dynamic coefficient λ.
A closer look at Table 2 shows that when κ is between 250 and 300 its effect on performance is not very noticeable; the effect only becomes significant when κ exceeds 300. A particular visual pattern of the FMF furnace flame mostly lasts about 7-15 seconds. If the minimal segment length is too large, it is difficult to keep the subsequences homogeneous, which makes the subsequent classifier training harder. In the experiments, we found that κ = 250, corresponding to a video clip of 10 seconds, achieves the best overall performance.
The dynamic coefficient λ balances the contributions of cross-correlation and auto-correlation in the segmentation. The former tries to linearly approximate each subsequence using a combination of the extracted latent variables, while the latter locates the segmentation points where the dynamic pattern changes. In the experiments, we chose λ = 0.5 to give equal weight to the two aspects.

VII. SUMMARY
In this article, we analyzed the problem of identifying the semi-molten working condition of the FMF process. To tackle the problem of insufficient labeled samples, we introduced a semi-supervised classifier training strategy. First, image features are extracted by a convolutional auto-encoder. Then, a dynamic multivariate segmentation method partitions the multivariate time series into subsequences, which serve as samples for classifier training. Graph-based label propagation is used to iteratively spread the label information and train an LSTM classifier. Our method was compared with S3VM, co-training, self-training, entropy minimization and label propagation without dynamic segmentation, and showed significantly better accuracy than the other approaches. The proposed method was tested on industrial video data with promising results, and the algorithm can be applied to similar problems in a variety of video pattern identification applications.
One limitation of the proposed method is that the segmentation algorithm is based on linear approximation, which is fast and suited to online applications but rests on the assumption that the temporal pattern of a given working condition can be linearly expressed, which may not hold in some cases. For more complex applications, nonlinear approximation methods may be needed to partition the feature sequence more accurately.