Development of Indicator of Data Sufficiency for Feature-based Early Time Series Classification with Applications of Bearing Fault Diagnosis

Abstract: Diagnosis of bearing faults is crucial in various industries. Time series classification (TSC) assigns each time series to one of a set of pre-defined classes, such as normal and fault, and has been regarded as an appropriate approach for bearing fault diagnosis. Because late or inaccurate fault diagnosis may have a significant impact on maintenance costs, it is important to classify bearing signals as early and accurately as possible. TSC, however, has a major limitation: a time series cannot be classified until the entire series is collected, implying that a fault cannot be diagnosed in advance using TSC. Therefore, it is important to classify a partially collected time series, which is the goal of early time series classification (ETSC), a TSC that considers both accuracy and earliness. Feature-based TSCs can handle partially collected series, but the problem of determining whether a partially collected time series is sufficient for a decision remains unsolved. Motivated by this, we propose an indicator of data sufficiency to determine whether a feature-based fault detection classifier can start classifying partially collected signals in order to diagnose bearing faults as early and accurately as possible. The indicator is trained based on the cosine similarity between fully and partially collected signals as input to the classifier. In addition, a parameter setting method for efficiently training the indicator is proposed. The results of experiments using four benchmark datasets verified that the proposed indicator increased both accuracy and earliness compared with the previous time series classification method and general time series classification.


Introduction
Bearings are among the most important components of rotary machines such as motors, wind turbines, helicopters, automobiles, and gearboxes [1]. Their fault diagnosis is a crucial task because faulty bearings are one of the main causes of machine failure [2]. Consequently, predictive maintenance methods for bearings have attracted interest from both academia and industry. Jin et al. [3] developed a health index based on the bearing vibration signal and designed a method to detect bearing faults by selecting the appropriate threshold with a Box-Cox transformation. Singleton et al. [4] introduced a data-driven methodology, which relies on both time and time-frequency domain features to track the evolution of bearing faults. Kumar et al. [5] developed a health index using singular value decomposition, the average value of the cumulative feature, and Mahalanobis distance to evaluate and compare the four conditions of the bearing. Caesarendra and Tjahjowidodo [6] confirmed the change of the low-speed slew bearing condition from normal to failure using impulse factor, margin factor, approximate entropy, and largest Lyapunov exponent (LLE). Li et al. [7] proposed an approach for motor rolling bearing fault diagnosis using neural networks and time-frequency-domain bearing vibration analysis.
Time series classification (TSC) is a supervised learning task that assigns each time series instance to a predefined class, such as fault and normal [8]. In other words, TSC aims to train a classifier $f$ and use it to diagnose bearing $i$ from its time series $x_i$ as $\hat{y}_i = f(x_i)$, where $\hat{y}_i$ is the fault status predicted by $f$. Because each instance may have a different length, feature extraction is regarded as an essential step when a typical classifier is employed, that is, one other than recurrent neural network (RNN)-based models such as long short-term memory (LSTM). RNN-based models can classify time series of unequal length, but they may be unsuitable for the early time series classification (ETSC) task because of their high computational cost.
TSC that includes feature extraction is called feature-based TSC and has been frequently used in various areas, including biomedical [9,10] and manufacturing [11][12][13] applications.
Many studies have been conducted on feature-based TSCs for bearing fault diagnosis based on vibration signals. For example, Wu et al. [14] extracted features from vibration signals using multiscale permutation entropy and trained a support vector machine (SVM) for fault diagnosis. Goyal et al. [15] extracted statistical features, such as mean, standard deviation, root mean square, and skewness, from vibration signals collected by a noncontact sensor and an accelerometer, and then selected several features based on the Mahalanobis distance for training the SVM. They observed that a noncontact sensor can be applied to identify bearing faults, and that a linear SVM outperformed other SVMs. Gunerkar et al. [16] employed wavelet transform to extract time domain features from vibration signals and trained supervised models, including an artificial neural network (ANN) and a k-nearest neighbor algorithm. From their experiment, they observed that the ANN outperformed the other models in terms of accuracy. In recent years, convolutional neural networks (CNNs) have frequently been applied to bearing fault diagnosis problems, because they have filters to extract features from images. For example, Zhao et al. [17] proposed a planet-bearing fault classification method based on synchrosqueezing transform and CNN. In this method, the vibration signal was converted into a time-frequency color map using a synchrosqueezing transform. Then, the map was input into a CNN with six convolution layers and four max pooling layers, which assigned the map to one of three predefined classes: inner race fault, outer race fault, and healthy.
In time-sensitive applications, such as fault detection, earliness is as important as accuracy because late fault diagnosis leads to delayed maintenance and can cause the bearing fault to become permanent despite the accurate diagnosis. Even a few seconds of delay can lead to a critical situation such as an engineering system breakdown. Earliness is a measure to determine how early a classifier begins the classification job and is computed by the average ratio of the time until the classifier starts the classification to the time to collect the full time series.
TSC that considers both accuracy and earliness is called early TSC (ETSC) [18]. A few studies have proposed ETSC methods. For example, Hatami and Chira [19] developed an ensemble for early classification consisting of two classifiers with reject option (CWRO), each of which determines whether it can classify a (partially) collected instance. A CWRO does not classify an instance if its maximum posterior probability (i.e., $\max_k \Pr(y = c_k \mid x)$) or maximum decision function value is below a certain threshold. An instance is classified by the ensemble only when no CWRO in the ensemble rejects it. The major limitations of the ensemble are as follows: First, it is difficult to determine the threshold of the reject option, especially for decision function values that are not probabilities. Second, the ensemble can reject an instance that is hard to classify even when it is fully collected, resulting in lower earliness. He et al. [20] proposed a shapelet-based early classification method for multivariate time series. The method extracts a set of shapelet candidates, clusters the candidates, and selects a core shapelet from each cluster based on the weighted mean of the accuracy and earliness for each class. A new time series instance is assigned to a class as soon as the number of core shapelets matched for the class reaches a threshold; otherwise, it is labeled randomly. This method is computationally very expensive, and its performance, including accuracy and earliness, is highly dependent on the set of shapelet candidates. That is, if the candidate set is inappropriately constructed owing to missing values or ill-defined parameters, the classification performance may be poor. Xing et al. [21] proposed a method to extract interpretable features from time series to make ETSC interpretable in medical and health informatics, industry production management, safety, and security management.
Ghalwash and Obradovic [22] presented multivariate shapelet detection (MSD) that extracts time series patterns from all dimensions of the time series that distinctly manifest the target class locally, and the time series were classified by searching for the earliest closest patterns. Mori et al. [23] presented a method for early classification based on combining a set of probabilistic classifiers together with a stopping rule (SR), which acts as a trigger to indicate when to output a prediction or when to wait for more data.
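The reject option of a CWRO described above can be illustrated with a minimal sketch. It assumes the classifier exposes posterior probabilities; the function names and the `threshold` parameter are hypothetical and are not taken from [19], and the rule by which the ensemble combines accepted labels is not specified here, so the sketch simply returns the labels of all members.

```python
from typing import List, Optional, Sequence


def classify_with_reject(posteriors: Sequence[float], threshold: float) -> Optional[int]:
    """Return the index of the most probable class, or None to reject.

    `posteriors` holds Pr(y = c_k | x) for each class k. `threshold` is a
    user-chosen reject threshold (illustrative name, an assumption).
    """
    best = max(range(len(posteriors)), key=lambda k: posteriors[k])
    if posteriors[best] < threshold:
        return None  # maximum posterior too low: wait for more data
    return best


def ensemble_decision(all_posteriors: Sequence[Sequence[float]],
                      threshold: float) -> Optional[List[int]]:
    """Classify only when no ensemble member rejects the instance."""
    labels = [classify_with_reject(p, threshold) for p in all_posteriors]
    if any(label is None for label in labels):
        return None  # at least one member rejected
    return labels
```

The sketch makes the first limitation visible: the behavior depends entirely on `threshold`, which has no natural scale when decision function values are used instead of probabilities.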
Even though both the starting time and the processing time affect earliness, advancing the starting time is the more reasonable option because the processing time is very difficult to reduce, and it is usually small and almost the same for every time series. In order to advance the starting time, it is important to decide whether the (partially) collected time series is sufficient for classification. In other words, decisions on data sufficiency should be made periodically, and classification can start once the data are judged to be sufficient. However, to the best of our knowledge, no previous research has addressed the question, "is the time series long enough for classification?" Only a few studies have addressed related questions such as "is the financial time series long enough for clustering?" [24] and "is the time series long enough for identifying the qualitative changes?" [25].
This paper proposes a feature-based early classification method for bearing fault diagnosis with a data sufficiency indicator. This indicator determines whether a given partially collected signal is sufficiently long to be classified by a fault diagnosis classifier based on its similarity to a fully collected signal. If the indicator determines that it is sufficiently long, the classifier begins classifying the signal without further collection. The indicator carries no risk of rejecting the classification of an instance, which is a common problem in previous methods, because it is based not on shapelets or distances but on statistical features, and because it predicts the decision of the bearing fault classifier rather than the actual status of the bearing. The remainder of this paper is organized as follows: Section 2 formalizes the early bearing fault diagnosis problem and develops a solution to the problem. Section 3 proposes a data sufficiency indicator for a time series classifier and explains how to use and train the proposed indicator in detail. Section 4 conducts experiments to demonstrate that the proposed indicator can increase both accuracy and earliness compared with previous methods. Section 5 concludes this paper and suggests future research directions.

Early Bearing Fault Diagnosis
Bearing fault diagnosis using TSC is based on assigning a time series from a bearing (e.g., a vibration signal) to one of several pre-defined statuses (e.g., normal, inner race fault, outer race fault, or ball fault). More formally, let $x_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,T_i})$, also written $x_{i,1:T_i}$, be the time series and $y_i$ be the fault status of bearing $i$ ($i = 1, 2, \ldots, n$), where $x_{i,t}$ is the signal collected at time $t$ for the bearing. Then, the problem aims to train the classifier $f$ to diagnose the status of bearing $i$ as $\hat{y}_i = f(x_i)$. However, it is difficult to develop a classifier with a raw signal (i.e., $x_i$), because the lengths of the bearing signals can differ from one another, the signals can be too long to train the classifier efficiently, and the raw signal may not even contain significant features. For this reason, feature-based TSCs have been employed in many studies [9][10][11][12][13][26][27][28]. The usual process of developing a feature-based time series classifier is depicted in Figure 1.
A time series classifier with features can be expressed as follows:

$$\hat{y}_i = f(\Phi(x_i)) = f(\phi_1(x_i), \phi_2(x_i), \ldots, \phi_m(x_i))$$

where $\phi_k$ denotes the feature function $k$ ($k = 1, 2, \ldots, m$). The feature functions used in this study were adopted from [26] and are listed in Table 1. These features have been frequently used for bearing diagnosis problems, because they summarize bearing signals very well. For instance, crest factor indicates wear or cavitation, and root mean square shows the severity of bearing faults [29]. Readers can refer to [26,29] for more information on the nature of each feature function. In Table 1, $e_r$ and $s_r$ indicate the frequency and power spectrum, respectively, of the $r$th spectrum line resulting from the estimation of the power spectral density of a signal.

Table 1. Feature functions and their formulas, grouped by domain (time domain first), adopted from [26].

As mentioned before, a bearing fault diagnosis task requires not only accuracy but also earliness. The accuracy and earliness of a classifier indicate how well it classifies instances and how early it can start and complete the classification, respectively. In other words, the signal should be classified as accurately and as early as possible. TSC that additionally considers earliness is called early time series classification (ETSC). In order to classify a time series as soon as possible (i.e., for ETSC), it is important to start and complete the accurate classification early [8], as illustrated in Figure 2. As seen, ETSC starts (at $\tau$) and finishes the classification earlier than general time series classification (GTSC) starts (at $T$) and finishes. In order to start earlier, a classifier should decide whether a bearing is faulty with a partially collected time series $x_{i,1:\tau} = (x_{i,1}, x_{i,2}, \ldots, x_{i,\tau})$. A classifier with a shorter classification time should be used to reduce the processing time. A feature-based classifier requires a relatively short classification time, and the feature values (e.g., mean, standard deviation, center frequency, etc.) do not change significantly once a sufficient amount of signal is collected. However, the problem of determining whether the partially collected time series is sufficient for a decision on the fault still needs to be solved.

The main problems considered in this study are whether the collected signal is sufficiently long to be classified by an early classifier, and when the classifier can start the classification. In other words, the problems are to determine whether $\Phi(x_{i,1:\tau}) = \Phi(x_{i,1}, x_{i,2}, \ldots, x_{i,\tau})$ and $\Phi(x_i) = \Phi(x_{i,1}, x_{i,2}, \ldots, x_{i,T_i})$ are sufficiently similar as input for $f$ that the classifier can begin classification, and to estimate the minimum time $\tau_i$ such that $\mathrm{SIM}_f(x_{i,1:\tau_i}, x_i) \geq \alpha$, where $\mathrm{SIM}_f(A, B)$ is the similarity between $A$ and $B$ as input for $f$, and $\alpha$ is a threshold.
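To make the feature map $\Phi$ concrete, the following is a minimal sketch that computes four of the features named in the text (mean, standard deviation, root mean square, and crest factor) on a full signal and on a prefix $x_{i,1:\tau}$. It is not the full feature set of Table 1; the function names are illustrative assumptions, and the population standard deviation is an assumed convention.

```python
import math
from typing import List, Sequence


def mean(x: Sequence[float]) -> float:
    return sum(x) / len(x)


def std(x: Sequence[float]) -> float:
    """Population standard deviation (assumed convention)."""
    m = mean(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / len(x))


def rms(x: Sequence[float]) -> float:
    """Root mean square of the signal."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def crest_factor(x: Sequence[float]) -> float:
    """Peak absolute amplitude divided by RMS (undefined for an all-zero signal)."""
    return max(abs(v) for v in x) / rms(x)


def extract_features(x: Sequence[float]) -> List[float]:
    """Feature vector Phi(x) = (phi_1(x), ..., phi_m(x)) with m = 4 here."""
    return [mean(x), std(x), rms(x), crest_factor(x)]


def partial_features(x: Sequence[float], tau: int) -> List[float]:
    """Phi(x_{1:tau}): the same features computed on the first tau samples."""
    return extract_features(x[:tau])
```

Because every feature is a summary statistic, `partial_features(x, tau)` has the same dimension as `extract_features(x)` for any `tau`, which is what lets the same classifier $f$ be applied to partially collected signals.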

Proposed Indicator
As explained in Section 2, it is important to decide whether the partially collected time series is sufficient for a classifier. In this study, we propose an indicator for this decision problem. The indicator is itself a classifier, trained on the same bearing fault dataset that is used to train the classifier $f$; thus, it makes its decision quickly and accurately.
This section describes the proposed indicator in detail, focusing on its application to ETSC. Then, we explain how the indicator determines whether the partially collected signal is sufficiently long for classification based on the similarity between the partially and fully collected signals. Finally, we explain how the indicator is trained.
Let $I_f$ be an indicator to determine whether the signal $x_{i,1:t}$ collected until time $t$ for bearing $i$ is sufficiently long for classification by $f$, expressed as follows:

$$I_f(x_{i,1:t}) = \begin{cases} 1, & \text{if } x_{i,1:t} \text{ is sufficiently long for classification by } f \\ 0, & \text{otherwise} \end{cases}$$

where "$x_{i,1:t}$ is sufficiently long for classification by $f$" implies $\hat{y}_i = f(x_{i,1:t}) = f(x_{i,1:T_i})$, that is, the decisions of $f$ for $x_{i,1:t}$ and the fully collected signal $x_{i,1:T_i} = x_i$ are the same. Thus, one can start classifying the signal of bearing $i$ with $x_{i,1:t}$ when $I_f(x_{i,1:t}) = 1$. Note that the decisions of $f$ for $x_{i,1:t}$ and $x_i$ being the same does not guarantee a correct classification result (i.e., $y_i$ and $f(x_i) = f(x_{i,1:t})$ may be different). The specific process is presented in Figure 3, where $\tau_0$ and $\Delta\tau$ are the start time and period of the indicator, respectively. That is, $I_f(x_{i,1:\tau_0 + z \times \Delta\tau})$ is calculated for $z = 0, 1, 2, \ldots$ until it becomes 1, at which point the partially collected signal starts to be classified.

As mentioned above, the indicator $I_f(x_{i,1:t}) = 1$ when $x_{i,1:t}$ and $x_i = x_{i,1:T_i}$ are similar to each other as input for $f$. In other words, $x_{i,1:t}$ is considered sufficiently long when the classification results of $x_{i,1:t}$ and $x_{i,1:T_i}$ by $f$ are similar to each other. The similarity between $x_{i,1:\tau_i}$ and $x_i$ as input of $f$, $\mathrm{SIM}_f(x_{i,1:\tau_i}, x_i)$, is defined as the cosine similarity between the two vectors $\delta(x_i) = (\delta_1(x_i), \ldots, \delta_C(x_i))$ and $\delta(x_{i,1:\tau_i}) = (\delta_1(x_{i,1:\tau_i}), \ldots, \delta_C(x_{i,1:\tau_i}))$ as follows:

$$\mathrm{SIM}_f(x_{i,1:\tau_i}, x_i) = \frac{\sum_{c=1}^{C} \delta_c(x_{i,1:\tau_i})\, \delta_c(x_i)}{\sqrt{\sum_{c=1}^{C} \delta_c(x_{i,1:\tau_i})^2}\, \sqrt{\sum_{c=1}^{C} \delta_c(x_i)^2}}$$

where $\delta_c(x_i)$ is the decision function value of $x_i$ for class $c$ ($c \in \{1, 2, \ldots, C\}$). Cosine similarity is used because it expresses the similarity between two vectors based not on their scales but on their directions [30], and direction is more important for measuring the similarity between $\delta(x_i)$ and $\delta(x_{i,1:\tau_i})$.
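The similarity $\mathrm{SIM}_f$ and the thresholding against $\alpha$ can be sketched as follows. The inputs are assumed to be the decision function vectors $\delta(\cdot)$ already produced by the classifier; the function names are illustrative, and the default `alpha = 0.9` is only an example value. Note that this direct computation needs the fully collected signal, so it is usable only at training time; at run time the indicator has to predict the outcome.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two decision-value vectors."""
    dot = sum(u * v for u, v in zip(a, b))
    norm_a = math.sqrt(sum(u * u for u in a))
    norm_b = math.sqrt(sum(v * v for v in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def is_sufficient(delta_partial: Sequence[float],
                  delta_full: Sequence[float],
                  alpha: float = 0.9) -> bool:
    """True when SIM_f(partial, full) >= alpha (training-time label only)."""
    return cosine_similarity(delta_partial, delta_full) >= alpha
```

Because only direction matters, a partial signal whose decision values are uniformly smaller in magnitude than those of the full signal still scores a similarity of 1.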
For the decision function $\delta_c(x_i)$, one can use the hyperplane, $w_c^\top x_i + b_c$, for class $c$ if an SVM is adopted as the classifier. If the classifier is an ANN, then the output node $c$ can play the role of a decision function, and $\Pr(y = c) \times \Pr(x_i \mid y = c)$ can be used if the naïve Bayes classifier is used. $\delta(x_i)$ and $\delta(x_{i,1:\tau_i})$ are used instead of the predicted classes $f(x_i)$ and $f(x_{i,1:\tau_i})$ to prevent the case where $f(x_i)$ and $f(x_{i,1:\tau_i})$ are coincidentally the same. Cosine similarity is adopted because it is appropriate for calculating directional similarity, and setting the similarity threshold is easy because its value is in $[-1, 1]$. It should be noted that the indicator, which is also regarded as a classifier, does not directly calculate $\mathrm{SIM}_f(x_{i,1:\tau_i}, x_i)$ but predicts whether the similarity is greater than a threshold, because it is used at time $\tau_i < T_i$ when $x_{i,\tau_i+1:T_i}$ is unknown.

Figure 4 shows the training process of $I_f$, which consists of four steps. It should be noted that the classifier $f$ is also trained in parallel in the process. As shown in Figure 4, the feature dataset $D_f = \{(\Phi(x_i), y_i) \mid i = 1, 2, \ldots, n\}$ is generated by extracting features from the raw dataset $D = \{(x_i, y_i) \mid i = 1, 2, \ldots, n\}$.
The classifier $f$ is trained with the feature dataset $D_f$.
The indicator training dataset is

$$D_I = \bigcup_{i=1}^{n} \{(x_{i,1:\tau}, \psi_{i,1:\tau}) \mid \tau = \tau_0, \tau_0 + \Delta\tau, \tau_0 + 2 \times \Delta\tau, \ldots, T_i\}$$

where $\psi_{i,1:\tau} = 1$ if $\mathrm{SIM}_f(x_{i,1:\tau}, x_i)$ is equal to or greater than the threshold $\alpha$ and $\psi_{i,1:\tau} = 0$ otherwise; it is generated using the following algorithm. In this algorithm, $\tau_0$ and $\Delta\tau$ denote the first time and the period, respectively, at which the indicator checks whether partially collected signals are sufficient for classification. $\alpha$, $\tau_0$, and $\Delta\tau$ are user-defined parameters, which affect both the training time and the processing time of the proposed indicator. Specifically, the larger $\alpha$ is and the smaller $\tau_0$ and $\Delta\tau$ are, the more iterations are available to train the indicator and, thus, the more accurate the indicator is expected to be.

Algorithm 1. Generation of the indicator training dataset.
Step 8. If $i > n$, terminate the algorithm. Otherwise, go back to Step 2.
Output: $D_I$

Finally, the indicator is trained with $D_I$. It should be noted that $D_I$ is usually class-imbalanced (i.e., $I_f(x_{i,1:t}) = 0$ for most $i$ and $t$); therefore, oversampling or undersampling may be necessary to train an unbiased indicator. Clearly, the parameters $\tau_0$ and $\Delta\tau$ affect the effectiveness and efficiency of the proposed indicator. There are, however, no ground rules for setting these parameters. When the indicator is being trained, no information about suitable values of $\tau_0$ and $\Delta\tau$ is available; in this case, a tenth of the sampling frequency may be a good choice. When the indicator is being used, we can take $\tau_{0,i}$ as the first time at which $x_i$ is "sufficiently long for classification by $f$" (i.e., the smallest $t$ satisfying $I_f(x_{i,1:t}) = 1$ in Step 3 of Algorithm 1). We suggest that $\tau_0$ be set as $\min\{\tau_{0,1}, \tau_{0,2}, \ldots, \tau_{0,n}\}$ because the indicator can find $\tau_{0,i}$ with the highest efficiency when $\tau_0 = \min\{\tau_{0,1}, \tau_{0,2}, \ldots, \tau_{0,n}\}$. It should be noted that one cannot guarantee $\min\{\tau_{0,1}, \tau_{0,2}, \ldots, \tau_{0,n}\} < \tau_{0,n'}$ for every $n' > n$ (i.e., for an instance that is not in the training dataset), and $\min\{\tau_{0,1}, \tau_{0,2}, \ldots, \tau_{0,n}\} - \tau_{0,n'}$ is the loss of decision time when $\min\{\tau_{0,1}, \tau_{0,2}, \ldots, \tau_{0,n}\} > \tau_{0,n'}$.
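The generation of the indicator training dataset can be sketched as below. This is a minimal illustration built from the definition of $D_I$ and the labels $\psi_{i,1:\tau}$ given in the text, not a transcription of Algorithm 1. The callable `delta` (mapping a signal prefix to its decision-value vector, including any feature extraction) and all function and parameter names are assumptions; times are measured in samples, and only grid points $\tau_0 + z \times \Delta\tau$ not exceeding the signal length are visited.

```python
import math
from typing import Callable, List, Sequence, Tuple


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(u * v for u, v in zip(a, b))
    norm = math.sqrt(sum(u * u for u in a)) * math.sqrt(sum(v * v for v in b))
    # Assumption: a zero decision vector is treated as dissimilar.
    return dot / norm if norm > 0.0 else 0.0


def build_indicator_dataset(
    signals: Sequence[Sequence[float]],
    delta: Callable[[Sequence[float]], Sequence[float]],
    alpha: float,
    tau0: int,
    dtau: int,
) -> List[Tuple[Sequence[float], int]]:
    """Return pairs (prefix, psi), with psi = 1 when SIM_f(prefix, full) >= alpha."""
    dataset = []
    for x in signals:
        full = delta(x)  # decision values for the fully collected signal
        tau = tau0
        while tau <= len(x):
            prefix = x[:tau]
            psi = 1 if _cosine(delta(prefix), full) >= alpha else 0
            dataset.append((prefix, psi))
            tau += dtau
    return dataset
```

Counting the labels in the returned list is a quick way to check the class imbalance noted above before deciding on oversampling or undersampling.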

Objective and Process
The objective of the experiment is to verify whether the proposed indicator is better than CWRO in increasing earliness without loss of accuracy. The specific processes using a dataset are as follows:
Step 1. The dataset is randomly split into a training and a test dataset for objective evaluation of the proposed indicator. Specifically, the set of indices $I = \{1, 2, \ldots, n\}$ of time series instances is randomly separated into $I_{Train}$ and $I_{Test}$ with a ratio of 7:3. That is, 70% of the samples are randomly selected, with indices in $I_{Train}$, and used to train the model, and the remaining 30% of the samples, with indices in $I_{Test}$, are used to test it.
Step 2. A classifier and an indicator are trained using the training dataset $\{(x_i, y_i) \mid i \in I_{Train}\}$, as depicted in Figure 4. We selected ANN and SVM as classifiers, because they have been most frequently used as feature-based time series classifiers in previous research, e.g., in [14][15][16][17]. Each classifier is trained using all features presented in Table 1, as was done in [26].
Step 3. The trained classifier is tested using the test dataset, $\{(x_i, y_i) \mid i \in I_{Test}\}$, in terms of the micro F1-score. It is employed as an accuracy measure because it is a proper measure for multiclass classification, which may have a class imbalance problem. The micro F1-score, which is the harmonic mean of micro precision and micro recall, is calculated as follows:

$$\text{micro } F_1 = \frac{2 \times \text{micro precision} \times \text{micro recall}}{\text{micro precision} + \text{micro recall}} \quad (4)$$

where micro precision and micro recall are calculated as follows:

$$\text{micro precision} = \frac{\sum_{c=1}^{C} TP_c}{\sum_{c=1}^{C} (TP_c + FP_c)} \quad (5)$$

$$\text{micro recall} = \frac{\sum_{c=1}^{C} TP_c}{\sum_{c=1}^{C} (TP_c + FN_c)} \quad (6)$$

where $TP_c$, $FP_c$, and $FN_c$ indicate the numbers of true positives, false positives, and false negatives, respectively, when class $c$ is regarded as positive.
Step 4. The accuracy and earliness of the classifier trained with the proposed indicator are calculated using $\{(x_{i,1:\tau_i}, y_i) \mid i \in I_{Test}\}$, where $\tau_i$ indicates the minimum value among $t \in \{\tau_0 + z \times \Delta\tau \mid z = 0, 1, 2, \ldots\}$ satisfying $I_f(x_{i,1:t}) = 1$ (i.e., $\mathrm{SIM}_f(x_{i,1:t}, x_i) \geq \alpha$). Earliness is calculated using (7), in which $\tau_{i'}$ denotes the classification start time for time series instance $i'$. For an accurate and efficient decision by the indicator, our computational experience shows that $\alpha$ should be equal to or greater than 0.9. $\tau_0$ and $\Delta\tau$ can be determined as proposed in the final paragraph of Section 3.
Step 5. The accuracy and earliness of the classifier trained with the CWRO approach are calculated using $\{(x_{i,1:\upsilon_i}, y_i) \mid i \in I_{Test}\}$, where $\upsilon_i$ indicates the maximum value among $t \in \{\tau_0 + z \times \Delta\tau\}$ satisfying the condition that the standard deviation of $\delta(x_i)$ is at least $\varepsilon$.
Step 6. Accuracy and earliness obtained from Steps 4 and 5 are compared.
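The evaluation in Steps 3 and 4 can be sketched as follows. The micro F1-score follows (4) to (6). The earliness function is an assumption: it uses the mean of $(T_{i'} - \tau_{i'}) / T_{i'}$ over the test instances, chosen because it gives zero when the full signal is used, and it should be checked against (7). The `indicator` callable and all names are hypothetical, and times are measured in samples.

```python
from typing import Callable, Hashable, Sequence


def first_sufficient_time(x: Sequence[float],
                          indicator: Callable[[Sequence[float]], int],
                          tau0: int, dtau: int) -> int:
    """Smallest t in {tau0 + z*dtau} with indicator(x[:t]) == 1, else len(x)."""
    tau = tau0
    while tau < len(x):
        if indicator(x[:tau]) == 1:
            return tau
        tau += dtau
    return len(x)  # indicator never fired: classify the full signal


def earliness(start_times: Sequence[int], full_lengths: Sequence[int]) -> float:
    """Assumed form: mean of (T - tau) / T, zero when the full signal is used."""
    ratios = [(T - tau) / T for tau, T in zip(start_times, full_lengths)]
    return sum(ratios) / len(ratios)


def micro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Micro F1 from per-class TP, FP, and FN counts summed over classes."""
    tp = fp = fn = 0
    for c in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            if p == c and t == c:
                tp += 1
            elif p == c and t != c:
                fp += 1
            elif t == c and p != c:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In single-label multiclass problems every misclassification counts once as a false positive and once as a false negative, so micro precision and micro recall coincide.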

Datasets
We collected four benchmark datasets of bearing vibration signals, measured with an accelerometer, from the existing literature. The dataset information is presented in Table 2; readers can refer to [31,32] for the detailed settings used to collect the datasets. Note that because the sampling durations of the datasets are 10 or 40 s, one may think that ETSC is not necessary. However, the signal is continuously collected when bearing fault detection is applied in the real world; thus, the real sampling duration could be hours, days, or even weeks.

Table 3 compares GTSC, CWRO, and the proposed model (a classifier with the proposed indicator) in terms of accuracy and earliness. As explained in Section 4.1, accuracy and earliness were measured using (4) and (7), respectively. In Table 3, each row reports the accuracy and earliness of a classifier (SVM or ANN) for a dataset (#1-#4) when a preprocessing model for ETSC (GTSC, CWRO, or the proposed model) and its parameter (none for GTSC, $\varepsilon$ for CWRO, and $\alpha$ for the proposed model) are applied. Numbers in boldface represent the best among the results for the given dataset and classifier.

Results
Since GTSC does not consider earliness and uses the entire signal, its earliness is zero. For dataset #1, the earliness of CWRO is zero except for the case where the classifier is SVM and $\varepsilon$ is 0.5, implying that there is no clear difference between partially collected time series under normal and fault status, and thus CWRO is not appropriate for this kind of dataset. In contrast, the proposed model shows not only high earliness but also higher accuracy than GTSC. For datasets #2-#4, CWRO shows non-zero but small earliness and low accuracy. From the results, we observe the following: First, a general time series classifier does not always yield the best classification performance in terms of accuracy. This implies that using the fully collected time series does not guarantee higher accuracy; instead, dimension reduction techniques, including early classification, may increase accuracy. Second, the proposed model outperforms CWRO and general time series classifiers in terms of both accuracy and earliness in all cases. In addition, the range of $\varepsilon$ is $[0, \infty)$, whereas the range of $\alpha$ is $[-1, 1]$, implying that it is easier for the user to set the value of $\alpha$ than that of $\varepsilon$. Third, the accuracy and earliness depend highly on the dataset and classifier used. Classification performance depends on the classifier and dataset in every classification problem, and the proposed indicator likewise depends on the classifier used.

Conclusions
Bearing fault detection is one of the most important tasks in the manufacturing industry and is often accomplished by TSC. Most previous studies focused on accuracy but failed to consider earliness. In time-sensitive applications such as bearing fault detection, earliness is a very important measure for a time series classifier because it is highly related to cost and safety. Although a few ETSC methods have been proposed, they are unsuitable for fault detection problems for reasons such as the difficulty of parameter setting, features that are improper for fault detection, and low accuracy.
In this paper, we proposed an early bearing fault diagnosis method based on a data sufficiency indicator. The indicator determines whether a signal collected within a specific period is sufficiently long to be classified by the fault diagnosis classifier. The experiment with benchmark datasets confirmed that the proposed method outperforms previous methods in terms of accuracy and earliness. Although this study focused on bearing fault diagnosis, the proposed indicator can also be applied to any type of ETSC problem.
There are several future research directions based on the limitations of the present study. First, we employed feature functions from previous studies. Although these are frequently used in research, it remains uncertain whether they are effective for ETSC. Therefore, it is necessary to develop feature functions for early bearing fault diagnosis. Second, the proposed indicator was specially designed for the given classifier. In other words, the indicator is highly dependent on the classifier for bearing fault diagnosis, implying that it may not show good performance when another classifier is used. Therefore, the second research direction is to develop a robust indicator that is almost independent of a specific classifier and shows good results regardless of the classifier used. Third, we will modify and apply the proposed indicator for ETSC in a different area whose sampling duration is much longer than seconds. Fourth, the choice of $\tau_0$ and $\Delta\tau$ affects the processing time of the proposed indicator, and thus we will develop a method to select their best values for each time series to reduce the processing time in future research. Finally, we will develop a hybrid of the proposed model and CWRO for more efficient and effective ETSC.