Real-time equipment condition assessment for a class-imbalanced dataset based on heterogeneous ensemble learning

This study proposes an ensemble learning model for real-time equipment condition assessment. The model makes it possible to plan preventive maintenance activities before an unexpected failure takes place. The study focuses on the class-imbalance problem in equipment condition assessment. In reality, equipment experiences multiple conditions (states), remaining in the normal condition most of the time and only rarely being in the critical condition. From the perspective of data modelling, the distribution of samples is therefore highly imbalanced among the classes (conditions): the majority of samples belong to the normal condition, while a minority belong to the critical condition, which poses a great challenge to classification performance. To address this problem, a genetic algorithm-based ensemble learning model is presented. Furthermore, a self-updating learning strategy is presented for online monitoring, which enhances adaptability and reliability over time. Many previous studies have relied on feature extraction and on setting thresholds for equipment health indicators. This study has the advantage of omitting these steps, as it assesses the equipment condition directly through the proposed ensemble learning model. Numerical experiments, including two types of comparison studies, have been conducted. The results show that the proposed model is more effective than previous approaches in terms of the stability and accuracy of its classification performance.


Introduction
Prognostics and health management (PHM) is beneficial for daily operation and maintenance [21]. PHM covers condition assessment, fault diagnosis, remaining useful life (RUL) prediction, maintenance decisions and other considerations. Condition assessment is a fundamental activity that identifies the current condition (state) of equipment. Equipment ages and degrades with time. When equipment degrades to a certain degree or passes a certain threshold, it cannot operate well, which results in unqualified products, system breakdown or even casualties. Since the reliability and stability of equipment are essential for safe and continuous operation, effective equipment condition assessment is an important prerequisite. Moreover, condition assessment supports several subsequent activities, such as condition-based maintenance, planning and scheduling [35,39]. Condition assessment can be performed either by removing a component from operation (off-line) or by online monitoring. Considering the cost and complexity of installation and removal, real-time condition assessment with continuous online monitoring is more economical and feasible.
Overall, condition assessment approaches fall into three major categories: (i) criteria-based approaches [36,41], (ii) statistical-based approaches [3,13,18,20], and (iii) data-driven approaches. In criteria-based approaches, health indicators (i.e. main functions, reliability degree, working time, and deterioration degree) are proposed [37] to evaluate equipment condition. However, these approaches have difficulty in quantifying the indicators and the causal interrelationships among them. The fundamental principle behind the statistical-based approaches is the formulation of theoretical mathematical models for interpreting equipment deterioration. Although they describe the deterioration of equipment over time well, their stability and sensitivity are limited when facing unexpected impacts (power failure, shocks, instantaneous overload or no-load, etc.) [11,12]. Data-driven methods [4,30,33] try to learn the intrinsic properties of the data and the underlying relations from monitoring data in order to assess the equipment condition. Compared with the former approaches, data-driven approaches are less complex and more widely applicable. There is no need to construct a complex hierarchical structure or to extract features, because most data-driven techniques learn automatically. However, these approaches rely heavily on the properties of the training data. For a real-life condition assessment problem, a new challenge therefore arises, as the database is highly class-imbalanced.
Xiaohui Chen, Zhiyao Zhang, Ze Zhang
Science and Technology
Our study focuses on this class-imbalance problem for condition assessment. A dataset exhibits the class-imbalance problem when the data samples of one class (the majority class) outnumber the data samples of the other classes (the minority classes). The latter usually denote the topic of interest in a data classification problem. In real-world data-oriented applications, class-imbalanced datasets are prevalent, e.g. in fraud/intrusion detection and medical diagnosis/monitoring. With a class-imbalanced dataset, a standard classifier, such as Decision Tree (DT), Random Forest or Support Vector Machine (SVM) [4,30], performs badly, because it tends to be biased towards the over-represented class. Dominated by the majority class, the classifier fails to generalize classification rules for the classes with few samples, because it may consider the samples in the minority classes to be noise.
Research on classification with class-imbalanced datasets has received much attention since Japkowicz performed experiments in 2000 on datasets of varying size, complexity and degree of class imbalance [17]. That study discussed the assumption, made in the majority of concept-learning systems, that the training set is well balanced, and verified that class imbalance hinders the performance of standard classifiers. Further studies pay more attention to the classification performance on class-imbalanced classification problems. Standard classification algorithms suffer from a significant loss of performance on class-imbalanced datasets, providing suboptimal classification results [5,22]. The results are often biased towards the majority class, leading to a higher classification error for the minority classes [28]. Therefore, new rules have been studied to better generalize the minority classes and to avoid treating them as noise. Increasingly, research has focused on uncovering and magnifying the intrinsic properties of the minority-class data.
Generally, the approaches for tackling classification problems on class-imbalanced datasets are categorized as data-level approaches, algorithm-level approaches and combinations of the two [14]. Re-sampling is a common approach at the data level, which aims at re-balancing highly imbalanced class distributions. Under-sampling strategies [1,23] decrease the number of samples in the majority class, and over-sampling strategies [9,24] increase the number of samples in the minority class (or classes). However, both strategies have drawbacks. Over-sampling may increase the risk of over-fitting and the computational burden of the learning algorithm, and under-sampling may lose useful information. As a result, the re-sampled dataset can be completely different from the original one, because the original class distribution is altered. Mathew et al. [27] emphasised that datasets for fault stage diagnosis in industrial machines are often imbalanced and consist of multiple categories or classes. In their study, a weighted kernel-based over-sampling algorithm was put forward to generate minority samples in order to balance the class distribution for an SVM classifier. With this algorithm, a higher overall accuracy was obtained.
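The two re-sampling strategies described above can be illustrated with a minimal sketch. It uses plain random under- and over-sampling, not the weighted kernel-based algorithm of [27]; the function name `random_resample`, the parameter `target_per_class` and the sample counts are assumptions made only for this illustration.

```python
import random


def random_resample(samples, labels, target_per_class, seed=0):
    """Randomly under- or over-sample every class to `target_per_class` samples."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    out_x, out_y = [], []
    for y, xs in by_class.items():
        if len(xs) >= target_per_class:
            # Under-sampling: keep a random subset, so some information is discarded.
            chosen = rng.sample(xs, target_per_class)
        else:
            # Over-sampling: duplicate random samples, which can encourage over-fitting.
            chosen = xs + rng.choices(xs, k=target_per_class - len(xs))
        out_x.extend(chosen)
        out_y.extend([y] * target_per_class)
    return out_x, out_y


# Hypothetical imbalanced dataset: 90 / 8 / 2 samples in classes 0 / 1 / 2.
labels = [0] * 90 + [1] * 8 + [2] * 2
samples = list(range(len(labels)))
new_x, new_y = random_resample(samples, labels, target_per_class=10)
print({c: new_y.count(c) for c in (0, 1, 2)})
```

After re-sampling, every class has the same number of samples, but class 0 has lost most of its original samples and class 2 consists almost entirely of duplicates, which is exactly the drawback noted above.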
Cost-sensitive learning incorporates approaches at the data level, at the algorithm level, or at both levels combined. It assigns higher costs to the misclassification of examples of the positive class than to those of the negative class and therefore tries to minimize the higher-cost errors [38]. Cost-sensitive learning allocates unequal costs to different classes in the learning process, based on the assumption that the misclassification costs are already known [10]. However, the costs are difficult to determine because prior cost information is usually not available. If positive instances are sparse, cost-sensitive learning may not be able to construct appropriate decision boundaries. Another limitation is that cost-sensitive learning may work well on a moderately imbalanced dataset but fail on a highly imbalanced one [16]. In [40], instability events were considered to be the reason for the class-imbalanced dataset in the power system short-term voltage stability assessment problem. The authors combined the forecasting-based nonlinear synthetic minority over-sampling technique with cost-sensitive learning, dealing with the class-imbalanced dataset at the data-preprocessing and algorithm levels respectively, and achieved desirable performance.
At the algorithm level, ensemble learning is one of the best performing approaches. It adapts base learning methods so that they are better attuned to class imbalance issues. The basic idea behind ensemble learning is to use more than one classifier to improve the overall accuracy. Ensemble learning has been widely used in many fields, e.g. finance, manufacturing, bioinformatics, geography, medicine, information security and recommender systems, to improve on the classification performance of single models [25]. Othman et al. [29] applied an ensemble discriminant classifier with four base learners to the condition assessment of power transformers, and verified that the proposed ensemble model outperformed the SVM classifier. Boosting, bagging and stacking are the three main strategies [16]. Among them, boosting is the most commonly used. It highlights the misclassified samples at each iteration and reduces the bias by combining the classification results of several weak learners. In each round, the weights of the samples in the minority classes are increased.
The overall classification performance of an ensemble learning model depends on two factors: the success of the individual learners and their diversity. One way of providing diversity is to use different types of individual learners. Another way is to use different training datasets, in which case the same type of individual learner is adopted. According to whether different individual learners are used, ensemble learning can be divided into two groups: heterogeneous ensembles and homogeneous ensembles. The homogeneous ensemble is a convenient and prevalent approach, as it only requires choosing a certain number of individual learners of the same type (homogeneous individual learners). AdaBoost is the classic example of a boosting model with homogeneous individual learners [8,34]. Lee et al. [19] built an SVM-based AdaBoost model to address class-imbalanced classification. Different factor scores were computed by categorizing samples based on the SVM margin. Another strategy is to choose heterogeneous individual learners. Models with heterogeneous individual learners have advantages in learning different characteristics of the training dataset, since they use a diverse set of individual learners. However, in this strategy, previous research always makes the implicit assumption that every category of heterogeneous individual learner contributes only one individual learner. For example, in [15], 20 individual learners from five categories were chosen to construct the individual learner base, and a genetic algorithm was implemented to search for the appropriate individual learners to combine into the ensemble learning model. In our study, we question this assumption, and we argue that a heterogeneous ensemble learning model built from several individual learners of each heterogeneous category may perform better than one with only a single learner per category.

This study offers three main contributions. First, it establishes a genetic algorithm-based ensemble learning model, which can greatly enhance the classification accuracy on a class-imbalanced dataset. Second, we optimize this model to realize dynamic condition assessment and achieve a self-updating ability; the optimized model helps to improve availability and reliability. Third, the proposed model, which omits the steps of feature extraction and of setting thresholds for equipment health indicators, can be used directly to assess the equipment condition. In practice, the result of equipment condition assessment may help managers to make decisions about the operation and maintenance of the equipment. For example, it helps managers to decide when to prepare the necessary materials and human resources before the occurrence of a failure.
The remainder of this paper is organized as follows. Section 2 formulates the equipment condition assessment problem. Section 3 describes the proposed model and the way to optimize it. Experiments are conducted to verify the performance of the proposed model in Section 4. Finally, concluding remarks and future research suggestions are given in Section 5.

Problem formulation
The notation that will be used throughout the paper is summarized in Table 1.
Table 1. Notation (recovered entries: the set of key measurements; a row vector, i.e. one sample of the key measurements; external data among the key measurements).
Equipment condition assessment is an important activity that can visually reflect the current condition of equipment. This activity benefits managers, as it provides information about equipment condition and makes it possible to plan maintenance activities before failure.
In our study, equipment condition is graded into three broad classes: (i) "Healthy", (ii) "Minor defect", and (iii) "Critical defect". The descriptions of these three conditions are provided in Table 2. Additionally, as depicted in this table, three colour codes [2,7] are utilized to visually indicate the corresponding potential danger level of the three conditions.
Condition "Healthy" is the initial condition, under which the equipment works well. Under the condition "Minor defect", managers should pay more attention to the equipment and a maintenance plan should be scheduled (that is, spare parts, human resources, maintenance tools and other required resources should be considered and prepared in case of need) to prevent consequential damage and avoid undesirable consequences. Under the condition "Critical defect", maintenance is urgently needed because a fault could occur at any time.
Let $(X, Y)$ denote the whole dataset, where the measurement set $X = (x_1, x_2, \ldots, x_M)^T$ is an $M \times N$ matrix consisting of $N$ key measurements from condition monitoring that reflect the health condition of the equipment, and $M$ is the number of samples in this dataset. Each $x_i$ ($i = 1, \ldots, M$) is a row vector containing external data and internal data, in which $N = N_1 + N_2$. External data are related to the operation settings of the equipment, while internal data contain internal information such as vibration and temperature increases. The class label set $Y = (y_1, y_2, \ldots, y_M)^T$ denotes the condition of the equipment (Eq. 1). In the following parts, class 0, class 1 and class 2 are alternative expressions for the conditions "Healthy", "Minor defect" and "Critical defect", respectively:

$$y_i = \begin{cases} 0, & \text{Healthy} \\ 1, & \text{Minor defect} \\ 2, & \text{Critical defect} \end{cases} \quad (1)$$

In essence, the condition assessment problem is a classification problem: we should identify the current condition (class 0, class 1 or class 2) of the equipment. A classification learning model should therefore be built to learn the relationship between $X$ and $Y$. As a result, when a certain $x_i$ is given, the classification learning model should quickly give the corresponding condition of the equipment, namely $y_i$. The relationship described by this model is denoted as Eq. 2:

$$y_i = f(x_i) \quad (2)$$

In reality, equipment works with the desired reliability most of the time. That means that, for the equipment condition assessment problem, the majority of samples belong to class 0, while a minority of samples belong to class 1 and class 2.
In this paper, we call class 0 the majority class, and call class 1 and class 2 the minority classes.
As the distribution of samples over these conditions is highly skewed, this equipment condition assessment problem is not a simple classification problem, but a specific one with a class-imbalanced dataset.
In addition, the primary interest is devoted to class 1 and class 2, because they contain the relevant information for making production operation and maintenance plans. Therefore, classification performance for the minority classes is key for condition assessment.

The genetic algorithm-based ensemble learning model
This section contains three main parts: a description of the criteria for classification evaluation, the steps of the genetic algorithm-based ensemble learning model (GAEM), and how to optimise this model.

Criteria for classification evaluation
As the minority classes are dominated by the majority class, achieving high accuracy is often meaningless when the dataset is class-imbalanced, especially in situations where the minority classes are more important and cannot be sacrificed. How to choose suitable criteria to evaluate the performance of the classification model is therefore also an important research point. In our study, for multi-class classification, a $3 \times 3$ contingency table named the confusion matrix is illustrated in Eq. 3, in which $cm_{ij}$ denotes the number of samples whose actual condition is $i$ and whose classification result is $j$:

$$CM = \begin{pmatrix} cm_{00} & cm_{01} & cm_{02} \\ cm_{10} & cm_{11} & cm_{12} \\ cm_{20} & cm_{21} & cm_{22} \end{pmatrix} \quad (3)$$

Table 2.
Healthy: All the critical characteristic quantities are successively decreasing but always stay in a safe region, above the standard limit values.
Minor defect: Some of the critical characteristic quantities are out of bounds, but the comprehensive influence is small. Slight defects appear in the ability to resist risks and adapt to the environment.
Critical defect: Serious deterioration appears, and critical characteristic quantities are out of bounds. The comprehensive influence is large, the equipment can no longer carry out its regulated functions normally, and failures can happen at any time.

The criteria for classification evaluation in the class-imbalanced classification problem are important when comparing the performances of different classification models. Moreover, for multi-class classification, the criteria become more crucial and intractable. As we should give more emphasis to the minority classes than to the majority class, the criteria commonly used in binary classification, accuracy and error, are not adequate. A single performance criterion can be misleading and may fail to evaluate performance on unseen data [29]. Therefore, to obtain a more reliable evaluation, we utilize precision, recall, and F-measure to evaluate the output quality of our classification model. Precision measures the proportion of samples classified as positive that are actually positive, while recall measures the proportion of positive samples that are correctly classified as positive [16].
Unlike the binary classification problem, which contains only positive and negative samples, this multi-class classification problem contains three classes, denoted as 0, 1, and 2. We therefore slightly adjust the meaning of "positive samples" and "negative samples". For the precision of class 0, for example, we consider the samples that belong to class 0 to be "positive samples", and the samples of the other classes, namely class 1 and class 2, to be "negative samples". In this way, each of the three classes has its own precision and recall. $P_i$ and $R_i$ denote the precision and recall for class $i$, which are defined below. Additionally, we adopt the F-measure (also called F-score) [16] to balance the conflict between precision and recall. Here, we use $F_i$ to denote the F-measure, in which $\delta$ is a coefficient for the bias between the two criteria.
$$P_i = \frac{cm_{ii}}{\sum_{j} cm_{ji}} \quad (4)$$

$$R_i = \frac{cm_{ii}}{\sum_{j} cm_{ij}} \quad (5)$$

$$F_i = \frac{(1 + \delta^2) P_i R_i}{\delta^2 P_i + R_i} \quad (6)$$

In addition to the three criteria mentioned above, we also take the interrelationship between the assessment result and the actual result into account, in terms of both the values of the two results and their time relationship. For the relationship between the values, a symmetric matrix $C_{coe}$ is drawn (Eq. 7). A positive value in this matrix denotes an award, while a negative value denotes a penalty. For the time relationship $C_t$ (see Eq. 8), two coefficients $c_e$ and $c_l$ ($c_e \le c_l$) are proposed for early assessment and late assessment, respectively. We define three literal evaluations for the relationship in time order between the two results, shown in Table 3. The award-penalty matrix $M_{ap}$ (Eq. 9) is then proposed to integrate the effects of the value and time relationships between these two results.
To evaluate the classification performance more accurately, we combine the aforementioned indicators to obtain the utility function $f_u$ (Eq. 10), in which $m_{ij}$ denotes an element of $M_{ap}$. This function contains two parts, the F-measure and the award-penalty function value, with the weights $p$ and $q$, respectively. The award-penalty value ( )
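The per-class precision, recall and F-measure used in this subsection can be computed directly from the confusion matrix. The following is a minimal sketch, assuming the usual one-vs-rest definitions (column sum for precision, row sum for recall) and the usual weighted harmonic-mean form of the F-measure, with the parameter `delta` playing the role of $\delta$. The function name and the numbers in `cm` are hypothetical and are not results from this study.

```python
def per_class_metrics(cm, delta=1.0):
    """Return a list of (precision, recall, F-measure) tuples, one per class.

    cm[i][j] is the number of samples whose actual class is i and whose
    assigned class is j, the same convention as the confusion matrix above.
    """
    n = len(cm)
    results = []
    for i in range(n):
        true_pos = cm[i][i]
        classified_as_i = sum(cm[r][i] for r in range(n))  # column sum
        actually_i = sum(cm[i])                            # row sum
        precision = true_pos / classified_as_i if classified_as_i else 0.0
        recall = true_pos / actually_i if actually_i else 0.0
        denom = delta ** 2 * precision + recall
        f_measure = (1 + delta ** 2) * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f_measure))
    return results


# Hypothetical confusion matrix for classes 0, 1 and 2 (illustrative numbers only).
cm = [
    [90, 10, 0],
    [20, 25, 5],
    [2, 6, 2],
]
metrics = per_class_metrics(cm)
for cls, (p, r, f) in enumerate(metrics):
    print(f"class {cls}: P={p:.3f} R={r:.3f} F={f:.3f}")
```

In this toy matrix the overall accuracy is dominated by class 0, while the per-class values for class 2 are much lower, which is why a single accuracy figure is not an adequate criterion here.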

Steps of GAEM
A systematic method for condition assessment via a genetic algorithm-based ensemble learning model is proposed in this section. The flowchart of this method is shown in Fig. 1. The blue part illustrates the process for training or retraining the ensemble learning model. The yellow part illustrates the process for the genetic algorithm. Two italic abbreviations, POP and IND, are used here: POP denotes the population and IND denotes an individual in the genetic algorithm. We elaborate on the key steps in the following parts.
Class strategy: In reality, some datasets may lack the label (class) $y_i$ for each piece of data. In this case, we should first label these data. For simplification, we consider RUL as the key factor in this strategy. Here, we define four terms, $\beta_1$, $\beta_2$, $t_{sp}$ and $t_f$, where [...] $= \beta_2$ [...] we consider the time [...] the boundary value between condition "Healthy" and condition "Minor defect". Fig. 2 depicts the class strategy; the curves in this figure show the change in the key measurements from condition monitoring that reflect the health condition of the equipment.
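As an illustration of RUL-based labelling, the sketch below assigns the three classes from the remaining useful life of a run-to-failure unit. It is only an assumption about how such a strategy could look: the two thresholds `minor_rul` and `critical_rul` and the function name are hypothetical and are not the paper's $\beta_1$, $\beta_2$, $t_{sp}$ or $t_f$.

```python
def label_by_rul(cycle, failure_cycle, minor_rul, critical_rul):
    """Assign class 0, 1 or 2 from the remaining useful life (RUL) at `cycle`."""
    rul = failure_cycle - cycle
    if rul <= critical_rul:
        return 2  # "Critical defect"
    if rul <= minor_rul:
        return 1  # "Minor defect"
    return 0      # "Healthy"


# Hypothetical unit that is monitored for 200 cycles and fails at cycle 200.
labels = [
    label_by_rul(t, failure_cycle=200, minor_rul=50, critical_rul=10)
    for t in range(1, 201)
]
counts = {c: labels.count(c) for c in (0, 1, 2)}
print(counts)
```

Even in this toy example most samples fall into class 0, so labelling by RUL naturally produces a class-imbalanced dataset.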
Data preprocessing: All the features are standardized to zero mean and unit variance. Standardization of datasets is a common requirement for many machine learning estimators implemented in scikit-learn [32]. With an unstandardized dataset, the ensemble learning model might behave badly.
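The standardization step can be sketched as follows. This is a minimal NumPy version with hypothetical data; scikit-learn's `StandardScaler` applies the same per-feature scaling. The statistics are computed on the training data only and then reused for new monitoring data.

```python
import numpy as np


def standardize(train, new):
    """Scale features with the mean and standard deviation of the training data."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # constant features are only centred, avoiding division by zero
    return (train - mean) / std, (new - mean) / std


rng = np.random.default_rng(0)
# Hypothetical data: three features on very different scales.
X_train = rng.normal(loc=[500.0, 0.5, 20.0], scale=[50.0, 0.1, 5.0], size=(100, 3))
X_new = rng.normal(loc=[500.0, 0.5, 20.0], scale=[50.0, 0.1, 5.0], size=(10, 3))
X_train_std, X_new_std = standardize(X_train, X_new)
print(X_train_std.mean(axis=0).round(6), X_train_std.std(axis=0).round(6))
```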
Form individual learner base: Unlike previous studies, which assume that each category of heterogeneous individual learner contributes only one learner, this paper argues that the number may be flexible. Under this assumption, there are three main problems: (i) which heterogeneous individual learners should be included in the ensemble learning model, (ii) how many of each heterogeneous individual learner should be used, and (iii) how to rank these heterogeneous individual learners.
When tackling problem (i), some properties should be considered. Traditionally, it is generally believed that the individual learners should be as accurate and as diverse as possible [16]. Moreover, the simpler the individual learners, the better the performance and the lower the variance of the ensemble learning model. Thus, to ensure accuracy and diversity, pilot experiments on each individual learner should be conducted beforehand. In our study, we gathered 23 individual learners from the scikit-learn class libraries [32] for selection (Table 4). Pilot experiments are conducted with the given dataset. Then, the heterogeneous individual learner base is formed through a comprehensive selection strategy composed of a series of constraints, which is shown in Eq. 11. In this equation, $t_{com}$ and $t_{int}$ denote the computation time of the classification model and the interval time of condition monitoring, respectively.
GA search: After the suitable individual learners have been selected, a GA is used to address the last two problems: the number of each heterogeneous individual learner and how to rank these heterogeneous individual learners. In previous studies, greedy selection is the most widely used method for finding the best combination [26]. Caruana et al. [6] used greedy algorithms to search for the best ensemble combination. They added one individual learner to the ensemble combination at each step to maximize the model performance. Greedy selection is easy to interpret and to operate, but it has the obvious limitation that it can easily become stuck in a local optimum. Additionally, as the number of individual learners increases, the number of possible ensemble combinations increases exponentially. An exhaustive search for the optimal combination is not practical, since the evaluation of each combination is computationally expensive [31]. For this reason, heuristic algorithms such as the genetic algorithm are more feasible for finding a near-optimal solution in a reasonable time. We used binary encoding to represent the number of each heterogeneous individual learner. The maximum number of each heterogeneous individual learner is $E$, and the total length of the chromosome is $L$. The chromosome is shown in Fig. 3. For simplicity, we set the utility function as the fitness function in this genetic process.
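One possible reading of the binary encoding is sketched below. It assumes, purely for illustration, that each learner type occupies a fixed-width gene of `ceil(log2(E + 1))` bits and that codes above `E` are clipped; the actual layout in Fig. 3 may differ. The function name and the learner names are hypothetical.

```python
import math


def decode_chromosome(chromosome, learner_names, max_count):
    """Map a binary chromosome to the number of copies of each learner type."""
    # Bits needed to represent the integers 0..max_count (an assumed layout).
    bits = max(1, math.ceil(math.log2(max_count + 1)))
    if len(chromosome) != bits * len(learner_names):
        raise ValueError("chromosome length does not match the encoding")
    counts = {}
    for k, name in enumerate(learner_names):
        gene = chromosome[k * bits:(k + 1) * bits]
        value = int("".join(str(b) for b in gene), 2)
        counts[name] = min(value, max_count)  # clip codes that exceed the maximum
    return counts


names = ["DecisionTree", "GaussianNB", "LogisticRegression"]
chromosome = [0, 1, 1, 1, 0, 1, 0, 0, 0]  # E = 5, so 3 bits per learner type
decoded = decode_chromosome(chromosome, names, max_count=5)
print(decoded)
```

With this layout the chromosome length is the number of learner types times the gene width, and a count of zero simply excludes that learner type from the ensemble.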
For the ranking of the heterogeneous individual learners, we choose a series connection strategy, in which each individual learner is used to improve on the prior ones. In this way, the advantages of the diversity of the individual learners can be retained. The results of the prior learner are transmitted to the following ones. The samples are split into a correctly classified part and a wrongly classified part, and greater weight is put on the wrongly classified part. Then, these reweighted samples serve as the new input data for the next individual learner to be reclassified. This is repeated until all the individual learners have been iterated through. In this way, the individual learners increasingly focus on the samples that are difficult to classify correctly, as the weight of the samples in the minority classes increases in each round.
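The reweighting between consecutive learners can be sketched as below. The multiplicative factor `boost` and the renormalization are assumptions made for illustration; the exact update rule of the proposed model is not specified here. In scikit-learn, such weights could be handed to the next learner through the `sample_weight` argument that many classifiers accept in `fit`.

```python
def reweight(weights, y_true, y_pred, boost=2.0):
    """Increase the weight of misclassified samples and renormalize to sum to 1."""
    updated = [w * boost if t != p else w for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return [w / total for w in updated]


y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 1, 0, 1, 1]  # hypothetical output of the first learner
weights = [1 / len(y_true)] * len(y_true)
weights = reweight(weights, y_true, y_pred)
print([round(w, 3) for w in weights])
```

In this toy case the three misclassified samples all belong to the minority classes, so their relative weight doubles before the next learner is trained.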
Train/Retrain the Ensemble Learning Model: The detailed processes for training/retraining the ensemble learning model are given in the blue dashed box in Fig. 1. K-fold cross-validation (K-CV) is adopted to construct the training dataset and the validation dataset. The dataset is randomly and equally split into K folds. Out of these K folds, one is kept as the validation dataset, and the other K-1 folds are used as the training dataset. This cross-validation process is repeated K times, with each of the K folds used exactly once as the validation dataset.
Once a certain combination strategy for the heterogeneous individual learners is given, the structure of the ensemble learning model is determined. The model is then trained on the training dataset. After that, the trained ensemble learning model is validated on the validation dataset. The results of the trained ensemble learning model, known as the assessed condition, are compared with the actual condition of the validation dataset. These steps are performed K times to realize the K-fold validation. The average of the K fitness values is then saved as the final fitness value.
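The K-fold procedure above can be sketched in a few lines. The callable `evaluate` is a hypothetical stand-in for "train the ensemble on the training folds and return its fitness on the validation fold"; the function names and the dummy evaluator are assumptions for illustration only.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Randomly split the sample indices into k folds of (nearly) equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_fitness(n_samples, k, evaluate):
    """Average the fitness returned by evaluate(train_idx, val_idx) over k folds."""
    folds = k_fold_indices(n_samples, k)
    scores = []
    for i, val_idx in enumerate(folds):
        # Every fold except fold i forms the training set for this round.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        scores.append(evaluate(train_idx, val_idx))
    return sum(scores) / k


def dummy_evaluate(train_idx, val_idx):
    """Placeholder fitness: the fraction of samples used for training."""
    return len(train_idx) / (len(train_idx) + len(val_idx))


fitness = cross_validated_fitness(n_samples=100, k=5, evaluate=dummy_evaluate)
print(fitness)
```

In the genetic search, this averaged value would serve as the fitness of one candidate chromosome.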

A self-updating strategy for GAEM
The self-updating process runs continually in parallel with the online monitoring assessment. At a certain time interval ($t_{up}$), we save the online monitoring data into the historical database (the database for GAEM) to form a new database. In this way, both the data volume and the data diversity are increased. By periodically inspecting and learning new situations, we can strengthen the reliability and adaptability of GAEM and keep the classifier closely tied to the newest equipment situation. By retraining/relearning from the new database, the parameters of GAEM are self-updated, and we name the new ensemble learning model GAEM-II. The processes for GAEM and GAEM-II are illustrated in Fig. 4, where they are shown by solid and dashed lines, respectively.
Through GAEM, we obtain the Trained Classifier, which gives the condition assessment result ('0', '1', or '2') when fed with the real-time monitoring data. After $t_{up}$, a new dataset is formed by adding the monitoring data to the historical dataset. Then, we retrain/relearn GAEM to update the parameters and obtain GAEM-II. When GAEM-II is confirmed, it replaces GAEM and undertakes the work of real-time condition assessment for a period $t_{up}$. After this period, GAEM-II is updated again. That means that the self-updating process always runs in parallel with the real-time assessment, and the period of this process is $t_{up}$. Here, we do not consider the updating time, because the time needed for updating is much shorter than the monitoring interval $t_{int}$.
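The self-updating cycle can be sketched as a small wrapper around any training routine. Everything here is an assumption for illustration: the class `SelfUpdatingAssessor`, the toy learner `train_nearest_mean` (which is not GAEM), the update interval counted in samples rather than time, and the availability of labels for the monitored data. The sketch also retrains synchronously, whereas the process described above runs in parallel with the assessment.

```python
class SelfUpdatingAssessor:
    """Serve assessments with the current model and retrain at a fixed interval."""

    def __init__(self, train_fn, history, update_interval):
        self.train_fn = train_fn                # callable: list of (x, y) -> model
        self.history = list(history)            # historical database
        self.update_interval = update_interval  # plays the role of t_up, in samples
        self.buffer = []                        # monitoring data since the last update
        self.model = train_fn(self.history)
        self.version = 1

    def assess(self, x):
        return self.model(x)

    def record(self, x, y):
        """Store a labelled monitoring sample; retrain when the interval elapses."""
        self.buffer.append((x, y))
        if len(self.buffer) >= self.update_interval:
            self.history.extend(self.buffer)
            self.buffer.clear()
            self.model = self.train_fn(self.history)  # the retrained model replaces the old one
            self.version += 1


def train_nearest_mean(data):
    """Toy learner (not GAEM): classify a scalar by the nearest class mean."""
    sums, counts = {}, {}
    for x, y in data:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    means = {y: sums[y] / counts[y] for y in sums}
    return lambda x: min(means, key=lambda y: abs(x - means[y]))


assessor = SelfUpdatingAssessor(
    train_nearest_mean, [(0.0, 0), (1.0, 0), (10.0, 2)], update_interval=2
)
before = assessor.assess(4.0)
assessor.record(4.0, 1)
assessor.record(5.0, 1)
after = assessor.assess(4.0)
print(before, after, assessor.version)
```

The same input is assessed differently before and after the update because the historical database has grown to include a condition that the first model had never seen.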

Experiments and results
In this section, numerical experiments, including two types of comparison studies, are described to evaluate the performance of the proposed model. The first type of comparison study is composed of (i) a comparison with individual learners, (ii) a comparison with the corresponding homogeneous ensemble learning models, and (iii) a comparison with common heterogeneous ensemble learning models. The second type of comparison study compares the proposed model with three popular ensemble learning models, namely AdaBoost, Random Forest and Gradient Boosting.

Dataset description
The dataset comes from the prognostics challenge competition at the International Conference on Prognostics and Health Management (PHM 2008). The dataset contains multiple multivariate time series, which are the life-cycle data of different engines, and the engines can be considered to be of the same type. Each engine starts from a different condition, and the degree of initial wear and variation is different and unknown. Therefore, an engine can be perfect or imperfect at the start, but not failing. In addition, the dataset contains noise and perturbations because of sensor noise. There are two types of data in this dataset: 3 operational settings (external data) and 21 sensor measurements (internal data). All the experiments in this study were executed on a Micro-Star computer with an NVidia GeForce GTX 1050Ti GPU, an Intel Core i7-7700HQ (3.6 GHz, 4 cores) CPU and 16 GB RAM. All individual learners are taken from the software library scikit-learn, and all code is written in Python 3.6.
In the experiments, the parameters are set as listed in Table 5. The F1-measure (F1 for short, see Eq. 12) is utilized to balance the effect of precision and recall:

$$F1_i = \frac{2 P_i R_i}{P_i + R_i} \quad (12)$$

Eq. 13 gives the expression of the coefficient matrix $C_{coe}$.
As this dataset does not have a label for each piece of data, we first label it through the class strategy described in Section 3. The distribution of this dataset (Fig. 5) shows that the majority of samples belong to condition "Healthy", which has more than 26 times as many samples as condition "Critical defect".

Fig. 5. Distribution of samples
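The degree of imbalance described above can be checked directly from the label column. The following is a minimal sketch using only the Python standard library; the function names and the toy label list are illustrative assumptions and do not reproduce the actual class counts of the PHM 2008 dataset.

```python
from collections import Counter


def class_distribution(labels):
    """Return {label: (count, share of all samples)} sorted by label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: (n, n / total) for label, n in sorted(counts.items())}


def imbalance_ratio(labels):
    """Ratio between the largest and the smallest class size."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical labels: 0 = "Healthy", 1 = "Minor defect", 2 = "Critical defect".
toy_labels = [0] * 26 + [1] * 5 + [2] * 1
print(class_distribution(toy_labels))
print(imbalance_ratio(toy_labels))
```

A ratio well above 1 signals that overall accuracy alone would be dominated by the majority class, which is why per-class criteria are reported in the experiments.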
To illustrate the challenge of class-imbalanced classification, we first classify this dataset with DT, which is a prevalent classification model. We set the maximum depth of the tree to 4. Table 6 lists the criteria results for the DT, and Fig. 6 shows two confusion matrices: the original confusion matrix on the left and the normalized confusion matrix on the right. The precision, recall and F1 are very high (approximately 0.947) for class 0, while for class 1 and class 2 these three criteria are exceedingly low (approximately 0.562 and 0.130, respectively). This reflects that the DT classifier has a superior classification ability for the class with the most samples but is weak for the classes with few samples: the samples in class 2 are so few that the DT classifier cannot accurately learn the features and properties of this class. Because of the skewed distribution between classes, the few samples in class 2 are likely to be treated as noise, which also reduces the criteria for this class. In addition, none of the class 0 samples among the 13776 test samples is classified as class 2, whereas most class 2 samples are wrongly classified, with 76.4 percent classified as class 1 and 22.5 percent classified as class 0. Only 5 samples are correctly classified. There is another reason why samples of class 0 and class 2 are mostly confused with class 1 rather than with each other: it is easier to distinguish samples of class 0 from samples of class 2 than to distinguish samples of class 0 from samples of class 1.
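The relationship between the two confusion matrices and the per-class criteria discussed above can be sketched as follows. This is a minimal illustration, assuming rows index the true class and columns index the predicted class; the counts in `toy_cm` are hypothetical and are not the values reported in Table 6 or Fig. 6.

```python
def normalize_rows(cm):
    """Divide each row by its total, so the diagonal holds the per-class recall."""
    normalized = []
    for row in cm:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized


def per_class_metrics(cm):
    """Return a list of (precision, recall, F1) tuples, one per class."""
    n = len(cm)
    metrics = []
    for k in range(n):
        tp = cm[k][k]
        actual = sum(cm[k])                          # row sum: true members of class k
        predicted = sum(cm[i][k] for i in range(n))  # column sum: predicted as class k
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f1))
    return metrics


# Hypothetical 3-class confusion matrix (class 0 is the majority class).
toy_cm = [
    [900, 100, 0],
    [60, 120, 20],
    [10, 30, 10],
]

for k, (p, r, f1) in enumerate(per_class_metrics(toy_cm)):
    print(f"class {k}: precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

In this toy matrix the majority class scores well while the smallest class scores poorly, which is the same qualitative pattern as the one described for the DT classifier.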
In reality, the conditions with relatively few samples, namely, class 1 (condition "Minor defect") and class 2 (condition "Critical defect"), are more important than class 0 (condition "Healthy"). Managers place more emphasis on the classification performance for class 1 and class 2. As traditional methods are weak on classes with relatively few samples, more suitable approaches are needed to achieve better performance in this class-imbalanced classification.

Experiment with GAEM
In this section, we report the results of the experiment performed with GAEM on the PHM 2008 dataset. After data pretreatment, the normalized dataset is obtained. Then, pilot experiments on the 23 individual learners are conducted. Each classifier is trained on the training dataset and tested on the validation dataset. According to the selection strategy, we obtain the classifiers that form the individual learner base in Table 7.
The genetic algorithm searched for the optimal combination strategy of heterogeneous individual learners in GAEM and returned [8, 7, 5, 4, 7, 5, 9]. The sequence of these 7 heterogeneous individual learners is Logistic Regression (LR), KNN, DC, Extra Tree Classifier (ETC), Quadratic Discriminant Analysis (QDA), MLP and SVC. Table 8 lists the criteria results for GAEM, and Fig. 7 shows two confusion matrices: the original confusion matrix on the left and the normalized confusion matrix on the right. As samples deteriorate from class 0 to class 2, the classification performance differs considerably between classes, with an obvious downtrend in each criterion from class 0 to class 2.
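One way to read the combination strategy above is sketched below. The sketch assumes that each gene is the number of classifiers of the corresponding learner type, and it combines predictions by simple majority voting; both the reading of the genes and the voting rule are assumptions made for illustration, and the names `decode` and `majority_vote` are hypothetical rather than part of GAEM.

```python
from collections import Counter

# Learner order and gene values as listed in the text.
LEARNER_TYPES = ["LR", "KNN", "DC", "ETC", "QDA", "MLP", "SVC"]
CHROMOSOME = [8, 7, 5, 4, 7, 5, 9]


def decode(chromosome, learner_types=LEARNER_TYPES):
    """Map each learner type to its gene value (assumed: number of classifiers)."""
    if len(chromosome) != len(learner_types):
        raise ValueError("chromosome length must match the number of learner types")
    return dict(zip(learner_types, chromosome))


def majority_vote(predictions):
    """Return the most frequent label; ties go to the smallest label."""
    counts = Counter(predictions)
    best = max(counts.values())
    return min(label for label, c in counts.items() if c == best)


composition = decode(CHROMOSOME)
print(composition)
print(sum(composition.values()))        # total number of base classifiers
print(majority_vote([0, 1, 1, 2, 1]))   # combined decision for one sample
```

Under this reading, the ensemble would contain several classifiers of each type instead of exactly one, which is the difference from a common heterogeneous ensemble that the comparison studies examine.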

Comparison studies

Comparison with individual learners, homogeneous and heterogeneous ensemble models
Comparison with Individual Learners: To verify that the ensemble learning model performs better than its individual learners, we ran experiments on the 7 heterogeneous individual learners. Fig. 8 reports the criteria for these models. GAEM outperforms every individual learner, so combining individual learners does improve the classification performance.

Comparison with Homogeneous Ensemble Learning Models:
To verify the competitiveness of the modified heterogeneous ensemble learning model against homogeneous ensemble models, we ran a set of comparison experiments. The homogeneous ensemble learning models here are composed of the individual learners used in GAEM, with the same number and the same parameter setting. The result is shown in Fig. 9. Comparing these results with those in Fig. 8, for most individual learners some criteria improve as the number of individual learners increases, because of the ensemble effect. Each individual learner improves on the prior ones by increasing the weights of the wrongly classified samples while reducing the weights of the correctly classified samples. On the other hand, GAEM still performs better than these homogeneous models because it has diversity: it contains different types of heterogeneous individual learners, which the homogeneous ensemble learning models do not have.

Comparison with Common Heterogeneous Ensemble Learning Models:
The last comparison in this group is between the common heterogeneous ensemble learning model and GAEM. The common heterogeneous ensemble learning model is composed of the 7 heterogeneous individual learners that have been chosen in GAEM. The result is shown in Fig. 10 and the criteria results are compared in Fig. 11.
From the confusion matrix in Fig. 10, the performance of this model is good and acceptable, with a majority of samples correctly classified. That means combining these individual learners to form this heterogeneous ensemble learning model is feasible and reasonable. Fig. 11 further shows that having more than one classifier in each heterogeneous individual learner category gives better performance. That means increasing the number of classifiers in each heterogeneous individual learner category is indeed an effective strategy.

Comparison with Adaboost, Random Forest and Gradient Boosting
To validate the competitive performance, we contrasted the performance of GAEM with three popular ensemble learning models, namely, Adaboost, Random Forest and Gradient Boosting, under the same experimental setting on the PHM 2008 dataset.
To better understand the detailed performance, we have drawn the normalized confusion matrix for each classifier in Fig. 12, and Table 10 lists the performance of these four models. Adaboost and Random Forest are worse at separating class 0 from class 2 than Gradient Boosting and GAEM. All four methods perform well in recognizing class 0, as reflected in the high values for precision, recall and F1, because class 0 contains sufficient samples, which makes the intrinsic properties of this class easier to learn. However, Adaboost, Random Forest, and Gradient Boosting perform relatively badly on class 1, with all criteria under 0.8. This situation becomes worse for class 2, especially in terms of recall and F1, which fluctuate in [0.4, 0.7]. GAEM has higher precision and recall for class 1 and class 2, so the proposed method generalizes better.
To compare the stability of these four models, a box plot is drawn in Fig. 13. Because the criteria for class 0 fluctuate only slightly, the boxes for class 0 are omitted. It is clear from the figure that GAEM outperforms the other ensemble learning models in terms of the stability of most reported criteria. On the criteria for class 2, the other ensemble learning models are unstable, with a wide range of fluctuation, while GAEM shows good stability and reliability, with little fluctuation and high scores in all criteria.
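The quantities that a box plot summarizes can also be computed numerically. The sketch below assumes that stability is judged from a criterion (for example, the class 2 F1) collected over repeated runs; the score lists are hypothetical and are not taken from Fig. 13. It relies on `statistics.quantiles`, which requires Python 3.8 or later.

```python
import statistics


def box_stats(scores):
    """Five-number summary plus interquartile range for a list of scores."""
    q1, median, q3 = statistics.quantiles(scores, n=4)
    return {
        "min": min(scores),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(scores),
        "iqr": q3 - q1,  # a narrow box means a stable model
    }


# Hypothetical class 2 F1 scores over five repeated runs.
stable_model = [0.80, 0.82, 0.81, 0.83, 0.79]
unstable_model = [0.45, 0.70, 0.55, 0.62, 0.40]

print(box_stats(stable_model))
print(box_stats(unstable_model))
```

A smaller interquartile range together with a higher median corresponds to the "little fluctuation and high scores" pattern described for GAEM.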

Experiment on GAEM-II
In this section, we report the results of the experiments performed with GAEM-II, in which samples are added to the original dataset to form a new dataset. In four experiments, we add 5000 samples, 10,000 samples, 15,000 samples, and 20,000 samples, respectively. Then, we retrain the learning model to obtain the updated GAEM-II. The results are shown in Table 11 and the confusion matrices are shown in Fig. 14, Fig. 15, and Fig. 16, in which GAEM-II.1, GAEM-II.2, and GAEM-II.3 denote these three new models with these three modified datasets. As the number of samples increases, the utility shows an uptrend from 0.835 to 0.848. That means GAEM-II can improve the classification performance of GAEM by adding more samples to the dataset.
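The add-samples-then-retrain loop described above can be sketched as follows. This is only an illustration under stated assumptions: a tiny nearest-centroid classifier stands in for GAEM, retraining is triggered once a fixed update interval has elapsed, and the names `NearestCentroid`, `SelfUpdatingAssessor`, `observe` and `assess` are hypothetical.

```python
from collections import defaultdict


class NearestCentroid:
    """Stand-in classifier (not GAEM): predicts the class with the closest mean."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for features, label in zip(X, y):
            if label not in sums:
                sums[label] = [0.0] * len(features)
            for i, value in enumerate(features):
                sums[label][i] += value
            counts[label] += 1
        self.centroids = {
            label: [s / counts[label] for s in sums[label]] for label in sums
        }
        return self

    def predict_one(self, features):
        def dist2(centroid):
            return sum((a - b) ** 2 for a, b in zip(features, centroid))
        return min(self.centroids, key=lambda label: dist2(self.centroids[label]))


class SelfUpdatingAssessor:
    """Buffers newly labeled samples and retrains after each update interval."""

    def __init__(self, X, y, update_interval):
        self.X, self.y = list(X), list(y)
        self.update_interval = update_interval  # assumed trigger for retraining
        self.last_update = 0.0
        self.pending = []
        self.retrain_count = 0
        self.model = NearestCentroid().fit(self.X, self.y)

    def observe(self, features, label, now):
        """Store a new sample; merge the buffer and retrain once the interval has passed."""
        self.pending.append((features, label))
        if now - self.last_update >= self.update_interval:
            for f, l in self.pending:
                self.X.append(f)
                self.y.append(l)
            self.pending.clear()
            self.model = NearestCentroid().fit(self.X, self.y)
            self.last_update = now
            self.retrain_count += 1

    def assess(self, features):
        """Online step: a single prediction with the current model."""
        return self.model.predict_one(features)
```

Separating `observe` (which may trigger off-line retraining) from `assess` (a single prediction) keeps the online step cheap, which matches the split between off-line and online activities reported for GAEM.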

Conclusion
This study developed a genetic algorithm-based search method, replacing greedy search and exhaustive search, to find a combination strategy for a heterogeneous ensemble learning model. In addition, a new attempt was made to modify the traditional heterogeneous ensemble learning models. We argue that a heterogeneous ensemble learning model constructed from several learners in each heterogeneous individual learner category has better classification performance than heterogeneous models that have only one learner in each category. We conducted experiments and comparison studies to verify this claim.
Another contribution of this study lies in the effectiveness of the proposed model. In contrast to other condition assessment methods, the proposed method does not require feature extraction or indicator setting to assess the equipment condition. The proposed method can automatically extract the inherent and generalizable features of the dataset. In addition, with our model, real-time equipment condition assessment can be achieved, owing to the fast computation and the self-updating learning strategy. The biggest advantages of the proposed condition assessment method are its accuracy and stability in this class-imbalanced classification problem.
Our study addressed supervised classification for equipment condition assessment with a class-imbalanced dataset. Future work can also explore semi-supervised classification in this field, as labeling is costly and labels are less available in reality.

$N_1$: the number of key measurements about external data
$N_2$: the number of key measurements about internal data
$N$: the number of key measurements
$M$: the bias of the recall and precision
$CM$: the matrix for the result of condition assessment
$C_{coe}$: the coefficient matrix for penalty and award
$C_t$: the coefficient matrix for the relationship of results in time order
$c_e$, $c_l$: the coefficients for early assessment and late assessment
$M_{ap}$: the award-penalty matrix
$p$, $q$: the weights for computing the utility function $f_u$
$t$: time
$t_{up}$: the time interval for updating the database
$K$: the number of folds in K-fold validation

ijm
is mapped into the value domain of [ ]

Fig. 3. Gene coding for the individual learners

Fig. 7. Confusion matrices with GAEM

According to Fig. 7, the upper right corner and lower left corner of the confusion matrix are equal to 0, which means that among the 13776 test samples, no sample in class 0 is classified into class 2, and similarly, no sample in class 2 is classified into class 0. These results show the superiority of GAEM in distinguishing between class 0 and class 2.

Fig. 9. Reports for homogeneous ensemble learning models and GAEM

Table 3. Literal evaluations for assessment result

Table 6. Criteria results for DT

Table 5. Parameter values

Fig. 6. Confusion matrices with DT

Table 9 lists the computation time of three main activities. The GA processing aims to find the proper combination of individual learners, and it costs approximately 2 hours. In addition, approximately 20 minutes are consumed for GAEM training. These two processes are conducted off-line, which is acceptable for the manager because they do not disturb the real-time condition assessment process. For the online part, the proposed model GAEM takes less than one second to assess the condition of the equipment, which makes real-time condition assessment possible. Therefore, even though the off-line parts are computationally expensive, GAEM can still perform well in real-time equipment condition assessment.

Table 7. Individual learner base

Table 8. Criteria results for GAEM

Table 9. Computation time for activities

Table 10. Criteria results for Adaboost, Random Forest, Gradient Boosting and GAEM

Table 11. Criteria results for GAEM-II