Establishment and optimization of sensor fault identification model based on classification and regression tree and particle swarm optimization

The accuracy of structural state evaluation may be affected by damaged piezoelectric sensors, so sensor faults need to be identified during monitoring. This paper proposes a method based on classification and regression tree (CART) and particle swarm optimization (PSO) to select potential feature sets for sensor fault classification more efficiently and to build the identification model with the best performance. Firstly, the signal features of three structural changes and four sensor faults were extracted with five indexes. Then decision trees (DT) for sensor fault classification were built from different index combinations, and their performance was evaluated by a designed fitness function. Finally, PSO was used to optimize the search for the best index combination. The results show that, compared with the exhaustive method, adopting PSO for DT optimization greatly simplifies the search process. With particle populations of 5 and 10, the fitness converges to the optimal solution after only 6 and 4 iterations respectively. Although the DT with the best fitness is trained with only two indexes, its accuracy is higher than that of DTs trained with more indexes, and its classification accuracy on 64 samples reaches 98.4%, which shows the feasibility and practicability of the method.


Introduction
Structural health monitoring using piezoelectric materials as sensors and actuators has attracted the attention of many researchers in recent years [1][2][3]. However, the coupling effects of environmental erosion, material aging, long-term effect of load, and effect of fatigue and mutation will inevitably lead to damage accumulation and resistance degradation of the monitoring system. Due to the frangibility, piezoelectric devices become one of the most vulnerable parts of the system. It is easy to judge the complete failure of piezoelectric devices, while the partial damage will directly lead to the inaccuracy of structural damage identification and location. Therefore, identifying the piezoelectric lead zirconate titanate (PZT) fault is essential for the effective operation of the structural health monitoring system.
There are currently two electromechanical impedance (EMI) based methods for sensor fault identification and evaluation. The first takes a statistical indicator as the damage index to quantify the signal difference before and after the damage, so that severe damage yields a larger index than mild damage. Common indexes include root mean square deviation (RMSD) [4,5], mean absolute percentage deviation (MAPD) [6,7], correlation coefficient deviation (CCD) [8,9], etc. Huynh et al [10] concluded that structural damage influences the imaginary impedance only at resonances, while sensor defects cause significant variations over the whole frequency band. By using the RMSD metric, they diagnosed the debonding or breakage states of the sensor. Park et al [11] assessed the validity of faulty sensors with RMSD. They revealed that a light fracture on the PZT would affect the sensor's detection of small structural damage, while a severe breakage would deprive it of its monitoring capability. The method can effectively extract the change of signal and is easy to operate. However, because different sensor faults affect the signal in complex ways, this global method tends to overlook the local characteristics of the impedance curve. Therefore, it is difficult to identify and classify the signals under different conditions by using a single damage index.
The second method combines the EMI technique with intelligent algorithms. Lopes et al [12] utilized measured electrical impedance signals for input patterns of the artificial neural network (ANN) to assess the structure quantitatively. With experimental verification, they concluded the technique can detect the damage in an early stage without prior knowledge of the structure. De Oliveira et al [13] used particle swarm optimization (PSO) to automatically select the optimal parameters of the fuzzy ARTMAP network (FAN) algorithm and took the Kappa coefficient as the objective function to be maximized. After the optimization, the success rate of the damage identification is improved. Jiang et al [14] compared the admittance characteristics of structural damages and sensor faults and took LibSVM to identify the cases and degrees of sensor damage. However, the method they proposed needs to manually compare and screen out the indexes that can discriminate sensor damage from structural damage (have the potential to classify or predict). The process is cumbersome and complex. Besides, it is difficult for the method to judge whether the model trained with the selected indexes achieves the best performance.
To efficiently build a model for PZT fault classification with the best performance, five indexes were first used to extract the signal characteristics of structural changes and sensor self-faults. According to different input combinations of the five indexes, 31 decision trees (DT) were built with the classification and regression tree (CART) algorithm. After determining a fitness function that comprehensively evaluates the classification accuracy and tree size, we ordered the potential index combinations according to the fitness of their corresponding DTs, then introduced the PSO algorithm to simplify the process of choosing among a collection of potential index combinations for classification. The method realizes the construction of the indexes set that have classification potential and the DT model with the best performance.

Experimental scheme design and signal analysis
In a laboratory environment of 24 °C and 30% humidity, an experiment was carried out to study the influence of structural changes and different sensor faults on the signal. A 250 mm × 250 mm × 3 mm square plate was taken as the host structure, on which there are 4 bolts marked A, B, C, and D. The distances between the center of each bolt and the two nearest boundaries are both 40 mm. Four Φ 20 mm × 2 mm PZTs (PZT-5A), numbered 1#, 2#, 3#, and 4#, were pasted at the four midpoints of the bolt connections. The density of the sensor is 7730 kg m⁻³, the Young's modulus is 75 GPa, and the Poisson's ratio is 0.35. The piezoelectric constant is 450 × 10⁻¹² C N⁻¹, the resonant frequency is 98 kHz, and the static capacitance is 2500 pF. Three structural changes, namely bolt looseness, hole damage, and local stress change, were set in the experiment. Under the normal working condition, the tightening torque of each bolt is 25 N·m, and this state of the structure was taken as the benchmark. The three structural changes were set as follows. For bolt looseness, we loosened only one bolt by 360° in each group, in order from A to D, giving four groups. Hole damage was simulated by completely removing one bolt per group, again in the order A to D. The local stress of the structure was changed by stacking different numbers of specimens at the intersection of the plate diagonals: one specimen was added for each group, so that 1-4 specimens were superimposed in turn. The total mass of the plate is 512 g, and each specimen weighs 58 g. The inner and outer diameters of the specimen are 30 mm and 50 mm respectively. The distribution of sensors and bolts and the superposition location of the specimens are shown in figure 1(a). The setting of the local stress change, taking two specimens as an example, is shown in figure 1(b).
1 V was taken as the excitation voltage and the signal was collected by a WK6500B precision impedance analyzer (figure 1(c)). The frequency range, determined by experiment, was set to 30 kHz-1 MHz with a frequency step of 1.2 kHz. The relative size of the PZT and the specimen is shown in figure 1(d).
The ambient temperature, humidity, and surrounding vibration were controlled to eliminate the influence of factors other than the structure on the results. Each group of data was measured five times and averaged to reduce the random error of the instrument. Comparing the signals before and after the changes, the influence of the three structural changes on the 1# PZT conductance (the real part of electrical admittance) is shown in figures 2(a)-(c). The x-axis shows the serial number of the signal, the y-axis represents the frequency, and the z-axis denotes the conductance. The last curve on the plane YOZ reflects the superposition of the 1#−5# signals. It can be seen from the figures that the curve trend does not alter significantly over the whole frequency domain after the changes, which indicates the high similarity between the benchmark and the signals under structural change conditions.
To explore the influence of sensor faults on the signal, four degrees each of pseudo soldering, debonding, wear, and breakage were studied in the experiment, set on the 1#−4# PZTs respectively. As pseudo soldering of the sensor mainly increases the contact resistance between the solder joint and the wire, we connected one to four 20 Ω resistors in series between the positive electrode of the PZT and the receiving end of the analyzer to simulate different degrees of pseudo soldering. The resistance circuit for the series connection is shown in figure 3(a). For sensor debonding, the debonding area, Area 1, was set from 10% to 40% of the original bonding area. The 2# PZT with a 40% debonding area is shown in figure 3(b). The wear fault was set on the 3# PZT, with a thickness loss of 0.1 mm-0.4 mm produced by wearing the surface of the PZT in turn. Figure 3(c) shows the state of the 3# PZT when its thickness was reduced to 1.6 mm. The working condition of sensor breakage is shown in figure 3(d), with the breakage area set as 10%, 20%, 30%, and 40% of the original area in turn.
The cases and degrees of sensor self-faults are shown in table 1.
The benchmark and the signals of the four sensor faults are plotted in figure 4, where figures 4(a)-(d) display the conductance under PZT pseudo soldering, debonding, wear, and breakage in turn. Among them, the 1# curve serves as the benchmark signal and the 2#−5# curves are the data from mild to severe damage. It can be seen from figure 4(a) that the difference between the maximum and the minimum of the conductance spectrum gradually decreases with the increasing degree of pseudo soldering. In many studies, the peak and the peak frequency of the conductance are both important physical quantities for portraying the properties of the tested structure, with wide application in characterizing the degree of soil compaction in road construction [15], the thickness loss of metal materials after corrosion [16], and many other fields [17,18]. The flattening of the conductance means the gradual loss of the sensor's detection capacity as the pseudo soldering becomes more severe. By comparing the curves before and after debonding in figure 4(b), it can be concluded that sensor debonding does not have much influence on the trend of the signal. The superposition curve of the signals indicates that the debonding signal still has a strong correlation with the benchmark and that the defect mainly changes the signal at the extreme values. In figure 4(c), the curves under PZT wear change in a way similar to those under pseudo soldering, as both faults result in a decrease of the conductance. However, unlike the curves of pseudo soldering, the peaks and valleys of the conductance can still be easily distinguished after wear. Figure 4(d) displays the effect of the four breakage degrees on the signal. When the breakage area is 10%, the signal keeps a trend consistent with the benchmark.
With the breakage area increasing from 10% to 40%, the conductance in the test frequency band decreases and it shows completely different characteristics from the benchmark in the frequency band greater than 500 kHz which may be caused by the unsmooth breakage edge. The sensor breakage markedly destroys the benchmark features which will greatly affect the effectiveness of detection.
To further quantify the influence of different PZT faults on the signal, five indexes were chosen to extract the characteristics of the conductance curve under different working conditions. The five indexes are Spearman's correlation coefficient [19,20] which represents the correlation degree between damage and benchmark signal; peak frequency shift; value change of the peak; RMSD to characterize the deviation between damage and benchmark signal; value change of the valley. The indexes were labeled as 1#−5# index. The expression of RMSD in 4# index is shown in equation (1).
Where, ( ) Y Re i 0 =the real part of electrical admittance before damage; ( ) Y Re i =the real part of electrical admittance after damage; i=the serial number of frequency points; N is the number of sampling points, N=800 in this experiment. To extract the curve features in the frequency band as comprehensively as possible, the 30 kHz-1 MHz was divided evenly into five segments and 2#, 3#, 5# indexes were used to extract the features in each frequency band. The normalized indication intervals of five indexes for structural changes, pseudo soldering, debonding, wear, and breakage are shown in table 2.

Rule mining of PZT faults classification based on CART algorithm of DT
In this section, we used the CART algorithm proposed by Breiman et al [21] in 1984 to construct the DT model, which is used to distinguish sensor faults from structural changes and to classify the four PZT defect cases. The algorithm derives a hierarchy of partition rules with respect to a target attribute of a large dataset [22] and uses the Gini index to select the optimal feature and determine the optimal binary segmentation point of that feature [23]. Suppose that there are $n$ classes in the classification problem and that, for a given sample, the probability that it belongs to the $k$th class is $p_k$; the Gini index of the probability distribution is then given by equation (2). In particular, for the binary classification problem, let the probability of a sample point belonging to the first class be $p$; the Gini index then becomes $2p(1-p)$. Given a data set $y$ containing $D$ data divided into $n$ classes, let the number of data in each class be $C_1, C_2, \ldots, C_n$. The Gini index estimate of the data set can then be obtained, as shown in equation (3).
Gini(y) is also known as the empirical Gini index. It is easy to prove that, for a given $D$, Gini(y) takes its maximum value, $1 - 1/n$, when all classes contain the same number of data. Further, define the conditional Gini index of the data set $y$ when a certain feature $A$ is determined. Specifically, if $A$ is continuous, $y$ is divided into $Y_1 = \{y : y_A < a_p\}$ and $Y_2 = \{y : y_A \ge a_p\}$, where $y_A$ represents the value of feature $A$ and $a_p$ is the split criterion. Let the number of data contained in $Y_1$ and $Y_2$ be $D_1$ and $D_2$; then, given feature $A$, the conditional Gini index of data set $y$ is given by equation (4). Gini(y) represents the uncertainty of data set $y$, while Gini(y, A) represents the uncertainty of data set $y$ after division at a certain value of feature $A$. The greater the Gini index, the greater the uncertainty of the data set. The information gain of feature $A$ for data set $y$ defined through the Gini index is called the Gini gain, $g(y, A)$.
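The quantities described above can be sketched in a few lines. This is an illustrative implementation for a single continuous feature, not the code used in the experiment; the function names, the choice of midpoints between sorted values as candidate split points, and the toy data are assumptions.

```python
from collections import Counter

def gini(labels):
    """Empirical Gini index: 1 - sum_k (C_k / D)^2."""
    d = len(labels)
    return 1.0 - sum((c / d) ** 2 for c in Counter(labels).values())

def conditional_gini(values, labels, split):
    """Gini index after dividing the data at `split` on one continuous feature."""
    left = [l for v, l in zip(values, labels) if v < split]
    right = [l for v, l in zip(values, labels) if v >= split]
    d = len(labels)
    return sum(len(part) / d * gini(part) for part in (left, right) if part)

def best_split(values, labels):
    """Return (split, gain) with the largest Gini gain among midpoints of sorted values.

    Assumes the feature takes at least two distinct values.
    """
    base = gini(labels)
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(((s, base - conditional_gini(values, labels, s)) for s in candidates),
               key=lambda t: t[1])

# Toy data: one feature, two classes that separate cleanly at 0.5.
x = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
y = [1, 1, 1, 2, 2, 2]
print(gini(y), best_split(x, y))
```

For the toy data, the two balanced classes give a Gini index of 0.5, and the split at 0.5 removes all of that uncertainty, so the Gini gain equals the full 0.5.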

The expression of Gini gain is shown in equation (5).
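Equations (2)-(5) referred to in this section are not reproduced in this text. For reference, the standard CART forms consistent with the definitions given here are sketched below; the numbering is assumed to match the original, and the notation of the original equations may differ.

```latex
\begin{align}
\mathrm{Gini}(p) &= \sum_{k=1}^{n} p_k (1 - p_k) = 1 - \sum_{k=1}^{n} p_k^2 \tag{2} \\
\mathrm{Gini}(y) &= 1 - \sum_{k=1}^{n} \left( \frac{C_k}{D} \right)^2 \tag{3} \\
\mathrm{Gini}(y, A) &= \frac{D_1}{D}\,\mathrm{Gini}(Y_1) + \frac{D_2}{D}\,\mathrm{Gini}(Y_2) \tag{4} \\
g(y, A) &= \mathrm{Gini}(y) - \mathrm{Gini}(y, A) \tag{5}
\end{align}
```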
The greater the Gini gain, the more the feature and its segmentation point reduce the uncertainty in classifying $y$, which means a stronger classification ability. Based on the training data set, the DT model is built by choosing the feature and split criterion with the largest Gini gain. In the experiment, the coefficient of each index is 1 or 0 depending on whether or not the index is taken as an input parameter of the model. There are $2^5$ coefficient combinations of the five indexes, and we studied the performance of the DTs trained with the 31 ($2^5 - 1$) coefficient combinations other than [0, 0, 0, 0, 0]. The 31 index coefficient combinations are shown in figure 5. Taking all five indexes as input parameters, i.e. with the coefficients of all the indexes equal to 1, the trained DT model is shown in figure 6. The x1-x5 at branch nodes represent the 1#−5# indexes, and 1-5 at leaf nodes represent structural changes, pseudo soldering, debonding, wear, and breakage, respectively. For a DT model, we should evaluate not only the classification accuracy but also the scale of the DT, because CART selects split attributes according to the information gain, which may lead to too many and too long rules and thus slow down the classification. Therefore, to comprehensively assess the quality of the DT model, we adopted two common methods [24], resubstitution using the mean squared error (MSE) and 10-fold cross-validation, to estimate misclassification rates [25], and analyzed the model size by extracting the rule number [26]. In figure 7, each evaluation index shows a fluctuating downward trend as the serial number of the coefficient combination increases, because more indexes are used to train the DT. It can be concluded that different indexes contain different damage information about structures and sensors. With more indexes used for training, the classification of the DT generally becomes more accurate.
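The 31 non-empty coefficient combinations can be enumerated as in the sketch below. The ordering (by number of selected indexes, then lexicographically) is an assumption: it reproduces the 11th, 30th, and 31st combinations quoted in this paper, but whether it matches figure 5 throughout cannot be confirmed from the text. The function name is hypothetical.

```python
from itertools import combinations

def coefficient_combinations(n_indexes=5):
    """List all non-empty 0/1 coefficient vectors, ordered by size then lexicographically."""
    combos = []
    for size in range(1, n_indexes + 1):
        for chosen in combinations(range(n_indexes), size):
            combos.append([1 if i in chosen else 0 for i in range(n_indexes)])
    return combos

combos = coefficient_combinations()
print(len(combos))   # 31 = 2**5 - 1
print(combos[10])    # 11th combination: [0, 1, 0, 1, 0]
print(combos[29])    # 30th combination: [0, 1, 1, 1, 1]
```

Each vector would then select the feature columns passed to a CART implementation, one tree per combination.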
Figure 7(b) shows that the error of the 31st coefficient combination [1, 1, 1, 1, 1] is greater than the error of the 11th combination [0, 1, 0, 1, 0]. The classification effect of the model trained with five indexes is inferior to that of the model trained with two indexes, indicating that there is not an absolute positive correlation between the classification accuracy and the number of input indexes. The decrease in model accuracy may be caused by noise introduced when the DT is trained with indexes unrelated to the working condition of the structure and sensor, resulting in an inherently poor tree structure. It can be concluded that the classification potential of each index is not the same. The effect of a classifier is affected not only by the potential of each input index but also by the combination of indexes. To evaluate the performance of the model more comprehensively and quantitatively, we defined an evaluation function, as shown in equation (6).
Where $\mathrm{sort}_1$, $\mathrm{sort}_2$, $\mathrm{sort}_3$, $\mathrm{sort}_4$ represent the ranking of the $n$th DT in terms of resubstitution error, cross-validation error, number of rules, and average rule length. A classifier that ranks higher on these evaluation indexes gets a lower score. $\alpha$, $\beta$, $\gamma$, $\sigma$ represent the weights of $\mathrm{sort}_1$, $\mathrm{sort}_2$, $\mathrm{sort}_3$, $\mathrm{sort}_4$, respectively, and the resulting scores are shown in figure 8. In figure 8, curve $T_3$ shows that the DTs under the 11th, 17th, 23rd, 25th, 26th, 28th, 30th, and 31st coefficient combinations have the optimal scale, each with a score of 9. According to curve $T_2$, the classification accuracy reaches its best score, 11.5, when the index coefficient combination is [0, 1, 0, 1, 0] (the 11th group). After a comprehensive evaluation of the classification accuracy and scale of the tree, curve $T_1$ shows that the DT trained with the 2# and 4# indexes, i.e. the coefficient combination [0, 1, 0, 1, 0], has the best performance. It is worth noting that the accuracy and scale of the DTs built in the 30th group (coefficient combination [0, 1, 1, 1, 1]) and the 31st group (coefficient combination [1, 1, 1, 1, 1]) are identical. Combined with the absence of x1, i.e. the 1# index, from the split criteria in figure 6, it can be considered that the 1# index cannot distinguish structural changes from sensor self-faults. By comparing the performance of DTs under different index coefficient combinations, we can conclude that the indexes selected for model training have a great impact on the classification performance. However, how to determine the best index combination remains to be addressed. It is not advisable to enumerate every possible index combination, because when there are many indexes in the training set, the number of combinations may be very large. If the number of characteristic indexes increases from 5 to 10, the number of possible index combinations grows from $2^5 = 32$ to $2^{10} = 1024$.
Using the exhaustive method to find the optimal combination will result in a huge computational cost. Therefore, we used the PSO algorithm with the powerful optimization ability to simplify the search for the best combination of indexes then determined the DT with the best classification effect.
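The rank-based evaluation described in this section can be sketched as follows. Since the exact form of equation (6) and its weights is not recoverable from this text, the sketch assumes a weighted sum of the four rankings with equal weights, assigns tied candidates their average rank, and treats a lower score as better. All names and the three sets of metrics are hypothetical.

```python
def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

def rank_fitness(resub_err, cv_err, n_rules, rule_len,
                 weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of the four rankings for every candidate DT (lower is better)."""
    columns = [average_ranks(c) for c in (resub_err, cv_err, n_rules, rule_len)]
    return [sum(w * col[i] for w, col in zip(weights, columns))
            for i in range(len(resub_err))]

# Hypothetical metrics for three candidate trees (not the values of this paper).
scores = rank_fitness(resub_err=[0.10, 0.02, 0.02],
                      cv_err=[0.20, 0.05, 0.08],
                      n_rules=[9, 5, 5],
                      rule_len=[3.0, 2.0, 2.5])
best = min(range(len(scores)), key=scores.__getitem__)
print(scores, best)
```

Ranking each criterion separately before weighting keeps criteria with different units (error rates, rule counts) on a common scale, which is one plausible motivation for a rank-based score.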

Optimization of DT classification model with PSO algorithm
PSO, an optimization algorithm proposed by Kennedy and Eberhart [27] in 1995, has been successfully applied to many problems [28][29][30] because it is easy to understand and implement. The algorithm is derived from the study of the foraging behavior of birds. Researchers found that birds in flight often change direction, disperse, and gather suddenly in an unpredictable way, yet they maintain overall consistency and the most suitable distance between individuals. Studies of the behavior of similar biological groups found that a social information sharing mechanism exists within such groups and provides an advantage for their evolution, which is also the basis on which the PSO algorithm was formed [31,32]. Each particle adjusts its flight according to its own experience and that of its companions. The best position of the particle in the flight process is the optimal solution found by the particle itself. The best position of the whole group is the optimal solution found by the whole group. The former is called the personal best value (pBest), and the latter is called the global best value (gBest) [33]. In practice, the fitness determined by the optimization problem is used to evaluate how 'good or bad' the particles are. Each particle constantly updates itself with pBest and gBest to produce a new generation of the population. Each particle in PSO can be regarded as a point in the solution space. Suppose the population of the particle swarm is $N$, the position of the $i$th ($i = 1, 2, \ldots, N$) particle is expressed as $X_i$, the 'best' position it has experienced is denoted as pBest[i], its velocity is represented by $V_i$, and the index number of the particle with the 'best' position is represented by $g$. The $i$th particle then updates its velocity and position according to equations (7) and (8).
Where, c 1 and c 2 =learning factors (constant); rand () and Rand() =random numbers on [0,1]; w=inertia weight; The expression consists of three parts. The first part is the previous particle velocity which explains the current state of the particle and balances the global and local search ability; the second part is the cognition model, which represents the thinking of the particle itself and enables the particle to have enough global searching to avoid local minima; the third part is social modal reflecting the information sharing among particles. The three parts determine the spatial search ability of particles and under the joint action of these parts, the particles can reach the best position effectively. In addition, V i is limited by the maximum velocity V max when the particle adjusts its position according to the velocity. If V i exceeds V max , it will be limited to V max . The principle of CART optimized by PSO is shown in figure 9.
In the experiment, PSO was used to search for the index combination whose corresponding classifier achieves the best performance. The initial population of the particle swarm was set to 5 and 10 in turn, with $c_1 = c_2 = 0.5$, $w = 0.6$, and $V_{max} = 0.8$; $T_1(n)$ was taken as the fitness function, and the maximum number of iterations was 31. The updated position of the particle is rounded, where round(x) means to round x. During the iteration, the fitness of the model changes as shown in figure 10.
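A minimal sketch of such a search over 0/1 index coefficients is given below, using the parameter values quoted above. It is not the authors' implementation: the clipping of rounded positions to {0, 1}, the random initialization, the treatment of the all-zero vector, and the stand-in fitness (Hamming distance to a hypothetical target, to be minimised) are all assumptions. In practice the fitness would train a DT for the given coefficients and return its score.

```python
import random

def pso_binary(fitness, n_dims=5, n_particles=5, max_iter=31,
               c1=0.5, c2=0.5, w=0.6, v_max=0.8, seed=0):
    """Minimise `fitness` over 0/1 vectors with a rounded-position PSO (sketch)."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_dims)] for _ in range(n_particles)]
    vel = [[rng.uniform(-v_max, v_max) for _ in range(n_dims)] for _ in range(n_particles)]
    p_best = [p[:] for p in pos]
    p_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=p_val.__getitem__)  # index of the best particle
    history = [p_val[g]]
    for _ in range(max_iter):
        for i in range(n_particles):
            for d in range(n_dims):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (p_best[i][d] - pos[i][d])
                     + c2 * rng.random() * (p_best[g][d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))   # limit the velocity to V_max
                x = round(pos[i][d] + vel[i][d])         # round the updated position
                pos[i][d] = max(0, min(1, x))            # keep the coefficient in {0, 1}
            val = fitness(pos[i])
            if val < p_val[i]:
                p_best[i], p_val[i] = pos[i][:], val
        g = min(range(n_particles), key=p_val.__getitem__)
        history.append(p_val[g])
    return p_best[g], p_val[g], history

TARGET = [0, 1, 0, 1, 0]  # hypothetical optimum used only by the stand-in fitness

def toy_fitness(coeffs):
    """Stand-in for the DT score: distance to TARGET; all-zero input is invalid."""
    if not any(coeffs):
        return float("inf")
    return sum(a != b for a, b in zip(coeffs, TARGET))

best, value, history = pso_binary(toy_fitness, n_particles=5)
print(best, value, len(history))
```

The `history` list records the best fitness after each iteration, which is the kind of curve one would plot to inspect convergence.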
As the iterations proceed, the best fitness of the DT gradually converges to the global optimal solution of 10.25, as seen in figure 10. With populations of 5 and 10, convergence is achieved in the 6th and 4th iteration respectively, far fewer than the 31 cycles of the exhaustive method, indicating that the PSO algorithm has strong optimization ability. In this experiment, the larger population converged in fewer iterations, which is of practical significance. For many characteristic indexes of a signal, there is no need for manual comparison and screening: the index combination with classification ability can be determined efficiently by using PSO. As the number of input indexes increases, the advantage of PSO in saving computing costs will become more prominent. The DT with the best fitness is shown in figure 11(a), and its recognition results for the 64 samples in the test set are shown in figure 11(b). It can be seen from figure 11(b) that among the 64 samples, only one sample of class 1 (Group 21, structural changes) is mistakenly identified as class 3 (debonding), and the classification accuracy reaches 98.4%, which indicates that the method combining CART and PSO can effectively distinguish structural changes from sensor defects at low computing cost. With the model, the four cases of sensor fault can also be classified accurately.
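The reported accuracy follows directly from the counts given above, with one misclassified sample out of 64:

```latex
\[
\text{accuracy} = \frac{64 - 1}{64} = \frac{63}{64} \approx 0.984 = 98.4\%
\]
```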

Summary and conclusions
To efficiently build the sensor fault classification model with the best performance, five indexes were used to extract the characteristics of conductance under three structural changes (bolt looseness, hole damage, local stress change) and four sensor self-faults (pseudo soldering, debonding, wear, and breakage). With different index combinations as input, the CART algorithm was adopted to build the classification DTs. Then the performance of the models was evaluated with the designed fitness function. Finally, the PSO algorithm was used to optimize the search for the best index combination. The specific conclusions are as follows:
• The model trained with the 2# and 4# indexes has higher accuracy than the model trained with all five indexes. The reason may be that unrelated indexes introduce noise into the DT construction, thus increasing the misclassification rate.
• PSO can simplify the search for the best index combination and greatly reduce the computational cost. Compared with the $2^5 - 1 = 31$ iterations of the exhaustive method, PSO with 5 and 10 initial particles converges to the optimal solution after only 6 and 4 iterations, respectively.
• The best-fitness model, which was trained with two indexes, classifies the 64 test samples with an accuracy of 98.4%. This shows the feasibility of using DT to identify and classify sensor damage.
The method combining CART and PSO has high practical value for determining the sensor fault classification model with the best performance. It can also serve as a reference for other classification scenarios in structural health monitoring. However, the method is still unable to distinguish samples under different structural changes or to identify the degree of structural changes and faults, which is worthy of further study.