A data-driven fault detection and diagnosis method via just-in-time learning for unmanned ground vehicles

Fault detection and diagnosis technologies for unmanned ground vehicles are important for ensuring safety and reliability. Due to the complexity and uncertainty of unmanned ground vehicles, it is challenging to realize accurate and fast fault detection and diagnosis. For the purpose of solving the data-driven fault detection and diagnosis problems of unmanned ground vehicles, improving the diagnostic accuracy and shortening the training time, a novel fault detection and diagnosis method is proposed, which is called JITGP-ELM. In the proposed method, a model estimator based on the just-in-time Gaussian process is designed for the online residual generation to cope with the dynamics and nonlinearity of systems. A fault classifier using Extreme Learning Machine is designed for fault identification with residuals extracted by the just-in-time Gaussian process modelling. The proposed method has online adaptability, noise-resistant ability, and high generalization. A field test on a real unmanned ground vehicle's steering-by-wire system demonstrates the effectiveness of the proposed method.


Introduction
Due to the development of artificial intelligence and computing science in recent years, the level of autonomous driving of unmanned ground vehicles (UGVs) has been significantly promoted, followed by the increasing complexity of their systems. This advancement brings tremendous challenges to system safety and reliability. For highly intelligent UGVs, the consequences caused by their faults could be significant because they may even threaten passengers' lives. Therefore, fault detection and diagnosis (FDD) strategies for UGVs are getting more and more attention in the academic area and industrial applications to ensure reliability [1][2][3][4][5].
In the past decades, model-based and data-driven fault detection and diagnosis have gained increasing attention in parallel [6]. Model-based approaches have been prevalent in the fault detection and diagnosis field and have achieved many results [3,7,8,9]. The model-based approaches construct residual signals that reflect the inconsistencies between the expected and actual behaviours, using a precise physical model and measurements of the system [3]. In [10], a model-based method is proposed for fault detection of vehicle suspension and hydraulic brake systems, and a rule-based approach is used to judge the fault from model residuals. In [11,12], the model-based sensor fault detection and isolation methods for UGVs are studied from theoretical and experimental aspects, respectively.
Şahin et al. [13] propose a fault detection and isolation method for the wheel slippage and actuator faults of a holonomic mobile robot with the robot model. A significant advantage of these methods is that they can provide a description of the dynamic behaviour of the system and a physical understanding of the system, and residuals can eliminate the nonlinearity and dynamics of the system. However, the model-based methods rely on the accurate physical model of systems, which is often difficult to obtain in practice.
Data-driven methods are applicable in practice without prior knowledge of systems [14][15][16]. Data-driven fault detection and diagnosis approaches mainly consist of multivariate statistical analysis, machine learning (ML), and signal processing. Multivariate statistical methods, such as principal component analysis (PCA) [17], partial least squares (PLS) [18], and independent component analysis [19], are popular for effectively analyzing the available signals in UGVs. Still, these threshold-based methods struggle with the multi-fault diagnosis problems that are commonly encountered, because they can only provide binary classification solutions. ML-based diagnosis methods have been proposed to deal with multi-fault classification, such as FDD based on the Back Propagation (BP) neural network [20], the Support Vector Machine (SVM) [21], and the Extreme Learning Machine (ELM) [22]. For UGVs, Lei et al. [23] select the extremely randomized trees model as the best classification model for refrigerant leak detection in heat pump systems of electric UGVs. In [24], a CNN-based fault detection and classification model is used during the V-cycle development process and applied to the gasoline engine system. Similarly, Wallace et al. [25] use a long short-term memory (LSTM) neural network to learn the fault classification boundaries for UGVs. The ML-based fault classifiers learn from historical data of the system under normal and various fault statuses, take classification accuracy as the learning target, and have had a wide range of applications. However, the generalization performance of ML-based methods is a concern because the classification accuracy is closely related to the completeness and representativeness of the training data. Furthermore, the selection of fault features is also challenging for learning efficiency.
Because model-based and data-driven approaches each have their own characteristics, researchers have proposed hybrid methods to improve fault diagnosis algorithms and have achieved impressive results. Tidriri et al. [26] examine the characteristics of both model-based and data-driven approaches, as well as existing hybrid research, and point to the potential of the latter. Besides, Khorasgani et al. [27] demonstrate a unified framework of data-driven and model-based diagnosis methods, which is elaborated on three cascaded tasks of data acquisition, feature extraction and fault diagnosis. More specifically, Jung et al. [28] propose a hybrid diagnostic system design that combines model-based residuals with incremental exception classifiers to identify unknown and multiple faults. Furthermore, Jung et al. [29] design a residual selection algorithm in a hybrid approach to improve the performance of fault detection and isolation. For UGVs, a nominal model is established in [6], and the fault analysis for automotive antilock braking systems is carried out by combining it with the support vector machine. Wolf et al. [30] combine the dynamic Bayesian network with residual generation, together with knowledge-based and model-based approaches, to realize fault detection, diagnosis and prognosis for autonomous vehicles. From existing works, extracting model-based residuals and using them as fault features for the inputs of classifiers is a practical hybrid approach to improve the performance of fault detection and diagnosis. However, in these approaches, the prediction model is assumed to be prior knowledge and is not updated during the process. Therefore, in the presence of environmental changes or interference, the performance of residual extraction may deteriorate, which motivated our study.
Based on the above analysis, we propose a JITGP-ELM method for multi-fault detection and diagnosis for unmanned ground vehicles in this paper. In the proposed method, a residuals generator based on a just-in-time Gaussian process regression (JITGP) is designed to adapt to the dynamics and nonlinearity. Then, a fault classifier using Extreme Learning Machine is designed for fault diagnosis with residuals extracted by just-in-time Gaussian process modelling. The proposed approach adopts JITGP modelling, inheriting its advantages of online adaptation, noise immunity, and low computational cost. Furthermore, the ELM network has advantages in learning speed and generalization ability. In the fault detection and diagnosis problem on the steering-by-wire system of a UGV, the proposed method effectively provides an application solution. The comparison between the conventional methods and the proposed method is shown in Figure 1.
The main contribution of the work includes the following two aspects. First, a fault detection and diagnosis method named JITGP-ELM is proposed to realize adaptive residual extraction and high generalization fault status classification in UGVs. Compared with previous works [6,14,30], our approach is designed for realizing adaptive residual generation, and crucially, JITGP is utilized to improve the flexibility to environmental changes or interference with noise immunity. Second, experiments have been performed using our approach for solving a real-world fault detection and diagnosis problem under three fault statuses on a UGV steering-by-wire system. The experiment results show that the approach is practical and effective. Also, the proposed method is data-driven and does not rely on the physical model of the system. So another advantage of our work is that it could be extended to fault detection and diagnosis problems of other systems like ships or aircraft.
The organizational structure of this paper is as follows. In Section 3, a fault detection and diagnosis problem formulation for a dynamic nonlinear system and the GP model for system modelling are introduced. In Section 4, the JITGP-ELM algorithm is presented. In Section 5, an application to a UGV and the experimental results are provided to illustrate the effectiveness of the proposed JITGP-ELM algorithm for fault detection and diagnosis. Finally, conclusions are drawn in Section 6.
To realize the above idea, modelling methods in previous works mainly include physical formulation and ML-based modelling. In the field of UGV fault diagnosis, the residual generators are mainly realized by physical modelling [6,10,11,12] and offline data training [30,31], which cannot adapt to the changes in vehicle dynamics and environment that are inevitable in the working process of UGVs. Different from previous works, our work considers the need for such adaptability. As a popular local learning method, just-in-time learning (JITL) establishes a local predictive model for each test sample based on similar samples selected from the dataset [32,33], which has the advantage of low computational cost. Nevertheless, conventional JITL with linear local models could be sensitive to noise in the small sample case, leading to a conflict between speed and accuracy in practice. The just-in-time Gaussian process (JITGP) algorithm, whose local models are developed by the Gaussian process (GP) model instead of linear models, was used in [34][35][36] to predict industrial processes with noise immunity. Furthermore, the GP model can provide the prediction variance, which reflects probabilistic information.
In the second half of the idea, a fault classifier based on machine learning is considered for multi-fault diagnosis. There are many standard classifiers, like BP [20], SVM [21], CNN [24], LSTM [25] and other machine learning methods [6,23,30] for multi-fault detection and diagnosis problems. Among them, the ELM was proposed in [37] with a structure of single-hidden layer feedforward neural networks (SLFNs), which is believed to have the ability to provide high generalization and training speed. Therefore, the JITGP-ELM is proposed in this paper to realize adaptive residual extraction and high generalization fault status classification for fault detection and diagnosis problems of UGVs.

Problem formulation and research backgrounds
This section will introduce a fault detection and diagnosis problem formulation for a time-delay dynamic nonlinear system. Then, a brief preliminary on Gaussian process regression for the process estimation with noise will be provided.

FDD problem for a dynamic nonlinear system
When a fault occurs in a system, the behaviour of the system is changed by the fault. The changes could include the mapping between input and output, which can be represented by adding a fault signal to the mapping function. The fault signal varies depending on the fault locations, types, or other properties, which can be described as fault statuses. To model a multi-input multi-output (MIMO) dynamic nonlinear system with a fault, the output is as follows:
$$y(k) = f(x(k)) + f_{\mathrm{fault}}(\mathrm{status}) + \varepsilon$$
where
$$x(k) = [y(k-1)^T, \ldots, y(k-k_y)^T, u(k-k_d)^T, \ldots, u(k-k_d-k_u+1)^T]^T$$
consists of control inputs and delayed outputs at $k$, in which $u = [u_1, u_2, \ldots, u_d]^T$ and $y = [y_1, y_2, \ldots, y_m]^T$ are respectively the input and output vectors, $k_y$ and $k_u$ respectively represent the numbers of historical steps of $y$ and $u$ which constitute the state $x(k)$, $k_d$ is the time delay of the process, $k$ is the consecutive sample index, $f_{\mathrm{fault}}$ represents an unknown fault function which could be in any form, where the status represents a discrete fault state, and $\varepsilon$ is white noise. In this paper, the FDD method aims to determine the fault status of the system using its measurable input and output.
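To illustrate how the state x(k) could be assembled from logged signals, the following Python sketch stacks delayed outputs and delayed inputs into a single vector. The function name `build_state`, the list-based data layout, and the exact index convention (outputs y(k-1), ..., y(k-k_y) followed by inputs u(k-k_d), ..., u(k-k_d-k_u+1)) are assumptions made for illustration only.

```python
def build_state(u_hist, y_hist, k, k_y, k_u, k_d):
    """Assemble x(k) from past outputs and delayed inputs (assumed layout).

    u_hist[t] and y_hist[t] are lists holding the input and output
    vectors recorded at sample t.
    """
    # The oldest sample needed is k - max(k_y, k_d + k_u - 1).
    if k - max(k_y, k_d + k_u - 1) < 0:
        raise ValueError("not enough history to build x(k)")
    state = []
    for j in range(1, k_y + 1):      # y(k-1), ..., y(k-k_y)
        state.extend(y_hist[k - j])
    for j in range(k_u):             # u(k-k_d), ..., u(k-k_d-k_u+1)
        state.extend(u_hist[k - k_d - j])
    return state


# Hypothetical log: one input channel, two output channels.
u_log = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y_log = [[10.0, 20.0], [11.0, 21.0], [12.0, 22.0], [13.0, 23.0], [14.0, 24.0]]
x4 = build_state(u_log, y_log, k=4, k_y=2, k_u=2, k_d=1)
```

With k_y = k_u = 2 and k_d = 1, the state at k = 4 holds two past output vectors followed by two delayed inputs.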

System dynamics modelling with Gaussian process model
The Gaussian process model is a non-parametric model in Bayesian statistics, which has been used for dynamic system estimation in [38]. In general, the system modelling with GP model includes three phases as follows:

GP prior model
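As a common type of prior over a function, the GP model assumes that $f(x)$ is a Gaussian random variable at any point $x$. Thus, the joint multivariate Gaussian distribution over a set of variables can be expressed as
$$[f(x_1), \ldots, f(x_N)]^T \sim \mathcal{N}(\mathbf{0}, \Sigma), \qquad \Sigma_{pq} = C(x_p, x_q).$$
Here $C(\cdot, \cdot)$ is a covariance function whose standard form is
$$C(x_p, x_q) = v_1 \exp\left[-\frac{1}{2} \sum_{i=1}^{D} \theta_i (x_{p,i} - x_{q,i})^2\right]$$
where $D$ is the dimension of $x$, $x_{p,i}$ is the $i$th component of $x_p$, $\theta_i > 0$ are parameters denoting the importance of the $i$th component of $x$ on $f(x)$, and the parameter $v_1$ controls the vertical scale of variation.
For an unknown function observed with white noise, $y = f(x) + \varepsilon$, following the GP prior probabilistic framework, the covariance function becomes
$$C(x_p, x_q) = v_1 \exp\left[-\frac{1}{2} \sum_{i=1}^{D} \theta_i (x_{p,i} - x_{q,i})^2\right] + v_0 \delta_{pq}$$
where $v_0$ is the variance of the white noise and $\delta_{pq} = 1$ if $p = q$, 0 otherwise. More details about the GP model can be found in [38,39].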
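As a minimal sketch of the covariance function described above, the following Python code evaluates a squared-exponential covariance with per-dimension weights and a white-noise term on the diagonal, and builds the covariance matrix of a small set of states. The function names and the symbols `theta`, `v1` and `v0` (taken here as the dimension weights, the vertical scale and the noise variance) are assumptions for illustration.

```python
import math


def covariance(xp, xq, theta, v1, v0=0.0, same_index=False):
    """Squared-exponential covariance; v0 is added only when p == q."""
    weighted_sq = sum(t * (a - b) ** 2 for t, a, b in zip(theta, xp, xq))
    noise = v0 if same_index else 0.0
    return v1 * math.exp(-0.5 * weighted_sq) + noise


def covariance_matrix(X, theta, v1, v0):
    """N x N matrix with entries C(x_p, x_q) for the states in X."""
    n = len(X)
    return [[covariance(X[p], X[q], theta, v1, v0, p == q) for q in range(n)]
            for p in range(n)]


K = covariance_matrix([[0.0, 0.0], [1.0, 0.0]], theta=[1.0, 1.0], v1=2.0, v0=0.1)
```

A larger `theta[i]` makes the covariance decay faster along dimension i, which is how the weights express the importance of each component.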

Hyperparameters optimization
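Given $N$ observation states $x_1, \ldots, x_N$ and their target outputs $y_1, \ldots, y_N$, the probability distribution $p(Y \mid X, \Theta)$ gives a likelihood of the training data [38]:
$$L(\Theta) = \log p(Y \mid X, \Theta) = -\frac{1}{2} \log |K| - \frac{1}{2} Y^T K^{-1} Y - \frac{N}{2} \log(2\pi)$$
where $\Theta = [\theta_1, \ldots, \theta_D, v_0, v_1]^T$ is the vector of hyperparameters, and $K \in \mathbb{R}^{N \times N}$ is the training covariance matrix. The hyperparameters can be estimated by maximizing the likelihood:
$$\Theta^* = \arg\max_{\Theta} L(\Theta) \tag{4}$$
where the optimization needs to compute the partial derivative of $L$ with respect to each of the hyperparameters:
$$\frac{\partial L}{\partial \Theta_i} = -\frac{1}{2} \mathrm{tr}\left(K^{-1} \frac{\partial K}{\partial \Theta_i}\right) + \frac{1}{2} Y^T K^{-1} \frac{\partial K}{\partial \Theta_i} K^{-1} Y \tag{5}$$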
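The likelihood-based selection of hyperparameters can be sketched in Python as follows. The sketch evaluates the Gaussian log marginal likelihood through a Cholesky factorization and, for brevity, replaces the gradient-based optimization with a plain grid search over candidate values; the grid search, all function names, and the squared-exponential covariance with noise variance `v0` are assumptions for illustration, not the optimizer used in the paper.

```python
import math


def cholesky(K):
    """Lower-triangular factor Lo with K = Lo Lo^T (K positive definite)."""
    n = len(K)
    Lo = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(Lo[i][k] * Lo[j][k] for k in range(j))
            Lo[i][j] = math.sqrt(s) if i == j else s / Lo[j][j]
    return Lo


def solve_cholesky(Lo, b):
    """Solve (Lo Lo^T) a = b by forward and backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(Lo[i][k] * z[k] for k in range(i))) / Lo[i][i]
    a = [0.0] * n
    for i in reversed(range(n)):
        a[i] = (z[i] - sum(Lo[k][i] * a[k] for k in range(i + 1, n))) / Lo[i][i]
    return a


def log_likelihood(X, Y, theta, v1, v0):
    """Log marginal likelihood of scalar targets Y for given hyperparameters."""
    n = len(X)
    K = [[v1 * math.exp(-0.5 * sum(t * (a - b) ** 2
                                   for t, a, b in zip(theta, X[p], X[q])))
          + (v0 if p == q else 0.0) for q in range(n)] for p in range(n)]
    Lo = cholesky(K)
    alpha = solve_cholesky(Lo, Y)                      # K^{-1} Y
    log_det = 2.0 * sum(math.log(Lo[i][i]) for i in range(n))
    fit = sum(y * a for y, a in zip(Y, alpha))         # Y^T K^{-1} Y
    return -0.5 * log_det - 0.5 * fit - 0.5 * n * math.log(2.0 * math.pi)


def grid_search(X, Y, thetas, v1s, v0s):
    """Return (log-likelihood, theta, v1, v0) of the best grid point."""
    best = None
    for theta in thetas:
        for v1 in v1s:
            for v0 in v0s:
                ll = log_likelihood(X, Y, theta, v1, v0)
                if best is None or ll > best[0]:
                    best = (ll, theta, v1, v0)
    return best


X_demo = [[0.0], [0.5], [1.0], [1.5]]
Y_demo = [0.0, 0.48, 0.84, 1.0]
best = grid_search(X_demo, Y_demo, [[0.1], [1.0], [10.0]], [1.0], [0.01, 1.0])
```

Because a local model uses only a few samples, even a coarse search of this kind stays cheap; a gradient-based optimizer would use the partial derivatives of the likelihood instead.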

Prediction
Once the estimate of the hyperparameters $\Theta^*$ is given, we can predict the output $y^*$ at any untested state $x^*$ based on the relevant state-output pairs $(x_i, y_i)$ for $i = 1, \ldots, N$. Thus, the unbiased prediction of $y^*$ can be obtained by the predictive distribution $p(y^* \mid Y, X, x^*) = p(Y, y^*)/p(Y \mid X)$ with mean and variance [38]:
$$\mu(x^*) = \mathbf{k}(x^*)^T K^{-1} Y \tag{6}$$
$$\sigma^2(x^*) = k(x^*) - \mathbf{k}(x^*)^T K^{-1} \mathbf{k}(x^*) \tag{7}$$
where $\mathbf{k}(x^*) = [C(x_1, x^*), \ldots, C(x_N, x^*)]^T$ is the covariance vector between the test and training states and $k(x^*) = C(x^*, x^*)$ is the covariance between the test state and itself.
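The prediction step can be illustrated with the short Python sketch below, which computes a predictive mean and variance for a scalar output from a handful of training pairs. It assumes a squared-exponential covariance with noise variance `v0`, assumes that the self-covariance of the test state includes the noise term (so that it equals `v1 + v0`), and solves the linear systems by Gaussian elimination; the function names are hypothetical.

```python
import math


def cov(xp, xq, theta, v1):
    """Noise-free squared-exponential covariance between two states."""
    return v1 * math.exp(-0.5 * sum(t * (a - b) ** 2
                                    for t, a, b in zip(theta, xp, xq)))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def gp_predict(X, Y, x_star, theta, v1, v0):
    """Predictive mean and variance at x_star for scalar targets Y."""
    n = len(X)
    K = [[cov(X[p], X[q], theta, v1) + (v0 if p == q else 0.0)
          for q in range(n)] for p in range(n)]
    k_vec = [cov(xi, x_star, theta, v1) for xi in X]
    alpha = solve(K, Y)        # K^{-1} Y
    gamma = solve(K, k_vec)    # K^{-1} k(x*)
    mean = sum(k * a for k, a in zip(k_vec, alpha))
    var = (v1 + v0) - sum(k * g for k, g in zip(k_vec, gamma))
    return mean, var


near_mean, near_var = gp_predict([[0.0]], [2.0], [0.0], [1.0], 1.0, 0.5)
far_mean, far_var = gp_predict([[0.0]], [2.0], [100.0], [1.0], 1.0, 0.5)
```

Far from the training data the mean falls back to zero and the variance grows to its prior value, which is the probabilistic information that a linear local model does not provide.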

The JITGP-ELM approach
Aiming at the problem formulated in Section 3.1, we will introduce the proposed JITGP-ELM method in this section. Figure 2 shows the block diagram of the proposed method. The JITGP-ELM method includes two procedures: fault feature extraction by JITGP and fault status determination by ELM with the fault feature, where the fault feature is the estimation residuals generated by JITGP. In the following, we will first present the framework of JITGP-ELM, then the JITGP modelling algorithm for fault feature extraction, and the offline ELM training process.

Framework description of the JITGP-ELM approach
The proposed method is under the JITL scheme, which has three steps [40]:
(1) Select a set of samples from the dataset based on some similarity criterion.
(2) Establish a local model with the selected samples to predict the process output.
(3) When a new sample needs to be predicted, build a new local model by repeating the above two steps.
This paper uses the GP model to improve conventional JITL to develop noise-resistant estimation and obtain high-quality residuals as a fault feature.
In conventional methods, the fault status is determined by applying a threshold to the residuals between estimation and observation. Thus, the threshold-based methods could suffer from two problems: the improper design of the threshold value and the inability to handle multi-fault diagnosis problems. In the JITGP-ELM approach, the residuals are analyzed by an ELM classifier model that deals with multi-fault diagnosis problems without thresholds and is entirely data-driven. After the process estimation by JITGP, the extracted residuals are input to the ELM fault classifier model, whose output is the FDD decision of fault status, where the ELM model is trained offline.
Based on the previous discussion, the JITGP-ELM framework can be described in Algorithm 1. In the JITGP-ELM algorithm, the fault feature extraction is integrated into the JITGP algorithm, and the training process of the ELM-classifier is integrated into the ELM-training algorithm, which will be discussed in the following subsections.
In the JITGP-ELM, the updating GP model is for the residual feature generation by estimating the healthy system states accurately and adaptively. Then, the ELM is for the fault classification with the residual features. Therefore, it is considered that when the system state estimation is relatively accurate, the updated GP model will not have a great impact on the mapping relationship between residual features and fault types. Thus, the offline training ELM model will not be updated during the working process. However, after the process, we could retrain the ELM model with the collected data and the actual fault label.

JITGP-based fault feature extraction
While the original data increases the useful information for fault diagnosis, it also inevitably brings some redundant information. Redundant features will reduce the efficiency and accuracy of fault diagnosis. Thus, feature extraction is necessary to improve the fault diagnosis performance [22]. Many methods have been proposed for feature extraction in fault diagnosis. In [41], a two-stage feature selection method, namely the Hybrid Distance Evaluation technique, was proposed to select a subset of combined features with strong classification ability. Luo et al. [22] extract three types of fault features to compose the compound feature set based on ensemble empirical mode decomposition. In [14,42], the system dynamics and nonlinearity are circumvented using models, and the residuals are analyzed as a fault feature using a multivariate statistical method to obtain fault detection results. In this paper, JITGP serves as a process model to predict the output for generating residuals, which are used as the fault feature.
In the JITGP, the GP model is used to establish the local models instead of the autoregressive model with exogenous inputs (ARX), which is frequently used by the conventional JITL methods [14,34]. In conjunction with the JITL, the computational cost of the GP model is significantly reduced due to the presence of the sample selection step. The details of JITGP-based fault feature extraction are discussed below, including relevant sample selection with a similarity estimation, local model prediction with GP, and residual generation from the output estimation and observation.

Relevant sample selection
In JITGP, a similarity measure is used as the criterion to select the samples most relevant to the current sample, which needs to be predicted. With reference to [14,33], the similarity between the current sample $x_q$ and a historical sample $x_i$ in a dataset is measured as
$$S_i = \gamma \sqrt{e^{-d_i^2}} + (1 - \gamma) \cos(\alpha_i), \qquad d_i = \|x_q - x_i\|_2 \tag{8}$$
where $\gamma \in [0, 1]$ is a weight parameter, $d_i$ is the Euclidean distance between $x_q$ and $x_i$, $\alpha_i$ represents the angle between $\Delta x_q$ and $\Delta x_i$, in which $\Delta x_q = x_q - x_{q-1}$ and $\Delta x_i = x_i - x_{i-1}$; $S_i \in [0, 1]$ represents the similarity, and a larger $S_i$ denotes a higher similarity between $x_q$ and $x_i$. According to [14], it should be noted that $\cos(\alpha_i)$ in (8) has to be larger than 0, which means that the angle $\alpha_i < 90°$. If $\alpha_i \geq 90°$, the two vectors are considered dissimilar, and the corresponding $x_i$ is not taken into account when establishing a local model. After the similarity evaluation, a subset of data can be constructed with the relevant data $(x_k, y_k)$ for $k = 1, \ldots, n$ with the largest $S_i$, where $n$ is the number of samples used to establish a just-in-time local model.
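The sample selection step can be sketched in Python as below. The sketch assumes the distance-and-angle form of the similarity, S_i = γ·sqrt(exp(-d_i²)) + (1-γ)·cos(α_i), with d_i the Euclidean distance between the samples and α_i the angle between their increments; the function names, the default γ = 0.5, and the handling of zero-length increments (treated as dissimilar) are assumptions for illustration.

```python
import math


def similarity(xq, xq_prev, xi, xi_prev, gamma):
    """Distance-and-angle similarity; returns 0.0 when cos(alpha) <= 0."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(xq, xi)))
    dq = [a - b for a, b in zip(xq, xq_prev)]
    di = [a - b for a, b in zip(xi, xi_prev)]
    nq = math.sqrt(sum(v * v for v in dq))
    ni = math.sqrt(sum(v * v for v in di))
    if nq == 0.0 or ni == 0.0:
        return 0.0  # assumption: an undefined angle counts as dissimilar
    cos_alpha = sum(a * b for a, b in zip(dq, di)) / (nq * ni)
    if cos_alpha <= 0.0:
        return 0.0
    return gamma * math.sqrt(math.exp(-d * d)) + (1.0 - gamma) * cos_alpha


def select_relevant(xq, xq_prev, X, n, gamma=0.5):
    """Indices of the n most similar historical samples in X."""
    scored = [(similarity(xq, xq_prev, X[i], X[i - 1], gamma), i)
              for i in range(1, len(X))]
    scored = [s for s in scored if s[0] > 0.0]
    scored.sort(reverse=True)
    return [i for _, i in scored[:n]]


history = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [1.0, 1.0]]
picked = select_relevant([1.0, 1.0], [0.0, 0.0], history, n=2)
```

In this toy history the last sample coincides with the query in position but moves in the opposite direction, so it is excluded even though its distance is zero.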

Local model prediction with GP
Consider an untested state x whose output needs to be predicted from the tested samples (x k , y k ) for k = 1, . . . , n. To simplify the modelling computation, we assume that the components of y at the current output are independent of each other. Therefore, the prediction of y = f (x) + ε can be decomposed into sub-problems: y i = f i (x) + ε i , that is, a component-wise estimation of y. According to Section 3.2, the prediction of y i can be obtained by ŷ i = μ i (x) from (6) with the mean squared error given by (7), after the GP modelling and hyperparameters optimization. Thus, the prediction of y can be obtained: ŷ = [ŷ 1 , . . . , ŷ m ] T .

Residuals generation
To generate residuals as the fault feature for fault detection and diagnosis, the JITGP simulates the nonlinear and dynamic behaviour of the actual process. The residuals are the difference between the output estimation and the actual observation, which eliminates the dynamics and nonlinearity of the process and extracts the essential information of faults. Thus, a residual is generated by the following equation:
$$r = y - \hat{y} \tag{9}$$
where $y$ and $\hat{y}$ are the actual and predicted outputs of the current state, respectively. Because repeatedly applying GP regression is computationally expensive, the proposed method reduces the computation cost by running the GP modelling in the framework of just-in-time learning, i.e. JITGP. In the framework of JITGP, the GP modelling is based on a small dataset obtained by the sample selection process. The implementation of the techniques discussed above requires inversions of the covariance matrix $K$, with a computational complexity of $O(n^3)$ and a memory complexity of $O(n^2)$ for training data of size $n$. The computational cost could therefore be high because of the inversion of $K$ required in (5) [43]. This issue can be addressed with the natural advantage of just-in-time learning. Since $n \ll N$, i.e. the number of training data $n$ is far less than the amount of data in the dataset $N$, the computational load for the matrix inversion of $K$ is greatly reduced. Thus, the algorithm can better meet the real-time requirement, which is one of the advantages of JITGP modelling.

Algorithm 2 JITGP-based fault feature extraction: JITGP(x, y)
1: for each sample x i in the dataset S N do
2:   Calculate the similarity S i between x and x i by (8);
3:   if cos(α i ) ≤ 0 then
4:     S i = 0
   end
5: Construct a subset S n with n nearest neighbours from S N according to {S i };
6: Construct a GP prior model with the subset S n ;
7: Obtain the optimal hyperparameters Θ* from (4);
8: Calculate ŷ corresponding to x by (6) with the subset S n and Θ*;
9: Calculate the residual r by (9);
Output: the estimated residual r.
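As a small worked example of the two points above, the following Python sketch computes a residual vector and the ratio of cubic inversion costs between a full dataset and a local subset. The function names are hypothetical, and the values N = 6000 and n = 15 are the dataset and local-model sizes used in the experiments of this paper.

```python
def residual(y, y_hat):
    """Component-wise residual between observed and estimated outputs."""
    return [a - b for a, b in zip(y, y_hat)]


def inversion_cost_ratio(N, n):
    """Ratio of O(N^3) to O(n^3) work for inverting the covariance matrix."""
    return (N / n) ** 3


r_demo = residual([1.0, 2.0], [0.5, 2.5])
ratio = inversion_cost_ratio(6000, 15)
```

With these sizes the cubic term shrinks by a factor of 400³ = 6.4 × 10⁷, ignoring constants and the cost of the similarity search itself.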
Based on the above discussions, the JITGP-based fault feature extraction algorithm can be summarized in Algorithm 2. In order to establish the JITGP dataset, a large amount of offline fault-free data is required. When a current sample demands a prediction, the most relevant data in the dataset are found for building a local model based on the measure of similarity S i from (8). The hyperparameters of GPR are optimized by (4) to build the local model. Then, the output prediction of the current sample is obtained by (6), and a residual could be computed by (9).
Since the JITGP is used to extract the residual features by estimating the output of the healthy system, whose dynamics model is unknown, the GP model in the JITGP should be built on fault-free data by selecting relevant data from the fault-free dataset. The fault-free dataset is updated when the system sensors are judged not to be faulty, so the GP model in the proposed method is built on an updated fault-free dataset to cope with the dynamic changes of the system. Furthermore, compared to building a GP model on the whole fault-free dataset, the selection of samples in just-in-time learning reduces the computational burden because of the reduction in training data.
After that, a fault classification model makes the fault decision by analyzing the residuals. Theoretically, the fault diagnosis can be obtained by any appropriate classifier, such as BP or SVM. Here, we use ELM as the fault classifier because of some of its excellent characteristics.
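The dataset update rule described above can be sketched in Python as follows: a new state-output pair is appended only when the diagnosis for that sample is fault-free. The function names, the label 0 for the fault-free status, and the bounded sliding window (which discards the oldest samples) are assumptions for illustration; the text does not specify whether old samples are removed.

```python
from collections import deque


def make_fault_free_dataset(max_size):
    """Bounded dataset; the size limit is an assumption of this sketch."""
    return deque(maxlen=max_size)


def update_fault_free(dataset, x, y, diagnosed_status, fault_free_label=0):
    """Append (x, y) only if the sample was diagnosed as fault-free."""
    if diagnosed_status == fault_free_label:
        dataset.append((x, y))
        return True
    return False


ds = make_fault_free_dataset(max_size=2)
added_ok = update_fault_free(ds, [0.1], [0.2], diagnosed_status=0)
added_faulty = update_fault_free(ds, [9.9], [0.0], diagnosed_status=1)
update_fault_free(ds, [0.3], [0.4], diagnosed_status=0)
update_fault_free(ds, [0.5], [0.6], diagnosed_status=0)
```

Samples recorded during a diagnosed fault never enter the dataset, so the local GP models continue to describe the healthy system.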

ELM fault classifier training
ELM is a single-hidden layer feedforward neural network whose hidden layer parameters are generated in an arbitrary way [37]. Compared with other learning methods, such as Single Layer Perceptron and SVM, ELM is believed to have advantages in learning speed and generalization ability. Figure 3 shows the network structure of ELM.
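With an $m$-dimensional input $r$, the output of an SLFN is
$$\hat{t} = \sum_{i=1}^{L} \beta_i \, g(w_i \cdot r + b_i) = \sum_{i=1}^{L} \beta_i \, h_i(r) \tag{10}$$
where $L$ represents the number of hidden nodes, $w_i = [w_{i1}, \ldots, w_{im}]^T$ is the input weight vector connecting the input and the $i$th hidden node, $\beta_i = [\beta_{i1}, \ldots, \beta_{io}]^T$ is the output weight vector of the $i$th hidden node, in which $o$ is the number of output nodes, $g(\cdot)$ denotes the activation function of the hidden layer, $b_i$ denotes the threshold of the $i$th hidden node, and $h_i(r)$ is the output of the $i$th hidden node with input $r$. Thus, given $N$ samples $(r_j, t_j)$, $j = 1, \ldots, N$, the $N$ equations in the format of (10) can be written compactly as
$$H \beta = T \tag{11}$$
where
$$H = \begin{bmatrix} h_1(r_1) & \cdots & h_L(r_1) \\ \vdots & \ddots & \vdots \\ h_1(r_N) & \cdots & h_L(r_N) \end{bmatrix}, \qquad \beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_L^T \end{bmatrix}, \qquad T = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix}.$$
According to [37], $H$ is the hidden layer output matrix, in which $H_{ij}$ is the $j$th hidden node output with input $r_i$, which can be calculated after assigning random hidden node parameters $(w_j, b_j)$. According to [37,44], the smallest norm least squares solution of (11) can be computed as
$$\hat{\beta} = H^{\dagger} T$$
where $H^{\dagger}$ is the Moore-Penrose generalized inverse of $H$. Thus, the ELM algorithm can be summarized as follows: (1) randomly assign the hidden node parameters $(w_i, b_i)$, $i = 1, \ldots, L$; (2) calculate the hidden layer output matrix $H$; (3) calculate the output weights $\hat{\beta} = H^{\dagger} T$. For classification, the output of ELM could have $o$ nodes with $o$ equal to the number of classes. If the class label is $k$, the expected output vector will be $t = [0, \ldots, 0, 1, 0, \ldots, 0]^T$, in which only the $k$th element is 1.
Therefore, the major steps for training the ELM classifier based on residual extraction are summarized in Algorithm 3. In order to train the ELM network, the dataset $S_f = [X_f, Y_f, T_f]$ is required, where $X_f$ and $Y_f$ are the state-output data under normal and various fault conditions, and $T_f$ is the corresponding target label of fault statuses. Then, the training dataset $[R_f, T_f]$ is constructed by combining the process residuals $R_f$ calculated by JITGP with the corresponding fault status labels $T_f$. After the parameters and activation function are determined, the ELM training process is completed and a fault classifier is obtained.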
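The ELM training procedure can be sketched in pure Python as below. Hidden weights and thresholds are drawn at random, the hidden layer output matrix is formed with a sigmoid activation, and the output weights are obtained by least squares against one-hot targets. To stay self-contained, the sketch solves the ridge-regularized normal equations (HᵀH + λI)β = HᵀT by Gauss-Jordan elimination rather than computing the Moore-Penrose inverse; this substitution, the function names, the uniform [-1, 1] initialization, the value of λ, and the toy residual clusters are all assumptions for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def hidden_row(r, W, b):
    """Outputs of all hidden nodes for one input vector r."""
    return [sigmoid(sum(w * x for w, x in zip(W[i], r)) + b[i])
            for i in range(len(W))]


def solve(A, B):
    """Solve A X = B (B is a matrix) by Gauss-Jordan with partial pivoting."""
    n = len(A)
    M = [A[i][:] + B[i][:] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda row: abs(M[row][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for row in range(n):
            if row != c and M[row][c] != 0.0:
                f = M[row][c]
                M[row] = [vr - f * vc for vr, vc in zip(M[row], M[c])]
    return [row[n:] for row in M]


def train_elm(R, labels, n_classes, L, ridge=1e-6, seed=0):
    """Random hidden layer plus least-squares output weights."""
    rng = random.Random(seed)
    m = len(R[0])
    W = [[rng.uniform(-1.0, 1.0) for _ in range(m)] for _ in range(L)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(L)]
    H = [hidden_row(r, W, b) for r in R]
    T = [[1.0 if k == lab else 0.0 for k in range(n_classes)] for lab in labels]
    A = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0)
          for j in range(L)] for i in range(L)]
    B = [[sum(h[i] * t[k] for h, t in zip(H, T)) for k in range(n_classes)]
         for i in range(L)]
    return W, b, solve(A, B)


def predict(r, model):
    """Class index with the largest output node value."""
    W, b, beta = model
    h = hidden_row(r, W, b)
    scores = [sum(h[i] * beta[i][k] for i in range(len(h)))
              for k in range(len(beta[0]))]
    return scores.index(max(scores))


# Toy residual clusters standing in for three fault statuses (hypothetical).
centers = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
R_demo, labels_demo = [], []
for lab, (cx, cy) in enumerate(centers):
    for dx in (-0.05, 0.05):
        for dy in (-0.05, 0.05):
            R_demo.append([cx + dx, cy + dy])
            labels_demo.append(lab)
model = train_elm(R_demo, labels_demo, n_classes=3, L=20)
```

Only the output weights are fitted, and this takes a single linear solve, which is why ELM training is fast compared with iterative backpropagation.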

Experimental verification
In order to evaluate the performance of the proposed method, the following experiments were carried out on a fault detection and diagnosis problem for the steering-by-wire system in a real UGV. The experimental platform is a wheeled electric vehicle with a steering-by-wire system, as shown in Figure 5(a). As a new generation of the steering system, the steering-by-wire system achieves autonomous steering by relying on the electrical system instead of the traditional mechanical transmission mechanism, and it has become the basis and guarantee of assisted or unmanned driving. However, faults could occur in the sensors and actuators, which may affect the reliability and safety of the electrical systems. The experiment considers the fault detection and diagnosis for the steering wheel angle sensor together with the yaw rate and lateral acceleration sensors' signals in unmanned ground vehicles (Figure 4).

Data preparation
The experimental design of fault detection and diagnosis includes three sensor faults, namely those of the steering wheel angle, yaw rate, and lateral acceleration sensors. The input of the system is the steering wheel angle (SW), and the outputs are the yaw rate (YR) and the lateral acceleration (LA). The purpose is to detect and identify different sensor faults by analyzing the observable behaviour of the UGV. Because the yaw rate and the lateral acceleration need to be estimated from the observed steering wheel angle, a set of data that can reflect the UGV's dynamics and nonlinearity needs to be collected for the proposed data-driven method. The location of the experimental data collection and the trajectory of the vehicle test are shown in Figure 5(b); a total of 10,800 samples of input-output data were collected at a frequency of 10 Hz. Furthermore, data on sharp turns, braking, and accelerating were collected into the dataset to ensure persistence of excitation. When a sensor in the system failed, the sensor's signal was a white noise close to zero.

Algorithm 3 ELM fault classifier training based on JITGP: ELM-training(S f )
Dataset required: S f = [X f , Y f , T f ]
Initialization: set the number of ELM hidden layer nodes L; set an empty residual dataset R f ; select an activation function g(·) that is infinitely differentiable.
1: for each x q ∈ X f , y q ∈ Y f do
2:   r q = JITGP(x q , y q )
3:   Store r q in the dataset R f ;
4: end
5: ...
From the collected data of 10,800 samples, the first 6000 samples were chosen to construct the fault-free dataset, which supported the online JITGP algorithm to build local models. The training dataset of fault classification was constructed from the 6001st to 10,000th sample, which consisted of 4000 samples. These 4000 samples were divided into four categories, with 1000 samples in each category. The categories' labels were the fault statuses, which were fault-free, fault 1 (yaw rate sensor fault), fault 2 (lateral acceleration sensor fault), and fault 3 (steering wheel angle sensor fault), respectively. Finally, the remaining 800 samples were used to form a test set for algorithm testing, which was also divided into four groups with 200 samples in each group, respectively corresponding to the above four fault statuses.
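The partition of the 10,800 samples described above can be written out as index arithmetic. The Python sketch below uses zero-based indices and assumes that the four categories occupy consecutive, equally sized blocks in the order fault-free, fault 1, fault 2, fault 3; the block ordering and the function names are assumptions for illustration.

```python
def split_ranges(n_fault_free=6000, n_train=4000, n_test=800):
    """Zero-based index ranges of the three consecutive data segments."""
    a = n_fault_free
    b = a + n_train
    c = b + n_test
    return range(0, a), range(a, b), range(b, c)


def block_labels(n_samples, n_classes=4):
    """Labels 0..n_classes-1 assigned to equal consecutive blocks."""
    per_class = n_samples // n_classes
    return [k for k in range(n_classes) for _ in range(per_class)]


fault_free_idx, train_idx, test_idx = split_ranges()
train_labels = block_labels(len(train_idx))
test_labels = block_labels(len(test_idx))
duration_s = 10800 / 10  # 10,800 samples at 10 Hz
```

At 10 Hz the full record covers 1080 s, and each 200-sample test group covers 20 s.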

Fault feature extraction of steering-by-wire system
The experiments were conducted on the test set of 800 samples. In order to compare the estimation abilities of JITGP and conventional JITL, Figure 6 shows the output estimations of yaw rate and lateral acceleration for 200 fault-free samples. The JITL's local models are conventional ARX models, while the local models of JITGP are GP models. The principle of parameter selection is to minimize the computational cost while ensuring the performance of the algorithm. The parameters were set as follows: the amount of relevant data for the JITGP and JITL local models, n = 15, using a small sample for speed; the number of steps constructing the local model states, l = 2. Figure 6(a) shows the estimations of yaw rate, where the mean-square errors of JITL-ARX and JITGP are 1.141 and 0.779, respectively. Both perform well in estimating the yaw rate when the actual observation curve is relatively smooth, although the JITGP estimation curve has fewer fluctuations than that of JITL-ARX. However, the actual observation curve of lateral acceleration fluctuated violently due to the uneven road surface. Figure 6(b) shows the estimations of lateral acceleration, where the mean-square errors of JITL-ARX and JITGP are 0.816 and 0.173, respectively. When estimating the lateral acceleration, the performance of JITL-ARX is unsatisfactory, while JITGP still maintains good performance. The results show that JITL-ARX is sensitive to noise and interference, while JITGP obtains a smooth estimate that stays close to the observation. Hence, the estimation results show that JITGP improves conventional JITL effectively.
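The mean-square error used to compare the two estimators can be computed as in the following Python sketch. The function name and the toy numbers are hypothetical and do not reproduce the values reported above.

```python
def mean_square_error(observed, estimated):
    """Average squared difference between two equally long sequences."""
    if not observed or len(observed) != len(estimated):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((o - e) ** 2 for o, e in zip(observed, estimated)) / len(observed)


mse_demo = mean_square_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Applying the same function to the observed signal and to each estimator's output over the 200 fault-free samples would give one figure per method and per output channel.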
Furthermore, 800 samples were divided into four groups to simulate different fault statuses. In the simulation, the 201st-400th samples (20-40 s) Figures 7-9. From the residual results, we can find that if a fault occurs in one of the yaw rate and lateral acceleration sensors, the residuals between the observations and estimations of the corresponding output will increase (Figures 7 and 8). Another case is when a fault occurs in the steering wheel angle sensor, which means that the input of the predictive model is an erroneous value reported by the faulty sensor. In this case, the residuals of both outputs will increase (Figure 9). The residual results in Figures 7-9 show that the faulty and fault-free characteristics are separated more clearly with JITGP than with JITL-ARX, which is even more pronounced in the lateral acceleration residuals. Predictive precision is the essential factor affecting the quality of residual information, and the extracted features are beneficial to the fault classification task in the following procedure.

Fault diagnosis based on JITL modelling
After the system is modelled with the JITL methods, the residuals are generated as fault features, which are then used for fault diagnosis. Six methods, namely BP, ELM, JITL-BP, JITL-ELM, JITGP-BP, and JITGP-ELM, were compared on the fault detection and diagnosis problem described above. BP and ELM networks serve as the fault classifiers in all six methods; the methods differ in the fault features used to train the networks. In the JITL-based and JITGP-based methods, the fault features are the residuals generated from the JITL-ARX and JITGP output estimations, respectively. In BP and ELM alone, the fault features are the original input and output data.
The parameters of the algorithms should be set consistently so that their performance can be compared fairly. For the three-layer networks, the number of hidden layer nodes n_hidden is related to the number of input layer nodes n_input by n_hidden = 2 × n_input + 1. Thus, with 10 historical steps, the network structures of BP and ELM are 20-41-4 for the residual-based methods and 30-61-4 for the direct method. The network performance function of BP is the mean square error. The learning algorithm is the steepest descent method, and the error target of learning is 0.001. The transfer functions of the hidden and output layers are the hyperbolic tangent S-type transfer function (tansig) and the linear transfer function (purelin), respectively. The activation function used in ELM is a simple sigmoid function g(x) = 1/(1 + exp(−x)). Before the networks are trained, all data samples are normalized to avoid fault diagnosis errors caused by differences between data sets.
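As an illustration of the classifier configuration above, the following sketch builds an ELM with the quoted sizing rule and sigmoid activation. It is a minimal sketch under stated assumptions, not the authors' code: the uniform range of the random weights, the one-hot target coding, the pseudo-inverse solution for the output weights, and the names `hidden_size` and `ELM` are all hypothetical choices.

```python
import numpy as np

def hidden_size(n_input):
    """Sizing rule quoted in the text: n_hidden = 2 * n_input + 1."""
    return 2 * n_input + 1

def sigmoid(x):
    """Activation g(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

class ELM:
    """Single-hidden-layer network with random, untrained hidden parameters."""

    def __init__(self, n_input, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        n_hidden = hidden_size(n_input)
        # Input weights and biases are drawn once and never updated.
        # The range [-1, 1] is an assumption.
        self.W = rng.uniform(-1.0, 1.0, size=(n_input, n_hidden))
        self.b = rng.uniform(-1.0, 1.0, size=n_hidden)
        self.n_classes = n_classes
        self.beta = None

    def fit(self, X, labels):
        H = sigmoid(X @ self.W + self.b)      # hidden-layer output matrix
        T = np.eye(self.n_classes)[labels]    # one-hot fault status targets
        self.beta = np.linalg.pinv(H) @ T     # least-squares output weights
        return self

    def predict(self, X):
        scores = sigmoid(X @ self.W + self.b) @ self.beta
        return np.argmax(scores, axis=1)
```

With 20 residual inputs and 4 fault statuses, `ELM(n_input=20, n_classes=4)` yields the 20-41-4 structure; with 30 raw inputs it yields 30-61-4. Training reduces to one pseudo-inverse, with no iterative weight updates.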
The fault diagnosis accuracy of the JITL-based methods is compared in Figure 10. BP [20] and ELM [22] are existing fault classifier methods. The inputs of these methods are UGV state values, and the outputs are fault statuses. The results show that the methods in the proposed scheme are more accurate than the methods in the conventional scheme. Furthermore, JITGP-ELM has the best performance among the methods in the proposed scheme. The experimental results show that BP alone is better than ELM alone, whereas ELM is better than BP when residuals are the classification input. With BP and ELM alone, information must be extracted from more complex data in the classification process. BP has more parameters to optimize in this case, which is more conducive to extracting fault information from the original data. With residuals extracted by JITL and JITGP as fault features, the fault features are more pronounced because the residuals have eliminated the system dynamics and nonlinearity; with such small samples, BP may suffer from overfitting, while the generalization ability of ELM helps to prevent it. The proposed method has a mean accuracy of 99.46%, which is 1.68 percentage points higher than the 97.78% of JITL-ELM. The main reason is that ELM generalizes well on noisy residual features, which illustrates the significance of the choice of fault classifier. With ELM as the fault classifier, the proposed method achieves the highest diagnostic accuracy by exploiting the advantage of the fault features. Figure 11 shows the fault diagnosis results of JITGP-ELM, and it can be seen that the residual is valid as a fault feature. High-quality fault features depend on the accuracy of the system's prediction, and JITGP provides this online adaptive prediction at a low computational cost.
The separation of fault features influences FDD performance significantly, and the residual is effective as a fault feature for fault diagnosis. In summary, the data-driven fault detection and diagnosis method proposed in this paper is characterized by high classification accuracy. The diagnostic accuracy on the test set is 99.46%. Two samples of fault 2 are misdiagnosed, and two samples of fault 3 are wrongly diagnosed as fault 1.

Comparison with the data-driven diagnosis approaches
To further demonstrate the reliability of the novel method, it is compared with existing algorithms. One-against-one SVM [21], BP neural network [20], and ELM [22] were selected as the control group. To ensure a fair comparison, the novel method and the control group were tested on the same fault diagnosis data set and in the same computing environment (CPU at 3.2 GHz, RAM at 16 GB), and the parameters of the algorithms were set consistently. For the three-layer networks of BP and ELM, the network structures are 20-41-4 for the proposed scheme (ELM in JITGP-ELM) and 30-61-4 for the direct method (BP and ELM), following the n_hidden = 2 × n_input + 1 principle. For ELM, the activation function is the sigmoid function. For BP, the transfer functions of the hidden and output layers are the hyperbolic tangent S-type transfer function (tansig) and the linear transfer function (purelin), respectively; the network performance function is the mean square error (MSE); the learning algorithm is the steepest descent method; the error target of learning is set as 0.001; the learning rate is set as 0.02; the neural weights are initialized with random values in the range [−0.1, 0.1]; the maximum number of learning epochs is set as 1000; the learning rate is set as 0.1. For SVM, there are six one-against-one models with the same parameters. For each of them, the kernel function is the radial basis function (RBF); the best penalty parameter c and the best kernel parameter g are searched by cross-validation; the error target of learning is set as 0.001; the learning rate is 0.02.
The training set and test set are the same for all methods. The inputs and outputs of these models are 10 historical steps of vehicle state data and the fault status, respectively. In the direct method (ELM, SVM and BP), the classifier classifies these data directly, and the features are the states of the UGV, i.e. the data of steering wheel angle, yaw rate and lateral acceleration. In the proposed method, yaw rate and lateral acceleration are predicted from the steering wheel angle so that residuals are generated, and the residual features are then used for fault classification.
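The input dimensions quoted above (20 for the residual-based scheme, 30 for the direct method) and the count of six one-against-one SVM models follow from simple bookkeeping, which the sketch below reproduces. The signal-major ordering of each feature vector and the helper names `window_features` and `one_against_one_pairs` are assumptions for illustration; the text does not specify how the 10 historical steps are arranged.

```python
from itertools import combinations

def window_features(signals, steps=10):
    """Flatten the last `steps` samples of every signal into one feature row.

    `signals` is a list of equally long sequences. Row k holds samples
    k-steps+1 .. k of the first signal, then of the second, and so on
    (this ordering is an assumption).
    """
    length = len(signals[0])
    features = []
    for k in range(steps - 1, length):
        row = []
        for s in signals:
            row.extend(s[k - steps + 1:k + 1])
        features.append(row)
    return features

def one_against_one_pairs(classes):
    """Every unordered pair of classes; one binary SVM is trained per pair."""
    return list(combinations(classes, 2))
```

Two residual signals (yaw rate and lateral acceleration) give 2 × 10 = 20 inputs, three raw signals give 3 × 10 = 30 inputs, and four fault statuses give 4 × 3 / 2 = 6 class pairs.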
The results of the four diagnosis methods are compared in Table 1. The performances are expressed as the means and, where available, the standard deviations of the results over 30 experiments. The precision of a fault diagnosis method for a given fault is the ratio of the samples in which the fault is correctly predicted to the total number of samples in which this fault is announced; the recall is the ratio of the samples in which the fault is correctly predicted to the total number of samples containing the fault. The precision indicates how reliable a method is when it announces a fault, and the recall indicates how reliably it detects the faults. According to Table 1, the proposed method has the highest precisions and recalls among the compared methods, except for the precision of fault 1. Moreover, the proposed method has the highest diagnostic accuracy on the test samples, exceeding the other methods by more than 10%. The reason is that the fault features extracted by JITGP effectively enhance the fault diagnosis, while the other methods use the original input and output signals. The comparison shows that the proposed method benefits the fault diagnosis of the steering-by-wire system. The BP neural network has the longest training time, while the proposed method has the shortest. Because the hidden layer bias matrix and the input weight matrix of the ELM algorithm are generated randomly, the repeated training and modification of connection weights and thresholds are avoided. Therefore, the ELM algorithm significantly reduces the network training time. Moreover, the proposed method requires fewer neurons than ELM because of the reduction in fault feature dimensions. The running time of the proposed method is the longest because of the extra step of extracting residuals as fault features. However, this running time is 0.036 s, which is below 0.1 s; that is, the method meets the real-time requirement of 10 Hz.
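The precision and recall definitions above, together with the real-time check, can be written out as a short sketch. The function names and the toy labels in the usage note are hypothetical and are not taken from Table 1.

```python
def precision_recall(y_true, y_pred, label):
    """Per-fault precision and recall as defined in the text.

    precision = correctly predicted `label` / all predictions of `label`
    recall    = correctly predicted `label` / all true samples of `label`
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    predicted = sum(1 for p in y_pred if p == label)
    actual = sum(1 for t in y_true if t == label)
    precision = tp / predicted if predicted else float("nan")
    recall = tp / actual if actual else float("nan")
    return precision, recall

def meets_rate(running_time_s, rate_hz):
    """True if one diagnosis cycle fits within the sampling period 1 / rate_hz."""
    return running_time_s <= 1.0 / rate_hz
```

For a toy case with `y_true = [1, 1, 1, 3, 3, 3, 3]` and `y_pred = [1, 1, 1, 1, 1, 3, 3]`, two fault 3 samples are announced as fault 1, which lowers the precision of fault 1 and the recall of fault 3 while leaving the recall of fault 1 and the precision of fault 3 intact. This mirrors how a misdiagnosis of one fault as another can reduce the precision of the announced fault. The reported running time satisfies `meets_rate(0.036, 10)` because 0.036 s is below the 0.1 s period.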
The experimental results show that the JITGP-ELM devised in this paper can significantly improve recognition accuracy while ensuring real-time computation.

Conclusion
This paper proposes a data-driven fault detection and diagnosis method called JITGP-ELM for unmanned ground vehicles with unknown nonlinear dynamics. In the proposed method, a model estimator based on the just-in-time Gaussian process is designed for online residual generation to cope with the dynamics and nonlinearity; the estimator has online adaptability and noise-resistant ability. Based on the model estimation residuals, a fault classifier using Extreme Learning Machine is then designed for fault identification. The proposed method can solve multi-fault diagnosis problems without the need to determine residual thresholds, and it combines fault detection, isolation, and identification to improve the efficiency of the diagnosis framework. Finally, the proposed method is tested on a real UGV's steering-by-wire system with sensor faults, and the test results show its effectiveness. In future work, fault-tolerant control based on the proposed JITGP-ELM method can be considered.

Funding
This work is supported by the National Natural Science Foundation of China under Grants 61825305 and U21A20518.

Disclosure statement
No potential conflict of interest was reported by the author(s).