An Enhanced Ensemble Learning based Fault Detection and Diagnosis for Grid-Connected PV Systems

Ensemble learning has attracted wide attention from the scientific research community. Ensemble methods aim to improve prediction accuracy by combining multiple models instead of relying on a single one. The objective of this article is to develop intelligent fault detection and diagnosis (FDD) frameworks, based on improved ensemble learning approaches, that ensure the high-performance operation of Grid-Connected Photovoltaic (PV) systems. To this end, three ensemble learning-based FDD techniques for Grid-Connected PV systems are proposed. First, an ensemble learning (EL) technique that combines predictions from Support Vector Machine (SVM), K-Nearest Neighbour (KNN), and Decision Tree (DT) classifiers is presented. By combining various models, the developed method reduces the overall diagnosis error. However, classical ensemble models ignore the time dependence of PV measurements, even though PV system data are frequently time-correlated. Accordingly, in the current work, the dynamic and multivariate nature of the measurements is taken into account when designing the prediction models. To do so, kernel PCA (KPCA)-based EL and reduced KPCA (RKPCA)-based EL classifiers are developed. In both techniques, the feature extraction and selection phases are performed using the KPCA and RKPCA models, and the most sensitive and significant features are passed to the EL model for classification. The presented results show that the proposed methods offer enhanced diagnosis performance when applied to PV systems.

Grid-Connected Photovoltaic (GCPV) systems have received increasing interest during the last decade [1]-[3]. However, the operation of these systems is generally accompanied by different types of failures due to harsh environmental conditions or internal malfunctions. These failures (i.e., open-circuit/short-circuit faults, shading effects, inverter faults, grid-connection faults [4], [5]) might cause serious physical damage, present a risk of fire, and affect the efficiency of the solar modules and electrical power generation [6]. Therefore, the implementation of fault detection and diagnosis (FDD) techniques is becoming mandatory to ensure safe and uninterrupted operation of Grid-Connected PV systems with low maintenance cost [7]-[9]. In recent years, several computational FDD techniques based on machine learning (ML) have been successfully applied to PV systems [10], [11]. The ML techniques include Support Vector Machines (SVM) [12], Naive Bayes (NB) [13], K-Nearest Neighbors (KNN) [14], Decision Tree (DT) [15], Random Forest (RF) [16], discriminant analysis (DA) [17] and artificial neural networks (ANN) [18]-[21]. These techniques were reported to perform better in fault detection and diagnosis of industrial systems than conventional techniques like linear regression [22]. SVM was first introduced by Vapnik [12]. The main idea of SVM is to map the training data from the input space into a higher-dimensional feature space via a mapping function and then apply a linear SVM in this space. The SVM classifier seeks an optimal separating hyperplane as the decision plane by maximizing the margin between two classes. SVM works relatively well when there is a clear margin of separation between classes, and it is memory efficient [23]. K-Nearest-Neighbor (KNN) is a widely used non-parametric classifier because of its simplicity and effectiveness.
The KNN algorithm does not need any training before making predictions, and new data can be added seamlessly without affecting the accuracy of the algorithm [23]. A DT is constructed by dividing the dataset into smaller subsets until no further splitting is possible (unless a splitting limit is set). The DT method can handle both continuous and categorical variables, and it can automatically handle missing values [24]. In the last two decades, several fault diagnosis techniques based on ensemble learning have been proposed. The ensemble learning (EL) approach has gained significant attention and is becoming more and more popular [25]-[27]. The main idea behind EL algorithms is to improve machine learning results by combining several models into one predictive model that is more accurate and robust [25], [28]. The widely known ensemble learning techniques include bagging, boosting, random subspace, and stacking. EL seeks to decrease variance (bagging), decrease bias (boosting), or improve predictions (stacking) [29]. Recently, EL techniques have been successfully used to monitor processes in several sectors, e.g., chemical industry [30], pharmacology [31], energy [32], finance [29], agriculture [33] and many others. The need to develop better ensemble learning models for fault diagnosis of PV systems has become increasingly important. Therefore, several EL techniques have been explored in recent years. Generally, bagging and boosting are the most common methods used in the literature [34]. Despite the proven performance of numerous works that use ensemble learning techniques, most of these methods use only a specific type of classifier. Additionally, another main drawback of the existing FDD techniques based on ensemble learning is the direct use of the raw information from the process data.
To overcome this challenge, several FDD techniques based on a feature extraction and selection step followed by a single classifier have been proposed in the literature [35], [36]. Based on the above discussion, this paper focuses exclusively on the FDD problem for Grid-Connected PV systems. The main contributions are threefold. In the first stage, an ensemble learning technique that includes SVM, DT and KNN is proposed to distinguish between the different PV system operating modes using the raw data. In the second stage, in order to overcome the limitations of the proposed ensemble learning technique due to the direct use of the raw data, an intelligent framework based on a feature extraction and selection step using the KPCA technique is developed. In effect, combining KPCA and ensemble learning models could improve the performance of FDD and, more specifically, the decision-making accuracy. In the final stage, to improve the use of kernel PCA in terms of computation time and storage cost, a reduced extension is proposed. To summarize, an enhanced machine learning technique for FDD is developed. The novel technique merges ensemble learning methods and multivariate statistical analysis (KPCA and reduced KPCA) to achieve an overall improved accuracy.
The remainder of this paper is outlined as follows: Section I presents the list of abbreviations and acronyms. Section II introduces the paper. Section III presents a review of related works. A brief overview of EL and some ML techniques is given in Section IV. Section V presents the proposed techniques. Section VI describes the validation of the proposed techniques. Finally, some conclusions and future research directions are presented in Section VII.

III. RELATED WORK
Fault diagnosis in GCPV systems has become increasingly important to ensure optimal energy harvesting, low maintenance cost and reliable power production. Several FDD techniques using powerful machine learning (ML) and ensemble ML methods have been proposed in the literature to improve the reliability and performance of PV systems. In [37], a technique based on the SVM approach is proposed for the classification of islanding and grid fault events in LV distribution grids. An artificial neural network (ANN) was developed in [38], [39], where the main idea is to detect partial shading faults. Besides, the authors of [38] used a three-layer feed-forward ANN to detect short-circuit faults in PV arrays. In [40], the authors developed a fast FDD technique based on the generalized local likelihood ratio test. In another study, a diagnostic method based on two convolutional neural networks (CNN) was proposed for fault classification in PV arrays [41]. In [42], the authors presented a fault diagnosis method for PV arrays using an extreme gradient boosting (XGBoost) classifier. The developed technique is based on string current, array voltage, temperature and irradiance measurements. Moreover, many works have used the K-Nearest Neighbor (KNN) technique for fault classification purposes [43], [44]. The KNN classifier is a non-parametric technique that does not rely on the construction of a model during the training phase, and whose classification rule is based on a given similarity function between the training and the testing samples [44]. In [10], a new design of a parity relation-based residual generator for fault detection is proposed. The authors employed an iterative procedure that guarantees minimal regression error in the search for the optimal parameters to deal with linear and nonlinear systems. In [11], a fault diagnosis technique based on a semi-supervised ladder network with string voltage and current measurements is developed.
A data-driven FDD approach was introduced in [45]. The proposed technique monitors nonlinear processes based only on the available sensing measurements, using locally weighted projection regression (LWPR) to partition the input space and modified principal component analysis (MPCA) for fault detection. In [25], the authors propose ensemble machine learning (EML) algorithms to detect a series DC arc fault in a modern electrical system using local DC distribution. In [28], an ensemble learning technique that incorporates support vector machine (SVM), k-nearest neighbor (KNN), logistic regression (LR), decision tree (DT), and random forest (RF) is developed in order to diagnose faults in refrigeration systems. Bagging is one of the most well-known and successful ensemble learning techniques. Recently, many ensemble learning (EL) techniques based on bagging have been developed [46], [47]. In [46], an enhanced bagging (eBagging) method is presented. The main idea behind this proposal is to use a new mechanism (error-based bootstrapping) instead of the traditional random bootstrap technique when constructing training sets. A bagging-based multi-objective differential evolution algorithm (MODE) with multiple sub-populations (BagMPMODE) was proposed in [47]. This technique incorporates the idea of bagging into the evolution process of MODE. In addition, data-driven approaches like principal component analysis (PCA) have been widely used for feature extraction and selection [36], [48]. In [36], an improved FDD technique was proposed that uses PCA for multivariate feature extraction and selection, and single machine learning classifiers for fault classification. However, the PCA-based diagnosis technique has only been developed for linear systems, while many complex systems exhibit strong nonlinear correlations between their variables.
Recently, kernel PCA (KPCA) methods have been proposed to address the nonlinear relationships between process variables [49], [50]. The basic idea of the KPCA technique consists of (i) mapping the input data onto a feature space via a nonlinear kernel function, and (ii) performing PCA in that feature space. Although kernel PCA can extract nonlinear features in a high-dimensional space, it increases the space and time complexity compared to PCA [51].
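The two KPCA steps described above can be illustrated with a minimal NumPy sketch. It is not the implementation of the cited works: the Gaussian (RBF) kernel, the bandwidth `sigma`, and the function names `rbf_kernel` and `kpca_fit_transform` are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y.
    The kernel choice and sigma are assumptions, not taken from the paper."""
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def kpca_fit_transform(X, n_components=2, sigma=1.0):
    """Return the leading kernel principal components of X (N samples x m variables)."""
    N = X.shape[0]
    # Step (i): the nonlinear mapping is applied implicitly through the kernel matrix.
    K = rbf_kernel(X, X, sigma)
    # Centre the mapped data in the feature space.
    one = np.full((N, N), 1.0 / N)
    Kc = K - one @ K - K @ one + one @ K @ one
    # Step (ii): PCA in the feature space is an eigendecomposition of the centred kernel.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:n_components]
    lam, alpha = eigvals[order], eigvecs[:, order]
    # Scale the eigenvectors so that the feature-space directions have unit norm.
    alpha = alpha / np.sqrt(lam)
    return Kc @ alpha, lam
```

The kernel matrix has N x N entries and its eigendecomposition costs on the order of N^3 operations, which is why the number of samples drives the storage and time cost mentioned above.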

IV. PRELIMINARIES
In this section, we present the details of machine learning algorithms and ensemble learning techniques used in this work.

A. ENSEMBLE TECHNIQUES
The main idea behind multiple learning is to combine several models into a meta-algorithm in order to improve the classification results of FDD techniques [28]. The ensemble learning methodology is based on three phases. The first, the member generation phase, consists of manipulating the training sets and building models with different learning algorithms. The second, the member selection phase, consists of selecting only the models that are suitable for the prediction task. The third, the member combination phase, consists of combining the outputs of multiple classifiers into a final prediction [52], [53]. Besides, tasks that require multiple classifiers can be approached in three ways: i) combining classifiers, by deciding using different opinions; ii) cooperating classifiers, using one or more opinions; and iii) selecting classifiers, by giving more importance to one or more classifiers according to various criteria, as in the basic ensemble techniques. To combine the outputs of multiple classifiers into a final and more effective prediction, we use basic ensemble techniques such as averaging, weighted averaging, majority voting, and weighted majority voting [54]. There are three advanced EL techniques for combining machine learning classifiers: bagging, boosting, and random subspace [55], [56]. Next, we present a brief discussion of these advanced combination techniques.
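As an illustration of the member combination phase, the sketch below implements majority voting and weighted majority voting over the labels returned by individual classifiers. The function names and the example weights are hypothetical; the source does not specify how the weights are chosen.

```python
from collections import Counter, defaultdict

def majority_vote(predictions):
    """Return the label predicted by most classifiers.
    Ties are resolved in favour of the label encountered first."""
    return Counter(predictions).most_common(1)[0][0]

def weighted_majority_vote(predictions, weights):
    """Return the label with the largest total weight.
    `weights` holds one non-negative weight per classifier (for example,
    a validation accuracy); how they are obtained is an assumption here."""
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)

# Three classifiers (e.g. SVM, KNN, DT) vote on one sample.
votes = ["C1", "C0", "C1"]
print(majority_vote(votes))                            # C1
print(weighted_majority_vote(votes, [0.2, 0.7, 0.2]))  # C0
```

With equal weights the two rules coincide; the weighted rule lets a single reliable classifier overrule two weaker ones, as in the second call.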

1) Bagging
The basic idea behind the bagging method is to combine bootstrapping and aggregation (typically with decision trees) to get a generalized result. The bagging technique is mainly applied in classification and regression. It reduces variance to a large extent and thereby increases accuracy, which is a challenge for many predictive models [57], [58]. Bootstrapping is a sampling technique with replacement that makes the selection procedure random. Aggregation is performed to incorporate all possible outcomes of the prediction and to randomize the outcome, which makes the predictions more accurate. The main advantages of bagging are the reduction of variance and of model overfitting, since it creates several classifiers with fixed bias and combines their outputs by averaging. This technique is powerful when the characteristics of the data have high variance and low bias. The main disadvantage of bagging is its computational expense; it can also introduce more bias into the models when the proper bagging procedure is not followed [25]. Another drawback of bagging is its random selection.
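The bootstrapping and aggregation steps can be sketched as follows. To keep the example self-contained, a toy nearest-centroid classifier stands in for the decision trees mentioned above; that base learner, the function names and the default parameters are assumptions made for illustration only.

```python
import random
from collections import Counter

def bootstrap_sample(X, y, rng):
    """Draw len(X) training pairs uniformly at random with replacement."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]

def fit_nearest_centroid(X, y):
    """Toy base learner (an assumption): store the mean vector of each class."""
    sums, counts = {}, Counter(y)
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, value in enumerate(xi):
            acc[j] += value
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict_nearest_centroid(model, x):
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))

def bagging_fit(X, y, n_estimators=15, seed=0):
    """Train one base learner per bootstrap replicate of the training set."""
    rng = random.Random(seed)
    return [fit_nearest_centroid(*bootstrap_sample(X, y, rng))
            for _ in range(n_estimators)]

def bagging_predict(models, x):
    """Aggregate the base learners' predictions by majority vote."""
    votes = [predict_nearest_centroid(m, x) for m in models]
    return Counter(votes).most_common(1)[0][0]
```

Each base learner sees a different resampled training set, so their individual errors are partly decorrelated and the vote smooths them out.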

2) Boosting
Boosting is a meta-algorithm that learns from the mistakes of previous predictors to make better predictions in the future [57], [58]. The main idea behind the boosting technique is that each single model improves the performance of the ensemble. In boosting, every successive model depends on the preceding one: each model corrects the errors of its predecessor in order to decrease the model's bias. Hence, the technique combines several weak learners into one strong learner and so improves the predictive ability of the models. Boosting takes many forms, including Adaptive Boosting (AdaBoost), gradient boosting, and Extreme Gradient Boosting (XGBoost). The boosting method is more reliable when the characteristics of the data have high bias and low variance [25].
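A compact way to see how successive weak learners correct each other is the AdaBoost sketch below, which uses one-feature threshold rules (decision stumps) as weak learners and labels in {-1, +1}. It is a textbook-style illustration, not the configuration used in this work; all function names are hypothetical.

```python
import math

def fit_stump(X, y, w):
    """Return (error, feature, threshold, sign) of the threshold rule with the
    smallest weighted error; the rule predicts `sign` when x[feature] <= threshold."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for sign in (1, -1):
                pred = [sign if x[j] <= thr else -sign for x in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, x):
    _, j, thr, sign = stump
    return sign if x[j] <= thr else -sign

def adaboost_fit(X, y, n_rounds=10):
    """Train `n_rounds` stumps; each round re-weights the samples so that the
    next stump concentrates on the points misclassified so far."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)   # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)      # weight of this weak learner
        ensemble.append((alpha, stump))
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, x):
    """Sign of the weighted sum of the weak learners' votes."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

On the one-dimensional labels `[1, 1, -1, -1, 1, 1]` no single stump is correct everywhere, yet three boosted stumps classify all six training points correctly.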

3) Random Subspace
The random subspace method is an ensemble learning method that aims to reduce the correlation between the estimators in an ensemble by training them on random samples of features instead of the entire feature set. Random subspace can be presented in three steps: the first step is to select N subsets, each containing M features chosen at random from the F available features; the second step is to train N weak learners, one on each random subset; and the last step is to make a prediction by majority vote.
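The three steps above can be sketched as follows. The base learner is passed in as a pair of functions, and a toy 1-nearest-neighbour learner is provided so that the example runs on its own; both the interface and the base learner are assumptions made for illustration.

```python
import random
from collections import Counter

def random_subspace_fit(X, y, fit_base, n_learners=9, n_features=2, seed=0):
    """Steps 1 and 2: draw N random subsets of M features (out of F) and
    train one weak learner on each subset."""
    rng = random.Random(seed)
    total_features = len(X[0])                      # F
    ensemble = []
    for _ in range(n_learners):                     # N learners
        feats = sorted(rng.sample(range(total_features), n_features))  # M features
        X_sub = [[x[j] for j in feats] for x in X]
        ensemble.append((feats, fit_base(X_sub, y)))
    return ensemble

def random_subspace_predict(ensemble, predict_base, x):
    """Step 3: each learner sees only its own features; the majority wins."""
    votes = [predict_base(model, [x[j] for j in feats]) for feats, model in ensemble]
    return Counter(votes).most_common(1)[0][0]

def fit_1nn(X, y):
    """Toy base learner (an assumption): memorise the training pairs."""
    return list(zip(X, y))

def predict_1nn(model, x):
    """Return the label of the closest memorised sample."""
    nearest = min(model, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))
    return nearest[1]
```

Because every learner is trained on a different projection of the data, their errors are less correlated than if all of them used the full feature set.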

1) Support vector machines (SVM)
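Support vector machine (SVM) is one of the most powerful classification algorithms and it has been widely applied for fault diagnosis [59], [60]. The main idea of SVM is to map the training data from the input space into a higher-dimensional feature space via a mapping function and then apply a linear SVM in this space. The SVM classifier seeks an optimal separating hyperplane as the decision plane by maximizing the margin between two classes. Consider a given training set of N samples $\{(x_i, y_i)\}_{i=1}^{N}$ with class labels $y_i \in \{-1, +1\}$. The separating hyperplane is

$$ w^{T}\phi(x) + b = 0, \tag{1} $$

where $\phi(\cdot)$ is the mapping function, $w$ is a weight vector and $b$ denotes the bias.
The parameters $w$ and $b$ can be determined by solving a constrained optimization problem as,

$$ \min_{w,\,b} \; \frac{1}{2}\|w\|^{2} \quad \text{subject to} \quad y_i\left(w^{T}\phi(x_i) + b\right) \ge 1, \quad i = 1, \ldots, N. \tag{2} $$

By introducing Lagrangian multipliers, (2) can be rewritten as,

$$ L(w, b, \alpha) = \frac{1}{2}\|w\|^{2} - \sum_{i=1}^{N} \alpha_i \left[ y_i\left(w^{T}\phi(x_i) + b\right) - 1 \right], \tag{3} $$

where $\alpha_i \ge 0$ denote the Lagrange coefficients. As a result, the decision function can be obtained as follows:

$$ f(x) = \operatorname{sign}\left( \sum_{i=1}^{N} \alpha_i\, y_i\, k(x_i, x) + b \right), \tag{4} $$

where $k(x_i, x) = \phi(x_i)^{T}\phi(x)$ is the kernel function and $x$ denotes the input vector to be classified.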
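Once the Lagrange coefficients and the bias are known, classifying a new input amounts to evaluating a kernel-weighted sum over the support vectors. The sketch below assumes the usual kernel SVM form, sign(sum_i alpha_i y_i k(x_i, x) + b), with a Gaussian kernel; the kernel choice, `gamma`, and the numerical values in the usage example are hypothetical and are not the result of any training.

```python
import math

def rbf(u, v, gamma=0.5):
    """Gaussian kernel between two vectors (kernel choice is an assumption)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def svm_decision(x, support_vectors, labels, alphas, b, kernel=rbf):
    """Evaluate sign(sum_i alpha_i * y_i * k(x_i, x) + b) for input vector x."""
    s = sum(a * y * kernel(sv, x)
            for sv, y, a in zip(support_vectors, labels, alphas))
    return 1 if s + b >= 0 else -1

# Hypothetical, hand-picked values purely to exercise the formula.
svs, ys, alphas, bias = [[0.0, 0.0], [2.0, 2.0]], [-1, 1], [1.0, 1.0], 0.0
print(svm_decision([0.1, 0.0], svs, ys, alphas, bias))   # -1
print(svm_decision([2.0, 1.9], svs, ys, alphas, bias))   # 1
```

Only samples with non-zero coefficients contribute to the sum, which is why the trained model needs to store just the support vectors.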

2) K-Nearest Neighbors (KNN)
K-Nearest-Neighbor (KNN) is among the most widely used models for classification thanks to its performance and simplicity [61]. The main idea behind the KNN technique is to find the nearest neighbors of a given data point based on some distance metric of interest [62], [63]. KNN is a non-parametric method used to identify to which of the already known classes an unknown data point belongs. Consider that the elements of a known class are $x = [x_1\; x_2\; \ldots\; x_k]$ and those of the data to be classified are $y = [y_1\; y_2\; \ldots\; y_k]$. To determine the KNN class, the Euclidean distance between the two samples is used, and it is defined as,

$$ d(x, y) = \sqrt{\sum_{i=1}^{k} (x_i - y_i)^{2}}. \tag{5} $$

Then, the class for which the distance defined in Eq. (5) is minimal is assigned.
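The KNN rule with the Euclidean distance can be sketched in a few lines. The function names and the default number of neighbours are illustrative assumptions; with `k=1` the rule reduces to assigning the class of the single closest training sample, as described above.

```python
import math
from collections import Counter

def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_predict(X_train, y_train, x, k=3):
    """Label x by majority vote among its k nearest training samples."""
    nearest = sorted(zip(X_train, y_train), key=lambda p: euclidean(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = ["C0", "C0", "C0", "C1", "C1", "C1"]
print(knn_predict(X_train, y_train, [0.5, 0.5]))   # C0
print(knn_predict(X_train, y_train, [5.5, 5.5]))   # C1
```

There is no training step: all the work happens at prediction time, when the distances to the stored samples are computed.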

3) Model-tree
Tree-based ML techniques are among the most widely used nonlinear models in many applications, where Random Forest (RF) and Decision Tree (DT) are the most popular ones (they can be more accurate than neural networks) [64]. The goal of a decision tree (DT) is to create a model that predicts the value of a target variable. A DT model uses two types of nodes: decision nodes and leaf nodes [65]. Decision nodes have multiple branches and are used to make decisions, while leaf nodes are the outputs of these decisions. The main idea behind the RF algorithm is to use a combination of randomized trees and make the prediction by a majority vote between all the produced decision trees [66].

V. RKPCA FOR FEATURES EXTRACTION AND SELECTION
The KPCA method consists of mapping the data into a feature space via a nonlinear mapping and then calculating the kernel principal components (KPCs) [50]. Moreover, the use of nonlinear kernel functions and integral operators allows KPCA to determine the KPCs effectively in the feature space.
However, KPCA is not very effective when a large number of variables are recorded: the computation time increases and the storage cost becomes significant. To overcome this challenging problem, we propose to model the relationships between variables via a data-reduction framework.
Let us consider a data matrix $X \in \mathbb{R}^{N \times m}$, where m corresponds to the number of process variables and N represents the number of samples. The basic idea behind the proposed reduced KPCA (RKPCA) method is to extract a reduced number of observations (samples) of the m measurement variables such that the preserved observations carry the most relevant information; these observations are in turn used as a new data matrix. To extract the most pertinent samples from the data, the Euclidean distance metric is used. The Euclidean distance $q_{ij}$ between the rows $X_i$ and $X_j$ of the data matrix X is given by

$$ q_{ij} = \|X_i - X_j\|_2 = \sqrt{\sum_{k=1}^{m} (x_{ik} - x_{jk})^{2}}. $$

Then, the dissimilarity matrix Q, which contains the dissimilarity between all pairs of observations, is presented as follows,

$$ Q = \begin{bmatrix} 0 & q_{12} & \cdots & q_{1N} \\ q_{21} & 0 & \cdots & q_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ q_{N1} & q_{N2} & \cdots & 0 \end{bmatrix}. $$

Thus, the new reduced data matrix is defined as $X_r \in \mathbb{R}^{N_r \times m}$, where $N_r$ is the size of the reduced training data matrix. The basic idea behind the RKPCA method is to apply the KPCA model to the reduced data matrix.

1) Feature extraction using RKPCA
Let us consider a reduced data matrix $X_r$. The mapped data in the feature space are arranged as $\Phi = [\phi(x_1)\; \phi(x_2)\; \ldots\; \phi(x_{N_r})]^{T} \in \mathbb{R}^{N_r \times h}$, where $h \gg m$ is the dimension of the feature space. Using a kernel matrix K whose elements are $K_{ij} = \phi(x_i)^{T}\phi(x_j) = k(x_i, x_j)$, the kernel principal components (KPCs) can be computed using the following eigenvector expression:

$$ K\alpha = \lambda\alpha, $$

where $\alpha$ and $\lambda$ are the eigenvector and eigenvalue of the kernel matrix K. The $\ell$ principal eigenvectors of K, associated with the largest eigenvalues $\Lambda = \mathrm{diag}\{\lambda_1, \ldots, \lambda_\ell\}$, are retained [67], and the kernel principal components are obtained by projecting the mapped data onto them [67]. In addition to the first KPCs, the squared prediction error (SPE) statistic, Hotelling's $T^2$ statistic and the combined index $\varphi$ are used to choose the final effective features [68].
For the statistical features, $\tau_{T^2,\alpha}$ and $\tau_{SPE,\alpha}$ represent the thresholds of $T^2$ and SPE at the confidence level $\alpha$, respectively.
The $T^2$ threshold is based on $F_\alpha(\ell, N_r - \ell)$, an F-distribution with $\ell$ and $N_r - \ell$ degrees of freedom, and the SPE threshold is parameterized by a and b, the mean and variance of the SPE index, respectively.
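The sample-reduction step can be illustrated with the NumPy sketch below. It builds the dissimilarity matrix of pairwise Euclidean distances and then keeps a subset of rows. The exact selection rule is not recoverable from the text, so the greedy rule used here (keep a row only if it lies farther than a threshold from every row already kept) is an assumption, as are the function names and the threshold value.

```python
import numpy as np

def dissimilarity_matrix(X):
    """Q[i, j] = Euclidean distance between rows X[i] and X[j]."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def reduce_samples(X, threshold):
    """Greedy selection (assumed rule, not taken from the paper): a row is kept
    only if its distance to every previously kept row exceeds `threshold`."""
    Q = dissimilarity_matrix(X)
    kept = []
    for i in range(X.shape[0]):
        if all(Q[i, j] > threshold for j in kept):
            kept.append(i)
    return X[kept], kept

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.05], [10.0, 0.0]])
X_reduced, kept = reduce_samples(X, threshold=1.0)
print(kept)              # [0, 2, 4]
print(X_reduced.shape)   # (3, 2)
```

The KPCA model is then built on the reduced matrix, so the kernel matrix shrinks from N x N to the square of the number of retained samples.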

A. PROPOSED METHODOLOGIES
The main objective of this paper is to develop a hybrid approach for fault diagnosis of grid-connected PV (GCPV) systems. The proposed methodology combines an ensemble learning technique and an improved data-driven method with dataset size reduction. Ensemble learning helps improve machine learning results by combining several models, and it has already proven to be a powerful technique for creating classifiers. For this reason, we used three base classification techniques, namely Support Vector Machines (SVM), K-Nearest Neighbor (KNN), and Decision Tree (DT), together with ensemble methods like boosting, bagging, and random subspace to classify the dataset. However, the direct use of raw data by the proposed method limits its effectiveness.
To improve the proposed EL technique, we apply the KPCA method for feature extraction and selection in order to extract the most relevant and sensitive features from the data. This, in turn, plays a pivotal role in improving the fault diagnosis results of the proposed ensemble learning (EL) technique. Although KPCA can extract nonlinear features in a high-dimensional space, it increases the space and time complexity. Therefore, to enhance the use of KPCA for feature extraction and selection in terms of computation time and storage cost, a reduced KPCA (RKPCA) is proposed. The main idea behind RKPCA is to: i) select only the effective samples from the raw data using the Euclidean distance metric, and ii) use the reduced data to build the KPCA model. Hence, the proposed technique for fault diagnosis achieves the best trade-off in terms of computation time and diagnosis metrics.
In the classification phase, once the global features are extracted and selected using the KPCA or RKPCA techniques, they are applied as input data for the ensemble learning (EL) classifier. Thus, some arbitrary groups of the significant selected features are used to train the EL model. Finally, the ensemble learning outputs obtained with the different arbitrary groups are compared in order to make effective decisions. The main steps of the KPCA-EL and RKPCA-EL techniques are illustrated in Algorithm 1 and schematic diagram 1.
Time complexity analysis with Big-O notation is one of the most important concepts for constructing efficient code. In Big-O analysis, we only consider the most dominant term, as the other terms and constants become insignificant asymptotically. Kernel PCA performs an eigendecomposition on the kernel expansion of the m x N data. The standard decision-tree learning algorithm has a time complexity of O(m * N^2). The decision tree complexity of a function is the minimum depth of a decision tree that computes this function. When training a decision tree, a split has to be found until a maximum depth d has been reached. This split is found by looking, for each variable (there are m of them), at the different thresholds (there are up to N of them) and at the resulting information gain (evaluated in O(N)).
Figure 2 shows the synoptic of the PV system under study, where PV and grid emulators are used to emulate the operation (under different operating modes) of PV panels and a 3-phase grid, respectively. Table 1 shows the system variables considered in this study, where the measurements are recorded every 5-15 s depending on the nature of the faults and their occurrence. The faults were emulated at different system stages (common coupling point, inverter, sensors, emulated PV arrays, ...) to ensure a comprehensive analysis [36], [69]. A first fault circuit, sudden disconnection) The healthy operation was assigned to class C0 while the 5 faulty modes (referring to faults F1-F5) were assigned to classes C1 to C5 as per Table 2.

B. PERFORMANCE METRICS
For performance evaluation and comparison, the adopted criteria are: Accuracy (%), which represents the ratio of correctly predicted observations to the total number of observations. Recall (%), which represents the ratio of correctly predicted positive observations to all observations in the pertinent class. Precision (%), which represents the number of correctly predicted positive observations divided by the total number of predicted positive observations. F1 score (%), which represents the harmonic mean of Precision and Recall; this score therefore takes into account both false negatives and false positives. Computation time (CT (s)), which defines the time needed to execute the algorithm.
Table 3 shows the multi-class classification results, where it can be clearly noticed that the results obtained using the proposed method in terms of accuracy (48.89%), F1 score (48.88%), recall (48.90%), and precision (48.90%) are higher than those obtained using the other machine learning classifiers. It can be concluded that the proposed multiple learning methods enhance the fault classification performance. Moreover, a new KPCA-based EL framework is proposed to further enhance the fault diagnosis performance of the proposed multiple learning technique, where the data set is scaled to zero mean and unit variance. Then, the models are constructed under normal operating conditions. The number of KPCs retained using the CPV criterion is equal to 28. To illustrate the FDD efficacy of the developed methods, a 10-fold cross-validation approach was used to obtain the classification accuracy. The healthy operation was assigned to class C0 while the 5 faulty modes (F1-F5) were assigned to classes C1-C5 (Table 2). To get a good classification performance, it is important to select the best statistical characteristics from the extracted features. Accordingly, five arbitrary groups of features are formed and the best one is selected (Table 4).
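The accuracy, precision, recall and F1 criteria defined at the start of this subsection can be computed from the true and predicted labels as sketched below. The source does not state how per-class values are aggregated in the multi-class case, so the unweighted (macro) average used here is an assumption, as is the function name.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 (as fractions).
    Macro averaging over classes is an assumption, not taken from the paper."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)   # true positives
        fp = sum(t != c and p == c for t, p in pairs)   # false positives
        fn = sum(t == c and p != c for t, p in pairs)   # false negatives
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

For example, with true labels `[0, 0, 1, 1]` and predictions `[0, 1, 1, 1]` the accuracy is 0.75, and the per-class F1 values are 2/3 and 0.8.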
First, a fault database is collected and labeled using the emulation data. Then, the labeled data are applied as inputs to the proposed KPCA technique, which can be split into a multi-class classifier stage (see Table 5). A first comparison is made between the five arbitrary groups (see Table 5) using the KPCA model in terms of accuracy. The comparison results in Table 5 show that group 5 provides a classification accuracy of 96.96%, which is the best among the tested groups of features (the others reach less than 60%).
To further improve the proposed ML-based KPCA by decreasing its complexity and computation time, an ML-based reduced KPCA method is applied. The number of KPCs retained by the RKPCA method using the CPV criterion is equal to 18. For multiclass classifiers, a comparison between the two proposed techniques in terms of accuracy and computation time is presented in Table 6. The results in Table 6 show that the proposed multiple model-based RKPCA achieves the best trade-off between accuracy (100%) and computation time (110.36 s). Additionally, the computation time is reduced by more than 50% using EL-based RKPCA (110.36 s) compared to the EL-based KPCA technique (221.85 s). Thus, the proposed methods not only reduce the computation cost but also preserve the monitoring abilities. The confusion matrix (CM) is another performance measurement for machine learning classification. The CM provides information not only about the performance of a predictive model, but also about which classes are correctly predicted, which are incorrectly predicted, and the types of errors made. Therefore, to further investigate the effectiveness of the proposed techniques, the confusion matrices of the EL-based KPCA and EL-based RKPCA techniques are presented in Tables 7 and 8, where the correctly classified and misclassified observations for the different condition modes are reported. The rows present the predicted classes while the columns show the true classes. Referring to the results given in Tables 7 and 8, for the healthy case (C0), the KPCA-EL technique correctly identifies 1500 of the 1501 measurements (true positives), while the RKPCA-EL technique correctly identifies all 1501 measurements. The results show that the proposed techniques are able to differentiate the six different modes and to achieve good classification results.
In addition, the precision is 100% and the recall is 100%, with 0.0% misclassification, using RKPCA-EL for all faulty cases. Next, we consider a bank of one-class classifiers. At this stage, the bank applies six classifiers. Each one is trained to classify a specific class, labeled by 1 or -1 as shown in Table 9. Table 10 presents the global accuracy using the selected features of group 5 as inputs in the one-class classifier scenario. The comparison results presented in Table 10 show that the two proposed techniques, KPCA-EL and RKPCA-EL, provide good results during the training and testing phases, with a mean accuracy of 99.97% and 99.99% using KPCA-EL and RKPCA-EL, respectively.
In fault detection, the metrics widely used to assess the performance of diagnostic results are the False Alarm Rate (FAR) and the Missed Detection Rate (MDR). FAR is defined as the ratio between the number of misclassified measurements and the total number of measurements under healthy conditions (C0). For each faulty scenario, the corresponding measurements classified in class C0 are considered as missed detections. The MDR is the ratio between the number of missed detections and the total number of measurements of the corresponding class. These metrics, established for the two proposed methods KPCA-EL and RKPCA-EL, are illustrated in Table 11. The selected features fed into the multiple models provide good results in terms of MDR and FAR. Overall, the best performance was obtained using the proposed RKPCA-EL method, with FAR and MDR equal to zero. In order to further assess the obtained results and to support the decision-making process, we adopt the Friedman test, which is a non-parametric test, at the significance level of α = 0.05 [46], [70]. The obtained p-values of the tests with the base classifiers Ensemble learning (EL), Bagging, RF, DT, kNN and SVM are shown in Table 12. Table 13 summarizes the performance according to the Accuracy, F1 score, Recall, Precision, and computation time (CT). The comparative analysis showed that the proposed RKPCA-EL method outperformed the other models in terms of Accuracy (100%), F1 score (100%), Recall (100%) and Precision (100%). Additionally, the performance of the fault classification, as well as the classification accuracy, was significantly enhanced using the proposed KPCA-EL and RKPCA-EL methods compared to deep learning models. Table 13 shows that, because the KPCA and RKPCA models can handle the nonlinearity of the PV system, they improve the feature extraction accuracy and outperform the other deep learning classifiers.
Thus, the KPCA-EL and RKPCA-EL classifiers are better suited to fault diagnosis, and applying them to fault classification makes the overall diagnosis effective. The deep learning classifiers considered here provide a classification accuracy below 80% and a classification error above 20%. These poor results are attributed to the direct use of the measured variables, which underlines the benefit of the proposed KPCA-EL and RKPCA-EL methods: they extract and select the most pertinent features before performing the classification. Among the NN, MNN, FFNN, CFNN, GRNN, PNN, and RNN classifiers, the highest classification rates were reached by CFNN and MNN, with accuracies of 79.86% and 79.09% and misclassification rates of 20.14% and 20.91%, respectively. The use of these classifiers therefore yields low classification accuracy, which leads to poor fault diagnosis performance.
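For readers who want to reproduce the kind of significance check used in this section, the Friedman test can be sketched with the Python standard library alone. This is not the authors' implementation: the function names, the accuracy values, and the layout (one row per run, one column per classifier) are assumptions made for illustration. The statistic is computed from within-row ranks without tie correction, and the p-value uses the usual chi-square approximation with k - 1 degrees of freedom, which is rough for a small number of rows.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(row: Sequence[float]) -> List[float]:
    """Rank the values of one row (1 = smallest), averaging ranks over ties."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of the 1-based positions i..j
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    z = x / 2.0
    odd = df % 2 == 1
    base = math.erfc(math.sqrt(z)) if odd else 0.0
    a = 0.5 if odd else 0.0
    terms = sum(z ** (a + i) / math.gamma(a + i + 1.0) for i in range(df // 2))
    return base + math.exp(-z) * terms


def friedman_test(scores: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Friedman statistic and approximate p-value for n rows and k columns."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    stat = (12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums)
            - 3.0 * n * (k + 1))
    return stat, chi2_sf(stat, k - 1)


# Hypothetical accuracies (assumption): 4 runs (rows) x 3 classifiers (columns).
scores = [
    [0.80, 0.90, 0.99],
    [0.78, 0.91, 0.98],
    [0.81, 0.89, 1.00],
    [0.79, 0.92, 0.99],
]
stat, p = friedman_test(scores)
print(stat, p < 0.05)  # 8.0 True
```

With these toy values the third classifier ranks first in every run, so the null hypothesis of equal performance is rejected at α = 0.05.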

VII. CONCLUSIONS AND FUTURE RESEARCH DIRECTIONS
In this work, three ensemble learning techniques are proposed to provide a reliable prediction for Grid-Connected Photovoltaic (PV) systems. Ensemble machine learning paradigms aim at developing effective and reliable models with higher accuracy than a single machine learning model. The main contributions are as follows. First, using the SVM, KNN, and tree models, we constructed an ensemble learner that achieves more accurate performance than a single learner in distinguishing between the different PV system operating modes using the extracted raw data. Second, in order to further enhance the diagnosis results, intelligent FDD techniques were proposed, whose main steps are feature extraction, feature selection, and fault classification. For the feature extraction and selection steps, the KPCA and RKPCA methods are used to extract and select the most significant features. The most sensitive and significant characteristics are then transmitted to the ensemble learning models for classification. The proposed approaches were developed to monitor a grid-connected PV system under healthy and faulty conditions. The experimental results demonstrated the feasibility and effectiveness of the proposed FDD techniques. Nevertheless, the fault detection results obtained using the developed approaches still exhibited some false alarms and missed detections, and a few faults were not correctly detected. Hence, one future research perspective is to develop online KPCA-based methods that update the model, which may reduce the false alarm and missed detection rates. A second perspective is to extend the online KPCA-based methods to deal with uncertainties by using an interval-valued data representation [71]. In addition, we propose to improve our contribution to detection and diagnosis by using online models for feature extraction and selection, in order to enhance the diagnosis metrics and classification rate of complex systems under different operating conditions.
In the current work, the classical EL algorithm was used to model the dynamic nature of the process in both the offline training phase and the online update phase, using the newly arrived measurements. Using online extensions of the RF model from the outset instead, such as the online incremental RF presented in [72] or the Mondrian forests described in [73], [74], may reduce the training and update time.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2021.3128749, IEEE Access
He has served as an Associate Editor and on the technical committees of several international journals and conferences. He has significant experience in research on control systems, data-based control, system identification and estimation, fault detection, and systems biology. He has been awarded several NPRP research projects in these areas. He has successfully served as the lead PI and a PI on five QNRF projects, some of which were in collaboration with other PIs in this proposal. He has published more than 200 refereed journal and conference papers and book chapters. He is a senior member of the IEEE. Email: hazem.nounou@qatar.tamu.edu
MOHAMED NOUNOU (Senior Member, IEEE) is a professor of Chemical Engineering at TAMU-Texas A&M University at Qatar. He has more than 19 years of combined academic and industrial experience. His research interests are in the area of systems engineering and control, with emphasis on process modeling, monitoring, and estimation. He has published more than 200 refereed journal and conference publications and book chapters. He has successfully served as the lead PI and a PI on several QNRF projects (6 NPRP projects and 3 UREP projects). He is a senior member of the AIChE (American Institute of Chemical Engineers) and a senior member of the IEEE (Institute of Electrical and Electronics Engineers). Email: mohamed.nounou@qatar.tamu.edu
KAIS BOUZRARA is a professor of Electrical Engineering at the Laboratory of Automatic Signal and Image Processing, National Engineering School of Monastir, Monastir, Tunisia. He has more than 15 years of combined academic and industrial experience. His research interests are in the area of systems engineering and control, with emphasis on process modeling, monitoring, and estimation. He has published more than 80 refereed journal and conference publications and book chapters. Email: bouzrara.kais@gmail.com