Ensemble Deep Random Vector Functional Link Neural Network Based on Fuzzy Inference System

Abstract—The ensemble deep random vector functional link (edRVFL) neural network has demonstrated the ability to address the limitations of conventional artificial neural networks. However, since edRVFL generates features for its hidden layers through random projection, it can potentially lose intricate features or fail to capture certain non-linear features in its base models (hidden layers). To enhance the feature learning capabilities of edRVFL, we propose a novel edRVFL based on a fuzzy inference system (edRVFL-FIS). The proposed edRVFL-FIS leverages the capabilities of two emerging domains, namely deep learning and ensemble approaches, together with the intrinsic IF-THEN properties of the fuzzy inference system (FIS), and produces rich feature representations to train the ensemble model. Each base model of the proposed edRVFL-FIS encompasses two key feature augmentation components: a) unsupervised fuzzy layer features and b) supervised defuzzified features. The edRVFL-FIS model incorporates diverse clustering methods (R-means, K-means, Fuzzy C-means) to establish fuzzy layer rules, resulting in three model variations (edRVFL-FIS-R, edRVFL-FIS-K, edRVFL-FIS-C) with distinct fuzzified and defuzzified features. Within the framework of edRVFL-FIS, each base model utilizes the original, hidden-layer, and defuzzified features to make predictions. Experimental results, statistical tests, discussions and analyses conducted across UCI and NDC datasets consistently demonstrate the superior performance of all variations of the proposed edRVFL-FIS model over baseline models.


I. INTRODUCTION
THE efficacy of neural network-based models stems from their ability to discern intricate latent patterns within input and output vectors, leveraging their inherent ability to capture complex relationships in data. This attribute facilitates superior learning and predictive capabilities in diverse applications such as health care [1], stock market prediction [2], Alzheimer's disease diagnosis [3], solving mathematical differential equations [4], and so on.
Neural networks (NNs) aim to discover a hypothesis function, denoted as h, within the hypothesis space H, to effectively approximate a target function f defined on a domain χ. The primary objective in learning h is to minimize the generalization error by closely aligning the approximation with the true, albeit unknown, function. h serves as a mapping from the input space to the corresponding target labels or classes.
Mathematically, z_r = h(v_r, Θ), where v_r and z_r are the input and output vectors, respectively. The set Θ encompasses the learning parameters associated with the hypothesis function h. The accuracy of the hypothesis h in approximating the real function f is contingent upon the efficacy of determining/computing the parameters Θ.
In traditional NNs, the backpropagation (BP) algorithm iteratively seeks to optimize model parameters (Θ) by comparing predicted outputs to true outputs. However, in BP-based NNs, several challenges arise during the training process, such as potential slowness, susceptibility to local optima [5], and the critical influence of factors such as the learning rate and the initialization point. Training conventional NNs on large datasets demands significant computational resources and time due to the gradient-based iterative process of the BP algorithm.
Randomized NNs (RNNs) [6], introduced as an alternative to BP-based NNs, address the abovementioned limitations of conventional NNs. In RNNs, certain parameters are fixed during the training process, and the output layer's parameters are calculated using an iterative process or a closed-form solution. The random vector functional link (RVFL) neural network [7,8] is a shallow feed-forward RNN characterized by randomly initialized hidden layer parameters that remain fixed throughout the training process. RVFL distinguishes itself from other RNNs through the incorporation of direct links between the input and output layers. Direct links enhance the learning performance of the RVFL by behaving as an implicit regularization technique [9,10]. Utilizing the pseudo-inverse or least-squares approach, RVFL provides a closed-form solution to determine the optimal output parameters, demonstrating universal approximation capability [11]. This methodology minimizes the need for extensive computational resources, offering efficient learning with fewer adjustable settings.
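As a concrete illustration of this training scheme, the closed-form RVFL fit can be sketched in a few lines of NumPy. This is a minimal sketch, not the paper's exact configuration: the function names, the tanh activation, and the ridge parameter `reg` are illustrative assumptions.

```python
import numpy as np

def train_rvfl(V, Z, g=50, reg=1.0, seed=0):
    """Train a shallow RVFL: fixed random hidden layer plus direct links,
    with the output weights obtained by a ridge-regularized closed form.
    V: (M, P) inputs; Z: (M, C) one-hot targets."""
    rng = np.random.default_rng(seed)
    M, P = V.shape
    X = rng.uniform(-1, 1, (P, g))       # fixed random hidden weights
    zeta = rng.uniform(-1, 1, (1, g))    # fixed random biases
    G = np.tanh(V @ X + zeta)            # hidden features
    D = np.hstack([V, G])                # direct links: input + hidden features
    # closed-form least squares on the only trainable weights
    Q = np.linalg.solve(D.T @ D + np.eye(P + g) / reg, D.T @ Z)
    return X, zeta, Q

def predict_rvfl(V, X, zeta, Q):
    G = np.tanh(V @ X + zeta)
    return np.hstack([V, G]) @ Q
```

Note that the hidden weights are never updated; only `Q` is computed, which is what makes the training non-iterative.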
The shallow RVFL model is associated with certain limitations, notably an unstable classifier [12], and its reliance on a single hidden layer hinders its ability to effectively capture the intricate hidden relationships present in the data. To overcome those limitations, notable enhancements have been implemented within the standard RVFL model using two distinct research frameworks: ensemble learning [13] and deep learning [14]. Shi et al. [15] proposed the deep RVFL (dRVFL) and ensemble dRVFL (edRVFL) networks. The dRVFL introduces a deep architecture that can accommodate multiple hidden layers and captures complex relationships within the dataset. The edRVFL network is based on implicit ensemble learning and leverages each hidden layer as a classifier to form an ensemble. The incorporation of multiple RVFL models (base models) within the edRVFL framework enhances its stability and effectiveness compared to the shallow RVFL model.
Over the years, several variants based on the edRVFL architecture have been developed. The original edRVFL lacks provisions for online learning. Recognizing this gap, Gao et al. [16] introduced the dynamic edRVFL, a variant that introduces three crucial online components: online decomposition, online training, and online dynamic ensemble. In the field of cognitive research, Li et al. [17] proposed the spectral-edRVFL (Se-dRVFL) network, which specializes in feature learning from electroencephalogram (EEG) data to enhance passive brain-computer interface (pBCI) classification tasks. In [18], the authors focused on medical imaging, combining magnetic resonance imaging (MRI) and positron emission tomography (PET) scans using a convolutional neural network (CNN) and an ensemble of non-iterative RVFL models for diagnosing Alzheimer's disease. Cheng et al. [19] contributed an edRVFL variant tailored for predictive analytics, specifically in the context of ship order book dynamics. He et al. [20] presented the model-embedded self-supervised edRVFL (ME-SedRVFL) network for underwater signal processing, particularly in estimating the direction-of-arrival (DOA) of acoustic signals. These advancements underscore the versatility of the edRVFL model across various domains, ranging from cognitive neuroscience to signal processing.
Fuzzy logic [21] represents a mathematical approach that addresses the nuances of uncertain and imprecise information. Within this framework, the fuzzy inference system (FIS) [22] is a computational model employing fuzzy logic principles. The FIS operates by integrating fuzzy sets, linguistic variables, and a set of rules to emulate human-like IF-THEN reasoning [23]. Its fundamental components include fuzzification, rule evaluation, and defuzzification [24]. In the fuzzification phase, precise input values are transformed into fuzzy sets. Rule evaluation involves applying a set of conditional statements expressed in linguistic terms to generate fuzzy output values. The subsequent defuzzification process converts these fuzzy outputs into precise and actionable outcomes. FIS finds applications across diverse domains, including control systems [25], pattern identification [26], finance [27], weather forecasting [28], classification [23,24], and so on.
The performance of NN-based models heavily relies on the generation of hidden layers' features, which influence the subsequent propagation of information to the output layers. Since the edRVFL generates its hidden layers' features through random projection, certain salient features may be lost during hidden layer formation, or some non-linear features may go uncaptured by the edRVFL's base models (hidden layers). Studies suggest that feature enhancement in RVFL-based variants [29,30,31] markedly improves their performance by capturing nuanced patterns and relationships in the data. However, feature enhancement techniques remain largely unexplored in the context of edRVFL-based models.
Motivated by the successful implementation of feature enhancement techniques in RVFL-based models, and considering the dynamic feature enhancement capability of FIS, which can generate as many fuzzy centres as necessary (refer to Section III) to facilitate the learning process and improve the model's overall effectiveness, we propose an ensemble deep RVFL based on a fuzzy inference system (edRVFL-FIS). Within this architecture, input samples traverse a fuzzy layer employing fuzzification. Subsequently, the original input features are concatenated with the fuzzy features, forming an extended feature layer and contributing to the generation of the hidden layers' features. Each base model utilizes both the hidden layers and the defuzzified features (obtained through the defuzzification process) to make predictions. This ensemble framework leverages the synergy between deep learning and fuzzy logic to enhance the model's capacity to capture intricate patterns in the data.
As the fuzzy layer is constructed from fuzzy rule centers and interacts with the hidden layers and the defuzzified layer, the significance of the fuzzy centers becomes evident in the generation of the hidden features of the proposed edRVFL-FIS model. Different clustering techniques excel in distinct scenarios, contributing to the overall versatility and effectiveness of the proposed edRVFL-FIS model. Therefore, we propose three variants of the edRVFL-FIS model by incorporating diverse clustering approaches, namely randomly initialized centers (R-means), K-means, and Fuzzy C-means, to establish the fuzzy layer centers. This yields three model variations, namely, edRVFL-FIS-R, edRVFL-FIS-K, and edRVFL-FIS-C, respectively. The paper's key highlights are as follows: 1) We propose the edRVFL-FIS model, in which each hidden layer's features are generated on the basis of an enhanced feature space (obtained through the concatenation of the original and fuzzy features).

II. RELATED WORKS
Before delving into the structure and mathematical framework of the proposed edRVFL-FIS model, it is necessary to go through the mathematical frameworks of the RVFL and edRVFL models to establish a foundational knowledge base. Additionally, defining the Takagi-Sugeno-Kang (TSK) FIS is crucial, as it forms the core component of our proposed edRVFL-FIS model (refer to Section III). Thus, we first fix some notations and discuss the frameworks of RVFL (in Section S.I of the supplementary material) and edRVFL, followed by TSK fuzzy systems.

A. Notations
Let M be the total number of training samples. Let V and Z be the collections of all input and target vectors, respectively.

B. Ensemble Deep RVFL (edRVFL) [15]

Derived from the principles of deep representation learning, the deep RVFL (dRVFL) represents an advancement of the shallow RVFL structure. This extension involves stacking multiple enhancement layers, creating a framework for profound representation learning. Each enhancement layer processes the input features to guide the generation of random features, fostering the creation of a varied feature set through hierarchical structures. By integrating ensemble learning into the architecture of dRVFL, the ensemble deep RVFL (edRVFL) was proposed, leveraging both ensemble and deep learning frameworks. Diverging from common deep learning models characterized by a single output layer, the edRVFL undertakes the training of multiple output layers, utilizing all hidden layers to create base models.
In an edRVFL network comprising L enhancement layers, each layer contains g enhancement nodes. Let the hidden matrix of the l-th layer be denoted as G_l ∈ R^{M×g}, for l = 1, 2, ..., L. G_1 is calculated as:

G_1 = γ(V X_1 + ζ_1),   (1)

where X_1 ∈ R^{P×g} is the randomly initialized weights matrix, ζ_1 ∈ R^{1×g} is the bias vector, and γ is the activation function. G_l (l > 1) is calculated as:

G_l = γ([V, G_{l−1}] X_l + ζ_l).   (2)

Here, X_l ∈ R^{(P+g)×g} and ζ_l ∈ R^{1×g} (for l > 1) denote the weights matrix and the bias vector in the l-th layer. The enhancement features, formed by concatenating the input and hidden features, serve as input to the respective output layer. The calculation of G_l follows the identical procedure used in RVFL. The classification is attained by consolidating the outcomes from all base models (hidden layers) through averaging or majority voting methods.
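The layer-stacking rule above can be sketched in a few lines (an illustrative NumPy sketch; the uniform weight initialization and the tanh activation are assumptions of this sketch, not the paper's prescribed choices):

```python
import numpy as np

def edrvfl_hidden_layers(V, L=4, g=32, seed=0):
    """Generate the edRVFL hidden matrices G_1, ..., G_L.
    Layer 1 sees only V; every layer l > 1 sees [V, G_{l-1}]."""
    rng = np.random.default_rng(seed)
    M, P = V.shape
    layers = []
    G_prev = None
    for l in range(L):
        width_in = P if l == 0 else P + g       # P for layer 1, P+g afterwards
        X = rng.uniform(-1, 1, (width_in, g))   # fixed random weights
        zeta = rng.uniform(-1, 1, (1, g))       # fixed random biases
        inp = V if l == 0 else np.hstack([V, G_prev])
        G_prev = np.tanh(inp @ X + zeta)
        layers.append(G_prev)
    return layers
```

Each returned matrix `G_l` would then feed its own output layer, giving one base classifier per hidden layer.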
C. Takagi-Sugeno-Kang (TSK) Neuro-Fuzzy Inference System (FIS) [32]

The TSK neuro-FIS [32] is an efficacious approach for modeling intricate systems under uncertainty. TSK captures nonlinear relationships transparently through the amalgamation of fuzzy sets, fuzzy rules, and local linear models. Its broad applicability and practicality render it a widely adopted choice in diverse real-world scenarios in the realms of fuzzy logic and deep learning [33]. The TSK operates on a collection of IF-THEN fuzzy rules, commonly formulated as:

Rule k: IF v_{r1} is F_{k1} and v_{r2} is F_{k2} and ... and v_{rP} is F_{kP}, THEN z_r = ξ_k(v_{r1}, v_{r2}, ..., v_{rP}),

where F_{kp} is a fuzzy set, v_{rp} is the system input (p = 1, 2, ..., P), and K is the number of fuzzy rules (k = 1, 2, ..., K). Here, the function ξ_k in a fuzzy system is often described as any relevant function capable of properly describing the output within the given range of fuzzy rules. Nevertheless, in practical terms, ξ_k is presumed to take the form of a linear polynomial in the input variables within the context of a fuzzy system. The k-th rule's fire strength is calculated as:

ω_k = Π_{p=1}^{P} Θ_{kp}(v_{rp}),   (3)

where Θ_{kp} is the membership function corresponding to the fuzzy set F_{kp}.
The defuzzification of the TSK model is carried out as follows:

z_r = (Σ_{k=1}^{K} ω_k ξ_k(v_{r1}, ..., v_{rP})) / (Σ_{k=1}^{K} ω_k).   (4)

The above-discussed related works not only enhance our understanding of existing models but also pave the way for proposing the edRVFL-FIS model.
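To make the TSK forward pass of Section II-C concrete, the membership, fire-strength, and weighted-average defuzzification computations can be sketched as follows (an illustrative NumPy sketch with Gaussian memberships and first-order linear consequents; all names are assumptions of this sketch):

```python
import numpy as np

def tsk_output(v, centers, sigmas, coeffs):
    """First-order TSK inference for one sample v of shape (P,).
    centers/sigmas: (K, P) Gaussian rule parameters.
    coeffs: (K, P+1) linear consequent coefficients beta_k.
    Returns the defuzzified scalar output."""
    # rule fire strengths: product of per-dimension Gaussian memberships
    mu = np.exp(-((v - centers) ** 2) / (2 * sigmas ** 2))   # (K, P)
    omega = mu.prod(axis=1)                                   # (K,)
    # linear consequents xi_k(v) = beta_k0 + sum_p beta_kp * v_p
    xi = coeffs[:, 0] + coeffs[:, 1:] @ v                     # (K,)
    # defuzzification: fire-strength-weighted average of rule outputs
    return (omega * xi).sum() / omega.sum()
```

With a single rule, the output reduces to that rule's linear consequent; with several rules, it interpolates among the local linear models according to the fire strengths.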

III. THE PROPOSED ENSEMBLE DEEP RVFL BASED ON A FUZZY INFERENCE SYSTEM (edRVFL-FIS)
In this section, we fuse the fuzzy inference system (i.e., TSK) with the edRVFL structure and propose a hybrid edRVFL-FIS model that has a rich feature representation. The architecture of the proposed edRVFL-FIS is represented in Figure 1. In the edRVFL-FIS model, input samples undergo a fuzzification process and form a fuzzy layer. Further, the generation of a defuzzified layer takes place via the defuzzification process. Each subsequent hidden layer is formed from the input, fuzzified, and preceding hidden layers (except for the first hidden layer, which has no preceding hidden layer), utilizing random projection. Each hidden layer within the model operates as an autonomous base model (RVFL). We refer to the hidden layer features, original input features, and defuzzified features collectively as enhanced fuzzified features. The sample with enhanced features for the l-th base model serves as an input and propagates to the output layer of the model. Each base model within the edRVFL-FIS framework behaves as an RVFL model that incorporates distinct input features. This architecture of the proposed edRVFL-FIS allows for a structured and modular processing of data, where each hidden layer operates as its own self-contained model within the larger network architecture. Moreover, the ensemble structure leverages diverse supervised defuzzified features for each base model, enhancing richness in feature extraction during output prediction.
First of all, we describe the general structure of the proposed edRVFL-FIS model; the subsequent subsections then discuss the fuzzification and defuzzification methods that are incorporated in the proposed edRVFL-FIS model for feature enhancement.

A. Proposed Model: A Framework
Suppose K denotes the total number of fuzzy rules in the fuzzy layer. Let F ∈ R^{M×K} be the fuzzy layer's matrix and B ∈ R^{M×C} be the defuzzified matrix corresponding to all the training samples.

Hidden layer formations and their corresponding outputs:

1st hidden layer and output: The training input matrix V and the fuzzified matrix F traverse the first hidden layer, G_1, through random projection followed by a non-linear transformation employing the activation function γ. Therefore,

G_1 = γ([V, F] X_1 + ζ_1),   (5)

where X_1 ∈ R^{(P+K)×g} and ζ_1 ∈ R^{1×g} are the randomly generated weight matrix and bias vector, respectively, for the 1st hidden layer. The first hidden layer generates the first base model for the ensembling. The enhancement features (formed by the input, hidden, and defuzzified features) serve as input to the first output layer. Therefore, the first layer's output matrix Z_1 ∈ R^{M×C} is defined as:

Z_1 = [V, G_1] Q_1 + B Λ_1,   (6)

where Q_1 ∈ R^{(P+g)×C} is the unknown weight matrix connecting the input layer and the 1st hidden layer to the output layer, and Λ_1 ∈ R^{C×C} is the unknown weight matrix connecting the defuzzified layer to the output layer.

l-th hidden layer and output (for l = 2, 3, ..., L): The training input matrix V, the previous hidden layer's matrix G_{l−1}, and the fuzzified samples matrix F traverse the l-th hidden layer, G_l. Therefore,

G_l = γ([V, G_{l−1}, F] X_l + ζ_l),   (7)

where X_l ∈ R^{(P+g+K)×g} and ζ_l ∈ R^{1×g} are the randomly generated hidden layer weight matrix and bias vector, respectively, for the l-th hidden layer. The l-th hidden layer behaves as a base model (RVFL) and the corresponding output matrix Z_l ∈ R^{M×C} is defined as:

Z_l = [V, G_l] Q_l + B Λ_l,   (8)

where Q_l ∈ R^{(P+g)×C} is the unknown weight matrix connecting the input layer and the l-th hidden layer to the output layer, and Λ_l ∈ R^{C×C} is the unknown weight matrix connecting the defuzzified layer to the output layer.

The anticipated output matrices Z_l (for l = 1, 2, . .
., L) are combined through majority voting or averaging to find the final predicted output. However, how to find the fuzzy layer's matrix F, the defuzzified matrix B, and the weight matrices Q_l connecting the base models (hidden layers) to the output layer remains to be answered. We discuss answers to these questions in the subsequent subsections.

B. Fuzzification Process for the Proposed Model
The proposed edRVFL-FIS model incorporates the fuzzified matrix F into the hidden layer generation process. Therefore, each and every training sample is fuzzified using the TSK FIS. For this, let F_{kp} be the fuzzy set with the membership function Θ_{kp} for the p-th component of the k-th fuzzy rule of the fuzzy layer, where p = 1, 2, ..., P and k = 1, 2, ..., K.
If v_{r1} is F_{k1} and v_{r2} is F_{k2} ... and v_{rP} is F_{kP}, then, by considering the first-order TSK fuzzy system, we get

y_{rk} = ξ_k(v_r) = β_{k0} + β_{k1} v_{r1} + ... + β_{kP} v_{rP}.   (9)

Here, β_{kp} are the randomly generated coefficients. The weighted fire strength of the k-th fuzzy rule in the fuzzy layer is given as:

ω_{rk} = (Π_{p=1}^{P} Θ_{kp}(v_{rp})) / (Σ_{k'=1}^{K} Π_{p=1}^{P} Θ_{k'p}(v_{rp})).   (10)

In our work, we consider the Gaussian membership function within the fuzzy sets of the fuzzy layer. The membership value of v_{rp} for the k-th fuzzy rule is defined as:

Θ_{kp}(v_{rp}) = exp(−(v_{rp} − c_{kp})² / (2σ_{kp}²)),

where c_k = (c_{k1}, c_{k2}, ..., c_{kP}) is the center and σ_k = (σ_{k1}, σ_{k2}, ..., σ_{kP}) is the standard deviation of the k-th fuzzy rule. The number of fuzzy rules in the fuzzy layer is equal to the number of centers.
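The membership, fire-strength, and rule-output computations above can be combined into one routine that produces the unsupervised fuzzy-layer features ω_{rk} y_{rk} for a whole batch of samples. This is an illustrative NumPy sketch; the across-rule normalization of the fire strengths is an assumption of the sketch.

```python
import numpy as np

def fuzzified_features(V, centers, sigmas, beta):
    """Unsupervised fuzzy-layer features of shape (M, K).
    V: (M, P) inputs; centers/sigmas: (K, P) Gaussian rule parameters;
    beta: (K, P+1) random first-order consequent coefficients.
    Entry (r, k) is omega_rk * y_rk."""
    # Gaussian memberships, broadcast to (M, K, P), then rule fire strengths
    mu = np.exp(-((V[:, None, :] - centers[None]) ** 2) / (2 * sigmas[None] ** 2))
    fire = mu.prod(axis=2)                           # (M, K)
    omega = fire / fire.sum(axis=1, keepdims=True)   # normalized fire strengths
    # first-order rule outputs y_rk = beta_k0 + sum_p beta_kp * v_rp
    y = beta[:, 0][None, :] + V @ beta[:, 1:].T      # (M, K)
    return omega * y                                 # fuzzified feature matrix
```

Stacking the rows of the returned matrix over all M samples yields the fuzzified matrix that the hidden layers consume.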
In the proposed edRVFL-FIS model, we generate the centers c_k in the fuzzy layer using three different clustering methods, which are: 1) Randomly initialized centers (say R-means). 2) Using K-means clustering.
3) Using Fuzzy C-means clustering.

Remark: Three distinct variants of edRVFL-FIS arise from the application of three different center initialization approaches, thereby enhancing the diversity within our proposed model. The variations of the proposed edRVFL-FIS initialized with R-means, K-means, and Fuzzy C-means clustering techniques are denoted as edRVFL-FIS-R, edRVFL-FIS-K, and edRVFL-FIS-C, respectively.

Now, the unsupervised fuzzified vector of the r-th training sample v_r is given as:

E_r = (ω_{r1} y_{r1}, ω_{r2} y_{r2}, ..., ω_{rK} y_{rK}), r = 1, 2, ..., M.   (11)

The fuzzified matrix F corresponding to the input matrix V is given as:

F = (E_1^t, E_2^t, ..., E_M^t)^t ∈ R^{M×K}.   (12)

C. Defuzzification Process for the Proposed Model
Here, we articulate the defuzzification process for the fuzzified vectors. This process enables the defuzzified output from the fuzzy layer of our proposed edRVFL-FIS model to contribute to the prediction of the final output layers alongside the original features (via direct links) and the hidden layers. Since the target matrix Z = (z_1^t, z_2^t, ..., z_M^t)^t ∈ R^{M×C} has C classes, the output of the fuzzy layer must have C features. The defuzzified output B_r ∈ R^{1×C} of the sample v_r is defined as follows:

B_r = Σ_{k=1}^{K} ω_{rk} y_{rk} β_k,   (13)

where β_k ∈ R^{1×C} denotes the randomly generated consequent vector of the k-th rule:

β_k = (β_{k1}, β_{k2}, ..., β_{kC}).   (14)

Therefore, for the input matrix V, the defuzzified matrix is given as:

B = (Ω ⊙ Y) β,   (15)

where

Ω = (ω_{rk}) ∈ R^{M×K}, Y = (y_{rk}) ∈ R^{M×K}, β = (β_1^t, β_2^t, ..., β_K^t)^t ∈ R^{K×C}.   (16)

Here, ⊙ represents the componentwise matrix multiplication.
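In matrix form, the defuzzification step amounts to a componentwise product followed by a linear map. The sketch below assumes the defuzzified matrix takes the form B = (Ω ⊙ Y)β, with fire strengths Ω ∈ R^{M×K}, rule outputs Y ∈ R^{M×K}, and consequent matrix β ∈ R^{K×C}; the function name is illustrative.

```python
import numpy as np

def defuzzified_matrix(Omega, Y, beta):
    """B = (Omega ⊙ Y) @ beta: weight the rule outputs Y (M, K) by the fire
    strengths Omega (M, K) componentwise, then map the K rules to C class
    scores through beta (K, C). Returns B of shape (M, C)."""
    return (Omega * Y) @ beta
```

The componentwise product `Omega * Y` plays the role of ⊙, so the r-th row of B is the fire-strength-weighted combination of the r-th sample's rule outputs.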

D. Output Layers Parameters for Each Base Model of the Proposed Model
To get the output of the base models, we put the value of B from Eq. (15) into Eqs. (6) and (8); we get

Z_l = J_l Q_l^Λ,   (17)

where

J_l = [V, G_l, Ω ⊙ Y]   (18)

and

Q_l^Λ = [Q_l^t, (β Λ_l)^t]^t.   (19)

Here, to get the output of each base model of the edRVFL-FIS, Q_l^Λ is the matrix of unknowns that needs to be found. The resultant optimization problem of Eq. (17) is given as:

min_{Q_l^Λ} (1/2) ||Q_l^Λ||²_F + (C/2) ||J_l Q_l^Λ − Z||²_F.   (20)

The weight matrix Q_l^Λ is computed as follows:

Q_l^Λ = (J_l^t J_l + I/C)^{−1} J_l^t Z,   (21)

where I is an identity matrix of conformal dimension and C is the regularization parameter.
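The closed-form solve for the output weights can be sketched directly (illustrative; `C_reg` plays the role of the regularization parameter, and `np.linalg.solve` is used instead of forming an explicit inverse for numerical stability):

```python
import numpy as np

def solve_output_weights(J, Z, C_reg=1.0):
    """Ridge-regularized closed-form solution
    Q = (J^T J + I / C_reg)^{-1} J^T Z
    for one base model's output weights. J: (M, n) enhancement features;
    Z: (M, C) targets. Returns Q of shape (n, C)."""
    n = J.shape[1]
    return np.linalg.solve(J.T @ J + np.eye(n) / C_reg, J.T @ Z)
```

Each base model calls this once with its own feature matrix J_l, which is what keeps training non-iterative.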

E. Final Output: Test Condition
For an unknown sample x, the output of the l-th (l = 1, 2, ..., L) base model is

Z_l(x) = J_l(x) Q_l^Λ,   (22)

where J_l(x) denotes the enhancement feature vector of x, formed as in Eq. (18), and the predicted label of the l-th base model is the index of the largest entry of Z_l(x). The base model outputs, Z_l(x), are combined using majority voting to get the final output as follows:

Z(x) = arg max_{j ∈ {1, 2, ..., C}} Σ_{l=1}^{L} I(Z_l(x) = j),   (23)

where I(Z_l(x) = j) is the indicator function that evaluates to 1 if Z_l(x) = j, and 0 otherwise.
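The test-time voting rule can be sketched as follows (an illustrative NumPy sketch; ties resolve to the smallest class label via argmax):

```python
import numpy as np

def majority_vote(base_preds):
    """Combine per-base-model class labels of shape (L, N) into final
    labels of shape (N,) by majority voting over the L base models."""
    base_preds = np.asarray(base_preds)
    n_classes = base_preds.max() + 1
    # counts[j, i] = number of base models voting class j for sample i
    counts = np.stack([(base_preds == j).sum(axis=0) for j in range(n_classes)])
    return counts.argmax(axis=0)
```

Averaging the base models' score matrices before the argmax would give the averaging variant mentioned in the text.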

F. Further Takeaways from the Defuzzification Matrix of the Proposed Model
Let #Row(Q_l) denote the number of rows in the matrix Q_l, 0 be the zero matrix of dimension K × #Row(Q_l), and I_K be the identity matrix of dimension K. Since Q_l^Λ = [Q_l^t, (β Λ_l)^t]^t, multiplying both sides (from the left) by the matrix [0, I_K], we get

β Λ_l = [0, I_K] Q_l^Λ.   (25)

Eq. (25) implies that the matrix Λ_l holds distinct values for each base model (hidden layer l) within the ensemble, emphasizing the uniqueness of each model's contribution. The ensemble structure strategically harnesses the diverse supervised defuzzified matrices across the base models, capitalizing on the inherent diversity embedded within the proposed edRVFL-FIS model. Consequently, the proposed edRVFL-FIS demonstrates rich feature extraction capability achieved by leveraging original (through direct links), hidden layer, and defuzzified features.

G. Time and Space Complexity of the Proposed edRVFL-FIS Models
The complexity of the proposed edRVFL-FIS models primarily depends on two components: (a) the clustering technique and (b) the matrix inversion required in (21). The number of hidden layers L for the edRVFL-FIS is taken to be ≤ 10 (see Section S.II of the supplementary material), which is very small compared to the other variables; therefore, we generally do not give much attention to L in the computational complexity.

Time/Temporal Complexity: Following [34], the time complexities of K-means and Fuzzy C-means are O((M P K) iter) and O((M P K²) iter), respectively, where "iter" represents the number of iterations. The time complexity of R-means is O(K). Following [15], the time complexity of computing the inverse in (21) is O((P + g + K)³) for each base model.

The key steps of the proposed edRVFL-FIS training algorithm are as follows:

9: Calculate E_r using Eq. (11).
10: Calculate F using Eq. (12).
11: Construct B using Eq. (15).
12: Construct β and Ω using Eq. (16).
13: Calculate G_1 using Eq. (5).
14: Construct Z_1 using Eq. (6).
15: Calculate G_l using Eq. (7).
16: Construct Z_l using Eq. (8).
17: Construct J_l using Eq. (18).
18: Construct Q_l^Λ using Eq. (19).
19: Find the output weight parameters Q_l^Λ for each base model using Eq. (21), for l = 1, 2, ..., L.
20: Validate: for an unknown sample x, calculate the output of the l-th base model using Eq. (22).
21: Output: the final output of x is found using Eq. (23).

IV. EXPERIMENTS AND RESULTS
This section presents comprehensive details on the datasets, the compared models, and the experimental configuration (in Section S.II of the supplementary material). Subsequently, we delve into the experimental results and conduct statistical analyses on both UCI and NDC datasets. Finally, we examine the influence of fuzzy rules and hidden layers, as well as the robustness of the proposed edRVFL-FIS models.

A. Datasets
To evaluate the efficacy of the proposed edRVFL-FIS models, we employ datasets from UCI [36] and NDC [37]. We categorize the UCI datasets into two groups: Category-1, with sample sizes ranging from 100 to 1000, and Category-2, with sample sizes between 1000 and 10000. These datasets span diverse domains and sizes, with detailed statistics provided in supplementary Tables S.IV and S.V. The NDC datasets, generated using David Musicant's NDC Data Generator [37], span from 10 thousand to 1 million samples, consistently featuring 32 features.

TABLE I: Classification accuracies of the proposed edRVFL-FIS models and the baseline models on Category-1 UCI datasets. Tables S.VI, S.VII, and S.VIII of the supplementary material contain the standard deviations, ranks, and best hyperparameter settings, respectively, for each model and dataset.

B. Compared Models
We compare the proposed edRVFL-FIS models with eight baseline and state-of-the-art RNN-based models. The enumerated models under examination include: 1) RVFL: standard shallow RVFL model [7]. 2) ELM: extreme learning machine [38]. 3) BLS: broad learning system [39]. 4) H-ELM: hierarchical ELM [40]. 5) dRVFL: deep RVFL [15]. 6) edRVFL: ensemble deep RVFL [15]. 7) Fuzzy BLS: fuzzy broad learning system [41].

C. Experimental Results, Discussions and Statistical Analysis on Category-1 UCI Dataset
The proposed edRVFL-FIS models are evaluated against existing baseline models using Category-1 datasets from the UCI repository. The accuracy of the models on Category-1 datasets is reported in Table I, and the corresponding standard deviations, ranks, and best hyperparameter settings are reported in the supplementary Tables S.VI, S.VII, and S.VIII, respectively. Evaluating model performance solely on average accuracy can be misleading, as it may mask variations in performance across diverse datasets. Following the recommendations of Demšar [43], we employed a suite of nonparametric statistical tests to compare classifiers developed on multiple datasets, especially when the conditions for applying parametric tests are not met. These tests included the ranking, Friedman, and Wilcoxon signed-rank tests, allowing for a comprehensive evaluation and comparison of the classifiers' performance. In the ranking approach, models are ranked based on their performance across individual datasets, assigning higher ranks to poorer performers and lower ranks to top performers. Consider a scenario with M models assessed across D datasets, where the rank of the m-th model on the d-th dataset is denoted as R_m^d. Mathematically, the m-th model's average rank can be computed as: R_m = (Σ_{d=1}^{D} R_m^d)/D. The models' average ranks, presented in the second-to-last row of Table I, highlight a consistent trend. The proposed models consistently secure the lowest average ranks, emphasizing their superior generalization performance. Specifically, edRVFL-FIS-C achieves the lowest average rank at 2.53, followed by edRVFL-FIS-R at 2.84 and edRVFL-FIS-K at 3.39. In contrast, H-ELM performs least favourably, with an average rank of 9.03. This underscores the clear superiority of the proposed edRVFL-FIS models over existing shallow and deep state-of-the-art variants of the RNN family.

TABLE III: Classification accuracies of the proposed edRVFL-FIS models and the baseline models on Category-2 UCI datasets. Tables S.IX and S.X of the supplementary material contain the best hyperparameter settings and ranks, respectively, for each model and dataset.
Moreover, the Friedman test [44] is applied for further statistical insights to compare the models' average ranks and determine significant differences based on their rankings. Using a chi-squared statistic (χ²_F) with M − 1 degrees of freedom (DoF), the test is formulated as follows:

χ²_F = (12D / (M(M+1))) [Σ_{m=1}^{M} R_m² − M(M+1)²/4].

Iman and Davenport [45] pointed out that Friedman's χ²_F statistic might be overly cautious. To address this, they introduced an enhanced statistic called the F_F statistic, calculated as:

F_F = ((D − 1) χ²_F) / (D(M − 1) − χ²_F).

The F_F statistic's distribution has (M − 1) and (D − 1)(M − 1) DoF. In our experiment, M = 11 and D = 19; therefore, we get χ²_F = 82.6811 and F_F = 13.8676. Referring to the F-distribution table, we note that F_F(10, 180) = 1.8836 at a 5% significance level. The null hypothesis is rejected because 13.8676 > 1.8836, indicating significant differences among the models. Now, we employ the Wilcoxon signed-rank test to assess the pairwise significant distinction between the proposed edRVFL-FIS models and existing baseline models. This test operates on the assumption that the baseline models and the proposed edRVFL-FIS models exhibit equivalent performance under the null hypothesis. For each dataset, the absolute difference in ranks between the two models is ranked, considering the sign of the differences. If the p-value of the model comparison falls below 0.05, the null hypothesis is rejected, indicating that the model with the lower rank is statistically superior to the one with the higher rank. In Table II, the p-values and the corresponding status of the null hypothesis are shown. The pairwise comparisons between the proposed edRVFL-FIS models and baseline models underscore a compelling narrative: the consistent rejection of the null hypothesis signifies the statistical superiority of all three proposed models, i.e., edRVFL-FIS-R, edRVFL-FIS-K, and edRVFL-FIS-C, over state-of-the-art RNN models. This analysis solidifies the commendable performance and statistical excellence of the proposed edRVFL-FIS
models across the board.
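The two statistics above can be computed directly from the models' average ranks (an illustrative sketch; `avg_ranks` would be the second-to-last row of Table I):

```python
import numpy as np

def friedman_iman_davenport(avg_ranks, D):
    """Friedman chi^2 and Iman-Davenport F_F statistics from the average
    ranks of M models over D datasets."""
    R = np.asarray(avg_ranks, dtype=float)
    M = len(R)
    # Friedman statistic with M - 1 degrees of freedom
    chi2 = 12 * D / (M * (M + 1)) * (np.sum(R ** 2) - M * (M + 1) ** 2 / 4)
    # Iman-Davenport correction, F-distributed with (M-1, (D-1)(M-1)) DoF
    FF = (D - 1) * chi2 / (D * (M - 1) - chi2)
    return chi2, FF
```

If all models tie (every average rank equals (M+1)/2), both statistics are zero; larger rank spreads drive the statistics up toward rejection of the null hypothesis.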

D. Experimental Results, Discussions and Statistical Analysis on Category-2 UCI Dataset
The performance evaluation of models based on Category-2 datasets is detailed in Table III. Supplementary Tables S.IX and S.X provide the optimal hyperparameter settings and ranks, respectively. Among the models assessed, the proposed edRVFL-FIS-C, edRVFL-FIS-K, and edRVFL-FIS-R exhibit the highest average accuracies at 72.318%, 72.0565%, and 72.0415%, respectively. In contrast, the neuro-fuzzy-based BLS model, i.e., Fuzzy BLS, displays the lowest accuracy among all compared models. This underscores the significance of the learning process and adaptation of fuzzy rules within the FIS framework, as their efficacy directly impacts model performance. The analysis affirms that the proposed edRVFL-FIS models adeptly extract deep features from dataset samples by effectively employing the fuzzification and defuzzification processes. A parallel observation emerges when considering the average ranks of the models, as outlined in the final row of Table III. The proposed edRVFL-FIS-K and edRVFL-FIS-C models boast the lowest average ranks, both standing at 3.06, while the edRVFL-FIS-R model follows closely with a rank of 3.39. In contrast, Fuzzy BLS lags behind with the least favorable ranking, averaging 7.11.

TABLE V: Classification accuracies of the proposed edRVFL-FIS models and the baseline models on NDC datasets. Tables S.XI, S.XII, and S.XIII of the supplementary material contain the standard deviations, ranks, and best hyperparameter settings, respectively, for each model and dataset.
In our experiment, where M = 10 and D = 9, we obtained χ²_F = 29.4344 and F_F = 4.5665. From the F-distribution table, we get F_F(9, 72) = 2.0127 at a 5% significance level. The rejection of the null hypothesis is warranted, as the observed value 4.5665 exceeds the critical value 2.0127, indicating substantial differences among the models. To delve deeper into these distinctions, we employed the Wilcoxon signed-rank test to assess pairwise significance between the proposed edRVFL-FIS models and existing baseline models. The corresponding p-values are presented in Table IV.
Upon closer examination, it becomes evident that the proposed edRVFL-FIS-R and edRVFL-FIS-K models exhibit greater statistical significance than all other baseline models, except for edRVFL and dRVFL. The proposed edRVFL-FIS outperforms RVFL, ELM, H-ELM, and edRVFL, showcasing its statistical superiority. The Wilcoxon test fails to reveal a significant distinction between the proposed edRVFL-FIS-C model and the BLS, dRVFL, and Fuzzy BLS models. However, it is crucial to highlight that despite the absence of clear statistical differences in certain comparisons, all proposed edRVFL-FIS models consistently achieve the lowest rank. This underscores their superior generalization performance across the evaluated models.

E. Experimental Results, Discussions, Statistical Analysis on NDC Dataset
Our experiments are conducted on big datasets generated using David Musicant's NDC Data Generator [37]. This experiment provides a comprehensive understanding of the performance of the proposed edRVFL-FIS models across a varied range of data samples, spanning from 10,000 to 1 million; for example, NDC-10K, NDC-75K, and NDC-1M indicate datasets comprising 10,000, 75,000, and 1 million samples, respectively. Following the methodology outlined in [46], we set the regularization parameter C to a fixed value of 1 for all models in the NDC experiments. The results, presented in Table V, offer insights into the performance of the models on the NDC datasets. Tables S.XI, S.XII, and S.XIII of the supplementary material present each model's standard deviation, rank, and best hyperparameter setting, respectively, on the NDC datasets. Table S.III in the supplementary material presents the training times of all the proposed models. Our comparison excludes models with an average accuracy lower than 70% on the category-2 datasets, ensuring a focus on models that meet a certain performance threshold.
Upon analysis of Table V, it is evident that the proposed edRVFL-FIS-K model is the top performer, achieving the highest average accuracy of 97.9286%. edRVFL-FIS-C and edRVFL-FIS-R secure the second and third positions with average accuracies of 97.6929% and 97.45%, respectively. This consistent trend is reaffirmed by their average ranks, where edRVFL-FIS-K, edRVFL-FIS-C, and edRVFL-FIS-R outshine all other models with ranks of 1.43, 2.21, and 3, respectively. Furthermore, regarding average standard deviation, the proposed models demonstrate their confidence, holding the first, third, and fourth positions. This compelling performance underscores the efficacy of our ensemble-based edRVFL-FIS models, particularly over extended feature layers. They adeptly leverage fuzzification and defuzzification features to significantly enhance generalization performance, especially on large datasets such as NDC. This establishes the superiority of the proposed edRVFL models over competing models and highlights the reliability of our approach in handling substantial and complex datasets. Furthermore, the win-tie-loss outcomes of each proposed model against the existing baselines are detailed in Section S.III and Table S.I of the supplementary material.
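A win-tie-loss tally of the kind summarized in Table S.I can be sketched as follows; the accuracies used here are illustrative placeholders, not values from the paper.

```python
# Minimal win-tie-loss tally between a proposed model and one baseline
# across datasets: a win when the proposed model's accuracy is strictly
# higher, a tie when equal (within tolerance), a loss otherwise.
def win_tie_loss(acc_proposed, acc_baseline, tol=1e-9):
    wins = sum(a > b + tol for a, b in zip(acc_proposed, acc_baseline))
    ties = sum(abs(a - b) <= tol for a, b in zip(acc_proposed, acc_baseline))
    losses = len(acc_proposed) - wins - ties
    return wins, ties, losses

proposed = [97.9, 98.1, 96.5, 97.2]   # hypothetical accuracies (%)
baseline = [97.9, 97.4, 95.8, 97.6]
print(win_tie_loss(proposed, baseline))  # (2, 1, 1)
```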

V. DISCUSSION, SENSITIVITY ANALYSIS AND EROSION EXPERIMENT
In this section, we first make some noteworthy remarks about the proposed models based on the above experiments and results. We then discuss the importance of hyperparameter settings in the proposed edRVFL-FIS model in two ways: a sensitivity analysis with respect to the number of fuzzy rules (in Section V-C) and with respect to the number of hidden layers (in Section S.IV of the supplementary material). In Section S.V of the supplementary material, we perform an erosion experiment, introducing Gaussian noise at different levels (5%, 10%, 15%, and 20%) to corrupt the features of the datasets and assess the robustness of the proposed model.

A. Discussions Based on the Results on Category-1 Datasets
• The proposed edRVFL-FIS models excel in both average accuracy and average rank on category-1 datasets.
• All the proposed edRVFL-FIS models have the highest degree of confidence in decision-making, as evidenced by their minimal standard deviations.

• Statistical analyses, including the Friedman and Wilcoxon signed-rank tests, affirm the proposed models' statistical superiority over existing baseline models.
• The findings suggest that the proposed model can be preferred over the state-of-the-art graph-based deep model, i.e., edEGERVFL, due to its superior performance in accuracy, ranking, and standard deviation.

B. Discussions Based on the Results on Category-2 Datasets
• The proposed edRVFL-FIS models excel in extracting nuanced information from the augmented feature space, achieved through concatenating the original and fuzzy features. This underscores the effectiveness of their fuzzification and defuzzification processes. In contrast, the diminished performance of Fuzzy BLS highlights its limitations in learning fuzzy rules compared to the proposed edRVFL-FIS models.
• The RVFL family, including the existing and proposed models, outperforms the ELM- and BLS-family models in terms of average accuracy. This trend emphasizes the superior generalization performance of RVFL models over the other RNN counterparts on category-2 datasets.

C. Number of Fuzzy Rules (Clusters) vs. Performance of edRVFL-FIS Models
To examine the impact of the number of fuzzy rules (clusters) on the performance of the proposed edRVFL-FIS models for varying sample sizes, we conducted experiments varying the number of fuzzy rules (K). The resulting accuracy of the edRVFL-FIS models is visualized in Figure 2 for the NDC datasets of size 10K, 100K, and 1M. The noteworthy observations are as follows:
1) edRVFL-FIS-K: The accuracy of edRVFL-FIS-K increases notably with the number of fuzzy rules. Tuning the fuzzy rules around or beyond 35 tends to yield optimal performance.
2) edRVFL-FIS-C: The optimal performance range for edRVFL-FIS-C is generally observed between 15 and 35 fuzzy rules. Fine-tuning in the neighborhood of 25 tends to yield the highest accuracy.
3) edRVFL-FIS-R: The performance pattern for edRVFL-FIS-R varies. For NDC-10K, the optimal performance is at K = 35; for NDC-100K, it peaks at K = 5; for NDC-1M, the best performance is at K = 15. These mixed patterns highlight the dataset-dependent nature of optimal fuzzy rule selection.
These observations reveal discernible patterns in the performance of the proposed edRVFL-FIS-K and edRVFL-FIS-C models with respect to fuzzy rule variations. However, performance varies across datasets and applications, in accordance with the No Free Lunch theorem [47]. Hence, a tailored approach to tuning the fuzzy rules is recommended for optimizing the generalization performance of the proposed edRVFL-FIS models.

VI. CONCLUSION AND FUTURE WORK
We proposed the edRVFL-FIS model, a seamless integration of FIS with edRVFL, developed within a rigorous mathematical framework. The edRVFL-FIS model adeptly harnesses the synergy of deep learning, ensemble learning, and fuzzy logic, empowering the ensemble model to extract rich features and enhance the learning process. The proposed edRVFL-FIS model integrates two significant feature components in each base model: unsupervised fuzzified features and supervised defuzzified features. Utilizing diverse clustering methods (R-means, K-means, Fuzzy C-means), it establishes fuzzy layer rules, resulting in three model variations (edRVFL-FIS-R, edRVFL-FIS-K, edRVFL-FIS-C) with distinct fuzzified features. In addition, the ensemble structure utilizes diverse supervised defuzzified features in each base model, which improves the depth of feature extraction for the final predictions. The efficacy of the novel edRVFL-FIS models is showcased through extensive experimentation across multiple UCI datasets covering a wide range of domains and sizes.
Furthermore, we conducted experiments on big datasets generated with the NDC data generator. These experiments offer a comprehensive insight into the performance of the proposed edRVFL-FIS models across a diverse spectrum of data samples, ranging from 10,000 to 1 million. The empirical results demonstrate that the proposed edRVFL-FIS adeptly leverages fuzzification and defuzzification features to enhance generalization performance on large datasets. Additionally, we explored the influence of the number of fuzzy rules (clusters) and hidden layers on the performance of the proposed edRVFL-FIS model, and we analyzed the robustness of the proposed edRVFL-FIS models.
Within this study, we formulated an enhanced feature space incorporating both the original and fuzzy features. Consequently, the presence of redundant features may increase the computational complexity of the proposed models; this issue warrants further investigation in future work. Moreover, we aim to extend this work to regression problems and to applications in biomedical domains such as brain age prediction. The source code of the proposed models is available at https://github.com/mtanveer1/edRVFL-FIS.

(·)^t is the transpose operator. L denotes the number of hidden layers, and g denotes the number of hidden layer nodes. ⊗ and ⊕ denote the Kronecker product and the concatenation operator, respectively, defined as follows. For C ∈ R^(r×s), D ∈ R^(t×u), and E ∈ R^(t×v), D ⊕ E = [D E] ∈ R^(t×(u+v)) and

C ⊗ D =
[ c_11 D  ⋯  c_1s D ]
[   ⋮     ⋱    ⋮   ]
[ c_r1 D  ⋯  c_rs D ]  ∈ R^(rt×su).
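The two operators defined above can be illustrated in pure Python with plain nested lists so the resulting shapes stay explicit:

```python
# Column-wise concatenation D ⊕ E and Kronecker product C ⊗ D on
# matrices represented as lists of rows.
def concat(D, E):
    # D: t x u, E: t x v  ->  t x (u + v)
    return [d_row + e_row for d_row, e_row in zip(D, E)]

def kron(C, D):
    # C: r x s, D: t x u  ->  (r t) x (s u); block (i, j) is c_ij * D.
    r, s, t, u = len(C), len(C[0]), len(D), len(D[0])
    return [[C[i // t][j // u] * D[i % t][j % u]
             for j in range(s * u)] for i in range(r * t)]

C = [[1, 2]]            # 1 x 2
D = [[0, 1], [1, 0]]    # 2 x 2
print(concat(D, D))     # 2 x 4: [[0, 1, 0, 1], [1, 0, 1, 0]]
print(kron(C, D))       # 2 x 4: [[0, 1, 0, 2], [1, 0, 2, 0]]
```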
λ_kc is the parameter used in the THEN part of the fuzzy inference system in the proposed edRVFL-FIS model. The λ_kc's are unknown parameters and are computed along with the output layer weights of the edRVFL-FIS model in Subsection III-D.

TABLE II: Wilcoxon signed-rank test of the proposed edRVFL-FIS models w.r.t. baseline models on Category-1 UCI datasets. † represents the proposed models.

The boldface in each row denotes the performance of the best model on the corresponding dataset. † represents the proposed models.

TABLE IV: Wilcoxon signed-rank test of the proposed edRVFL-FIS models w.r.t. baseline models on Category-2 UCI datasets. † represents the proposed models.