uEFS: An efficient and comprehensive ensemble-based feature selection methodology to select informative features

Feature selection is considered to be one of the most critical methods for choosing appropriate features from a larger candidate set. This task requires two basic steps: ranking and filtering. The former necessitates the ranking of all features, while the latter involves filtering out all irrelevant features based on some threshold value. In this regard, several feature selection methods with well-documented capabilities and limitations have already been proposed. Similarly, feature ranking is also nontrivial, as it requires the designation of an optimal cutoff value so as to properly select important features from a list of candidate features. However, the lack of a comprehensive feature ranking and filtering approach that alleviates these existing limitations and provides an efficient mechanism for achieving optimal results remains a major problem. Keeping in view these facts, we present an efficient and comprehensive univariate ensemble-based feature selection (uEFS) methodology to select informative features from an input dataset. For the uEFS methodology, we first propose a unified features scoring (UFS) algorithm to generate a final ranked list of features following a comprehensive evaluation of a feature set. For defining cutoff points to remove irrelevant features, we subsequently present a threshold value selection (TVS) algorithm to select a subset of features that are deemed important for classifier construction. The uEFS methodology is evaluated using standard benchmark datasets. Extensive experimental results show that our proposed uEFS methodology provides competitive accuracy, achieving (1) on average around a 7% increase in f-measure and (2) on average around a 5% increase in predictive accuracy as compared with state-of-the-art methods.


Introduction
In the domain of data mining and machine learning, one of the most critical problems is the task of feature selection (FS), which pertains to the complexity of appropriately choosing features from a larger set of candidates [1]. FS plays a key role in this process. The proposed unified features scoring (UFS) algorithm computes feature ranks without (1) using any learning algorithm, (2) incurring high computational costs, or (3) inheriting the individual statistical biases of state-of-the-art feature-ranking methods. The current version of the UFS has been plugged into a recently developed tool named the data-driven knowledge acquisition tool (DDKAT) [19] to assist the domain expert in selecting important features for the data preprocessing task. The DDKAT supports an end-to-end knowledge engineering process for generating production rules from a dataset [19]. The current version of the UFS code and its documentation are freely available and can be downloaded from the GitHub open source platform [20,21]. Similarly, the TVS provides an empirical algorithm to specify a minimum threshold value for retaining important features irrespective of the characteristics of the dataset. It selects a subset of features that are deemed important for classifier construction. The motivation behind the uEFS is to design and develop an efficient FS methodology that evaluates a feature subset from different angles and produces a useful reduced feature set. In order to accomplish this aim, this study was undertaken with the following objectives: (1) to design a comprehensive and flexible feature-ranking algorithm to compute the ranks without (a) using any learning algorithm, (b) high computational costs, or (c) any individual statistical biases of state-of-the-art feature-ranking methods; and (2) to identify an appropriate cutoff value for the threshold to select a subset of features, irrespective of the characteristics of the dataset, with reasonable predictive accuracy.
The key contributions of this research are as follows:
1. The presentation of a flexible approach, called UFS, for incorporating state-of-the-art univariate filter measures for feature ranking
2. The proposal of an efficient approach, called TVS, for selecting a cutoff value for the threshold in order to select a subset of features
3. The demonstration of a proof-of-concept for the aforementioned techniques through extensive experimentation, which achieved (1) on average a 7% increase in the f-measure as compared with the baseline approach, and (2) on average a 5% increase in predictive accuracy as compared with state-of-the-art methods.

Related works
This section briefly describes various existing studies related to FS methodologies for filtering out irrelevant features. The present study, by contrast, focuses on presenting a comprehensive and flexible FS methodology based on an ensemble of univariate filter measures for classifier construction.
The following includes some relevant FS studies, which contain research surveys and ensemble-based approaches for the ranking of features as well as for identifying a cutoff value for the threshold in the domain of FS. Lastly, the overall perspectives of the literature reviewed are presented. A review of applied FS methods for microarray datasets was performed by Bolón et al. [22]. Microarray data classification is a difficult task due to its high dimensionality and small sample sizes. Therefore, FS is considered the de facto standard in this area [22]. Belanche and Gonzalez [7] studied the performance of different existing FS algorithms. A scoring measure was also introduced to score the output of FS methods, which was taken as an optimal solution. To automate FS, Liu and Yu [23] proposed a framework that provided an important infrastructure for integrating different FS methods based on their common traits. Chen et al. [24] performed a survey on FS algorithms for intrusion detection systems. Experiments were performed for different FS methods, i.e., filter, wrapper, and hybrid methods. Since the present study was not focused on comprehensible classifiers, it did not study the effects of FS algorithms on the comprehensibility of a classifier. In addition, no unifying methodology was proposed that was capable of categorizing existing FS methods based on their common characteristics or their effects on classifiers.
Regarding ensemble-based feature-ranking studies, Rokach et al. [9] and Jong et al. [10] examined the available ensemble-based feature-ranking approaches to show the improvement in the stability of FS. Similarly, Slavkov et al. [11] investigated numerous aggregation approaches for feature ranking and observed that aggregating feature rankings produced better results as compared with using a single feature-ranking method. In addition, Prati [8] also obtained better results using an ensemble feature-ranking approach. In the literature, a hybrid approach combining the filter and wrapper methods has also been presented, which is able to eliminate unwanted features by employing a ranking technique [25]. A similar concept to an EFS approach has also been mentioned previously [2,26]. For ensemble feature ranking, two aggregate functions, the arithmetic mean and the arithmetic median, were used to rank features [27]. The authors obtained the ranking by arranging the features from the lowest to the highest, assigning rank 1 to the feature with the lowest feature index and rank M to the feature with the highest feature index [27]. Similarly, other researchers aggregated several feature rankings to demonstrate the robustness of ensemble feature ranking, which increases with the ensemble size [10]. Onan and Korukoğlu [12] presented an ensemble-based FS approach wherein different ranking lists obtained from various FS methods were aggregated. They used a genetic algorithm to produce an aggregate-ranked list, which is a relatively more expensive technique than a weighted aggregate technique. The authors performed experiments on binary-class problems, and it was not clear how the proposed method would deal with more complex datasets. Popular filter methods used for the ensemble-based FS approach include IG, gain ratio, chi-squared, symmetric uncertainty, one rule (OneR), and ReliefF.
Most of the FS methodologies use three or more of the aforementioned methods for performing FS [1,8,15,18,27,28].
With respect to identifying an appropriate cutoff value for the threshold, Sadeghi and Beigy [29] proposed a heterogeneous ensemble-based methodology for feature ranking. These authors used a genetic algorithm to determine the threshold value; however, a θ value is required to start the process. Moreover, the user is given the additional task of defining the notion of relevancy and redundancy of a feature. Osanaiye et al. [18] combined the output of various filter methods; however, a fixed threshold value, i.e., one-third of the feature set, is defined a priori, irrespective of the characteristics of the dataset. Sarkar et al. [15] proposed a technique that aggregates the consensus properties of the IG, chi-squared, and symmetric uncertainty FS methods to develop an optimal solution; however, this technique is not comprehensive enough to provide a final subset of features. Hence, a domain expert would still need to make an educated guess regarding the final subset. For defining cutoff points to remove irrelevant features, a separate validation set and artificially generated features can be used [8], though it is not clear how to find the threshold for the features' ranking [17,18]. Finding an optimal cutoff value to use in selecting important features from different datasets is problematic [17].
Taking into consideration the aforementioned discussion, a significant amount of research [7-12,15,18,24,29] has focused on proposing improved FS methodologies; however, much less attention has been paid to selecting features from a given feature set in a comprehensive manner. These methodologies either used relatively more expensive techniques to select features or required an educated guess to specify a minimum threshold value for retaining important features.

Materials and methods
This section first explains the process of the uEFS methodology. Second, the UFS algorithm is explained through algorithms. Third, the TVS algorithm is presented. Lastly, the statistical measures used for evaluating the performance of the proposed uEFS methodology are explained.

Univariate ensemble-based features selection methodology
In the FS process, normally, two steps are required [17]. In the first step, features are typically ranked, whereas, in the second step, a cutoff point is defined to select important features and to filter out the irrelevant features for building more robust machine learning models. In this regard, the proposed UFS algorithm [19] covers the first step of FS, while the TVS algorithm covers the second step.

Unified features scoring
UFS is an innovative feature-ranking algorithm that tries to unify various filter-based methods [19] for the purpose of obtaining a final ranked list of features. In this algorithm, univariate filter measures are employed to assess the usefulness of a selected feature subset in a multidimensional manner. These measures are better suited to high-dimensional datasets and provide better generalization [4,13]. The UFS algorithm uses the ensemble FS (EFS) approach, which has been examined recently by some researchers [2,26]. The EFS, a concept from ensemble learning, obtains a ranked list of features by incorporating the outcomes of different feature-ranking techniques [1,27]. Generally, the intention of the EFS approach is to give an improved estimation of the most favorable subset of features for improving classification performance [2,27,30,31]. As mentioned elsewhere [27], fewer studies have focused on the EFS approach to enrich the FS itself. Although ensemble-based methodologies have additional computational costs, these costs are affordable because the approach offers an advisable framework [32]. As discussed previously [27], there are three types of filter approaches: ranking, subset evaluation, and a new FS framework that decouples the redundancy analysis from the relevance analysis. The UFS uses a ranking approach, as it is considered an attractive approach due to its simplicity, scalability, and good empirical success [27,33]. Feature ranking measures the relevancy of the features (i.e., independent attributes) by their correlations to the class (i.e., the dependent attribute) and ranks the independent attributes according to their degrees of relevance [1]. These values may reveal different relative scales. To neutralize the effect of different relative scales, the UFS rescales the values to the same range (i.e., between 0 and 1) to make the algorithm scale-insensitive.
For rescaling, the UFS allocates rank 1 to the feature with the highest feature index, as opposed to research done previously [27], which assigned rank 0 to the feature having the topmost feature index. Following that, the UFS orders all scaled ranks in ascending order and then aggregates them, as this is considered to be an effective technique [8]. The order-based ranking-aggregation method combines the base rankings and considers only the ranks for ordering the attributes [8]. Finally, the UFS computes a mean value to derive the weight and priority of each feature.
UFS is described through Algorithm 1, which takes a dataset (i.e., D) as input and computes the ranks (scores) of the features after passing through the key steps of the algorithm. UFS depends on n univariate filter-based measures, where the key rationale for using n filter measures is to evaluate a feature from different perspectives. In Algorithm 1, the first step is to compute the number of features in a given dataset. Then, in the second step, each feature in the dataset is ranked using the n univariate filter-based measures, as shown in Line 4 to Line 7 of Algorithm 1. After that, Algorithm 2 is used to scale (normalize) all ranks computed using the first filter measure. This step is repeated for the remaining (n − 1) measures as well, as shown in Line 9 to Line 12. After the evaluation and scaling process, rank aggregation is performed, as shown in Line 18 of Algorithm 1. Later, the comprehensive score as well as the weightage of each feature are computed, as shown in Line 25 and Line 26 of Algorithm 1. Finally, based on the contribution (i.e., individual measure score and relative weightage), a priority value for each feature is computed. This priority value is further utilized for ranking and feature subset selection.
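The scale-and-aggregate core of Algorithm 1 can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names are our own, the three example score vectors are hypothetical stand-ins for measures such as IG, chi-squared, and symmetric uncertainty, and the weightage/priority bookkeeping of Lines 25-26 is collapsed into a single mean.

```python
# Sketch of the UFS scale-and-aggregate step (Algorithm 1), assuming the raw
# scores from the n filter measures have already been computed.

def min_max_scale(scores):
    """Rescale raw filter scores to [0, 1] so that measures on different
    relative scales become comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:                       # degenerate case: all scores equal
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def ufs_priorities(score_matrix):
    """score_matrix[i][j] is the raw score of feature j under measure i.
    Returns one aggregated priority per feature (mean of scaled scores)."""
    scaled = [min_max_scale(row) for row in score_matrix]
    n_measures, n_features = len(scaled), len(scaled[0])
    return [sum(scaled[i][j] for i in range(n_measures)) / n_measures
            for j in range(n_features)]

# Three hypothetical measures scoring four features on different scales:
scores = [
    [0.10, 0.40, 0.30, 0.20],   # e.g. an IG-like measure
    [5.0, 20.0, 15.0, 10.0],    # e.g. a chi-squared-like measure
    [0.01, 0.04, 0.02, 0.03],   # e.g. a symmetric-uncertainty-like measure
]
priorities = ufs_priorities(scores)
ranked = sorted(range(4), key=lambda j: priorities[j], reverse=True)
```

Because every measure is first mapped to [0, 1], a feature that tops all measures receives priority 1.0 regardless of the original scales, which is exactly the scale-insensitivity the UFS aims for.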
For the proof-of-concept, five univariate filter-based measures, namely, IG, gain ratio, symmetric uncertainty, chi-squared, and significance [1,8,19,27,28], were used to explain the process of the proposed unified features scoring algorithm. The reasons for selecting these five measures are described elsewhere [19]. Using these five filter measures, the process of the UFS is depicted in Fig 2. This process is also explained through an example.

Threshold value selection
The process of FS starts once the features are ranked. In order to select a subset of features, the TVS algorithm is introduced, which provides an empirical approach to specifying a minimum threshold value. Those attributes that score less than the minimum threshold value can be discarded for building more robust machine learning models. The proposed algorithm is implemented in the Java language using the WEKA API. TVS is explained through Algorithm 3. This algorithm takes n datasets (i.e., D) and m classifiers (i.e., C) as input and sequentially passes them through the mandatory steps of the algorithm to find the cutoff value from a predictive accuracy graph. In Algorithm 3, first consider the n benchmark datasets of varying complexities. After that, compute the feature ranks using a ranker search mechanism and then sort them in ascending order, as shown in Line 3 and Line 4 of Algorithm 3. Then, partition each dataset into different chunks (filtered datasets) from 100% to 5% features retained. Once the filtered datasets are created, consider m classifiers from various classifier categories/families having varying characteristics (where m ≤ n) and feed each filtered dataset to these classifiers, as shown in Line 6 and Line 11 of Algorithm 3. Following this, record the predictive accuracies of these classifiers for each chunk of the dataset partitioning using a 10-fold cross-validation approach (Line 12). Later, compute the average predictive accuracy of all classifiers as well as all datasets against each chunk of the dataset partitioning. For the proof-of-concept, eight datasets of varying complexities were used to explain the process of the proposed threshold selection algorithm. The process of threshold value selection is depicted in Fig 3. As depicted in Fig 3, each dataset (Cylinder-bands, Diabetes, Letter, Sonar, Waveform, Vehicle, Glass, Arrhythmia) was fed to the IG filter measure for computing the attributes' ranks.
Then, all measured ranks of the attributes of each dataset were sorted in ascending order. Afterwards, each dataset was partitioned into different chunks (filtered datasets) from 100% to 5% features retained; e.g., in the case of an 80% chunk, the dataset retains nearly 80% of the highly ranked features, while the 20% of features below the cutoff rank are discarded. Each filtered dataset was fed to five well-known classifiers from various classifier categories/families having varying characteristics [e.g., naive Bayes from the Bayes category, J48 from the Trees category, k-nearest neighbors (kNN) from the Lazy category, JRip from the Rules category, and support vector machine (SVM) from the Functions category] and, using a 10-fold cross-validation approach [8], the predictive accuracies of these classifiers were recorded for each chunk of the dataset partitioning, as illustrated in Table 1. Finally, the average predictive accuracy of all classifiers as well as all datasets against each chunk of the dataset partitioning was computed. The main intuition of this process is to identify an appropriate chunk value that provides reasonable predictive accuracy and considerably reduces the dataset as well. Through empirical evaluation, it was found that a 45% chunk provided a reasonable threshold value for feature subset selection (Fig 4).
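The chunk-partitioning step above can be sketched as a small helper. This is an illustrative fragment only: the function name and the 20-feature example are hypothetical, and the per-chunk 10-fold cross-validation over the m classifiers is omitted.

```python
# Sketch of the TVS chunking step (Algorithm 3): given a best-first ranking,
# build filtered feature subsets from 100% down to 5% features retained.
import math

def make_chunks(ranked_features, percentages=range(100, 0, -5)):
    """ranked_features: feature names ordered from most to least important.
    Returns {pct: top-ranked features retained at that percentage}."""
    n = len(ranked_features)
    chunks = {}
    for pct in percentages:
        keep = max(1, math.ceil(n * pct / 100))   # never drop every feature
        chunks[pct] = ranked_features[:keep]
    return chunks

ranked = [f"f{i}" for i in range(20)]             # 20 hypothetical features
chunks = make_chunks(ranked)
```

Each `chunks[pct]` would then be handed to the classifier ensemble, and the accuracies recorded per chunk as in Table 1.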

Algorithm 3: TVS (D, C)
State-of-the-art feature selection methods for comparing the performance of the proposed univariate ensemble-based feature selection methodology. In this study, both single-FS methods, namely, IG, gain ratio, symmetric uncertainty, chi-squared, significance, OneR, Relief, ReliefF, and decision rule-based FS (DRB-FS), and ensemble-based FS methods, such as gain-ratio-chi-squared (GR-χ2), the Borda method, and the ensemble-based multifilter FS (EMFFS) method, were used as state-of-the-art FS methods for comparing the performance of the proposed uEFS methodology [1,8,15,18,19,27,28]. Each of the FS methods is defined as follows:
IG is an information-theoretic as well as a symmetric measure and is one of the popular measures for FS. It is calculated based on a feature's contribution to enhancing information about the target class label. An equation for IG is given as follows [14]:

IG(A) = Info(D) − Info_A(D)

where IG(A) is the IG of an independent feature or attribute A, Info(D) is the entropy of the entire dataset, and Info_A(D) is the conditional entropy of attribute A over D.
Gain ratio is considered to be one of the disparity measures that provides a normalized score to enhance the IG result. This measure utilizes the split information value, which is given as follows [14]:

SplitInfo_A(D) = − Σ_{j=1}^{v} (|D_j| / |D|) × log2(|D_j| / |D|)

where SplitInfo represents the structure of v partitions. Finally, gain ratio is defined as follows [14]:

GainRatio(A) = IG(A) / SplitInfo_A(D)

Chi-squared is a statistical measure that computes the association between the attribute A and its class or category C_i. It helps to measure the independence of an attribute from its class. It is defined as follows [14]:

CHI(A, C_i) = N × (F_1 × F_4 − F_2 × F_3)^2 / [(F_1 + F_3) × (F_2 + F_4) × (F_1 + F_2) × (F_3 + F_4)]

where F_1, F_2, F_3, and F_4 represent the frequencies of occurrence of both A and C_i, A without C_i, C_i without A, and neither C_i nor A, respectively, while N represents the total number of attributes. A zero value of CHI indicates that both C_i and A are independent. Symmetric uncertainty is an information-theoretic measure to assess the rating of constructed solutions.
It is a symmetric measure and is expressed by the following equation [34]:

SU(A, B) = 2 × IG(A|B) / (H(A) + H(B))

where IG(A|B) represents the IG computed between an independent attribute A and the class attribute B, while H(A) and H(B) represent the entropies of the attributes A and B, respectively.
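The three entropy-based measures above can be written out in a few lines of plain Python. This is a minimal sketch on a toy nominal attribute, not the WEKA implementation used in the study; the helper names are our own, and ties and missing values are ignored for brevity.

```python
# Entropy-based filter measures: IG, gain ratio, and symmetric uncertainty,
# transcribed directly from the definitions in the text.
import math
from collections import Counter

def entropy(labels):
    """H(X) in bits for a list of nominal values."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(attr, cls):
    """IG(A) = Info(D) - Info_A(D): class entropy minus the class entropy
    conditioned on the attribute's partitions."""
    n = len(cls)
    cond = 0.0
    for v in set(attr):
        part = [c for a, c in zip(attr, cls) if a == v]
        cond += len(part) / n * entropy(part)
    return entropy(cls) - cond

def gain_ratio(attr, cls):
    """GainRatio(A) = IG(A) / SplitInfo_A(D); SplitInfo equals the
    entropy of the attribute's value distribution."""
    return info_gain(attr, cls) / entropy(attr)

def symmetric_uncertainty(attr, cls):
    """SU(A, B) = 2 * IG(A|B) / (H(A) + H(B))."""
    return 2 * info_gain(attr, cls) / (entropy(attr) + entropy(cls))

# Toy data: the attribute perfectly predicts the class, so all three
# measures reach their maximum value of 1.
attr = ["a", "a", "b", "b"]
cls  = ["yes", "yes", "no", "no"]
```

An attribute that is independent of the class (e.g., `["a", "b", "a", "b"]` against the same labels) scores 0 under all three measures, which is the other extreme of the scale.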
Significance is a real-valued, two-way function used to assess the worth of an attribute with respect to a class attribute [35]. The significance of an attribute A_i, denoted by σ(A_i), is computed from two components: AE(A_i), the cumulative effect of all possible attribute-to-class associations of the attribute A_i over its k distinct values, and CE(A_i), which captures the effect of a change of an attribute value on the class decision, i.e., the association between the attribute A_i and the m class decisions in terms of the class-to-attribute associations of A_i. The exact formulations can be found elsewhere [35]. OneR is a rule-based method that generates a set of rules that test one particular attribute. The details of this method can be found elsewhere [36].
Relief [37] and ReliefF [38] are distance-based methods to estimate the weightage of a feature. The original Relief method deals with both discrete and continuous attributes; however, it cannot handle incomplete data and is limited in application to two-class problems. ReliefF is an extension of the Relief method that overcomes these limitations. The details of these methods can be found elsewhere [37,38].
DRB-FS is a statistical measure to eliminate all irrelevant and redundant features. It allows one to integrate domain-specific definitions of feature relevance, which are based on high, medium, and low correlations measured using Pearson's correlation coefficient, computed as follows [29,39]:

r = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / [(n − 1) × S_X × S_Y]

where x̄ and ȳ represent the sample means and S_X and S_Y are the sample standard deviations of the features X and Y, respectively. Here, n represents the sample size.
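The correlation coefficient used by DRB-FS follows directly from the definitions above; the following is a stdlib-only sketch (the function name is our own).

```python
# Pearson's correlation coefficient, transcribed from the formula:
# r = sum((x_i - mean_x)(y_i - mean_y)) / ((n - 1) * S_X * S_Y)
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / (n - 1))  # sample std dev
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / (n - 1))
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return cov / (sx * sy)
```

Perfectly correlated features return +1, perfectly anticorrelated features return −1, which is what the high/medium/low relevance bands in DRB-FS are thresholded against.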
GR-χ 2 is an ensemble ranking method that simply adds together the computed ranks of the gain ratio and chi-squared methods [29].
The Borda method is a position-based, ensemble-scoring mechanism that aggregates the ranking results of features from multiple FS techniques [15]. The final rank of a feature j is computed as follows:

rank(j) = Σ_{i=1}^{n} pos(i, j)

where n represents the total number of FS techniques and pos(i, j) is the position of feature j in the ranking produced by the i-th FS technique. EMFFS is an ensemble FS method that combines the output of four filter methods, namely, IG, gain ratio, chi-squared, and ReliefF, in order to obtain an optimum selection [18].
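Under this positional reading, Borda aggregation is a short loop. The sketch below uses three hypothetical rankings; lower position totals mean a better final rank.

```python
# Borda aggregation: a feature's score is the sum of its positions
# pos(i, j) across the n FS techniques; sort ascending for the final ranking.

def borda_rank(rankings):
    """rankings[i]: feature names ordered best-first by FS technique i."""
    totals = {}
    for ranking in rankings:
        for pos, feat in enumerate(ranking, start=1):
            totals[feat] = totals.get(feat, 0) + pos
    return sorted(totals, key=lambda f: totals[f])

rankings = [
    ["f1", "f2", "f3"],   # e.g. an IG ranking
    ["f2", "f1", "f3"],   # e.g. a chi-squared ranking
    ["f1", "f3", "f2"],   # e.g. a gain-ratio ranking
]
final = borda_rank(rankings)
```

Here f1 totals 1 + 2 + 1 = 4, f2 totals 6, and f3 totals 8, so f1 wins the aggregate ranking even though it is not first in every base list.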
Statistical measures for evaluating the performance of the proposed univariate ensemble-based feature selection methodology. In this study, precision, recall, f-measure, and the percentage of correct classification were used as the evaluation criteria for FS accuracy [8,12,15,18,29,40]; processing speed was used to assess computational cost; and a 10-fold cross-validation technique was used for computing the predictive accuracy of the machine learning methods or schemes [8,12,18,41-43].
In order to compute the statistical measures (i.e., precision, recall, f-measure, and the percentage of correct classification), the following four measures were required:
• True positives (TP) represents the correctly predicted positive values (actual class = yes, predicted class = yes)
• True negatives (TN) represents the correctly predicted negative values (actual class = no, predicted class = no)
• False positives (FP) represents a contradiction between the actual and predicted classes (actual class = no, predicted class = yes)
• False negatives (FN) represents a different contradiction between the actual and predicted classes (actual class = yes, predicted class = no)
Joshi [44] defined these measures as follows: "Accuracy is a ratio of correctly predicted observations to the total observations," which is computed as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

"Precision is the ratio of correctly predicted positive observations to the total predicted positive observations," which is computed as follows:

Precision = TP / (TP + FP)

"Recall is the ratio of correctly predicted positive observations to all observations in the actual class-yes," which is computed as follows:

Recall = TP / (TP + FN)

"F-measure is the weighted average of Precision and Recall," which is computed as follows:

F-measure = (2 × Precision × Recall) / (Precision + Recall)
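As an illustration, the four counts and the quoted definitions translate directly into code; the function name and the example counts below are our own, not from the study.

```python
# Standard classification metrics from a binary confusion matrix.

def metrics(tp, tn, fp, fn):
    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure

# Hypothetical counts: 8 TP, 5 TN, 2 FP, 5 FN.
acc, prec, rec, f1 = metrics(8, 5, 2, 5)
```

Note that the f-measure is the harmonic mean of precision and recall, so it is pulled toward whichever of the two is lower, which is why it is reported alongside raw accuracy throughout the evaluation.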

Experimental results of the threshold value selection algorithm
This section demonstrates the results of the proposed TVS algorithm. The purpose is to interpret as well as comment on the results obtained from the experiments. Table 1 presents the predictive accuracies of the eight datasets (i.e., Cylinder-bands, Diabetes, Letter, Sonar, Waveform, Vehicle, Glass, and Arrhythmia) against the five classifiers (naive Bayes, J48, kNN, JRip, and SVM) with threshold values varying from 100 to 5. In this table, predictive accuracies are recorded as percentages, determined by the 10-fold cross-validation technique, and each threshold value represents the percentage of features retained. After recording the predictive accuracies, the average predictive accuracy of all classifiers as well as all datasets against each threshold value was computed, as shown in Fig 4. This figure depicts the summarized effects of different threshold values on the predictive accuracy of the datasets noted in Table 1.
Furthermore, predictive accuracies using the training examples of the aforementioned eight datasets were also recorded against the same five classifiers with threshold values varying from 100 to 5. After recording the predictive accuracies, again, the average predictive accuracy of all classifiers as well as all datasets against each threshold value was computed, as shown in Fig 5. It can be observed from Figs 4 and 5 that the average predictive accuracy remained consistent from the 100% feature set retained (i.e., no FS) down to 45% features retained. After reducing the dataset below 45% retained features toward 5% retained features, the predictive accuracy started to decline. Therefore, a threshold value of 45 was selected; that is, the top 45% of ranked features were retained. This chunk value (i.e., 45%) was utilized in the experimentation for evaluating the uEFS methodology, which provided the best results. This value can also be used to cut off the irrelevant data in future datasets, as it is comparable to values obtained in other studies, for example, 40% [12,29] and 50% [45].
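The "consistent until 45%, then declining" selection rule can be expressed as a small curve-inspection helper. The accuracy values and the one-point tolerance below are hypothetical placeholders shaped like the Fig 4 averages, not the paper's actual numbers.

```python
# Pick the smallest retained-feature percentage whose average accuracy stays
# within `tolerance` points of the full feature set (100% retained).

def pick_threshold(avg_accuracy, tolerance=1.0):
    """avg_accuracy: {pct_features_retained: mean accuracy across
    classifiers and datasets}."""
    baseline = avg_accuracy[100]
    candidates = [pct for pct, acc in avg_accuracy.items()
                  if acc >= baseline - tolerance]
    return min(candidates)

# Hypothetical curve: flat from 100% down to 45% retained, then declining.
curve = {100: 82.0, 80: 81.8, 60: 81.5, 45: 81.3, 30: 78.0, 15: 72.0, 5: 60.0}
threshold = pick_threshold(curve)
```

On this toy curve the helper returns 45, mirroring the empirical finding; loosening the tolerance trades accuracy for a smaller feature set.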

Evaluation of the univariate ensemble-based feature selection methodology
The evaluation phase of any methodology has a key role in investigating the worth of any proposed method. This section covers the experimental setup as well as execution to evaluate the proposed uEFS methodology with state-of-the-art FS methods. The purpose was to check the impact of the proposed methodology on FS suitability in terms of features' ranking according to the precision, recall, f-measure, and predictive accuracy performance measure factors.

Experimental setup
For a holistic understanding, two studies were performed to evaluate the uEFS methodology involving nontext and text benchmark datasets. In each study, the methodology was compared with the state-of-the-art FS methods using the precision, recall, f-measure, and predictive accuracy performance measures. The motivation behind comparing the results achieved with the text and nontext datasets was to check the scalability of the proposed uEFS methodology from small- to high-dimensional data, where dimension represents the number of attributes or features.
For the first study, eight nontext benchmark datasets of varying complexity (i.e., small to medium size and binary to multiclass problems) were chosen, including Cylinder-bands, Diabetes, Letter, Sonar, Waveform, Vehicle, Glass, and Arrhythmia, as shown in Table 2. These datasets were collected from the OpenML repository available at http://www.openml.org/.
For the second study, the following four text datasets of varying complexity were selected: MiniNewsGroups (http://kdd.ics.uci.edu/databases/20newsgroups/20newsgroups.html), Course-Cotrain (http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-51/www/co-training/data/course-cotrain-data.tar.gz), Trec05p-1 (https://plg.uwaterloo.ca/gvcormac/treccorpus/), and SpamAssassin (http://csmining.org/index.php/spam-assassin-datasets.html). These datasets are in text form and, to apply the feature-ranking algorithms to them, the text data must first be preprocessed into a structured form. In order to perform text preprocessing, the following tasks were completed:
6. Eliminate the low-length terms whose length is less than or equal to 2
7. Finally, generate the feature vectors representing document instances by computing the Term Frequency-Inverse Document Frequency weights.
Table 3 shows the characteristics of the structured form of the text datasets. These datasets also have varying complexity (i.e., small to medium size and binary to multiclass problems).
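Tasks 6 and 7 can be sketched with a plain-Python TF-IDF. This fragment assumes whitespace tokenization and the raw tf × log(N/df) weighting; the earlier preprocessing tasks (case folding aside, e.g., stop-word removal and stemming) are omitted, and the toy documents are hypothetical.

```python
# Minimal TF-IDF sketch: drop short terms (task 6), then weight each
# remaining term by term frequency times inverse document frequency (task 7).
import math
from collections import Counter

def preprocess(doc, min_len=3):
    """Lowercase, split on whitespace, and drop terms of length <= 2."""
    return [t for t in doc.lower().split() if len(t) >= min_len]

def tfidf_vectors(docs):
    """One sparse {term: tf * log(N / df)} dict per document."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(docs)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))            # document frequency per term
    return [{t: tf[t] * math.log(n_docs / df[t]) for t in tf}
            for tf in (Counter(toks) for toks in tokenized)]

docs = ["spam offer now", "meeting agenda for the team", "spam spam deal"]
vecs = tfidf_vectors(docs)
```

Terms that appear in every document get weight log(1) = 0, so the feature vectors naturally emphasize discriminative vocabulary, which is what the ranking algorithms are then applied to.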
To select a suitable classifier for assessing the proposed uEFS methodology, five well-known classifiers were initially used: naive Bayes, J48, kNN, JRip, and SVM [8,12,15,18,29,40,45,46]. Using each classifier, the predictive accuracy was measured with the percentage of features retained varying from 100 to 5, as illustrated in Fig 6. The pictorial results show that, of the five classifiers, SVM and kNN tended to perform best on the above-mentioned datasets. Fig 6 shows the four datasets, namely Cylinder-bands, Diabetes, Waveform, and Arrhythmia, on which SVM performed better. Likewise, Fig 6 shows the three datasets (Letter, Sonar, and Glass) on which kNN performed best. In recent years, the SVM classifier has been considered a dominant tool for dealing with classification problems in a wide range of applications [45] and is largely preferred over other classification methods [46].
In view of the Fig 6 results and state-of-the-art classifier considerations, the SVM classifier was finally selected to assess the proposed uEFS methodology, as it tended to yield the best f-measures and predictive accuracies on the benchmark datasets [29,45]. Further, the SMOreg function (SVM with sequential minimal optimization) of the SVM classifier was used, which is an improved version of the SVM [47]. Table 4 shows the parameters of the selected classifier. For comparison purposes, a standard open-source implementation of this classifier was utilized, as provided by the Waikato Environment for Knowledge Analysis (WEKA) available at http://weka.sourceforge.net/doc.dev/. Using this open-source implementation, a method was written in the Java language that computes the precision, recall, f-measure, and predictive accuracy of the classifier using the 10-fold cross-validation technique.
Finally, to compare the computational cost, the processing speed of the proposed methodology as well as of the state-of-the-art methods was measured on a system with the following specifications:

Experimental execution
For the first study, a comparison was made between the proposed uEFS methodology and the aforementioned five univariate filter measures, which were used for the proof-of-concept. For comparison purposes, the computed precision and recall values were also used, as recorded in Tables 5 and 6. The results of these two tables also reveal that the proposed methodology provides better results. The proposed uEFS methodology yields significantly better precision and recall than all existing feature selection measures on all nontext datasets except Glass. In the recall comparison, the closest competitors to the uEFS methodology were the IG, gain ratio, and symmetric uncertainty measures, which achieved a similar recall of 0.869 on the Waveform dataset. Regarding the other datasets, the existing measures achieved a much lower recall as compared with the uEFS. Similarly, with respect to the precision comparison, chi-squared and symmetric uncertainty remained the closest competitors to the uEFS for the Glass dataset. For the rest of the datasets, the uEFS outperformed the existing FS measures by a significant margin.
A comparison was also made between the predictive accuracies of the uEFS methodology and the five aforementioned univariate filter measures. Table 7 compares the predictive accuracy of the uEFS methodology with that of the five FS measures used within it. The Table 7 results show that the proposed methodology provides competitive results as compared with existing FS measures. Similarly, the results shown in Fig 7 and Tables 5, 6, and 7 show that, in terms of f-measure, precision, recall, and predictive accuracy, the proposed methodology did not perform better than existing FS measures on the Glass dataset, owing to its small data size, multiple classes, and imbalanced class characteristics. The results of the one-sample t-test and the paired-samples t-test are also reported in Table 7. The purpose of performing these tests was to determine whether the values obtained from the proposed uEFS methodology were significantly different from the values obtained from the existing FS measures. For the one-sample t-test, the hypotheses were:
• H0: x̄ = 81.11 ("the mean predictive accuracy of the sample x̄ is equal to 81.11")
• H1: x̄ ≠ 81.11 ("the mean predictive accuracy of the sample x̄ is not equal to 81.11")
In this case, the mean FS-measure score for the Cylinder-bands dataset (M = 80.22, SD = 0.28) was lower than the uEFS score of 81.11, with a statistically significant mean difference of 0.89 (95% confidence interval: 0.54 to 1.23, t(4) = −7.141, p = .002). Since p < .05, we rejected H0 (that the mean predictive accuracy of the sample x̄ equals 81.11) and concluded that the mean predictive accuracy of the sample is significantly different from the existing methodologies' results. It can be observed from Table 7 that most of the significance (i.e., p) values are less than 0.05 (i.e., p < .05), which shows that the proposed uEFS methodology's results are statistically significantly different from the results of the existing methodologies.
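The one-sample t statistic above can be reproduced directly from the reported summary statistics (M = 80.22, SD = 0.28, n = 5 filter measures, test value 81.11); a minimal sketch, with the caveat that the rounded M and SD give a value close to, but not exactly, the reported t(4) = −7.141:

```python
import math

def one_sample_t(mean, sd, n, mu0):
    """t statistic for a one-sample t-test computed from summary
    statistics: t = (mean - mu0) / (sd / sqrt(n)), with df = n - 1."""
    return (mean - mu0) / (sd / math.sqrt(n))

# Reported summary statistics for the Cylinder-bands dataset
t = one_sample_t(80.22, 0.28, 5, 81.11)  # close to the reported -7.141
```

The resulting statistic is compared against the t distribution with df = 4 to obtain the p-value.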
Similarly, the paired-samples t-test was also performed to analyze the significance of the proposed methodology. Table 8 reports the paired-samples t-test results. It can also be observed from Table 8 that both significance (i.e., p) values (one-tailed and two-tailed) are less than 0.05 (i.e., p < .05), which shows that the proposed uEFS methodology's results are statistically significantly different from the existing methodologies' results.
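The paired-samples test pairs the two methodologies' scores per dataset and tests whether the mean of the differences is zero. A minimal sketch (the accuracy values below are hypothetical placeholders, not the paper's measurements):

```python
import math

def paired_t(xs, ys):
    """Paired-samples t statistic over per-dataset score pairs,
    with df = len(xs) - 1."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)

# Hypothetical per-dataset predictive accuracies (uEFS vs. a baseline)
uefs = [81.1, 90.2, 76.4, 88.0, 69.5]
baseline = [80.2, 89.1, 75.9, 86.8, 69.0]
t = paired_t(uefs, baseline)
```

A positive t with p < .05 would indicate that the first method's accuracies are significantly higher than the second's across the paired datasets.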
To evaluate the computational cost of the proposed FS methodology, the performance speed was also measured, as shown in Table 9. The results indicate that, on average, the proposed methodology takes 0.37 seconds longer than the state-of-the-art filter measures.
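Such performance-speed comparisons amount to averaging the wall-clock time of each selection routine over repeated runs; a minimal sketch of that measurement harness (the `select_fn` stand-in is hypothetical):

```python
import time

def mean_runtime(select_fn, data, repeats=5):
    """Average wall-clock time, in seconds, of repeated calls to a
    feature-selection routine on the same data."""
    start = time.perf_counter()
    for _ in range(repeats):
        select_fn(data)
    return (time.perf_counter() - start) / repeats

# Hypothetical stand-in for a feature-selection routine
elapsed = mean_runtime(lambda rows: sorted(rows), list(range(10000)))
```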
The proposed FS methodology was also compared with traditional well-known FS methods (i.e., OneR and ReliefF), as illustrated in Table 10. The results of Table 10 show that the proposed methodology provides competitive results as compared with existing FS methods.
Finally, for the first study, a comparison of the proposed uEFS methodology with two state-of-the-art ensemble methods, namely Borda and EMFFS [15,18], was performed. A methodological comparison of these two methods with the proposed uEFS methodology is illustrated in Table 11. For the proof-of-concept as well as the aforementioned comparisons, five filter measures were used; however, to compare the proposed uEFS methodology with these two state-of-the-art ensemble methods, the three [15] and four [18] filter measures defined in each method were used, respectively, as listed in Table 11. After applying the ensemble-based Borda and EMFFS methods, the predictive accuracy and f-measure of the proposed uEFS methodology, using three and four filter measures, respectively, were computed, as shown in Tables 12 and 13. The results in Tables 12 and 13 reveal that the proposed methodology provides better results than the two state-of-the-art ensemble methods [15,18]. It can be observed from these results that, in terms of predictive accuracy and f-measure, the performance of the proposed methodology matches that of the state-of-the-art ensemble methods on the Letter dataset, while the proposed methodology did not outperform the EMFFS method on the Arrhythmia dataset, owing to its small data size, multiple classes, and imbalanced class characteristics.
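For intuition on the Borda baseline, the core of Borda-count ensemble feature selection is a positional rank aggregation: a feature ranked r-th (0-based) in a list of m features earns m − r points from that ranker, and features are reordered by total points. A minimal sketch, with hypothetical feature names and rankings from three filter measures:

```python
def borda_aggregate(rankings):
    """Aggregate several ranked feature lists by Borda count: a feature
    at 0-based position r in a list of length m earns m - r points.
    Returns features sorted by total points, best first."""
    scores = {}
    for ranking in rankings:
        m = len(ranking)
        for r, feat in enumerate(ranking):
            scores[feat] = scores.get(feat, 0) + (m - r)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical rankings of four features by three filter measures
rankings = [["f1", "f2", "f3", "f4"],
            ["f2", "f1", "f4", "f3"],
            ["f1", "f3", "f2", "f4"]]
order = borda_aggregate(rankings)  # "f1" collects the most points
```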
For the second study, a comparison of the proposed uEFS methodology with state-of-the-art FS methodologies was performed. The proposed methodology outperforms most of the existing algorithms and individual FS measures in terms of f-measure as well as predictive accuracy. In terms of average classifier precision, shown in Table 14, the uEFS methodology outperformed the existing algorithms with a precision of 0.858. Similarly, the uEFS achieved an average precision of 0.669 on the Course-Cotrain data, which is within 0.004 of the Relief algorithm, which achieved the highest precision among the existing algorithms. On the other hand, when comparing the average classifier recall, shown in Table 15, it was observed that the proposed uEFS methodology outperforms all of the existing algorithms with recalls of 0.850 and 0.864 on the Trec05p-1 and SpamAssassin benchmarks, respectively.
It can also be observed from the results in Tables 14 and 15 that, in terms of precision and recall, the proposed methodology did not perform better than the DRB-FS measure on some datasets, because for proof-of-concept purposes only measures that assess feature relevancy, while ignoring feature redundancy, were considered. As the DRB-FS measure eliminates all irrelevant as well as redundant features and is also based on predefined domain-specific definitions of feature relevance [29,39], it can produce better results than the proposed methodology in such cases. However, in terms of f-measure, which is the weighted average of precision and recall, the proposed methodology overall performs better than the DRB-FS measure, as shown in Fig 8. The uEFS methodology was evaluated rigorously on text and nontext benchmark datasets ranging from small- to high-dimensional data sizes and provides competitive results as compared with state-of-the-art FS methods, which indicates that the proposed ensemble approach is robust across text and nontext datasets. The above results also provide evidence that the uEFS methodology is stable, producing a similar and most likely higher degree of predictive accuracy and f-measure across a wide variety of datasets.

Conclusions and future directions
FS is an active area of research for the data mining and text mining research community. In this study, we introduced an efficient and comprehensive uEFS methodology to select informative features from a given dataset. For the uEFS methodology, we first proposed an innovative UFS algorithm to generate a final ranked list of features without using any learning algorithm, incurring high computational costs, or inheriting the individual statistical biases of state-of-the-art feature-ranking methods. For defining a cutoff point to remove irrelevant features, we then proposed a TVS algorithm. An extensive experiment was performed to evaluate the uEFS methodology using standard benchmark datasets; the results show that the uEFS methodology provides competitive accuracy as compared with state-of-the-art methods. The proposed uEFS methodology contributes to FS, which is a key step in decision support systems. It can be utilized in real-world applications such as DDKAT [19] to assist the domain expert in selecting informative features for generating production rules from a dataset, or in extracting relevant information from open data for constructing reliable domain knowledge. The current version of the UFS code and its documentation are freely available and can be downloaded from the GitHub open-source platform [20,21]. Currently, the proposed methodology incorporates state-of-the-art univariate filter measures that address the relevance aspect of feature ranking, while ignoring the redundancy aspect. In the future, we will extend our methodology to incorporate multivariate measures that consider the redundancy aspect of feature subset selection. Similarly, the proposed methodology does not evaluate the suitability of a measure or its precision; to address this factor, we will also investigate the application of fuzzy logic for determining the cutoff threshold value.
Lastly, the proposed methodology was applied to text and nontext benchmark datasets to evaluate model performance. In the future, we will experiment with the proposed uEFS methodology in other application domains, such as microarray datasets, to assess its generality. Above all, we also intend to integrate the proposed methodology into another research project, the Intelligent Medical Platform (IMP), available at http://imprc.cafe24.com/.