Correlation-Based Weight Adjusted Naive Bayes

Naive Bayes (NB) is an extremely simple and remarkably effective approach to classification learning, but its conditional independence assumption rarely holds in real-world applications. Attribute weighting is a flexible way to improve NB by discriminatively assigning each attribute a different weight. Attribute weighting approaches fall into two broad categories: filters and wrappers. Wrappers achieve a bigger boost in classification accuracy than filters, but their time complexity is much higher. To reduce the time complexity of a wrapper, a filter can be used to optimize the initial weights of all attributes as a preprocessing step. This paper therefore proposes a hybrid attribute weighting approach, and the improved model is called correlation-based weight adjusted naive Bayes (CWANB). In CWANB, a correlation-based attribute weighting filter initializes the attribute weights, and each weight is then optimized by an attribute weight adjustment wrapper whose objective function is designed around dynamic adjustment of the attribute weights. Extensive experimental results show that CWANB outperforms NB and other existing state-of-the-art attribute weighting approaches in terms of classification accuracy. Meanwhile, compared with the existing wrapper, CWANB reduces the time complexity dramatically.


I. INTRODUCTION
Bayesian networks (BN) are often used for classification. However, related studies confirm that learning the optimal BN structure from an arbitrary BN search space is a non-deterministic polynomial-time hard (NP-hard) problem [1]-[3].
Naive Bayes (NB) is the simplest form of Bayesian network classifier [4]. Nevertheless, its predictive performance can be competitive with state-of-the-art classifiers for data mining applications thanks to its simplicity, efficiency, and efficacy [5]-[7]. NB classifies a test instance under the attribute independence assumption, namely that all attributes are independent given the class label [8]-[10]. Assume that $A_1, A_2, \cdots, A_m$ are $m$ attributes and that a test instance $x$ is represented by an attribute value vector $\langle a_1, a_2, \cdots, a_m \rangle$; NB uses Equation 1 to predict the class label of $x$:
$$c(x) = \arg\max_{c \in C} P(c) \prod_{j=1}^{m} P(a_j \mid c), \qquad (1)$$

where $a_j$ is the value of the $j$th attribute $A_j$, $c(x)$ is the class label of the test instance $x$ predicted by NB, and $C$ is the collection of all possible class labels $c$. NB is still one of the top 10 data mining algorithms, but its attribute independence assumption is often violated in practice, so its probability estimates are often suboptimal [11]-[13]. Many enhancements to NB have been proposed to reduce the inaccuracies that result from the conditional independence assumption [14], [15]. Such enhancements fall into five categories: 1) instance selection, 2) instance weighting, 3) structure extension, 4) attribute selection, and 5) attribute weighting. Among these five categories, attribute weighting is a flexible and effective method to reduce the effects of the conditional independence assumption by discriminatively assigning each attribute a different weight.

Like attribute selection approaches [16], [17], attribute weighting approaches can be broadly divided into filters and wrappers according to whether the target classifier is used to obtain the attribute weights. Filters use general characteristics of the data to calculate attribute weights as a preprocessing step: the weights are computed directly before the target classifier is run, and each attribute weight is proportional to the attribute's predictive capability for the class. Rather than calculating the attribute weights from measures of predictiveness as filters do, wrappers optimize the attribute weights to improve the prediction performance of the weighted classifier as a whole. Wrappers are hypothesis driven: they use prediction accuracy estimates obtained from the actual target classifier itself to evaluate attribute weights. So, in most cases, filters calculate weights faster than wrappers, but wrappers deliver better prediction performance than filters.
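To make the decision rule of Equation 1 concrete, here is a minimal sketch of an NB predictor, assuming discrete attributes and pre-computed probability tables; the table layout and names are illustrative, not a specific library API.

```python
import math

def nb_predict(x, classes, prior, cond):
    """Predict the class of one instance with Equation 1.

    x       -- attribute value vector (a_1, ..., a_m)
    classes -- the collection C of possible class labels
    prior   -- prior[c] = P(c)
    cond    -- cond[(j, a_j, c)] = P(a_j | c)
    """
    best_c, best_score = None, -math.inf
    for c in classes:
        # Sum log probabilities instead of multiplying raw ones,
        # which avoids floating-point underflow for large m.
        score = math.log(prior[c])
        for j, a_j in enumerate(x):
            score += math.log(cond[(j, a_j, c)])
        if score > best_score:
            best_c, best_score = c, score
    return best_c
```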
Attribute weighting approaches usually improve classification accuracy at the expense of the time complexity of the final model [18]-[21]. To reduce the time complexity of wrappers, it is interesting to study whether better performance can be achieved by a hybrid attribute weighting method in which a filter optimizes the initial weights of all attributes as a preprocessing step for a wrapper. To the best of our knowledge, essentially all existing attribute weighting approaches are either filters or wrappers [22], [23]. Few studies have paid attention to a hybrid attribute weighting method that combines an attribute weighting filter with an attribute weighting wrapper. The hybrid attribute weighting method is a new paradigm of attribute weighting.
In this paper, we focus our attention on attribute weighting and propose a new paradigm, the hybrid attribute weighting approach. We call the improved model correlation-based weight adjusted naive Bayes (CWANB). The CWANB approach accelerates the search, maintains the simplicity of the final model, and still guarantees high classification accuracy. First, we use the correlation-based attribute weighting filter for NB to initialize the weight vector: each weight is initialized to a sigmoid transformation of the difference between the attribute-class correlation and the average attribute-attribute inter-correlation. Then, to pursue higher classification accuracy, we use the attribute weight adjustment wrapper, which is based on an objective function, to adjust the weight vector; each weight is updated by the wrapper method to find the optimal weight vector. Extensive experiments are conducted to compare the CWANB approach with NB and state-of-the-art attribute weighting approaches on a collection of 36 benchmark datasets in terms of classification accuracy and elapsed training time. The results show that the CWANB approach performs best among its competitors in terms of classification accuracy. At the same time, compared with the existing wrapper, the CWANB approach is more efficient.
The work presented in this paper is an extension of our previously published conference paper [24]. Compared to that paper, the main contributions and changes can be briefly summarized as follows: 1) In our previous conference paper, we proposed the weight adjusted naive Bayes approach (WANB), which is a wrapper. In this paper, we review the related work on attribute weighted NB and find that all existing attribute weighting approaches are either filters or wrappers. 2) In this paper, the CWANB approach is a hybrid attribute weighting approach that combines an attribute weighting filter with an attribute weighting wrapper. First, it uses the correlation-based attribute weighting filter to initialize the attribute weights. Then, the attribute weights are continually adjusted and optimized by the attribute weight adjustment wrapper. This is a new paradigm of attribute weighting. 3) Although the basic idea of the attribute weight adjustment is the same as in the conference version, the weight initialization technique in the CWANB approach is totally different from our previous work. Whereas the WANB approach initializes each attribute weight to 1, the CWANB approach uses a correlation-based attribute weighting filter to initialize each attribute weight. Accordingly, the results presented in this paper are all new.
The rest of the paper is organized as follows. Section II provides a formal description of attribute weighted NB and a compact survey of existing attribute weighting approaches, including filters and wrappers. Section III proposes the CWANB approach. Section IV describes the experimental setup and presents the experimental results in detail. Section V gives our conclusions and presents directions for future research.

II. RELATED WORK
Many attribute weighting approaches have been proposed to improve NB by relaxing its conditional independence assumption [25], [26]. In real-world learning applications, different attributes play different roles in classification problems, so attribute weighting can potentially improve classification performance by assigning different weights to different attributes. Attribute weighting aims to weight each attribute according to its predictive capability for the classification [27]: it increases the weights of highly predictive attributes and discounts the weights of weakly predictive ones. Thus, each attribute weight $w_j$ is incorporated into the NB formula to represent the importance of the $j$th attribute $A_j$, as in Equation 2:

$$c(x) = \arg\max_{c \in C} P(c) \prod_{j=1}^{m} P(a_j \mid c)^{w_j}, \qquad (2)$$

where $w_j$ is the weight of the $j$th attribute $A_j$. The only remaining question is how to learn the attribute weights. In this section, we review existing research on attribute weighting and divide it into two categories: filters, which calculate each attribute weight as a preprocessing step, and wrappers, which use the target classifier itself to obtain the attribute weights.
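Reusing the probability tables from the earlier sketch, the only change Equation 2 introduces is the per-attribute exponent $w_j$, which becomes a multiplier in log space; again a sketch with illustrative names.

```python
import math

def wnb_predict(x, classes, prior, cond, w):
    """Predict with Equation 2: raising P(a_j | c) to the power
    w_j becomes the weighted sum w[j] * log P(a_j | c) in log space."""
    best_c, best_score = None, -math.inf
    for c in classes:
        score = math.log(prior[c])
        for j, a_j in enumerate(x):
            score += w[j] * math.log(cond[(j, a_j, c)])
        if score > best_score:
            best_c, best_score = c, score
    return best_c
```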

A. ATTRIBUTE WEIGHTING FILTERS
Filters calculate attribute weights quickly and achieve low time complexity. There has been a growing trend toward using attribute weighting filters to alleviate the effects of violations of the conditional independence assumption. Filters are data driven: the attribute weights are predetermined by heuristic measures of the data. They assign each attribute a weight that estimates how important the attribute is, placing more emphasis on highly predictive attributes than on less predictive ones. Filters often choose information-theoretic methods to calculate attribute weight values because of their strong theoretical background.
Zhang and Sheng proposed gain ratio-based attribute weighted naive Bayes (GRAWNB) [28]. GRAWNB is a filter method that learns attribute weights to produce accurate rankings, using the gain ratio to calculate the weight of each attribute. It assumes that the gain ratio measures an attribute's weight: an attribute with a higher gain ratio deserves a larger weight in a weighted NB, and vice versa. The weight of each attribute is proportional to its information gain ratio.
Decision tree-based attribute weighted naive Bayes (DTAWNB) was proposed by Hall [25]. This method calculates the weight of each attribute by building unpruned decision trees. The weight assigned to each attribute is inversely related to its degree of dependency on other attributes. If an attribute does not appear in the constructed decision trees, its weight is set to zero. The DTAWNB approach uses a bagging procedure to stabilize the estimated weights by constructing multiple unpruned decision trees and then averaging the weights across the ensemble.
Kullback-Leibler measure-based attribute weighted naive Bayes (KLMAWNB), proposed by Lee et al., is a filter method that calculates the attribute weights to enhance NB using the Kullback-Leibler measure [29]. It calculates each attribute weight based on the amount of information the attribute gives about the class variable; this amount of information is measured by the Kullback-Leibler measure and reflects the importance of the attribute.
Naive Bayes with deep attribute weighting (DAWNB) was proposed by Jiang et al. as a filter method that improves NB with deep attribute weighting [27]. Correlation-based attribute selection is employed to select the best attribute subset [30]. The weights of the selected attributes in the best subset are set to 2, while the weights of the unselected attributes are set to 1. Note that the learned attribute weights are incorporated not only into the NB formula but also into the conditional probability estimates.
Jiang et al. proposed correlation-based attribute weighted naive Bayes (CAWNB) [31]. The weight of each attribute is defined directly as the difference between its mutual relevance and its average mutual redundancy. The mutual relevance reflects the correlation between the attribute and the class, while the average mutual redundancy reflects the redundancy among attributes. Mutual information is used to calculate both the relevance and the redundancy.
It can be seen that all of the above attribute weighting filters weight attributes according to their predictive capability for the classification and calculate appropriate weights as a preprocessing step. The primary value of appropriate weights is their capacity to reduce the error that results from violations of the conditional independence assumption of NB.

B. ATTRIBUTE WEIGHTING WRAPPERS
Rather than calculating weights from measures of predictiveness, wrappers implicitly take the inductive bias of the target classifier into account. Wrappers use a search algorithm to propose attribute weights and then evaluate those weights by running the target classifier on the weighted NB. The classification accuracy of the target classifier is used directly as the evaluation criterion for adjusting the attribute weights, so wrappers usually achieve higher classification accuracy than filters. However, to find the optimal weight vector, wrappers have to construct the target classifier many times to evaluate candidate weights. Consequently, wrappers often have high time complexity, which makes them unsuitable for real-time processing of massive data.
Wu and Cai proposed differential evolution-based attribute weighted naive Bayes (DEAWNB) [32], where a differential evolution algorithm is used to search for the optimal attribute weights [33]-[35]. The evolutionary computation process is a complicated dynamic process that goes through three main steps: mutation, crossover, and selection. First, a population of attribute weight vectors is randomly initialized with values between 0 and 1. Then mutation is performed on the current weights, and a fitness function determines whether the current weight vector can be replaced by the mutated one. The DEAWNB approach employs a greedy selection strategy: if the mutated individual has better fitness than the target individual, the mutated individual is kept for the next iteration, and vice versa. Unlike filters with pre-computed weights, DEAWNB estimates the weights without any prior knowledge of them. The experimental results for the DEAWNB approach show that its classification accuracy is much higher than that of other state-of-the-art weighted NB algorithms.
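For intuition, here is a minimal DE/rand/1/bin sketch of the kind of search DEAWNB performs; the population size, control parameters, and generation count are illustrative assumptions, not the settings of [32].

```python
import random

def de_search_weights(m, fitness, pop_size=20, F=0.5, CR=0.9, gens=50):
    """Minimal differential evolution over attribute weight vectors.
    fitness(w) is assumed to return the classification accuracy of
    the weighted NB built with weight vector w."""
    # A population of weight vectors randomly initialized in [0, 1].
    pop = [[random.random() for _ in range(m)] for _ in range(pop_size)]
    fit = [fitness(w) for w in pop]
    for _ in range(gens):
        for i in range(pop_size):
            # Mutation: combine three distinct random individuals.
            r1, r2, r3 = random.sample(
                [k for k in range(pop_size) if k != i], 3)
            mutant = [pop[r1][j] + F * (pop[r2][j] - pop[r3][j])
                      for j in range(m)]
            # Crossover: mix target and mutant, attribute by attribute.
            jrand = random.randrange(m)
            trial = [mutant[j] if (random.random() < CR or j == jrand)
                     else pop[i][j] for j in range(m)]
            # Greedy selection: keep the trial only if it is fitter.
            f_trial = fitness(trial)
            if f_trial >= fit[i]:
                pop[i], fit[i] = trial, f_trial
    best = max(range(pop_size), key=lambda i: fit[i])
    return pop[best]
```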
Zaidi et al. proposed an attribute weighted naive Bayes approach to alleviate the attribute independence assumption, called WANBIA [36]. Their work confirms the observation that appropriate weights in weighted NB can reduce the error resulting from violations of the attribute independence assumption and efficiently improve classification precision and performance. The WANBIA approach searches for the best attribute weights using gradient descent. The essence of the approach is to reduce the weight changes step by step through gradient descent until the minimal error or the best accuracy is attained. The attribute weights are evaluated with a discriminative objective function, either the mean squared error (MSE) or the negative conditional log likelihood (CLL). When the MSE objective is employed, the attribute weights that minimize it are taken as the optimized values; conversely, when the CLL objective is employed, the best attribute weights are those that maximize the CLL. Extensive evaluations show that the WANBIA approach achieves a big boost in classification accuracy.
It can be seen that attribute weighting wrappers optimize the weights to improve prediction performance. Wrappers can achieve higher classification accuracy than filters, but this always comes at the expense of time complexity. Wrappers need not only a search algorithm to propose attribute weights but also an objective function to evaluate the search results. The time complexity of wrappers is always much higher than that of filters, but their classification performance is also consistently better.

III. CORRELATION-BASED WEIGHT ADJUSTED NAIVE BAYES
Filters calculate attribute weights faster than wrappers, but they yield lower classification accuracy. Wrappers optimize the attribute weights to improve the prediction accuracy of the attribute weighted classifier as a whole. They achieve a bigger boost in classification accuracy than filters, but their time complexity is much higher. The time complexity of a wrapper is inversely proportional to the convergence speed of its search and evaluation.
To our knowledge, all existing attribute weighting approaches are either filters or wrappers. It is well known that the difficulty with wrappers is how to accelerate the convergence of the search and evaluation, and an effective means is to reduce the search region. To reduce the time complexity of a wrapper, a weight initialization technique can be chosen that narrows down the range of attribute weights to search. Therefore, to improve the convergence speed, an attribute weighting filter can be used to optimize the initial attribute weights as a preprocessing step for a wrapper. This is the hybrid attribute weighting method proposed in this paper: a new paradigm of attribute weighting that combines an attribute weighting filter with an attribute weighting wrapper.
In this paper, we propose a new paradigm, the hybrid attribute weighting approach, and the resulting model is called correlation-based weight adjusted naive Bayes (CWANB). In CWANB, the attribute weights are initialized by the correlation-based attribute weighting filter. Optimal attribute weights are then obtained by the wrapper, which defines a multiplication factor set and designs an objective function based on dynamic adjustment of the attribute weights. Although the basic idea of the attribute weight adjustment is the same as in our conference version, the weight initialization technique is totally different. For fairness, the base probabilities $P(c)$ and $P(a_j \mid c)$ are also estimated using the m-estimate. The prior probability $P(c)$ and the conditional probability $P(a_j \mid c)$ are computed by Equations 3 and 4, respectively:

$$P(c) = \frac{\sum_{i=1}^{n} \delta(c_i, c) + 1}{n + q}, \qquad (3)$$

$$P(a_j \mid c) = \frac{\sum_{i=1}^{n} \delta(a_{ij}, a_j)\,\delta(c_i, c) + 1}{\sum_{i=1}^{n} \delta(c_i, c) + n_j}, \qquad (4)$$

where $n$ is the number of training instances, $q$ is the number of classes, $n_j$ is the number of values of the $j$th attribute, $c_i$ is the class of the $i$th training instance, $a_{ij}$ is the $j$th attribute value of the $i$th training instance, and $\delta(\cdot)$ is a binary function that is one if its two parameters are identical and zero otherwise.
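For illustration, a minimal sketch of the m-estimate tables of Equations 3 and 4 (Laplace-style smoothing), assuming discrete attribute values; function and parameter names are illustrative.

```python
from collections import Counter

def m_estimate_tables(X, y, n_values):
    """Estimate P(c) and P(a_j | c) with the m-estimate
    (Equations 3 and 4).

    X        -- list of instances, each a list of m attribute values
    y        -- list of class labels c_i
    n_values -- n_values[j] is n_j, the number of values of attribute j
    Returns prior (a dict) and cond (a function cond(j, a, c)).
    """
    n = len(y)
    classes = sorted(set(y))
    q = len(classes)
    class_count = Counter(y)
    # Equation 3: P(c) = (count(c) + 1) / (n + q)
    prior = {c: (class_count[c] + 1) / (n + q) for c in classes}
    # Equation 4: P(a_j | c) = (count(a_j, c) + 1) / (count(c) + n_j)
    pair_count = Counter()
    for xi, ci in zip(X, y):
        for j, a in enumerate(xi):
            pair_count[(j, a, ci)] += 1
    def cond(j, a, c):
        return (pair_count[(j, a, c)] + 1) / (class_count[c] + n_values[j])
    return prior, cond
```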

A. THE CORRELATION-BASED ATTRIBUTE WEIGHTING FILTER
In the hybrid attribute weighting method, an attribute weighting filter for NB serves as the weight initialization technique that narrows down the range of attribute weights to be searched in the next subsection. The initialization keeps the range of attribute weights within a reasonable scope. The farther the initial weights are from the optimal weights being searched for, the wider the search space, the longer the search time, and the slower the training. In our CWANB approach, the goal of the filter is to find an initial set of attribute weights that narrows the search region to a reasonable scope. The correlation-based attribute weighting filter defines the weight of each attribute as a sigmoid transformation of the difference between the attribute-class relevance and the average attribute-attribute redundancy [31]. Motivated by that work, we use the correlation-based attribute weighting filter to initialize the attribute weights in our CWANB approach. The initialization proceeds as follows.
First, mutual information is used to measure the correlation between each pair of discrete random variables. The relevance between an attribute and the class and the redundancy between two different attributes are defined, respectively, as:

$$I(A_j; C) = \sum_{a_j} \sum_{c} P(a_j, c) \log \frac{P(a_j, c)}{P(a_j)P(c)}, \qquad (5)$$

$$I(A_j; A_i) = \sum_{a_j} \sum_{a_i} P(a_j, a_i) \log \frac{P(a_j, a_i)}{P(a_j)P(a_i)}, \qquad (6)$$

where $a_j$ and $a_i$ represent the values taken by two different attributes $A_j$ and $A_i$, respectively. $I(A_j; C)$ represents the relevance between the attribute $A_j$ and the class $C$, and $I(A_j; A_i)$ represents the redundancy between the two different attributes $A_j$ and $A_i$. Second, we normalize $I(A_j; C)$ into $NI(A_j; C)$ and $I(A_j; A_i)$ into $NI(A_j; A_i)$, as given by Equations 7 and 8, respectively:

$$NI(A_j; C) = \frac{I(A_j; C)}{\sum_{j=1}^{m} I(A_j; C)}, \qquad (7)$$

$$NI(A_j; A_i) = \frac{I(A_j; A_i)}{\sum_{j=1}^{m} \sum_{i=1, i \neq j}^{m} I(A_j; A_i)}, \qquad (8)$$
where $NI(A_j; C)$ and $NI(A_j; A_i)$ are the normalized values representing the mutual relevance and the mutual redundancy.
Third, the raw attribute weight is defined as the difference between the relevance and the average redundancy, as given by Equation 9:

$$w_j' = NI(A_j; C) - \frac{1}{m-1} \sum_{i=1, i \neq j}^{m} NI(A_j; A_i), \qquad (9)$$
where $w_j'$ is the raw weight of the $j$th attribute $A_j$. Fourth, an attribute weight must be a non-negative continuous value, but the raw weight $w_j'$ may be negative, so a standard logistic sigmoid function is used to keep it within a reasonable scope. Namely, we use a sigmoid function to transform $w_j'$ into the final form $w_j$ as:

$$w_j = \frac{1}{1 + e^{-w_j'}}. \qquad (10)$$

Therefore, the final weight $w_j$ of the attribute $A_j$ is defined as:

$$w_j = \frac{1}{1 + e^{-\left(NI(A_j; C) - \frac{1}{m-1} \sum_{i=1, i \neq j}^{m} NI(A_j; A_i)\right)}}. \qquad (11)$$

At this stage, all the attribute weights have been initialized by the correlation-based attribute weighting filter. This process not only narrows down the range of attribute weights to search, but also accelerates the convergence of the wrapper in the next subsection.
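Putting Equations 5 to 11 together, a sketch of the filter's weight initialization might look as follows, assuming discrete attributes and empirical probability estimates; the helper names and the exact normalization constants are assumptions of this sketch.

```python
import math
from collections import Counter

def init_weights(X, y):
    """Correlation-based attribute weight initialization: sigmoid of
    the normalized attribute-class relevance minus the average
    normalized attribute-attribute redundancy (Equations 5-11)."""
    n, m = len(X), len(X[0])

    def mutual_info(u, v):
        # I(U; V) = sum over (u, v) of P(u, v) log(P(u, v) / (P(u)P(v)))
        pu, pv, puv = Counter(u), Counter(v), Counter(zip(u, v))
        return sum((c / n) * math.log((c / n) / ((pu[a] / n) * (pv[b] / n)))
                   for (a, b), c in puv.items())

    cols = [[X[i][j] for i in range(n)] for j in range(m)]
    rel = [mutual_info(cols[j], y) for j in range(m)]            # I(A_j; C)
    red = [[mutual_info(cols[j], cols[i]) for i in range(m)]     # I(A_j; A_i)
           for j in range(m)]

    # Normalize relevance and redundancy so they are comparable.
    rel_sum = sum(rel) or 1.0
    red_sum = sum(red[j][i] for j in range(m)
                  for i in range(m) if i != j) or 1.0

    weights = []
    for j in range(m):
        avg_red = sum(red[j][i] / red_sum
                      for i in range(m) if i != j) / (m - 1)
        d = rel[j] / rel_sum - avg_red              # Equation 9
        weights.append(1.0 / (1.0 + math.exp(-d)))  # sigmoid, Eqs. 10-11
    return weights
```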

B. THE ATTRIBUTE WEIGHT ADJUSTMENT WRAPPER
To further optimize the initial attribute weights calculated by the correlation-based attribute weighting filter in the last subsection, this subsection presents the details of the attribute weight adjustment wrapper. We try to find the set of attribute weights that maximizes the classification accuracy, adjusting the weights and checking whether each change improves the objective function in terms of classification accuracy. Two problems must be solved. The first is how to construct the objective function that evaluates a weight change. The second is how to propose candidate changes to the attribute weights.
To facilitate understanding, we first give some definitions. Let $TD = \{x_1, x_2, \cdots, x_n\}$ be a training dataset, where $n$ is the number of instances. We use $W = \langle w_1, w_2, \cdots, w_m \rangle$ to denote the vector of attribute weights, where $w_j$ is the weight of the $j$th attribute $A_j$ and $m$ is the number of attributes. $F$ is the multiplication factor set used by the search algorithm to propose attribute weights. The objective function of our CWANB approach is defined as $obj(TD, W, f)$, with the parameter $f$ described below.
To solve the first problem, we propose an objective function based on dynamic adjustment of the weights. In every iteration, the weight vector with updated attribute weights must be evaluated. To traverse the candidate weight vectors and find the best one, an objective function and an evaluation criterion must be established. We design the objective function based on dynamic adjustment of attribute weights, denoted $obj(TD, W', f)$, where $W'$ is the updated attribute weight vector and $f$ denotes the classification accuracy of the weighted NB with the updated weight vector $W'$, formulated as:

$$f(W') = \frac{1}{n} \sum_{i=1}^{n} \delta(c(x_i), y_i), \qquad (12)$$

where $c(x_i)$ is the predicted class label of the $i$th instance, $y_i$ is the true label of the $i$th instance, and $\delta(\cdot)$ is a binary function that is one if its two parameters are identical and zero otherwise.
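As a sketch, Equation 12 amounts to a single accuracy computation over the training set; here `predict` is assumed to wrap the weighted NB of Equation 2 with the candidate weight vector.

```python
def objective(X, y, W, predict):
    """Equation 12: the objective is the training-set accuracy of
    the weighted NB built with the candidate weight vector W."""
    correct = sum(1 for xi, yi in zip(X, y) if predict(xi, W) == yi)
    return correct / len(y)
```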
To evaluate the improvement of $obj(TD, W', f)$ in terms of classification accuracy, we set a threshold value $T$; in our CWANB approach, $T$ is 0.1%. To determine whether the updated attribute weight vector $W'$ can replace the current attribute weight vector $W$, we compare $f(W')$ with $f(W)$, i.e., $obj(TD, W', f)$ with $obj(TD, W, f)$. When the performance improvement of $obj(TD, W', f)$ exceeds the threshold $T$, the current weight vector $W$ is replaced by the updated weight vector $W'$ in the current iteration, and vice versa. The attribute weights are updated continuously by iterative evaluation; in each iteration, only the weight change with the largest classification accuracy improvement is kept. Thus, the optimal set of attribute weights that maximizes the classification accuracy is found by the objective function in the last iteration.
To solve the second problem, we propose candidate changes to the attribute weights by optimizing one attribute weight at a time. To ensure a sufficiently diverse search for an optimal set of attribute weights, we propose a search algorithm based on a multiplication factor set. For each attribute weight, we multiply the current weight by each of the multiplication factors in the set. For each change, we evaluate whether changing that particular attribute weight improves the objective function. When the inner loop finishes, the best multiplication factor for the current attribute has been found; when the outer loop finishes, the globally best single-attribute weight change has been found. This updated attribute weight improves the classification accuracy of the weighted NB the most in the current iteration. We then record the updated attribute weight vector and continue to the next iteration.
How to define the multiplication factor set $F$ is a crucial issue. The number of evaluations grows with the number of members in the set, and so does the time complexity; conversely, the diversity of the search shrinks as the set shrinks, which hurts effectiveness. To keep the search algorithm simple and efficient, conciseness is a key consideration in designing the multiplication factor set. We therefore set the total number of multiplication factors $k$ in $F$ to 5, and denote each factor as $F_k$, with $k$ ranging from 1 to 5.
In addition, we let the multiplication factor set consist mainly of prime numbers, because any natural number can be decomposed into a product of primes; using primes as multiplication factors helps ensure a diverse search over the attribute weights. An attribute weight must be a positive continuous value. When setting the multiplication factors, some should decrease the weights and others should increase them: factors less than one decrease an attribute weight, while prime factors larger than one increase it. With five factors in total, we set a single factor to 0.1 to decrease attribute weights, and the other four factors to the primes {2, 3, 5, 7} to increase them. So in this paper, the multiplication factors take values from the finite set {0.1, 2, 3, 5, 7}.
The major steps of the attribute weight adjustment in each iteration can be described as Algorithm 1.

Algorithm 1 CWANB
Input: TD - a training dataset; F - a multiplication factor set; T - a threshold value
Output: CWANB - correlation-based weight adjusted naive Bayes
1: Initialize all attribute weights by the correlation-based attribute weighting filter for NB
2: Learn an attribute weighted naive Bayes and use Equation 12 to evaluate its performance
3: while stopping condition is not satisfied do
4:   for each attribute j = 1 to m do
5:     for each multiplication factor k = 1 to 5 do
6:       Set the new weight of the jth attribute by multiplying its current weight by the kth multiplication factor: w_{j,new} ← w_{j,old} × F_k
7:       Learn an attribute weighted naive Bayes with the new attribute weight value and use Equation 12 to evaluate its performance
8:       Reset the weight of the jth attribute: w_{j,new} ← w_{j,old}
9:     end for
10:    Determine the multiplication factor for the jth attribute that improves the performance of Equation 12 the most in the inner loop
11:  end for
12:  Select the pth attribute that gives the best overall performance of Equation 12 in the outer loop
13:  Update the weight of the pth attribute in the training dataset: w_{p,new} ← w_{p,old} × F_k, where F_k is the best factor found for attribute p
14:  if the performance improvement is less than the threshold value T then
15:    Exit the loop
16:  end if
17: end while
18: Build an attribute weighted naive Bayes with the updated weight vector W and return the built attribute weighted naive Bayes classifier

From Algorithm 1, we can see that our CWANB is a hybrid attribute weighting approach. First, it uses the correlation-based attribute weighting filter to initialize the attribute weights. Then, it adjusts the attribute weights continually with the attribute weight adjustment wrapper. The attribute weight adjustment wrapper is a greedy search algorithm: in each iteration, only the weight of the single globally best attribute is updated.
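A compact Python rendering of the wrapper loop in Algorithm 1 might look like the following; `evaluate` is assumed to build the weighted NB with the given weights and return its training accuracy per Equation 12, and the names are illustrative.

```python
FACTORS = [0.1, 2, 3, 5, 7]   # the multiplication factor set F
T = 0.001                     # threshold: a 0.1% accuracy improvement

def adjust_weights(w, evaluate):
    """Greedy attribute weight adjustment wrapper (Algorithm 1)."""
    best_acc = evaluate(w)
    while True:
        best_gain, best_j, best_f = 0.0, None, None
        for j in range(len(w)):              # outer loop: attributes
            old = w[j]
            for f in FACTORS:                # inner loop: factors
                w[j] = old * f
                acc = evaluate(w)
                if acc - best_acc > best_gain:
                    best_gain, best_j, best_f = acc - best_acc, j, f
                w[j] = old                   # reset before the next trial
        if best_j is None or best_gain < T:
            break                            # stopping condition met
        w[best_j] *= best_f                  # commit the single best change
        best_acc += best_gain
    return w
```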
Using the correlation-based attribute weighting filter to initialize all attribute weights is fast and effective and has low time complexity: computing all initial attribute weights takes only $O(m^2 v^2)$ time, where $v$ is the average number of values per attribute and $m$ is the number of attributes. To find the set of attribute weights that maximizes the classification accuracy, the wrapper must traverse all attributes and all multiplication factors, so its time complexity is dominated by the loop structure. Building an NB classifier and evaluating its performance on the training dataset takes $O(mn + qmn)$ time, where $q$ is the number of class labels, $m$ is the number of attributes, and $n$ is the number of training instances. Assuming the loop executes $t$ times, the time complexity of the wrapper is $O(t \cdot mk \cdot (mn + qmn))$, where $k$ is the number of multiplication factors. The total time complexity of our CWANB approach is therefore $O(t \cdot mk \cdot (mn + qmn) + m^2 v^2)$, which is approximately $O(ktm^2 n)$. In practice, our algorithm can be parallelized, reducing the time complexity to $O(ktmn)$.

IV. EXPERIMENTS AND RESULTS
The purpose of this section is to validate the effectiveness of our proposed CWANB, so we designed a group of experiments to compare CWANB with NB and five existing state-of-the-art attribute weighted competitors. These competitors and their abbreviations are listed as follows.
• GRAWNB: gain ratio-based attribute weighted naive Bayes [28].
• DTAWNB: decision tree-based attribute weighted naive Bayes [25].
• DAWNB: NB with deep attribute weighting [27].
• CAWNB: correlation-based attribute weighted naive Bayes [31].
• DEAWNB: NB with differential evolution-based attribute weighting [32].

We conducted our experiments on the full set of 36 University of California, Irvine (UCI) datasets [37] published on the main web site of the WEKA platform [38], which represent a wide range of domains and data characteristics. In our experiments, missing attribute values were replaced with the modes of the nominal attribute values and the means of the numerical attribute values from the available data. Numerical attribute values were discretized using Fayyad and Irani's MDL method as implemented in the WEKA platform [39]. Clearly, if the number of values of an attribute equals the number of instances, the attribute rarely contributes to classification; for example, the attribute "Hospital Number" in the dataset "colic.ORIG" is a useless attribute. In these 36 datasets, there are only three such attributes, so we manually deleted all three: "Hospital Number" in "colic.ORIG", "instance name" in "splice", and "animal" in "zoo".

Table 1 shows the detailed comparison results in terms of classification accuracy. All classification accuracies were obtained by averaging the results from 10 separate runs of stratified 10-fold cross-validation. We used a two-tailed t-test at the p = 0.05 significance level to compare our proposed CWANB with its competitors [40]. The symbols ◦ and • in the table denote statistically significant improvement and degradation over a competitor, respectively. The averages and the Win/Tie/Lose (W/T/L) values are summarized at the bottom of the table. The average (arithmetic mean) of each algorithm across all datasets provides a gross indicator of relative performance in addition to the other statistics. In the table, each W/T/L entry means that, compared to the corresponding competitor, our proposed CWANB wins on W datasets, ties on T datasets, and loses on L datasets.
Then, we employ a corrected paired two-tailed t-test with the p = 0.05 significance level to thoroughly compare each pair of algorithms. Table 2 shows the detailed summary test results. In Table 2, for each entry i(j), i is the number of datasets on which the algorithm in the column achieves higher classification accuracy than the algorithm in the corresponding row, and j is the number of datasets on which the algorithm in the column achieves significant wins with the p = 0.05 significance level with regard to the algorithm in the corresponding row. Table 3 shows the detailed ranking test results. In Table 3, the first column is the difference between the total number of wins and the total number of losses that the corresponding algorithm achieves compared with all the other algorithms, which is used to generate the ranking. The second and third columns represent the total numbers of wins and losses, respectively.
From these comparison results, we can see that our proposed CWANB significantly outperforms standard NB and is overall better than the five other state-of-the-art attribute weighting approaches. The main highlights of these comparisons are as follows: 1) The average classification accuracy of CWANB on the 36 datasets is 84.83%, which is higher than that of standard NB (83.31%) and the five state-of-the-art attribute weighted competitors: GRAWNB (82.70%), DTAWNB (83.37%), DAWNB (83.61%), CAWNB (84.41%), and DEAWNB (84.39%). 2) According to the paired two-tailed t-tests, CWANB is overall the best: it is much better than NB (16 wins and 0 losses), GRAWNB (16 wins and 0 losses), DTAWNB (10 wins and 2 losses), DAWNB (14 wins and 0 losses), CAWNB (7 wins and 1 loss), and DEAWNB (6 wins and 0 losses). 3) According to the summary and ranking test results, CWANB is the best algorithm (69 wins and 3 losses), and the overall ranking (in descending order) is CWANB, DEAWNB, CAWNB, DAWNB, DTAWNB, NB, and GRAWNB. 4) In general, our proposed CWANB approach performs best among its competitors in terms of classification accuracy.

Besides accuracy, we also observed the performance of our proposed CWANB in terms of elapsed training time (in milliseconds). Our experiments were conducted on a Linux machine with a 3.2 GHz processor and 8 GB of RAM. The detailed comparison results are shown in Tables 4-6. Note that the meaning of the t-test results in these tables is opposite to that in Tables 1-3: for elapsed training time, a smaller number is better. Thus, in terms of elapsed training time, the symbols ◦ and • in Table 4 denote statistically significant improvement and degradation over a competitor, respectively, and each W/T/L entry means that, compared to the corresponding competitor, our proposed CWANB wins on W datasets, ties on T datasets, and loses on L datasets. The main highlights of these comparisons are as follows: 1) The average elapsed training time of CWANB on the 36 datasets is 6718.55 milliseconds, which is much lower than that of DEAWNB (21204.99 milliseconds); our proposed CWANB approach is thus much faster than the DEAWNB approach. 2) Compared with DEAWNB, CWANB reduces the elapsed training time dramatically on 32 datasets and loses on only 1 dataset. 3) According to the summary and ranking test results, our proposed CWANB is indeed slower than the attribute weighting filters (GRAWNB, DTAWNB, DAWNB, and CAWNB), but much faster than the attribute weighting wrapper (DEAWNB).

V. CONCLUSION AND FUTURE WORK
In this study, we focus our attention on attribute weighting and propose a new paradigm, the hybrid attribute weighting method. The improved model is called correlation-based weight adjusted naive Bayes (CWANB). In CWANB, we use a correlation-based attribute weighting filter to initialize the attribute weight vector; then, an attribute weight adjustment wrapper adjusts each weight to find the optimal attribute weight vector. Extensive experiments are conducted to compare CWANB with NB and five other state-of-the-art attribute weighting approaches in terms of classification accuracy and elapsed training time.
The comparison results show that CWANB achieves the highest classification accuracy. At the same time, CWANB reduces the elapsed training time dramatically compared with the existing wrapper. How to learn the attribute weights is a crucial problem in attribute weighted NB. An interesting direction for future work is the exploration of more effective hybrid attribute weighting approaches to improve the current version. In addition, applying the hybrid attribute weighting approach to improve other classification models will also be considered in our future work.