Improved Base Belief Function-Based Conflict Data Fusion Approach Considering Belief Entropy in the Evidence Theory

Due to the nature of the Dempster combination rule, it may produce counterintuitive results. An improved method for fusing conflicting evidence is therefore proposed. In this paper, the belief entropy in D-S theory is used to measure the uncertainty of each piece of evidence. First, the initial belief degree is constructed with an improved base belief function. Then, the information volume of each evidence group is obtained by calculating the belief entropy, and it is used to modify the belief degree so that the final evidence is more reasonable. After this modification, the Dempster combination rule is applied to obtain the final result, which helps to solve conflict data fusion problems. The rationality and validity of the proposed method are verified by numerical examples and by applying the method to a classification data set.


Introduction
Dempster-Shafer theory (D-S theory) [1,2] plays a vital role in addressing uncertainty in medical diagnosis [3], target recognition [4,5], fault diagnosis [6], classification [7][8][9], clustering [10][11][12], risk analysis [13] and many other fields [14]. D-S theory can clearly measure the uncertainty of events and then provide a basis for decision-making through the data fusion results. However, because real data are complex, conflicting evidence is often encountered in practice. In [15], the concept of conflict is examined from different perspectives to clarify what conflict is and where it comes from. In [16,17], Zadeh points out that when the bodies of evidence conflict, the normalization step of classical evidence theory often yields results that contradict intuition. Smets analyzes the 'jungle' of combination rules and the nature of the combinations [18]. Classical evidence theory cannot deal with conflicting data effectively, which greatly restricts the promotion and application of evidence theory. Therefore, this paper studies conflict data fusion.
Because pieces of evidence from multiple information sources are often inconsistent, the data are often in conflict, and many methods have been proposed for conflict data fusion [19,20]. Part of the research focuses on proposing new combination rules, such as an inconsistent measure-based rule [21] and a combination rule considering evidence dependence [22], or on improving the original combination rule using a belief entropy-based method [23] or fuzzy elements [24], so as to improve the results of conflict data fusion. In [25], some basic principles are proposed after a systematic review of existing fusion rules, which allow fusion to be handled reasonably when information is incomplete. In [26], a new combination rule aimed at the conflict problem is proposed, based on the analysis and illustration of similarity collision. In [27], the decision making trial and evaluation laboratory (DEMATEL) method is proposed to merge conflicting data. New combination rules can also solve problems effectively in the recognition field. In [28], the improved combination rule of D numbers is applied to emitter identification. In [29], a method for selecting the source behavior is proposed, based on a very general and expressive fusion scheme. The important advantage of this method is that it can clearly explain the assumptions about the source. Furthermore, incomplete information should also be considered in conflict data fusion [30][31][32].
Another approach to conflict fusion is to manage the uncertainty in the evidence sources before the evidence is fused. Entropy is a typical tool for uncertainty measurement and management [33]. In evidence theory, the belief entropy [34][35][36] or an uncertainty measure of the mass function [37,38] is used to quantify the information volume of evidence, so as to modify the evidence sources. In [39], Deng entropy is proposed, which can handle the uncertainty of a basic probability assignment both effectively and correctly. Deng entropy, as a belief entropy, has been widely used in applications such as risk analysis [40]. In [41], a multiple-criteria decision-making method based on D numbers and belief entropy is proposed, which can deal with conflict problems effectively. In [42], a novel belief entropy based on the belief function and the plausibility function is proposed to measure the uncertainty of basic probability assignments, which can deal with conflicts reasonably during information fusion. In [43], an improved method for combining conflicting evidence is proposed, based on a similarity measure between bodies of evidence and on belief function entropy.
In addition, constructing an initial belief for each piece of evidence can reduce the conflict between BPAs. In [44], the base belief function is proposed, which modifies the BPAs to deal with conflict data fusion. Building on this, an improved method is proposed in [45] to manage conflicting data by assigning an elementary belief. Following this conflict data fusion strategy, this paper uses the improved base belief function [45] and belief entropy [39] to solve the problem of conflict data fusion. The procedure of the proposed method is as follows. First, the BPAs are modified with the improved base belief function. Second, belief entropy is used to calculate the information volume and to obtain the weight of each evidence group. Third, the weights are used to modify the BPAs again. Finally, the Dempster combination rule is used for data fusion.
The proposed method can solve data conflict problems effectively and obtain better combination results. It considers both the focal elements in the current evidence and the propositions in the power set space. In addition, it reallocates the BPAs of conflicting data, which also resolves the cases where some initial BPAs in an evidence group are zero. At the same time, because different information sources influence the final result differently, the proposed method distributes the weights according to the information volume of the information sources. The final BPAs obtained with the belief entropy make the data fusion results more logical.
The rest of this paper is organized as follows. In Section 2, we review some basic concepts. In Section 3, we propose an improved approach that uses the information volume to weight the basic probability assignments, so as to obtain a reasonable combination result with D-S theory. A few examples are also given to verify the correctness of the proposed method. In Section 4, classification experiments are presented to show the effectiveness of the proposed method. Open issues are discussed in Section 5. Finally, conclusions are given in Section 6.

Preliminaries
In this section, some preliminaries are introduced.

Dempster-Shafer Evidence Theory
Dempster-Shafer theory [1,2], also known as belief function theory, is an extension of Bayesian subjective probability theory. Evidence theory was developed by Shafer, who also introduced the concept of the belief function and formed a set of mathematical methods for "evidence" and "combination" to handle uncertain reasoning. D-S evidence theory does not require prior probabilities and can represent "uncertainty" well. D-S theory is widely used to deal with uncertain data. As an uncertain reasoning method, it is mainly applied to information fusion, expert systems, information and legal case analysis, and multi-attribute decision-making analysis. Its most distinctive feature is that it describes uncertain information with an "interval estimate" instead of a "point estimate", which distinguishes what is unknown from what is uncertain, reflects the collected evidence accurately, and gives the theory great flexibility.
Let U be the frame of discernment (FOD). A basic probability assignment (BPA) is a mass function m: 2^U → [0, 1] that satisfies

$$m(\emptyset) = 0, \qquad \sum_{A \subseteq U} m(A) = 1. \tag{1}$$

On the FOD, the belief function is defined as

$$Bel(A) = \sum_{B \subseteq A} m(B). \tag{2}$$

The plausibility function [46] is defined as

$$Pl(A) = \sum_{B \cap A \neq \emptyset} m(B). \tag{3}$$

The Dempster combination rule is the key step for combining the outputs of multiple sources. For two mass functions m_1 and m_2, the Dempster combination rule is defined as

$$m(A) = \begin{cases} \dfrac{1}{1 - K} \displaystyle\sum_{B \cap C = A} m_1(B)\, m_2(C), & A \neq \emptyset, \\ 0, & A = \emptyset, \end{cases} \tag{4}$$

where the coefficient K is defined as

$$K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C). \tag{5}$$

The advantages of the Dempster combination rule are most evident when there is little conflict between the pieces of evidence. However, the rule also has disadvantages. If two pieces of evidence are highly conflicting, the following defects appear when they are combined: the rule may assign 100% belief to a proposition with small support, which produces counterintuitive results, and it is very sensitive to the allocation of the basic belief.
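The combination rule and the conflict coefficient K described above can be illustrated with a minimal Python sketch. The representation of a BPA as a dictionary from `frozenset` to mass, the function name `dempster_combine`, and the numerical values (chosen in the style of Zadeh's high-conflict example, not taken from this paper's tables) are assumptions made for illustration.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two BPAs ({frozenset: mass}) with Dempster's rule.

    Returns the fused BPA and the conflict coefficient K.
    """
    fused = {}
    conflict = 0.0  # K: mass falling on empty intersections
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            fused[inter] = fused.get(inter, 0.0) + mass_b * mass_c
        else:
            conflict += mass_b * mass_c
    if conflict >= 1.0:
        raise ValueError("K = 1: the sources are in total conflict")
    # Normalise by 1 - K so that the fused masses sum to one
    return {a: v / (1.0 - conflict) for a, v in fused.items()}, conflict

# Assumed, illustrative high-conflict BPAs on the frame {a, b, c}
m1 = {frozenset("a"): 0.99, frozenset("b"): 0.01}
m2 = {frozenset("b"): 0.01, frozenset("c"): 0.99}
fused, K = dempster_combine(m1, m2)
print(fused, K)
```

With these inputs almost all of the product mass is conflicting (K is close to 1), and the normalization hands essentially all belief to {b}, the proposition that both sources consider very unlikely. This is the counterintuitive behavior that motivates modifying the evidence before fusion.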
In D-S theory, for hypothesis A in FOD, the belief function Bel(A) and plausibility function Pl(A) are calculated according to the basic probability assignment BPA to form the belief interval [Bel(A), Pl(A)], which is used to indicate the degree of confirmation of hypothesis A.
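The belief interval can be computed directly from a BPA. The sketch below is illustrative only: the dictionary representation, the function names `belief` and `plausibility`, and the example masses are assumptions, not values from the paper.

```python
def belief(m, a):
    """Bel(A): total mass of the focal elements contained in A."""
    return sum(v for b, v in m.items() if b <= a)

def plausibility(m, a):
    """Pl(A): total mass of the focal elements that intersect A."""
    return sum(v for b, v in m.items() if b & a)

# Assumed example BPA on the frame {a, b, c}
m = {frozenset("a"): 0.5, frozenset("ab"): 0.3, frozenset("abc"): 0.2}
A = frozenset("a")
interval = (belief(m, A), plausibility(m, A))
print(interval)
```

For this BPA only the focal element {a} lies inside A, while every focal element intersects A, so the interval runs from the mass committed exactly to {a} up to the full mass. The width of the interval reflects how much belief is still unassigned between A and its alternatives.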

Belief Entropy
Belief entropy is one of the active topics in the field of information fusion, and many types of belief entropy have been proposed, such as Dubois-Prade's entropy [47], Jirousek-Shenoy entropy [48,49], Deng entropy [39], and others [50]. Deng entropy, as a measure of uncertain information, is defined as follows [39,51]:

$$E_d(m) = -\sum_{A \subseteq X} m(A) \log_2 \frac{m(A)}{2^{|A|} - 1}, \tag{6}$$

where m is a mass function defined on the FOD X, A is a focal element of m, and |A| is the cardinality of A.
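As a hedged illustration of the Deng entropy, the following sketch evaluates it for a few BPAs. The function name `deng_entropy`, the dictionary representation of a BPA, and the example BPAs are assumptions introduced here.

```python
import math

def deng_entropy(m):
    """Deng entropy of a BPA given as {frozenset: mass}.

    Each focal element A contributes -m(A) * log2(m(A) / (2^|A| - 1)).
    """
    return -sum(
        v * math.log2(v / (2 ** len(a) - 1))
        for a, v in m.items()
        if v > 0  # zero-mass propositions contribute nothing
    )

# Assumed example BPAs on the frame {a, b, c}
certain = {frozenset("a"): 1.0}                      # no uncertainty
bayesian = {frozenset(x): 1 / 3 for x in "abc"}      # singletons only
vacuous = {frozenset("abc"): 1.0}                    # total ignorance

for name, bpa in [("certain", certain), ("bayesian", bayesian), ("vacuous", vacuous)]:
    print(name, deng_entropy(bpa))
```

When all focal elements are singletons the term 2^|A| − 1 equals 1, so the measure reduces to Shannon entropy. Mass placed on larger subsets is penalized through the 2^|A| − 1 factor, which is why the vacuous BPA scores higher than the uniform Bayesian one.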

Improved Base Belief Function
The improved base belief function was proposed in [45] to obtain modified BPAs before data fusion. Let Ω be a set of N mutually exclusive possible values, so that the power set of Ω is 2^Ω and contains 2^N elements. If the FOD is complete, m(∅) = 0. Let λ denote the number of propositions that are assigned an initial belief degree in the evidence group. The improved base belief function n(R_i) [45] is given by Equation (7), where R_i is a subset of the FOD Ω and λ is the number of propositions with an initial belief degree in the evidence group. Then n(R_i) is used to modify the initial BPA m through the arithmetic mean [45]:

$$m'(R_i) = \frac{m(R_i) + n(R_i)}{2}. \tag{8}$$

The following example of calculating the improved base belief function is taken from [45]. For the FOD Ω = {a, b, c}, consider two BPAs m_1 and m_2 that together have four focal elements, {a}, {b}, {c} and {a, b}, so λ = 4. The value of the improved base belief function follows from Equation (7). Because the FOD has 3 elements, its power set has 2^3 − 1 = 7 non-empty subsets. The modified BPAs are then obtained with Equation (8).
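The arithmetic-mean modification can be sketched in Python as follows. The sketch takes the base belief values n as an input rather than deriving them, because it does not attempt to reproduce the definition in [45]. The uniform value 1/7 over all seven non-empty subsets, the function names, and the example BPA are assumptions for illustration.

```python
from itertools import combinations

def nonempty_subsets(frame):
    """All non-empty subsets of the frame, as frozensets."""
    items = sorted(frame)
    return [
        frozenset(c)
        for r in range(1, len(items) + 1)
        for c in combinations(items, r)
    ]

def modify_bpa(m, n):
    """Average the original BPA m with the base belief n, proposition by proposition."""
    propositions = set(m) | set(n)
    return {p: (m.get(p, 0.0) + n.get(p, 0.0)) / 2 for p in propositions}

frame = {"a", "b", "c"}
# Assumed base belief: uniform over the 2^3 - 1 = 7 non-empty subsets
n = {s: 1 / 7 for s in nonempty_subsets(frame)}
# Assumed original BPA with several zero-valued propositions
m = {frozenset("a"): 0.9, frozenset("b"): 0.1}
modified = modify_bpa(m, n)
for p, v in sorted(modified.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(p), round(v, 4))
```

Because both m and n sum to one, their mean is again a valid BPA, and every proposition that carries base belief ends up with a non-zero mass. That removes the zero values that make the Dempster combination rule behave badly under conflict.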

Proposed Method
In this section, the improved base belief function in [45] and a belief entropy in [39] are adopted to construct a new data fusion method.

Method
The procedure of the proposed method is as follows, and its flowchart is shown in Figure 1.
Step 1: For potentially conflicting data, the improved base belief function n in Equation (7) is used to modify the BPAs, which yields the modified evidence of Equation (8).
Based on the improved base belief function, the situation where the belief of a proposition is zero can be avoided, which can overcome the shortcoming of Dempster combination rule in conflict data fusion.
Step 2: For the ith piece of evidence, the information volume Iv is calculated from the Deng entropy Ed(i) [39]. The information volume is the basis for obtaining the weights.
Step 3: For each piece of evidence, a weight w(i) is defined from its information volume. Because different information sources influence the final result differently, the weight represents the impact of each evidence group on the final result. In this way, each evidence group is assigned a small weight, which is more reasonable in application.
Step 4: The weights obtained in Step 3 are used to modify the BPAs again before data fusion. After the evidence has been modified with both the base belief function and the information volume-based uncertainty, the final evidence for data fusion is obtained. Fusing this final evidence gives results that are more realistic.
Step 5: The final evidence from Step 4 is fused with the Dempster combination rule in Equation (4) to get the final result. If there are n bodies of evidence, the modified evidence is fused n − 1 times.
Step 6: Decision making based on the data fusion result.
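Steps 2 to 5 can be sketched end to end in Python. The exact definitions of the information volume, the weights and the final evidence are not reproduced in the text above, so this sketch fills them with loudly labeled assumptions: the information volume is taken as exp(Ed), the weights are the normalized information volumes, and the final evidence is the weighted average of the modified BPAs, which is then combined with itself n − 1 times. The function names and the input BPAs are hypothetical as well.

```python
import math
from itertools import product

def deng_entropy(m):
    """Deng entropy of a BPA given as {frozenset: mass}."""
    return -sum(v * math.log2(v / (2 ** len(a) - 1)) for a, v in m.items() if v > 0)

def dempster_combine(m1, m2):
    """Dempster's rule for two BPAs; assumes the conflict K is below 1."""
    fused, conflict = {}, 0.0
    for (b, vb), (c, vc) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            fused[inter] = fused.get(inter, 0.0) + vb * vc
        else:
            conflict += vb * vc
    return {a: v / (1.0 - conflict) for a, v in fused.items()}

def entropy_weights(bpas):
    # ASSUMPTION: information volume Iv(i) = exp(Ed(i)), normalised to get w(i)
    volumes = [math.exp(deng_entropy(m)) for m in bpas]
    total = sum(volumes)
    return [v / total for v in volumes]

def weighted_evidence(bpas, weights):
    # ASSUMPTION: the final evidence is the weighted average of the modified BPAs
    final = {}
    for m, w in zip(bpas, weights):
        for a, v in m.items():
            final[a] = final.get(a, 0.0) + w * v
    return final

def fuse(bpas):
    """Weight the evidence, then apply Dempster's rule n - 1 times."""
    weights = entropy_weights(bpas)
    final = weighted_evidence(bpas, weights)
    result = final
    for _ in range(len(bpas) - 1):
        result = dempster_combine(result, final)
    return result, weights

# Hypothetical BPAs, assumed to be already modified by the base belief function
e1 = {frozenset("a"): 0.6, frozenset("b"): 0.1, frozenset("ab"): 0.3}
e2 = {frozenset("a"): 0.2, frozenset("b"): 0.5, frozenset("ab"): 0.3}
result, weights = fuse([e1, e2])
print(weights)
print({tuple(sorted(k)): round(v, 4) for k, v in result.items()})
```

Under these assumptions, a body of evidence with larger Deng entropy receives a larger weight. A different mapping from entropy to weight would change the numbers but not the overall structure of the pipeline.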

Examples and Discussion
Numerical examples are given to explain and verify the rationality of the proposed method.

Example 1.
Suppose that the FOD is Ω = {a, b} and the BPAs are given as The improved base belief function based on Equation (7) is Then, the improved base belief function is used to modify the BPAs based on Equation (

Example 2.
Suppose that the FOD is Ω = {a, b, c} and that two sets of BPAs (adopted from Zadeh [16,17]) are given. The BPAs are first modified with the improved base belief function. After this modification, the information volume of each piece of evidence (E1 and E2) and the final evidence are calculated. The final result is then calculated with the Dempster combination rule and compared with the results of using only the classical Dempster combination rule and only the improved base belief function, as shown in Table 1.

Method m(a) m(b) m(c) m(a, b) m(a, c) m(b, c) m(a, b, c)
Dempster's rule 0.
The data fusion results of the proposed method and of the methods using only the classical Dempster combination rule and only the improved base belief function are shown in Table 2 and Figure 3.

Method m(a) m(b) m(c) m(a, b) m(a, c) m(b, c) m(a, b, c)
Dempster's rule 0.4865 0.0270 0.4865 0 0 0 0
Improved base belief function [45]
The final results of the proposed method and of the methods using only the classical Dempster combination rule and only the improved base belief function are shown in Table 3 and Figure 4.
Compared with the method that uses only the Dempster combination rule, the proposed method reflects the uncertainty among the events {a}, {b} and {c} reasonably. Compared with the method that uses only the improved base belief function, the proposed method assigns more belief to {a}. This is attributable to the belief assignment m_2(a) = 0.05 on {a} in m_2, while m_1 has no similar belief assignment on {c}. The uncertainty in multi-subset propositions is reflected by assigning a lower belief degree to {c} than the method that uses only the improved base belief function, which also reflects the differences in the initial BPAs between this example and the previous one.

Method m(a) m(b) m(c) m(a, b) m(a, c) m(b, c) m(a, b, c)
Dempster's rule 0.3448 0.0345 0.6207 0 0 0 0
Improved base belief function [45]
The results of Examples 3, 4 and 5 verify the effectiveness and rationality of the proposed method for conflict data fusion. Compared with the methods that use only the Dempster combination rule or only the improved base belief function, the proposed method obtains a more rational fusion result, because it considers both the initial belief assignment from the base belief function and the information volume measured by belief entropy.

Application of Proposed Method
In this section, Iris classification, a classical example in machine learning, is adopted to evaluate the rationality and effectiveness of the proposed method. The real data set comes from the UCI Machine Learning Repository, and the BPAs after evidence modelling are adopted from [44,52]. The Iris data set contains three species (Setosa (a), Versicolor (b), and Virginica (c)), each with 50 instances. Each instance has four attributes: sepal length (SL), sepal width (SW), petal length (PL), and petal width (PW).

Experiment 1
In [44], Wang et al. randomly select 40 instances from each species, so the remaining 10 instances of each species form the test set. An instance is randomly selected from the species Setosa (a) in the test set to generate BPAs. The BPAs of the four attributes are shown in Table 4. Following the steps of the proposed method, the calculation procedure of this experiment is shown in Figure 5. The BPAs of the first two attributes (SL and SW) assign a belief degree to every proposition in the power set space, so they do not lead to counterintuitive fusion results caused by zero values when the Dempster combination rule is used. Therefore, only the BPAs of the last two attributes (PL and PW) are modified with the proposed method. The improved base belief function is calculated as:

$$n(a) = n(b) = n(c) = n(a, b) = n(a, c) = n(b, c) = n(a, b, c) = \frac{1}{7}$$

Following the data modification steps based on the improved base belief function in the proposed method, the modified BPAs of the two attributes PL and PW are shown in Table 5.
After evidence modification, the information volume of each piece of evidence is calculated. After the BPAs of attributes PL and PW are modified based on the belief entropy, the final evidence is fused three times with the Dempster combination rule to calculate the final result. Table 6 shows the final data fusion results of the proposed method and of using only the improved base belief function. In the final results, the belief degree assigned to the species Setosa (a) is the highest, which is consistent with the actual situation and indicates the rationality of the proposed method. In addition, the belief degree assigned to Setosa (a) by the proposed method is 67.98%, which is higher than the 62.32% obtained with only the improved base belief function. This shows the validity and rationality of the proposed method.

Method m(a) m(b) m(c) m(a, b, c)
Improved base belief function [45]
Table 7, where θ means all the three species {a, b, c}.
All attributes of each sample have data conflicts, and using only the Dempster combination rule may lead to illogical fusion results owing to zero values. Therefore, following Step 1 of the proposed method shown in Figure 5, all BPAs generated from the four attributes of the Setosa samples are first modified with the improved base belief function. The remaining steps of the proposed method are then executed to get the final evidence using belief entropy. The final combination results of the proposed method and of using only the improved base belief function are shown in Table 8.
The combination results of the two methods show that the BPA of the proposition {a} is the highest in each sample, so the samples are clearly identified as Setosa. In addition, the BPA of hypothesis {a} obtained with the proposed method is higher than that obtained with only the improved base belief function. Compared with using only the improved base belief function, the proposed method can deal with data conflicts effectively to some extent. The results verify the validity and rationality of the proposed method.

Open Issues
Some open issues exist in the current work. First, the uncertainty measure in D-S theory is still an open issue. How to measure the reliability and independence of evidence needs further study. Is the belief entropy good enough for this purpose [33,39,50,51]?
Second, the Dempster combination rule is axiomatically justified in [15,18,53]. The Dempster rule can be used under the condition that the two sources are entirely reliable and independent. This condition is also the source of the problem: the real world is full of uncertainty, and it is hard to find two sources that are entirely reliable and independent. Among the many improved rules [29,54], how can the proper one be found for a specific application or case?
Third, information fusion under the open-world assumption [30,31] should be addressed. Unknown and new information should be taken into consideration. Dynamic evidence reasoning may be a choice [55].
In future work, the experiments should be conducted on several data sets.

Conclusions
To address conflict data fusion problems, this paper proposes an improved method based on the belief entropy and the improved base belief function in D-S theory. First, in the power set space of the evidence, the initial belief is calculated with the improved base belief function according to the number of propositions with belief. Then, the belief entropy is used to measure the information volume of each piece of evidence. The improved base belief function and the information volume are used to modify the evidence. Finally, the data are fused with the Dempster combination rule. The effectiveness and rationality of the proposed method are verified by numerical examples and by two applications of the proposed method to a classification data set.
The proposed method considers not only the focal elements that are assigned an initial belief in the current evidence, but also the propositions in the power set space, such as those that have zero value or are not assigned a belief degree. However, there are still some shortcomings. The proposed method applies only under the closed-world hypothesis, whereas the uncertain factors increase in the open world.