On the combination of two visual cognition systems using combinatorial fusion

When combining decisions made by two separate visual cognition systems, statistical means such as simple average (M 1) and weighted average (M 2 and M 3), incorporating the confidence level of each of these systems have been used. Although combination using these means can improve each of the individual systems, it is not known when and why this can happen. By extending a visual cognition system to become a scoring system based on each of the statistical means M 1, M 2, and M 3 respectively, the problem of combining visual cognition systems is transformed to the problem of combining multiple scoring systems. In this paper, we examine the combined results in terms of performance and diversity using combinatorial fusion, and study the issue of when and why a combined system can be better than individual systems. A data set from an experiment with twelve trials is analyzed. The findings demonstrated that combination of two visual cognition systems, based on weighted means M 2 or M 3, can improve each of the individual systems only when both of them have relatively good performance and they are diverse.


Introduction
Many decisions that humans have to make are partially, or even wholly, based on visual input. The split second nature of such decisions may make the process seem simple. However, there are many factors that are considered and combined during this short time frame. On a neurological level, there has been growing interest in understanding the factors that are combined within the visual aspect alone [1,2], as well as how visual information is joined with information from other senses [3][4][5][6][7]. Combination of multiple visual decisions has also been explored [5,8,9].
Prior research into how pairs of people can interactively make decisions based on visual perception has been conducted by several researchers including Bahrami et al. [8], Ernst [5], and Kepecs et al. [9]. In Bahrami's work, four predictive models are used on experiments of varying degrees of noise, feedback, and communication: coin-flip (CF), behavioral feedback (BF), weighted confidence sharing (WCS), and direct signal sharing (DSS). Bahrami concludes that the WCS model is the only one that can be fit over the empirical data. His findings indicate that the accuracy of the decision-making is aided by communication between the pairs and can greatly improve the overall performance of the pair.
Marc O. Ernst expands on the concept of WCS [5] between pairs by proposing a hypothetical soccer match during which two referees determine whether the ball falls behind a goal line. Similar to Bahrami's proposal, Ernst's findings indicate that simply taking the BF or CF approach omits information that could lead to an optimal joint decision between the pair. However, while Ernst agrees that the WCS model can lead to a beneficial joint determination, his findings also indicate that the WCS model can be improved to achieve a more optimal joint decision. In Ernst's scenario, Bahrami's WCS model can be applied as the distance of the individual's decision ($d_i$) divided by the spread of the confidence distribution ($r_i$), which is $d_i/r_i$. A modified version of WCS (which closely resembles DSS) using sigma-square, that is, the squared spread, can produce a more accurate estimate through the joint opinion, which is represented as $d_i/r_i^2$. In an affirmation of Bahrami's research, Ernst also notes that joint decision-making comes with a cost when individuals with dissimilar judgments attempt to come to a consensus in such a manner. Bahrami and Ernst set forth very different experimental methods, but their aim is very much the same: to devise an algorithm for optimal decision-making between two people based on visual sensory input.
In the other direction, neural bases for decision-making and for combining sensory information within senses have been studied by Gold and Shadlen [10] and Hillis et al. [1]. Koriat [11] indicated that there is no need to combine two heads' decisions under a normal environment. His suggestion is to simply take the decision of the most confident person.
Combinatorial Fusion Analysis (CFA), an emerging information fusion paradigm, was proposed for analyzing the combination of multiple scoring systems (MSS) (see Hsu et al. [12][13][14]). CFA has been shown to be useful in several research domains, including sensor feature selection and combination [15,16], information retrieval, system selection and combination [12,17], text categorization [18], protein structure prediction [19], image recognition [20], target tracking [21], ChIP-seq peak detection [22], and virtual screening [23]. These studies have each shown, in their respective domains, that a combination of MSS performs better than the individual systems when the individual scoring systems perform relatively well and are characteristically different [13,14].
In a series of previous studies [24][25][26], a modified version of the soccer goal line decision proposed by Ernst was used as the data collection method. In this experiment, two subjects observe a small target being thrown into a grass field. The subjects are asked separately for their perceived landing point of the target and for their respective confidence in that decision. More recently, we conducted two sets of experiments with a total of 20 trials on two different days (12 trials and 8 trials) [27,28]. In each of these trials, a small token was thrown into a grass field and landed at location A = (A x , A y ). Two subjects P and Q, standing 40 feet away from the landing site, perceived the landing site to be at locations P = (P x , P y ) and Q = (Q x , Q y ) with confidence radii r P and r Q , respectively. In these works, each visual cognition system is treated as a scoring system that assigns a score to each of the partitioned intervals in the common visual space. The problem of combining visual cognition systems is thereby transformed into the problem of combining multiple scoring systems, and the combination is analyzed using the CFA framework. The results showed that combination by rank as well as by score can improve individual systems.
In this paper, we explore the issue of when and why a combination of two cognitive systems is better than each individual system using CFA. In particular, we use the concept of "cognitive diversity" and the notion of "performance ratio" to analyze the outcome of the combination. Using the data set from the experiment with twelve trials [27], we demonstrate, as in other domain applications, that combination is positive (better than or equal to the best of the two individual systems) only if the two systems, based on weighted mean using confidence radius, are relatively good (higher performance ratio) and they are diverse (higher cognitive diversity). Section 2 of this paper discusses two methods of combining visual cognition systems: statistical mean and combinatorial fusion. In Sect. 2.1, three statistical means M 1 , M 2 , and M 3 are calculated as average or weighted mean using the confidence radius as the weight. Based on these means, scoring systems p and q are constructed from the two visual cognition systems P and Q, respectively, in Sect. 2.2. Section 2.3 gives the method to combine these two visual scoring systems using the CFA framework. Section 3 gives the definition of cognitive diversity and the notion of performance ratio. Section 4 consists of examples, in particular the data set of an experiment with twelve trials of pairs of visual cognition systems [27]. Combination of these two visual cognition systems and analysis of the combination for the data set are discussed in more detail in Sects. 4.2 and 4.3. A summary of the results and possible future work is given in Sect. 5.
2 The CFA framework for combining two visual cognition systems

Computing various statistical means
When we make a decision based on visual input, we can consider this decision-making as a contemplation of various choices or candidates. Given two perceived locations P = (P x , P y ) and Q = (Q x , Q y ) (with confidence radius r P and r Q , respectively) of the actual landing site A = (A x , A y ), we wish to find a new location L (obtained by the joint decision of P and Q) so that L is better than P and Q (distance between L and A is smaller than those between P and A, and Q and A).
When determining a joint decision, typically an average or a weighted average approach is used to determine a mean. The average mean M 1 = (M 1x , M 1y ) of the two locations P = (P x , P y ) and Q = (Q x , Q y ) is calculated as

$$M_1 = \frac{P + Q}{2} \tag{1}$$

and the weighted means are obtained by

$$M_2 = \frac{P/r_P + Q/r_Q}{1/r_P + 1/r_Q} \tag{2}$$

and

$$M_3 = \frac{P/r_P^2 + Q/r_Q^2}{1/r_P^2 + 1/r_Q^2} \tag{3}$$

where P and Q are the perceived locations of the individual subjects P and Q, and r P and r Q are the confidence measurements of the two subjects, respectively.
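The three means can be illustrated with a short Python sketch. It assumes that M 2 weights each location by the inverse of its confidence radius and that M 3 weights it by the inverse of the squared radius (the latter weight is the one described in the summary of this paper); the function name `statistical_means` and the tuple representation of points are hypothetical choices made for illustration only.

```python
from typing import Tuple

Point = Tuple[float, float]

def statistical_means(P: Point, Q: Point, r_P: float, r_Q: float):
    """Return (M1, M2, M3) for perceived locations P and Q.

    Assumption: M2 uses weights 1/r and M3 uses weights 1/r**2,
    so a smaller confidence radius gives a larger weight.
    """
    def weighted(w_P: float, w_Q: float) -> Point:
        total = w_P + w_Q
        return ((w_P * P[0] + w_Q * Q[0]) / total,
                (w_P * P[1] + w_Q * Q[1]) / total)

    M1 = weighted(1.0, 1.0)                    # simple average
    M2 = weighted(1.0 / r_P, 1.0 / r_Q)        # inverse-radius weights
    M3 = weighted(1.0 / r_P**2, 1.0 / r_Q**2)  # inverse-variance weights
    return M1, M2, M3
```

With these weights, the more confident subject (smaller radius) pulls M 2 toward its own location, and M 3 pulls it further still.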

Converting each visual cognition system to a scoring system
In the experiments we conducted, each of the two subjects provides an individually determined decision on where he or she perceived the same target to have landed in a field. Each coordinate on the field can be considered as a candidate for the respective participant's decision on the perceived landing point. We are able to obtain a weight for each decision and their combination by asking each subject for a radius measure of confidence around his or her decision. The smaller the radius measure of confidence, the more confident the participant is. We use radius R to calculate the spread (i.e., standard deviation) of the distribution around the perceived landing point, or r. In our research, we use

Set common visual space
The r values are used in Formulas (1), (2), and (3) to determine the positions of the means, denoted as M 1 , M 2 , and M 3 respectively. The distance between M i and A, where A is the actual landing site, is used to evaluate the performance of M i . With the field used as a two-dimensional coordinate grid, P, Q, and A are represented by x- and y-coordinates. Three formulas are used to calculate the mean of P and Q, as M i , where i = 1, 2, or 3. M i falls somewhere between points P and Q and is determined as a coordinate. The longer of the two segments PM i and M i Q is extended by 30% to the left to point P 0 or to the right to point Q 0 , respectively. The shorter side is extended further, creating the widened observation area P 0 Q 0 so that M i is the midpoint of P 0 and Q 0 . We refer to the line segment P 0 Q 0 as the common visual space (Fig. 1).
We partition the length, d(P 0 ,Q 0 ), of line segment P 0 Q 0 into 127 intervals with midpoint d i in each interval i, i = 1, 2, …, 127, and with each interval of length d(P 0 ,Q 0 )/127. The center interval, in this case the one with midpoint d 64 , contains M i .
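The construction of the common visual space can be sketched as follows. This is a minimal illustration, assuming that P and Q are distinct, that M i lies on the segment PQ, and that the half-length of P 0 Q 0 is 1.3 times the longer of the two segments PM i and M i Q; the function name `common_visual_space` and its parameters are hypothetical.

```python
import math

def common_visual_space(P, Q, M, n_intervals=127, extension=0.30):
    """Return (P0, Q0, midpoints) for the segment centered on M.

    Assumption: the longer of |PM| and |MQ| is extended by `extension`
    and the same half-length is used on both sides, so M is the
    midpoint of P0 and Q0.
    """
    half = (1.0 + extension) * max(math.dist(P, M), math.dist(M, Q))
    length_PQ = math.dist(P, Q)  # must be non-zero
    ux = (Q[0] - P[0]) / length_PQ  # unit vector pointing from P to Q
    uy = (Q[1] - P[1]) / length_PQ
    P0 = (M[0] - half * ux, M[1] - half * uy)
    Q0 = (M[0] + half * ux, M[1] + half * uy)
    step = 2.0 * half / n_intervals  # length of one interval
    midpoints = [(P0[0] + (i + 0.5) * step * ux,
                  P0[1] + (i + 0.5) * step * uy)
                 for i in range(n_intervals)]
    return P0, Q0, midpoints
```

With an odd number of intervals, the midpoint of the center interval (d 64 when there are 127 intervals, index 63 in the zero-based list) coincides with M i .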

Treat P and Q as two scoring systems p and q
Normal distribution probability curves for each participant are created with the points P and Q as the means and with the squared confidence radii $r_P^2$ and $r_Q^2$ as the variances of P and Q, respectively (see Fig. 2 in the case of 15 intervals). The following formula is used to determine the normal distribution:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \tag{5}$$

where $x$ is a normal random variable, $\mu$ is the mean, and $\sigma$ is the standard deviation. A normal distribution curve spans infinitely to the right and to the left. Therefore, our two scoring systems p and q create overlapping distributions that span the entire visual plane between P 0 and Q 0 . Scoring systems p and q each score all of the 127 intervals on the common visual space. For normal distribution functions with points P and Q as the means and $r_P$ and $r_Q$ as the standard deviations, respectively, each of the scoring systems p and q assigns interval $d_i$ a score between 0 and 1 according to formula (5). These are the score functions $s_p$ and $s_q$. The values of each score function are sorted from highest to lowest to obtain the rank functions $r_p$ and $r_q$, respectively (see Fig. 3). The $d_i$ with the lowest integer as its rank has the highest score. In the setting of this paper, the score function $s_C$ of the score combination of the derived scoring systems p and q in our experiment is

$$s_C(d_i) = \frac{s_p(d_i) + s_q(d_i)}{2} \tag{6}$$

The score function $s_D$ of the rank combination of the two scoring systems p and q in our experiment is

$$s_D(d_i) = \frac{r_p(d_i) + r_q(d_i)}{2} \tag{7}$$

When we sort $s_C(d_i)$ in descending order, we obtain the rank function of the score combination, called $r_C(d_i)$. When we sort $s_D(d_i)$ in ascending order, we obtain the rank function of the rank combination, called $r_D(d_i)$. The top ranked interval in $r_C(d_i)$ is called C. The top ranked interval in $r_D(d_i)$ is called D (see Fig. 3). These points are considered the optimal score and rank combination, respectively, and are used for evaluation of the combination result.
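The scoring and combination steps above can be sketched as follows. This is an illustrative sketch only: it assumes that interval midpoints and perceived locations are given as scalar positions along the line P 0 Q 0 , that the normal density value is used directly as the score, and that score and rank combinations are simple averages of the two systems. The function names are hypothetical, and ties are broken by position order, which the paper does not specify.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return (math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
            / (sigma * math.sqrt(2 * math.pi)))

def rank_descending(scores):
    """Rank 1 goes to the highest score; ties keep their original order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

def combine(midpoints, pos_p, pos_q, radius_p, radius_q):
    """Return (C, D): top interval by score and by rank combination."""
    s_p = [normal_pdf(d, pos_p, radius_p) for d in midpoints]
    s_q = [normal_pdf(d, pos_q, radius_q) for d in midpoints]
    rank_p = rank_descending(s_p)
    rank_q = rank_descending(s_q)
    s_C = [(a + b) / 2 for a, b in zip(s_p, s_q)]        # score combination
    s_D = [(a + b) / 2 for a, b in zip(rank_p, rank_q)]  # rank combination
    n = len(midpoints)
    C = midpoints[max(range(n), key=lambda i: s_C[i])]   # highest combined score
    D = midpoints[min(range(n), key=lambda i: s_D[i])]   # lowest combined rank
    return C, D
```

Note that a density value can exceed 1 when the radius is very small; any rescaling of scores into the range from 0 to 1 is not reproduced in this sketch.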
The performance of the points (P, Q, M i , C, and D) is determined by each respective point's distance from target A. A shorter distance indicates higher performance (Fig. 4).

Cognitive diversity
Given the score function $s_A$ of the system A and its derived rank function $r_A$, the rank-score characteristic (RSC) function $f_A$, which is the composite function of $s_A$ and the inverse of $r_A$, defined by Hsu et al. [13,14], is a function from N to R and can be computed mathematically as

$$f_A(i) = (s_A \circ r_A^{-1})(i) = s_A\left(r_A^{-1}(i)\right) \tag{8}$$

(see Fig. 5).
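As an illustration of the RSC function, the following sketch composes a score function with the inverse of its rank function. The list-based representation (scores and ranks indexed by interval, ranks running from 1 to n without repeats) and the name `rsc_function` are assumptions made for this example.

```python
def rsc_function(scores, ranks):
    """Return [f(1), ..., f(n)], where f(i) is the score of the item ranked i.

    `ranks` must be a permutation of 1..n aligned with `scores`.
    """
    inverse_rank = {rank: idx for idx, rank in enumerate(ranks)}  # r^{-1}
    return [scores[inverse_rank[i]] for i in range(1, len(scores) + 1)]
```

When the ranks are obtained by sorting the scores in descending order, the result is simply the list of scores in descending order, so the RSC function describes how quickly a system's scores fall off with rank, independently of which interval holds each rank.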
The cognitive diversity between two scoring systems p and q, d(p, q), is calculated from the RSC functions f p and f q using formula (9) (also see [23]). The performance ratio of P and Q, P l /P h (lower performance over higher performance), is used after it is normalized again among the twelve ratios to be in (0, 1].
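A diversity measure between two RSC functions can be sketched as below. The exact form of the measure is not reproduced in the text here, so this sketch assumes a root-mean-square difference between the two RSC functions, a form commonly used in the CFA literature; it should be read as one plausible instance rather than as the formula used in this study. The name `cognitive_diversity` is hypothetical.

```python
import math

def cognitive_diversity(f_p, f_q):
    """Root-mean-square difference between two RSC functions.

    Assumption: f_p and f_q are lists of equal length holding
    f(1), ..., f(n); the RMS form itself is an assumed choice.
    """
    if len(f_p) != len(f_q):
        raise ValueError("RSC functions must have the same length")
    n = len(f_p)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f_p, f_q)) / n)
```

Under this assumed form, two systems whose scores decay identically with rank have diversity 0, even if they rank the intervals in different orders.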

Data set
We use the data set from an experiment of twelve trials conducted by the authors in [27]. Each trial consists of two volunteers P and Q with confidence radius r P and r Q . Each gives a visual cognitive estimate of the actual token landing site A as P and Q respectively. Table 1 lists coordinates of P (P x , P y ), Q (Q x , Q y ), and A (A x , A y ) as well as the confidence radius r P and r Q of P and Q respectively.

Combination results and analysis
The decision of Participant p, marked as P, and the decision of Participant q, marked as Q, are used to obtain line segment PQ. The radii of confidence are used to calculate the two r values to locate the coordinates of points M 1 , M 2 , and M 3 along the extended P 0 Q 0 . To combine and compare the two visual decision systems of p and q, a common plane must be implemented to be evaluated by the different systems. The 127 intervals along the P 0 Q 0 line serve as the common visual space to be scored.
When P 0 Q 0 has been partitioned into the 127 intervals mapped according to M i , the intervals are scored according to the normal distribution curves of P and Q using the standard deviations r P and r Q , respectively. Both systems share the set of common interval midpoints d 1 , d 2 , d 3 ,…,d 127 . Each scoring system, p and q, consists of a score function. We define score functions s p (d i ) and s q (d i ) that map each interval, d i , to a score in systems p and q, respectively. The rank function of each of the systems p and q maps each element d i to a positive integer in N, where $N = \{x \mid 1 \le x \le 127\}$. We obtained the rank functions r p (d i ) and r q (d i ) by sorting s p (d i ) and s q (d i ) in descending order and assigning a rank value from 1 to 127 to each interval. C and D based on M i , for i = 1, 2, and 3, are calculated, and the distances to target A are computed. The point with the shorter distance from the target is considered the point with the better performance. Table 2 lists the performance of (P, Q), the confidence radii of P and Q, and the performance of C and D based on M i , i = 1, 2, and 3. Table 3 lists the performance of M i , i = 1, 2, and 3 in the twelve trials. Table 4 gives comparisons of the performance of C or D to that of P and Q, and to M i . We note that Koriat's criterion, taking the decision of the most confident system, gives a correct prediction in 7 out of the 12 trials (Trials 1, 2, 4, 6, 8, 9, and 11). The score combination C or rank combination D obtained by CFA improves P and Q in 8, 7, and 6 out of the 12 trials when the common visual space mean is M 1 , M 2 , and M 3 respectively. It is interesting to note that C or D improves P and Q in more trials based on M 1 than in those based on M 2 or M 3 because M 1 , unlike the weighted means, does not take the confidence radius into consideration (Table 4(a)). The same reason applies to Table 4(b), where C or D can improve M 1 in more trials than M 2 or M 3 .
In addition, the 4 trials (Trials 3, 5, 10, and 12) to which Koriat's criterion fails to apply can all be improved using the CFA framework. Figures 6 and 7 illustrate the performances of P, C, D, M i and Q for i = 1, 2, and 3 in Trials 2 and 7 respectively. In Trial 2, P performs quite well and has a higher confidence radius than Q. When given weighted means M 2 and M 3 , combinatorial fusion C or D performs better than P and Q. However, in Trial 7, P performs better but has a lower confidence radius than Q. In this case, C or D does not improve P and Q based on M 2 or M 3 when more weight is given to Q. Therefore, we observe that giving more weight to the better performer with a higher confidence leads to a combination which improves P and Q. We call such a case a positive case. In the following Sect. 4.3, we investigate in general when combination (either rank or score combination) can improve P and Q.
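The two comparisons used in this analysis can be sketched as follows. This is a hedged illustration: `koriat_choice` assumes that the more confident subject is the one with the smaller confidence radius (as stated when the radius is introduced), and `is_positive_case` assumes that a combined point is positive when its distance to A is strictly smaller than both individual distances, with an optional flag that also accepts ties. Both function names and the flag are hypothetical.

```python
import math

def koriat_choice(P, Q, r_P, r_Q):
    """Pick the decision of the more confident subject.

    Assumption: a smaller confidence radius means higher confidence;
    P is returned when the radii are equal.
    """
    return P if r_P <= r_Q else Q

def is_positive_case(A, P, Q, combined, allow_tie=False):
    """True when the combined point is closer to A than both P and Q."""
    best_individual = min(math.dist(P, A), math.dist(Q, A))
    d_combined = math.dist(combined, A)
    if allow_tie:
        return d_combined <= best_individual
    return d_combined < best_individual
```

For example, `is_positive_case(A, P, Q, C)` and `is_positive_case(A, P, Q, D)` could be evaluated for each trial and each M i to label the cases examined in the next section.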

Positive cases versus Negative cases
We plot the result of a score or rank combination of P and Q on the two-dimensional coordinate plane, with the y-axis as the cognitive diversity d(P, Q) and the x-axis as the performance ratio P l /P h (lower performance over higher performance), for all the trials and for each M i , i = 1, 2, or 3. Positive and negative cases are plotted with distinct markers. Each trial within each graph is noted as positive when rank or score combination performs better than both P and Q, and negative when it does not. The average of all positive cases and the average of all negative cases are also marked on each graph. Cognitive diversity between P and Q, d(P, Q), is the diversity between the two RSC functions f p and f q , d(f p , f q ), and is calculated using formula (9). Cognitive diversity values are normalized to (0, 1] in each case based on M i , i = 1, 2, and 3 (see Table 5). Figure 8 depicts the positive versus negative cases based on each M i , i = 1, 2, and 3 (Fig. 8a-c, respectively) in terms of cognitive diversity (y-axis) and performance ratio (x-axis).

Table 3 (note): Each bold number indicates that the performance of M i in the trial is better than P and Q. M 3 is best among the M i 's in Trials 2, 4, 6, 8, 9, and 11.

Table 4: Comparisons of performance of C or D to that (a) of P and Q, (b) of M i , and (c) of P, Q, and M i (set of 36 cases in Table 2).

Summary and future work
In our previous work [27,28], it has been demonstrated that combination of two visual cognition systems using the CFA framework can improve each of the individual systems. In this paper, we analyze outcomes of these combinations according to positive cases or negative cases using the notions of cognitive diversity and performance ratio on the data set of an experiment with 12 trials [27]. It is demonstrated that in the majority of the 72 cases of rank combinations and score combinations (12 × 2 × 3 = 72) (see Fig. 8a-c), combination of two visual systems, based on weighted means M 2 or M 3 , can outperform each of the individual systems only if they each perform relatively well (with higher performance ratio) and they are diverse (with high cognitive diversity). In an earlier work by Hsu and Taksa [12], it was shown that under certain conditions, rank combination can be better than score combination. In the current study, each of six trials (Trials 1, 2, 5, 6, 9, and 10) has higher diversity than the remaining six trials. Similar to the results in [12], these six trials do have better rank combination (D) than score combination (C). It is also interesting to note that improvement in the other six trials (Trials 3, 4, 7, 8, 11, and 12) was achieved by rank combination only. In other cases, whenever score combination (C) improves P and Q, rank combination (D) can also improve them. All these indicate that the CFA framework, which uses score and rank combination, is robust in analyzing combination and decision problems for visual cognition systems.
In the combination of decisions or visual cognition systems, as well as the integration of signals from different sensors, statistical means or weighted means such as M 1 , M 2 , or M 3 are often used [1,3,4,5,8]. It has been observed in these previous studies that M 3 , using 1/r P 2 (or 1/r Q 2 ) as the weight assigned to system P (or Q), provides better combination results. In our current study, when comparing M 1 , M 2 , and M 3 in each of the 12 trials, it is shown that M 3 is better than M 1 and M 2 in 6 of the 12 trials, while M 1 and M 2 are the best in 5 and 1 of the 12 trials respectively, independent of the performance of P and Q. So our current study supports that observation. However, when comparing improvements of M i over P and Q, it was shown in our study that the statistical means M 1 , M 2 , and M 3 can improve P and Q in 4, 3, and 3 trials, respectively (see Table 3). On the other hand, the CFA framework (C or D) based on M 1 , M 2 , or M 3 can improve P and Q in 8, 7, or 6 trials. All these indicate that the CFA framework is a viable analytic method in combining visual cognition systems and can be generalized to analyze data in bioinformatics and neuroscience.
In summary, our CFA framework provides two criteria, performance ratio and cognitive diversity, to guide the combination of two visual cognition systems with confidence radii. In the case of unsupervised learning or when the performance cannot be evaluated (e.g., the location of A is not known), cognitive diversity itself can be used to indicate when to combine (when the cognitive diversity is large enough) or how to combine (using rank combination or score combination) (see [12,14,21,22,23]).
Our future work includes the following: (1) Apply CFA framework to the combination of more than two visual systems; (2) Study the effect of the number of partition intervals in the common visual space defined by P 0 Q 0 ; (3) Use other diversity measurements such as Pearson's correlation (between two score functions s A and s B ) and Kendall's tau (see [29]) or Spearman's rho (between two rank functions r A and r B ); and (4) Apply CFA framework to combination of multiple sensing systems or combination of multi-modal physiological systems.