Uncertain Distribution-Based Similarity Measure of Concepts

The similarity of concepts is a basic task in the field of artificial intelligence, e.g., image retrieval, collaborative filtering, and public opinion guidance. As a powerful tool for expressing uncertain concepts, the similarity measure based on cloud model (SMCM) is often used to measure the similarity between two concepts. However, current studies on SMCM have two main limitations: (1) the similarity measures based on conceptual intension lack an interpretable way of merging the numerical characteristics and cannot discriminate some different concepts; (2) the similarity measures based on conceptual extension are unstable and inefficient. To address these problems, an uncertain distribution-based similarity measure of cloud model (UDCM) is proposed in this paper. By analyzing the definition of the cloud model (CM), we propose a new complete uncertainty, comprising first-order and second-order uncertainty, to calculate the uncertainty more accurately. Then, based on the difference between the complete uncertainty of two concepts, the computing process of UDCM and some of its properties are introduced. Finally, we exhibit its advantages by comparison with other methods and verify its validity by experiments.


Introduction
The similarity of concepts is a basic sense in human cognition and a fundamental task in artificial intelligence. It plays a crucial role in semantic information retrieval systems [1][2][3], sense disambiguation [1,4], and information extraction [3,5]. Because concepts are uncertain, different people perceive the same concept differently [6][7][8][9][10]. To describe various forms of uncertainty, researchers have proposed many models: probability models for randomness [11], fuzzy set models for vagueness [12], and rough set models for inconsistency and incompleteness [13]. The cloud model (CM) [14], proposed by Li et al., describes uncertain concepts in human language and realizes bidirectional cognitive transformation between their extension and intension. Owing to its strong ability to express uncertain concepts, the similarity measure based on cloud model (SMCM) has been applied in many scenarios, such as image segmentation [15][16][17][18], collaborative filtering [19][20][21], and synthetic evaluation [22][23][24][25][26]. Although SMCM has been successful in many scenarios, two urgent problems need to be addressed.
From the perspective of conceptual intension, the similarity of the numerical characteristics can depict the similarity of concepts. However, to express complex forms of uncertainty, different numerical characteristics carry different meanings for an uncertain concept, and there is no reasonable method of merging them to measure similarity.
Example 1. Figure 1 shows the shooting results of three shooters, and we are asked to evaluate the similarity of the performances of A, B, and C. This is a difficult problem. As shown in Figure 1, the variance of C's shooting results is close to that of A, which means that they share the same psychological state, but the mean value of C's performance is far from that of A, which means a significant difference between their shooting levels. Although B's shooting results are scattered, the mean value of B's performance is close to that of A. In terms of psychological state, C is closer to A than B is, but in terms of shooting level, B is closer to A than C is.
That is, considering the similarity from different angles yields different results. Hence, we need a more comprehensive method to measure the similarity.
From the perspective of conceptual extension, by contrast, the similarity measure is computed by random realization, which directly reflects the similarity from abundant samples of the concepts. The more samples are generated, the higher the accuracy of SMCM is. Therefore, we have to spend excessive computing time to acquire an accurate SMCM. Besides, due to random realization, the results of SMCM differ from run to run, as the following example illustrates.
Example 2. Suppose two cloud models $C_1(5, 3, 1)$ and $C_2(5, 2, 0.5)$; their similarity is computed by the fuzzy distance-based similarity (FDCM) [27], wherein the overall score $s$ is estimated with $n = 50$. Table 1 shows the results of 10 runs of the similarity measure.
From the above analyses, it can be seen that SMCM still needs further study. To address the above problems, in this paper, we propose a new notion called complete uncertainty to depict the whole uncertainty in the process from numerical characteristics to the conceptual extension.
Then, a new SMCM is presented based on complete uncertainty, which reflects the similarity of the uncertain distributions of two concepts. Compared with the SMCM based on extension, the new SMCM gives an invariable result. Besides, it has a more reasonable method of merging numerical characteristics than the SMCM based on intension. Moreover, because the new SMCM reflects the complete uncertainty of the CM, it can acquire more accurate similarity results between two concepts. The remainder of this paper is organized as follows. The related definitions of the CM and current SMCM are introduced in Section 2. We propose the uncertain distribution-based SMCM and elaborate its calculation in detail in Section 3. Section 4 provides four experiments to show the effectiveness of the proposed method. Final conclusions are presented in Section 5.

Preliminaries
In this section, we review related concepts of the CM and current methods for measuring the similarity of cloud models.

Cloud Model
Definition 1 (see [14]). Let $U$ be a nonempty infinite set and $C(Ex, En, He)$ be a concept on $U$. If there is a number $x \in U$ that is a random realization of the concept $C$ and satisfies $x \sim R_N(Ex, y)$, where $y \sim R_N(En, He)$, and the certainty degree of $x$ on $U$ is $\mu(x) = \exp\{-(x - Ex)^2 / (2y^2)\}$, then the distribution of $x$ on $U$ is a Gaussian cloud or normal cloud, and each $x$ is defined as a Gaussian cloud drop.
As a crucial type of CM, the Gaussian cloud model is widely applied owing to the universality of the Gaussian distribution, and we only discuss the Gaussian cloud model in this paper. The Gaussian cloud model introduces three numerical characteristics, $Ex$, $En$, and $He$, which denote mathematical expectation, entropy, and hyperentropy, respectively. It accords with human thought [14][28][29][30][31][32] and depicts a unified framework of randomness and vagueness in the human cognitive process. Herein, the expectation represents the basic determinate domain of the qualitative concept, the entropy represents the uncertainty of the qualitative concept, and the hyperentropy represents the uncertainty of the entropy. Figure 2 shows the shape and the characteristic curves of CM $C(20, 6, 1)$, i.e., the curve $y_1 = \exp\{-(x - Ex)^2 / (2(En - 3He)^2)\}$, which is called the inner envelope, and the curve $y_2 = \exp\{-(x - Ex)^2 / (2(En + 3He)^2)\}$, which is called the outer envelope. We also call the curve $y = \exp\{-(x - Ex)^2 / (2En^2)\}$ the expectation curve.
In cognitive computing, cloud drops are called conceptual extension, i.e., samples of a concept. The numerical characteristics, expectation $Ex$, entropy $En$, and hyperentropy $He$, are called conceptual intension, representing the essence of a concept.
There are many algorithms that implement the bidirectional transformation between the extension and intension of the CM; they are called forward cloud transformation (FCT) algorithms and backward cloud transformation (BCT) algorithms. Due to space limitations, we do not elaborate on these algorithms in this paper; related methods can be found in [14,32,33].
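As an illustration of the forward direction, the following minimal Python sketch generates cloud drops directly from Definition 1. It is not one of the FCT algorithms of [14,32,33]; the function name `forward_cloud`, the `seed` parameter, and the use of $|y|$ as a standard deviation are assumptions made for this example.

```python
import math
import random


def forward_cloud(ex, en, he, n, seed=None):
    """Generate n cloud drops (x, mu) for the Gaussian cloud C(ex, en, he)."""
    if en <= 0 or he < 0:
        raise ValueError("this sketch assumes en > 0 and he >= 0")
    rng = random.Random(seed)
    drops = []
    for _ in range(n):
        # Second-order randomness: y ~ N(En, He^2); redraw in the rare case y == 0.
        y = rng.gauss(en, he)
        while y == 0.0:
            y = rng.gauss(en, he)
        # First-order randomness: x ~ N(Ex, y^2); |y| serves as the standard deviation.
        x = rng.gauss(ex, abs(y))
        # Certainty degree of the drop, as in Definition 1.
        mu = math.exp(-((x - ex) ** 2) / (2.0 * y ** 2))
        drops.append((x, mu))
    return drops


# Drops for the cloud model C(20, 6, 1) used in Figure 2.
drops = forward_cloud(20, 6, 1, n=1000, seed=0)
```

Each run with a different seed yields a different set of drops, which is the source of the run-to-run variation of extension-based measures.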

Similarity Measure of Cloud Model.
To describe the similarity between two cloud concepts, many SMCM have been proposed. Generally speaking, a suitable similarity measure should yield correct conclusions in specific situations and provide discriminability, efficiency, stability, and interpretability. Zhang et al. [34] used the average distance of cloud drops generated by FCT to measure the distance between two cloud models.
This method is called the concept extension-based similarity measure (CS). It is understandable and accords with human cognition. However, the calculation of the average distance is computationally expensive and unstable. Other researchers study SMCM based on concept intension. The likeness comparing method based on cloud model (LICM) [19] employs the cosine of the included angle between the vectors composed of the numerical characteristics to measure similarity. It is highly efficient but ignores the relationships among the numerical characteristics, which sometimes leads to unreasonable results. Li et al. [35] proposed the expectation based on cloud model (ECM) and the max boundary based on cloud model (MCM), which define the similarity measure by the overlapping area of characteristic curves. These methods employ characteristic curves to denote the certainty degrees of cloud drops and then use the similarity of uncertainty to measure the similarity of concepts. Other SMCM based on characteristic curves can be found in [16,27,36,37]. Table 2 compares the similarity measures mentioned above in terms of discriminability, efficiency, stability, and interpretability. In practice, owing to their high efficiency, stability, and interpretability, MCM and ECM are widely applied in many situations.
From Table 2, with the exception of CS, the currently popular SMCM cannot always distinguish two different concepts, because they use only part of the uncertain relationship to measure the similarity between concepts. For example, if two cloud models have the same expectation and entropy, their similarity calculated by ECM equals 1 even though they have different hyperentropy. In other words, ECM only uses the uncertain relationship between expectation and entropy, which is called first-order uncertainty. Ignoring hyperentropy causes incorrect results. In the next section, we define a new notion of CM called complete (first-order and second-order) uncertainty and propose a new SMCM based on complete uncertainty, which focuses on the difference between the distributions of two concepts.

Complete Uncertainty of the Cloud Model.
From the discussion in Section 2, computing the whole uncertainty of the CM is a critical step for SMCM based on conceptual intension. Due to the 3σ principle, we only consider cloud drops in $[Ex - 3En, Ex + 3En]$. For a cloud drop $x$, its certainty degree $\exp\{-(x - Ex)^2 / (2y^2)\}$ also varies depending on the random variable $y \sim R_N(En, He)$. So, there are two forms of uncertainty in Definition 1. First-order uncertainty is the certainty degree of the drops, and second-order uncertainty is the uncertainty of the certainty degree. The complete uncertainty should include both and reflect the relationships among the three numerical characteristics. Figure 3 shows the two forms of uncertainty and their relationships. Hence, for each cloud drop, its uncertainty is a fuzzy set $U(x) = \{(\exp\{-(x - Ex)^2 / (2y^2)\}, \mu_y)\}$, where $\mu_y = \exp\{-(y - En)^2 / (2He^2)\}$ is the membership function depending on $y$. Then, the complete uncertainty of CM $C$ is denoted by formula (1).
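To make the two levels of uncertainty concrete, the sketch below tabulates the fuzzy set $U(x)$ of a single cloud drop: for each candidate $y$ it pairs the first-order certainty degree with the second-order membership $\mu_y$. The function name `drop_uncertainty`, the restriction of $y$ to $[En - 3He, En + 3He]$, and the grid size are assumptions chosen for illustration only.

```python
import math


def drop_uncertainty(x, ex, en, he, steps=7):
    """Tabulate U(x) as (y, certainty degree, membership of y) triples.

    y runs over an evenly spaced grid on [en - 3*he, en + 3*he] (an assumed range).
    """
    triples = []
    for k in range(steps):
        y = en - 3.0 * he + 6.0 * he * k / (steps - 1)
        if y == 0.0:
            continue  # the certainty degree is undefined for y = 0
        degree = math.exp(-((x - ex) ** 2) / (2.0 * y ** 2))  # first-order uncertainty
        mu_y = math.exp(-((y - en) ** 2) / (2.0 * he ** 2))   # second-order uncertainty
        triples.append((y, degree, mu_y))
    return triples


# The drop x = 26 of the cloud model C(20, 6, 1): y takes the values 3, 4, ..., 9.
table = drop_uncertainty(x=26.0, ex=20.0, en=6.0, he=1.0)
```

The middle row corresponds to $y = En$, where the membership $\mu_y$ reaches 1 and the certainty degree coincides with the expectation curve.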

Uncertain Distribution-Based SMCM.
Based on formula (1), the uncertain distribution-based similarity measure of cloud model (UDCM) can be defined as follows.

Definition 2. Let $C_1(Ex_1, En_1, He_1)$ and $C_2(Ex_2, En_2, He_2)$ be two cloud models; the uncertain distribution-based similarity measure of cloud model between $C_1$ and $C_2$ is defined by formula (2), where $|L|$ is the length of $L$, and $U_1(x)$ and $U_2(x)$ are the uncertainties of $x$ for $C_1$ and $C_2$, respectively.

In Definition 2, $S(\cdot, \cdot)$ is a similarity measure of fuzzy sets. Therefore, UDCM is a framework of SMCM: there are various UDCM depending on the definition of $S(\cdot, \cdot)$. In this paper, we define $S(\cdot, \cdot)$ by formula (3), where $A$ and $B$ are two fuzzy sets on the universal set $U$ and $|\cdot|$ is the cardinal number of a fuzzy set.

Figure 4 illustrates UDCM. The two horizontal scatter diagrams represent the relationship between cloud drops and certainty degree, i.e., first-order uncertainty. The two vertical curves represent the uncertainty of the certainty degree at $x = x_0$, i.e., second-order uncertainty. Owing to formula (3), $S(U_1, U_2)$ is the ratio of the blue area to the green area.

Next, we calculate $S(U_1(x), U_2(x))$ for each $x \in L$, where the uncertainty of $x$ is a fuzzy set. Firstly, we should find the relationship between the membership functions of the two fuzzy sets when their elements are equal. For each $x \in L$, let $y_1$ and $y_2$ be the variables of the membership functions of $U_1(x)$ and $U_2(x)$, respectively. Since the elements are equal, i.e., $\exp\{-(x - Ex_1)^2 / (2y_1^2)\} = \exp\{-(x - Ex_2)^2 / (2y_2^2)\}$, we have $y_1 = |(x - Ex_1)/(x - Ex_2)|\, y_2$. Then, (2) can be written as formula (8).

Mathematical Problems in Engineering

It is obvious that, for each $x \in L$, the membership functions of both fuzzy sets $U_1$ and $U_2$ are the exponential parts of Gaussian distributions. Hence, we only need to calculate the intersection and the union of the two exponential parts associated with $R_N(En_1, He_1)$ and $R_N(|(x - Ex_1)/(x - Ex_2)|\, En_2, |(x - Ex_1)/(x - Ex_2)|\, He_2)$, respectively. We do not introduce the process of calculation in this paper; more details can be found in [38].
The remainder is to calculate the integral in formula (8). Unfortunately, its integrand is not an elementary function, and the integral has no analytic expression, so we have to calculate it numerically. By the definition of integration, it satisfies equation (9), in which the function $\lfloor a \rfloor$ returns the largest integer not exceeding $a$. For convenience, we denote the function $S(U_1(x), U_2(x))$ as $Sim(x)$.
There is a special case of Definition 2, as follows.
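The per-drop similarity can be sketched numerically as follows. This is only an illustration under stated assumptions, not the calculation of [38]: it assumes that $S(A, B)$ is the ratio of the cardinality of the intersection to that of the union, that intersection and union are the pointwise minimum and maximum of the membership functions, and that cardinality is the integral of the membership function over $y$. The names `gauss_shape` and `sim_at`, the integration range of six widths, and the grid size are all hypothetical choices.

```python
import math


def gauss_shape(y, center, width):
    """Exponential part of a Gaussian, used here as a membership function."""
    return math.exp(-((y - center) ** 2) / (2.0 * width ** 2))


def sim_at(x, c1, c2, grid=4000):
    """Approximate S(U1(x), U2(x)) as |intersection| / |union| on a uniform y-grid."""
    ex1, en1, he1 = c1
    ex2, en2, he2 = c2
    if ex1 == ex2:
        ratio = 1.0  # |x - Ex1| / |x - Ex2| equals 1 for every x
    elif x == ex1 or x == ex2:
        raise ValueError("the rescaling ratio is degenerate at x = Ex1 or x = Ex2")
    else:
        ratio = abs((x - ex1) / (x - ex2))
    # Membership function of U2(x) expressed on the scale of y1.
    center2, width2 = ratio * en2, ratio * he2
    lo = min(en1 - 6.0 * he1, center2 - 6.0 * width2)
    hi = max(en1 + 6.0 * he1, center2 + 6.0 * width2)
    step = (hi - lo) / grid
    inter = union = 0.0
    for k in range(grid):
        y = lo + (k + 0.5) * step  # midpoint rule
        a = gauss_shape(y, en1, he1)
        b = gauss_shape(y, center2, width2)
        inter += min(a, b)  # assumed fuzzy intersection
        union += max(a, b)  # assumed fuzzy union
    return inter / union  # the grid step cancels in the ratio


c1 = (10.0, 5.0, 1.0)
c2 = (5.0, 2.5, 0.5)
c3 = (10.0, 5.0, 0.4)
s13 = sim_at(7.0, c1, c3)
s12 = sim_at(7.0, c1, c2)
```

How the points $x = Ex_1$ and $x = Ex_2$ are treated is not stated in the text above, so the sketch simply rejects them.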
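The numerical step can be sketched as a plain Riemann sum. The sketch assumes that the quantity of interest is the average of $Sim(x)$ over $L = [a, b]$, i.e., the integral divided by $|L|$; the step width `delta`, the left-endpoint rule, and the function name `average_over_interval` are assumptions for this example rather than the exact form of equation (9).

```python
import math


def average_over_interval(sim, a, b, delta=0.01):
    """Approximate (1 / |L|) * integral of sim over L = [a, b] by a left Riemann sum."""
    length = b - a
    n = math.floor(length / delta)  # number of whole steps of width delta inside L
    total = sum(sim(a + i * delta) for i in range(n))
    return total * delta / length


# A constant Sim(x) is returned essentially unchanged by the averaging.
constant_case = average_over_interval(lambda x: 0.4, -5.0, 25.0)
```

A smaller `delta` improves accuracy at the cost of more evaluations of `sim`; unlike Monte Carlo simulation, repeated runs return the same value.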
Theorem 1. Let $C_1(Ex_1, En_1, He_1)$ and $C_2(Ex_2, En_2, He_2)$ be two cloud models; if they satisfy $Ex_1 = Ex_2$ and $En_1 = En_2$, the UDCM of them is

Proof. Since $Ex_1 = Ex_2$ and $En_1 = En_2$, according to Definition 2, $L = [Ex_1 - 3En_1, Ex_1 + 3En_1]$, $|L| = 6En_1$, and $|(x - Ex_2)/(x - Ex_1)| = 1$. For each $x \in L$, the value of $S(U_1(x), U_2(x))$ is invariable. Without loss of generality, we assume $He_1 \le He_2$. Due to $En_1 = En_2$, there are

Hence, the similarity of $U_1$ and $U_2$ is

The UDCM of $C_1$ and $C_2$ is

In order to illustrate that UDCM has a high ability to distinguish two different concepts, we have the following theorem.

Proof. To prove this lemma, we must employ measure theory. Sufficiency: suppose that $f(x) = 1$ holds almost everywhere on $L$. Due to the countable additivity of integration,

where $E(f = 1)$ means the set of $x$ satisfying $f(x) = 1$.
Necessity: suppose $n$ is a natural number.
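Returning to the special case of Theorem 1, the following worked calculation shows what the equal-expectation, equal-entropy case would give under three assumptions that the text above does not confirm: that $S(A, B) = |A \cap B| / |A \cup B|$ with intersection and union taken as pointwise minimum and maximum, that the cardinality of a fuzzy set is the integral of its membership function, and that UDCM is the average of $S(U_1(x), U_2(x))$ over $L$. It is offered as a plausibility check, not as the authors' stated result.

```latex
\begin{aligned}
\mu_1(y) &= \exp\left\{-\frac{(y - En_1)^2}{2He_1^2}\right\}, \qquad
\mu_2(y) = \exp\left\{-\frac{(y - En_1)^2}{2He_2^2}\right\}, \qquad
He_1 \le He_2 \;\Rightarrow\; \mu_1(y) \le \mu_2(y), \\[4pt]
S(U_1(x), U_2(x)) &= \frac{\int \min(\mu_1, \mu_2)\, dy}{\int \max(\mu_1, \mu_2)\, dy}
 = \frac{\int \mu_1\, dy}{\int \mu_2\, dy}
 = \frac{\sqrt{2\pi}\, He_1}{\sqrt{2\pi}\, He_2}
 = \frac{He_1}{He_2}, \\[4pt]
\frac{1}{|L|} \int_L S(U_1(x), U_2(x))\, dx
 &= \frac{1}{6En_1} \cdot 6En_1 \cdot \frac{He_1}{He_2}
 = \frac{He_1}{He_2}.
\end{aligned}
```

Under these assumptions the value depends only on the two hyperentropies, which is consistent with the statement in the proof that $S(U_1(x), U_2(x))$ is invariable for every $x \in L$.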

Comparison with Other SMCM.
In order to demonstrate the high discriminability of UDCM claimed by Theorem 2, we calculate the similarity among four cloud models, $C_1(10, 5, 1)$, $C_2(5, 2.5, 0.5)$, $C_3(10, 5, 0.4)$, and $C_4(10, 2, 2)$, using different similarity measures. It is obvious that $C_1$, $C_2$, $C_3$, and $C_4$ are four different CMs. Table 4 shows the similarity values of the different SMCM. For LICM, because the numerical characteristics of $C_1$ and $C_2$ are proportional, the cosine of the angle between the two vectors composed of their numerical characteristics is equal to 1. LICM only distinguishes differences in the shape of the CM, which is unrelated to expectation. $C_1$ and $C_3$ have the same expectation and entropy, so their difference cannot be found using ECM. Analogously, the difference between $C_1$ and $C_4$ cannot be found using MCM because they have the same outer envelope. UDCM can distinguish any two different CMs from the perspective of uncertain distribution. It therefore offers high discriminability, which is significant for artificial intelligence domains. We analyze its merits and demonstrate its performance in the following experiments.
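The behavior of LICM on proportional numerical characteristics is easy to reproduce. The sketch below implements only the angle cosine described above; the function name `licm` and the tuple representation of a cloud model are assumptions for this example.

```python
import math


def licm(c1, c2):
    """Cosine of the angle between the vectors (Ex, En, He) of two cloud models."""
    dot = sum(a * b for a, b in zip(c1, c2))
    norm1 = math.sqrt(sum(a * a for a in c1))
    norm2 = math.sqrt(sum(b * b for b in c2))
    return dot / (norm1 * norm2)


c1 = (10, 5, 1)
c2 = (5, 2.5, 0.5)   # every characteristic is half of the one in c1
c3 = (10, 5, 0.4)

same_direction = licm(c1, c2)   # equals 1 up to rounding, although c1 and c2 differ
different_he = licm(c1, c3)     # strictly below 1
```

Because the cosine is invariant to a common scaling of all three characteristics, any pair of proportional cloud models is reported as identical.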

Classification of Time Series.
An appropriate SMCM is important for time series classification. BCT preserves uncertain features while reducing the dimensionality, and CM has been applied in many related domains of time series [39]. Besides, the similarity measure is the other critical factor for classification after dimension reduction. To test the performance, we conduct an experiment comparing these measures on a standard dataset. We download the synthetic control chart dataset from the machine learning data repository of the University of California at Irvine [40]. There are 600 time series and 6 classes in this dataset. We randomly divide the dataset into six equal portions and successively apply cross-validation based on these portions. For each fold, the training set contains 500 time series, and the testing set contains 100 time series. We employ the misclassification rate to evaluate the performances of LICM, ECM, MCM, and UDCM. As shown in Figure 6, the performance of UDCM surpasses that of the other methods. LICM only captures tendency information, and the loss of distribution information results in a high misclassification rate. Compared with ECM and MCM, UDCM utilizes a more accurate uncertainty to measure similarity and can capture the second-order uncertain relationship to describe similarity more appropriately.
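The evaluation protocol can be sketched as follows. The sketch covers only the data split and the error metric; the classifier itself is not specified in the text, so none is included. The function names, the seed, and the definition of the misclassification rate as the fraction of wrongly labeled test items are assumptions for illustration.

```python
import random


def equal_portions(n_items=600, n_folds=6, seed=0):
    """Shuffle the item indices and cut them into equally sized portions."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    size = n_items // n_folds
    return [idx[k * size:(k + 1) * size] for k in range(n_folds)]


def misclassification_rate(predicted, actual):
    """Fraction of test items whose predicted label differs from the true label."""
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / len(actual)


folds = equal_portions()
# One round of cross-validation: portion 0 is the testing set, the rest is for training.
test_idx = folds[0]
train_idx = [i for k, fold in enumerate(folds) if k != 0 for i in fold]
```

Repeating the last two lines for each of the six portions in turn gives the six rounds of cross-validation.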

Shooting Experiment.
In order to verify that the similarity results accord with the uncertain distribution of concepts, we measure the similarity of four shooters' performances. We suppose the cloud models of the four shooters' performances to be $A(1, 2, 1)$, $B(3.2, 6.2, 3.2)$, $C(1.2, 2.1, 10)$, and $D(2, 3, 1)$. Cloud drops represent the deviation of a hit from the bull's eye. The results of shooting are scored on the 10-point system (a miss is scored as 0 points). Table 5 exhibits the score statistics of 100 shots simulated by FCT for each shooter. The results of shooting and the histograms of the score statistics are shown in Figure 7. Similarities of the score distributions are calculated by the Jensen-Shannon divergence as $JS(P \| Q) = \frac{1}{2} KL(P \| M) + \frac{1}{2} KL(Q \| M)$, where $M = \frac{1}{2}(P(x) + Q(x))$. For $X, Y \in U = \{A, B, C, D\}$, the similarities are defined by formula (21).
Table 6 shows the similarity between A and the other shooters under the different similarity measures. After 100 shots, the sort of uncertain distribution similarities is  $S(A, B)$. Compared with other SMCM, the similarity given by UDCM accords with the similarity of the shooting result distributions. It can be seen that UDCM takes into account a more comprehensive similarity than other conceptual intension-based SMCM.
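The divergence between two score distributions can be computed as in the sketch below, which uses the standard definition of the Jensen-Shannon divergence with natural logarithms. The score counts are the rows for shooters A, B, and C in Table 5; the row for D is left out because it is incomplete in the available text. The function names are assumptions, and the mapping from divergence to the similarity of formula (21) is not reproduced here.

```python
import math


def normalize(counts):
    """Turn score counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural logarithm) of two discrete distributions."""
    def kl(a, m):
        # Terms with a_i = 0 contribute nothing to the sum.
        return sum(ai * math.log(ai / mi) for ai, mi in zip(a, m) if ai > 0)

    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Counts of the scores 0, 1, ..., 10 over 100 simulated shots (Table 5).
a = normalize([0, 0, 2, 2, 4, 3, 12, 17, 19, 29, 12])
b = normalize([42, 6, 7, 5, 9, 9, 6, 5, 5, 5, 1])
c = normalize([37, 2, 5, 4, 2, 9, 9, 13, 10, 9, 0])

js_ab = js_divergence(a, b)
js_ac = js_divergence(a, c)
```

On these counts the divergence between A and C is smaller than that between A and B, i.e., the score distribution of C is the closer of the two to that of A.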

Application in Multicriteria Group Decision-Making.
In multicriteria group decision-making (MCDM), the linguistic variable is a good choice to express personal sense.
There are various linguistic variables applied widely in different fields, e.g., the 2-tuple linguistic model [41,42], the probabilistic uncertain linguistic model [43,44], the intuitionistic fuzzy linguistic model [45], and the hesitant fuzzy linguistic model [46,47]. All of the aforementioned linguistic models can describe fuzziness but not randomness. Wang et al. [22] proposed a conversion between the linguistic variable and the cloud model, which has the ability to describe both fuzziness and randomness in human cognition. Based on their work, other researchers have utilized the cloud model in group decision-making [27,48,49]. As shown in Examples 1 and 2, these methods require more running time to acquire correct results. In this section, we demonstrate the merit of UDCM in MCDM.
Example 3 (continued from Example 2). As shown in Figure 8, the similarity between $C_1$ and $C_2$ is calculated by FDCM with different numbers of Monte Carlo simulations. The results fluctuate around 1 as $n$ increases and sometimes exceed 1. Therefore, FDCM is not a true similarity measure of cloud models.

Figure 9

Table 4: Similarity values of LICM, ECM, MCM, and UDCM.

Table 5: Score statistics of 100 simulated shots for each shooter.
Score   0   1   2   3   4   5   6   7   8   9  10
A       0   0   2   2   4   3  12  17  19  29  12
B      42   6   7   5   9   9   6   5   5   5   1
C      37   2   5   4   2   9   9  13  10   9   0
D       2   7   4  10  11  11  11  18  10

To acquire a more accurate similarity, Wang et al. [27] claimed that FDCM has to execute no less than 50,000 Monte Carlo simulations to obtain a stable estimated overall score $s$. In the remainder, we calculate the stable FDCM with $n = 2000$. To verify the validity of our method in MCDM, we use UDCM to make a decision for the application example in [27]. The experiment is run on a personal computer with Windows 10, an Intel(R) Core(TM) i7-7700 CPU at 3.6 GHz, and 16 GB of DDR3 memory, using Matlab R2015b. Let the consensus threshold $\delta = 0.9$; we make the decision using FDCM and UDCM, respectively, in MCDM. Their final group decision matrices, ranking results, and time costs are listed in Table 7, where the parameter $n$ is the number of Monte Carlo simulations executed to obtain the overall score of the cloud model. Although the decision results are consistent, UDCM takes only one-sixth of the time taken by FDCM. Hence, UDCM is an efficient method compared with FDCM.

Conclusion
The similarity of concepts is a fundamental topic in uncertain artificial intelligence. By utilizing FCT and BCT, the CM realizes bidirectional cognitive transformation between the intension and extension of a concept. Furthermore, the CM reflects the uncertainty of the qualitative concept itself and, meanwhile, reveals the objective relationship between probability and fuzziness in the uncertain concept. As a significant form of expression, the distribution is often used to describe uncertain phenomena. Based on this, we propose a new similarity measure, UDCM, and introduce its calculation in detail. By employing complete uncertainty, it yields similarity results that accord with the uncertain distribution and thus provides valuable guidance in synthetic evaluation. Besides, UDCM has the merits of discriminability and stability and is an effective tool for cognitive computing. Finally, UDCM is also a framework of SMCM: employing different forms of second-order uncertainty leads to different results. In the future, the selection of uncertainty forms for different situations also deserves to be studied.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.