A novel intuitionistic fuzzy similarity measure based on double sequence by using modulus function with application in pattern recognition

Abstract: In the field of pattern recognition, clustering is used to group data into clusters based on the similarity among the data points. A number of clustering techniques have been developed using different distance/similarity measures. Owing to the wide variety of data, researchers have used various distance measures, such as the Hamming distance and the Euclidean distance, to solve clustering problems. In this paper, we propose a novel similarity measure based on the double sequence space and the modulus function. To handle the uncertainty of the data, Atanassov intuitionistic fuzzy sets are used. Experimental simulation is performed on real-world problems, viz. the car data-set and a medical diagnosis problem, and the results suggest that the proposed measure performs better than the existing measures it is compared with.


Introduction
Zadeh laid down the basic principles of a powerful logic system called fuzzy logic (Zadeh, 1965). Fuzzy logic considers that each element $x_i$ of the universe of discourse $X$ cannot be precisely defined, even though the information about it is complete. In this sense, only a membership value is assigned to each element $x_i$ of $X$, and the non-membership value is then one minus the membership value. Atanassov (1986, 1999) offers another layer of explanation of fuzzy sets, known as Atanassov intuitionistic fuzzy sets (AIFS). He defined AIFS to deal with those cases where the information regarding the elements is both incomplete and imprecise. Thus, hesitancy is revealed over the assigned membership and non-membership values of each element of $X$. Hence, in an AIFS the non-membership value need not equal the complement of the membership value. It is worth noting that the hesitancy value in an AIFS varies inversely with the information about the elements of $X$: as the information increases, the hesitancy value decreases. If the information regarding the elements of $X$ is complete, then the hesitancy value becomes zero and the AIFS reduces to a fuzzy set. AIFS and fuzzy sets have applications in many fields, such as clustering (Khan & Lohani, 2016; Xu, Chen, & Wu, 2008), control systems (Lee, 1990), artificial intelligence (Vas, 1999), and bacteria recognition (Khatibi & Gholam Ali, 2009).
The similarity between AIFS $A_1$ and $A_2$ is measured with the help of a similarity measure, which is usually derived from a distance measure. The transformation of a similarity measure into a distance measure is given in Hung and Yang (2004). The majority of the literature on similarity measures (Dengfeng & Chuntian, 2002; Gupta & Kanwar, 2016; Khan, Alamri, Mursaleen, & Lohani, 2017; Li, Olson, & Qin, 2007; Liang & Shi, 2003; Mitchell, 2003) considers an AIFS as a point; thus, these measures calculate the similarity between points. The similarity measure induced by the Hausdorff distance measure (Hung & Yang, 2004) computes the distance between two sets. Hence, it was used effectively over linguistic variables. A Sugeno integral-based similarity measure was proposed by Hwang, Yang, Hung, and Lee (2012) with application in pattern recognition. The distance measure introduced by Mursaleen and co-authors can be utilized to deduce some similarity measures. Similarity measures have many applications in different areas, such as pattern recognition, decision theory (Chen & Ng, 2004), and machine learning (Cristianini & Shawe-Taylor, 2000).
Clustering is a classification technique whose task is to organize data into groups. The aim of clustering is that objects in the same cluster should be similar to one another but dissimilar to objects in other clusters (Everitt, Landau, & Leese, 2001). It is a procedure for handling unsupervised learning problems that appear in pattern recognition. Distance measures play an important role in recognizing patterns, so most AIFS-based clustering techniques were developed using them. We have not come across any distance/similarity measure that guarantees a good result for every clustering problem. Thus, instead of a single distance/similarity measure, several types of distance/similarity measures have been explored over different problems. In the field of fuzzy clustering, major contributions came from the pioneering work of Bellman, Kalaba, and Zadeh (1964), Bazdek (2013) and many more (Xu, 2009; Xu, Tang, & Liu, 2011; Xu & Wu, 2010; Xu, Xu, Liu, & Zhao, 2013).
The motivation of the paper is to introduce a new distance measure, $BV_p$, for the clustering of AIFS, using the theory of double sequence spaces and the modulus function. In clustering/classification problems, the output remains unchanged whether it is derived from the double sequence version or the single sequence version of the popular Hamming/Euclidean distance measures. Therefore, double sequence-based distance measures have not motivated researchers to apply them to real-world problems, and for this reason no practical application of double sequences has been given. We noticed that the divided difference operator $\Delta_{11}$, introduced for double sequences and defined by $\Delta_{11} x_{m,n} = x_{m,n} - x_{m-1,n} - x_{m,n-1} + x_{m-1,n-1}$, changes the input values in the distance measure of the double sequence. Its output values therefore differ from those obtained by the corresponding single sequence-based distance measure, which involves the simple difference operator $\Delta$ ($\Delta x_n = x_n - x_{n-1}$). The correspondence between the two independent components of an AIFS and the two variables of the double sequence helps us to define the AIFS-based $BV_p$ similarity measure. We used the $BV_p$ similarity measure to solve the car data-set problem and then compared the result with that of the association coefficient method.
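To make the contrast between the two difference operators concrete, the following minimal Python sketch applies the single-sequence operator row by row and the double-sequence operator to the whole array. The function names `delta` and `delta11`, the sample values, and the convention that entries outside the array are zero are all assumptions made for illustration.

```python
def delta(seq):
    """Single-sequence difference (Δx)_n = x_n - x_{n-1}; x_0 is taken as 0 (assumption)."""
    return [x - (seq[i - 1] if i > 0 else 0.0) for i, x in enumerate(seq)]


def delta11(mat):
    """Double-sequence difference
    (Δ11 x)_{m,n} = x_{m,n} - x_{m-1,n} - x_{m,n-1} + x_{m-1,n-1},
    with entries outside the array taken as 0 (assumption)."""
    def at(m, n):
        return mat[m][n] if m >= 0 and n >= 0 else 0.0

    return [[at(m, n) - at(m - 1, n) - at(m, n - 1) + at(m - 1, n - 1)
             for n in range(len(mat[0]))]
            for m in range(len(mat))]


# Hypothetical 2 x 2 array of values in [0, 1].
x = [[0.5, 0.7],
     [0.3, 0.2]]

row_wise = [delta(row) for row in x]  # Δ applied to each row separately
both_ways = delta11(x)                # Δ11 applied to the whole array

print(row_wise)
print(both_ways)
```

In this example the first rows agree, while the second rows differ because $\Delta_{11}$ also subtracts the contribution of the row above.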
The paper is organized as follows. In Section 2, we recall some basic definitions essential for understanding the research work. In Section 3, we propose the $BV_p$ similarity measure and prove some of its mathematical properties. In Section 4, we use the $BV_p$ similarity measure in the IFSC algorithm for clustering the car data-set, and compare the result with that of the association coefficient method. In Section 5, we consider one more data-set, the medical diagnosis data, to further justify the proposed similarity measure. Finally, the conclusion is stated in Section 6.

Double sequence (Mursaleen, 2003)
A function $f_k:\mathbb{N}\times\mathbb{N}\to\Omega$ is called a double sequence in $\Omega$ and is denoted by $f_k = \{x_{m,n}\}$. Under pointwise addition and ordinary scalar multiplication, $\Omega$ forms a vector space.

p-bounded variation of double sequence (Mursaleen & Mohiuddine, 2014)
A double sequence $\{x_{m,n}\}$ is said to be of $p$-bounded variation if it satisfies the following property:
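For orientation, the condition usually taken to define $p$-bounded variation of a double sequence is written out below. It is stated here as an assumption based on the standard definition and on the operator $\Delta_{11}$ given in the Introduction; the source's own display may differ in notation.

```latex
\sum_{m=1}^{\infty}\sum_{n=1}^{\infty}
  \left|\Delta_{11}x_{m,n}\right|^{p} < \infty,
\qquad 1 \le p < \infty,
\qquad
\Delta_{11}x_{m,n} = x_{m,n} - x_{m-1,n} - x_{m,n-1} + x_{m-1,n-1}.
```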

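Modulus function (Nakano, 2014)
A continuous non-decreasing function $F:\mathbb{R}^+\to\mathbb{R}^+$ is said to be a modulus function if it satisfies the following conditions:
(1) $F(x) = 0$ if and only if $x = 0$;
(2) $F(x + y) \le F(x) + F(y)$ for all $x, y \in \mathbb{R}^+$.

Atanassov intuitionistic fuzzy set (AIFS) (Atanassov, 1986, 1999)
An AIFS $A$ in $X$ is defined as
$$A = \{\langle x, \mu_A(x), \nu_A(x)\rangle : x \in X\},$$
where $\mu_A:X\to[0,1]$ and $\nu_A:X\to[0,1]$ are the membership and non-membership functions assigned to $x \in X$ with respect to $A$, such that $0 \le \mu_A(x) + \nu_A(x) \le 1$, and $\pi_A(x) = 1 - \mu_A(x) - \nu_A(x)$ is the hesitancy function associated with $x$.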
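As a small illustration of the relation between membership, non-membership and hesitancy, the sketch below represents one AIFS element in Python. The class name `AIFSElement`, its field names, and the sample values are hypothetical; only the constraint that the two degrees lie in [0, 1] with sum at most 1, and the hesitancy as the remainder, come from the definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AIFSElement:
    """One element of an AIFS: membership mu and non-membership nu."""
    mu: float
    nu: float

    def __post_init__(self):
        # Both degrees lie in [0, 1] and their sum may not exceed 1.
        if not (0.0 <= self.mu <= 1.0 and 0.0 <= self.nu <= 1.0):
            raise ValueError("mu and nu must lie in [0, 1]")
        if self.mu + self.nu > 1.0 + 1e-12:
            raise ValueError("mu + nu must not exceed 1")

    @property
    def hesitancy(self):
        """Hesitancy degree: whatever is left after mu and nu."""
        return 1.0 - self.mu - self.nu


e = AIFSElement(mu=0.6, nu=0.3)      # some hesitancy remains
fuzzy = AIFSElement(mu=0.6, nu=0.4)  # no hesitancy: behaves like a fuzzy set

print(round(e.hesitancy, 10), round(fuzzy.hesitancy, 10))
```

When the hesitancy is zero, the non-membership degree is exactly the complement of the membership degree, which is the ordinary fuzzy-set case described in the Introduction.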

Association matrix (Xu et al., 2008)
An association matrix is defined as $W = (a_{ij})_{m\times m}$, where $a_{ij} = a(A_i, A_j)$ are the association coefficients between $A_i$ and $A_j$ ($i, j = 1, 2, \ldots, m$).

Composition of matrix (Xu et al., 2008)
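Let $W = (a_{ij})_{m\times m}$ be an association matrix. Then the composition matrix of $W$ is defined as $W^2 = W \circ W = (\bar{a}_{pq})_{m\times m}$, where $\bar{a}_{pq} = \max_k \min\{a_{pk}, a_{kq}\}$.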

Equivalent association matrix (Xu et al., 2008)
An association matrix $W = (a_{ij})_{m\times m}$ is called an equivalent association matrix if $W^2 \subseteq W$, i.e. $a_{pq} \ge \max_k \min\{a_{pk}, a_{kq}\}$ for all $p, q$.
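The max-min composition and the equivalence test can be sketched in a few lines of Python. The function names `compose` and `is_equivalent` and the 3 x 3 matrix are hypothetical and serve only to illustrate the two definitions.

```python
def compose(w):
    """Max-min composition W∘W: entry (p, q) is max over k of min(w[p][k], w[k][q])."""
    m = len(w)
    return [[max(min(w[p][k], w[k][q]) for k in range(m)) for q in range(m)]
            for p in range(m)]


def is_equivalent(w):
    """True if every entry of W∘W is <= the matching entry of W."""
    w2 = compose(w)
    m = len(w)
    return all(w2[p][q] <= w[p][q] for p in range(m) for q in range(m))


# Hypothetical symmetric association matrix with ones on the diagonal.
W = [[1.0, 0.8, 0.4],
     [0.8, 1.0, 0.6],
     [0.4, 0.6, 1.0]]

W2 = compose(W)
print(W2)                 # entry (0, 2) rises from 0.4 to 0.6
print(is_equivalent(W))   # False: W∘W exceeds W at (0, 2)
print(is_equivalent(W2))  # True: composing again changes nothing
```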

λ-cutting matrix (Xu et al., 2008)
The λ-cutting matrix of the equivalent association matrix $W = (a_{ij})_{m\times m}$ is $W_\lambda = ({}_{\lambda}a_{ij})_{m\times m}$, where ${}_{\lambda}a_{ij} = 1$ if $a_{ij} \ge \lambda$ and ${}_{\lambda}a_{ij} = 0$ if $a_{ij} < \lambda$.

Let $A_k = \{x^k_1, x^k_2, \ldots, x^k_c\}$, where each feature $x^k_j$ is an AIFS. The first step in defining the $BV_p$ similarity measure requires an arrangement of the elements of $A_k$, and this arrangement remains fixed for all $k$. In our case, we do not disturb the existing arrangement of the elements of $A_k$. An AIFS treats the membership value and the non-membership value as two independent variables. A double sequence $\{x_{m,n}\}$ also contains two independent variables, $m$ and $n$, which vary over $\mathbb{N}$ (the set of natural numbers). So, for each $m$, letting $n$ vary over $\mathbb{N}$ produces an ordinary single sequence, and this sequence is replaced by the membership values of $x^k_j$ for $1 \le j \le c$ and by 0 for $j > c$. The range set of the sequence of membership values is represented by MAIFS $= \{\mu^k_1, \mu^k_2, \ldots, \mu^k_c, 0\}$. In a similar fashion, NAIFS $= \{\nu^k_1, \nu^k_2, \ldots, \nu^k_c, 0\}$ is the range set of the sequence of non-membership values of the AIFS. An AIFS consists of three components $\mu$, $\nu$ and $\pi$, of which $\pi$ depends on the independent components $\mu$ and $\nu$, so the range set corresponding to $\pi$ is taken to be the $(c + 1)$-dimensional zero vector.

In simple words, the matrix form of the double sequence is used to describe $A_k$. Hence, corresponding to $A_k$, its matrix form is given below:
$$f_k = \begin{pmatrix} \mu^k_1 & \mu^k_2 & \cdots & \mu^k_c & 0 \\ \nu^k_1 & \nu^k_2 & \cdots & \nu^k_c & 0 \\ 0 & 0 & \cdots & 0 & 0 \end{pmatrix},$$
where, for the sake of simplicity, we write $f^k_{m,n} = \mu^k_n$ when $m = 1$; $n = 1, 2, \ldots, c$, and $f^k_{m,n} = \nu^k_n$ for $m = 2$; $n = 1, 2, \ldots, c$.
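The transformation of an AIFS-valued object into its matrix form can be sketched as follows. This is a minimal illustration, assuming each feature is supplied as a (membership, non-membership) pair; the function name `to_double_sequence` and the sample values are hypothetical.

```python
def to_double_sequence(features):
    """Turn c (membership, non-membership) pairs into a 3 x (c + 1) matrix.

    Row 1 holds the membership values followed by 0, row 2 the
    non-membership values followed by 0, and row 3 is all zeros
    (the row kept for the dependent hesitancy component).
    """
    c = len(features)
    membership = [mu for mu, _ in features] + [0.0]
    non_membership = [nu for _, nu in features] + [0.0]
    zero_row = [0.0] * (c + 1)
    return [membership, non_membership, zero_row]


# Hypothetical object with c = 3 features.
A_k = [(0.3, 0.5), (0.6, 0.1), (0.4, 0.3)]
f_k = to_double_sequence(A_k)

for row in f_k:
    print(row)
```

Keeping the feature order fixed across all objects matters here, since the difference operator compares neighbouring columns.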
We denote the set of all transformed AIFS $f_k$ by $D_m(\text{AIFS})$.

Proof. It is easy to prove that the proposed distance/similarity measure $BV_p$ satisfies properties (1)-(3) of Definition 2.5. To prove the triangle inequality, let $x, y, z \in D_m(\text{AIFS})$ be such that $x \le y \le z$ (that is, every element of the matrix $x$ is less than or equal to the corresponding element of the matrix $y$, and similarly for $y$ and $z$). Using the triangle inequality of the modulus function and Definition 2.3, and then the Minkowski inequality, the triangle inequality for $1 - BV_p$ follows. Thus, $(D_m(\text{AIFS}), 1 - BV_p)$ is a metric space. ✷

In order to compute the similarity between $f_k$ and $f_t$ ($f_k, f_t \in D_m(\text{AIFS})$), we induce the similarity measure $BV_p$ of the double sequence from the distance measure $1 - BV_p$ (for more on the interrelation between similarity measures and distance measures, see Hung & Yang, 2004). The notation in the resulting similarity measure has the same meaning as in Theorem 3.1.

Proof. Let $f_k, f_t \in D_m(\text{AIFS})$ [...]. Let $f_k$ be any arbitrary element of $D_m(\text{AIFS})$ and $\alpha$ be any real number such that $|\alpha| < 1$ [...] for all $n = 1, 2, \ldots, c$. Thus, $\alpha f_k \in D_m(\text{AIFS})$ and hence $D_m(\text{AIFS})$ is a balanced set. ✷
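The general shape of a similarity of this kind can be sketched in Python. The closed form of $BV_p$ is not reproduced in this text, so everything below is an assumption: the distance is taken as a normalised $p$-th power sum of the modulus of $\Delta_{11}$ applied to the entrywise difference of two matrices, and the similarity as one minus that distance. The normaliser, the zero boundary convention, and the names `delta11` and `bvp_similarity` are hypothetical choices, not the authors' formula.

```python
def delta11(mat):
    """(Δ11 x)[m][n] = x[m][n] - x[m-1][n] - x[m][n-1] + x[m-1][n-1], zero outside the array."""
    def at(m, n):
        return mat[m][n] if m >= 0 and n >= 0 else 0.0

    return [[at(m, n) - at(m - 1, n) - at(m, n - 1) + at(m - 1, n - 1)
             for n in range(len(mat[0]))]
            for m in range(len(mat))]


def bvp_similarity(f_k, f_t, p=1, modulus=lambda t: t):
    """Assumed form: 1 - (sum of F(|Δ11(f_k - f_t)|)^p / N)^(1/p).

    N = (number of entries) * F(4)^p is a hypothetical normaliser: each
    |Δ11| term is at most 4 when all entries lie in [0, 1], so the
    distance stays in [0, 1] for a non-decreasing modulus F.
    """
    diff = [[a - b for a, b in zip(rk, rt)] for rk, rt in zip(f_k, f_t)]
    terms = [modulus(abs(v)) ** p for row in delta11(diff) for v in row]
    normaliser = len(terms) * modulus(4.0) ** p
    return 1.0 - (sum(terms) / normaliser) ** (1.0 / p)


# Two hypothetical 3 x (c + 1) matrices with c = 2.
f1 = [[0.3, 0.6, 0.0], [0.5, 0.1, 0.0], [0.0, 0.0, 0.0]]
f2 = [[0.4, 0.5, 0.0], [0.4, 0.3, 0.0], [0.0, 0.0, 0.0]]

print(bvp_similarity(f1, f1))  # identical inputs give similarity 1.0
print(bvp_similarity(f1, f2))
```

The defaults `p=1` and the identity modulus mirror the simplification used in the experiments; any other non-decreasing modulus can be passed through the `modulus` argument.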

Application of $BV_p$ in clustering
For clustering of the car data-set, we implemented the $BV_p$ similarity measure (for the sake of experimental simplicity, we take $p = 1$ and the modulus function to be the identity function). The clustering performance of $BV_p$ is compared with the clustering results of the association coefficient method.
The $BV_p$ similarity measure is used in place of the association coefficient for computing similarity in the IFSC algorithm of the association coefficient matrix method, as follows:
(1) Calculate the similarity between the AIFSs using the $BV_p$ similarity measure, and use these coefficients to construct the association matrix $W$.
(2) If the constructed matrix $W$ is an equivalent association matrix, take its λ-cutting matrix. If not, compute by composition the matrices $W^2, W^4, W^8, \ldots$ until an equivalent association matrix is obtained, and take its λ-cutting matrix.
(3) Apply the single linkage algorithm to find the clusters from the dendrogram.
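Steps (2) and (3) can be sketched in Python as below, assuming the association matrix from step (1) is already available, is symmetric, and has ones on its diagonal. The function names, the cutting level called `lam`, and the 3 x 3 matrix are hypothetical. Instead of drawing a dendrogram, the sketch reads the clusters directly from the cutting matrix: rows that are identical belong to the same cluster, which for an equivalent association matrix should match cutting a single linkage dendrogram at the same level.

```python
def compose(w):
    """Max-min composition W∘W."""
    m = len(w)
    return [[max(min(w[p][k], w[k][q]) for k in range(m)) for q in range(m)]
            for p in range(m)]


def equivalent_closure(w):
    """Square repeatedly (W, W^2, W^4, ...) until the matrix stops changing.

    Assumes ones on the diagonal, so the entries can only grow and the
    loop terminates.
    """
    while True:
        w2 = compose(w)
        if w2 == w:
            return w
        w = w2


def lambda_cut(w, lam):
    """Cutting matrix: 1 where the entry is >= lam, else 0."""
    return [[1 if a >= lam else 0 for a in row] for row in w]


def clusters(cut):
    """Group indices whose rows in the cutting matrix are identical."""
    groups = {}
    for i, row in enumerate(cut):
        groups.setdefault(tuple(row), []).append(i)
    return list(groups.values())


# Hypothetical association matrix for three objects.
W = [[1.0, 0.8, 0.4],
     [0.8, 1.0, 0.6],
     [0.4, 0.6, 1.0]]

W_eq = equivalent_closure(W)
for lam in (0.5, 0.7, 0.9):
    print(lam, clusters(lambda_cut(W_eq, lam)))
```

Raising the cutting level splits the objects into more clusters, which is how each range of the level corresponds to one clustering arrangement.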

Clustering of car data-set
The car data-set, taken from the literature, has 10 cars $A_i$ ($i = 1, 2, \ldots, 10$), whose performance depends on six features: Fuel Economy, Aerodynamic Degree, Price, Comfort, Design, and Safety (see Table 1). Using the $BV_p$ similarity measure, the similarity matrices $W, W^2, W^4, W^8$ for the car data-set are computed. Since $W^8 = W^4$, by Definition 2.10, $W^4$ is an equivalent association matrix. The hierarchical clustering tree for $W^4$ is shown in Figure 1. We deduce all possible ranges of λ by analyzing the equivalent matrix $W^4$. Each range corresponds to a different clustering arrangement (see Table 2).
The results obtained by the association coefficient method are shown in Table 3. The third row of Table 3 claims that the similarity level among the cars $A_2, A_3, A_5, A_7, A_{10}$ is not more than 81.1

Application of $BV_p$ in medical diagnosis
To further justify the proposed double sequence-based similarity measure, we included one more data-set, known as the medical diagnosis data (Boran & Akay, 2014; Own, 2009; Szmidt & Kacprzyk, 2001; Wei, Wang, & Zhang, 2011). The medical diagnosis problem involves four patients, Al, Bob, Joe and Ted; in set-theoretic notation, $P = \{\text{Al}, \text{Bob}, \text{Joe}, \text{Ted}\}$. To determine an accurate diagnosis for every patient $p \in P$ on the basis of the symptoms $S$, we used $BV_p$ to calculate the similarity between a diagnosis and all patients. We repeat the process for all diagnoses $d \in D$ and present the resulting similarity degrees in Table 6. For each patient, the diagnosis with the highest degree of similarity is the one suggested by the similarity measure $BV_p$. According to the similarity degrees in Table 6, Al has Viral Fever, Bob has a Stomach problem, Joe suffers from Typhoid, and Ted has Viral Fever. We report the patients and their respective diagnoses and compare our result with those of other existing similarity measures (see Table 7).
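The selection rule, which picks for each patient the diagnosis with the highest similarity degree, can be sketched as below. The function name `suggest_diagnosis`, the labels `P1`, `P2`, `D1` to `D3`, and all numbers are placeholders chosen for illustration; they are not the values of Table 6.

```python
def suggest_diagnosis(similarities):
    """Map each patient to the diagnosis with the highest similarity degree.

    `similarities` is a dict of the form {patient: {diagnosis: degree}}.
    """
    return {patient: max(scores, key=scores.get)
            for patient, scores in similarities.items()}


# Placeholder similarity degrees (hypothetical, not from the paper).
table = {
    "P1": {"D1": 0.91, "D2": 0.84, "D3": 0.70},
    "P2": {"D1": 0.62, "D2": 0.75, "D3": 0.88},
}

suggestions = suggest_diagnosis(table)
print(suggestions)
```

Ties are resolved in favour of the diagnosis listed first, which is a property of Python's `max` and not something the text specifies.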

Conclusion
The distance measure of the double sequence of bounded variation was used to derive the $BV_p$ similarity measure for classifying AIFS. In this work, each AIFS $A_k$ is converted into a matrix $f_k$ of size $3 \times (c + 1)$ (where $c$ is the cardinality of $A_k$), which represents the simplest form of the double sequence. The first and second rows of the matrix $f_k$ contain the elements of MAIFS and NAIFS, respectively, while every entry of the last row of $f_k$ is set to zero. A real-world car data-set is classified using the $BV_p$ similarity measure. In this paper, we derive the $BV_p$ similarity measure by utilizing the double sequence and present a real-world application of it for the first time. There is, however, wide scope to generalize it to other forms of double sequence. In future, the application of double sequences may open a new research domain for researchers seeking to improve results in machine learning and pattern recognition.