SimE: A Geometric Approach for Similarity Estimation of Fuzzy Sets

The characteristics of a fuzzy set are decided by its membership function. This work aims to provide a geometric approach for enhancing the design and performance of fuzzy systems. Similarity Estimator (SimE) evaluates the membership functions of fuzzy sets on Euclidean space based on geometric area. The overlapping regions between the sets are partitioned into geometric structures. The overlapping area is computed by summing the areas of polygons and integrating the area under curves. Similarity between fuzzy sets is directly proportional to the overlapping area between them. SimE was tested over a range of real numbers with finite intervals. Fuzzy sets using different membership functions were created for the same data distribution. From the test results it can be inferred that fuzzy sets defined using triangular membership functions have the smallest overlapping area when compared to fuzzy sets defined using other membership functions. An optimal overlapping area of fuzzy sets improves the semantic representation and the performance of the system. SimE can be used by knowledge engineers to design efficient fuzzy systems.


INTRODUCTION
Large amounts of data are generated and manipulated every day, but human errors and system failures often leave the data noisy, ambiguous and redundant. Automated and computer-aided systems that use data for control and decision making have to work with this incomplete, vague and imprecise data. Although data preprocessing techniques such as cleaning, transformation, normalization and reduction make the data suitable for use, the quality of the data can still be improved to enhance the reasoning and decision-making ability of these systems. Fuzzy set theory and soft computing techniques are capable of handling imprecise, vague and incomplete data in automated tasks (Isermann, 1998).
The first fuzzy logic controller was developed in the mid-seventies (Mamdani and Assilian, 1975). It used a set of heuristic control rules, fuzzy sets and fuzzy logic to represent the linguistic terms and the uncertainties involved in the control system. Since then, fuzzy theory has been applied increasingly in various fields of engineering and medicine (Soufi et al., 2016; Jane et al., 2016; Jacophine Susmi et al., 2015). Generally, a fuzzy logic based system consists of four modules: a fuzzification interface, a knowledge base (fuzzy rule base), an inference engine and a defuzzification module.
There are no fixed rules or design patterns for the design and development of fuzzy systems. The efficiency and effectiveness of a fuzzy system depend on several factors, such as the choice of membership functions, their input values, the area of intersection among them, the information content of the knowledge base, and the skill, domain knowledge and experience of the designer.
Membership Functions (MFs) are easily identifiable when human expertise for the domain of application is available. For other applications, however, they have to be identified and designed by a system designer or knowledge engineer. The choice of a membership function and its design are important for any application of fuzzy set theory, as they decide the overall performance of the fuzzy system (Zimmermann, 1996; Jang et al., 1997). Better design of a fuzzy system leads to enhanced reasoning and performance.
The purpose of a fuzzy system is to handle uncertainty with human reasoning. The conjunctive (AND) and disjunctive (OR) reasoning model is used for combining the membership functions. However, this model is not the general way of human reasoning in many situations. Furthermore, a fuzzy system is designed with the same level of membership space [0, 1] for all fuzzy sets. Equal importance is given to all the linguistic variables used in the design, which is not the actual scenario in the real world.

Fig. 1a: Fuzzy sets with no similarity. Fig. 1b: Similar fuzzy sets.
This study proposes Similarity Estimator (SimE), an approach in which fuzzy sets are said to be similar based on their overlapping region. Consider the following example. Let A and B be two fuzzy sets as shown in Fig. 1a. Both fuzzy sets have exactly the same shape, but they enclose different regions of the Euclidean space, hence there is no overlap between the sets. In Fig. 1b, however, C and D differ in shape but are more similar, because the regions (Euclidean space) represented by the sets are similar. A fuzzy system with a few efficiently designed fuzzy sets and few rules has lower computational demands. Therefore, the overlapping region between the fuzzy sets is an important measure in designing any fuzzy system.

PRELIMINARIES
This section presents the basic concepts and mathematical models of fuzzy sets, membership functions and fuzzy set similarity.

Fuzzy set:
A fuzzy set is a set whose elements have a degree of membership. A fuzzy set A on the universe of discourse X is characterized by a membership function, which associates with each element x a real number in the interval [0, 1]. A fuzzy set can be defined mathematically as follows:

$$A = \{(x, \mu_A(x)) \mid x \in X\}$$

where $\mu_A(x)$ is called the Membership Function (MF) for the fuzzy set A and X is the universe of discourse, $X = \{x_j \mid j = 1, 2, \ldots, m\}$ (Zadeh, 1965). The MF maps each element of X to a membership grade ranging from 0 to 1: $0 \le \mu_A(x) \le 1$.

Membership functions:
The membership grade of an element in a fuzzy set is determined by its membership function. There are different types of MFs, namely triangular, Gaussian, trapezoidal, bell-shaped, linear, sigmoidal and polynomial membership functions (Jang et al., 1997). This work considers four of them for testing SimE: the triangular, Gaussian, trapezoidal and bell-shaped membership functions.
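The four MF shapes considered here can be written compactly. The sketch below uses the standard textbook forms of these functions; the function names, argument order and example values are illustrative assumptions and are not taken from the SimE implementation.

```python
import math

def triangular(x, a, b, c):
    """Triangular MF with start a, end b and center (peak) c; assumes a < c < b."""
    if x <= a or x >= b:
        return 0.0
    if x <= c:
        return (x - a) / (c - a)   # rising edge
    return (b - x) / (b - c)       # falling edge

def trapezoidal(x, a, b, c, d):
    """Trapezoidal MF with feet a and d and plateau [b, c]; assumes a < b <= c < d."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)   # rising edge
    if x <= c:
        return 1.0                 # plateau
    return (d - x) / (d - c)       # falling edge

def gaussian(x, mean, sigma):
    """Gaussian MF centered at mean with spread sigma > 0."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)

def bell(x, width, slope, center):
    """Generalized bell MF: 1 / (1 + |(x - center) / width|^(2 * slope))."""
    return 1.0 / (1.0 + abs((x - center) / width) ** (2 * slope))

# Example grades for hypothetical parameter values.
grades = {
    "triangular": triangular(2.5, 0.0, 10.0, 5.0),
    "trapezoidal": trapezoidal(3.0, 0.0, 2.0, 4.0, 6.0),
    "gaussian": gaussian(0.0, 0.0, 1.0),
    "bell": bell(2.0, 2.0, 3.0, 0.0),
}
```

All four functions return grades in [0, 1], so they can be interchanged when fuzzy sets are built for the same data distribution.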

Fuzzy set similarity:
Fuzzy set similarity is a measure of approximate equality between fuzzy sets, which describes the relationship among them. Similarity plays a major role in decision making, classification and clustering. It is a major criterion in deciding the number of fuzzy sets for a fuzzy system design (Jager and Benz, 2000; Setnes et al., 1998).
The remainder of this section reviews some of the works in the literature that use different membership functions and fuzzy set similarity measures.

Zhao and Bose (2002) evaluated triangular, Gaussian, trapezoidal, two-sided Gaussian and bell-shaped MFs for a fuzzy control system. They analyzed sensitivity and compared the effect of the different types of membership functions in the fuzzy control system of a speed-controlled induction motor drive. The results indicate that the triangular MF gives better performance than the other MFs, followed by the trapezoidal MF.

Hasuike et al. (2014) developed a construction algorithm to obtain an appropriate MF. The construction approach is based on both fuzzy Shannon entropy with a smoothing function and interval estimation derived from human cognitive behaviour and subjectivity. They introduced natural assumptions for the fuzzy Shannon entropy and the smoothing function, so the proposed approach is suitable for constructing any type of membership function. However, their approach is not appropriate for continuous spaces.

Alikhademi and Zaianudin (2014) developed a framework based on Particle Swarm Optimization (PSO) to obtain the MF from quantitative data. In their framework, an appropriate MF is generated for each input variable using PSO and optimized using S and Z fuzzy shapes. The MF is obtained iteratively and then used to transform quantitative data into fuzzy data. Their fuzzification approach was tested on a typical data mining application (classification) to select an optimal set of association rules.

Bera et al. (2014) determined the MF for a fuzzy random variable that follows a normal distribution with imprecisely defined mean and standard deviation. Both the probability distribution and the cumulative density function are used to generate the MF of the fuzzy random variable; the cumulative density function is used to fix the lower and upper bounds of the MF. They demonstrated the technique on a fuzzy random variable (temperature change) for a temperature controller, taking the alpha-cut value and the percentile value as evaluation parameters. From the results they inferred that the triangular membership function misinterprets uncertainty compared to the trapezoidal membership function.

Jiménez et al. (2014) computed the degree of similarity between fuzzy sets by performing fuzzy operations, such as union and intersection, over the fuzzy sets. By determining the degree of similarity between the fuzzy sets, they minimized the number of different fuzzy sets required to model the fuzzy classifier. They took 10 percent as the threshold for the maximum similarity degree of fuzzy sets, a value accepted in the scientific community in order to obtain a more interpretable fuzzy classifier model. Minimizing the number of fuzzy sets also minimizes the number of rules. They tested the similarity measure on a dataset collected from the Health Information System of the Intensive Care Burn Unit (ICBU) from 1999 to 2002, using fuzzy sets with Gaussian membership functions. From their results they inferred that greater similarity between the fuzzy sets makes the system more ambiguous and the model hardly interpretable by a human.
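A set-theoretic similarity based on union and intersection is commonly written as the ratio of the cardinality of the intersection to that of the union. The sketch below illustrates that general idea on a discretized universe, assuming min and max as the intersection and union operators; it is not claimed to be the exact measure used in any of the works cited above, and all names and values are hypothetical.

```python
def make_triangular(a, b, c):
    """Return a triangular MF with start a, end b and center c (a < c < b)."""
    def mu(x):
        if x <= a or x >= b:
            return 0.0
        return (x - a) / (c - a) if x <= c else (b - x) / (b - c)
    return mu

def set_similarity(mu_a, mu_b, universe):
    """Ratio |A intersect B| / |A union B| using min/max over a sampled universe."""
    inter = sum(min(mu_a(x), mu_b(x)) for x in universe)
    union = sum(max(mu_a(x), mu_b(x)) for x in universe)
    return inter / union if union > 0 else 0.0

# Hypothetical fuzzy sets on the universe [0, 20], sampled every 0.1.
universe = [i * 0.1 for i in range(201)]
low = make_triangular(0.0, 10.0, 5.0)
medium = make_triangular(5.0, 15.0, 10.0)
high = make_triangular(10.0, 20.0, 15.0)

s_same = set_similarity(low, low, universe)       # identical sets
s_adjacent = set_similarity(low, medium, universe)  # partially overlapping sets
s_far = set_similarity(low, high, universe)       # non-overlapping sets
```

A threshold on such a value (for example the 10 percent mentioned above) can then be used to decide whether two fuzzy sets are similar enough to be merged.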

LITERATURE REVIEW
Several methodologies have been proposed in the literature for determining MFs for fuzzy systems. Generally, the determination is based on the data distribution, data intervals and elements of a set. The work of Alikhademi and Zaianudin (2014) emphasizes estimating the similarity between fuzzy sets based on elements of the sets. Moreover, there are works that highlight geometrical operations on fuzzy sets (Bogomolny, 1987; Rosenfeld and Haber, 1985), but they have not used geometric approaches as a measure of similarity to evaluate fuzzy sets.
In this paper, a novel method called SimE is devised to evaluate the design of fuzzy sets based on the geometric area of intersection of fuzzy membership functions. The approach may be used as a measure to estimate the similarity between fuzzy sets.

SIMILARITY ESTIMATOR
Similarity Estimator (SimE) is an algorithm that uses the geometric overlapping area to analyze the similarity between fuzzy sets. Different MFs are analyzed based on the overlapping region between the fuzzy sets for a given data distribution. SimE works in three steps. First, fuzzy sets are created using different MFs (such as triangular, trapezoidal, bell and Gaussian) for the same input data. Second, the similarity between the fuzzy sets is computed based on the overlapping area. Third, the overlapping area between the fuzzy sets is analyzed for each membership function.
Step 1: Parameters (ℒ and 𝒰) needed for constructing fuzzy sets using triangular MF are obtained from the input data. The membership-function-specific parameters are extracted from ℒ and 𝒰.
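Step 1 turns interval limits into MF parameters. The actual mapping used by SimE is the one illustrated in Fig. 2, which is not reproduced here, so the sketch below assumes the simplest possible choice for a triangular MF: the lower limit becomes the start, the upper limit becomes the end, and the midpoint becomes the peak. The function name and the example limits are hypothetical.

```python
def triangular_params(lower, upper):
    """Map interval limits to triangular MF parameters (start, end, center).

    ASSUMPTION: the center is the interval midpoint. The mapping actually
    used by SimE may differ.
    """
    if len(lower) != len(upper):
        raise ValueError("lower and upper must have the same length")
    params = []
    for lo, hi in zip(lower, upper):
        if lo >= hi:
            raise ValueError("each lower limit must be below its upper limit")
        params.append((lo, hi, (lo + hi) / 2.0))
    return params

# Hypothetical overlapping intervals for one fuzzy variable.
lower_limits = [0.0, 5.0, 10.0]
upper_limits = [10.0, 15.0, 20.0]
params = triangular_params(lower_limits, upper_limits)
```

Other MF types would need their own mapping from the same limits, for example a center and a spread for the Gaussian MF.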
For the triangular MF, the parameters are the starting point (a), the end point (b) and the center (c). The mapping of ℒ and 𝒰 to these parameters is defined by a construction function and illustrated in Fig. 2.

Step 2: Similarity is computed for the generated fuzzy sets. Similarity is measured in terms of the overlapping area. Two separate algorithms are used to compute the overlapping area between fuzzy sets: the first computes the area under curves and the second computes the overlapping area of polygonal shapes. Each algorithm is defined by its own function over the fuzzy sets.
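The polygonal case of Step 2 can be illustrated with two adjacent triangular sets: their overlap is itself a triangle, so its area follows directly from its vertices, and a numerical integral of min(µ_A, µ_B) gives the same value while also extending to curved MFs. This is a minimal sketch under assumptions; the helper names, the use of the shoelace formula and the trapezoid rule are choices made here and are not taken from the SimE algorithms.

```python
def shoelace_area(vertices):
    """Area of a simple polygon given as a list of (x, y) vertices."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def triangular_overlap(left, right):
    """Exact overlapping area of two adjacent triangular sets (a, b, c).

    ASSUMPTION: left.c <= right.a < left.b <= right.c, so only the falling
    edge of the left set crosses the rising edge of the right set.
    """
    a1, b1, c1 = left
    a2, b2, c2 = right
    if not (c1 <= a2 < b1 <= c2):
        raise ValueError("sets are not adjacent in the assumed configuration")
    fall = b1 - c1                       # width of the left set's falling edge
    rise = c2 - a2                       # width of the right set's rising edge
    x = (b1 * rise + a2 * fall) / (rise + fall)   # crossing point
    h = (b1 - x) / fall                            # grade at the crossing
    return shoelace_area([(a2, 0.0), (b1, 0.0), (x, h)])

def make_triangular(a, b, c):
    """Return a triangular MF with start a, end b and center c."""
    def mu(x):
        if x <= a or x >= b:
            return 0.0
        return (x - a) / (c - a) if x <= c else (b - x) / (b - c)
    return mu

def numeric_overlap(mu_a, mu_b, lo, hi, steps=3000):
    """Trapezoid-rule integral of min(mu_a, mu_b) over [lo, hi]."""
    h = (hi - lo) / steps
    ys = [min(mu_a(lo + i * h), mu_b(lo + i * h)) for i in range(steps + 1)]
    return h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

# Hypothetical adjacent sets given as (start, end, center).
set_a = (0.0, 10.0, 5.0)
set_b = (5.0, 15.0, 10.0)
exact = triangular_overlap(set_a, set_b)
approx = numeric_overlap(make_triangular(*set_a), make_triangular(*set_b), 0.0, 15.0)
```

For these example sets the edges cross at x = 7.5 with grade 0.5, so the overlap is a triangle with base 5 and height 0.5.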

RESULTS AND DISCUSSION
To analyze the performance of the proposed approach, experiments were performed over two data distributions. The inputs used to construct the fuzzy sets are the lower limits and the upper limits of the intervals over a range of real numbers. Fuzzy sets are constructed by extracting the parameters for different membership functions, such as triangular, Gaussian, trapezoidal and bell. Then the overlapping areas of the fuzzy sets using different MFs are computed and analyzed. Fuzzy sets with minimum overlapping area are considered to be less similar. Tables 1 and 2 present the information about the two data distributions (ranges and intervals) used to test SimE.
Figures 6 to 9 present the fuzzy sets generated for data distribution 1. The fuzzy sets are characterized by triangular, Gaussian, trapezoidal and bell membership functions, respectively. Figure 10 ... however, it is computed for theoretical purposes. This computation may be used for repairing fuzzy sets when they are randomly generated or evolved using an evolutionary algorithm. Tables 3 and 4 present the overlapping area between the adjacent fuzzy sets and also the total area of that fuzzy variable. From the experimental results it can be observed that fuzzy sets with triangular MFs have the smallest overlapping area compared to fuzzy sets characterized by other MFs.
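A table of overlapping areas between adjacent fuzzy sets can be produced by applying an overlap routine to each consecutive pair of sets of a fuzzy variable. The sketch below does this for Gaussian sets by numerically integrating min(µ_A, µ_B), which covers the "area under curves" case. The means, spreads, integration range and function names are hypothetical and do not reproduce the data or results of this study; summing the pairwise values is only one possible reading of a per-variable total.

```python
import math

def make_gaussian(mean, sigma):
    """Return a Gaussian MF centered at mean with spread sigma."""
    return lambda x: math.exp(-0.5 * ((x - mean) / sigma) ** 2)

def overlap_area(mu_a, mu_b, lo, hi, steps=5600):
    """Trapezoid-rule integral of min(mu_a, mu_b) over [lo, hi]."""
    h = (hi - lo) / steps
    ys = [min(mu_a(lo + i * h), mu_b(lo + i * h)) for i in range(steps + 1)]
    return h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

def adjacent_overlaps(sets, lo, hi):
    """Overlapping area for every pair of consecutive fuzzy sets."""
    return [overlap_area(a, b, lo, hi) for a, b in zip(sets, sets[1:])]

# Three hypothetical Gaussian sets with equal spread.
sets = [make_gaussian(0.0, 1.0), make_gaussian(4.0, 1.0), make_gaussian(8.0, 1.0)]
areas = adjacent_overlaps(sets, -10.0, 18.0)
total_overlap = sum(areas)
```

For two Gaussian sets with the same spread, the result can be checked against the closed form σ·√(2π)·erfc(d / (2√2·σ)), where d is the distance between the means.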

CONCLUSION
Set theory based similarity estimators are well known and frequently used in the design and development of fuzzy systems. The expressiveness of geometric models for fuzzy system design is still not completely exploited by the fuzzy logic community. The proposed SimE is another fuzzy system design evaluator, based on the similarity between fuzzy sets. Designing fuzzy systems with geometric models such as SimE requires less background domain knowledge than designing them with set theoretic models. SimE can be applied for designing fuzzy systems when the parameters of the fuzzy membership functions are not well defined. Furthermore, there are several works in the literature that use evolutionary approaches for the generation of fuzzy sets. Evolutionary fuzzy systems (Amaral et al., 2002; Celikyilmaz and Burhan Turksen, 2007) are becoming more accepted in engineering and design. Random fuzzy sets are initially created and iteratively improved to evolve an optimal fuzzy system. In such scenarios, SimE can be used to obtain a combination of fuzzy sets with optimal similarity and overlapping area. The geometric area of similarity between fuzzy sets decides how efficiently uncertainty and vagueness are represented and handled by the fuzzy system. SimE can be used to design systems with optimal similarity for different membership functions.

Table 2: Interval limits for data distribution 2

Table 3: Overlapping area for data distribution 1