FUZZY DATA CLUSTERING IN THE RANK SCALE BASED ON A DOUBLE NEO-FUZZY NEURON

1 Ph.D., Associate Professor, School of Educational Information Technology, Central China Normal University, Wuhan, China
2 Dr.Sc., Professor, Professor of the AI Department, Kharkiv National University of Radio Electronics, Kharkiv, Ukraine
3 Ph.D., Senior Researcher, Control Systems Research Laboratory, Kharkiv National University of Radio Electronics, Kharkiv, Ukraine
4 Post-Graduate Student, Kharkiv National University of Radio Electronics, Kharkiv, Ukraine


NOMENCLATURE
ANFIS is an adaptive neuro-fuzzy inference system; x(k) is a vector of input values; x_i(k) is the i-th component of x(k); y(k) is the output signal of the double neo-fuzzy neuron; μ_ji is a membership function of the i-th component of the input vector for the j-th centroid; w_ji(k) is the synaptic weight for the j-th membership function of the i-th component; NS_i is the i-th neo-fuzzy synapse; e(k) is the error at time step k; E is an objective function; d(k) is an external reference signal; η is a learning rate parameter that usually defines the convergence speed of the learning process; n is the number of neo-fuzzy synapses; m_i is the number of membership functions; u(k) is the output signal of the first layer of the double neo-fuzzy neuron; c_ji is the center of the j-th membership function of the i-th component; x is a linguistic variable; r_i is a corresponding rank; D is a sample of learning signals.

INTRODUCTION
Hybrid systems of Computational Intelligence, and especially adaptive neuro-fuzzy systems, are currently popular for solving pattern recognition, classification, and similar tasks under conditions of substantial uncertainty about the nature of the data distribution and mutual overlapping of classes [1][2][3][4].
The main drawbacks of these systems are their bulkiness (for example, the five-layer ANFIS [2] and similar systems) and the low convergence speed of the corresponding learning algorithms, which requires large training samples.
To overcome these shortcomings, the neo-fuzzy neuron architecture was introduced for neuro-fuzzy systems [5, 6]. It is similar to a traditional formal neuron with n inputs, but it contains nonlinear synapses NS_i instead of conventional synaptic weights. The standard squared error is commonly used as a learning criterion of the neo-fuzzy neuron. Minimizing this criterion with a gradient procedure leads to the learning algorithm [6]

w_ji(k+1) = w_ji(k) + η e(k) μ_ji(x_i(k)),

which makes the introduction of a hidden defuzzification layer (found in most neuro-fuzzy systems) unnecessary. The architecture of a double neo-fuzzy neuron was introduced in [7] to improve the approximating properties of the neo-fuzzy neuron.
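The nonlinear synapse and its gradient learning rule can be sketched as follows. This is a minimal sketch assuming inputs scaled to [0, 1] and equally spaced triangular membership functions; the class name, initial weights, and learning rate are illustrative, not part of the original formulation.

```python
import numpy as np

class NeoFuzzySynapse:
    """One nonlinear synapse NS_i: m triangular membership functions on
    [0, 1], each with its own adjustable weight w_ji (a sketch of [5, 6])."""

    def __init__(self, m, eta=0.1):
        self.centers = np.linspace(0.0, 1.0, m)   # equally spaced centers c_ji
        self.delta = 1.0 / (m - 1)                # constant distance between centers
        self.w = np.zeros(m)                      # synaptic weights w_ji
        self.eta = eta                            # learning rate

    def memberships(self, x):
        # Equally spaced triangles: mu_j(x) = max(0, 1 - |x - c_j| / delta).
        # At most two neighbouring functions are nonzero and they sum to 1.
        return np.maximum(0.0, 1.0 - np.abs(x - self.centers) / self.delta)

    def forward(self, x):
        return float(self.w @ self.memberships(x))

    def update(self, x, error):
        # Gradient step: w_ji(k+1) = w_ji(k) + eta * e(k) * mu_ji(x_i(k)).
        self.w += self.eta * error * self.memberships(x)
```

Repeated calls to `update` with the current error drive the synapse output toward a reference signal at the given input.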

PROBLEM STATEMENT
The initial data for solving the fuzzy clustering task is a set (a sample) of images formed by N n-dimensional feature vectors X = {x(1), x(2), ..., x(k), ..., x(N)} (here x_i(k) is the rank of a specific value of a linguistic variable in the i-th coordinate of the n-dimensional space for the k-th observation) and a sample of reference signals D = {d(1), d(2), ..., d(k), ..., d(N)}, where d(k) is the rank of a reference signal's value in the sample D. After the double neo-fuzzy neuron has been trained on the ranked data, a partition of the initial data array X into m_0 overlapping classes with membership levels μ_l0 of the k-th image to the l-th class should be provided.

REVIEW OF THE LITERATURE
A new relational fuzzy clustering method (FRFP) was proposed in [8]. This procedure is based on determining a fixed point of a function of the desired membership matrix. The produced membership matrices are in some sense less crisp than those produced by NERFCM and more representative of the proximity matrix that is used as input to the clustering process.
An algorithm for finding a fuzzy membership matrix in the case of numerical and categorical features is described in [9]. A set of feature vectors with mixed features is mapped into a set of feature vectors with only real-valued components. The only condition is that the new set of vectors has the same proximity matrix as the original feature vectors. FCM is then used for clustering the new set of vectors.
The article [10] introduces another fuzzy relational clustering method for finding a fuzzy membership matrix. Objects may be clustered by describing them in terms of feature vectors as well as on the basis of relational data, and the relational data may be considered in terms of proximities between objects. Finally, proximities between membership vectors should be proportional to proximities between objects. The component values of a membership vector corresponding to an object are the membership degrees of that object in various clusters. In other words, a membership vector is just a sort of feature vector.
A method where unknown numeric variables are assigned to ordinal values is described in [11]. Minimizing a clustering objective function means finding numeric values for these variables. The proposed clustering method utilizes the same objective function as the FCM algorithm, except that both the membership function and the ordinal-to-real mapping are determined by the gradient descent method.
The clustering method introduced in [12] is based on a modified FCM version. Input data features are considered as linguistic variables, and any feature is actually described by a set of fuzzy numbers. The modified method makes it possible to find an appropriate number of clusters. Besides that, this algorithm is claimed to possess improved robustness compared to the traditional FCM.
A new algorithm for developing a mapping of ordinal values into numerical ones, for which a measure of dissimilarity exists, is presented in [13]. This method is a part of the well-known FCM procedure. The modified algorithm demonstrates better data partitioning into clusters as well as an ordinal-numerical mapping that reveals hidden structural knowledge of an ordinal feature.
A clustering algorithm for data sets of mixed features (nominal, numerical, and ordinal) is presented in [14]. The algorithm aims at reducing the negative effect of noise. Its optimization function uses the likelihood of each individual feature as a criterion for similarity between patterns and clusters, which is quite the opposite of FCM, based on distances, or of the EM clustering algorithm. It is claimed that this method can quickly find fuzzy clusters with different distributions in each feature level.
The article [15] discusses a method for representing ordinal values by fuzzy sets on a fixed interval of real values. The procedure determines centroids from a frequency distribution over the ordinal values. Triangular and trapezoidal fuzzy sets with these centroids are then found by means of the gradient descent method. The obtained fuzzy sets do not share parameters, which would be the case if an end point of one fuzzy set were a vertex of an adjacent fuzzy set.
That is why we propose a new fuzzy clustering method based on the double neo-fuzzy neuron, designed for clustering rank (ordinal) data. At the same time, this method possesses robust characteristics.

MATERIALS AND METHODS
An architecture of the double neo-fuzzy neuron is shown in Fig. 1.
It consists of two layers: the first layer is formed by n nonlinear synapses NS_i with m_i membership functions μ_ji and a synaptic weight w_ji for each of them; the output layer is formed by a single nonlinear synapse NS_0 with membership functions μ_l0 and synaptic weights w_l0. If an image x(k) is fed to the input of the double neo-fuzzy neuron, the output signal is

y(k) = Σ_{l=1}^{m_0} w_l0 μ_l0(u(k)), where u(k) = Σ_{i=1}^{n} Σ_{j=1}^{m_i} w_ji μ_ji(x_i(k)).

It can be seen that the value of the output signal of the double neo-fuzzy neuron is defined by the component values x_i(k) of an input image as well as by the values of the membership functions and their corresponding synaptic weights.
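The two-layer forward pass can be sketched as follows, assuming triangular membership functions with equally spaced centers on [0, 1]. Clipping the intermediate signal u(k) to [0, 1] before the output synapse is our own assumption to keep its membership functions active, not part of the original formulation.

```python
import numpy as np

def dnfn_forward(x, centers_in, w_in, centers_out, w_out):
    """Forward pass of a double neo-fuzzy neuron (illustrative sketch).

    x           : input vector of n components scaled to [0, 1]
    centers_in  : list of n arrays with centers c_ji of first-layer synapses
    w_in        : list of n arrays with first-layer weights w_ji
    centers_out : centers c_l0 of the output synapse NS_0
    w_out       : weights w_l0 of the output synapse
    """
    def mu(x_i, centers):
        delta = centers[1] - centers[0]          # constant spacing
        return np.maximum(0.0, 1.0 - np.abs(x_i - centers) / delta)

    # First layer: u(k) = sum_i sum_j w_ji * mu_ji(x_i(k))
    u = sum(float(w @ mu(x_i, c)) for x_i, c, w in zip(x, centers_in, w_in))
    # Output synapse: y(k) = sum_l w_l0 * mu_l0(u(k)); u is clipped to [0, 1]
    # here so that the output membership functions stay active (assumption).
    u = float(np.clip(u, 0.0, 1.0))
    return float(w_out @ mu(u, centers_out))
```

With weights equal to the centers, each synapse approximates the identity, so the whole neuron passes a one-dimensional input through unchanged.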
Triangular membership functions that are equally distributed in the interval [0, 1] and meet the condition (1) can be written down as

μ_ji(x_i) = (x_i − c_{j−1,i}) / (c_ji − c_{j−1,i}) for x_i ∈ [c_{j−1,i}, c_ji],
μ_ji(x_i) = (c_{j+1,i} − x_i) / (c_{j+1,i} − c_ji) for x_i ∈ [c_ji, c_{j+1,i}],
μ_ji(x_i) = 0 otherwise

(here c_ji and c_l0 are centers of the corresponding membership functions), and analogously for the output synapse. The distance between centers is constant for each nonlinear synapse: c_{j+1,i} − c_ji = (m_i − 1)^{−1}. Using the Ruspini (unity) partition leads to the fact that only two neighboring membership functions are activated at every learning step. Denoting these membership functions μ_pi and μ_{p+1,i}, we can write μ_pi(x_i(k)) + μ_{p+1,i}(x_i(k)) = 1. Thus, the double neo-fuzzy neuron provides a piecewise-linear approximation of some nonlinear separating function in the form f_i(x_i(k)) = a_i(k) + b_i(k) x_i(k), where the parameters a_i(k) and b_i(k) to be tuned are defined by both the values of the corresponding membership functions and the trained synaptic weights.
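The piecewise-linear property can be checked numerically: between two neighbouring centers only two membership functions are active, so the synapse output is exactly linear on that segment. The weights below are arbitrary illustrative values.

```python
import numpy as np

def synapse_output(x, centers, w):
    """Output of one nonlinear synapse with equally spaced triangular
    membership functions forming a Ruspini partition on [0, 1]."""
    delta = centers[1] - centers[0]
    mu = np.maximum(0.0, 1.0 - np.abs(x - centers) / delta)
    return float(w @ mu)

# Evaluate the synapse on one segment between the centers 0.25 and 0.5;
# on this segment only mu_2 and mu_3 are nonzero, so the output is linear.
centers = np.linspace(0.0, 1.0, 5)
w = np.array([0.1, 0.9, 0.3, 0.7, 0.2])
xs = np.linspace(0.25, 0.5, 11)
ys = [synapse_output(x, centers, w) for x in xs]
slope = (ys[-1] - ys[0]) / (xs[-1] - xs[0])   # b_i on this segment
```

Every value in `ys` coincides with the line through the segment's endpoints, which is the a_i + b_i x form discussed above.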
The double neo-fuzzy neuron (just like the traditional one) is designed for processing data given in the scale of natural numbers. The situation becomes substantially more complicated when initial data are given in the ordinal (rank) scale. Such cases are often encountered in sociology, economics, medicine, education, etc. [16]. In the one-dimensional case this information is given in the form of an ordered sequence of linguistic variables, where x is actually a linguistic variable and r_i is a corresponding rank. It was proposed in [12, 15, 17] to perform fuzzification of the initial data based on an analysis of the occurrence frequency distribution of specific linguistic variables for processing data given in the ordinal scale. It was also assumed that these distributions obey the Gaussian law. The approach in [18], which we will use further on, is not associated with the normal distribution hypothesis. Thus, the initial data for solving the fuzzy clustering task is a set (a sample) of images formed by N n-dimensional feature vectors X = {x(1), x(2), ..., x(k), ..., x(N)} (here x_i(k) is the rank of a specific value of a linguistic variable in the i-th coordinate of the n-dimensional space for the k-th observation) and a sample of reference signals

D = {d(1), d(2), ..., d(k), ..., d(N)}, where d(k) is the rank of a reference signal's value in the sample D. After the double neo-fuzzy neuron has been trained on the ranked data, a partition of the initial data array X into m_0 overlapping classes with membership levels μ_l0 of the k-th image to the l-th class should be provided.
The fuzzification procedure for a sequence of rank linguistic variables can be considered by the example of a one-dimensional sample x(1), x(2), ..., x(N), where every observation x(k) may be assigned to one of the ranks r_i, i = 1, 2, ..., m. The corresponding membership functions are computed with the help of expressions similar to expressions (2)-(7); the only difference is the frequency-based analogues that are used instead of expressions (2), (4), (5), and (7).
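As a sketch of frequency-based fuzzification, ordinal ranks can be mapped to centers in [0, 1] from their relative occurrence frequencies. The midpoint rule below (center of a rank = cumulative frequency of lower ranks plus half its own frequency) is one plausible choice, not necessarily the exact recurrent ratios of [18].

```python
from collections import Counter

def rank_centers(ranks):
    """Map ordinal ranks to centers in [0, 1] using relative occurrence
    frequencies; frequent ranks get wide intervals, rare ranks narrow ones.
    The midpoint rule here is an illustrative assumption."""
    n = len(ranks)
    counts = Counter(ranks)
    centers, cum = {}, 0.0
    for r in sorted(counts):
        f = counts[r] / n                 # relative occurrence frequency
        centers[r] = cum + 0.5 * f        # midpoint of the rank's interval
        cum += f
    return centers
```

For the sample [1, 1, 1, 2, 3, 3] rank 1 covers half the unit interval, so its center lands at 0.25, yielding the asymmetrical, unevenly located centers discussed above.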
A gradient minimization procedure with a variable search step parameter η_i(k) should be used for learning the double neo-fuzzy neuron. An algorithm for tuning the output synapse NS_0 can be written in the form

w_l0(k+1) = w_l0(k) + η_0(k) e(k) μ_l0(u(k)).
Thus, at every iteration step only the weights corresponding to the activated membership functions μ_p0 and μ_{p+1,0} are adjusted. To increase the convergence speed and to introduce additional smoothing properties, it is expedient to use an algorithm in the form of [19], which coincides (when α = 0) with the one-step optimal algorithm by Kaczmarz-Widrow-Hoff [20, 21], and when α = 1 it coincides with the stochastic approximation procedure by Goodwin-Ramage-Caines [22, 23].
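A smoothed variable-step update of this family can be sketched as follows; the exact form of the rule in [19] may differ, so this is an assumption based on the stated limiting cases α = 0 and α = 1.

```python
import numpy as np

def adaptive_step_update(w, mu, error, r_prev, alpha):
    """One step of a smoothed variable-step learning rule (sketch):

        r(k)   = alpha * r(k-1) + ||mu(k)||^2
        w(k+1) = w(k) + e(k) * mu(k) / r(k)

    alpha = 0 gives the one-step optimal Kaczmarz-Widrow-Hoff algorithm,
    alpha = 1 the Goodwin-Ramage-Caines stochastic approximation procedure.
    """
    r = alpha * r_prev + float(mu @ mu)
    return w + error * mu / r, r
```

With α = 0 the rule reproduces the Kaczmarz property: a single step zeroes the error for the current input.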
To tune the synaptic weights of the first layer, the learning criterion can be written down in the same squared-error form. Introducing the derivative ∂y(k)/∂u(k), the gradient minimization procedure (15) can be written in the form

w_ji(k+1) = w_ji(k) + η_i(k) e(k) (∂y(k)/∂u(k)) μ_ji(x_i(k)),

and, using the optimization techniques from [24, 25], we come to a simple and effective learning procedure for the nonlinear synapses of the first layer, which completely coincides structurally with the procedure (14).

EXPERIMENTS
For the first experiment, we chose the Nursery sample from the UCI Repository [26] to demonstrate the effectiveness of the proposed neuro-fuzzy system based on the double neo-fuzzy neuron and its learning method. The sample contained 12958 observations with 8 ordinal attributes each. The sample was divided into a training set and a test set with a 70/30 ratio.
For the second experiment, a set of runs was performed for the adaptive fuzzy clustering method based on the double neo-fuzzy neuron. A synthetic sample was created for this purpose that consisted of 3 non-overlapping clusters, to which 20% outliers were additionally added. To obtain ordinal attributes, the generated sample was ranked in heterogeneously broken ranges from 1 to 7. The sample is shown in Fig. 2; on the right there is a separate enlarged sampling area where the outliers have been cut off.
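A synthetic sample of this kind can be generated as follows. The cluster positions, spreads, and the uneven bin edges are illustrative assumptions; only the overall recipe (three separated clusters, 20% uniform outliers, ranking into 7 uneven ordinal levels) follows the description above.

```python
import numpy as np

def make_ordinal_sample(n=300, outlier_share=0.2, seed=0):
    """Three non-overlapping 2-D clusters plus uniformly scattered outliers,
    with each coordinate ranked into 7 unevenly spaced ordinal levels."""
    rng = np.random.default_rng(seed)
    means = np.array([[2.0, 2.0], [6.0, 2.5], [4.0, 6.0]])
    pts = np.vstack([rng.normal(m, 0.4, size=(n // 3, 2)) for m in means])
    n_out = int(outlier_share * len(pts))
    outliers = rng.uniform(-5.0, 13.0, size=(n_out, 2))
    data = np.vstack([pts, outliers])
    # Heterogeneously broken ranges mapped to ranks 1..7 (6 uneven edges)
    edges = np.array([0.5, 1.8, 3.0, 4.2, 5.5, 7.5])
    ranks = np.digitize(data, edges) + 1
    # Cluster labels 0..2; outliers marked with -1
    labels = np.concatenate([np.repeat([0, 1, 2], n // 3), np.full(n_out, -1)])
    return ranks, labels
```

The returned `ranks` array is what a rank-scale clustering method would receive as input, while `labels` serves only as ground truth for evaluating the partition.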

RESULTS
We compared results obtained with architectures based on the traditional neo-fuzzy neuron, the double neo-fuzzy neuron, and the extended neo-fuzzy neuron [27][28][29][30][31][32]. During the experiment, we measured the clustering accuracy on the training and test samples as well as the system's learning speed. The results are shown in Table 1.
Fig. 3 shows the clustering results of the second experiment obtained with the FCM algorithm, which is known for its instability to outliers, and the partition obtained with the adaptive fuzzy clustering method based on the double neo-fuzzy neuron. Outliers do not influence the result when the adaptive procedure is applied, while traditional methods are hypersensitive to observations that lie far from all prototypes (centroids).

DISCUSSION
We can conclude that if higher accuracy is needed, either ENFN or DNFN should be used (everything depends on the level of accuracy to be achieved). But if a system requires speedy reasoning, the traditional NFN would probably be of greater benefit. Speaking of the demonstrated performance, DNFN was 12% slower than the conventional NFN and 20% faster than ENFN. Speaking of forecasting accuracy, DNFN was 35% better than NFN and about 50% worse compared to ENFN.
As can be seen in Fig. 3, FCM and the proposed method produced different partitions. Although both algorithms divided the data into 3 clusters, FCM assigned all outliers/tails to one previously determined cluster (Fig. 3a). At the same time, the proposed method carried out a more reasonable partition: a part of the outliers/tails was assigned to a new cluster. In fact, the adaptive fuzzy clustering method performed more like what a human expert would do, which demonstrates that it is more robust than FCM.
Considering the results, we can assert that the proposed system provides high-quality clustering for ordinal data. The additional layer of the proposed neuro-fuzzy system makes it possible to increase the clustering accuracy, although it may take a little longer in terms of learning time.

CONCLUSIONS
It is widely known that pattern vectors to be clustered may have attributes of various types, including ordinal ones. Ordinal attributes with values such as «very poor», «poor», «good», and «very good» are neither entirely numerical nor entirely qualitative. This leads to difficulties with clustering, since it does not make sense to take differences of values for these ordinal attributes, as is required for finding distances between pattern vectors.
The neo-fuzzy neuron replaces traditional synaptic weights with nonlinear synapses. These nonlinear synapses are formed by a set of triangular symmetrical membership functions μ_ji (j = 1, 2, ..., m_i) which are equally distributed in the interval [0, 1]. Each membership function is connected to its own adjustable weight w_ji. The output of the neo-fuzzy neuron in response to an image x(k) (k is a number of an image in a training set or a current discrete time item) can be written down as y(k) = Σ_i Σ_j w_ji(k) μ_ji(x_i(k)), where w_ji(k) is the current value of the adjustable synaptic weight at time moment k for the j-th membership function of the i-th component of the input signal.

Figure 1 -An architecture of the double neo-fuzzy neuron

Suppose a specific rank occurs N_{r_i} times in the dataset. Then a relative occurrence frequency may be introduced for the r_i-th rank, i = 1, 2, 3, ..., m. Based on the obtained frequencies, asymmetrical, unevenly located membership functions μ_ji and μ_l0 are formed, with centers calculated with the help of recurrent ratios.