Fundamental Concepts on Classification and Statistical Implicative Analysis for Modal Variables

The present work offers some fundamental concepts of Statistical Implicative Analysis for modal variables and proposes an index to establish the similarity between two modal variables. Expressions to calculate the typicality and contribution of the individuals to the classes formed in the classification are also presented. The technique is illustrated by two examples: one with binary data, which shows that the formulas presented coincide with the existing ones for the binary case, and the other with modal data with more than two modalities.


Resumen
En el presente trabajo se ofrecen algunos conceptos fundamentales del análisis estadístico implicativo para el caso de variables modales y se propone un índice para establecer la similaridad entre dos variables modales, así como expresiones para el cálculo de la tipicalidad y contribución de los individuos a las clases que se forman en la clasificación. Con el objetivo de ilustrar la técnica presentada, es aplicada a dos juegos de datos, uno binario, el cual permite mostrar numéricamente la coincidencia de las fórmulas presentadas con las existentes para el caso de variables binarias, y otro modal con más de dos modalidades.

Introduction
Régis Gras and his collaborators have developed the Statistical Implicative Analysis (SIA), which allows association rules to be established in a dataset that crosses individuals and variables.
The initial objective of this method was to answer the question: if an object has a certain quality, could it have some other one? This is the case of binary variables (Gras 2000).
The research developed by Marc Bailleul from 1991 to 1994 (Bailleul & Gras 1994) showed the need to extend the concept of statistical implication to nonbinary variables, in particular to modal variables. Bailleul & Gras (1994) refer to the representation that mathematics teachers make of their own teaching, for which the teachers were asked to order a list of significant words according to their importance. Their choices are no longer binary. The words selected by a teacher are arranged on a scale where the top words are the most representative. Bailleul's question is centered on rules of the type: "if I select a word $x$ with the importance $i_x$, then I select the word $y$ with the importance $i_y \ge i_x$".
For the application of the SIA techniques, CHIC (Classification Hiérarchique Implicative et Cohésitive) (Couturier 2008) has been developed. This data analysis software was initially conceived by Régis Gras and has been successively improved for personal computers by Saddo Ag Almouloud, Harrisson Ratsimba-Rajohn and, in its latest versions, by Raphaël Couturier (Version 5.0), who incorporated classification by means of similarity analysis.
Unlike the cohesive and implicative analyses of SIA, the similarity analysis has not been developed for modal variables: the specialized literature contains no expressions for determining the similarity index, the significant nodes, the typicality or the contribution of individuals to the classes in this case. For this reason, the present paper proposes a similarity index for modal variables, as well as expressions for determining the typicality and contribution of the individuals to the classes formed during the construction of the similarity tree, extending the works of Bailleul & Gras (1994) and Lagrange (1998).

The Intensity of Propensity
Lagrange (1998) defines the "propensity" as a non-symmetric association of two variables whose values belong to the interval [0, 1] and introduces the "intensity of propensity" to decide whether or not an observed association is significant. He also proves that in the case of binary variables, the intensity of propensity is similar to the implication intensity.
Similar to the case of binary variables, let us consider a set $I$ formed by $n$ individuals and a set $A = \{a_1, a_2, \ldots, a_p\}$ formed by $p$ modal variables, each one of which can take values in the set $M = \{k \in \mathbb{N} : 1 \le k \le m\}$.
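To transform the $m$ modalities to values in the [0, 1] interval, a weight is assigned to each modality, varying from 1 (the strongest) to 0 in steps of size $\frac{1}{m-1}$, by means of the function $\psi$ defined by:

$$\psi(k) = \frac{k-1}{m-1}, \qquad k \in M, \qquad (1)$$

where $\psi_a(x) = \psi(a(x))$ is the weight of the modality that variable $a$ takes for individual $x$, conserving the notation used by Bailleul & Gras (1994).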
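The weighting can be sketched in a few lines of Python. This is a minimal illustration, not the authors' ASIM.R code: it assumes that $\psi(k) = (k-1)/(m-1)$, which is inferred from the stated step size $1/(m-1)$ and from the fact that the lowest modality receives weight 0 and the highest weight 1. The function name `psi` is chosen here for illustration.

```python
from fractions import Fraction


def psi(k: int, m: int) -> Fraction:
    """Weight of modality k on an m-point scale (assumed form (k-1)/(m-1))."""
    if m < 2:
        raise ValueError("at least two modalities are required")
    if not 1 <= k <= m:
        raise ValueError(f"modality {k} is outside 1..{m}")
    # Exact rational arithmetic avoids rounding noise in the weights.
    return Fraction(k - 1, m - 1)


# Weights for a five-point preference scale.
weights = [psi(k, 5) for k in range(1, 6)]
print([str(w) for w in weights])  # ['0', '1/4', '1/2', '3/4', '1']
```

With two modalities the only weights are 0 and 1, which is the binary case.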
Let $a$ and $b$ be two modal variables. Lagrange (1998) defines the non-propensity index of $a$ on $b$ as:

To define the intensity of propensity, he considers a sufficiently large set $E$ of size $N$, of which $I$ is a random part. The random index is defined as:

where $\{A_x\}_{x \in E}$ and $\{B_x\}_{x \in E}$ are sets of random variables with equal distributions, representative of the variables $\psi_a$ and $\psi_b$ respectively, for which the observed non-propensity index $s$ is one of the possible values. It is also proved that $Z$ follows approximately a Normal distribution with mean $\mu$ and variance $\sigma^2$, and therefore, for sufficiently large $n$, $\frac{Z - \mu}{\sigma}$ follows approximately a Normal distribution with mean zero and variance one.

Lagrange (1998) defines the propensity coefficient and the intensity of propensity of $a$ on $b$ as follows:

• Propensity coefficient:

• Intensity of propensity:

where:

• $m_a$ (resp. $m_b$) is the empirical mean of $\psi_a$ (resp. $\psi_b$),

• $s_a^2$ (resp. $s_b^2$) is the variance of $\psi_a$ (resp. $\psi_b$), and

• $\phi$ is the standard normal distribution function.

Contribution of the Individuals
To define the typicality and the contribution of the individuals to a class or a path in the binary case, the approach of Ratsimba-Rajohn (1992) has been used, which presents the notion of respect of an implication among binary variables in the following way:

where $\varphi_x(a, b)$ is the intensity of the implication of $a$ on $b$ for the individual $x$ and $p$ is a number between 0 and 1.
For modal variables, Bailleul & Gras (1994) only analyze the contribution of the individuals to a class or a path. They proceed in two steps:

First. The set of the possible values that the couples of variables can take is arranged according to the weights that have been assigned to each one of the modalities.
In this classification, they divide the couples of variables into two groups: The rank assigned to these couples of variables is given through the expression , and decrement −1, and $0 \le j \le (i - 1)$. The rank assigned to these couples of variables is given by the expression

Second. The value that measures the respect or the contradiction of the implication is related to the implication intensity.
Thus, the grade of adhesion of the individual $x$ to the implication $a \Rightarrow b$ is defined as:

From this grade of adhesion, Bailleul & Gras (1994) define the distance $d$ of an individual $x$ to the class $C$ formed by $g$ sub-classes, for which the $g$ generic couples $(r_1, r_2, \ldots, r_g)$ and the $g$ generic implications $(\varphi_1, \varphi_2, \ldots, \varphi_g)$ have been defined, as (see Gras & Kuntz 2007):

and the contribution of an individual $x$ to the class $C$ as:

where $x_n$ is the neutral individual, that is, the individual that assigns the value 0 to all the variables of the $g$ generic couples $r_i$.

Classificatory Analysis for Modal Variables
In this section, an index to quantify the similarity among modal variables is presented, together with the typicality and the contribution of the individuals to a class $C$.

Similarity Index
Following the method used to define the propensity coefficient, let us denote a set $E$ of size $N$, with $N$ sufficiently large, from which a random subset $I$ of size $n$ is selected, and a set $A = \{a_1, a_2, \ldots, a_p\}$ formed by $p$ modal variables, each one of which can take values in the set $M = \{k \in \mathbb{N} : 1 \le k \le m\}$. Specific weights are assigned to each modality, from 1 (the strongest) to 0, in steps of size $\frac{1}{m-1}$, by means of the function $\psi$ defined in (1).
Let $\{A_x\}_{x \in E}$ and $\{B_x\}_{x \in E}$ be two groups of random variables with equal distributions, representative of the variables $a$ and $b$ with weights $\psi_a$ and $\psi_b$ respectively. Under the assumption of no a priori relationship between $a$ and $b$, the $2N$ variables $\{A_x\}$, $\{B_x\}$ are independent and the $N$ events $\{x \in I\}$ are independent, with an occurrence probability equal to $\frac{n}{N}$.
Let us define the random variable: for which 2 and therefore, for n sufficiently large, follows approximately a Normal distribution of mean zero and variance one.
The proof of this result is similar to the one presented by Lagrange (1998) for , and is therefore omitted from this article.
The similarity index of $a$ on $b$ is defined as: where: which coincides with the similarity index defined for the binary case.
Let us prove this. For the binary case, there are just two modalities, that is, $M = \{1, 2\}$, and the only assignable weights are 0 and 1; therefore: , where: , where:

Significant Nodes
For the determination of the significant nodes, a procedure similar to that of the binary case is used, but the initial and global pre-order on $A \times A$ is defined with the index $s(a, b)$, that is, $G_s(\Omega) = \{((a, b), (c, d)) : s(a, b) < s(c, d)\}$. For more details, see Zamora, Gregori & Orús (2009).
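The pre-order $G_s(\Omega)$ can be built directly from a table of similarity indexes. The sketch below is illustrative only: the function name `similarity_preorder`, the variable names and the index values are hypothetical and are not taken from CHIC or ASIM.R.

```python
from itertools import permutations


def similarity_preorder(s):
    """Return the ordered pairs of couples (p, q) such that s[p] < s[q]."""
    return {(p, q) for p, q in permutations(s, 2) if s[p] < s[q]}


# Hypothetical similarity indexes for three couples of variables.
s = {("a", "b"): 0.93, ("c", "d"): 0.92, ("a", "c"): 0.40}
G = similarity_preorder(s)

# (c, d) precedes (a, b) because its index is strictly smaller.
print((("c", "d"), ("a", "b")) in G)  # True
```

Couples with equal indexes are not related by the strict inequality, so ties produce no pair in the set.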

Typicality and Contribution to a Class C
The couple $(a, b)$ so that: If the class $C$ has $g$ subclasses ($C$ included), then there will be $g$ generic similarities $(s_1, s_2, \ldots, s_g)$.
Based on the idea presented by Bailleul & Gras (1994), the grade of agreement between two modal variables $a$ and $b$ is defined. For that purpose:

First. The set of the possible values that the couples of variables can take is arranged according to the weights that have been assigned to each modality.
The couples of this ordered set are divided into two groups:

G1: The couples $(a, b)$ that verify the similarity $a \sim b$ or for which $a(x) = b(x)$.

G2: The couples $(a, b)$ that do not verify the similarity.

To classify the levels of similarity or non-similarity, ranks are assigned to the couples of values that the couples of variables $(a, b)$ can take for a given individual $x$, as shown below.
Ranks are placed on an $m \times m$ matrix (see Table 1 for the case $m = 4$), where the rank assigned to the cell $(i, j)$, corresponding to the $i$-th modality of the variable $a$ and the $j$-th modality of the variable $b$, is given by the expression:

with $1 \le i \le (m - 1)$ and $j$ decreasing from $m$ to $(i + 1)$ with decrement $-1$. It is also defined that $Rg_{i,j} = Rg_{j,i}$. From this table we can appreciate that:

• The couple $(1, 4)$, corresponding to an individual for which $\psi_a(x) = 0$ and $\psi_b(x) = 1$, receives the lowest rank, 1.

• For the rest of the cases, the cell receives a rank between 1 and 7, according to the level of discrepancy or disagreement between $a(x)$ and $b(x)$.
Second. The value that measures the respect or the contradiction of full agreement is linked to the similarity index.
For that, the grade of agreement of the individual $x$ to the relationship $a \sim b$ is defined as:

where $a(x) = i$ and $b(x) = j$.

The typical subject is defined as the subject whose grade of agreement coincides with the value of the generic similarity for all the generic couples, that is,

The distance of an individual $x$ to the class $C$ formed by $g$ sub-classes is then defined as:

where $(r_1, r_2, \ldots, r_g)$ are the $g$ generic couples and $(s_1, s_2, \ldots, s_g)$ are the $g$ generic similarities.
Typicality reflects the fact that certain individuals are representative of the population's behavior, that is, their agreement is close to the similarity index of the formed class. The typicality of an individual $x$ to the class $C$ is calculated by the expression:

Let us analyze the two extreme cases:

• If $x$ is a typical subject, it satisfies $\mathrm{deg}_{r_i}(x) = s_i$ for all generic couples $r_i$.

• At the other extreme, $x$ satisfies that for all generic couples $r_i$, $\mathrm{deg}_{r_i}(x) = 0$, that is, the rank assigned to all generic couples is 1, which occurs if for all generic couples $r_i = (a, b)$, $a(x) = 1$ and $b(x) = m$, or $a(x) = m$ and $b(x) = 1$.

Some individuals are more responsible than others for the formation of the class. The contribution is defined to evaluate this degree of responsibility. The contribution of an individual $x$ to the class $C$ is given by:

where $d(x, C)$ is the distance from the individual $x$ to the class $C$ and $\mathrm{deg}^*_{r_i} =$

Let us analyze the two extreme cases, the individual of highest contribution and the individual of lowest contribution to the formation of the class:

• If $x$ is the individual of highest contribution to the formation of the class, then it satisfies that $a(x) = b(x)$ and therefore

Illustration of the Technique
The techniques presented have been programmed in the R language (R Core Team 2014), in a file that will be denoted as ASIM.R. Two examples are presented. The first uses binary data to show the equivalence of the proposed similarity index with the existing one for the binary case. The second deals with modal variables with more than two modalities, for which the similarity index and the typicality and contribution of the individuals to the classes formed during the construction of the similarity tree are computed.
In the first example, seventeen binary variables are analyzed in a sample of 100 patients, of whom 50 suffer from lung cancer and the other 50 do not. Before presenting the results of both systems, CHIC and ASIM, it is necessary to point out that CHIC, for similarity analysis in the binary case, allows the user to choose the distribution (Poisson or Binomial); however, the results that CHIC shows, in both cases, correspond to the Poisson approximation to the Normal distribution. Given this, it is possible to compare the result shown by CHIC for the binary case with the result shown by ASIM for the modal case with two modalities.
The index proposed for the modal case with 2 modalities coincides with the index proposed for the binary case, in this case the random variable x∈I AxBx n follows approximately a Normal distribution of mean , distribution that coincides with the Poisson approximation to the Normal distribution.
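For readers who want to check the binary side of this comparison by hand, the following Python sketch computes a binary similarity index under a Poisson model with a Normal approximation. The formula used, $\Phi\big((n_{a \wedge b} - n_a n_b / n) / \sqrt{n_a n_b / n}\big)$, is an assumption taken from the general similarity-analysis literature; it is not reproduced in the text above, and the function names are hypothetical. It is not the ASIM.R or CHIC implementation.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binary_similarity(a, b):
    """Similarity of two 0/1 vectors under the assumed Poisson model."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("a and b must be non-empty and of equal length")
    n_a, n_b = sum(a), sum(b)
    n_ab = sum(x * y for x, y in zip(a, b))  # co-occurrences of a and b
    expected = n_a * n_b / n  # expected co-occurrences under independence
    if expected == 0:
        raise ValueError("index undefined when a variable is never present")
    return normal_cdf((n_ab - expected) / math.sqrt(expected))


a = [1, 1, 0, 0]
print(binary_similarity(a, [1, 0, 1, 0]))  # 0.5: observed count equals expected
print(binary_similarity(a, a) > 0.5)       # True: more co-occurrences than expected
```

An index near 0.5 means the observed number of co-occurrences is what independence would predict; values near 1 indicate strong similarity.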
Figures 1 and 2 show the similarity indexes at level zero of the hierarchy. The coincidence of the results can be appreciated. From the running of the above example and of other datasets, we can say that the execution times of CHIC and ASIM.R are similar.
Next, the results of a modal variable case with five modalities are presented. To accomplish this, the file "DatosMusicaModalII.txt" is used. This file contains the degree of preference of twenty students for fifteen different kinds of music. The degree of preference is assessed on a scale from 1 to 5, 5 being the highest value of preference. The similarity index of class 1, formed by the variables HIP and PUN (see Figures 8 and 9), is 0.934143; this means that HIP and PUN are the most similar music types in the population from which the sample under study has been selected.
The second class is formed by the variables JAZ and HEA. This class also shows a high similarity index (0.922925), but smaller than the previous one (see Figures 8 and 9). We can therefore say that the similarity between the preferences for these two music types is high, but lower than for the previous pair (HIP and PUN). The other classes are interpreted in a similar way. The typicality of the individuals and the optimal group are shown for each one of the formed classes. Figure 11 shows that there are individuals whose typicality in a class is 0, for example, individual 7 in class 3. These individuals are in the greatest disagreement, that is, the values taken by the variables in each generic couple for these individuals are respectively the minimum (modality 1) and the maximum (modality 5), or vice versa.
On the contrary, there are individuals whose typicality value in a class is 1, for example, individual 1 of class 3. The values taken by the variables in each generic couple for these individuals are the same. The degree of agreement among these variables for this individual coincides with the similarity index, making it the most typical individual.
Figures 13 and 14 show the contributions of the individuals and the optimal group. The results related to the contribution can be interpreted as follows. When the contribution of an individual to a class is 1, for example, individual 1 of class 1, the values of the variables for this individual, in each generic couple, are equal. This individual evaluates in a similar way his preference for the music types that form the vector of generic similarities for the class, and is therefore among those most responsible for the creation of the class.
If the contribution of an individual to a class is 0, for example, individual 1 of class 4, then its distance to the class is the largest possible (1). This individual contributes the least to the formation of the class (FLA, RAP): he does not like the music FLA and assigns the highest possible value to the preference for the music RAP. It is the individual of lowest contribution to declaring FLA and RAP similar music types.
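The two extreme cases just described can be detected without the rank formula. The Python sketch below is a hedged illustration: the function name `extreme_contribution` and the sample individuals are hypothetical, and intermediate contributions are deliberately not computed because they depend on the rank and distance expressions.

```python
def extreme_contribution(x, generic_couples, m):
    """Return 1.0 or 0.0 for the two extreme contributions, else None.

    x maps variable names to modalities in 1..m; generic_couples is a
    list of (a, b) name pairs. None means an intermediate contribution.
    """
    if not generic_couples:
        raise ValueError("at least one generic couple is required")
    pairs = [(x[a], x[b]) for a, b in generic_couples]
    if all(u == v for u, v in pairs):
        return 1.0  # equal values in every generic couple
    if all({u, v} == {1, m} for u, v in pairs):
        return 0.0  # minimum against maximum in every generic couple
    return None


couples = [("FLA", "RAP")]
# Values chosen to mirror the description of the lowest-contribution individual.
print(extreme_contribution({"FLA": 1, "RAP": 5}, couples, m=5))  # 0.0
print(extreme_contribution({"FLA": 3, "RAP": 3}, couples, m=5))  # 1.0
print(extreme_contribution({"FLA": 2, "RAP": 4}, couples, m=5))  # None
```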
For space reasons, only a part of the typicalities, contributions and optimal group are shown.

Conclusions
In this work we proposed a similarity index among modal variables, and expressions for the calculation of the typicalities and contributions of the individuals to a class, for these types of variables. It has been demonstrated, theoretically and numerically, that the expression for the similarity index for modal variables with two modalities coincides with the expression for the binary case; thus the proposed index is a generalization of the binary case. Following the idea of Bailleul & Gras (1994), we proposed an expression to calculate the grade of agreement between two modal variables, from which expressions are given to calculate the typicalities and contributions of the individuals to the formed classes.
for all the generic couples $r_i = (a, b)$, and therefore $d(x, C) = 0$ and $\gamma(x, C) = 1$.

• If $x$ is in the greatest disagreement with $C$, $d(x, C) = \max_{y \in I} d(y, C)$, which means that $x$ is the individual of lowest contribution to the formation of the class. It satisfies that $a(x) = 1$ and $b(x) = m$, or $a(x) = m$ and $b(x) = 1$, and therefore $Rg_{r_i} = 1$ for all generic couples $r_i = (a, b)$. Hence $\mathrm{deg}^*_{r_i}(x) = 0$ for all $r_i$, and $d(x, C) = 1 \Rightarrow \gamma(x, C) = 0$.

Figure 1: Similarity indexes at the level zero of the hierarchy with ASIM.R.

Figure 2: Similarity indexes at the level zero of the hierarchy, obtained by CHIC.

Figure 5 shows the same results, obtained by executing the software CHIC. The coincidence of both results can be appreciated. Figures 6a and 6b show the similarity trees obtained by means of each program. Again, the results coincide.

Figure 5: Classes and similarity indexes obtained by executing CHIC.

Figures 7, 8 and 9 show the similarity indexes obtained at the level zero of the hierarchy, the grouping of variables in each level of this hierarchy, and the indexes to which they were joined.

Figure 7: Similarity indexes at the level zero of the hierarchy.

Figure 10 presents the similarity tree for the music data classification shown in Figures 8 and 9.

Figure 11: Typicalities of the first 10 individuals to the first 10 classes.

Figure 12: Optimal group for the first 6 classes.

Figure 13: Contribution of the first 10 individuals to the first 9 classes.

Figure 14: Optimal group for the first 6 classes.