An Extension of Totohasina’s Normalization Theory of Quality Measures of Association Rules

In the context of binary data mining, and with a view to a unified treatment of probabilistic quality measures of association rules, Totohasina’s normalization theory, which is based primarily on affine homeomorphisms, has some drawbacks: it cannot normalize certain interestingness measures, as explained below. This paper presents an extension of that theory, namely a new normalization method based on proper homographic homeomorphisms, which appears to be the most coherent choice.


Introduction
In the context of implicative statistical analysis (ISA) [1], one cannot leave aside the probabilistic notion of a quality measure, which assesses the degree of implicative link between the two patterns of an association rule. Given the very large number of measures in the literature on binary databases, researchers [2–7] are working in parallel to find general relationships that make it possible to classify these various measures, partially or entirely. This led to the well-founded concept called "normalization of quality measures under five constraints", which appeared in the context of data mining in 2003 [2]. This definition of normalized measures is now well established, and it turns out that all measures normalizable by an affine homeomorphism become comparable [2, 8]. Although Totohasina has already treated this subject thoroughly, the problem of normalizing probabilistic quality measures is not fully resolved. Indeed, the tool he used, the affine homeomorphism, has a weakness: it cannot normalize a measure that tends to infinity in at least one of the more or less intuitive reference situations, namely incompatibility, statistical independence, and logical implication [1, 9–12]. Examples are the measures Cost Multiplying, Sebag, Conviction, Odds Ratio, Informal Gain, and Ratio of Example counter-example. Therefore, as already announced by the author ([13], paragraph 3.2, page 65), "the problem remains open concerning the transformation allowing normalization of other measures in a way still to be specified". This is the issue we address: this article proposes a way to partly solve this problem and discusses it. First, let us recall our proposed definition of a normalized quality measure, which evolves with three intuitive events: incompatibility, stochastic independence, and implication.
Definition 1 (cf. [6, 13–15]). Let $\mathcal{C} = (\mathcal{T}, \mathcal{I}, \mathcal{R})$ be a binary data mining context. Here $\mathcal{T}$ is the set of all transactions, $\mathcal{I}$ is the set of attributes (called items or patterns), and $\mathcal{R}$ is the binary relation from $\mathcal{I}$ to $\mathcal{T}$. Let $P$ be the uniform discrete probability on the probabilizable space $(\mathcal{T}, \mathcal{P}(\mathcal{T}))$ [2], let $\mu$ be a probabilistic quality measure, and let $X \to Y$ be an association rule between itemsets $X, Y \subseteq \mathcal{I}$ with $X \cap Y = \emptyset$. For every itemset $X \subseteq \mathcal{I}$, $X' = \{t \in \mathcal{T} \,/\, \forall x \in X;\ x \mathcal{R} t\}$ denotes the extension of $X$, i.e., the set of all transactions containing the pattern $X$ [6]. $X'$ is hence an event of the discrete probability space $(\mathcal{T}, \mathcal{P}(\mathcal{T}), P)$. A quality measure $\mu$ of association rules is said to be normalized if it satisfies the five conditions below:
(i) $\mu(X \to Y) = -1$ if $P(Y' \mid X') = 0$, i.e., $X' \cap Y' = \emptyset$: $X$ and $Y$ are incompatible;
(ii) $-1 < \mu(X \to Y) < 0$ if $0 < P(Y' \mid X') < P(Y')$, i.e., $X$ and $Y$ are negatively dependent;
(iii) $\mu(X \to Y) = 0$ if $P(Y' \mid X') = P(Y')$, i.e., the events $X'$ and $Y'$ are independent, which means that the itemsets $X$ and $Y$ are independent;
(iv) $0 < \mu(X \to Y) < 1$ if $P(Y') < P(Y' \mid X') < 1$, i.e., $X'$ and $Y'$ are positively dependent, and therefore so are the itemsets $X$ and $Y$;
(v) $\mu(X \to Y) = 1$ if $P(Y' \mid X') = 1$, i.e., $X' \subseteq Y'$: the extension of $X$ is then completely included in that of $Y$ (logical implication).
The two-branch measure
$$\mu(X \to Y) = \begin{cases} \dfrac{P(Y' \mid X') - P(Y')}{1 - P(Y')} & \text{if } P(Y' \mid X') \ge P(Y'),\\[2mm] \dfrac{P(Y' \mid X') - P(Y')}{P(Y')} & \text{otherwise,} \end{cases}$$
satisfies these five conditions. This measure was discovered independently by three teams on three continents. S. Guillaume found it in France, in her thesis [15]. A. Totohasina found it in Madagascar [2], during his research on normalization, and named it ION ("Implication Oriented Normalised"). X. Wu, C. Zhang, and S. Zhang found it in the USA [5], under the name CPIR ("Conditional Probability Increment Ratio"). Its rich mathematical properties are studied in [6, 15]. Historically, this measure was also partially discovered, as the Certainty Factor (CF), by Edward H. Shortliffe and Bruce G. Buchanan in the USA in 1975 [12]. Fernando Berzal et al. [16] established some statistical properties of CF and its relation to common interestingness measures such as Confidence and Conviction.
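The two-branch measure and the five conditions of Definition 1 can be checked numerically. The measure (the CPIR form of [5]) equals $(P(Y' \mid X') - P(Y'))/(1 - P(Y'))$ when $P(Y' \mid X') \ge P(Y')$, and $(P(Y' \mid X') - P(Y'))/P(Y')$ otherwise. The Python sketch below is illustrative only, under three assumptions:

- The names `two_branch_measure` and `satisfies_definition_1` are hypothetical.
- The probabilities are passed in directly rather than computed from a transaction table.
- The conditions are tested only on a finite grid of values of $P(Y' \mid X')$.

```python
def two_branch_measure(p_y_given_x: float, p_y: float) -> float:
    """Two-branch normalized measure (ION / CPIR form).

    p_y_given_x is P(Y'|X') and p_y is P(Y'); assumes 0 < P(Y') < 1.
    """
    if not 0.0 < p_y < 1.0:
        raise ValueError("P(Y') must lie strictly between 0 and 1")
    if not 0.0 <= p_y_given_x <= 1.0:
        raise ValueError("P(Y'|X') must lie in [0, 1]")
    if p_y_given_x >= p_y:
        # attraction zone: X favors Y (positive dependence or independence)
        return (p_y_given_x - p_y) / (1.0 - p_y)
    # repulsion zone: X disfavors Y (negative dependence)
    return (p_y_given_x - p_y) / p_y


def satisfies_definition_1(p_y: float, steps: int = 100) -> bool:
    """Check conditions (i)-(v) on the grid P(Y'|X') = k/steps, k = 0..steps."""
    for k in range(steps + 1):
        c = k / steps
        m = two_branch_measure(c, p_y)
        if c == 0.0:
            ok = m == -1.0          # (i) incompatibility
        elif c < p_y:
            ok = -1.0 < m < 0.0     # (ii) negative dependence
        elif c == p_y:
            ok = m == 0.0           # (iii) independence
        elif c < 1.0:
            ok = 0.0 < m < 1.0      # (iv) positive dependence
        else:
            ok = m == 1.0           # (v) logical implication
        if not ok:
            return False
    return True


print(two_branch_measure(0.125, 0.25))  # -0.5 (repulsion zone)
print(satisfies_definition_1(0.3))      # True
```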
Our work is divided into five sections. After this introduction, Section 2 recalls some properties of homographic homeomorphisms, which are the main tool of our contribution. Section 3 proposes a new normalization process based on homographies in order to solve the aforementioned problem. Section 4 presents the raw results obtained for each of the normalized measures. Section 5 concludes.

About Homographic Function
Processing Tool

Definition and Reminders.
In mathematics, a homographic function is a function that can be written as the quotient of two affine functions. It is bijective, and its inverse is again a homographic function. Over the commutative field $\mathbb{R}$, a homographic function $f$ is a map of $\mathbb{R}$ (minus its pole) into itself defined by $f(x) = (ax+b)/(cx+d)$, where $a$, $b$, $c$, and $d$ are real numbers such that $ad - bc \neq 0$. The determinant $ad - bc$ is required to be nonzero to exclude constant functions. The condition "$c \neq 0$" is sometimes added, since the case $c = 0$ corresponds to affine functions. With that condition, however, the set of homographic functions loses its group structure under composition of maps.
We retain that, for $c \neq 0$, a homographic function is a homeomorphism $f : \mathbb{R} \setminus \{-d/c\} \to \mathbb{R} \setminus \{a/c\}$, $x \mapsto (ax+b)/(cx+d)$. Its inverse $f^{-1}(y) = (dy - b)/(-cy + a)$ is a homography of the same type, with the same determinant $ad - bc \neq 0$. The graphs of $f$ and $f^{-1}$ are hyperbolas.
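The inversion formula $f^{-1}(y) = (dy - b)/(-cy + a)$ and the invariance of the determinant can be checked mechanically. The sketch below uses a hypothetical `Homography` class built on exact `fractions.Fraction` arithmetic, so that equalities hold exactly rather than up to rounding. It stores the four coefficients, rejects a zero determinant, and builds the inverse from that formula.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Homography:
    """f(x) = (a x + b) / (c x + d) with nonzero determinant ad - bc."""
    a: Fraction
    b: Fraction
    c: Fraction
    d: Fraction

    def __post_init__(self) -> None:
        if self.det == 0:
            raise ValueError("ad - bc = 0 gives a constant function")

    @property
    def det(self) -> Fraction:
        return self.a * self.d - self.b * self.c

    def __call__(self, x: Fraction) -> Fraction:
        denom = self.c * x + self.d
        if denom == 0:
            raise ZeroDivisionError("x = -d/c is outside the domain")
        return (self.a * x + self.b) / denom

    def inverse(self) -> "Homography":
        # solving y = (a x + b)/(c x + d) for x gives x = (d y - b)/(-c y + a)
        return Homography(self.d, -self.b, -self.c, self.a)


f = Homography(Fraction(2), Fraction(1), Fraction(1), Fraction(3))
g = f.inverse()
x = Fraction(1, 2)
print(f.det, g.det)   # 5 5
print(f(x), g(f(x)))  # 4/7 1/2
```

The inverse of a homography with $c \neq 0$ is undefined exactly at $a/c$ (here $2$), the value that $f$ never takes.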
If we extend $f$ by setting $f(-d/c) = \infty$ and $f(\infty) = a/c$, we obtain a projective map of $\overline{\mathbb{R}} = \mathbb{R} \cup \{\infty\}$ onto itself.
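This projective extension can be evaluated directly. The minimal Python sketch below encodes the single point $\infty$ of $\overline{\mathbb{R}}$ as `math.inf`, which is a representation choice and not part of the theory. It returns $a/c$ at $\infty$ for a proper homography and $\infty$ at the pole $-d/c$. It also shows that an affine map ($c = 0$) keeps $\infty$ at $\infty$.

```python
import math


def eval_projective(a: float, b: float, c: float, d: float, x: float) -> float:
    """Evaluate f(x) = (a x + b)/(c x + d) on R ∪ {∞}.

    ∞ is encoded as math.inf; math.isinf also treats -math.inf as the
    same projective point ∞.
    """
    if a * d - b * c == 0:
        raise ValueError("ad - bc = 0: constant function")
    if math.isinf(x):
        # f(∞) = a/c for a proper homography; an affine map (c = 0) keeps ∞ at ∞
        return a / c if c != 0 else math.inf
    denom = c * x + d
    if denom == 0:
        return math.inf  # the pole x = -d/c is sent to ∞
    return (a * x + b) / denom


print(eval_projective(1, -1, 1, 0, math.inf))  # 1.0: x -> (x - 1)/x sends ∞ to 1
print(eval_projective(1, -1, 1, 0, 0))          # inf: the pole x = 0
print(eval_projective(2, 3, 0, 1, math.inf))    # inf: affine case c = 0
```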
Derivative and Variation. For a real homography $f$, the derivative is $f'(x) = \Delta/(cx+d)^2$, where $\Delta = ad - bc$ is the determinant of the matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$. $\Delta$ is therefore called the determinant of the homography and denoted $\det(f)$. This gives the variations of a homographic function:
- if $\det(f) > 0$, then $f$ is strictly increasing on each of its two intervals of definition;
- if $\det(f) < 0$, then $f$ is strictly decreasing on each of them.
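A short Python check of the derivative formula and of the sign rule follows. It uses the homography $x \mapsto (x-1)/x$ ($a = 1$, $b = -1$, $c = 1$, $d = 0$, determinant $1$) as an illustrative choice. It compares the closed form $\det(f)/(cx+d)^2$ with a central finite difference at $x = 2$.

```python
def derivative(a: float, b: float, c: float, d: float, x: float) -> float:
    """f'(x) = (ad - bc)/(cx + d)^2 for f(x) = (ax + b)/(cx + d), x != -d/c."""
    return (a * d - b * c) / (c * x + d) ** 2


def variation(a: float, b: float, c: float, d: float) -> str:
    """Sense of variation of f on each of its two intervals of definition."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("ad - bc = 0: constant function")
    return "increasing" if det > 0 else "decreasing"


def f(x: float) -> float:
    """Example homography f(x) = (x - 1)/x, i.e. a=1, b=-1, c=1, d=0 (det = 1)."""
    return (x - 1) / x


step = 1e-6
numeric = (f(2 + step) - f(2 - step)) / (2 * step)  # central difference at x = 2
print(derivative(1, -1, 1, 0, 2), numeric)           # closed form 0.25 vs estimate
print(variation(1, -1, 1, 0), variation(-2, 1, 1, 3))  # increasing decreasing
```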

Canonical Form. Let $f$ be a homography such that $f(x) = (ax+b)/(cx+d)$ with $ad - bc \neq 0$.
If $c \neq 0$, the canonical form (also called reduced form) of $f$ is $f(x) = k/(x+\alpha) + \beta$, where $k = (bc - ad)/c^2$, $\alpha = d/c$, and $\beta = a/c$. Take the point $S(-\alpha, \beta)$, the center of the hyperbola, as a new origin, i.e., set $X = x + \alpha$ and $Y = f(x) - \beta$. The expression of the homographic function then becomes $Y = k/X$, which is the inverse function multiplied by the scalar $k = (bc - ad)/c^2$.
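The canonical-form coefficients can be computed and checked exactly. The sketch below is a hypothetical helper using `fractions.Fraction`. It returns $k = (bc - ad)/c^2$, $\alpha = d/c$, and $\beta = a/c$, then verifies $f(x) = k/(x+\alpha) + \beta$ at a sample point for $f(x) = (2x+1)/(x+3)$.

```python
from fractions import Fraction


def canonical_form(a, b, c, d):
    """Return (k, alpha, beta) such that f(x) = k/(x + alpha) + beta, for c != 0."""
    a, b, c, d = (Fraction(v) for v in (a, b, c, d))
    if c == 0:
        raise ValueError("c = 0: affine case, no reduced form of this type")
    if a * d - b * c == 0:
        raise ValueError("ad - bc = 0: constant function")
    k = (b * c - a * d) / (c * c)
    return k, d / c, a / c


k, alpha, beta = canonical_form(2, 1, 1, 3)
print(k, alpha, beta)    # -5 3 2
center = (-alpha, beta)  # S, the center of the hyperbola
x = Fraction(5, 2)
assert (2 * x + 1) / (x + 3) == k / (x + alpha) + beta
```

For $x \mapsto (x-1)/x$ the same helper gives $k = -1$, $\alpha = 0$, $\beta = 1$, i.e., $f(x) = 1 - 1/x$.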
Consequence. Any proper homographic function (with nonzero determinant and $c \neq 0$) can thus be reduced, after a change of origin, to a homographic function of the type $x \mapsto k/(x+\alpha)$ with $k \neq 0$. From now on, we are interested in homographies of this type, for which $\lim_{x \to \infty} k/(x+\alpha) = 0 \in \mathbb{R}$. A proper homography thus sends infinity to a finite real number. Our main objective here is to "make infinity finite". We therefore consider hereafter the measures that take an infinite value in at least one of the three reference situations (logical implication, stochastic independence, and incompatibility), and for them we use the homographic function above. For any situation that does not lead to infinity, it is relevant to use the theory of [13], which is based on affine homeomorphisms. Affine maps belong to the large family of homographies, as degenerate homographies that send infinity to infinity, so we work with the whole set of nonconstant homographies. The theory of [2] and the one proposed here can thus be combined, and the present approach is a natural extension of that of [14].
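The combination just described uses a homographic piece on a zone whose reference value is infinite and an affine piece, as in [13], on a zone with finite reference values. It can be sketched in Python. This is a minimal sketch under stated assumptions, not the coefficient systems derived in Section 3. Let $x_{inc}$, $x_{ind}$, $x_{imp}$ denote the values of the measure at incompatibility, independence, and logical implication. The sketch assumes:

- the measure increases with $P(Y' \mid X')$, so an infinite $x_{inc}$ is $-\infty$ and an infinite $x_{imp}$ is $+\infty$;
- $x_{ind}$ is finite;
- the hypothetical family $h_-(x) = -1 + m/(x_{ind} + m - x)$ and $h_+(x) = 1 - m/(x - x_{ind} + m)$ is used, with a free parameter $m > 0$.

Both pieces are homographies of $x$. Each sends $x_{ind}$ to $0$ and the infinite reference value to $\mp 1$.

```python
import math
from typing import Callable


def make_normalizer(x_inc: float, x_ind: float, x_imp: float,
                    m: float = 1.0) -> Callable[[float], float]:
    """Piecewise normalizer: affine on a zone with finite ends, homographic otherwise.

    Assumes x_inc < x_ind < x_imp (measure increasing in P(Y'|X')),
    x_ind finite, and m > 0 (a hypothetical free parameter).
    """
    if not math.isfinite(x_ind):
        raise ValueError("the value at independence must be finite")
    if not x_inc < x_ind < x_imp:
        raise ValueError("expected x_inc < x_ind < x_imp")
    if m <= 0:
        raise ValueError("m must be positive")

    def normalize(x: float) -> float:
        if x <= x_ind:  # repulsion zone (negative dependence)
            if math.isinf(x_inc):
                # homography sending x_ind -> 0 and -inf -> -1
                return -1.0 if math.isinf(x) else -1.0 + m / (x_ind + m - x)
            return (x - x_ind) / (x_ind - x_inc)  # affine piece
        # attraction zone (positive dependence)
        if math.isinf(x_imp):
            # homography sending x_ind -> 0 and +inf -> 1
            return 1.0 if math.isinf(x) else 1.0 - m / (x - x_ind + m)
        return (x - x_ind) / (x_imp - x_ind)  # affine piece

    return normalize


# Assumed reference values for P(Y') = 0.25 (usual definitions, natural log):
conv = make_normalizer(0.75, 1.0, math.inf)             # Conviction
gain = make_normalizer(-math.inf, 0.0, math.log(4.0))   # Informal Gain
print(conv(0.75), conv(1.0), conv(2.0), conv(math.inf))          # -1.0 0.0 0.5 1.0
print(gain(-math.inf), gain(-1.0), gain(0.0), gain(math.log(4.0)))  # -1.0 -0.5 0.0 1.0
```

The reference values used for Conviction ($1 - P(Y')$, $1$, $+\infty$) and for Informal Gain ($-\infty$, $0$, $\log(1/P(Y'))$) come from the usual definitions of these measures, which are assumptions here. With $x_{ind} = 0$ and $m = 1$, the repulsion piece reduces to $x \mapsto x/(1-x)$. This shows why a nonzero parameter $m$ is needed when the value at independence is $0$: the reduced form $x \mapsto k/x$ alone cannot send $0$ to $0$.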

Notations.
For convenience, let us denote by $x$ the value $\mu(X \to Y)$ of the probabilistic quality measure $\mu$, by $x_{imp}$ its value at logical implication, by $x_{ind}$ its value at independence, and by $x_{inc}$ its value at incompatibility.
International Journal of Mathematics and Mathematical Sciences 3

Normalization Process by Homography
Four variants of the normalization of a quality measure $\mu$ are distinguished: its homographic normalization, its semihomographic normalization, its right semihomographic normalization, and its left semihomographic normalization.
As announced in [2], the main objective of normalizing a quality measure $\mu$ is to bring its values into $[-1, 1]$ under three conditions: it takes the value $-1$ at incompatibility, $0$ at independence, and $1$ at logical implication. Any two normalizable measures can then be compared. Consider the three reference values $x_{inc}$, $x_{ind}$, and $x_{imp}$ of $\mu$ (at incompatibility, independence, and logical implication). If these values are finite and pairwise distinct, the research carried out in [14] has already solved this problem. It uses the piecewise affine expression of the normalized measure $\mu_n$:
$$\mu_n(X \to Y) = \begin{cases} a_{+}\,\mu(X \to Y) + b_{+} & \text{if } X \text{ favors } Y,\\ a_{-}\,\mu(X \to Y) + b_{-} & \text{if } X \text{ disfavors } Y. \end{cases}$$
The four coefficients $a_{+}$, $b_{+}$, $a_{-}$, $b_{-}$, called normalization coefficients, are determined by taking one-sided limits in the reference situations (incompatibility, independence, and logical implication). This relies on the continuity of the evolution of $\mu$ in both zones, attraction (positive dependence) and repulsion (negative dependence) [2, 6, 17].

If one or two of the values $x_{inc}$, $x_{ind}$, $x_{imp}$ are infinite, one of three homographic expressions is used instead to find the four real coefficients. When two of them are infinite, $x_{ind}$ must be excluded, i.e., it must be the finite one. These coefficients are again determined by one-sided limits in the reference situations, taking into account the continuity of the evolution in both zones. Consider first the case where $X$ favors $Y$ and $x_{imp}$ may be infinite, with $x_{imp} \neq x_{ind}$ and with $x_{inc}$ and $x_{ind}$ finite. This case yields a system of equations. Since $x_{inc}$ and $x_{ind}$ are finite, only the theory of [2] is needed for the left normalization. We obtain a system of four nonlinear equations in four unknowns. Setting one of the parameters to $0$ leaves four equations in four unknowns, with the particularity that $x_{imp}$ can be infinite. Hence we have the following proposition.
The next system of equations shares the features of system (5). The only difference is that this time the value of $\mu$ at incompatibility can be infinite; it differs from the value at independence, and the latter is nonzero. This gives system (7).
Hence we obtain the following proposition. Proof. (i) System (7) is equivalent to system (8), which is a system of linear equations. The matrix form of system (8) is given by the vector equation (9). (ii) If this reference value is equal to $\infty$, then the last two equations do not make sense.
The following system is similar to system (7), this time including the case where the value of $\mu$ at independence is $0$. In this case $m$ must be nonzero; take, for example, $m = 1$.
Then we obtain the following proposition. Proof. (i) System (10) is equivalent to system (11), which reduces to a system of linear equations. The matrix form of system (11) is given by the vector equation (12). The general form of the studied system is system (13). In (13), the values of $\mu$ at incompatibility and at logical implication can be infinite, its value at independence is real, and the relevant sum of coefficients is nonzero. Proof. (i) System (13) is equivalent to system (14), which is again a system of linear equations. The matrix form of system (14) is given by the vector equation (15). Let us call 4 = (

Application of These Four Propositions
We recall in Table 1 the respective definitions of the various measures that lead to the results below.
(1) The normalized measure of "Cost Multiplying" is:
(2) The normalized measure of "Sebag" is:
(3) The normalized measure of "Example counterexample" is expressed in terms of $P(Y' \mid X') - P(Y')$:
(5) The normalized measure of "Conviction" is:
The normalized measure of "Informal Gain" is:
Note that the normalized Informal Gain measure bears no relationship to the two-component measure ION/CPIR, but it is continuous. Table 1 recalls the respective definitions of these measures.
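The Python sketch below illustrates the kind of result listed above. It checks numerically that suitable homographies turn Conviction and Sebag into the two-component measure ION/CPIR. Table 1 is not reproduced here, so the sketch assumes the usual definitions Conviction $= (1 - P(Y'))/(1 - P(Y' \mid X'))$ and Sebag $= P(Y' \mid X')/(1 - P(Y' \mid X'))$. The function names are hypothetical, and the code does not claim to reproduce the paper's normalized formulas. The homographies are chosen as follows:

- Conviction: $x \mapsto (x-1)/x$ on the attraction zone, and $x \mapsto \frac{1-P(Y')}{P(Y')}\cdot\frac{x-1}{x}$ on the repulsion zone.
- Sebag: $x \mapsto x/(1+x)$, which recovers $P(Y' \mid X')$, composed with the two affine branches of ION/CPIR.

```python
import math


def two_branch(c: float, p: float) -> float:
    """ION / CPIR; c = P(Y'|X'), p = P(Y'), with 0 < p < 1."""
    return (c - p) / (1 - p) if c >= p else (c - p) / p


def conviction(c: float, p: float) -> float:
    return math.inf if c == 1 else (1 - p) / (1 - c)


def sebag(c: float) -> float:
    return math.inf if c == 1 else c / (1 - c)


def normalized_conviction(x: float, p: float) -> float:
    # reference values: 1 - p (incompatibility), 1 (independence), inf (implication)
    if math.isinf(x):
        return 1.0
    base = 1.0 - 1.0 / x  # homography x -> (x - 1)/x
    return base if x >= 1.0 else base * (1 - p) / p


def normalized_sebag(x: float, p: float) -> float:
    # the homography x -> x/(1 + x) recovers c = P(Y'|X') from the Sebag value
    c = 1.0 if math.isinf(x) else x / (1.0 + x)
    return two_branch(c, p)


for p in (0.2, 0.5, 0.8):
    for k in range(21):
        c = k / 20
        target = two_branch(c, p)
        assert math.isclose(normalized_conviction(conviction(c, p), p), target, abs_tol=1e-12)
        assert math.isclose(normalized_sebag(sebag(c), p), target, abs_tol=1e-12)
print("both homographic normalizations agree with ION/CPIR on the grid")
```

ION/CPIR is itself a normalized measure, so this agreement means that, for Conviction and Sebag, these homographies satisfy the five conditions of Definition 1. Informal Gain, by contrast, depends on $P(Y' \mid X')$ through a logarithm, which fits the remark that its normalized form has no such relationship.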
Part (i) of the following theorem supplements the statement in [17] (paragraph 3.2, page 62).

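Theorem 8. A probabilistic quality measure $\mu$ is normalizable if and only if, for any association rule $X \to Y$, the following conditions are met: (i) the following inequalities are satisfied. Moreover, suppose the triple formed by the values of $\mu$ at incompatibility, independence, and logical implication lies in $\overline{\mathbb{R}}^3$ and one or two of these values are infinite. Then it is necessary to use Propositions 3, 4, 5, and 6 (ii) above. This completes the statement of the theorem.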

Conclusion and Perspectives
Through the use of homographic functions, this study makes normalizable a significant number of quality measures that cannot be normalized by affine maps (e.g., Sebag, Odds Ratio, Example counter-example, Informal Gain, Cost Multiplying, and Conviction). We have also shown that each measure has its own position relative to the three intuitive reference situations: incompatibility, independence, and logical implication. When infinity appears as a reference value, one can use a homographic homeomorphism, or even combine a homographic homeomorphism with an affine one. The homographic homeomorphism must, however, be handled in accordance with its properties and according to the need. Our work only treats situations in which infinity occurs at incompatibility or at logical implication; infinity has not yet arisen at stochastic independence. In these situations, homographic functions of the type $x \mapsto k/(x+\alpha)$, with $\alpha = 0$ and $k \in \mathbb{R}^*$, suffice to solve the normalization problem. A small exception is the Informal Gain measure, which is infinite at incompatibility and equal to zero at stochastic independence. For this measure it appears necessary to introduce two homographic functions involving a parameter $m \neq 0$ (for example, $m = 1$). Finally, this study develops normalization by homography as a complement to normalization by affine maps. The normalized measures obtained revolve around the two-component quality measure ION/CPIR, up to multiplicative factors. This reinforces the unifying role of that measure with respect to the measures existing in the literature. Homographic and affine homeomorphisms are now available to normalize interestingness measures, yet the road ahead is still long. Indeed, a group of measures resists these new tools because they do not meet the conditions for normalization: Klosgen, Pondered dependency, One way support, Bilateral Support, Coverage, and Prevalence.
Thus the problem remains open with respect to the transformation that would allow normalization of those quality measures in a sense still to be specified.