Principal Component Analysis: A Generalized Gini Approach

A principal component analysis based on the generalized Gini correlation index is proposed (Gini PCA). The Gini PCA generalizes the standard PCA based on the variance. It is shown, in the Gaussian case, that the standard PCA is equivalent to the Gini PCA. It is also proven that the dimensionality reduction based on the generalized Gini correlation matrix, which relies on city-block distances, is robust to outliers. Monte Carlo simulations and an application to cars data (with outliers) show the robustness of the Gini PCA and provide different interpretations of the results compared with the variance PCA.


Introduction
Over the last decade, a line of research has developed around the Gini methodology, see Yitzhaki & Schechtman (2013) for a general review of the different Gini approaches applied in Statistics and in Econometrics. Among the Gini tools, the Gini regression has received considerable attention since it was initiated by Olkin and Yitzhaki (1992). Gini regressions have been generalized by Yitzhaki & Schechtman (2013) in different areas, particularly in time series analysis. Shelef & Schechtman (2011) and Carcea & Serfling (2015) investigated ARMA processes with an identification and an estimation procedure based on Gini autocovariance functions. This robust Gini approach has been shown to be relevant for heavy-tailed distributions such as Pareto processes. Also, Shelef (2016) proposed a unit root test based on Gini regressions to deal with outlying observations in the data.
In parallel to the above literature, a second line of research on multidimensional Gini indices arose. This literature paved the way for the valuation of inequality over multiple commodities or dimensions such as education, health, income, etc., that is, for finding a real-valued function that quantifies the inequality between the households of a population over each dimension, see among others, List (1999), Gajdos & Weymark (2005), Decancq & Lugo (2013). More recently, Banerjee (2010) shows that it is possible to construct multidimensional Gini indices by exploring the projection of the data in reduced subspaces based on the Euclidean norm. Accordingly, some notions of linear algebra have been increasingly included in the axiomatization of multidimensional Gini indices.
In this paper, in the same vein as in the second line of research mentioned above, we start from the recognition that linear algebra may be closely related to the maximum level of inequality that arises in a given dimension. In data analysis, variance maximization is mainly used to further analyze projected data in reduced subspaces. The variance criterion raises several problems, since it captures a very precise notion of dispersion that does not always match some basic properties satisfied by variability measures such as the Gini index. Such a property may be, for example, an invariance condition postulating that a dispersion measure remains constant when the data are transformed by monotonic maps. Another property typically related to the Gini index is its robustness to outlying observations, see e.g. Yitzhaki & Olkin (1991) in the case of linear regressions. Accordingly, it seems natural to analyze multidimensional dispersion with the Gini index, instead of the variance, in order to provide a Principal Component Analysis (PCA) in a Gini sense (Gini PCA).
In the field of PCA, Baccini, Besse & de Falguerolles (1996) and Korhonen & Siljamäki (1998) are among the first authors dealing with an ℓ1-norm PCA framework. Their idea was to robustify the standard PCA by means of the Gini Mean Difference metric introduced by Gini (1912), which is a city-block distance measure of variability. The authors employ the Gini Mean Difference as an estimator of the standard deviation of each variable before running the singular value decomposition, leading to a robust PCA. In the same vein, Ding et al. (2006) make use of a rotational ℓ1-norm PCA to robustify the variance-covariance matrix in such a way that the PCA is rotationally invariant. Recent PCAs derive latent variables thanks to regressions based on the elastic net (an ℓ1 regularization) that improves the quality of the regression curve estimation, see Zou, Hastie & Tibshirani (2006).
In this paper, it is shown that the variance may be seen as an inappropriate criterion for dimensionality reduction in the case of data contamination or outlying observations. A generalized Gini PCA is investigated by means of Gini correlation matrices. These matrices contain generalized Gini correlation coefficients (see Yitzhaki (2003)) based on the Gini covariance operator introduced by Schechtman & Yitzhaki (1987) and Yitzhaki & Schechtman (2003). The generalized Gini correlation coefficients are: (i) bounded, (ii) invariant to monotonic transformations, and (iii) symmetric whenever the variables are exchangeable. It is shown that the standard PCA is equivalent to the Gini PCA when the variables are Gaussian. Also, it is shown that the generalized Gini PCA may be realized either in the space of the variables or in the space of the observations. In each case, some statistics are proposed to support the interpretation of the variables and of the observations (absolute and relative contributions). To be precise, a U-statistic test is introduced to assess the significance of the correlations between the axes of the new subspace and the variables. Monte Carlo simulations are performed in order to show the superiority of the Gini PCA compared with the usual PCA when outlying observations contaminate the data. Finally, with the aid of the well-known cars data, which contain outliers, it is shown that the generalized Gini PCA leads to different results compared with the usual PCA.
The outline of the paper is as follows. Section 2 sets the notations and presents some ℓ2-norm approaches of PCA. Section 3 reviews the Gini covariance operator. Section 4 is devoted to the generalized Gini PCA. Section 5 focuses on the interpretation of the Gini PCA. Sections 6 and 7 present some Monte Carlo simulations and applications, respectively.

Motivations for the use of Gini PCA
In this Section, the notations are set. Then, some assumptions are imposed and some ℓ2-norm PCA techniques are reviewed in order to motivate the use of the Gini PCA.

Notations and definitions
Let N* be the set of positive integers and R [R++] the set of [positive] real numbers. Let M be the set of all N × K matrices X = [x_ik] that describe N observations on K dimensions, with N ≥ K, elements x_ik ∈ R, and let I_n be the n × n identity matrix. The N × 1 vector representing variable k is written x_·k, for all k ∈ {1, . . . , K}, and we assume that x_·k ≠ c1_N, with c a real constant and 1_N an N-dimensional column vector of ones. The K × 1 vector representing observation i (the transposed ith line of X) is written x_i·, for all i ∈ {1, . . . , N}. It is assumed that x_·k is the realization of the random variable X_k, with cumulative distribution function F_k. The arithmetic mean of each column (line) of the matrix X is denoted x̄_·k (x̄_i·). The cardinal of a set A is denoted #{A}. For any real vector x ∈ R^K, the ℓ1 norm is ‖x‖_1 = Σ_{k=1}^K |x_k|, whereas the ℓ2 norm is ‖x‖_2 = (Σ_{k=1}^K x_k²)^{1/2}.

Variants of PCA based on the ℓ2 norm
The classical formulation of the PCA, to obtain the first component, can be written as

max_ω ω′Σω subject to ‖ω‖₂² = ω′ω = 1,    (1)

or equivalently

max_ω Var[Xω] subject to ‖ω‖₂² = 1,    (2)

where ω ∈ R^K, and Σ is the (symmetric positive semi-definite) K × K sample covariance matrix. Mardia, Kent & Bibby (1979) suggest writing this problem in terms of correlations,

max_ω Σ_{j=1}^K Cor[x_·j, Xω] subject to ‖ω‖₂² = ω′ω = 1.    (3)
The constraint Xω ⊥ Xω*₁ is added to ensure the orthogonality of the first two components. This problem is equivalent to finding the maximum of Var[Xω] subject to ‖ω‖₂² = 1 and ω ⊥ ω*₁. This idea is also called the Hotelling (or Wielandt) deflation technique: on the kth iteration, we extract the leading eigenvector of the covariance matrix deflated by the previously extracted components (see Saad (1998)). Note, following Hotelling (1933) and Eckart & Young (1936), that it is also possible to write this problem as a low-rank approximation under the nuclear norm ‖·‖* of a matrix (i.e. the sum of its singular values). One extension, introduced in d'Aspremont et al. (2007), was to add a constraint based on the cardinality of ω (also called the ℓ0 norm), corresponding to the number of non-zero coefficients of ω. The penalized objective function is then ω′Σω − λ‖ω‖₀, for some λ > 0. This is called sparse PCA, and can be related to the sparse regression introduced in Tibshirani (1996). But as pointed out in Mackey (2009), interpretation is not easy and the components obtained are not orthogonal. Gorban et al. (2007) considered an extension to nonlinear principal manifolds to take nonlinearities into account. Another direction for extensions was robust principal component analysis: Candès et al. (2009) suggested an approach based on principal component pursuit, obtained by solving min ‖X̂‖* + λ‖X − X̂‖₁, where X̂ captures the low-rank part of X.
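The first-component problem and the deflation step above can be sketched numerically; this is an illustrative sketch (function names are ours), not code from the paper.

```python
import numpy as np

def first_component(X):
    """Leading eigenvector of the sample covariance matrix (first PCA axis)."""
    S = np.cov(X, rowvar=False)    # K x K covariance matrix Sigma
    _, vecs = np.linalg.eigh(S)    # eigh returns eigenvalues in ascending order
    return vecs[:, -1]             # eigenvector of the largest eigenvalue

def hotelling_deflate(S, w):
    """Hotelling deflation: subtract the leading eigen-direction w from S."""
    lam = w @ S @ w                # Rayleigh quotient = leading eigenvalue
    return S - lam * np.outer(w, w)
```

Because S is symmetric, deflation leaves the remaining eigenvectors unchanged, so the next leading eigenvector is automatically orthogonal to w.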
But other methods were also considered to obtain robust PCA. A natural 'scale-free' version is obtained by considering a matrix of ranks instead of X. This is also called 'ordinal' PCA in the literature, see Korhonen & Siljamäki (1998). The first 'ordinal' component solves

max_ω Σ_{j=1}^K R[x_·j, Xω] subject to ‖ω‖₂² = 1,

where R denotes some rank-based correlation, e.g. Spearman's rank correlation, as an extension of Equation (3). So, quite naturally, one possible extension of Equation (2) would be

max_ω ω′R[X]ω subject to ‖ω‖₂² = 1,

where R[X] denotes Spearman's rank correlation matrix. In this section, instead of using Pearson's correlation (as in Equation (2) when the variables are scaled) or Spearman's (as in this ordinal PCA), we will consider the multidimensional Gini correlation based on the h-covariance operator.
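A minimal sketch of this ordinal variant, replacing Pearson's matrix with Spearman's rank correlation matrix (Pearson on ranks; no-ties assumption, names are ours):

```python
import numpy as np

def spearman_matrix(X):
    """Spearman rank correlation: Pearson correlation of the column ranks."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)

def ordinal_first_component(X):
    """First 'ordinal' axis: leading eigenvector of the Spearman matrix."""
    _, vecs = np.linalg.eigh(spearman_matrix(X))
    return vecs[:, -1]
```

Since ranks are invariant to strictly increasing maps, two variables linked by a monotone transformation have a Spearman correlation of exactly one, which illustrates the invariance property discussed above.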

Geometry of Gini PCA: Gini-Covariance Operators
The first PCA was introduced by Pearson (1901), projecting X onto the eigenvectors of its covariance matrix, and observing that the variances of those projections are the corresponding eigenvalues. One of the key properties is that X′X is a positive semi-definite matrix. Most statistical properties of PCA (see Flury & Riedwyl (1988) or Anderson (1963)) are obtained under Gaussian assumptions. Furthermore, geometric properties can be obtained using the fact that the covariance defines an inner product on the subspace of random variables with finite second moment (up to a translation, i.e. we identify any two variables that differ by a constant). We discuss in this section the properties of the Gini covariance operator, the special case of Gaussian random variables, and the property of the Gini correlation matrix that will be used in the next section for the Gini PCA.

The Gini-covariance operator
In this section, X = (X₁, · · · , X_K) denotes a random vector. The covariance matrix between two random vectors X and Y is defined as the inner product between centered versions of the vectors. Hence, it is the matrix whose elements are regular covariances between components of the vectors, Cov(X, Y) = [Cov(X_i, Y_j)]. It is the upper-right block of the covariance matrix of (X, Y). Note that Cov(X, X) is the standard variance-covariance matrix of the vector X.
Definition 3.1. Let X = (X₁, · · · , X_K) be a collection of K identically distributed random variables. Let h : R → R denote a non-decreasing function. Let h(X) denote the random vector (h(X₁), · · · , h(X_K)), and assume that each component has a finite variance. Then, the operator ΓC_h(X) = Cov(X, h(X)) is called the h-Gini covariance matrix.
Since h is a non-decreasing mapping, then X and h(X) are componentwise comonotonic random vectors. Assuming that components of X are identically distributed is a reasonable assumption in the context of scaled (and centered) PCA, as discussed in footnote 3. Nevertheless, a stronger technical assumption will be necessary: pairwise-exchangeability.
Pairwise-exchangeability is a stronger concept than having only one vector with identically distributed components, and a weaker concept than (full) exchangeability. In the Gaussian case where h(X_k) = Φ(X_k), with Φ the normal cdf of X_k, for all k = 1, . . . , K, pairwise-exchangeability is equivalent to having identically distributed components.
Proposition 3.1. If X is a Gaussian vector with identically distributed components, then X is pairwise-exchangeable.
Proof. For simplicity, assume that the components of X are N(0, 1) random variables; then X ∼ N(0, ρ), where ρ is a correlation matrix. In that case, any pair (X_k, X_ℓ) is bivariate normal with a symmetric covariance structure, so that (X_k, X_ℓ) and (X_ℓ, X_k) have the same distribution, which is pairwise-exchangeability. Let us now introduce the Gini covariance. Gini (1912) introduced the Gini mean difference operator ∆, defined as ∆(X) = E|X₁ − X₂| with X₁ and X₂ independent copies of some random variable X (or more specifically of some distribution F with X ∼ F, because this operator is law invariant). One can rewrite ∆(X) = 4Cov(X, F(X)), where the term on the right is interpreted as the slope of the regression curve of the observed variable X on its 'ranks' (up to a scaling coefficient). Thus, the Gini covariance is obtained when the function h is equal to the cumulative distribution function, see Schechtman & Yitzhaki (1987).
Definition 3.3. Let X = (X₁, · · · , X_K) be a collection of K identically distributed random variables, with cumulative distribution function F. Then, the Gini covariance is ΓC_F(X) = Cov(X, F(X)).
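An empirical sketch of this operator, with F replaced by ranks over n (helper names are ours; ties are assumed away):

```python
import numpy as np

def ecdf_values(x):
    """Empirical cdf evaluated at the sample points: F(x_i) = rank_i / n."""
    n = len(x)
    return (np.argsort(np.argsort(x)) + 1) / n

def gini_cov(x, y):
    """Gini covariance Cov(x, F_y(y)) between x and the empirical cdf of y."""
    return np.cov(x, ecdf_values(y))[0, 1]
```

For x = (1, . . . , n), this rank-based estimate recovers Gini's mean difference exactly: 4·Cov(x, F(x)) equals the mean pairwise absolute difference (n + 1)/3.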
On this basis, it is possible to show that the Gini covariance matrix is a positive semi-definite matrix.
Definition 3.4. Let X = (X₁, · · · , X_K) be a collection of K identically distributed random variables, with survival distribution function F̄. Then, the generalized Gini covariance is GΓC_ν(X) = Cov(X, [F̄(X)]^{ν−1}), for ν > 1. This operator is related to the one introduced in Yitzhaki & Schechtman (2003), called the generalized Gini mean difference GMD_ν operator. More precisely, an estimator of the generalized Gini mean difference is based on the decumulative rank vector r_{x_·k} = (R(x_1k), . . . , R(x_Nk)) of x_·k, that is, the vector that assigns the smallest value (1) to the greatest observation x_ik, and so on. The index GMD_ν is a generalized version of the GMD₂ proposed earlier by Schechtman & Yitzhaki (1987). When k = ℓ, GMD_ν represents the variability of the variable x_·k itself. Focus is put on the lower tail of the distribution of x_·k whenever ν → ∞; the approach is then said to be max-min in the sense that GMD_ν inflates the weight of the minimum value of the distribution. On the contrary, whenever ν → 0, the approach is said to be max-max; in this case focus is put on the upper tail of the distribution of x_·k. As mentioned in Yitzhaki & Schechtman (2013), the case ν < 1 does not entail simple interpretations, so that the parameter ν is usually set to ν > 1 in empirical applications. Note that even if X_k and X_ℓ have the same distribution, we might have GMD_ν(X_k, X_ℓ) ≠ GMD_ν(X_ℓ, X_k), as shown in the example of Figure 1; equality holds when X_k and X_ℓ are exchangeable. Since GMD_ν is generally not symmetric, we have, for x_·k not a monotonic transformation of x_·ℓ and ν > 1, GMD_ν(x_·k, x_·ℓ) ≠ GMD_ν(x_·ℓ, x_·k).
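The generalized operator can be sketched through the survival-function form GMD_ν(x_k, x_ℓ) = −2ν Cov(x_k, s_ℓ^{ν−1}), with s_ℓ the empirical survival values built from decumulative ranks (a sketch under a no-ties assumption; helper names are ours):

```python
import numpy as np

def decumulative_ranks(x):
    """Rank 1 for the greatest observation, n for the smallest (no ties)."""
    return len(x) - np.argsort(np.argsort(x))

def gmd_nu(x_k, x_l, nu=2.0):
    """GMD_nu(x_k, x_l) = -2*nu*Cov(x_k, s_l**(nu-1)), s_l = decum. ranks / n."""
    s = decumulative_ranks(x_l) / len(x_l)
    return -2.0 * nu * np.cov(x_k, s ** (nu - 1.0))[0, 1]
```

For ν = 2 and x = (1, . . . , n) this reduces to Gini's mean difference (n + 1)/3, consistent with the sketch of Definition 3.3; larger ν puts more weight on the lower tail.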

Generalized Gini correlation
In this section, X is a matrix in M. The Gini correlation coefficient (G-correlation from now on) is a normalized GMD_ν index,

GC_ν(x_·k, x_·ℓ) = GMD_ν(x_·k, x_·ℓ) / GMD_ν(x_·k, x_·k),

for all ν > 1, see Yitzhaki & Schechtman (2003), with GC_ν(x_·k, x_·k) = 1 and GMD_ν(x_·k, x_·k) ≠ 0, for all k, ℓ = 1, . . . , K. Following Yitzhaki & Schechtman (2003), the G-correlation is well suited for the measurement of correlations between non-normal distributions or in the presence of outlying observations in the sample.
Property 3.1 (Schechtman & Yitzhaki (2013)). Whenever ν → 1, the variability of the variables is attenuated so that GMD_ν tends to zero (even if the variables exhibit a strong variance). It is therefore interesting to perform the generalized Gini PCA with various values of ν in order to robustify the results of the PCA, since the standard PCA (based on the variance) is potentially of bad quality when outlying observations drastically affect the sample.
A G-correlation matrix is proposed to analyze the data in a new vector space. Following Property 3.1 (iv), it is possible to rescale the variables x_·ℓ through a linear transformation; the matrix of standardized observations Z = [z_ik] is then given by

z_ik = (x_ik − x̄_·k) / GMD_ν(x_·k, x_·k).    (7)

The variable z_ik is a real number without dimension. The variables x_·k are rescaled such that their Gini variability is equal to unity. Now, we define the N × K matrix R^c_z of decumulative centered rank vectors of Z, which are the same as those of X: R^c_z = R^c_x. Note that this equality holds since the standardization (7) is a strictly increasing affine transformation. The K × K matrix containing all G-correlation indices between all couples of variables z_·k and z_·ℓ, for all k, ℓ = 1, . . . , K, is denoted GC_ν(Z); then we get the following.
Proposition 3.2. For each standardized matrix Z defined in (7), GMD_ν(z_·k, z_·k) = 1 for all k, so that the diagonal elements of GC_ν(Z) are equal to one. The extra-diagonal terms may be rewritten as the G-correlations between the couples of variables z_·k and z_·ℓ. Finally, using the same approach as before, we get GC_ν(Z) = GMD_ν(Z), which ends the proof.
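Assuming the standardization (7) divides each centered column by its own GMD_ν, the resulting G-correlation matrix has a unit diagonal by construction, which can be checked numerically (a no-ties sketch; names are ours):

```python
import numpy as np

def gmd_nu(x_k, x_l, nu=2.0):
    """GMD_nu via the survival form -2*nu*Cov(x_k, s_l**(nu-1)) (no ties)."""
    n = len(x_l)
    s = (n - np.argsort(np.argsort(x_l))) / n
    return -2.0 * nu * np.cov(x_k, s ** (nu - 1.0))[0, 1]

def gini_standardize(X, nu=2.0):
    """z_.k = (x_.k - mean) / GMD_nu(x_.k, x_.k), column by column."""
    Z = np.empty_like(X, dtype=float)
    for k in range(X.shape[1]):
        Z[:, k] = (X[:, k] - X[:, k].mean()) / gmd_nu(X[:, k], X[:, k], nu)
    return Z

def gini_correlation_matrix(Z, nu=2.0):
    """GC_nu(Z): entry (k, l) is GMD_nu(z_.k, z_.l) / GMD_nu(z_.k, z_.k)."""
    K = Z.shape[1]
    G = np.empty((K, K))
    for k in range(K):
        d = gmd_nu(Z[:, k], Z[:, k], nu)
        for l in range(K):
            G[k, l] = gmd_nu(Z[:, k], Z[:, l], nu) / d
    return G
```

Since GMD_ν is equivariant to positive affine transformations (ranks are unchanged, the covariance scales linearly), each standardized column has unit Gini variability, so the normalization in GC_ν becomes trivial on Z.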
Finally, under a normality assumption, the generalized Gini covariance matrix GC_ν(X) ≡ [GMD_ν(X_k, X_ℓ)] is shown to be a positive semi-definite matrix.
Theorem 3.2 shows that under the normality assumption, the variance is a special case of the Gini methodology. As a consequence, for multivariate normal distributions, it is shown in Section 4 that the Gini PCA and the classical PCA (based on the ℓ2 norm and the covariance matrix) are equivalent.

Generalized Gini PCA
In this section, the multidimensional Gini variability of the observations i = 1, . . . , N, embodied by the matrix GC_ν(Z), is maximized in the R^K-Euclidean space, i.e., in the set of variables {z_·1, . . . , z_·K}. This allows the observations to be projected onto the new vector space spanned by the eigenvectors of GC_ν(Z). Then, the projection of the variables is investigated in the R^N-Euclidean space induced by GC_ν(Z). Both observations and variables are analyzed through the prism of absolute and relative contributions in order to propose relevant interpretations of the data in each subspace.

The R^K-Euclidean space
It is possible to investigate the projection of the data Z onto the new vector space induced by GMD_ν(Z), or alternatively by GC_ν(Z), since GMD_ν(Z) = GC_ν(Z). Let f_·k be the kth principal component, i.e. the kth axis of the new subspace, and R^c_f ≡ [r^c_{f_·1}, . . . , r^c_{f_·K}] its corresponding decumulative centered rank matrix (where each decumulative rank vector is raised to an exponent of ν − 1). The K × K matrix B ≡ [b_·1, . . . , b_·K] is the projector of the observations, with the normalization condition b′_·k b_·k = 1, such that F = ZB. We denote by λ_·k (or 2µ_·k) the eigenvalues of the matrix [GC_ν(Z) + GC_ν(Z)′]. Let the basis B := {b_·1, . . . , b_·h}, with h ≤ K, be issued from the maximization of the overall Gini variability b′_·k GC_ν(Z) b_·k. Indeed, from the Lagrangian, because of the non-symmetry of GC_ν(Z), the eigenvalue equation is

[GC_ν(Z) + GC_ν(Z)′] b_·k = 2µ_·k b_·k = λ_·k b_·k.    (10)

The new subspace {f_·1, . . . , f_·h}, with h ≤ K, is issued from the maximization of the Gini variability between the observations on each axis f_·k. Although the result of the generalized Gini PCA seems close to the classical PCA, some differences exist.
Proposition 4.1. Let B = {b_·1, . . . , b_·h}, with h ≤ K, be the basis issued from the maximization of b′_·k GC_ν(Z) b_·k for all k = 1, . . . , K; then the following assertions hold. Proof. Maximizing the multidimensional variability b′_·k GC_ν(Z) b_·k yields, from the eigenvalue equation, b′_·k GC_ν(Z) b_·k = λ_·k/2 (since the quadratic form of a matrix and of its transpose coincide), for all k = 1, . . . , K. The results (ii) and (iii) are straightforward.
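The full pipeline — standardize, build GC_ν(Z), solve the symmetrized eigenproblem, project F = ZB — can be sketched as follows (our reading of the procedure, not the authors' code):

```python
import numpy as np

def gmd_nu(x_k, x_l, nu=2.0):
    """GMD_nu via the survival form -2*nu*Cov(x_k, s_l**(nu-1)) (no ties)."""
    n = len(x_l)
    s = (n - np.argsort(np.argsort(x_l))) / n
    return -2.0 * nu * np.cov(x_k, s ** (nu - 1.0))[0, 1]

def gini_pca(X, nu=2.0):
    """Project observations on the eigenvectors of GC_nu(Z) + GC_nu(Z)'."""
    # standardize: center each column and divide by its own Gini variability
    Z = np.empty_like(X, dtype=float)
    for k in range(X.shape[1]):
        Z[:, k] = (X[:, k] - X[:, k].mean()) / gmd_nu(X[:, k], X[:, k], nu)
    # G-correlation matrix of the standardized data
    K = Z.shape[1]
    G = np.empty((K, K))
    for k in range(K):
        d = gmd_nu(Z[:, k], Z[:, k], nu)
        for l in range(K):
            G[k, l] = gmd_nu(Z[:, k], Z[:, l], nu) / d
    # symmetrized eigenvalue equation, axes sorted by decreasing eigenvalue
    lam, B = np.linalg.eigh(G + G.T)
    order = np.argsort(lam)[::-1]
    lam, B = lam[order], B[:, order]
    return Z @ B, B, lam        # scores F = ZB, loadings B, eigenvalues
```

On data where two variables are strongly comonotonic, the first Gini axis loads almost equally on both, much like a correlation-matrix PCA would.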

Discussion
Condition (i) shows that the maximization of the multidimensional variability (in the Gini sense) b′_·k GC_ν(Z) b_·k does not necessarily coincide with the maximization of the variability of the observations projected onto the new axis f_·k, embodied by GMD_ν(f_·k, f_·k), since in general the rank of the observations on axis f_·k does not coincide with the projected ranks. In other words, maximizing the quadratic form b′_·k GC_ν(Z) b_·k does not systematically maximize the overall Gini variability GMD_ν(f_·k, f_·k). However, it maximizes a generalized Gini index, GGMD_ν(f_·k, f_·k). In the literature on inequality indices, this kind of index is known as a generalized Gini index because of the product between a variable f_·k and a function Ψ of its ranks, Ψ(r_{f_·k}) := b′_·k (R^c_z)′. Yaari (1987), and subsequently Yaari (1988), proposes generalized Gini indices with a rank distortion function Ψ that describes the behavior of the decision maker (being either max-min or max-max). It is noteworthy that this generalized Gini index of variability is very different from Banerjee (2010)'s multidimensional Gini index. The author proposes to extract the first eigenvector e_·1 of X′X and to project the data X such that s := Xe_·1, so that the multidimensional Gini index is G(s) = s′Ψ̃(r_s), with r_s the rank vector of s and Ψ̃ a function that distorts the ranks. Banerjee (2010)'s index is thus derived from the matrix X′X. To be precise, the maximization based on the variance-covariance matrix X′X (based on the ℓ2 metric) yields the projection of the data on the first component f_·1, which is then employed in the multidimensional Gini index (based on the ℓ1 metric). This approach is legitimated by the fact that G(s) has some desirable properties linked with the Gini index. However, this Gini index deals with information issued from the variance, because the vector s relies on the maximization of the variance of component f_·1.
Alternatively, it is possible to make use of the Gini variability, in a first stage, in order to project the data onto a new subspace, and in a second stage, to use the generalized Gini index of the projected data for the interpretations. In such a case, the Gini metric enables outliers to be attenuated. The use of G(s) as a result of the variance-covariance maximization may transform the data so that outlying observations capture an important part of the information (variance) on the first component. This case occurs in the classical PCA, as will be shown in the next sections with Monte Carlo simulations. Let us first investigate the use of the generalized Gini index GGMD_ν.

Properties of GGMD_ν
Since the Gini PCA relies on the generalized Gini index GGMD_ν, let us explore its properties.
Proposition 4.2. Let the eigenvalues of GC_ν(Z) + GC_ν(Z)′ be ordered such that λ_·1 ≥ · · · ≥ λ_·K. Proof. (i) The result comes from the rank-nullity theorem. From the eigenvalue Equation (10), whenever λ_·k = 0, two columns (or rows) of GC_ν(Z) are collinear; letting f be the linear application issued from the matrix GC_ν(Z), the dimension of the image set of f is then dim(Im f) = K − 1. Hence, f_·k = 0. Since b′_·k GC_ν(Z) b_·k = GGMD_ν(f_·k, f_·k) for all k = 1, . . . , K, then for λ_·k = 0 we get GGMD_ν(f_·k, f_·k) = 0. On the other hand, since f_·k = 0, it follows that GGMD_ν(f_·k, f_·ℓ) = 0 for all ℓ = 1, . . . , K. Also, if f_·k = 0 then the centered rank vector r^c_{f_·k} = 0, and so GGMD_ν(f_·ℓ, f_·k) = 0 for all ℓ = 1, . . . , K.
(ii) The proof comes from the Rayleigh-Ritz identity. (iii) Again, the Rayleigh-Ritz identity yields the result. The index GGMD_ν(f_·k, f_·k) represents the variability of the observations projected onto component f_·k. When this variability is null, then the eigenvalue is null (i). At the same time, there is no co-variability in the Gini sense between f_·k and another axis f_·ℓ, that is, GGMD_ν(f_·k, f_·ℓ) = 0.
In the Gaussian case, because the Gini correlation matrix is positive semi-definite, the eigenvalues are non-negative, and so GGMD_ν is null whenever it reaches its minimum.
Point (iv) shows that the eigenvalues of the standard PCA are proportional to those issued from the generalized Gini PCA. Because each eigenvalue (in proportion of the trace) represents the variability (or the quantity of information) inherent to each axis, both PCA techniques are equivalent when X is Gaussian: the share of information λ_·k / Σ_{ℓ=1}^K λ_·ℓ is the same for the two methods, for all k = 1, . . . , K and all ν > 1.

The R^N-Euclidean space
In classical PCA, the duality between R^N and R^K enables the eigenvectors and eigenvalues of R^N to be deduced from those of R^K, and conversely. This duality is not so obvious in the Gini PCA case. Indeed, in R^N the Gini variability between the observations would be measured by GC_ν(Z̃) := −(2ν/(N(N−1))) (R^c_{z̃})′ Z̃, with Z̃ := Z′, and the idea would subsequently be to derive the eigenvalue equation related to R^N. The other option is to define a basis of R^N from a basis already available in R^K. In particular, the set of principal components {f_·1, . . . , f_·K} provides by construction a set of normalized and orthogonal vectors. Let us rescale the vectors f_·k such that f̃_·k := f_·k / GMD_ν(f_·k, f_·k).
Then, {f̃_·1, . . . , f̃_·K} constitutes an orthonormal basis of R^K in the Gini sense, since GMD_ν(f̃_·k, f̃_·k) = 1. This basis may be used as a projector of the variables z_·k onto R^N. Let F̃ be the N × K matrix with the f̃_·k in columns. The projection of the variables z_·k in R^N is given by a Gini correlation matrix V, whereas it is given by (1/N) F̃′Z in the standard PCA, that is, the matrix of Pearson correlation coefficients between all f_·k and z_·ℓ. The same interpretation is available in the Gini case. The matrix V is normalized in such a way that its elements V ≡ [v_ℓk] are the G-correlation indices between f_·k and z_·ℓ. This makes the interpretation of the variables projected onto the new subspace easier.

Interpretations of the Gini PCA
The analysis of the projections of the observations and of the variables is necessary to provide accurate interpretations. Some criteria have to be designed in order to bring out, in the new subspace, the most significant observations and variables.

Observations
The absolute contribution of each observation i to the generalized Gini mean difference of f_·k, denoted ACT_ik, is interpreted as a percentage of the variability GGMD_ν(f_·k, f_·k), such that Σ_{i=1}^N ACT_ik = 1. This brings out the most important observations i related to component f_·k with respect to the information GGMD_ν(f_·k, f_·k). On the other hand, instead of employing the Euclidean distance between one observation i and the component f_·k, the Manhattan distance is used. The relative contribution of an observation i to component f_·k is then RCT_ik := |f_ik| / Σ_{ℓ=1}^K |f_iℓ|. Remark that the gravity center of {f_·1, . . . , f_·K} is g := (f̄_·1, . . . , f̄_·K) = 0. The Manhattan distance between observation i and g is then Σ_{k=1}^K |f_ik − 0|, and the relative contribution RCT_ik may be interpreted as the contribution of dimension k to the overall distance between observation i and g.
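A hedged sketch of these two criteria, under our reading that ACT_ik is the observation's share of the Gini variability of axis k (score times centered rank weight, column-normalized) and RCT_ik its Manhattan share (function names are ours):

```python
import numpy as np

def act(F, nu=2.0):
    """ACT_ik: share of observation i in the Gini variability of axis k."""
    n = F.shape[0]
    S = (n - np.argsort(np.argsort(F, axis=0), axis=0)) / n  # survival values per axis
    W = S ** (nu - 1.0)
    W = W - W.mean(axis=0)                 # centered rank weights
    C = -F * W                             # pointwise Gini contributions
    return C / C.sum(axis=0, keepdims=True)  # each column sums to one

def rct(F):
    """RCT_ik: share of axis k in the Manhattan distance of observation i to 0."""
    A = np.abs(F)
    return A / A.sum(axis=1, keepdims=True)
```

By construction the columns of ACT sum to one (shares of a variability) and the rows of RCT sum to one (shares of a distance), which matches the two interpretations given above.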

Variables
The most significant variables must be retained for the analysis and the interpretation of the data in the new subspace. It would be possible, in the same manner as in the observations case, to compute absolute and relative contributions from the Gini correlation matrix V ≡ [v_ℓk]. Instead, it is possible to test directly for the significance of the elements v_ℓk of V in order to capture the variables that significantly contribute to the Gini variability of the components f_·k. Let us denote U_ℓk := Cov(f_·ℓ, R^c_{z_·k}), with R^c_{z_·k} the (decumulative) centered rank vector of z_·k raised to an exponent of ν − 1, and U_ℓℓ := Cov(f_·ℓ, R^c_{f_·ℓ}). These two Gini covariances yield U-statistics: following Yitzhaki & Schechtman (2013), U_ℓk is an unbiased and consistent estimator of its population counterpart U⁰_ℓk. From Theorem 10.4 in Yitzhaki & Schechtman (2013), Chapter 10, √N(U_ℓk − U⁰_ℓk) is asymptotically normal. Then, it is possible to test for the null hypothesis U⁰_ℓk = 0. Let σ̂²_ℓk be the jackknife variance of U_ℓk; then it is possible to test the null, as N → ∞, via U_ℓk/σ̂_ℓk ∼ N(0, 1) asymptotically.
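The test can be sketched as follows: estimate U as the covariance between a component and the rank weights of a variable, jackknife its variance, and compare U/σ̂ with N(0, 1) (illustrative names; no-ties assumption):

```python
import numpy as np

def u_stat(f, z, nu=2.0):
    """U = Cov(f, s_z**(nu-1)), with s_z the empirical survival values of z."""
    n = len(z)
    s = (n - np.argsort(np.argsort(z))) / n
    return np.cov(f, s ** (nu - 1.0))[0, 1]

def u_zscore(f, z, nu=2.0):
    """Jackknife variance of the U-statistic and the asymptotic z-score."""
    n = len(f)
    loo = np.array([u_stat(np.delete(f, i), np.delete(z, i), nu)
                    for i in range(n)])            # leave-one-out replicates
    var = (n - 1) / n * ((loo - loo.mean()) ** 2).sum()
    return u_stat(f, z, nu) / np.sqrt(var)
```

For a component strongly associated with a variable, |U/σ̂| is large and the null U⁰ = 0 is rejected; under independence the z-score behaves approximately as a standard normal.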
The usual PCA enables the variables to be analyzed in the circle of correlations, which outlines the correlations between the variables z_·k and the components f_·ℓ. In order to make a comparison with the usual PCA, let us rescale the U-statistics U_ℓk. Let U be the K × K matrix such that U ≡ [U_ℓk], and u_·k the kth column of U. Then, the absolute contribution of the variable z_·k to the component f_·ℓ, ACT_kℓ, is obtained by normalizing U_ℓk with the ℓ2 norm of u_·k. The measure ACT_kℓ yields a graphical tool aiming at comparing the standard PCA with the Gini PCA. In the standard PCA, cos² θ (see Figure 2 below) provides the squared Pearson correlation coefficient between f_·1 and z_·k. In the Gini PCA, cos² θ is the normalized Gini correlation coefficient ACT_k1, thanks to the ℓ2 norm.

Figure 2: Circle of correlation
It is worth mentioning that the circle of correlations does not provide the significance of the variables. This significance relies on the statistical test based on the U-statistics exposed before. Because ACT depends on the ℓ2 metric, it is sensitive to outliers; as such, the choice of the variables must rely on the test of U⁰_ℓk only.

Monte Carlo simulations
In this Section, it is shown with the aid of Monte Carlo simulations that the usual PCA yields irrelevant results when outlying observations contaminate the data. To be precise, the absolute contributions computed in the standard PCA based on the variance may lead to selecting outlying observations on the first component, on which the most important variability lies (a direct implication of the maximization of the variance). In consequence, the interpretation of the PCA may inflate the role of the first principal components. The Gini PCA dilutes the importance of the outliers to make the interpretations more robust and therefore more relevant. The mean squared errors (MSE) of the eigenvalues are computed as the average, over the replications, of (λ^o_·k − λ_·k)², where λ^o_·k is the eigenvalue computed with outlying observations in the sample and λ_·k its counterpart without contamination. The MSEs of ACT and RCT are computed in the same manner.
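The contamination experiment can be sketched generically; the contamination scheme below is illustrative, not Algorithm 1 from the paper:

```python
import numpy as np

def eigenvalue_mse(X, decompose, contaminate, n_rep=50, seed=0):
    """MSE of eigenvalues: mean of (lambda_contaminated - lambda_clean)**2."""
    rng = np.random.default_rng(seed)
    lam_clean = decompose(X)
    errs = []
    for _ in range(n_rep):
        Xc = contaminate(X.copy(), rng)          # perturb a fresh copy
        errs.append((decompose(Xc) - lam_clean) ** 2)
    return np.mean(errs, axis=0)

def cov_eigenvalues(X):
    """Eigenvalues of the sample covariance matrix, in descending order."""
    return np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]

def spike_one_row(X, rng):
    """Illustrative contamination: multiply one random observation by 10."""
    X[rng.integers(len(X))] *= 10.0
    return X
```

The same harness can be reused with a Gini-based `decompose` in place of `cov_eigenvalues` to reproduce the comparison between the two methods.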
We first investigate the case where the variables are highly correlated in order to gauge the robustness of each technique (Gini for ν = 2, 4, 6, and the variance). The correlation matrix between the variables is given by: As can be seen in the matrix above, we can expect all the information to be gathered on the first axis, because each pair of variables records an important linear correlation. The repartition of the information on each component, that is, each eigenvalue in percentage of the sum of the eigenvalues, is the following.

Eigenvalues
The first axis captures around 82% of the variability of the overall sample (before contamination). Although each PCA method yields the same repartition of the information over the different components before the contamination of the data, it is possible to show that the classical PCA is not robust. For this purpose, let us analyze Figures 3a-3d below, which depict the MSE of each observation with respect to the contamination process described in Algorithm 1 above.
On the first axis of Figure 3a, the absolute contribution of each observation (among 500 observations) is not stable because of the contamination of the data; however, the Gini PCA performs better. The MSE of the ACTs measured during the contamination process provides lower values for the Gini index compared with the variance. On the other hand, if we compute the standard deviation of all these MSEs over the two first axes, again the Gini methodology provides lower variations (see Table 2).

         Gini ν = 2   Gini ν = 4   Gini ν = 6   Variance
Axis 1   6.08         6.62         7.41         12.09
Axis 2   4.07         5.12         13.37        2.98

Table 2: Standard deviation of the MSE of the ACTs on the two first axes

Let us now take an example with less correlation between the variables in order to get a more equal repartition of the information on the first two axes.
The repartition of the information over the new axes (percentage of each eigenvalue) is given in Table 3. When the information is less concentrated on the first axis (55% on axis 1 and around 35% on axis 2), the MSEs of the eigenvalues after contamination are much more important for the standard PCA compared with the Gini approach (2 to 3 times more important). Although the fourth axis reports an important MSE for the Gini method (ν = 6), the corresponding eigenvalue percentage is not significant (1.56%). The MSEs of the ACTs are depicted in Figures 4a-4b. We obtain the same kind of results, with less variability on the second axis. In Figures 4a-4b, it is apparent that the classical PCA based on the ℓ2 norm exhibits much more ACT variability (black points). This means that the contamination of the data can lead to interpreting some observations as significant (important contribution to the variance of the axis) while they are not (and vice versa). On the other hand, the MSEs of the RCTs after contamination of the data, Figures 4c-4d, are less spread out for the Gini technique for ν = 4 and ν = 6; however, for ν = 2 there is more variability of the MSE compared with the variance. This means that the distance from one observation to an axis may not be reliable (although the interpretation of the data rather depends on the ACTs).

Application on cars data
We propose a simple application with the celebrated cars data (see the Appendix). 10 The dataset is particularly interesting since it contains highly correlated variables, as can be seen in the Pearson correlation matrix given in Table 5. The six variables are capacity (x1), power (x2), speed (x3), weight (x4), width (x5), and length (x6). Also, the dataset contains some outlying observations (Figure 5): Ferrari Enzo (x1, x2, x5), Bentley Continental (x2), Aston Martin (x2), Land Rover Discovery (x5), Mercedes Class S (x5), Smart (x5, x6).
The overall information (variability) is partitioned over six components. Since two axes are sufficient to project the data, the Gini PCA and the standard PCA yield the same share of information on each axis. However, we can expect some differences for the absolute contributions ACT and the relative contributions RCT. The projection of the data is depicted in Figure 6 for each method. As Figure 6 shows, the projection is very similar for each technique. The cars with extraordinary (or very low) abilities are in the same relative position in the four projections: Land Rover Discovery Td5 at the top, Ferrari Enzo at the bottom right, Smart Fortwo coupé at the bottom left. However, when we increase the parameter ν to look at what happens in the tails of the distributions (of the two axes), more cars become distinguishable: Land Rover Defender, Audi TT, BMW Z4, Renault Clio 3.0 V6, Bentley Continental GT. Consequently, contrary to the case ν = 2, larger values of ν single out observations located in the tails.

The correlations between the variables and the new axes are reported in Tables 7 to 10. Some slight differences appear between the Gini PCA and the classical one based on the variance. The theoretical Section 4 indicates that the Gini methodology for ν = 2 is equivalent to the variance when the variables are Gaussian. On the cars data, we observe this similarity. In each PCA, all variables are correlated with Axis 1, and weight with Axis 2. However, when ν increases, the Gini methodology dilutes the influence of outlying observations, so that some variables may appear to be significant whereas they are not in the variance case.
Tables 8 and 9 (ν = 4, 6) show that Axis 2 is correlated with speed (not weight, as in the variance PCA). In this respect, the absolute contributions must single out the cars associated with speed on Axis 2. Indeed, the Land Rover Discovery, a heavy car, no longer stands out on Axis 2 in the Gini PCA for ν = 2, 4, 6 (Figures 8, 9, 10). Note that the red line in these figures represents the mean share of the information on each axis, i.e. 100%/24 cars = 4.16% of information per car.

Conclusion
In this paper, it has been shown that the geometry of the Gini covariance operator allows one to perform a Gini PCA, that is, a robust principal component analysis based on the ℓ1 norm.
To be precise, the variance may be replaced by the Gini Mean Difference, which captures variability through the ranks of the observations in order to attenuate the influence of outliers. The Gini Mean Difference may rather be interpreted with the aid of the generalized Gini index GGMD_ν in the new subspace for a better understanding of the variability of the components; that is, GGMD_ν is both a rank-dependent measure of variability in the sense of Yaari (1987) and an eigenvalue of the Gini correlation matrix.
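As a concrete illustration of this rank-based view, the sample Gini Mean Difference can be computed either from all pairwise absolute differences or, equivalently, from the order statistics (a minimal sketch; the function names are ours):

```python
import numpy as np

def gmd_pairwise(x):
    """Gini Mean Difference as the mean absolute difference over all
    ordered pairs (i != j) -- the O(n^2) definition."""
    n = len(x)
    return np.abs(x[:, None] - x[None, :]).sum() / (n * (n - 1))

def gmd_ranks(x):
    """Equivalent rank-based form, using the identity
    sum_{i<j} |x_i - x_j| = sum_i (2i - n - 1) x_(i)
    over the ascending order statistics x_(i)."""
    n = len(x)
    xs = np.sort(x)
    i = np.arange(1, n + 1)
    return 2.0 / (n * (n - 1)) * np.sum((2 * i - n - 1) * xs)
```

The rank form makes the robustness mechanism visible: each observation enters linearly, weighted by its rank position, rather than quadratically as in the variance.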
Contrary to many approaches in multidimensional statistics, in which the standard variance-covariance matrix is used to project the data onto a new subspace before deriving multidimensional Gini indices (see e.g. Banerjee (2010)), we propose to employ the Gini correlation indices (see Yitzhaki & Schechtman (2013)). This provides the ability to interpret the results with the ℓ1 norm and to use U-statistics to measure the significance of the correlation between the new axes and the variables.
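The overall procedure can be summarized in a short sketch. We stress that this is an illustrative implementation under our own assumptions, not the paper's exact estimator: we estimate the generalized Gini covariance entries as -ν cov(x_k, (1 - F_l(x_l))^{ν-1}) with F_l replaced by empirical ranks, and we symmetrize the (generally non-symmetric) matrix before the eigen-decomposition:

```python
import numpy as np

def gini_cov_matrix(X, nu=2):
    """Empirical generalized Gini covariance matrix (sketch).
    Entry (k, l) estimates -nu * cov(x_k, (1 - F_l(x_l))**(nu - 1)),
    with the CDF F_l replaced by ascending ranks r/n."""
    n, p = X.shape
    ranks = np.argsort(np.argsort(X, axis=0), axis=0) + 1   # ranks 1..n per column
    R = (1.0 - ranks / n) ** (nu - 1)                       # decumulative rank transform
    G = np.empty((p, p))
    for k in range(p):
        for l in range(p):
            G[k, l] = -nu * np.cov(X[:, k], R[:, l])[0, 1]
    return G

def gini_pca(X, nu=2):
    """Sketch of a Gini PCA: center the data, symmetrize the Gini
    covariance matrix, eigen-decompose, and project."""
    Z = X - X.mean(axis=0)
    G = gini_cov_matrix(Z, nu)
    S = 0.5 * (G + G.T)                  # G is not symmetric in general
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]       # eigenvalues in descending order
    return vals[order], vecs[:, order], Z @ vecs[:, order]
```

For ν = 2 and Gaussian data, the ordering of the axes obtained this way should mirror the variance PCA, consistent with the equivalence result stated above.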