Sensitivity of principal component subspaces: A comment on Prendergast’s paper

Abstract: In a recent paper on the sensitivity of subspaces spanned by principal components, Prendergast [5] introduces an influence measure based on the second-order expansion of the RV and GCD coefficients, which are commonly used as measures of similarity between two matrices. The goal of this short note is to point out that the paper of Castaño-Tostado and Tanaka [2] is based on a similar approach. However, this work seems to be unknown to Prendergast, since it is missing from his references. A comparison of the two papers is provided, together with a brief review of some related work.

Throughout this note, we consider a c.d.f. $F$ defined on $\mathbb{R}^p$. We assume that the mean $\mu = \int z\,dF(z)$ and the $p \times p$ covariance matrix $\Sigma = \int (z - \mu)(z - \mu)^t\,dF(z)$ exist and that $\Sigma$ has distinct eigenvalues $\lambda_1 > \cdots > \lambda_p$ associated with normalized eigenvectors $v_k$, $k = 1, \ldots, p$. Letting $S$ denote an arbitrary subset of $\{1, \ldots, p\}$ with $K$ elements ($K < p$) and $V$ the $p \times K$ matrix whose columns are the vectors $v_k$ for $k \in S$, we focus on how the column space of $V$ changes as $F$ is shifted to $F_\epsilon = (1 - \epsilon)F + \epsilon\delta_x$, where $\delta_x$ is the Dirac distribution giving mass one at some $x \in \mathbb{R}^p$.
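To make the setting concrete, the following NumPy sketch computes the moments of the contaminated distribution $F_\epsilon$ and the corresponding basis $V_\epsilon$. It relies on the fact, which follows directly from the mixture definition, that $F_\epsilon$ has mean $(1-\epsilon)\mu + \epsilon x$ and covariance $(1-\epsilon)\Sigma + \epsilon(1-\epsilon)(x-\mu)(x-\mu)^t$. The function names `perturbed_moments` and `subspace_basis` are hypothetical, and indices in `S` are assumed 0-based.

```python
import numpy as np

def perturbed_moments(mu, sigma, x, eps):
    """Mean and covariance of F_eps = (1 - eps) F + eps * delta_x,
    given the mean mu and covariance sigma of F."""
    d = x - mu
    mu_eps = (1 - eps) * mu + eps * x
    sigma_eps = (1 - eps) * sigma + eps * (1 - eps) * np.outer(d, d)
    return mu_eps, sigma_eps

def subspace_basis(sigma, S):
    """p x K matrix V whose columns are unit eigenvectors v_k, k in S.

    S holds 0-based indices into the eigenvalues sorted in decreasing order.
    """
    lam, vec = np.linalg.eigh(sigma)        # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]           # reorder so lambda_1 > ... > lambda_p
    return vec[:, order][:, list(S)]

# Example: contaminate a diagonal covariance with a point x and recompute V.
mu = np.zeros(3)
sigma = np.diag([3.0, 2.0, 1.0])
x = np.array([1.0, 0.5, -0.8])
_, sigma_eps = perturbed_moments(mu, sigma, x, eps=0.05)
V = subspace_basis(sigma, S=[0, 1])          # K = 2 leading eigenvectors of Sigma
V_eps = subspace_basis(sigma_eps, S=[0, 1])  # and of Sigma_eps
```

Only the columns' span matters in what follows, so the arbitrary signs returned by the eigensolver are harmless.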

Tanaka's approach
Tanaka [7] considers the projection operator $P = VV^t$ onto the column space of $V$. When $F$ is modified to $F_\epsilon$, $P$ is modified to $P_\epsilon = V_\epsilon V_\epsilon^t$, which can be expressed as a convergent power series

$$P_\epsilon = P + \epsilon P^{(1)} + \epsilon^2 P^{(2)} + \cdots$$

for sufficiently small $\epsilon$. Letting $S'$ denote the complement of $S$ and $y_k = v_k^t(x - \mu)$, Tanaka derives the influence function $P^{(1)}$ of $P$ as

$$P^{(1)} = \sum_{k \in S}\sum_{s \in S'}\frac{y_k\,y_s}{\lambda_k - \lambda_s}\left(v_k v_s^t + v_s v_k^t\right),$$

so that $P^{(1)}$ can be used as a sensitivity measure. In practice, however, $F$ is generally unknown and must be estimated by the empirical c.d.f. $\hat{F}$ based on a sample. In his numerical study, following Critchley [3], Tanaka constructs three sample versions of this influence function; the reader is referred to these two papers for further details.
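Tanaka's closed form for $P^{(1)}$ can be checked numerically against a finite-difference approximation $(P_\epsilon - P)/\epsilon$ for a small $\epsilon$. The sketch below does this with NumPy; `sorted_eig`, `projector` and `influence_projector` are hypothetical helper names, indices are 0-based, and the population quantities $\mu$ and $\Sigma$ are assumed known.

```python
import numpy as np

def sorted_eig(sigma):
    """Eigenvalues in decreasing order (lambda_1 > ... > lambda_p) and matching unit eigenvectors."""
    lam, vec = np.linalg.eigh(sigma)          # eigh returns ascending order
    order = np.argsort(lam)[::-1]
    return lam[order], vec[:, order]

def projector(sigma, S):
    """P = V V^t, where V holds the eigenvectors indexed by S (0-based)."""
    _, vec = sorted_eig(sigma)
    V = vec[:, list(S)]
    return V @ V.T

def influence_projector(mu, sigma, x, S):
    """P^(1) = sum_{k in S} sum_{s in S'} y_k y_s / (lam_k - lam_s) (v_k v_s^t + v_s v_k^t)."""
    lam, vec = sorted_eig(sigma)
    y = vec.T @ (x - mu)                      # y_k = v_k^t (x - mu)
    p = len(lam)
    in_S = set(S)
    S_comp = [s for s in range(p) if s not in in_S]
    P1 = np.zeros((p, p))
    for k in S:
        for s in S_comp:
            vk_vs = np.outer(vec[:, k], vec[:, s])
            P1 += y[k] * y[s] / (lam[k] - lam[s]) * (vk_vs + vk_vs.T)
    return P1

# Finite-difference check: (P_eps - P) / eps should approach P^(1) as eps -> 0.
mu = np.zeros(3)
sigma = np.diag([3.0, 2.0, 1.0])
x = np.array([1.0, 0.5, -0.8])
S = [0]
eps = 1e-6
d = x - mu
sigma_eps = (1 - eps) * sigma + eps * (1 - eps) * np.outer(d, d)
fd = (projector(sigma_eps, S) - projector(sigma, S)) / eps
print(np.max(np.abs(fd - influence_projector(mu, sigma, x, S))))
```

Because each term pairs $y_k$ with $v_k$, the result does not depend on the signs chosen by the eigensolver.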

Bénasséni's influence measure
The idea of Bénasséni [1] is to consider matrix measures such as the RV measure between $V$ and $V_\epsilon$. However, he notes that $RV(V, V_\epsilon) = 1 + O(\epsilon^2)$, so that the first-order coefficient of $\epsilon$ vanishes. He therefore introduces the following measure of closeness between $V$ and $V_\epsilon$:

and suggests using the coefficient of $\epsilon$ in the expansion of $\rho_1$ as a sensitivity indicator, which can be expressed as:

Castaño-Tostado and Tanaka's comment
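Castaño-Tostado and Tanaka [2] consider the expansion of the RV measure up to second order,

$$RV(V, V_\epsilon) = 1 - \frac{\epsilon^2}{K}\sum_{k \in S}\sum_{s \in S'}\frac{y_k^2\,y_s^2}{(\lambda_k - \lambda_s)^2} + O(\epsilon^3),$$

and suggest using $[1 - RV(V, V_\epsilon)]^{1/2}$ as a sensitivity measure.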
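The second-order behaviour can be illustrated numerically. The sketch below, assuming NumPy and using hypothetical helper names, computes $RV(V, V_\epsilon)$ exactly from the eigenvectors of the contaminated covariance matrix and compares $[1 - RV(V, V_\epsilon)]^{1/2}$ with its leading term $\epsilon\,c^{1/2}$, where $c = \frac{1}{K}\sum_{k \in S}\sum_{s \in S'} y_k^2 y_s^2/(\lambda_k - \lambda_s)^2$ is the coefficient of $\epsilon^2$ in $1 - RV(V, V_\epsilon)$ (it follows from Tanaka's $P^{(1)}$ and the idempotence of $P_\epsilon$). Indices in `S` are 0-based.

```python
import numpy as np

def sorted_eig(sigma):
    """Eigenvalues in decreasing order and matching unit eigenvectors."""
    lam, vec = np.linalg.eigh(sigma)
    order = np.argsort(lam)[::-1]
    return lam[order], vec[:, order]

def rv(X, Y):
    """Escoufier's RV coefficient Tr(XX^t YY^t) / sqrt(Tr((XX^t)^2) Tr((YY^t)^2))."""
    A, B = X @ X.T, Y @ Y.T
    return np.trace(A @ B) / np.sqrt(np.trace(A @ A) * np.trace(B @ B))

def second_order_coef(mu, sigma, x, S):
    """Coefficient of eps^2 in 1 - RV(V, V_eps):
    (1/K) sum_{k in S} sum_{s in S'} y_k^2 y_s^2 / (lam_k - lam_s)^2."""
    lam, vec = sorted_eig(sigma)
    y = vec.T @ (x - mu)
    in_S = set(S)
    S_comp = [s for s in range(len(lam)) if s not in in_S]
    total = sum(y[k] ** 2 * y[s] ** 2 / (lam[k] - lam[s]) ** 2
                for k in S for s in S_comp)
    return total / len(S)

def ctt_measure(mu, sigma, x, S, eps):
    """Castaño-Tostado and Tanaka measure [1 - RV(V, V_eps)]^(1/2), computed exactly."""
    d = x - mu
    sigma_eps = (1 - eps) * sigma + eps * (1 - eps) * np.outer(d, d)
    V = sorted_eig(sigma)[1][:, list(S)]
    V_eps = sorted_eig(sigma_eps)[1][:, list(S)]
    # Clip tiny negative values caused by floating-point rounding.
    return np.sqrt(max(1.0 - rv(V, V_eps), 0.0))

mu = np.zeros(3)
sigma = np.diag([4.0, 2.0, 1.0])
x = np.ones(3)
c = second_order_coef(mu, sigma, x, [0])
for eps in (1e-1, 1e-2, 1e-3):
    # This ratio should approach 1 as eps -> 0.
    print(eps, ctt_measure(mu, sigma, x, [0], eps) / (eps * np.sqrt(c)))
```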

Prendergast's comment
Prendergast [5] introduces the influence measure defined as:

However, since $RV(V, V_\epsilon) = \frac{1}{K}\mathrm{Tr}(PP_\epsilon)$, this influence measure is simply the coefficient of $\epsilon^2$ in the expansion of $1 - RV(V, V_\epsilon)$. Consequently, the Castaño-Tostado and Tanaka measure $[1 - RV(V, V_\epsilon)]^{1/2}$ is, up to higher-order terms, $\epsilon$ times the square root of Prendergast's measure. Developing (5), the latter can be expressed as

$$\frac{1}{K}\sum_{k \in S}\sum_{s \in S'}\frac{y_k^2\,y_s^2}{(\lambda_k - \lambda_s)^2}. \tag{6}$$

Prendergast also considers the correlation case, discusses other applications, and shows the usefulness of an approximate sample version of his measure for high-dimensional data sets. More generally, we note as a concluding remark that all the measures discussed in this note carry similar sensitivity information, as illustrated by the comparison of (3) and (6).
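As a rough illustration of a sample version, the sketch below replaces $F$ by the empirical c.d.f. $\hat{F}$ (sample mean and covariance with divisor $n$) and evaluates, at every observation $z_i$, the coefficient of $\epsilon^2$ in $1 - RV(V, V_\epsilon)$, namely $\frac{1}{K}\sum_{k \in S}\sum_{s \in S'} y_k^2 y_s^2/(\lambda_k - \lambda_s)^2$ with $y_k = v_k^t(z_i - \mu)$. This is a simple plug-in estimate, not necessarily the approximate sample version proposed by Prendergast [5]; NumPy, the name `second_order_influence` and the simulated data are our assumptions.

```python
import numpy as np

def second_order_influence(Z, S):
    """Plug-in coefficient of eps^2 in 1 - RV(V, V_eps) for every row z_i of Z.

    F is replaced by the empirical c.d.f. of the rows of Z (mean and covariance
    with divisor n); S holds 0-based indices of eigenvalues in decreasing order.
    """
    p = Z.shape[1]
    mu = Z.mean(axis=0)
    sigma = np.cov(Z, rowvar=False, bias=True)
    lam, vec = np.linalg.eigh(sigma)
    order = np.argsort(lam)[::-1]
    lam, vec = lam[order], vec[:, order]
    Y = (Z - mu) @ vec                                   # Y[i, k] = v_k^t (z_i - mu)
    S = np.asarray(S)
    S_comp = np.setdiff1d(np.arange(p), S)
    gaps = lam[S][:, None] - lam[S_comp][None, :]        # lam_k - lam_s, shape (K, p - K)
    terms = (Y[:, S][:, :, None] ** 2) * (Y[:, S_comp][:, None, :] ** 2) / gaps[None] ** 2
    return terms.sum(axis=(1, 2)) / len(S)

# Usage: rank observations by their plug-in influence on span(v_1, v_2).
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 4)) * np.array([3.0, 2.0, 1.0, 0.5])
Z = np.vstack([Z, [10.0, 0.0, 0.0, 10.0]])   # one artificial outlier, appended last
scores = second_order_influence(Z, S=[0, 1])
print(np.argsort(scores)[::-1][:5])          # indices of the 5 largest scores
```

The vectorized form evaluates all observations with a single eigendecomposition of the sample covariance matrix.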