Detecting column dependence when rows are correlated and estimating the strength of the row correlation

Abstract: Microarray experiments often yield a normal data matrix X whose rows correspond to genes and columns to samples. We commonly calculate test statistics Z = Xw, where Z i is a test statistic for the i th gene, and apply false discovery rate (FDR) controlling methods to find interesting genes. For example, Z could measure the difference in expression levels between treatment and control groups, and we could seek differentially expressed genes. The empirical cdf of Z is important for FDR methods, since its mean and variance determine the bias and variance of FDR estimates. Efron (2009b) has shown that if the columns of X are independent, the variance of the empirical cdf of Z depends only on the mean-squared row correlation. Microarray data, however, frequently show signs of column dependence. In this paper, we show that Efron's result still holds under column dependence, and give a conservative (upwardly biased) estimator for the mean-squared row correlation. We show that Fisher's transformation for sample correlations is still normalizing and variance stabilizing under column dependence, and use it to construct a permutation-invariant test of column independence. Finally, we argue that estimating the mean-squared row correlation under column dependence is impossible in general. Code to perform our test is available in the R package "colcor," available on CRAN.


Introduction
Microarray experiments often yield an N × n normal data matrix X whose rows represent genes and columns represent samples. Usually, N is much larger than n. One way to find "interesting" genes is to form a vector Z = Xw, where Z i is a test statistic for gene i, and apply false discovery rate methods to find significant Z i s. For example, if the rows of X have unit variance and w = (1, . . . , 1)/√n, Z consists of the rescaled mean expression levels (that is, the one-sample t-statistics, but with known variance in the denominator). We could search for significantly large Z i to find overexpressed genes.
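As a small illustration of this setup, the following Python sketch (sizes are illustrative, not from the paper) forms the scaled row means Z = Xw under the independent-columns null:

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 1000, 50                     # genes x samples (illustrative sizes)
X = rng.standard_normal((N, n))     # null model: independent N(0, 1) entries

w = np.ones(n) / np.sqrt(n)         # unit-norm weights: scaled row means
Z = X @ w                           # Z[i] is the test statistic for gene i

# With independent columns, each Z[i] is N(0, 1),
# so the empirical sd of Z should be close to 1.
print(Z.shape, float(Z.std()))
```

Significantly large Z i can then be assessed against the N(0, 1) null.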
The empirical distribution F̂ of the entries of Z is an important quantity for false discovery rate methods. Its mean and covariance determine the bias and variance of false discovery rate estimates. Efron (2009b) has approximated the mean and covariance of F̂ when the columns of X are independent. But many have noted that microarray data sets show signs of column dependence, possibly caused by preparation and lab effects (Chen et al., 2004; Piper et al., 2002). Efron (2009a) developed a permutation test that finds dependence in standard datasets (Efron, 2009a; Cosgrove et al., 2010). In this paper, we consider how to use Efron's approximations under column dependence and propose a permutation-invariant test for column dependence.
Suppose the columns of X are iid N(0, Σ), with Σ ii = 1, and let ρ ii′ = Σ ii′ /√(Σ ii Σ i′i′ ) be the correlation between X ij and X i′j . Assuming the columns are independent, Efron (2009b) approximates the mean and covariance of F̂ using only the mean and mean-squared correlations, not the full correlation structure Σ. Efron shows that these quantities can be estimated well, in contrast to Σ, which is very hard to estimate. If X is not centered and scaled, then we also need the mean of X and the variances Σ ii . Efron's approximations let us calculate the bias and variance of FDR estimates. Suppose we want to find rows with positive mean, and we reject the null of zero mean for all Z i ≥ t. The standard estimator of the false discovery rate for this rejection rule is F̂DR(t) = π̂ 0 (1 − Φ(t))/(1 − F̂(t)), where π̂ 0 is the estimated fraction of Z i s with zero mean. Since π̂ 0 is usually close to 1 and Φ(t) is known, the bias and variance of F̂ determine the bias and variance of F̂DR. Efron's approximations give us a better picture of the behavior of F̂DR by approximating the mean and variance of F̂ under row correlation.
Most microarray analyses assume independent columns, but column dependence can cause serious problems. For example, it can cause some of the over- or underdispersion commonly seen in microarray data. Suppose X is a standardized matrix of expression levels and we want to find genes (rows) that are significantly over- or underexpressed. A standard approach is to calculate a one-sample t-statistic Z i for each row, then use an FDR procedure to find Z i s that are significantly far from 0. If the rows of X have unit variance, we can instead use the scaled row means by taking w = (1, . . . , 1)/√n and Z = Xw. Each Z i is N(0, 1) under the null, provided the columns of X are independent. If the columns are correlated, the Z i can be substantially over- or underdispersed, as Figure 1 illustrates. Assuming column independence and using the N(0, 1) null can give very misleading results when columns are correlated.

Figure 1: Histogram of Z i under column correlation. We took N = 10,000, n = 100, Σ = I and generated X with zero mean and common column correlation 0.05, then calculated the scaled row means Z i = (1/√n) Σ j X ij . The Z i are highly overdispersed; the solid curve is the N(0, 1) null. Lemma 2 shows that the Z i are N(0, 2.44²), which agrees with the histogram (dashed curve).
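Figure 1's overdispersion is easy to reproduce. The sketch below uses the same setup (N = 10,000, n = 100, common column correlation 0.05); generating the common correlation through a single shared factor per row is our own construction, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
N, n, rho = 10_000, 100, 0.05

# Equicorrelated columns: mix an iid part with a shared per-row factor,
# giving var(X_ij) = 1 and cov(X_ij, X_ij') = rho.
G = rng.standard_normal((N, n))
shared = rng.standard_normal((N, 1))
X = np.sqrt(1 - rho) * G + np.sqrt(rho) * shared

Z = X.sum(axis=1) / np.sqrt(n)      # scaled row means

# var(Z_i) = w'∆w = 1 + (n - 1) * rho = 5.95, so sd(Z_i) is about 2.44
print(float(Z.std()))
```

The empirical standard deviation matches the 2.44 value quoted for Figure 1.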
In Section 2 we show that Efron's approximations still work under column dependence, so they can be used to assess the variability of FDR estimates even when columns are correlated. Estimating the mean correlation α 1 and mean-squared correlation α 2 , however, becomes more difficult. We find the uniformly minimum variance unbiased (UMVU) estimators of α 1 and α 2 under column independence. The estimator of α 1 remains unbiased under column correlation, while the α 2 estimator is upwardly biased. The bias is small unless the columns are quite strongly correlated. Our estimators are conservative: using them in Efron's approximations when columns are correlated gives an upwardly biased estimate of the variance of F̂. Efron (2009a) shows that the strong row correlations typically seen in microarray data (Qiu et al., 2005; Owen, 2005) make detecting column dependence tricky. He gave a permutation-based test for column dependence, but desired a test that does not depend on the ordering of the columns.
In Section 3, we show that Fisher's transformation can explain why column dependence is hard to detect and why it makes estimating α 2 difficult. Let z(·) = tanh⁻¹(·) be Fisher's transformation, and ρ̂ ii′ the sample row correlations. We show that when the columns are dependent, z(ρ̂ ii′ ) is still approximately normal with mean z(ρ ii′ ) and constant variance, but the variance depends on the strength of the column dependence. This explains why it is difficult to estimate α 2 under column dependence: α 2 is approximately the variance of z(ρ ii′ ), so estimating it is like separating the variance of z(ρ̂ ii′ ) into the variance of z(ρ ii′ ) and the unknown noise variance. Transposing the argument shows why row dependence makes column dependence hard to detect.
Finally, in Section 4, we use Fisher's transformation to give a permutation-invariant test for column correlation. Our test formalizes an FDR-based heuristic that Efron uses. The test is particularly useful since its p-values are independent of those produced by Efron's permutation test. Our test has good power in simulations, especially when most columns are weakly correlated but some are highly correlated. It has little power when the column correlations are all of similar size.

Setup
We use the matrix normal distribution to model correlated columns. Our model is X ∼ N (0, Σ ⊗ ∆), meaning that cov (X ij , X i ′ j ′ ) = Σ ii ′ ∆ jj ′ . This notation for the matrix normal is nonstandard, but is used by Efron (2009a), whose results we will need. In this model, Σ controls the row covariance and ∆ controls the column covariance.
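A quick way to sample from this model, and to check the covariance identity, is to write X = A G B′ with Σ = AA′, ∆ = BB′ and G an iid N(0, 1) matrix. The toy covariance matrices below are our own choices, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
N, n = 4, 3
Sigma = 0.3 * np.ones((N, N)) + 0.7 * np.eye(N)   # row covariance (unit diagonal)
Delta = 0.2 * np.ones((n, n)) + 0.8 * np.eye(n)   # column covariance (unit diagonal)

A = np.linalg.cholesky(Sigma)
B = np.linalg.cholesky(Delta)

# X = A G B' satisfies cov(X_ij, X_i'j') = Sigma_ii' * Delta_jj'
reps = 200_000
G = rng.standard_normal((reps, N, n))
X = A @ G @ B.T                                   # matmul broadcasts over reps

emp = float((X[:, 0, 0] * X[:, 1, 1]).mean())     # estimates Sigma_01 * Delta_01
print(emp, Sigma[0, 1] * Delta[0, 1])
```

The Monte Carlo estimate agrees with Σ 01 ∆ 01 up to simulation error.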
We usually assume X is double standardized, so its rows and columns have zero mean and unit variance. Nearly all matrices can be double standardized by successive row and column standardization. Some pathological matrices cannot be double standardized, but Olshen and Rajaratnam (2010) show that these matrices are a set of Lebesgue measure zero in R N ×n . Accordingly, we will assume throughout that X is double standardized.
In our model, double standardization lets us take Σ ii = ∆ jj = 1, and forces each row and column of Σ and ∆ to sum to zero. Double standardizing a normal matrix, however, does not yield a normal matrix. Although centering preserves normality, scaling does not. Modeling X as normal before double standardization is more realistic, but makes many computations intractable. We will usually approximate X as normal after double standardization to make computations easier.
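The successive standardization itself is only a few lines of code; this sketch is our own, with the iteration count chosen generously rather than adaptively:

```python
import numpy as np

def double_standardize(X, iters=20):
    """Alternate row and column standardization; Olshen and Rajaratnam (2010)
    show this converges for almost every matrix, typically quickly."""
    X = X.copy()
    for _ in range(iters):
        X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
        X = (X - X.mean(axis=0, keepdims=True)) / X.std(axis=0, keepdims=True)
    return X

rng = np.random.default_rng(3)
Y = double_standardize(rng.standard_normal((200, 50)))

# Columns are exactly standardized (they were scaled last); rows nearly so.
print(float(np.abs(Y.mean(axis=0)).max()), float(np.abs(Y.std(axis=1) - 1).max()))
```

After convergence, both the row and column means are essentially 0 and the standard deviations essentially 1.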
This approximation generally works well, though rigorously quantifying the error of approximation is difficult. Lemma 1 approximates the kurtosis introduced by the first row scaling. Since the double standardization procedure usually converges quickly (Olshen and Rajaratnam, 2010), Lemma 1 gives us some idea of the non-normality introduced by double standardizing a normal matrix.
Lemma 1 shows that the skewness of Y ij is zero, and that row standardization produces no skew and only small kurtosis if the column correlations are small. If ∆ is close to the identity, row standardization introduces a small negative kurtosis, but depending on the correlation structure, column correlation can introduce positive kurtosis as well. Transposing Lemma 1 gives the kurtosis that column standardization would introduce if it were done on a normal matrix. Thus if row standardization approximately preserves normality, the following column standardization approximately preserves normality, as long as the row correlations are not too extreme. Subsequent row and column standardizations have much less of an effect, since after the first row and column standardizations, the matrix is approximately centered and scaled.

Figure 2: Normal quantile-quantile plots of double standardized normal matrix entries for various (N, n). We generated X ∼ N(0, Σ ⊗ ∆), then double standardized. The off-diagonal entries of Σ and ∆ were all 0.1 to simulate moderate row and column correlation.
The lemma suggests that approximating X as normal after double standardization is reasonable, provided n, N are large and the row and column correlations are not too extreme. Figure 2 shows the distribution of the entries of a double-standardized X with weak row and column correlation for different values of n and N . The entries look fairly normal for N ≥ 100 and n ≥ 50.
We are interested in the empirical cdf F̂ of Z = Xw, taking without loss of generality ‖w‖ = 1. When the columns are independent, Z ∼ N(0, Σ). It is easy to calculate the mean and covariance of Z for general ∆.
Allen and Tibshirani (2010) find the variance of each Z i when Z is a vector of two-sample t-statistics; Lemma 2 generalizes their result. It has two important implications. First, as Allen and Tibshirani (2010) note, w ′ ∆w changes the variance of each Z i . Column dependence can thus cause the over-or underdispersion often seen in microarray data.
Second, the approximate normality of Z lets us use Efron's approximations for the mean and variance ofF . Efron's approximations estimate the variance of Z, so they can account for the factor w ′ ∆w; it is easy to estimate since it is common to all the Z i . The lemma shows that column dependence does not affect the correlation structure of Z, so to use Efron's approximations, we still need the same mean and mean-squared correlations α 1 and α 2 as we would if the columns were independent.
Estimating α 1 and α 2 becomes more difficult when the columns are dependent. As Efron (2009a) notes, double standardization forces the row and column sample correlations to have the same mean and variance. For example, the mean-squared row correlation is tr((XX′)(XX′))/(n²N²) = tr((X′X)(X′X))/(n²N²), which is just the mean-squared column correlation. This makes it hard to estimate α 2 and to test for column correlation. Row correlation can create the appearance of column correlation, and column correlation can inflate estimates of row correlation.

The independence-UMVU estimator
We can, however, find conservative estimators of α 1 and α 2 , in the sense that under column correlation the α 1 estimator remains unbiased and the α 2 estimator is positively biased. We begin by calculating the uniformly minimum variance unbiased (UMVU) estimators under the assumption that the columns are independent. This result assumes that X ∼ N(0, Σ ⊗ I) and is not double standardized, which lets us apply a result of Olkin and Pratt (1958) to find the UMVU estimators easily.
Lemma 3 states the UMVU estimators α̂ 1 and α̂ 2 in closed form, together with linear approximations; the approximations are particularly accurate if most ρ̂ ii′ are small.
Lemma 3 gives the independence-UMVU estimators of α 1 and α 2 , and more useful linear approximations (accuracy bounds for the linear approximations are given in the Appendix). These approximations let us avoid forming the N × N sample row correlation matrix Σ̂, since the sum of ρ̂² ii′ over i < i′ equals (‖Σ̂‖²_F − N)/2, and we can compute the Frobenius norm using ‖XX′‖²_F = ‖X′X‖²_F . We now calculate these estimators' bias under column dependence. To do this, we approximate X as N(0, Σ ⊗ ∆) after double standardization. Centering makes each row and column of Σ and ∆ sum to zero. This fixes α 1 = −1/(N − 1). The α 1 estimators, however, are defined more generally, and the bias of the α 1 estimators under column correlation may be of interest when α 1 is not fixed. We thus assume that Σ ii = ∆ jj = 1, but do not require Σ and ∆ to be centered. Our basic tool is the following lemma, proved by Efron (2009a).
Lemma 4 (Efron). Suppose X ∼ N(0, Σ ⊗ ∆) with Σ ii = ∆ jj = 1. Let σ̂ ii′ be the sample covariance between rows i and i′ in the double standardized model, and let ‖∆‖²_F = tr(∆²) be the squared Frobenius norm. Lemma 4 says that the sample row covariances are unbiased and have a Wishart covariance structure, but column dependence reduces the degrees of freedom from n to n²/‖∆‖²_F . An analogous formula holds for column covariances. We now apply Lemma 4 to approximate the bias of the independence-UMVU estimators and their linear approximations. Lemma 5 gives a simple and accurate approximation; messier but more precise formulas are in the Appendix.
Lemma 5 shows that our α estimators are conservative if the columns are in fact correlated. Efron's approximations for the variance of F̂ increase linearly as α 1 and α 2 increase. The lemma thus guarantees that even under column correlation, using our estimators in Efron's approximations gives conservative (upwardly biased) estimates of the variance of F̂.
Lemma 5 also tells us that our α 2 estimators' bias is just the mean-squared correlation of the columns. The bias is small unless the column correlations are comparable in strength to the row correlations. In the microarray setting, this would mean that the correlations between arrays are comparable to the correlations between genes, which would be a very bad experimental situation.
For the rest of this paper, we assume that X has been centered, so α 1 is known and only α 2 is left to estimate. To simplify notation, we drop the subscript and denote α 2 by α. Note that this differs from Efron (2009b), who uses α for the root-mean-squared correlation.

Transforming correlations
Estimating δ, the mean-squared column correlation, would let us assess the extent of column correlation and improve our estimate of α. We will now see why this is difficult: α and δ are hard to separate. Fisher's transformation reveals that estimating α and δ is like trying to estimate the variances of two independent random variables based on observing their sum.
More generally, Fisher's transformation simplifies the study of row and column correlations by making the sample correlation distribution easier to understand and manipulate. The distribution of a sample row correlation depends on the true row and column correlations in a complicated way. Fisher's transformation simplifies this dependence by normalizing the sample row correlation and stabilizing its variance. The transformed sample row correlation is approximately normal with mean depending on the true row correlation and variance depending on the mean-squared column correlation. The same holds for sample column correlations as well, and we will use this to test for column correlation.

Fisher's transformation
We first show that Fisher's transformation for correlations still works when columns are dependent. We assume that X is normal after double standardization, so X ∼ N (0, Σ ⊗ ∆) with Σ ii = ∆ jj = 1, and Σ, ∆ have zero row and column sums.
An analogous formula holds for the transformed column correlations. Theorem 1 follows from the next two lemmas, which both use the delta method.

Lemma 6 gives the delta method approximation for the mean and variance of z(ρ̂ ii′ ); it shows that the transformation is still variance stabilizing.

Lemma 7 gives approximations for the skewness and kurtosis of ρ̂ ii′ , which are also the delta method approximations to the skewness and kurtosis of z(ρ̂ ii′ ); it shows that Fisher's transformation is still approximately normalizing.
Lemma 7 shows that the skewness and kurtosis of the sample correlations are usually small. Since the off-diagonal elements of ∆^k decay rapidly, the tr(∆^k)/‖∆‖^k_F terms behave roughly like n^(1−k/2). Switching Σ and ∆ gives analogous formulas for the sample column correlations.
Lemmas 6 and 7 together show that Fisher's transformation is still approximately normalizing and variance stabilizing when the columns are correlated. The variance of the transformed row correlations, though, depends on the mean-squared column correlation, and vice versa. The transformation is not perfect: it works very well in the center, but if the columns are highly correlated, the transformed row correlations can have lighter or heavier tails, and vice versa.

Interpretation
Theorem 1 relates estimating α and δ to a more familiar problem. If the columns were independent, the transformed row correlations z(ρ̂ ii′ ) would have known, equal variance. Since z(ρ) ≈ ρ for small ρ, α is roughly the variance of the z(ρ ii′ ). We could thus estimate α by measuring the variance of the z(ρ̂ ii′ ) and subtracting the known noise variance.
When the columns are dependent, however, the noise variance of the z(ρ̂ ii′ ) is equal across pairs but unknown. This makes it impossible to separate the variance of the z(ρ̂ ii′ ) into a row correlation component (the variance of the z(ρ ii′ )) and a column correlation component (the noise variance).
Estimating α and δ this way is like trying to estimate the variances of two independent random variables after only observing their sum. It is thus impossible without some strong assumptions on Σ or ∆. This explains why estimating α under column dependence is so difficult: any estimator must either rely on the failure of Theorem 1 or use some information not captured by treating the sample correlations as one large vector.
This view also explains the conservatism of our α estimators. Our estimators were designed for independence; they essentially take the variance of the ρ̂ ii′ and adjust it using the known noise variance. Dependence increases the noise variance by roughly δ, since ‖∆‖²_F = n + n(n − 1)δ. This widens the ρ̂ ii′ distribution and biases our estimates upward. Lemma 5 says that the bias of our α estimators is roughly the increase in noise variance.
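The identity ‖∆‖²_F = n + n(n − 1)δ is just the split of the squared Frobenius norm into its diagonal and off-diagonal parts; a quick check with an exchangeable ∆ (our own toy choice):

```python
import numpy as np

n, rho = 50, 0.1
Delta = rho * np.ones((n, n)) + (1 - rho) * np.eye(n)  # exchangeable correlation

frob_sq = float(np.sum(Delta**2))                      # ||Delta||_F^2
delta = (frob_sq - n) / (n * (n - 1))                  # mean-squared off-diagonal

print(np.isclose(frob_sq, n + n * (n - 1) * delta), np.isclose(delta, rho**2))
```

For this ∆, δ = ρ², and the noise variance of the ρ̂ ii′ grows from 1/n to ‖∆‖²_F /n² = 1/n + (1 − 1/n)δ, roughly the increase of δ described above.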

Testing for column correlation
Testing for column dependence is easier than estimating it. Efron (2009a) gave a permutation test of column dependence, but desired a test that does not depend on the ordering of the columns. In this section, we use our results on Fisher's transformation to give such a test.
Even if the columns are independent, standardization introduces small correlations between them. Our null hypothesis is thus not ∆ = I, but ∆ = (n/(n − 1))(I − (1/n)11′), where 1 is an n-vector of ones. Our previous results make it easy to approximate the null distribution of the off-diagonal correlations, assuming X is normal after double standardization.
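This null covariance has unit diagonal, zero row sums, and off-diagonal entries −1/(n − 1), which the following sketch verifies:

```python
import numpy as np

n = 20
ones = np.ones((n, 1))
Delta0 = (n / (n - 1)) * (np.eye(n) - ones @ ones.T / n)  # null column covariance

print(np.allclose(np.diag(Delta0), 1.0))        # unit variances
print(np.allclose(Delta0 @ ones, 0.0))          # centering: rows sum to zero
print(np.isclose(Delta0[0, 1], -1 / (n - 1)))   # small negative correlations
```

The −1/(n − 1) off-diagonal entries are exactly the small correlations that standardization introduces between otherwise independent columns.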

Lemma 8 gives the approximate null distribution of the transformed sample column correlations z(∆̂ jj′ ).
The transformed correlations all have the same mean and variance, and are themselves nearly uncorrelated. Testing for column dependence is thus approximately the same as observing n(n − 1)/2 independent normal random variables with the same unknown variance, and testing whether they all have the same mean. If we center and scale the z(∆̂ jj′ ), this reduces to testing whether a collection of n(n − 1)/2 independent N(0, 1) random variables all have zero mean, or whether some have nonzero mean.

Figure 3 illustrates this idea. It shows the centered and scaled column correlations for two data sets considered by Efron (2009a). There are many more large correlations than we would expect under the N(0, 1) null. This suggests both data sets have some column dependence, and indeed, our test strongly rejects the null in both cases. Efron (2009a) used a similar idea to explore the extent of column correlation. Our test builds on Efron's idea by using Fisher's transformation to justify an approximately normal null, and a different test for nonzero means.

Donoho and Jin (2004) introduced the "higher criticism" procedure to test whether many independent random variables are all N(0, 1), or whether some are normal with nonzero mean. The idea behind higher criticism is simple. For a fixed rejection threshold t, we can test the N(0, 1) null by seeing whether significantly many variables fall outside [−t, t]. Donoho and Jin (2004) maximize the resulting test statistic over a range of t.
The higher criticism statistic Λ for our data maximizes, over thresholds t in an interval [a, b], a standardized exceedance count computed from Ĝ, the empirical cdf of the centered and scaled z(∆̂ jj′ ). Based on the recommended interval in Donoho and Jin (2004), we choose the maximizing interval [a, b] to be [Φ⁻¹(1/4), Φ⁻¹(1/(n(n − 1)))]. The null distribution of Λ can be calculated by repeatedly simulating n(n − 1)/2 N(0, 1) random variables, centering and scaling them, and calculating Λ for each simulated collection.
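The exact formula for Λ did not survive into this copy, so the sketch below uses a standard higher-criticism variant (a standardized two-sided exceedance count maximized over a threshold grid; both the grid and the statistic's precise form are our assumptions) to simulate a null distribution:

```python
import math
import numpy as np

def phi_bar(t):
    """Standard normal upper tail 1 - Phi(t)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def hc_statistic(z, ts):
    """Generic higher criticism: max standardized two-sided exceedance count."""
    z = (z - z.mean()) / z.std()          # center and scale, as in the test
    m = len(z)
    best = -math.inf
    for t in ts:
        p = 2 * phi_bar(t)                # null probability of |z| > t
        frac = float(np.mean(np.abs(z) > t))
        best = max(best, math.sqrt(m) * (frac - p) / math.sqrt(p * (1 - p)))
    return best

rng = np.random.default_rng(5)
n = 50
m = n * (n - 1) // 2                      # number of distinct column pairs
ts = np.linspace(0.5, 3.0, 26)            # illustrative threshold grid

null = [hc_statistic(rng.standard_normal(m), ts) for _ in range(500)]
crit = float(np.quantile(null, 0.95))     # simulated 5% critical value
print(crit)
```

Rejecting when the observed Λ exceeds the simulated critical value gives the test its level under the null.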
The transformed column correlations can have slightly heavy tails, especially under moderate row correlation. Figure 4 shows a normal quantile-quantile plot of the centered and scaled z(∆̂ jj′ ) under the null and moderate row correlation. The correlations are quite normal in the center, but somewhat heavy-tailed. This deviation from normality can lead to spuriously large values of Λ. Interestingly, the problem is worst for moderate row correlation. When the row correlation is low, Lemma 7 says that the transformed correlations have low kurtosis. When the row correlation is high, the normal-after-double-standardization approximation begins to break down, and the entries of X are light-tailed, as Lemma 1 predicts. Figure 5 shows normal quantile-quantile plots of the centered and scaled z(∆̂ jj′ ) under the null and low or high row correlation. The correlations are more normal than they are for moderate correlation.

Shrinking the z(∆̂ jj′ ) can control the tails and yield better behaved Λs. The obvious approach would be to use the t-distribution that matches the first four cumulants of z(∆̂ jj′ ). That is, we would base Λ on Φ⁻¹(F t (z)), where F t is the scaled t-distribution with appropriate degrees of freedom. However, the correlations are not as heavy-tailed as a t-distribution that matches their variance and kurtosis, so the t-transformation overshrinks in the far tail. This hurts power dramatically. One good compromise is a linear approximation to the t-transformation, which shrinks the bulk of the data appropriately but does not overshrink the far tail. Shrinking by a factor of 0.9 approximates the t-transformation with 4 degrees of freedom reasonably well on [−3, 3]; this is a conservative choice of degrees of freedom, since it gives the t-distribution infinite kurtosis.
The test's power depends on both the row and column correlations, and indirectly on the sample size. The higher α is, the higher the variance of the column correlations, so the lower our power. Increasing N will increase our power if α decreases, but will have little effect if α remains the same. The test is also sensitive to the particular configuration of column correlations. If the nonzero column correlations are all small, they will only inflate the center of the z(∆ jj ′ ) histogram; centering and scaling will eliminate this effect, and we will be unable to detect the correlations. We have more power when most column correlations are small but some are large. The larger and more numerous the large correlations are, the more power the test will have. Increasing n will increase power if the fraction of large correlations remains the same, since we will get better estimates of the column correlation distribution.

Simulations
We investigate the level and power of our test using simulations. We took N = 1000, n = 50 and generated data with a block row dependence structure. We divided the rows into 5 blocks, with constant correlation within blocks and no correlation between blocks, then standardized the row correlation matrix. The row correlations ranged from small (independence before standardization) to extreme (correlation of 0.9 within blocks before standardization). We check the level of the test under column independence and its power to detect three column correlation structures. The first corresponds to a "batch effect": the columns have a block correlation structure like the rows, with 10 blocks. The second corresponds to an adjacent array effect, with ∆ jj′ = ρ^|j−j′| . The third corresponds to a contamination setting, where most columns are uncorrelated, but a small block of 5 columns has constant correlation. All these descriptions are before standardization, which changes the exact correlations slightly but does not alter their broad structure. We determine the test's power in each of these settings under low, moderate or high row correlation. In every simulation, we first generate X ∼ N(0, Σ ⊗ ∆), then double standardize.
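The three column dependence structures are easy to write down. The sketch below constructs them (the specific ρ values are illustrative, and the final double standardization step is omitted) and draws one X ∼ N(0, Σ ⊗ ∆):

```python
import numpy as np

N, n = 1000, 50

def block_corr(size, n_blocks, rho):
    """Constant correlation rho within equal blocks, zero between blocks."""
    block = size // n_blocks
    C = np.zeros((size, size))
    for b in range(n_blocks):
        s = slice(b * block, (b + 1) * block)
        C[s, s] = rho
    np.fill_diagonal(C, 1.0)
    return C

Sigma = block_corr(N, 5, 0.5)                        # block row correlation
Delta_batch = block_corr(n, 10, 0.3)                 # "batch effect"
j = np.arange(n)
Delta_adj = 0.8 ** np.abs(j[:, None] - j[None, :])   # adjacent array effect
Delta_contam = np.eye(n)                             # contamination: 5 columns
Delta_contam[:5, :5] = 0.5
np.fill_diagonal(Delta_contam, 1.0)

# X ~ N(0, Sigma ⊗ Delta) via X = A G B' with Sigma = AA', Delta = BB'
A, B = np.linalg.cholesky(Sigma), np.linalg.cholesky(Delta_adj)
X = A @ np.random.default_rng(6).standard_normal((N, n)) @ B.T
```

Swapping Delta_adj for Delta_batch or Delta_contam gives the other two scenarios.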
Table 1 shows our test maintains its level across the range of row dependence. It is very conservative because of the shrinkage. This effect is most pronounced for weak or strong row dependence, since the correlations are heaviest-tailed for moderate row dependence.

Table 1: Level of the test at different row correlations (simulation described in text). The range of αs corresponds to within-block row correlation ranging from 0 to 0.9, before standardization. The results are for 1000 replications, so standard errors are all less than 0.0055.

Despite its conservatism, our test maintains decent power under low to moderate row correlation. Tables 2, 3 and 4 show the results of the power simulations. As we would expect, the power increases with the column correlation and the number of large column correlations, but decreases with the row correlation. The batch effect is easiest to detect, followed by the adjacent array effect and the contamination effect. Surprisingly, the power of our test can actually fall as the column correlations become stronger. In the adjacent-array scenario under high row correlation, our test's power falls when ρ increases from 0.800 to 0.888. This happens because when ρ becomes very large, many column pairs become weakly correlated, and these small correlations widen the center of the z(∆̂ jj′ ) histogram.
Our test seems to be able to detect the kinds of column dependence seen in real data sets. Figure 3 shows that our test strongly rejects on the data sets considered by Efron (2009a). We can gain additional power by combining our test with Efron's permutation test. Since our test is permutation-invariant, the two methods yield independent p-values.
Code to perform our test is available in the R package "colcor," on CRAN.

Acknowledgements
The author thanks Professor Bradley Efron for many very helpful discussions and suggestions, and the referees, whose comments clarified and improved the paper.

Appendix A: Proofs
A.1. Proof of Lemma 1

Y ij is centered, approximately unit variance, and symmetric, so we only need to approximate the fourth moment. Since X is normal and ∆ jj = 1, the fourth moment is straightforward to compute, and a Taylor series approximation then completes the proof.

A.2. Proof of Lemma 2
Clearly Z is normal with mean 0. We then have cov(Z i , Z i′ ) = Σ j Σ j′ w j w j′ cov(X ij , X i′j′ ) = Σ ii′ w′∆w, so cov(Z) = (w′∆w)Σ.

A.3. Proof of Lemma 3
The argument is the same for α̂ 1 and α̂ 2 , so we will only prove the result for α̂ 2 . Olkin and Pratt (1958) show that for a bivariate normal with correlation ρ, f 2 (ρ̂) is UMVU for ρ². (X ij , X i′j ), j = 1, . . . , n are iid bivariate normal with correlation ρ ii′ , so E(f 2 (ρ̂ ii′ )) = ρ² ii′ . Summing this equality proves α̂ 2 is unbiased for α 2 . Now, ρ̂ ii′ is complete sufficient for ρ ii′ , so f 2 (ρ̂ ii′ ) is still UMVU for ρ² ii′ in the multivariate normal situation. This means it is uncorrelated with any unbiased estimator of 0. Since α̂ 2 is a linear combination of these, it is also uncorrelated with any unbiased estimator of 0, and is thus UMVU for α 2 .

A.5. Proof of Lemma 6
Let D be the sample row covariance matrix of the centered, column standardized matrix (so ∆ jj = 1 but not necessarily Σ ii ). Using the argument of Lemma 4, D has the mean and covariance of an m⁻¹ Wishart(∆, m) random matrix, where m = n²/‖∆‖²_F . Now apply the delta method to get the variances of ρ̂ ii′ as a function of D. The only effect of dependence is to replace 1/N with ‖Σ‖²_F /N², so the delta method under independence yields the result.

A.6. Proof of Lemma 7
We first approximate the third and fourth moments of the sample correlations. The only approximation we make is to replace the standard deviations in the denominator by their expectation, 1, yielding nρ̂ ii′ = Σ j x ij x i′j .
Expanding the skewness and kurtosis in terms of the raw moments and some algebra completes the proof.