Inferential procedures based on the weighted Pearson correlation coefficient test statistic

In this note, we evaluated the type I error control of the commonly used t-test found in most statistical software packages for testing the hypothesis on H0:ρ=0 vs. H1:ρ>0 based on the sample weighted Pearson correlation coefficient. We found the type I error rate is severely inflated in general cases, even under bivariate normality. To address this issue, we derived the large sample variance of the weighted Pearson correlation. Based on this result, we proposed an asymptotic test and a set of studentized permutation tests. A comprehensive set of simulation studies with a range of sample sizes and a variety of underlying distributions were conducted. The studentized permutation test based on Fisher's Z statistic was shown to robustly control the type I error even in the small sample and non-normality settings. The method was demonstrated with an example data of country-level preterm birth rates.


Introduction
To date, no valid procedure has been proposed for the inference of Pearson correlation coefficient based on a weighted Pearson correlation statistic.The commonly used statistical software such as SAS (SAS Institute Inc., Cary, NC, USA) does not provide valid inference results for weighted Pearson correlations.Built upon the methods by DiCiccio [5] and [12], we derived the large sample variance of the weighted Pearson correlation and proposed a set of studentized permutation tests.We will show the proposed tests can achieve robust the type I error control even when the sample size is small and the data does not follow normal distributions.
Throughout this note, let (X 1 , Y 1 ), (X 2 , Y 2 ), . . ., (X n , Y n ) be n paired observations from a non-degenerate joint distribution F XY (x, y).The weighted sample Pearson correlation estimator for ρ and fixed weights is given as where Xw = n i=1 w i X i , Ȳ = n i=1 w i Y i , n i=1 w i = 1 and 0 < w i < 1, i = 1, 2, . . ., n.When all w i = 1/n the estimator at (1) reduces to the classic sample Pearson correlation estimator, which we will denote as ρ.In practice, the weights often reflect the relative importance of each observation.For example, it can be the population size if we are interested in the correlation of population-level measurements.The weights may also reflect the possibility of an observation of being an outlier or influential data point if a robust estimation is of utmost importance.Although the sample weighted and unweighted correlations will have different values, we emphasize that, in our setting, both are consistent estimates of the same population correlation coefficient.
Historically for the case when all w i = 1/n the majority of statistical software packages, both commerical and freely available, utilize the oftentimes unrealistic underlying assumption of bivariate normality for testing H 0 : ρ = 0. Investigation of the test's robustness to the distribution assumption can be traced back to E.S. Pearson [17], who has concluded that the distribution of ρ is robust to 'mutilation' and 'distortion' to a remarkable degree, and a similar erroneous conclusion was drawn by later investigators as well [10].In fact, extensive simulation-based studies, especially in the area of psychology, have noted that the tests tend to fail in type I error control when the bivariate normality assumption does not hold (e.g.[1][2][3][4]6]).
More recently, one can find examples of what we term the 'naive permutation' test for testing H 0 : ρ = 0, which consists of permuting either the column of X's or Y's and comparing ρ to the tail of the permutation distribution of the permuted ρ's to obtain a one-sided p -value.However, what is often overlooked in this testing scenario is that the assumption of exchangeability is often not satisfied, i.e. the permutation test of H 0 : ρ = 0 is only exact when the hypothesis is equivalent to H 0 : F XY = F X F Y , which only holds under limited distribution assumptions, such as bivariate normality.Even though the permutation tests of correlation coefficients have been studied for decades [11], mistakes regarding its applicability under non-normality have been constantly made, even in recent works [22].
Recently, DiCiccio and Romano [5] provided a thorough discussion of this subject and proposed a studentized permutation test that asymptotically controls the type I error for testing H 0 : ρ = 0.The test robustly controls type I error even when the exchangeability assumption does not hold, which is achieved by studentizing the test statistic ρ by its large sample variance under H 0 .The standardization ensures the test statistic's permutation distribution and sampling distributions both converge to the corresponding quantiles of the standard normal distribution.Such form of the standardization was first utilized by Tschuprow et al. (1925) for testing H 0 : ρ = 0. On the other hand, most of the commonly used tests for testing H 0 : ρ = 0, as implemented in many statistical software packages, strongly rely on the bivariate normality assumption, which is rarely satisfied in real-life data analysis problems [9].Recently, Yu and Hutson [23] developed the R package perk available at github for calculating the appropriate Pearson correlation permutation test based on the work of [5].In addition, this package accommodates the permutation test for the Spearman correlation and the concordance correlation coefficient based on the same principles.More recently, Hutson [12] derived a bootstrap approach to test the more general hypothesis H 0 : ρ = ρ 0 , which for the specific case of H 0 : ρ = 0 generally outperforms the permutation of [5] in terms of the desired Type I error control.
Using the above as a stepping-off point we now focus on the weighted estimator at (1).There are a variety of reasons provided for utilizing the weighted Pearson correlation from incorporating weights as a function of the time data was collected to downweighting extreme observations.Weights may be fixed or random.The random versus fixed weight scenarios need to be treated as distinct cases.Currently, inference about H 0 : ρ = 0 when using the test statistic ρw at (1) is sketchy at best and oftentimes completely wrong in routines found in commercial and freely available software.For example, PROC CORR, SAS 9.4 ( SAS Institute Inc., Cary, NC, USA), provides a p-value for the test H 0 : ρ = 0 assuming bivariate normality and utilizing the t-distribution as per [8].However, when the WEIGHT statement is used SAS provides an incorrect p-value assuming ρw follows the same underlying distribution as ρ, which is not the case due to a different variance between the two estimators, which in turn leads to a biased test with dramatically increased Type I error rates.Furthermore, SAS also provides a p-value via Fisher's z-transformation for the more general hypothesis H 0 : ρ = ρ 0 for which they state in their online support that 'Furthermore, even the distribution of z is not strictly normal, it tends to normality rapidly as the sample size increases . . .', which [5] clearly illustrate is not true via a detailed simulation study contained in their supplemental material for the specific test H 0 : ρ = 0.In fact, [5] show that the test based on Fisher's z-transform behaves similarly and poorly in a variety of scenarios to that based on the t-distribution version of the test.These errors are compounded when a weighting scheme is considered.Similar errors about testing H 0 : ρ = 0 when using the test statistic ρw at (1) are found across a variety of freely available R packages.
Interestingly, for the case of equal weights very little attention or study has been given to the large sample approximation for the sampling distribution of ρ, which does not rely on bivariate normality and is essentially 'distribution free'.The asymptotic distribution can be derived using the delta method and can be found in [19], which was utilized in the recent work by Hutson [12].Here we give a brief description of the derivation.Firstly, we note that the sample estimator can be expressed as ρ = g(V), where such that we have For expression (3) is the 5 × 5 variance-covariance matrix for the vector of the components of the summands of V given by . The elements of d are given as Comment.As is well-known for the specific case that An estimator for the variance of ρ may be given by substituting the respective moment estimators into (4).Hutson [12] illustrated that for samples of size n > 50 this approximate worked well in terms of type I error control for testing the general hypothesis H 0 : ρ = ρ 0 .
In this note, we combine the elements of large sample theory and the methodology of [5].In Section 2 we provide the large sample approximation for the distribution of ρw for both fixed and random weights.In Section 3 we extend the approaches of DiCiccio and Romano [5] and Hutson [12] for tests of H 0 : ρ = 0 using the weighted Pearson correlation test statistic.In Section 6 a real-world example on the correlation of country-level preterm with stillbirth and neonatal death rates was used to demonstrate the proposed tests.Overall, we will show that the proposed test is highly robust in type I error control and can lead to different conclusions in real-world data analysis.The routine usage of t-test for the inference of weighted Pearson correlation coefficient needs to be carefully re-examined.

Asymptotic distribution for the sample weighted Pearson correlation
In this section, we provide results for the large sample theory for ρw at (1) for both the fixed and random weight settings.

Fixed weights
In terms of preliminaries, we start with the work of Fisher [7] re-written in order to accommodate our notation.For the sequence of real numbers Then we have Lemma 2.1 (Fisher [7]): Sufficient conditions for the strong law (5), to hold are that This yields where Z is a standard normal random variable.Condition A at (6) is described in detail in [7].
It refers to the class of sequences for w n such that the above Theorem holds.Our restriction that n i=1 w i = 1 and 0 < w i < 1, i = 1, 2, . . ., n satisify 'Condition A' at (6).Given the preliminary theory worked out by [7] we now have the following: Theorem 2.2: The asymptotic distribution for ρw defined at (1) for large n is given as where the elements of vector d are given at (4) and is the 5 × 5 variance-covariance matrix for the vector of the components of the summands of V defined at (2) given by Proof: Following a somewhat similar approach found in [19] for ρ first note that ρw may be written in the form which may be expressed as ρw = g(V) where and By Theorem 3.1 of [7] the vector V is AN(E{V}, n i=1 w 2 i ), where the 5 × 5 matrix is the covariance matrix for (X, Y, X 2 , Y 2 , XY).It follows from standard theory pertaining to the transformations of asymptotic normally distributed variables, where that (8) holds.
Comment.Similar to the unweighted case, if Corollary 1: The asymptotic relative efficiency (A.R.E) of ρw to ρ, for weights constrained throughout this note as n i=1 w i = 1 and 0 The A.R.E.result follows from (8) and (3), respectively.
As a simple illustration of the A.R.E suppose we had a sample of size n = 10, where the first five observations had w i = 1/15, i = 1, 2, . . ., 5 and latter five observations had w i = 2/15, i = 6, 7, . . ., 10. Then the A.R.E. of ρw to ρ is 90%.If we had a sample of size n = 10, where the first five observations had w i = 1/20, i = 1, 2, . . ., 5 and latter five observations had w i = 3/20, i = 6, 7, . . ., 10. Then the A.R.E. of ρw to ρ is 80%.In general, the use of the weighted correlation comes down to the weighting motivations, e.g.robustness considerations or other logical reasons, versus efficiency as compared to the unweighted Pearson correlation estimator.

Random weights
For the random weights case, we will utilize the work of Roozegar and Soltani [18] as a starting point.Let S n = n i=1 W i X i denote a randomly weighted average, where In their work, Roozegar and Soltani [18] considered the more general unequal variance case per observation, which goes beyond our needs and hence we can reduce some of their notation to fit our goals.The work of Roozegar and Soltani [18] is focused on the case where the random weights W i are found by generating the uniform order statistic U = (U (1) , . Then the random weights are defined as See Ref. [18] for detailed proof based on the characteristic function of T n converging to the characteristic function of a normal distribution.Denote the randomly weighted sample Pearson correlation estimator for ρ with random weights defined above as W i = U (i) − U (i−1) , i = 1, 2, 3, . . ., n, where U (n) = 1 and U (0) = 0 and U (i) is the ith uniform order statistic from a sample of size n−1 as follows: where Theorem 2.3: For large n and ρW defined at (15), we have the following: where the elements of d are given at (4).
Proof: First, note that n n+1 → 1 as n → ∞.The rest of the proof follows identically to Theorem 1 for fixed weights where the vector V is AN(E{V}, 2 n ) by Lemma 2.5 of [18].The rests of the steps of the proof are identical to the fixed weights scenario where again the elements of the vector d are given at (4).
Comment.It follows from Theorem 2 that the A.R.E. of ρW , with random weights defined as W i = U (i) − U (i−1) , i = 1, 2, 3, . . ., n, where U (n) = 1 and U (0) = 0 and U (i) is the ith uniform order statistic from a sample of size n−1, to that of ρ with fixed weights 1/n is 50%.
A more general statement about the large sample distribution for ρW defined at (15) may be made if similar to the fixed weight scenario we assume n i=1 E(w i ) = 1 and 0 < E(w i ) < 1, i = 1, 2, . . ., n.We then have the following: Corollary 2: For large n and ρW defined at (15), we have the following: where the elements of d are given at (4).
Proof: We basically follow the same basic steps and assumptions of [18] replacing dividing a U(0, 1) distribution with a general absolutely continuous distribution having support (0,1) such that as stated above n i=1 E(w i ) = 1 and 0 < E(w i ) < 1, i = 1, 2, . . ., n.

Variance estimators for ρw and ρW
In order to estimate the Var( ρw ) one needs an estimator for the variance-covariance matrix at (8), where as stated previously is the 5 × 5 variance-covariance matrix for the vector . Towards this end define the weighted moment estimators for E(X), E(Y), E(X 2 ), E(Y 2 ) and E(XY) in standard fashion as xw = n i=1 w i x i , ȳw = n i=1 w i y i , x 2 w = n i=1 w i x 2 i , y 2 w = n i=1 w i y 2 i and xy w = n i=1 w i x i y i , respectively.Similarly, define the moment estimators for Var(X), Var(Y), Var(X 2 ), Var(Y 2 ) and Var(XY) in standard fashion as s 2 , respectively.The n/(n − 1) correction factor is so that the weighted moment estimators align with the unweighted case when all w i = 1/n.Now for compactness of notation let Then the moment-based estimator for the variance-covariance matrix for where the elements 5 × 1 vectors p, q, r, s and t are provided in the paragraph above for i = 1, 2, . . ., n.It follows that where ˆ is given at (19) and the elements of d are defined at (4).
Comment.For the estimator ρw based on random weights, we can use the same variance estimator at (20) in that the observed W i 's are in essence moment estimators for E(W i )'s.

Permutation testing about H 0 : ρ = 0
Diciccio and Romano [5] provide the theoretical justification in the unweighted case for when the straightforward permutation test about H 0 : ρ = 0 is exact, i.e. when it is equivalent to testing H 0 : F XY = F X F Y under exchangeability assumptions holding.They then note that when the permutation test for H 0 : ρ = 0 is not equivalent to testing H 0 : F XY = F X F Y the Type I error may be either dramatically over or underinflated conditional upon the dependence structure of F XY .The correction to this issue is the proper standardization of the test statistic √ n ρ under permutations such that a test is constructed that asymptotically may be shown to control the Type I error rate at the desired level.
We follow the same prescription as Diciccio and Romano [5] for the weighted sample Pearson correlation as per the unweighted case.First, define G n to be the set of all permutations π of {1, . . ., n} for testing independence between two random variables X and Y. Then the permutation distribution of any given weighted test statistic T n,w (X n , Y n ) is defined as where Y n π represents a given permutation {Y π(1) , . . ., Y π(n) }.In this setting, the permutation G n is all possible pairwise combinations between (w n , X n ) and Y n .The key is to note in the weighted setting is that the form T n,w (X n , Y n ) assumes fixed weights such that for each permutation we define the test statistic to have the form i.e. the weights, w i , i = 1, 2, . . ., n, are held fixed per each permutation.For testing H 1 : ρ > 0, the null hypothesis will be rejected if the test statistics is larger than the 1 − α quantile of the permutation distribution.The test is exact under the exchangeability assumption.
In the weighted setting, it requires the distribution of ((w n , X n ), Y n ) is invariant under the group of transformations G n .
The test using the statistics T n,w (X n , Y n ) = ρw is exact when testing H 0 : ρ = 0 is equivalent to testing where F X and F Y are marginal distributions, i.e. exchangeability holds, see Theorem 6 Lehmann (1986).This equivalence holds for underlying distributions such as the bivariate normal distribution and other elliptically contoured distributions.
The weighted Pearson correlation permutation test follows the same identical algorithm as per the unweighted case as derived by Diciccio and Romano [5].When testing H 0 : ρ = 0 is not equivalent to testing H 0 : The results follow straightforward from Theorem 1 and given H 0 : ρ = 0 true.The same large sample result holds for random weights under the conditions of Theorem 2 and given S w,n = √ 2/n ρw /τ .Comment.Practically speaking there is no difference between the behavior of the permutation test for H 0 : ρ = 0 given fixed or random weight since for the fixed weight procedure n i=1 w 2 i is a fixed constant/scalar for the observed ρw and its permuted value given the form (22).The behavior of the permutation test of H 0 : ρ = 0 based on ρw in terms of Type I error control will be examined extensively in the next section.
Note that we can use either the variance estimator in Equation (20) or the form derived by DiCiccio or Romano [5] for studentization.We denote the first test as Perm-1 and the second as Perm-2.The large sample variance is a function of ρ.Equation ( 20) is equivalent to plugging in ρw , while the formula by DeCiccio and Romano can be obtained by substituting ρw with 0 (Perm-2).
More generally, we can define the weights in permutation samples as a function of the original weights and the permutation, where w is the vector of original weights, and w * is the weights under permutation.One example of f (w, π) is to generate a new weight for each of the permuted observation based on the average of the original weights of corresponding X and Y. Namely, assume the sample after permutation is (X n , Y n π ), then the weight will be We denote this test as Perm-ave.Finally, since the variance estimation relies on large sample approximation, we postulate a test based on Fisher's Z-transformation of ρ may further improve the type I error control.The transformed statistic is defined as, Z = − 1 2 log 1+ ρw 1− ρw and its large sample variance can be approximated by (1 − ρ 2 )Var(ρ).Similarly, two variance estimators can be obtained by substituting ρ with ρw (Perm-Z1) or 0 (Perm-Z2).

Permutation testing about H
More generally, we may be interested in testing H 0 : ρ = ρ 0 versus H 1 : ρ > ρ 0 , for which a conventional permutation test cannot be directly applied.In this case, the asymptotic test based large sample approximation will be a natural option, though it may have an inflated type I error rate when n is small.Alternatively, a permutation test about non-zero correlations under the null can be achieved by using a de-correlated sample, as introduced by [13].This approach ensures type I error control for testing the non-zero concordance correlation coefficient even when n is as small as 10.
We adopt this method in testing the non-zero null hypothesis.Specifically, following [13], the original observations are standardized by such that the U i and V i will have zero means and unit variances.The new variables can be then de-correlated by where A(•) is defined as , which satisfies Thus we have ρ(U, V) → 0 under H 0 .Therefore, the original hypotheses are transformed to H 0 : ρ(U, V) = 0 versus H 1 : ρ(U, V) > 0. The testing procedure will follow that in Section 3, but applied to U and V.In this section, we examined whether the asymptotic test and and Perm-Z2 can also robustly control the type I errors when ρ 0 = 0.The same settings as in Section 5.1 were used.We examined the weights generated based on B(1, 1), because similar results are expected under different weight distributions.To generate the data with desired correlation, the X and Y were first standardized by the population standard deviations respectively.The standardized observations were then transformed through

Test for H
where ρ is the target correlation.
Similar to what was presented in Section 5.1, the results in Figure 4 show that the asymptotic test tends to have an inflated type I error when n is small, but the error rates converge to 0.05 when n is large.The Perm-Z2 test still shows robust type I error control.The test tends to be slightly conservative under certain scenarios, e.g.t 4.1 and Uniform when ρ 0 = 0.6.These results suggest the asymptotic variance and its estimation is still valid when the underlying correlation is non-zero.The modified permutation tests can also maintain the type I error control under this scenario.

Power of test via the weighted sample Pearson correlation coefficient
In this section, we examined the power of asymptotic test and Perm-Z2 for the tests of H 0 : ρ = 0 versus H 1 : ρ > 0. The same simulation settings as in Section 5.2 were used.Figure 5 shows the results when the true ρ is 0.3 and 0.6.The power of asymptotic test is generally higher than that of Perm-Z2.However, it should be noted that the type I error rates of asymptotic tests generally suffer severe inflations when n ≤ 50.Therefore, the power of the two tests is not comparable when n is small.When n ≥ 100, the difference in power between the two tests is small under the settings of MVN, Circular and Uniform.On the other hand, such difference can be large under Exponential, t 4.1 and MVT settings when ρ = 0.3.Based on these results, a recommendation is to use a permutation test when the sample size is small, e.g.n ≤ 50.When n is larger, the asymptotic test will provide higher power while still maintaining decent type I error control in general.

Example
To demonstrate the proposed method, we applied it to a data set of preterm birth, stillbirth and neonatal death rates from 28 countries [15].In the original work, the authors studied the correlations of preterm birth rates with stillbirth and neonatal death rates at the country level.The correlations were weighted by the total live births in each country.The preterm births were categorized as < 37, 32-36, 28-31, 24-27 weeks of gestation, while the stillbirths and neonatal deaths were categorized into ≥ 32 and ≥ 37 weeks.Only point estimates were provided for the weighted correlations in the original work.For demonstration purposes, we estimated all pairwise weighted correlations between preterm birth, stillbirth and neonatal death rates in each category.The p-values of one-sided tests by ttest, the test based on Jackknife variance estimation, asymptotic tests and Perm-Z2 were compared (Table 1).Directions of the tests were determined by the sign of ρw .The results (Table 1) show that all three tests will conclude the preterm birth rate (< 37 weeks) is negatively correlated with the stillbirth rate ≥ 37 weeks.Further, the preterm rates between 32 and 36 weeks are negatively correlated with both early and late stillbirth rates.However, there are discrepancies among the three methods.For example, the late stillbirth rates (≥ 37 weeks) are considered significantly associated with the neonatal rates by both the t-test and asymptotic test, but not by the permutation test.This is consistent with the observation in simulations that the t-test tends to have an inflated type I error in general while the asymptotic test tends to have an inflated type I error when the sample size is below 50.It is also notable that the correlation between preterm < 37 weeks and stillbirth ≥ 32 weeks was considered significant by both the t-test and permutation test but not by the asymptotic test.Therefore, these three tests could lead to distinct conclusions in applications.Based on our numeric results, the result of permutation test is clearly more reliable.

Concluding remarks
In this note, we introduced an asymptotic test and a set of studentized permutation tests for testing the hypothesis H 0 : ρ = 0 vs. H 1 : ρ > 0. We showed that the t-test routine that is implemented in widely used software like SAS could lead to severe type I error inflation even when the data is bivariate normal.The proposed asymptotic test tends to have an inflated type I error when the sample size is smaller than n = 50, which is similar to the unweighted cases.We found that the studentized permutation test based on the Fisher's Z statistic is the most robust, even when the sample is small and the data is non-normal.Note that the choice of statistic affects the result of permutation test because the statistic is studentized by its large sample variance.We also showcase that the t-test and proposed tests can lead to a distinct conclusion in real-world applications.The t-test will yield a large number of false positives regardless of the underlying distribution.Therefore, we conclude that the routine usage of t-test in conjunction with the weighted sample Pearson correlation should be removed from statistical practice and eliminated from statistical software packages.On the other hand, the studentized permutation tests have strong type I error control even for very small sample sizes while the asymptotic approximation appears valid for n > 50.
Under the constant finite variance assumption above we start with the following: Lemma 2.5 [ [18] (simplified for a constant variance)].Let T n = √ n + 1S n , where S n is defined above and E w ns x s y w p Wq/(n − 1) ns x w s x 2 Then the permutation distribution of S n,w = n (t) is the permutation distribution for S n , τ 2 = μ 2,2 μ 0,2 μ 2,0 where μ r,s

Table 1 .
The estimates of weighted correlations of preterm birth, stillbirth and neonatal death rates at country levels (n = 28).
Note: The p-values are based on one-sided tests.The statistically significant p-values are shown in bold (α = 0.05).