Global correlation and uncertainty accounting

Abstract For a high-dimensional field of random variables, global correlation is defined as the ratio of average covariance to average variance, and its elementary properties are studied. Global correlation is used to harmonize uncertainty assessments at global and local scales. It can be estimated by the correlation of random aggregations, of fixed size, of disjoint sets of random variables. Illustrative applications are given using crop loss per county per year and forest carbon.


Introduction
This note defines global correlation, studies its elementary properties and illustrates its use in global uncertainty accounting for crop loss and forest carbon. The rich literature on multivariate correlation can receive only passing mention. Canonical correlation [6] concerns the maximal product moment correlation between linear combinations of two random vectors; intraclass correlation [10] describes the correlations in grouped data. Multiple correlation and the correlation ratio [9] relate a single variable to a set of variables. Random correlation matrices [5] and the distribution of their determinants [4,13] have sparked interest in the (scaled) determinant of the correlation matrix as a measure of multivariate association. Using vines, [1,2,8,12] have made progress in understanding random determinants of correlation matrices. Micro correlations have attracted attention for their role in limiting the extent of securitization and risk sharing [11], and also for their role in amplifying tail dependence [3]. The problems discussed here involve up to 4 billion variables, and harmonizing uncertainty quantification at different scales of aggregation requires new techniques.

Methods
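The correlation of random aggregates is used to estimate global correlation. All random variables are assumed to have a finite second moment. The following facts and definitions are used:

1) If $X_1, X_2$ are iid random variables with standard deviation $\sigma$, then …

2) If $X_1, \dots, X_N$ have average variance $\sigma^2$ and average covariance $c$, defined as
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} \mathrm{Var}(X_i), \qquad c = \frac{1}{N(N-1)}\sum_{i \neq j} \mathrm{Cov}(X_i, X_j),$$
then
$$\mathrm{Var}\Big(\sum_{i=1}^{N} X_i\Big) = N\sigma^2 + N(N-1)\,c \qquad (1)$$
and, consequently, $c \ge -\sigma^2/(N-1)$.

3) Define $\rho = c/\sigma^2$ as the global correlation of $X_1, \dots, X_N$.

Let $X_1, \dots, X_N$ and $Y_1, \dots, Y_N$ have average variance $\sigma^2$ and average covariance $c$, both within and between components. That is,
$$\frac{1}{N}\sum_{i} \mathrm{Var}(X_i) = \frac{1}{N}\sum_{i} \mathrm{Var}(Y_i) = \sigma^2,$$
$$\frac{1}{N(N-1)}\sum_{i \neq j} \mathrm{Cov}(X_i, X_j) = \frac{1}{N(N-1)}\sum_{i \neq j} \mathrm{Cov}(Y_i, Y_j) = \frac{1}{N^2}\sum_{i,j} \mathrm{Cov}(X_i, Y_j) = c.$$
Then from (1) one obtains
$$\rho\Big(\sum_{i=1}^{N} X_i, \sum_{i=1}^{N} Y_i\Big) = \frac{N^2 c}{N\sigma^2 + N(N-1)\,c} = \frac{N\rho}{1 + (N-1)\rho}. \qquad (2)$$
The above correlation converges to 1 as $N \to \infty$, for any $\rho > 0$. If $\rho > 0$ and $N \gg 1$, then
$$\sqrt{\mathrm{Var}\Big(\sum_{i=1}^{N} X_i\Big)} \approx N\sigma\sqrt{\rho}.$$
This should be compared to the case where $c = 0$, which holds if the $X_i$ are independent:
$$\sqrt{\mathrm{Var}\Big(\sum_{i=1}^{N} X_i\Big)} = \sqrt{N}\,\sigma.$$
With independence, the uncertainty (standard deviation) of a sum of $N$ random variables grows with $N^{1/2}$, but a small positive global correlation causes the growth to be linear in $N$. To appreciate this, let $\rho$ be the global correlation of the amount of forest carbon per hectare; we wish to assess the uncertainty of global forest carbon based on the average variance in the estimates per hectare. The number of hectares of forest on the earth is $N = 4\mathrm{E}9$. With $\rho = 0$ the standard deviation of the total is $\sqrt{N}\,\sigma \approx 6.3\mathrm{E}4\,\sigma$, whereas with $\rho > 0$ it is approximately $N\sigma\sqrt{\rho} = 4\mathrm{E}9\,\sigma\sqrt{\rho}$. The difference between the case $\rho = 0$ and even a small positive $\rho$ is huge.

Recall the Cauchy-Schwarz inequality: for any $x, y \in \mathbb{R}^N$, we have
$$\Big(\sum_{i=1}^{N} x_i y_i\Big)^2 \le \sum_{i=1}^{N} x_i^2 \, \sum_{i=1}^{N} y_i^2. \qquad (3)$$
Equality in (3) holds if and only if $x$ and $y$ are linearly dependent. Taking $y = (1, \dots, 1)$ gives
$$\Big(\sum_{i=1}^{N} x_i\Big)^2 \le N \sum_{i=1}^{N} x_i^2, \qquad (4)$$
with equality if and only if the $x_i$ are all equal.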
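The definitions of average variance, average covariance and global correlation can be checked numerically. The following is a minimal sketch, not the authors' code: the 3x3 covariance matrix is hypothetical, the helper names `global_correlation` and `sd_of_sum` are introduced here for illustration, and the value 0.001 for the global correlation is an assumed example value rather than a figure from the text.

```python
import numpy as np

def global_correlation(cov):
    """Return (average variance, average covariance, global correlation)."""
    cov = np.asarray(cov, dtype=float)
    n = cov.shape[0]
    avg_var = np.trace(cov) / n
    # Average over the n*(n-1) off-diagonal entries.
    avg_cov = (cov.sum() - np.trace(cov)) / (n * (n - 1))
    return avg_var, avg_cov, avg_cov / avg_var

def sd_of_sum(n, sigma, rho):
    """Standard deviation of a sum of n variables with average variance
    sigma**2 and global correlation rho: sqrt(n*sigma^2*(1 + (n-1)*rho))."""
    return sigma * np.sqrt(n * (1.0 + (n - 1.0) * rho))

# Hypothetical covariance matrix, for illustration only.
cov = np.array([[4.0, 1.0, 0.5],
                [1.0, 9.0, 2.0],
                [0.5, 2.0, 1.0]])
avg_var, avg_cov, rho = global_correlation(cov)

# The variance of the sum equals the sum of all covariance entries,
# which must agree with N*sigma^2 + N*(N-1)*c.
assert np.isclose(cov.sum(), 3 * avg_var + 3 * 2 * avg_cov)

# Growth of the uncertainty of a total over N = 4E9 units with sigma = 1:
# square-root growth under independence, near-linear growth otherwise.
n_units = 4e9
sd_independent = sd_of_sum(n_units, 1.0, 0.0)
sd_correlated = sd_of_sum(n_units, 1.0, 0.001)  # 0.001 is an assumed value
print(rho, sd_independent, sd_correlated)
```

For large N the correlated case is close to N times sigma times the square root of the global correlation, which is why even a very small positive global correlation dominates the uncertainty of the total.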

Results and Discussion
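Lemma 2. With the notation as above for $\rho$, $\sigma$, $c$; with $\sigma_i^2 = \mathrm{Var}(X_i)$ and $\rho_{ik} = \rho(X_i, X_k)$, and the average correlation defined as
$$\rho^* = \frac{1}{N(N-1)}\sum_{i \neq k} \rho_{ik},$$
we have: … By the Cauchy-Schwarz inequality (see (4)) all the $\sigma_i$ are the same. By (i), $\rho = \rho^* = 1$. Since each $\rho_{ik} \le 1$ and $\rho^* = 1$, it follows that $\rho_{ik} = 1$.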
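The step "By (i) ρ = ρ*" in the argument above suggests that global correlation and average correlation coincide when all standard deviations are equal. The sketch below checks this reading numerically; the correlation matrix and the standard deviations are hypothetical, and the function name is introduced here for illustration only.

```python
import numpy as np

def global_and_average_correlation(cov):
    """Return (global correlation, average pairwise correlation)."""
    cov = np.asarray(cov, dtype=float)
    n = cov.shape[0]
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)
    off_diag = ~np.eye(n, dtype=bool)
    rho_global = cov[off_diag].mean() / np.diag(cov).mean()
    rho_average = corr[off_diag].mean()
    return rho_global, rho_average

# Hypothetical correlation matrix with average correlation 0.4.
corr = np.array([[1.0, 0.2, 0.6],
                 [0.2, 1.0, 0.4],
                 [0.6, 0.4, 1.0]])

equal_sd = np.array([2.0, 2.0, 2.0])
unequal_sd = np.array([1.0, 2.0, 5.0])

g_equal, a_equal = global_and_average_correlation(corr * np.outer(equal_sd, equal_sd))
g_unequal, a_unequal = global_and_average_correlation(corr * np.outer(unequal_sd, unequal_sd))

print(g_equal, a_equal)      # the two agree when all standard deviations are equal
print(g_unequal, a_unequal)  # they differ once the standard deviations differ
```

With unequal standard deviations the global correlation weights each pair by the product of its standard deviations, so it need not equal the plain average of the pairwise correlations.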
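Writing $\rho_N = \rho\big(\sum_{i=1}^{N} X_i, \sum_{i=1}^{N} Y_i\big)$, we construct a continuous version of $\rho_N$ as follows. Solve (2) for the global correlation $\rho$:
$$\rho = \frac{\rho_N}{N - (N-1)\rho_N}. \qquad (6)$$
Replace $\rho_N$ by $f(x)$ and $N$ by $x > 0$. For $0 \le \rho \le 1$ write:
$$\rho = \frac{f(x)}{x - (x-1)f(x)}. \qquad (7)$$
Differentiating both sides of (7) with respect to $x$:
$$f'(x) = \frac{f(x)\,\big(1 - f(x)\big)}{x}. \qquad (8)$$
Equation (8), with $f(1) = \rho$, provides a graphical representation of the relation between $\rho_N$ and $\rho$ (see Figure 1).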
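The relation between the correlation of aggregates and the global correlation can be sketched in a few lines. The closed forms below are assumptions derived from the variance of a sum, Nσ² + N(N−1)c, and the covariance N²c between two aggregates; the function names are introduced here for illustration, and 0.120 is the estimate quoted in the crop loss example.

```python
def aggregate_correlation(rho, n):
    """Correlation of two disjoint aggregates of size n, given global
    correlation rho: n*rho / (1 + (n-1)*rho)."""
    return n * rho / (1.0 + (n - 1.0) * rho)

def global_from_aggregate(rho_n, n):
    """Invert the relation above: rho = rho_n / (n - (n-1)*rho_n)."""
    return rho_n / (n - (n - 1.0) * rho_n)

rho = 0.120

# Round trip: aggregating and then inverting recovers rho for any size.
for n in (1, 5, 20, 100):
    rho_n = aggregate_correlation(rho, n)
    assert abs(global_from_aggregate(rho_n, n) - rho) < 1e-12

# The continuous version f(x) satisfies f'(x) = f(x)*(1 - f(x))/x;
# check this with a central finite difference at x = 20.
x, h = 20.0, 1e-6
f_x = aggregate_correlation(rho, x)
numeric_derivative = (aggregate_correlation(rho, x + h)
                      - aggregate_correlation(rho, x - h)) / (2.0 * h)
analytic_derivative = f_x * (1.0 - f_x) / x
print(f_x, numeric_derivative, analytic_derivative)
```

The curve rises from ρ at x = 1 towards 1 as x grows, so aggregates of even modest size are strongly correlated when ρ is small but positive.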

Example: crop loss
Crop loss claims per US county per year are tabulated from 1980-2008 (data available at http://www.rff.org/events/event/data-climate-change-and-extreme-events). Restricting to counties without zero entries, a dataset of 1334 counties is obtained. For this dataset the average variance over all counties and the average covariance between pairs of counties can be computed. Their ratio is the global correlation, shown in Table 1. Random aggregations of disjoint pairs of sets of counties, of several fixed sizes, are also constructed and the correlations of the aggregates are computed. Iterating this process 2000 times, the correlation of disjoint randomly drawn aggregates is estimated by averaging over the 2000 iterations. Plugging these estimates into (6) yields estimates of the global correlation, also shown in Table 1. To illustrate the use of (6), suppose the global correlation is estimated by averaging the correlations of 2000 samples of disjoint pairs of sets of counties of size 20. The value from Table 1 is 0.120. Plugging this value of ρ into (6), the curve f(x) approximating $\rho_N$ is plotted in Figure 1.
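The random aggregation procedure can be sketched on synthetic data. This is not the crop loss dataset: the data below are simulated from a hypothetical one-factor model with a known global correlation of 0.12, the dimensions (2000 observations, 200 units, 500 iterations) are arbitrary choices, and the function names are introduced here for illustration.

```python
import numpy as np

def global_from_aggregate(rho_n, n):
    """Global correlation implied by the correlation rho_n of two
    disjoint aggregates of size n: rho_n / (n - (n-1)*rho_n)."""
    return rho_n / (n - (n - 1.0) * rho_n)

def estimate_global_correlation(data, size, iterations, rng):
    """data has shape (observations, units). Repeatedly draw two disjoint
    random sets of `size` units, correlate their sums, average the
    correlations, and convert the average to a global correlation."""
    n_units = data.shape[1]
    correlations = np.empty(iterations)
    for it in range(iterations):
        cols = rng.choice(n_units, size=2 * size, replace=False)
        first = data[:, cols[:size]].sum(axis=1)
        second = data[:, cols[size:]].sum(axis=1)
        correlations[it] = np.corrcoef(first, second)[0, 1]
    return global_from_aggregate(correlations.mean(), size)

# Synthetic data (assumption): unit variance, common correlation 0.12.
rng = np.random.default_rng(0)
observations, units, true_rho = 2000, 200, 0.12
common = rng.standard_normal((observations, 1))
noise = rng.standard_normal((observations, units))
data = np.sqrt(true_rho) * common + np.sqrt(1.0 - true_rho) * noise

# Direct estimate: average covariance divided by average variance.
cov = np.cov(data, rowvar=False)
off_diag = ~np.eye(units, dtype=bool)
direct = cov[off_diag].mean() / np.diag(cov).mean()

estimate = estimate_global_correlation(data, size=20, iterations=500, rng=rng)
print(direct, estimate)
```

The aggregation route only needs correlations of sums, which is its appeal when the full covariance matrix over billions of units cannot be formed.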

Example: forest carbon
Using (1):
$$\mathrm{Var}\Big(\sum_{i=1}^{N} X_i\Big) = N\sigma^2\,\big(1 + (N-1)\rho\big), \qquad (9)$$
where σ is the root of the average variance of forest carbon in [tC/ha], and $\rho = c/\sigma^2$. The challenge is to find values of σ and ρ that "harmonize" with the uncertainty in forest carbon at the global level and with the mean density in tC/ha. If ρ = 0, then σ = 1.8E06 tC. This would be an extremely fat tailed distribution that is not prima facie plausible. If ρ = 1, then the average uncertainty (standard deviation) of tC/ha would be 28.3. In itself, this value is not preposterous, but ρ = 1 is. In this case Lemma 2(iii) entails that the uncertainty of the carbon in any two hectares is perfectly correlated.
[14, Table II] suggest σ is on the order of 10% of the measured value up to 100 tC/ha, linearly interpolated between 10% and 30% up to 150 tC/ha. Putting the resulting estimate of σ for the above global density range in (9) gives values of ρ that are impossible.
Either the estimates of uncertainty at the global level (LHS of (9)) must come down or the uncertainty at the hectare scale (σ) must be larger than suggested in [14], in order that the two can be combined with a plausible value of ρ in (9). If ρ = . then σ = . tC which is in the range of the average density but larger than expected on the basis of existing literature.
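The arithmetic linking the two extreme cases can be retraced as follows. This is a sketch under stated assumptions: N = 4E9 hectares is taken from the "4 billion variables" mentioned in the introduction, the global standard deviation is back-computed here from the quoted 28.3 tC/ha at ρ = 1 rather than taken from the source, and the function names are introduced for illustration.

```python
import math

N_HECTARES = 4e9          # assumption: 4 billion hectares of forest
SIGMA_AT_RHO_ONE = 28.3   # tC/ha, the value quoted for rho = 1

def sd_of_total(sigma, rho, n=N_HECTARES):
    """Global standard deviation from Var = n*sigma^2*(1 + (n-1)*rho)."""
    return sigma * math.sqrt(n * (1.0 + (n - 1.0) * rho))

def sigma_for(rho, global_sd, n=N_HECTARES):
    """Per-hectare sigma that reproduces a given global standard deviation."""
    return global_sd / math.sqrt(n * (1.0 + (n - 1.0) * rho))

# Back-computed global standard deviation in tC (not stated in the source).
global_sd = sd_of_total(SIGMA_AT_RHO_ONE, 1.0)

# Per-hectare sigma needed under independence to match the same global value.
sigma_rho_zero = sigma_for(0.0, global_sd)

print(f"{global_sd:.3e} {sigma_rho_zero:.3e}")
```

Under these assumptions the independent case requires a per-hectare standard deviation of roughly 1.8E06, matching the figure quoted for ρ = 0; intermediate values of ρ can be explored with `sigma_for`.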

Conclusion
Correlations of random aggregations can be used to estimate global correlation. This quantity is important when trying to relate uncertainty at global scales to uncertainty at local scales. The IPCC AR5 estimates of uncertainty in global forest carbon must come down, or local estimates of uncertainty in carbon measurements per hectare must go up to achieve consistency. Statistical properties of estimators of global correlation remain to be explored, and more inequalities between global and average correlation can probably be found.