A new test statistic for climate models that includes ﬁeld and spatial dependencies using Gaussian Markov Random Fields

. A new test statistic for climate model evaluation has been developed that potentially mit-igates some of the limitations that exist for observing and representing ﬁeld and space dependencies of climate phenomena. Traditionally such dependencies have been ignored when climate models have been evaluated against observational data, which makes it difﬁcult to assess whether any given model is simulating observed climate for the right reasons. The new statistic uses Gaussian Markov 5 Random Fields for estimating ﬁeld and space dependencies within a ﬁrst order grid point neighborhood structure. We illustrate the ability of Gaussian Markov Random Fields to represent empirical estimates of ﬁeld and space covariances using ‘witch hat’ graphs. We further use the new statistic to evaluate the tropical response of a climate model (CAM3.1) to changes in two parameters important to its representation of cloud and precipitation physics. Overall, the inclusion of dependency 10 information did not alter signiﬁcantly the recognition of those regions of parameter space that best approximated observations. However there were some qualitative differences in the shape of the response surface that suggest how such a measure could affect estimates of model uncertainty.


Introduction
Climate scientists are interested in developing new metrics for assessing how well climate simulations reproduce observed climate for purposes of comparing models, driving model development, and evaluating model prediction uncertainties (Gleckler et al., 2008;Reichler and Kim, 2008;Santer et al., 2009;Knutti et al., 2010;Weigel et al., 2010;Braverman et al., 2011).Formal methods for accomplishing these goals, such as Bayesian calibration, operate with a single test statistic 1 for determining likelihood measures of different model configurations.A level of skepticism exists within the climate assessment community concerning the sufficiency of any one metric to judge a climate model's scientific credibility.Climate phenomena involve interactions of multiple fields (observables) on a wide range of time and space scales from minutes to decades (and longer) and from 1 A test statistic is a metric that includes information about the significance of modeling errors. 1 meters to planetary scales.Thus there are plenty of challenges that exist for synthesizing the many ways that a climate model can be tested against observational data.
The most common approach to climate model evaluation among climate scientists is to display maps of long-term means of well-known fields (e.g.temperature, sea-level pressure, precipitation) whose distribution is familiar and well understood in order to identify sources of model error.Taylor metrics that are often generated as part of model evaluation are based on spatial means of squared grid point errors for individual fields (Taylor, 2001).Such measures neglect field and space dependencies that arise as a consequence of how the physics of the climate system correlate multiple quantities in space.Neglecting these dependencies therefore ignores additional information that can be used to test whether models are simulating observables for the right reasons.
Here we present a new test statistic based on Gaussian Markov Random Fields (GMRFs) that addresses some of the challenges that currently exist for estimating the significance of modeling errors across multiple fields that takes into account field and space dependencies that exist within observations.Perhaps one of the under-recognized challenges in this regard is the limited amount of observations available to quantify dependencies.Data assimilation is commonly used to fill in gaps in the observational record (Trenberth et al., 2008).While assimilation products help address some aspects of the problem of how one compares point measurements to the scales resolved by climate models, these products include the space and field dependencies of the model that was used to assimilate observations.The imprint of the reanalysis model is readily seen when comparing two or more assimilation products, particularly quantities that are directly related to parameterized physics such as precipitation and radiation.One of the advantages of Gaussian Markov Random Fields (GMRFs) is that it only needs a limited amount of data to decipher space and field dependencies of climate phenomena.This is because GMRFs summarize relationship information as it is expressed across fields of gridded data.
The present application of GMRFs operates on long-term means.While it may be possible to extend GMRFs to capture time dependencies (Cressie and Wikle, 2011), the present application represents an advance over more traditional metrics.
The sections of this paper explain, test, and provide examples of how various components of GMRF work.Section 2 gives a brief introduction to GMRFs and the use of a neighborhood structure for estimating dependency information using a precision operator Q.In this section we also define and discuss the Kronecker product and how it is used to generalize GMRFs to deal with more than one field.Section 3 introduces a graph for testing the extent to which GMRFs represent observed variance-covariances of tropical temperature, precipitation, sea level pressure, and upper level winds.
Finally, in Section 4, we consider the field and space dependencies that are captured by the GMRFbased metric within the response of an atmospheric general circulation model CAM3.1 to two model parameters important to cloud and precipitation physics.What we learned in general is that including the space and field dependencies provides some qualitatively different perspectives about which model configurations are more similar to what is observed.For the example we consider, the effects of space dependencies turn out to be more critical than field dependencies.
2 Gaussian Markov Random Fields (GMRFs) A Gaussian Markov Random Field (GMRF) is a special case of a multivariate normal distribution.
The density of a normal random vector x = (x 1 , x 2 , ..., x n ) T (where T denotes the operation of transposing a column to a row), with mean µ (n × 1 vector) and covariance matrix Σ (n × n matrix), is Here, and |Σ| is the determinant of Σ. Estimating Σ can be quite challenging in many contexts, especially for climate models where there is only limited data.All eigenvalues of Σ must be greater than zero, otherwise Σ −1 becomes a singular matrix and it does not define a valid multivariate normal distribution.It can also be shown that if all eigenvalues of Σ are positive then all eigenvalues of Σ −1 are also greater than zero.Rather than estimating Σ and ensuring all eigenvalues of Σ −1 are positive, GMRFs makes use of the precision matrix P = Σ −1 .We denote x ∼ N(µ, P) to represent x as a multivariate normal distribution with vector mean µ and precision matrix P. GMRFs approximate f (x) using a sparse representation for P by setting all precisions outside a neighborhood structure to zero.Thus GMRFs make the assumption that points outside a neighborhood structure are conditionally independent.As we shall show below, this limitation does not prevent GMRFs from capturing covariances outside the neighborhood structure used to define precisions.
The GMRF-based expression that we have developed for quantifying the significance of differences between model output and observations is where v is the vector of differences between model output and observations with a length given by the product of the number of observational fields and number of grid points, n obs n pts , α is a scalar with a value close to zero, I stands for an identity matrix (a diagonal matrix of ones) of dimension n pts corresponding to v, and Q is a precision operator of dimension n pts × n pts from a Gaussian Markov Random Field (GMRF) induced by a first order neighborhood structure.This cost function captures field dependencies through S −1 which is a matrix of dimension n obs × n obs where each of its elements represents a spatial-average of grid point variances and covariances between fields.The spatial dependency between grids is approximated through Q.The quantity α could be interpreted as a weight of the spatial relationship between grid cells.The Kronecker product ⊗ provides a means for associating the different matrix dimensions of the metric, essentially combining its field and space components.Each of the following subsections provides additional information about the derivation and application of equation ( 2).95

Precision operator of a GMRF
The precision operator of a GMRF Q provides a way to estimate dependencies among neighboring grid cells.Q needs to be constructed such that it: -Reflects the kind of spatial dependency we assume our data has.
-Yields a legitimate covariance matrix, Σ, i.e. symmetric and positive definite, so that it can be 100 used to compute a likelihood function.Consider x, a vector of measurements on a 2 × 2 lattice, as represented in Figure 1.Assume a neighborhood structure between the four elements of x.In Figure 2, the neighbors for each element of x are defined graphically.Given the neighborhood structure shown in Figure 2, the precision matrix that works for this problem is which follows these rules, -Q ij = 0, if x i and x j are not neighbors.
-Q ii gives the total number of neighbors of x i .
While the implementation of GMRFs is simple, the theory and mathematics are rather involved.A more full description of the mathematics of this example is provided in the supplemental material.It may also not be immediately clear to a physical scientist that such a simple specification, where only relationships among neighboring grid cells are taken into account, would be sufficient to quantify correlated quantities across large distances.The mathematics of working with precisions allows one to infer the net effect of long distance relationships through relationship information that exists among neighboring cells.While the GMRF approach does not include information about particular teleconnection structures such as ENSO, the approach is sensitive to how changes in large scale conditions induce local covariances across multiple fields within the entire domain.In this way teleconnections are represented through a conditional dependence.
A problem arises in that one of the eigenvalues of the Q matrix is 0, which implies that this definition of the precision matrix does not induce an invertible covariance matrix.Although Q may be inverted using the Moore-Penrose pseudoinverse, we have solved this problem by using αI+(1− α)Q, instead of Q.If α is small, the neighborhood structure remains essentially unchanged.Section 3 describes our approach to specify a value for α.

Generalizing concepts to deal with multiple fields
The generalization of Q to handle multiple fields involves a Kronecker product (⊗) between S −1 and Q.For reference, a Kronecker product of A ⊗ B where Consider x and y which represent observations for two different fields of interest on a 2×2 lattice.
First, x and y are combined to form one vector v as follows: v T = (x 1 , x 2 , x 3 , x 4 , y 1 , y 2 , y 3 , y 4 ).The average covariances among these observations can be represented by a 2 × 2 matrix between the first field, x, and the second field, y: where V ar(x) = σ 11 , V ar(y) = σ 22 , and Cov(x, y) = σ 12 .Recalling that the correlation between fields 1 and 2 is defined as: ρ = σ12 √ σ11σ22 , one can show that the inverse of S is If we consider the Kronecker product in equation ( 2) when α = 0, In this last expression, one can see that the inverse of S in combination with the Kronecker product with Q includes terms involving cross products between fields.The supplemental materials carries this expression one step further by estimating the conditional mean for the the first element of v to illustrate how this element is related to itself and its neighbors across multiple fields.

A test of GMRF estimates of variance
GMRFs provide a way to approximate field and space dependencies contained in the inverse covariance matrix Σ −1 of equation ( 1) by its GMRF equivalent S −1 ⊗(αI+(1−α)Q).In this section, we will test how well GMRFs are able to reproduce observed space and field dependencies.This may be achieved by comparing field and spatial variance and covariance estimates obtained from the inverse of the GMRF estimate of the precision matrix with those obtained empirically from observational data.It turns out this comparison is sensitive to the value that is selected for α.By construction, the optimal choice of α depends only on geometric considerations of the neighborhood model that is used for GMRF and the number of grid points in the fields and not the properties of the field data.
We introduce a 'witch hat' graph that provides a compact summary of variance-covariance information between these two methods in order to show that GMRFs do a reasonable job approximating observed field and space relationships.

Finding an appropriate value of α
In the effort to compare space and field dependencies approximated by GMRF with empirical estimates we need to determine an optimal value for α.In order to carry out this comparison, we need to find the inverse of S −1 ⊗ (αI + (1 − α)Q), our proposed precision matrix based on GMRF.Using results of Kronecker products, we have that If n is the total number of grid points of the lattice, S ⊗ Q * is a 2n × 2n covariance matrix.Note that each element of diag(S ij Q * ) contains the estimated variance or covariance at each grid point for fields i and j using a GMRF where i can be equal to j.If we average these estimates across the whole lattice, we obtain G ij , the GMRF estimate of the variance or covariance for fields i and j. Therefore, where tr(Q * ) denotes the trace of Q * and Q * kk are its diagonal elements.We will now select a value for α that allows the GMRF estimate for field variances and covariances to be equal, on average, to what has been calculated for S. In order to achieve this, G ij needs to equal S ij .Satisfying this condition is equivalent to finding the solution for It may not be so obvious what the diagonal elements of Q * are.However, one can use the fact that tr(A) is equal to sum of its eigenvalues.In our case, if the eigenvalues of Q are λ 1 , λ 2 , ..., λ n , the implies that in order to satisfy equation (4), we need to find α from Figure 3 shows the relationship between various values of α and f (α).The eigenvalues used to obtain this figure correspond to the precision operator, Q, for a GMRF induced by a first order neighborhood structure and considering a 128 × 22 lattice (which is the dimension of our data).
From the figure we can see that the curve crosses the value of 1 when α is close to 0. By using linear interpolation, we determine that α is approximately 0.0026.Note that this value is independent of fields since equation ( 5) does not contain any field-specific information.

'Witch hat' comparison test
To illustrate any differences that may exist between empirical estimates of the covariance matrix Σ and its GMRF equivalent S ⊗ (αI + (1 − α)Q) −1 , we rely on a graph that shows the spatial average grid point variance and covariances as a function of distance for cells and their neighbors.
We compute the average entries of the covariance matrix corresponding to each grid cell and the corresponding element to the north or east (for the positive distances) or to the south or west (for the negative distances) relative to the main diagonal of the matrix.The zero distance case is the average of variances of the main diagonal.The cells corresponding to one or more grid cells away are mostly on entries in parallel with the main diagonal.On average, covariances decrease with distance making the graph have the shape of a witch's hat.This graph is symmetric because covariance matrices are symmetric.
Figure 4 shows a 'witch hat' test of estimated variances for air temperatures simulated by the Community Atmosphere Model version 3.1 (CAM3.1).The variances are estimated from 15 samples of two year mean summertime temperatures.Setting α = 1 provides a solution to equation ( 5), however, this will shut down the effect of Q and only the variances at the reference point (lag 0) will be well represented.On the other hand, when α = 0.0026, we allow Q to play more of a role which results in a better representation of covariances at neighboring points (lags different of zero).

Climate response to uncertain parameters
In this section we show how inclusion of field and space dependencies using GMRF affect comparisons of the Community Atmosphere Model (CAM3.1)(Collins et al., 2006) with observations.We consider CAM3.1's response to changes in parameter ke, which controls rain drop evaporation rates, and parameter c0, which controls precipitation efficiency through conversion of cloud water to rain water.For this comparison we only consider the response for the June, July, and August (JJA) seasonal mean between 30 • S to 30 • N on four variables including 2 meter air temperature (TREFHT), 200-millibar zonal winds (U), sea level pressure (PSL), and precipitation (PRECT).Experiments with CAM3.1 use observed climatological sea surface temperatures and sea ice extents.
Each experiment with CAM3.1 is 32-years in duration.
The observational data that is used to evaluate the model comes from a reanalysis product ECMWF-ERA interim (Uppala et al., 2005) for 2 m air temperature, 200-millibar zonal winds, and sea level pressure and GPCP (Adler et al., 2009) for precipitation.We make use of approximately 30 years of JJA mean fields between 1979 and 2009.For constructing S, we calculate variances from 2-year means (i.e. 15 samples).
A total of 64 experiments were completed, varying each of the two parameters within an 8 × 8 lattice.For each experiment we calculate three versions of the GMRF test statistic which we refer to as a 'cost' (equation 2).The first version is the traditional cost based on the assumption of space and field independence where the off diagonal components of S are set to zero and setting α = 1 . This approach is similar to what has been done previously for Taylor (2001).The second version of evaluating the cost takes field dependencies into account by including all components of S and setting α = 1.The third version for the cost takes field and space dependencies into account by including all components of S and setting α = 0.0026.The primary field correlations are the values of (-0.313) and (-0.219) occurring between sea level pressure (PSL) and 2 m air temperature (TREFHT), and precipitation (PRECT) and sea level pressure (PSL), respectively.Maps of the grid point correlations between these fields show a lot of structure with regions of both positive and negative correlations.Therefore, providing a mechanistic explanation of the spatially averaged correlation is not particularly meaningful.Despite losing regional information in the S matrix summary of field covariances, GMRF estimated field covariances as seen within 'witch hat' graphs are reasonable as compared to empirical estimates (see supplemental). the Andes Mountains.This finding is a result of the mathematics of GMRF.It does not imply that the large-scale errors are of lesser scientific importance.It only means that GMRF is less sensitive to large-scale anomalies, perhaps because they are associated with fewer degrees of freedom than highly structured errors.Understanding whether and how these distinctions aid model assessment needs further study.We do find it reassuring that GMRF-based metrics of distance to observations are similar, at least in the example provided, to a traditional metric.

Summary
We have developed a new test statistic as a scalar measure of model skill or cost for evaluating the extent to which climate model output captures observed field and space relationships using Gaussian Markov Random Fields (GMRFs).The challenge has been that few observations exist for establishing a meaningful observational basis for quantifying field and space relationships of climate phenomena.Much of the data that is typically used for model evaluation is suspected of having its own relationship biases introduced by the numerical model that is used to synthesize measurements into gridded products.The GMRF-based metric overcomes some of these limitations by considering field and space variations within a neighborhood structure thereby lowering the metric's data requirements.The form of the metric separates space and field dependencies using a Kronecker product that, when multiplied out, has all the terms necessary to represent how different points in space are tied together across multiple field.We also include a scalar α that weights the importance of spatial relationships between grid cells.Its optimal value turns out to be independent of the data type which aids the use of GMRFs for comparing model output to data across multiple fields.Using 'witch hat' graphs, we show a first order (nearest neighborhood) structure does an excellent job of capturing empirical estimates of field and space relationships for various lag-windows or distances.
We have applied three versions of cost that selectively turn on or off field and space dependencies

Figure 4 .
Figure 4. 'Witch hat' graphs for air temperature on a 128 × 22 lattice of the tropics from 30 • S to 30 • N. The empirical estimates are given by the solid red line.The GMRF estimate is given by the dashed blue line.

Figure 5 Figure 5 .
Figure5shows a comparison of the three versions of the GMRF-based cost for the 64 experiments within an 8 × 8 lattice.All versions of cost result in qualitatively similar results with high and low cost values roughly in the same portions of parameter space.The main difference among the versions of cost comes from taking space dependencies into account within the field-space version.In this case, extremely low values of ke result in higher metric values.Figure6examines the reasons for this by graphing the different field contributions to the GMRF-based costs for a slice where c0 = 0.0035 which corresponds to one of the rows of the lattice.By plotting everything differenced from metric values at ke = 3 × 10 −6 , one can learn that the biggest qualitative difference comes from cost values associated with 2 m air temperature.Closer inspection of differences between model output and observations of 2 m air temperature (not shown) indicates that the traditional cost is likely reflecting large-scale differences over the southern hemisphere oceans.Inclusion of space dependencies places much greater significance on smaller-scale anomalies occurring over the continents, particularly over

Figure 6 .
Figure 6.Different field contributions to the GMRF-based costs for a slice of Figure 5 where c0 = 0.0035.Cost values are relative to the default parameter setting for ke.Note that total cost (black dashed line) is a weighted sum of field contributions as given by S −1 with contributions from sea level pressure (PSL, red line), 2-m air temperature (TREFHT, green line), 200-millibar zonal winds (U, blue line), and total precipitation (PRECT, cyan line).

250
in a climate model (CAM3.1)output against observational products for tropical JJA climatologies for 2 m air temperature, sea level pressure, precipitation, and 200-millibar zonal winds.The results show subtle, but potentially important differences among these versions of the cost which may prove beneficial for selecting models that capture observed climate phenomena for the right reasons.6 Code and data availability 255 R code and data for generating Figures 5 and 6 can be obtained through https://zenodo.org/record/33765,Nosedal-Sanchez et al. (2015)