Validation of CMIP5 multimodel ensembles through the smoothness of climate variables

Smoothness is an important characteristic of a spatial process that measures local variability. If climate model outputs are realistic, then not only the values at each grid pixel but also the relative variation over nearby pixels should represent the true climate. We estimate the smoothness of long-term averages for land surface temperature anomalies in the Coupled Model Intercomparison Project Phase 5 (CMIP5), and compare them by climate regions and seasons. We also compare the estimated smoothness of the climate outputs in CMIP5 with those of reanalysis data. The estimation is done through the composite likelihood approach for locally self-similar processes. The composite likelihood that we consider is a product of conditional likelihoods of neighbouring observations. We find that the smoothness of the surface temperature anomalies in CMIP5 depends primarily on the modelling institution and on the climate region. The seasonal difference in the smoothness is generally small, except for some climate regions where the average temperature is extremely high or low.


Introduction
Smoothness is an important characteristic of a spatial process that measures local variability of the process. Due to the presence of strong spatial correlation in most climate variables, the values of a climate variable at nearby locations are considered simultaneously in many studies. Realistic climate models are expected to produce plausible values of climate variables, not only at each grid pixel, but also at its nearby grid pixels, in order to accurately describe spatial variation of the true process. Therefore, the smoothness is an important measure to validate climate models in terms of their ability to simulate fine scale spatial variability of climate variables. The aim of this paper is twofold: first, we seek to validate the spatial smoothness of a (temporal) longterm average climate variable, by season and by climate regions; and second, we compare the smoothness of climate model outputs from the Coupled Model Intercomparison Project Phase 5 (CMIP5) and the meteorological reanalysis data. The spatial smoothness is estimated by assuming isotropy within each climate region, and we assess the nonstationarity of the climate model outputs by assuming that the smoothness varies by region. We also assess the similarities in the smoothness across climate model ensembles. If all climate ensembles represent the same true phenomenon but deviate from the truth by random errors, the difference in the smoothness of ensemble realisations and reanalysis data should be negligible without any patterns across the climate ensembles.
We consider multidecadal averages of land surface temperatures in CMIP5 experiments. CMIP5 comprises a standard set of coordinated climate change experiments of 60 deterministic climate models. It is processed by the Working Group on Coupled Modelling of the World Climate Research Programme, who has gathered around 20 climate modelling groups from across the world. Outputs are archived in a common format and can be downloaded from the Program for Climate Model Diagnosis and Intercomparison web site (PCMDI, www-pcmdi.llnl.gov/). Taylor et al. (2012a) presented an overview of the experimental designs in CMIP5 and Knutti and Sedla´cˇek (2013) described detailed characteristics of climate model projections in CMIP5. The reanalysis data that we consider are from the National Centers for Environmental Prediction *Corresponding author. email: mjun@stat.tamu.edu Tellus A 2015. # 2015 M. Lee et al. This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), allowing third parties to copy and redistribute the material in any medium or format and to remix, transform, and build upon the material for any purpose, even commercially, provided the original work is properly cited and states its license. and the National Center for Atmospheric Research (NCEP/ NCAR). The NCEP/NCAR reanalysis data set is a gridded data set representing the state of the Earth's atmosphere, incorporating observations and numerical weather prediction model output. It is provided by the Earth System Research Laboratory in the National Oceanic and Atmospheric Administration (www.esrl.noaa.gov/psd/). Kalnay et al. (1996) and Kistler et al. (2001) presented the details of the NCEP/NCAR reanalysis.
We introduce the concept of a locally self-similar process, and we model long-term average land surface temperature anomalies as a Gaussian locally self-similar process. The local self-similarity is a weaker (and thus more general) assumption than the Mate´rn covariance function (a widely used covariance model that enables modelling the spatial smoothness of a stochastic process) and is also capable of modelling spatial smoothness. The estimation of smoothness is done by optimising the composite restricted likelihood for the smoothness parameter of a locally self-similar process (Stein et al., 2004;Lee, 2012). The composite likelihood is a general term for any product of marginal or conditional likelihoods (Varin et al., 2011). It has been widely used as a substitute of likelihood, when the likelihood calculation is difficult. This paper considers a product of conditional likelihoods of neighbouring observations. Since nearby observations contain most information on the local behaviour of a process, our approach balances statistical and computational efficiency in estimating the smoothness of the process. In addition to the composite likelihood approach, Gaussian Markov Random Fields form another approximation approach applicable (Rue and Held, 2005). However, we chose the composite likelihood approach since its implementation is closer to traditional statistical practice and may be more familiar to the climate science community.
The remainder of this paper is organised as follows: Section 2 describes the statistical methodology used in the estimation of the smoothness parameter; Section 3 analyses long-term average near-surface air temperature anomalies in CMIP5 and NCEP/NCAR reanalysis by region and season; and Section 4 summarises the key findings and proposes related future work. Details on the statistical methodologies are provided in the Appendix.

Locally self-similar process
Suppose that we observe a mean zero and isotropic Gaussian process, fZðsÞ; s # Eg, for E & R 3 . We assume that the semi-variogram of Z, g( ×), follows a power function around the origin. That is, for all s and u # E, with the Euclidean norm, k ×k: as ks À uk ! 0; (1) where u0(C, H), for the scale parameter, C!0, and the smoothness parameter, H # (0,1) (Gneiting et al., 2012). The notation oðjjsÀujj 2H Þ means that fcðjjsÀujjÞÀCjjsÀujj 2H g/ jjs À ujj 2H ! 0, as jjs À ujj ! 0. The scale parameter controls the overall size of the variation of the process. And the larger the smoothness parameter is, the smoother the realised surfaces of Z. A process that satisfies eq. (1) is called locally self-similar, since a self-similar process with index H satisfies cðjjs À ujjÞ0Cjjs À ujj 2H for all s and u 2 R 3 (Samorodnitsky and Taqqu, 1994;Genton et al., 2007). The index H determines the smoothness of the self-similar process, and it has been widely used as a measure of surface roughness for various natural phenomena, such as the surface of soil, surface height measurements of computer chips, etc. [see Mandelbrot and Wallis (1969) and Adler (1981) for some application examples]. Note that the quantity, 2H, is essentially the fractal index for the mean zero isotropic Gaussian process, Z (Gneiting and Schlather, 2004;Gneiting et al., 2012). A locally self-similar process is more general than a selfsimilar process, in the sense that it does not fully specify the variogram but only in a region around the origin. Indeed, mean zero Gaussian processes with many widely used parametric covariance functions are locally self-similar. These include powered exponential, generalised Cauchy or Mate´rn covariances (Gneiting et al., 2012, Table 1). Among these, the most widely used, the isotropic Mate´rn covariance, takes the following form: CovfZðs þ hÞ; ZðsÞg ¼ r 2 kh=/k n K n ðkh=/kÞ 2 nÀ1 CðnÞ ; (2) for all s and h, where G(×) is the gamma function and K n is a Bessel function of the second kind of order n. The parameters of the Mate´rn covariance function are the partial sill, s 2 !0, the range, f!0, and the smoothness parameter, n !0. The range parameter controls the rate of correlation decay with distance and the partial sill measures the size of variation of the process. A mean zero Gaussian process with the Mate´rn covariance satisfies eq. (1) with H 0n and C0s 2 2 (1 (n f (2n G(1 (n)/G(1'n) when 0 B n B1. Note that our variogram model in eq. (1) is more general than the Mate´rn model as we only specify the variogram near the origin.
In this paper, we restrict our attention to the case when 0 B H B1, under which eq. (1) becomes a statistically valid variogram. Many natural phenomena satisfy this condition. For example, Tuck (2008, p. 14, 41) studied atmospheric variability and observed that temperature is smoother than wind speed. The scaling exponent, which is the same as 2H of eq. (1), of the temperature was shown to be close to but less than unity. Lovejoy andSchertzer (1985, p. 1235) pointed out empirically that 0 B H B1 in the rate of energy transfer, buoyancy, velocity, temperature fluctuations, radar reflectivity and cloud drop volumes. North et al. (2011) found that the spatial covariance of temperature fields based on simple energy balance climate models follows the Mate´rn covariance with n 01, and that n B1 is expected due to rough landscapes. Sun et al. (2015) mentioned that precipitation amounts become smoother when summed over longer periods and they showed numerically that the smoothness of longterm precipitation amounts is less than n 00.5. We determine that the smoothness of multidecadal average near-surface air temperature anomalies is between zero and one in Section 3. One thing to note is that the estimated smoothness may depend on the grid resolution of the climate models. In the estimation procedure described in Section 2.2, the relationship, eq. (1), is applied to the number (k 03,. . .,10) of neighbouring observations. As shown in Table 1, climate models in CMIP5 have various grid resolutions. In Section 3, we check the effect of spatial grid resolution on the estimated smoothness.

Composite likelihood
To estimate the scale and smoothness parameters of a locally self-similar process, we consider the composite restricted likelihood of u. We briefly introduce the idea of composite likelihood as opposed to the likelihood method in this section. Further details on how to calculate composite restricted likelihoods are given in the Appendix.
The idea of restricted likelihood is used to estimate variogram parameters without estimating nuisance parameters such as E{Z(×)} or Var{Z(×)} (Kitanidis, 1983). It is a marginal likelihood associated with any NÁ1 linearly independent error contrasts, mean zero linear combination of the observations. Since a locally self-similar process does not fully specify the variogram, we have neither the exact likelihood nor the restricted likelihood of u. Therefore, we approximate the restricted likelihood of u by the composite restricted likelihood, similarly to Stein et al. (2004) and Lee (2012).
Let us first sketch the idea to obtain a composite likelihood. Suppose that Z( ×) is observed at N locations, {s 1 ,. . .,s N }. Let p(×; u) indicate a generic probability density function, possibly conditional density. We order the observation locations by starting from a random location, s 1 , then selecting s i to be the nearest location to any of {s 1 ,. . .,s i (1 } among the remaining locations, for i ]2. If there are two or more locations at equal distance from the set {s 1 ,. . .,s i(1 }, we choose one randomly. The likelihood of u is pðZðs 1 Þ; . . . ; Zðs N Þ; hÞ ¼ pðZðs 1 Þ; . . . ; Zðs k Þ; hÞ Â Y N i¼kþ1 pðZðs i ÞjZðs 1 Þ; . . . ; Zðs iÀ1 Þ; hÞ: Now, in order to define a composite likelihood, for each s i , define k locations in proximity of s i , among the previously selected locations as fs i;1 ; . . . ; s i;k g&fs 1 ; . . . ; s iÀ1 g, for i!k. Since closely located observations are highly correlated and informative about the smoothness of the process, the composite likelihood approximates eq. (3) Call fZðs i;1 Þ; . . . ; Zðs i;k Þg the conditioning set of the composite likelihood, where k denotes the size of the conditioning set. The composite likelihood, eq. (4), is associated with the statistical optimal property if Z follows a Gaussian process. For a Gaussian probability density, p, pðZðs i ÞjZðs i;1 Þ; . . . ; Zðs i;k Þ; hÞ is the density of the error of the best linear predictor of Z(s i ) based on Zðs i;1 Þ; . . . ; Zðs i;k Þ. Also, the approximation in eq. (4) requires O(k 3 N) operations while the likelihood requires O(N 3 ) operations. It is especially beneficial for large irregularly spaced observations where the likelihood calculation is computationally demanding.
The composite restricted log-likelihood, e rl k ðhÞ, provided in the Appendix, is defined similarly by applying the idea of the composite likelihood to the logarithm of the restricted likelihood. Our estimator, b h, is then defined as a value that maximises the composite restricted log-likelihood. We consider the conditioning set of size k 03,. . .,10 in Section 3. We assess the variance of b h by the sandwich estimator, a widely used measure of the variance of estimators from an estimating equation, r e rl k ðhÞ ¼ 0. Here, 9 denotes the vector of partial derivatives with respect to u. Then we have b h is asymptotically normal with asymptotic covariance matrix fJ n ðhÞV À1 n ðhÞJ n ðhÞg À1 ; where J n ðhÞ ¼ EfÀr 2 e rl k ðhÞg and V n ðhÞ ¼ Varfr e rl k ðhÞg: See Lindsay (1988) and Godambe and Heyde (2010) for more details.

Data
Climate model outputs from CMIP5 consist of 3 and 6 hourly, daily, monthly and annual mean values of more than 404 ocean, land and atmosphere related climate variables for decadal hindcasts and predictions. The NCEP/NCAR reanalysis data consist of 6 hourly, daily and monthly mean values of atmospheric variables from January 1948 to the most recent month. In this paper, we analyse the long-term average near-surface air temperatures measured at 2 m above ground at gridded locations on the Earth from 1979 to 2005, the time period common to all climate models in CMIP5 and the NCEP/NCAR reanalysis.
We analyse 191 ensemble runs from the 48 climate models in CMIP5 (experiment 3.2). Each climate model has 1Á25 ensemble replicates that are initialised under different or the same initial conditions but produced by different perturbed versions of the same model (Taylor et al., 2012b). Ensemble replicates are treated and interpreted independently from each other, and their spatial resolutions vary from ensemble to ensemble. Table 1 lists the climate models in CMIP5 and the NCEP/NCAR reanalysis data set used in this paper, with their grid resolutions and the numbers of ensemble replicates. The climate models are numbered in ascending order of the number of grid pixels. The model number thus represents the rank of the spatial resolution of the climate model. For the climate models with the same spatial resolutions, lower model numbers are given to the ones with smaller average estimated smoothness over the regions.
We focus on the mean surface temperatures in Boreal winter (December, January, February; DJF) and summer (June, July, August; JJA), averaged over 27 yr. That is, at each location, we use multidecadal averages of land surface air temperatures during DJF and JJA. Also, we divide the land area except for Antarctica into the 21 climate regions that are used in Giorgi and Francisco (2000). There are two main reasons for dividing the land areas into climate regions. It is common that the smoothness varies spatially in climate variables. Also, the distance between grid points becomes smaller in regions at higher latitudes. Since the estimated smoothness parameter depends on the resolution of the observed process, dividing regions where observations are separated by similar spacing is reasonable. The climate regions are shown in Fig. 1. The sizes of the regions vary from 807 to 6735 km in the north-south and east-west directions. Each region contains from 12 to 7649 grid pixels of the ensemble outputs from CMIP5, depending on the grid resolutions of the ensembles. The minimum spacing between grid locations at the equator ranges from 83 to 417 km. replicate l, during DJF and JJA, for i01 and i 02, respectively. The number of ensemble replicates varies by climate model. Let l ijl ðsÞ ¼ EfT ijl ðsÞg be the mean of the multidecadal average and e ijl ðsÞ be the anomaly (residual) at location s # D, such that T ijl ðsÞ ¼ l ijl ðsÞ þ e ijl ðsÞ:

Models
Since we focus on modelling the smoothness of the temperature anomalies, e ijl , we first filter the data to estimate the mean, m ijl , and make the anomaly field close to mean zero. Spherical harmonics, fP m n ðsin LÞ cosðmlÞ;P m n ðsin LÞ sinðmlÞj n ¼ 0; 1; 2; . . . ; m ¼0; . . . ; minð3; nÞg; where Àp=2 L p=2 is the latitude, ÀpBl p is the longitude, and P m n is the Legendre polynomial of degree n and order m, provide a natural basis for capturing large-scale spatial patterns (Stein, 2007). Because surface temperatures are closely related to altitude, we estimate m ijl by regressing on the altitude from the sea level in addition to spherical harmonics for n 012, for each climate ensemble realisation in CMIP5 and the NCEP/NCAR reanalysis. The choice of n 012 is made following the literature dealing with similar data sets (Jun and Stein, 2008;Stein, 2008;Jun, 2011Jun, , 2014. After the mean filtering through regression, we assume that o ijl in eq. (5) is a mean zero, locally self-similar Gaussian process that satisfies for s and u 2 D r , 1 2 Efe ijl ðsÞ À e ijl ðuÞg 2 ¼ C ijrl jjs À ujj 2H ijrl þoðjjs À ujj 2H ijrl Þ; as jjs À ujj ! 0, for r 01,. . .,21. The smoothness of the temperature anomalies in the NCEP/NCAR reanalysis is defined similarly. Since e ijl ðsÞ is a multidecadal average of temperature anomalies, its distribution may be close to a Gaussian distribution.
The top panels in Figs. 2 and 3 show the multidecadal average near-surface air temperature, T ijl , the estimated mean, l ijl , and the anomaly, o ijl , in the reanalysis and GFDL-CM3 data, by season. The spherical harmonics terms and the altitude capture most of the patterns in the mean, and the anomalies do not have noticeable large-scale spatial patterns. Figure 4 compares the minimum, median and maximum values of the anomalies, shown in the bottom panel of Fig. 2, by climate region and season. In all regions, the medians of the anomalies are around zero and the ranges of the anomalies are similar regardless of season and region, except for ALA and GRL. The spatial patterns of the mean and residuals displayed in Figs. 2 and 4 are similar to patterns created by other ensemble models.

Estimation of the smoothness
We estimate the smoothness parameter, H, of the anomalies of multidecadal average land surface temperature in the NCEP/NCAR reanalysis and CMIP5 by maximising the composite restricted likelihood with a conditioning set of size k, k 03,. . .,10. Figure 5 shows the changes in the smoothness estimates by increasing k. Each plot represents a climate model in CMIP5 and each curve represents a climate region in Fig. 1. Among the climate regions, WNA, SAH, NAS, AMZ and TIB are coloured. We explain the reason for choosing these specific regions after showing the estimated smoothness of the NCEP/NCAR reanalysis. The smoothness estimates become quickly stabilised as k increases. The estimated smoothness is always less than unity, regardless of the size of the conditioning set. Hereafter, we present the estimated smoothness using the composite restricted likelihood with a conditioning set of size k 05. Figure 6 maps the smoothness estimates of the temperature anomalies in the NCEP/NCAR reanalysis of the corresponding climate regions during DJF and JJA. Similarly, a smoothness map is drawn for each of the climate   Figures 6Á8 show quite a lot of regional variation in each smoothness map. The smoothness maps from CMIP5 also vary across climate models. The climate modelling institution and the climate region are the main factors that determine the smoothness of the surface temperature anomalies. The relative regional characteristics, however, do not change across climate models. Figure 9 plots the smoothness of the temperature anomalies in CMIP5 against the climate model number, i.e. the rank of the spatial resolution of the climate model. There are 21 curves, each of which represents a climate region. Again, the curves that represent WNA, SAH, NAS, AMZ and TIB regions are coloured. Generally, all the curves in Fig. 9 resemble each other. This implies that the climate model is the primary factor that determines the estimates for the smoothness, and the spatial resolution of the model has a weaker effect than the climate model on the estimated smoothness. Each climate model generates the relative regional characteristics well, while the average level of smoothness differs for each climate model. The climate models developed by NASA GISS, IPSL, and MOHC produce rougher temperature anomaly fields than do the other climate models over all climate regions during both seasons. The crosses at model number 22 indicate the smoothness of the NCEP/NCAR reanalysis, as the resolution of the reanalysis is similar to the resolution of climate model 22.
Some smoothness estimates are near the boundaries of the range of the smoothness parameter, H :0, suggesting that the estimation failed. For the NCEP/NCAR reanalysis data, we fail to estimate the smoothness over the region CNA during JJA. In CNA, there were seven pairs of neighbouring observations out of 90 observations, which temperature anomalies differ significantly. Furthermore, the failure of the estimation of CMIP5 happens when we do not have sufficient number of data, for the models with coarse grid resolution. Climate models 1 (CMCC-CESM) and 2 (HadCM3) of CMIP5 fail to estimate the smoothness of CAM, CNA, ALA or NEU during JJA or DJF. Climate model 1 has a spatial resolution of 96 )48 on the Earth, and there are only 12 and 25 observations for CAM and CNA, respectively. Figures 10Á12 show the estimated scale parameters of the multidecadal average near-surface air temperature anomalies in reanalysis and CMIP5 during JJA and DJF. The values are plotted on a logarithmic scale due to wide range of scale parameter estimates for some climate models and/or climate regions (this is for the display purposes only). The climate models developed by NASA GISS and IPSL give large estimates of the scale parameters for MED, CAS and TIB, while NASA GISS gives large estimates for AMZ, CAM, GRL and NEU. Relatively large estimates of the scale parameters occur together with relatively little smoothness. NASA GISS and IPSL produce rougher temperature anomalies in those regions. In other models and regions, the estimates of scale parameters range from 0.0002 to 1.42 (in logarithmic scale). The standard errors of the smoothness and scale parameter estimates were mostly small, except for the climate models developed by NASA GISS, MOHC, and IPSL that produce rough land surface temperature anomaly fields (not shown). appears in some climate models from CMIP5, while the others exhibit similar smoothness during JJA and DJF. The smoothness of the multidecadal average near-surface air temperature anomalies in CMIP5 depends primarily on the modelling institution and the region. Interestingly, there are strong similarities in the smoothness between the climate models generated from the same institution, which supports observations that have been reported frequently in the literature (Tebaldi and Knutti, 2007;Jun et al., 2008a, b;Knutti et al., 2010).
In the future, we plan to examine the smoothness of various climate variables, such as precipitation amount, pressure, wind speed, etc. This can give us more insights into the characteristics of climate regions and climate models. Also, CMIP5 is an archive of well-designed experiments of vast climate models. This paper analyses historical runs (experiment 3.2) of CMIP5 under which all forcings are implemented. Experiments 5.1Á5.5 of CMIP5 are composed of simulation runs of the same climate models as in experiment 3.2 but with emissions forcings with fixed or different scenarios of the carbon cycle. The carbon cycle is an important factor that affects surface temperature, as do the presence of sulphate, clouds, interactive aerosols and greenhouse gas emissions. By comparing the smoothness between the climate models with or without forcings, we may test the effect of forcings on the local variation of the surface temperature anomalies.

Acknowledgements
The authors thank the reviewer for their detailed comments that greatly helped improving the presentation of the paper. The authors also thank Dr. Michael Stein, Dr. Ramalingam Saravanan, Dr. Kenneth Bowman and Dr. Francis Zwiers for their very constructive and helpful comments that led to a number of improvements in this work. This publication is based in part on work supported by Award No. KUS-C1-016-04, made by King Abdullah University of Science and Technology (KAUST). Also, Mikyoung Jun's research was supported by NSF grant DMS-1208421. The authors acknowledge the modelling groups (listed in Table 1 of this paper) for making their simulations available for analysis, the Program for Climate Model Diagnosis and Intercomparison (PCMDI) for collecting and archiving the CMIP5 model output, and the World Climate Research Programme (WCRP)'s Working Group on the Coupled Modelling (WGCM) for organising the model data analysis activity. The WCRP CMIP5 multimodel data set is supported by the Office of Science, U.S. Department of Energy. The authors also acknowledge the Physical Sciences Division of the Earth System Research Laboratory in the National Oceanic and Atmospheric Administration for providing the NCEP/NCAR Reanalysis.

A.1. Details on composite restricted likelihood
Recall that the composite likelihood, eq. (4), factorises the densities of the error of the best linear predictor of Z(s i ) based on a few neighbouring observations preceding s i in the ordering of the observation locations. Analogously, the composite restricted likelihood factorises the densities of the error of the best linear unbiased predictor (BLUP) of Z(s i ) based on the few preceding observations in a neighbourhood of s i . More specifically, let Z 0(Z(s 1 ),. . .,Z(s N )) T . For i 0k'1,. . .,N, let B i,k (u) be a vector of length N so that W i,k (u) 0B i,k (u) T Z is the error of the BLUP of Z(s i ) based on Z(s i,1 ),. . .,Z(s i,k ). For i0k, take B i,k (u) to be a fixed matrix (independent of u) of size N)(kÁ1) with rank kÁ1 so that W i,k (u) is a set of contrasts of Z(s 1 ),. . ., Z(s k ). Then, W i;k ðhÞ e N ð0; V i;k ðhÞÞ, for V i;k ðhÞ ¼ B i;k ðhÞ T VarðZÞB i;k ðhÞ, i ]k. The composite restricted loglikelihood is: e rl k ðhÞ ¼À NÀ1 2 logð2pÞÀ 1 2 P N i¼k ½logfdetðV i;k ðhÞÞg þW i;k ðhÞ T V i;k ðhÞ À1 W i;k ðhÞ: Refer to Stein et al. (2004, Appendix B) for the equations of B i;k ðhÞ, r e rl k ðhÞ and r 2 e rl k ðhÞ that are required in the sandwich estimator of the variance of the composite restricted likelihood estimator of u. In the analysis in Section 3, we consider e rl k ðhÞ with k 03,. . .,10. The estimation of u is done by profiling out C from e rl k ðhÞ and maximising the profiled equation for H over (0,1), by a combination of golden section search and successive parabolic interpolation.