On using principal components to represent stations in empirical statistical downscaling By

We test a strategy for downscaling seasonal mean temperature for many locations within a region, based on principal component analysis (PCA), and assess potential benefits of this strategy which include an enhancement of the signalto-noise ratio, more efficient computations, and reduced sensitivity to the choice of predictor domain. These conditions are tested in some case studies for parts ofEurope (northern and central) andnorthernChina.Results show that the downscaled results were not highly sensitive to whether a PCA-basis or a more traditional strategy was used. However, the results based on a PCA were associated with marginally and systematically higher correlation scores as well as lower root-mean-squared errors. The results were also consistent with the notion that PCA emphasises the large-scale dependency in the station data and an enhancement of the signal-to-noise ratio. Furthermore, the computations were more efficient when the predictands were represented in terms of principal components.


Introduction
Some future planning and climate change adaptations require reliable information about future changes in local temperature or precipitation. Global climate models (GCMs) provide the best tools for making climate change projections (Solomon et al., 2007), but they are only designed to describe the large-scale features and do not provide sufficient details for many applications (Hellstro¨m et al., 2001). However, the local climate is linked to the large scales, and it is possible to make some inferences about local climate change through downscaling. Downscaling is often used to quantify the local response to global warming, either through the means of empiricalÁstatistical downscaling (ESD) or regional climate models (RCMs) (Christensen et al., 2007). While there are various RCMs with different abilities to simulate local climate (Linden and Mitchell, 2009), ESD too may involve many different approaches and techniques (Wilby et al., 1998a(Wilby et al., , 1998bEasterling, 1999;Zorita and Storch, 1999;Huth, 2002;Salath, 2005;Benestad et al., 2008;Brands et al., 2013;Gutman et al., 2012). Indeed, the results of downscaling may vary depending on the chosen strategy or method, and hence the sensitivity of the results to different options must be examined (Hewitson et al., 2014). A relevant question for the projection of local climate change is how results from downscaling depend on assumptions and chosen strategies.
The optimal approach to deriving the local response to a large-scale climate change varies depending on the context. In some circumstances, the objective is not just to infer the local temperature at a single location, but one may be interested in the temperature over a region. A traditional approach has been to downscale one site at a time (Benestad, 2011;Fan et al., 2013), but it is also possible to represent the stations in terms of a principal component analysis (PCA) (Strang, 1988;Press et al., 1989;Wilks, 1995). The variations in the station records are expressed mathematically as a weighted combination of spatial patterns when PCA are used as a basis for downscaling, with different weights applied at different times (see Appendix). In this article, *Corresponding author. email: rasmus.benestad@met.no Tellus A 2015. # 2015 R. E. Benestad et al. This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), allowing third parties to copy and redistribute the material in any medium or format and to remix, transform, and build upon the material for any purpose, even commercially, provided the original work is properly cited and states its license.
we examine whether the traditional approach and the multisite PCA-based approach to downscaling give comparable results, and if one outperforms the other.

Motivation for using the PCA-basis
Local manifestations of a global climate change imply a scale dependency, where the local temperature or precipitation (yðrÞ) can be expressed as a function of the large-scale region (X), geography (gðrÞ) and local processes (h): wherer is a vector containing the coordinates.
It is possible to apply ESD techniques to a group of sites such as in methods based on canonical correlation analysis (CCA) or the analogue method (Barnett and Preisendorfer, 1987;Barnston, 1995;Fernandez and Saenz, 2003;Mo, 2003), for instance using PCA to represent the multi-sites data variability according to where Y is a matrix containing a set of local time series yðrÞ, U is the left inverse that describes a set of spatial patterns describing the most pronounced spatially coherent variability (Benestad, 2013), L is a diagonal matrix holding the eigenvalues describing the prominence of each spatial pattern, and V T is the transpose of the right inverse describing how strongly present each of the spatial patterns is at a given time. The left and right inverse satisfy U T U 0V T V0I, where I is the identity matrix.
There are several potential benefits associated with PCA: (1) it could enhance the signal since it is designed to emphasise coherent spatial variations where the leading modes represent the patterns with maximum variance, (2) it accounts for the spatial covariance between the predictands at different locations, and (3) it can provide an effective means for downscaling several stations in a dense network if a reduced set of principal components (PCs) describes the large-scale dependent variability (higher order PCs may represent local noise h with no dependency to the large scales). The reduction in computational demand is valuable when the downscaling is applied to large ensembles of GCMs or RCMs, which is necessary in order to account for natural variations (Deser et al., 2012) and model differences (Allen et al., 2001;Meehl et al., 2005Meehl et al., , 2007aMeehl et al., , 2007b. The application of a PCA-basis for downscaling calls for a research question: Are the downscaled results sensitive to the choice of using PCA rather than a more traditional approach? Hence, the hypothesis that we want to test is: H 0 ¼ PCA and traditional single-station downscaling provide comparable results. There are also additional hypotheses: H 1 ¼ PCA emphasises the large-scale dependency in the station data, and H 2 ¼ PCA makes the results less sensitive to the predictor domain. Hypothesis H 1 implies that the higher order PCs represent local noise and can be addressed with the question of how many PCs are needed to recover most of the variability in observations. It is inspired by known mathematical properties of PCA (Strang, 1988;Press et al., 1989), where the leading modes represent the patterns associated with the highest variance that are usually spatially coherent. The higher order modes represent small-scale noise, and the PCA is expected to have a similar effect as filtering the data according to scales and variance.
Hypothesis H 2 may be true as a consequence of H 1 . The sensitivity of the results to the spatial domain size was examined through repeated downscaling using different domains. The assessment of the sensitivity was based on the correlation between the downscaled and observed seasonal temperature anomalies for both downscaling strategies. Rootmean-squared errors (RMSE) were also computed for each station and were used to assess the sensitivity of two downscaling approaches to the choice of predictor domain size.

Data and method
Climate can, for all intents and purposes, be defined as the statistics of weather, and in any study of climate, a statistical sample of a weather variable should be sufficiently large to get a representative estimate of the probability distribution function (pdf) or a parameter describing some characteristics of the pdf. The present study analysed seasonal mean temperature, as it was assumed that scales in time and space are connected. The statistical sample for the seasonal time scale is about 90 data points, although the real sample size is somewhat less due to autocorrelation. The mean estimates for seasons with less than 90 d of valid data were flagged as missing values. We downscaled the winter, summer, spring and autumn mean temperature estimates independently from temperatures taken from either of two different European data sets in two experiments: the monthly mean North Atlantic Climate Dataset NACD (Frich et al., 1996) for the period 1890Á1990 (not shown) and daily mean temperatures from the European Climate Assessment ECA&D (Klein Tank et al., 2002), respectively. Only the stations with complete ON USING PRINCIPAL COMPONENTS data were retained. The predictor was taken from either of two different reanalyses: large-scale 2-metre air temperature from the ERA40 reanalysis (Simmons and Gibson, 2000;Bengtsson et al., 2004) and the NCEP/NCAR reanalysis I (Kalnay et al., 1996). The predictors were represented in terms of EOFs, and the seven leading modes (for PCA-based ESD, the number of EOFs representing the predictors is unrelated to the number of PCs representing the predictands). The results were sensitive to neither the choice of predictand data set nor the predictor data.
The experiment was repeated for China to check whether the results were robust with respect to region, where the winter, summer, spring and autumn mean temperatures were estimated from daily mean temperature. The observed daily mean temperature dataset from 825 stations in China was collected from the China Meteorological Data sharing service system (www.data.cma.gov.cn for details). Only 499 (whole of China) or 58 (northern China) stations with complete data over the period 1961Á2012 were retained. The predictor was taken from large-scale monthly 2 m air temperature from NCEP/NCAR reanalysis.
To assess the robustness of the PCA-based ESD further, an additional set of downscaling was carried out for the annual wet-day mean m (with threshold for wet-day at 1 mm/ day) for 62 Norwegian rain gauge records taken from the Norwegian Meteorological Institute's climate archive (www. eklima.met.no). The set of station records for m was complete for the period 1907Á1997; however, since the NCEP/NCAR reanalysis only covered 1948Á2014, the downscaling was carried out for 1948Á1997 (50 yr). The saturation vapour pressure was used as predictor for m, and was estimated using the ClausiusÁClapeyron equation and the surface temperature taken from the reanalysis, as in Benestad and Mezghani (2015). The locations of all stations are shown in Fig. 1.
The downscaling used a stepwise multiple linear regression based on the esd-package (Benestad et al., 2014) for the R computing environment (R Core Team, 2014). 1 The downscaling analysis for the NACD data was carried out for the 1958Á1990 period and 63 different stations in the Nordic countries as well as on the east coast of Greenland, the British Isles and in the Netherlands (not shown). The downscaling that involved the ECA&D data and NCEP/NCAR reanalysis are shown here, and the data spanned the years 1948Á2013 and the region 118WÁ218E/ 468NÁ648N. The experiments were done with three different sets of PCAs (20, 10 and 4, respectively). Crossvalidation was used to reduce the risk of artificial skill, whereby the data were split into five different equally sized sub-samples and where four sub-samples were used for calibration and the remaining was used for out-of-sample evaluation.

Results
A comparison between the results from traditional singlesite downscaling applied to each of the stations separately and the PCA-based multi-site downscaling indicated similar results for all seasons. Figure 2 shows the downscaling results for ECA&D using the NCEP/NCAR reanalysis as predictor, but replacing either of these with the NACD or ERA40 gave the same results. The regression gave a good fit to all the stations and for the 20 PCs, and the results were quite robust to the choice of strategy.
The cross-validation correlation between the downscaled and observed seasonal mean anomaly temperatures was The highest score for each trial is shown in bold and the numbers in the bracket indicate the 95% confidence interval for the correlation estimate. The standard deviation of the PCA-based scores was 0.07, whereas the standard deviation for the traditional method was 0.08, suggesting that the PCA-based ESD was less sensitive to the choice of predictor size.
systematically higher for PCA-based downscaling than for traditional individual station downscaling. Where the correlation scores for the PCA-based results were 0.93, 0.80, 0.81 and 0.88 for winter, spring, summer and autumn, respectively, the traditional approach gave similar but marginally lower scores: 0.92, 0.80, 0.80 and 0.88. Table 1 shows similar scores for when the exercise shown in Fig. 2 was repeated with different predictor domain choices. The lowest crossvalidation correlation score 0.69 and was associated with the largest predictor domain and the traditional downscaling approach. The highest correlation score was 0.94 for the PCA-based downscaling and the smallest predictor domain. The standard deviation of the cross-validation correlation scores for the set of exercises based on different predictor choices was 0.07 for the PCA-based strategy and 0.08 for the traditional approach, indicating that the downscaled results were more robust for the PCA-based approach (Table 1). Figure 3 shows the downscaling repeated for smaller sets of PCs used to represent the station data. The correlation for all stations were comparable to the traditional singlesite approach even when only four PCs were retained. In both these cases the cross-validation correlation was 0.93 as in the original exercise with 20 PCs (Fig. 2), and the higher order modes had little effect on these bulk scores for the entire set of stations.
However, some stations nevertheless showed greater discrepancy with observations for reduced PC sets. Figure 4 shows a more detailed comparison between the observations (black) and the results from the two downscaling strategies (PCA-based in red and traditional in blue) for a small selection of stations. All the panels except the upper right one show the downscaled temperature for the location with  (Fig. 2). The same statistics for the summer was 0.81 and 0.80.
The PCA-based downscaling preserved to a larger extent the spatial covariance of the seasonal mean anomalies of the station data than the downscaling for the individual stations. Figure 5 shows the covariance matrix for the observations (left) and the logarithm of the ratio of the corresponding downscaled results to the observations. The middle panel which shows the results for the PCA-based downscaling and exhibits smaller values for lnðcovðx DS Þ=covðx o ÞÞ than the right panel displaying the results for the individual stations. For both cases, the covariance of downscaled results was lower than that of the observations (anomalies) because results from the regression model capture less than 100% of the variance. However, the correlation between the spatial structure in the covariance matrix was 0.99 for the PCAbased results and 0.98 for the traditional approach. Furthermore, the covariance matrix for the traditional downscaled results contained some elements with values greater than 1, also suggesting a slight degrading of the spatial covariance between the stations.

ON USING PRINCIPAL COMPONENTS
To further test H 1 , we repeated the PCA-based downscaling exercise after excluding the four leading PCA modes (Fig. 6). The multi-station-multi-year correlation for the PCAbased results was reduced to 0.08 in winter and 0.12 in summer. Hence, most of the signal connected to the large scales was embedded in the four leading PCs.
Another measure of skill that is sensitive to mean biases and the magnitudes of the downscaled results is the RMSE, where high skill is associated with low RMSE. Figure 7 shows a comparison between the RMSE estimated station-wise for the PCA-based and traditional approach, plotted against each other. The results indicate higher RMSE for the traditional approach as the points in the scatter plot tend to lie above the diagonal (Fig. 7). The RMSE scores were taken from the same tests as presented in Table 1 for a number of different domain sizes.
Using PCA to represent predictands instead of individual station records reduces the downscaling time. It took 18 s to downscale 62 stations over 65 yr and one season using the PCA-basis with 20 PCs (Fig. 2; on a laptop with a 64-bit Intel Core i7-3540M CPU running at 3.00 GHz )4 and 15.5 GiB of RAM), compared to 30 s for the more traditional approach downscaling each station individually. Tables 2 and 3 show the analysis repeated for China and northern China, respectively, and suggest a similar tendency for the PCA-based ESD to be associated with slightly higher scores as in northern Europe. Hence, the main finding that the PCA-based strategy was at least as good as the traditional method was not specific for the European region. A similar experiment carried out for the annual wet-day mean m for 62 locations in Norway gave a more even performance between the PCA-based and traditional approach (Table 4). In this set of experiments, the trials with the same predictor region were repeated trice but with different number of PCAs retained for the stations and, hence, were not independent in terms of predictor domains (the downscaled results for the traditional approach was identical in all these three trials). When taking into account the confidence estimates in Table 1, there were six trials (50%) where the PCA-based approach gave higher scores and where the respective 95% confidence intervals did not overlap and none where the traditional approach was associated with significantly higher skill than the PCA-based approach. The corresponding assessment for m in Table 4 indicated two cases where the PCA-based approach gave significantly higher scores and where the respective 95% confidence intervals did not overlap and none where the traditional approach gave significantly higher correlation.

Discussion
The downscaled results were not very sensitive to the use of PCA, and hypothesis H 0 could therefore not be rejected. However, the PCA-based skill scores were marginally higher than the results from downscaling individual stations in all the tests. Performing downscaling with different numbers of PCs retained for the station PCA also suggested that most of the dependency to the large scales resided in the leading modes. Hence H 1 also was consistent with these results. Finally, the results from the exercise with repeated analysis based on different predictor domains indicated less sensitivity for the PCA-based approach, in agreement with hypothesis H 2 .
The results presented in this study suggest that the leading PCs actually capture the large-scale dependency of local temperature variability in northern Europe. Downscaling using PCA to represent the predictand offers a way of making  more efficient calculations and accounting for the spatial covariance. Higher order PCs can describe noise and, in conjunction with weather generators, could be used to simulate stationary stochastic small-scale variability through a Monte-Carlo approach. Furthermore, the leading modes of PCA can be used as a diagnostic for large-scale dependency. In other words, using PCA to represent the predictands can be useful when the signal is weak because it will emphasise the large scales in the leading modes and hence separate the signal from noise.
The number of PCs needed to reproduce the observations is expected to vary with the chosen element, season, the extent of the area represented by the stations, and the region. This depends on the proportion of the total variance accounted for by the leading modes. In general, only a small number of PCs is needed for predictands representing spatially coherent variations as, for example, the winter temperature in northern Europe which was sufficiently represented by four PCs. In other situations, such as for the less spatially coherent precipitation, a small number of PCs may be insufficient.
The PCA-basis is not expected to be appropriate if the stations are too widely spread and involve different regional climate phenomena.

Conclusions
In summary, the results from a series of experiments suggested only marginally yet systematically higher correlation between observed seasonal mean temperature anomalies and corresponding downscaled results for PCA-based ESD. This finding was true for both cross-validated results as well as downscaled result based on the calibration data. Our results were consistent with the notion that PCA emphasises the The predictor was the NCEP/NCAR reanalysis and predictands included 58 stations for North China. Numbers in bold face mark the highest score in each test. The entries are correlation scores from the cross-validation. The predictor was the NCEP/NCAR reanalysis and predictands included 499 stations with complete time series over the period 1961Á2012. The highest score in each of the comparisons is shown in bold.
large-scale dependency in the station data and therefore gives a better reproduction of the original data. We also found the PCA-based approach to be more efficient and less sensitive to the choice of predictor domain than corresponding analysis for each station individually. Our conclusion is therefore that ESD can benefit from the use of PCA as a means of accounting for the predictands.

Acknowledgement
This work has been supported by COST-VALUE STSM, the Norwegian Meteorological Institute, CHASE-PL (EEA grant PL12-0104), SPECS (EU Grant Agreement 3038378), Swedish MERGE, BECC and VR. It was also meant to contribute towards efforts within CORDEX-ESD experiments.

Regression
The regression for the PCA-strategy was similar to Benestad (2013): b The regression was applied to each principal component in the same fashion as it was applied to each station record for the traditional downscaling strategy. The main difference was that the original temperatures were recovered from the singular value decomposition (Strang, 1988;Press et al., 1989) used to estimate the EOFs: where X represents the data matrix holding the original seasonal mean temperature, U holds the EOFs, L the eigenvalues, and V the principal components (PCs). The difference between the PCA-based and traditional strategies was that in the former, y j in eq. (A1) was taken to be the respective PCs [columns of V in eq. (A2)] whereas in the latter was the original data [columns of X in eq. (A2)]. The area of a sector was taken as A 0f acos(F)ad U Â Dh=ð2 Ã pÞ, where a is the mean radius of the Earth (6378 km), u is the longitude of the western and eastern boundaries in radians and F is latitude of the southern and northern boundaries. This integral can be solved analytically, and the area was computed according to: A ¼ a 2 ½sinðU max Þ À sinðU min ÞDh: (A3)