A spatiotemporal analysis of U.S. station temperature trends over the last century

This study presents a nonlinear spatiotemporal analysis of 1167 station temperature records from the United States Historical Climatology Network covering the period from 1898 through 2008. We use the empirical mode decomposition method to extract the generally nonlinear trends of each station. The statistical significance of each trend is assessed against three null models of the background climate variability, represented by stochastic processes of increasing temporal correlation length. We find strong evidence that more than 50% of all stations experienced a significant trend over the last century with respect to all three null models. A spatiotemporal analysis reveals a significant cooling trend in the South‐East and significant warming trends in the rest of the contiguous U.S. It also shows that the warming trend appears to have migrated equatorward. This shows the complex spatiotemporal evolution of climate change at local scales.


Introduction
[2] Changes in climate have significant implications for societies, future generations, the economy, ecosystems, and agriculture. Consequently, climate change, and especially its anthropogenic forcing, has been, and continues to be, the subject of intensive scientific research and public debate [The Royal Society, 2010]. Due to the complex nature of climate dynamics, understanding the response of the climate system to different forcings is a challenging problem, involving analysis of the variations and trends in long time series of atmospheric measurements and proxy records. The global mean surface air temperature is one of the most important and most discussed indicators of global change. It has risen by about 0.74 ı C from 1906 to 2005 and been attributed (mostly) to a rise in greenhouse gases [Intergovernmental Panel on Climate Change, 2007].
[3] While global mean temperature is a useful indicator of global climate change, many policy makers and the general public are more interested in whether they already feel the effects of global warming where they live. Because the background climate variability plays a much larger role on smaller spatial scales than for the global mean [Hawkins and Sutton, 2009], local and regional temperature trends are easily masked by natural temperature fluctuations, and their identification is further complicated by the fact that climate change is not happening in general in a monotonic and uniform way [Wu et al., 2007].
[4] The term trend is frequently encountered in data analysis and is one of the most critical quantities in global change research. For example, in a casual Internet search, there are presently more than 12 million items related to trend and detrending [Wu et al., 2007]. Since there is no general definition of what a trend should be [Wu et al., 2007;Franzke, 2009], different techniques have been used to identify a trend, and so, in general, a trend depends on the method one is using to identify it. The simplest definition of a trend, and the one most often used in climate research, is a straight line fitted to the data. But this may be illogical and physically meaningless in the real nonlinear and nonstationary world [Wu et al., 2007;Franzke, 2009]. Another definition of trend is a running mean of the data, which requires a predetermined time scale to carry out the smoothing operation. This has little rational basis, since in a stochastic or chaotic nonstationary process, the local time scale is unknown a priori [Wu et al., 2007]. More complicated trend extraction methods, such as regression analysis or Fourier-based filtering, are often based on stationarity and linearity assumptions and thus face a similar difficulty in justifying their usage [Wu et al., 2007]. Since the trend should be an intrinsic property of the data, the processes of determining it have to be adaptive to accommodate data from nonstationary and nonlinear processes. The recently developed empirical mode decomposition (EMD) method [Huang et al., 1998] fits these requirements.
[5] Fundamentally, climate change science concerns the identification and understanding of climate trends caused by changing external physical processes. However, it is complicated by the fact that the climate system is a complex system in which a large number of processes nonlinearly interact with each other to produce variations over vastly different time and space scales. Thus, even a stationary climate system will create apparent, so-called stochastic, trends over rather long periods of time, which need to be distinguished from that of a nonstationary external influence [Fatichi et al., 2009;Barbosa, 2011;Franzke, 2009Franzke, , 2010Franzke, , 2012.
[6] In order to do this, it is usual to compare the observed trend with the statistical distribution of possible trends from a simple stochastic null model. The most widely used null models in climatic trend analysis are Gaussian white noise [e.g., Wu et al., 2007] or an autoregressive process of first order (AR(1)) [e.g., Santer et al., 2000Santer et al., , 2008Franzke, 2009]. These two are so-called short-range dependent processes. However, there is increasing evidence that the climate system is long-range serially correlated, unlike white noise, and over a longer range than AR(1) [e.g., Huybers and Curry, 2006;Koscielny-Bunde et al., 1998;Franzke, 2010]. For this reason, we will also use a long-range dependent process in our trend analysis.
[7] In the present paper, we study the statistical properties of nonlinear annual mean local temperature trends from 1167 stations covering the 48 geographically contiguous states of the U.S. for the period 1898 through 2008. A previous study by Lu et al. [2005] analyzed an earlier version of this data set covering a shorter time period and less stations. They also tested the significance of the linear trends against a periodic autoregressive moving average process. Here we will use the updated version of this data set and test the significance of the trends against three different null models of varying correlation length and also use a nonlinear and nonstationary method to identify the trends. In section 2, we present the data, and we provide a short description of the nonlinear and nonstationary EMD method for trend identification and its application to the U.S. station data set. Then, we analyze the statistical significance of the temperature trends (section 2), and finally, we present the results (section 3) and draw conclusions (section 4).

Temperature Data
[8] In this study, we analyze the trends of monthly temperature times series at individual stations from the United States Historical Climatology Network (USCHN version 2) [Menne et al., 2009] (the data are available at http://cdiac.ornl.gov/epubs/ndp/ushcn/ushcn.html). The full data set consists of 1218 stations and temperature records are homogenized, namely processed to remove systematic changes in the bias of a climate series and to mitigate the impact of data gaps [Menne et al., 2009]. Moreover, the urbanization effect is likely small [Menne et al., 2009;Hausfather et al., 2013].
[9] In order to have same time coverage for each station, we selected 1167 stations in the 48 contiguous states of the U.S. spanning 111 years from 1898 up to 2008 ( Figure 1).

Trend Identification
[10] The trends are identified through the empirical mode decomposition (EMD) technique, developed to process nonlinear and nonstationary data [Huang et al., 1998;, and successfully applied in many different fields [Loh et al., 2001;Echeverria et al., 2001;Coughlin et al., 2004;Laurenza et al., 2012;Capparelli et al., 2011;Ouarda, 2011, 2012]. EMD decomposes a time series into a finite number of intrinsic mode functions (IMFs) and a residual by using an adaptive basis derived from the time series through a so-called "sifting" process, namely, where T denotes the temperature time series and each IMF Â j (t) and residual r m (t) are time-dependent. The "sifting" algorithm works as follows: Figure 2. Annual mean temperature record for the station of Highland, Alabama. The EMD trend is the solid red line, while the dashed lines denote the linear least square fits to the data (black) and the EMD trend (red).
[11] 1. All local maxima and minima of the time series T(t) are identified.
[12] 2. All local minima (maxima) are connected through a cubic spline as a lower (upper) envelope of the time series.
[13] 3. The signal h 1 (t) is calculated as the difference between the data and the mean of lower and upper envelopes [14] 4. h 1 (t) is identified as the first IMF Â 1 if it satisfies two criteria: (i) the number of extrema and the number of zero-crossings must either be equal or differ at most by one; (ii) at some point, the mean value of the lower and upper envelope is zero.
[15] 5. If h 1 (t) does not satisfy one of these conditions, then step 3 is repeated using h 1 (t) as the data, namely, h 11 (t) = h 1 (t)-m 11 (t), where m 11 (t) is a mean of the envelopes derived from h 1 rather than T. This procedure is iterated k times until h 1k satisfies the conditions (i) and (ii) or until the difference between the successive sifted results is smaller than a given limit [Huang et al., 1998] which in our case is fixed as thr = 0.3.
[16] 6. The first IMF is identified as Â 1 = h 1k and steps 1-5 are repeated to find the next IMF.
[17] 7. When no more IMF can be extracted, the difference between the data T(t) and the sum of all identified IMFs defines the residual r m (t). In this study, we define this residual as the EMD trend.
[18] As shown by Wu et al. [2007], the IMFs and the residual obtained by using EMD can depend on the stopping criterion for the sifting process. The ambiguity depends on the data set and in particular on the sampling rate. In  a temperature record, the main modulations are the daily oscillation (due to the difference of temperature from day to night) and the annual oscillation (due to seasonality). If we apply the EMD decomposition on temperature time series with a sampling rate less than 1 year, then a single IMF mode will turn out to be dominant compared to the rest of the IMF modes , making the iteration to identify the EMD residual very sensitive to the sifting conditions. For this reason, we decided to use the annual mean of the temperature time series for each station. Using these data, we verified that for all stations, the shape of the residuals r m (t) does not depend on the value of thr selected.
[19] The complexity of the definition of a trend can be seen in Figure 2, where we show an example of the annual mean temperature time series of the station Highland (Alabama).
[20] A linear least square fit to the data reveals an overall secular decrease with a slope of m data = (-0.9860 .172) ı C/century, which is statistically significant with respect to the null hypothesis of a white noise times series with no trend. However, the residual r m (t) of the EMD analysis is a nonlinear function of time, implying the existence of a "nonlinear trend." While a linear fit of the residual function r m (t) gives the linear slope m EMD = (-1.1850 .046) ı C/century, which is compatible with m data , the occurrence of a nonlinear trend function implies different physical and practical interpretations with respect to the linear model, with local acceleration and deceleration of the average mean temperature.

Statistical Significance Test
[21] In order to illustrate the fact that the stations experienced significant variability on many time scales, we first consider the variance of each IMF with respect to a simple white noise model. Following the arguments of Wu and Huang [2004], Figure 3 shows the theoretical spread function (dashed line) of the logarithm of the variance and the logarithm of the averaged period obtained from a white noise process using a 95% confidence level. This is compared with (diamonds) the relationship associated with each Â j (t) and r m (t) for the representative station of Highland (Alabama).
[22] This analysis shows that this station experiences variability on many time scales which are different from the simple uncorrelated noise model because the variances of the IMFs are above the 95% spread function curve with the exception of IMF with j = 2. In addition, note that the observed trend is significant at the 95% level and thus unlikely to be due to the intrinsic fluctuations of this simple null model of the background climate variability.
[24] To systematically assess the significance of trends, we use the following approach for each station: [25] 1. Estimate the slope of the residual of the empirical annually averaged temperature record obtained by EMD decomposition with thr = 0.3.
[26] 2. Estimate the parameters of the null model from the monthly temperature station anomalies.
[28] 4. Appropriately average the surrogate data so that they correspond to annual mean data.
[29] 5. Estimate the slopes of the surrogate data by EMD decomposition with thr = 0.3, i.e., in the same way as for the empirical data in step 1.
[30] 6. Compare the slope of the empirical trend with the distribution of surrogate slopes. If the empirical trend is outside the 2.5th or 97.5th percentiles of the surrogate distribution, then we identify that this station has a trend that is unlikely to have arisen from the background climate variability described by the corresponding null model.
[31] Here we use a standard approach to estimate the AR(1) parameters [Hosking, 1981;Franzke, 2010]. The parameters of the FARIMA(0, d, 0) model, which is a particular case of the standard FARIMA (p, d, q) processes are estimated by a semiparametric estimator [Geweke and Porter-Hudak, 1983;Hurvich and Deo, 1999] from detrended monthly temperature anomalies. Note that the surrogate data are generated on monthly time scales, which is less than the annual averaging period used for the empirical data. This is in order to include the effect of so-called climate noise, in which the averaging of rapid fluctuations produces variability on much longer time scales and apparent trends [Leith, 1973;Feldstein, 2000;Franzke, 2009].

Statistical Significance of Linear Slopes
[32] The results of the Monte Carlo trend analysis described above is summarized in Figure 4 and Table 1.
[33] As expected, stations with a linear slope close to zero were preferentially excluded by the null model tests, corresponding to a flat trend or an oscillating residual r m (t), as illustrated in Figure 5.
[34] (Compare with the significant trend shown in Figure 2.) Out of all 1167 stations, 900 (77%) have a significant trend against the white noise null model, 616 (52.8%) against the SRD model, and 751 (64.5%) against the LRD model. Thus, interestingly, while all stations which are significant against the SRD model are also significant against the white noise model, the number of significant trends does not simply depend on the correlation range of the null model.
[35] Overall, 593 stations (50.8% of all stations) show a significant trend against all three null models. This provides strong evidence [Franzke, 2012] for large-scale temperature change of the contiguous U.S. over the last century, confirming and extending the results of previous studies [Jones et al., 1999;Hansen et al., 2001;Lund et al., 2001;Lu et al., 2005] which tested trends against only white noise or SRD models.
[36] The average significant trend corresponds to an increase of temperature during the last century over the United States. However, the trend distributions display both significant cooling trends (negative slope) as well as significant warming trends (positive slope). These vary in location as shown in Figure 6 for significant EMD trends against all null models and in Figure 7 for trends obtained by a linear least square of the station data. Figure 6 shows that both the widespread significant warming trends and the smaller region of significant cooling trends (in the south-east of the United States close to the southern Appalachian Mountains) reported by others [Lu et al., 2005] are robust to the correlation structure of the null model, including the LRD model not previously tested. Moreover, this pattern has previously been referred to as a "warming hole" [Robinson et al., 2002;Pan et al., 2004;Leibensperger et al., 2012].
[37] To gain insight into the likelihood of finding significant trends in a region, we subdivided the U.S. into grid cells of the same size (about 400 km 2 ) and then computed the ratio of stations with significant trends against all three null models and the total number of stations in the grid cell.   Figure 8 shows the geographic variation of this probability. There is a large region covering the South-East U.S. and states like Pennsylvania and Ohio as well as regions in the North-West (parts of Washington State and Montana) which have a low probability of significant trends against all three null models, although note that these are generally greater than the assumed 5% critical probability. Furthermore, large parts of the contiguous U.S. have probabilities of 50% or higher of significant trends against all three null models.

Nonlinear Trend and Local Geographic Parameters
[38] The advantage of EMD is to obtain the residual as a time-dependent nonlinear function without any a priori assumption about its structural form (see equation (1)). For each station at a given time t, we can look for the trend slope and its derivative. This information allows us to investigate the climate trends from a point of view that is local in time. In Figure 9, we show a map which displays the temporal evolution of the time derivative of the EMD trends. The derivative is obtained numerically through second-order finite differences for each station. The stations are sorted according to the latitude (Figure 9a) and longitude (Figure 9b). Coherent structures are evident, suggesting robust relationships. In regard to the latitude, we can see an increase of temperatures starting around 1920, but with a lag of about 30 years for the stations situated below of 35 ı N. For the longitude, we can note a homogeneity in the onset time of warming, except for a middle strip from -90 ı to -80 ı where it is not possible to detect any positive (warming) trend, and including the region of significant cooling in the South-East U.S. seen in Figure 7. [39] In this study, we discussed the spatiotemporal distribution of station temperature trends over the contiguous United States covering the period 1898 through 2008. We use the EMD decomposition to extract trends in a nonstationary situation [Wu et al., 2007]. Our main results can be summarized as follows:

Conclusions
[40] 1. We defined temperature trends as the residual of an EMD analysis. The most commonly used definition of a trend, which is a straight line fitted to the data, has been compared with the straight line fitting of the residual EMD function r m (t). The EMD analysis additionally reveals deviations from linear temperature trends. Our results extend the previous works about the temperature trends [Jones et al., 1999;Hansen et al., 2001Hansen et al., , 2010Lund et al., 2001;Lu et al., 2005], providing a more detailed view of the geographic distribution of the positive (warming) and negative (cooling) slopes.
[41] 2. By comparing these trends against three different null models for the background climate variability, we have increased confidence in the significance of the observed trends. About 50% of all stations are found to be significant against all three null models. This provides strong evidence that the U.S. has experienced climate change over the last century, irrespective of the assumed correlation structure of the null model.
[42] 3. The value of trend rate, calculated only for the stations with a significant trend against to all three null models, is 0.898˙0.731 ı C/century, which compares with 1.126˙0.658 ı C/century [Lund et al., 2001] and 0.9320 .612 ı C/century [Lu et al., 2005].
[43] 4. According to Lu et al. [2005], the region spanning the South-East U.S. up to the states of Ohio and Illinois have seen a cooling trend while most other regions experienced a warming trend. Our analysis shows that these patterns are significant against three different null models.
[44] 5. Using the time derivative of the residual EMD function r m (t), we can define the instantaneous changing slope of temperatures. We found that the changing slope is well ordered by the geographical longitude and latitude of the place where measurements are taken, and suggests that the onset of warming migrated in latitude.
[45] Taken together, using an updated data set covering an extended period than previous studies [Lund et al., 2001;Lu et al., 2005], our results strengthen, support, and extend the evidence of a significant cooling in the South-East U.S. and of a dominant significant warming over most of the contiguous U.S. during the last century.