Ecological forecasting under climatic data uncertainty: a case study in phenological modeling

Forecasting ecological responses to climate change represents a challenge to the ecological community because models are often site-specific and climate data are lacking at appropriate spatial and temporal resolutions. We use a case study approach to demonstrate uncertainties in ecological predictions related to the driving climatic input data. We use observational records, derived observational datasets (e.g. interpolated observations from local weather stations and gridded data products) and output from general circulation models (GCM) in conjunction with site based phenology models to estimate the first flowering date (FFD) for three woody flowering species. Using derived observations over the modern time period, we find that cold biases and temperature trends lead to biased FFD simulations for all three species. Observational datasets resolved at the daily time step result in better FFD predictions compared to simulations using monthly resolution. Simulations using output from an ensemble of GCM and regional climate models over modern and future time periods have large intra-ensemble spreads and tend to underestimate observed FFD trends for the modern period. These results indicate that certain forcing datasets may be missing key features needed to generate accurate hindcasts at the local scale (e.g. trends, temporal resolution), and that standard modeling techniques (e.g. downscaling, ensemble mean, etc) may not necessarily improve the prediction of the ecological response. Studies attempting to simulate local ecological processes under modern and future climate forcing therefore need to quantify and propagate the climate data uncertainties in their simulations.


Introduction
There is wide interest within the global change research community in using model projections of future climate to forecast ecological responses to climate change. To date, the vast majority of such studies have used large scale, globally calibrated vegetation and carbon cycle models (e.g., [1,2]) that have input requirements compatible with the typical spatial (hundreds of kilometers) and temporal (monthly) resolution of climate model output. These models are most commonly used to investigate changes in vegetation dynamics and carbon storage over large spatial scales (regional to global), with the implicit understanding that these results will have much lower fidelity and utility at any single location or grid cell. Increasingly, however, researchers are using output from global climate models to force site-specific ecological models that predict species phenology, fecundity, and population dynamics (e.g. [3][4][5][6][7][8][9][10]). This is often done without quantifying the fidelity of the climate simulations at the study location, and typically using output from only a few (or even a single) climate models.
Calibrating and validating these ecological models using climate observations presents a unique challenge to the ecological research community. In many cases, local climate data are simply not available at the same location as ecological observations (e.g. [11]). As an alternative, data from nearby weather stations may be used for model calibration and validation, assuming that these stations will be representative of the climate at the location of the ecological observations. If contemporaneous nearby weather stations are not available, there are a variety of secondary, derived data products that can be substituted. These products are interpolated from sparse observations to uniform and continuous spatial and temporal grids using statistical or process based models (e.g. [12,13]). The usefulness of these interpolated datasets for local modeling, however, can be limited by the quantity and quality of the underlying observations, their often coarse temporal (e.g. monthly) and spatial (e.g. hundreds of kilometers) resolutions, and the nature of the variable itself (e.g. continuous versus discrete). While global vegetation models can be constructed relatively easily using these coarse global gridded climate datasets, localized models typically require climate data at much finer spatial and temporal resolution than either these gridded datasets or climate models can provide.
We use a case study approach to highlight uncertainties in site-specific ecological modeling, focusing on the climate inputs used to drive ecological models for both the modern period and future climate change scenarios. For our ecological models, we develop three species-specific models of first flowering date (FFD). Phenology models (especially FFD) are one of the most commonly used and easily understood ecological models, due in part to their relatively modest data requirements (e.g. daily temperatures), availability of long-term datasets, and reasonably high understanding of the underlying biological processes. This case study will address three questions: • How well do the FFD models reproduce the observations when forced with various observational climate datasets? • How well do the FFD models reproduce the general trends and variability when forced by climate model simulations of the twentieth century? • What is the range of FFD predictions for a future climate scenario that tracks current greenhouse gas emissions?
Observations of FFD are obtained from the Mohonk Preserve at Mohonk Lake, NY (41.77 • N, 74.15 • W, 380 m ASL) [14], a site with an accompanying high quality local record of daily temperature and precipitation [15]. The preserve itself is located roughly 135 km north of New York City and covers an area of approximately 2800 hectares. The combination of both local weather and ecological data at this site represents a significant and rare opportunity for evaluation, allowing the use of local phenology and climate data as a benchmark to evaluate the sensitivity of ecological forecast models to the use of alternative climate forcing data.

Climate and phenology at Mohonk Lake, NY
To parameterize the phenology models, we use records of FFD and daily temperature from the Mohonk Preserve. For the last century, the Mohonk Preserve has recorded and maintained records of plant species phenology (FFD) [14] and local climate (daily temperature and precipitation) [15]. The climate record extends from 1896 to the present and is remarkably homogeneous, largely free from biases related to instrumentation changes, observer methodology, or alterations to the surrounding environment. The phenology records begin in the 1920s using consistent observation routes [14]. If, for a given year, a species was not observed to flower at these locations, flowering for that species was not recorded. This results in the occasional missing data point, but helps to ensure high data quality. We selected FFD observations for three woody perennial plant species: Amelanchier arborea (shadbush), Kalmia latifolia (mountain laurel), and Sambucus racemosa (red berried elder).
Each species has at least 30 years of observations, and some of the highest sensitivity to climate of all species at Mohonk Lake, as established in a previous study [14].

Observational climate records
To evaluate the impact of using various datasets to drive the phenological models, we implement temperatures from five observationally derived datasets to assess the ability of the FFD models to reproduce the observations (table 1). In this study, we define these data sets as observationally derived to indicate that some manipulation is required to convert the original, raw observations to site-specific climate observations. This could include spatial interpolation, temporal interpolation, or other statistical smoothing such as modifying the temperature records to account for elevation.
The FFD models are initially calibrated and validated using the local Mohonk Lake climate record (OM), which serves as our baseline best case scenario. The second dataset is derived from the CRU version 2.1 climate grids (CM), which are statistically interpolated from individual station records to uniform 0.5 • × 0.5 • grids at monthly mean temporal resolution [13]. From this dataset, we select the grid cell that contains the Mohonk Lake site and perform a simple temporal repeating of the monthly mean values for each day of the month. While there are many other techniques for temporal downscaling (e.g. weather generators), we implement this simple method here to highlight the differences between monthly and daily data. The third dataset is daily data from nearby stations in the Global Historical Climatology Network (GHCN) [16], interpolated to the Mohonk Lake location from nearby stations using Kriging (KM); a fourth, alternative dataset is identically calculated, but also includes elevation as a covariate in the model (KME). Finally, the fifth dataset is the North American Regional Reanalysis (NM) [17], dynamically downscaled to 36 km spatial resolution from the NCEP-NCAR reanalysis data [12]. The NARR data are also available at a daily resolution, and for our study we use the nearest grid cell to the Mohonk Lake site. For our observational comparison, we use the period 1981-2002 as a common validation interval across all datasets.

Global and regional climate model output
Output from nine general circulation models (GCM) is taken from the World Climate Research Programme's Coupled Model Intercomparison Project (CMIP3) database [18]. We select models for which daily data are available for both the modern 20C3M (coupled ocean, twentieth century climate forcings) and future SRES A2 (coupled ocean, high greenhouse gas emissions) scenarios. We use mean surface temperatures from the land-based grid cell closest to the Mohonk Lake site. Output for the four regional climate models (RCM) is obtained from the North American Regional Climate Change Assessment Program (NARCCAP) [19]. These are limited area regional simulations over North America, forced by boundary conditions taken from GCM simulations in the CMIP3 database. The potential advantages of using RCMs include a higher spatial resolution and specific, regional parameterizations that may improve the fidelity of the climate simulation at Mohonk Lake. Our RCM simulations use the same modern and future climate forcing scenarios as the GCMs (20C3M and SRES A2). To account for biases inherent in most climate simulations, all temperature output from the GCM and RCM simulations (20C3M and SRES A2) is converted to anomalies relative to the model mean for 1971-95. These anomalies are then added to the mean from the Mohonk Lake record (OM) for 1971-95. In this way, the mean climate of Mohonk Lake is preserved in the GCM/RCM temperature output, while allowing the models to superimpose their own trends and variability.

Phenology model calibration and validation
For our phenology model, we iterate the following series of equations through each day of the year, starting from day of year (doy) DayStart: where doy is the current day of the year (always greater than DayStart), T mean (doy) is the mean temperature on doy, GDD(doy) is the growing degree day value for the current day of year, GDD thresh is the baseline temperature for a day to be considered a GDD, GDD sum (doy) is the cumulative sum of GDDs from DayStart to doy, and F crit is the critical GDD sum threshold required for FFD to be triggered. On the day GDD sum matches or exceeds F crit , the FFD for that species is recorded. Values for three parameters (DayStart, GDD thresh , F crit ) are estimated separately for each of the three species using a root mean square error minimization procedure (table 2). We use 1928-80 as the calibration interval and reserve years 1981-2002 as the common validation interval across all observationally derived climate records. We also performed a parallel model calibration incorporating a fourth parameter (chilling day requirements), but found that any improvement in model fit did not sufficiently compensate for the addition of a fourth parameter, as determined using Akaike's Information Criterion (AIC). For the observational climate data comparison, these models are forced with varying observation temperature inputs over the common validation interval . For the GCM/RCM comparison, these models are forced over modern (1971-95) and future (2046-65) intervals for which the climate model output is available.

Observationally derived climate data
Temperature, via the GDD and GDD sum variables, is the primary driver for our FFD models. All datasets show significant positive trends in annual (not shown) and half year (1 January-30 June) averaged mean temperatures (figure 1, left panel). Mean temperatures calculated from the local climate data (OM) have a stronger positive trend and are slightly warmer than some of the other datasets, especially towards the end of the validation period. Abrupt and transient cooling following the 1991 eruption of Mount Pinatubo is apparent in all datasets, and the interannual variability is generally coherent between datasets. For GDDs accumulated over the first half of the year ( figure 1, right panel), the observationally derived datasets tend to be much colder. Additionally, while the trends are similar if calculated for the first part of the validation period (1981-91), only OM shows a strong positive (warming) trend in GDDs over the entire interval, indicating large differences in the ability of the different observationally derived datasets to reproduce trends in the important driving variable for the FFD models. The reason for the biases in the mean state and trends in the alternative datasets is unclear. Possibilities include micro-site effects (e.g. land cover changes) or methodological issues related to data recording (e.g. instrumentation, station moves) and data processing (e.g. quality control, interpolation techniques). Regardless, these differences do showcase the potential pitfalls in using non-local climate data for highly localized modeling applications, and strongly suggest that care must be taken in making any assumptions about the suitability of alternative data products.

Phenology simulations: observational forcing
Because of the cold bias in GDDs in the observationally derived datasets, our FFD simulations are biased towards later than observed predictions of FFD (figure 2, left panels). The fairly uniform bias across datasets translates to an ensemble median that is also biased, counter to prior climate model studies suggesting that ensemble means or medians may provide better predictive capability than any single ensemble member [20,21]. We quantify the model matches with observations over the validation periods using Taylor diagrams [22]. All simulations capture interannual variability in the FFD, with varying degrees of success, quantified using a Pearson correlation (figure 2, right panels). Notable events captured by all simulations are delayed FFD during 1992 (associated with cooling from the Mt Pinatubo eruption) and earlier FFD during 1998 (one of the warmest years on record). For all three species, the lowest fidelity (based on Pearson) comes from the monthly dataset, CM, that required downscaling to daily resolution. While results may improve when using a more detailed temporal downscaling technique, these results highlight the need for daily temperature resolution. All other datasets show similar correlations; however it is important to note that all are biased cold relative to OM, suggesting that the seasonal dependence of many ecological forecasting models are subject to the biases of the driving datasets. If the proper seasonal forcing is not accurately represented in the forcing datasets, then the subsequent model response will not accurately reflect the local observations.

Phenology simulations: climate model forcing
GCM simulations from the CMIP3 archive include a dynamic, prognostic ocean, resulting in internally modeled sea surface temperatures (SST) and interannual climate variability that will not necessarily match with observations. These simulations are still useful, however, for understanding how GCMs reproduce the average dates and general trends in FFD over both modern and future intervals.
For the modern interval, two of the three species (Amelanchier arborea and Kalmia latifolia) show decreasing trends over time towards earlier FFD dates (black diamonds, figure 3). For these two species, the GCM and RCM simulations generally capture the mean timing in FFD, as well as the trends observed at the site, although the model ensembles generally have a large spread biased towards later than observed FFD and underestimated FFD trends. In the GCM ensemble, some ensemble members even indicate an opposite, positive trend in FFD. There is no notable improvement in the higher resolution RCM simulations in reproducing the ecological response, although this may be attributed to the smaller ensemble size in the RCM ensemble. Of the three species, only Sambucus racemosa has positive trend during the modern period, which is not reproduced by the GCM and RCM ensemble simulations. Some of the GCM and RCM ensemble members do reproduce the positive trend during the modern period, and this again highlights that the multi-model ensemble mean or median may not be the most appropriate implementation for ecological studies. With a warmer climate over the future interval (2046-65), both the GCM and RCM ensembles shift towards earlier FFD dates. Trends in FFD calculated over 2046-65 in the GCM ensemble also shift towards more uniformly negative trends, although the ensemble spread is still quite large. The ensemble spread of the mean and trend is lower for the RCM simulations compared to the GCM simulations; again, this may be due simply to sample bias because of the lower RCM ensemble size. In addition, the robustness of these trend calculations may be in question because of the limited intervals over which we calculated the trends. We note that, for the future interval especially, we were limited in our choice of models and intervals by the availability of daily temperature output from the IPCC model archive and NARCCAP.

Discussion and conclusions
In any modeling study, there are multiple sources of error and uncertainty that need to be characterized. Here, we illustrate some of the uncertainties contained within climate data used to drive highly localized ecological models. For the ecological community, these uncertainties are key because: (1) climate data are often not available directly at the sites of ecological observations; and (2) there is often a large disconnect in scales (spatial and temporal) between highly localized ecological data and the climate data used to develop and drive these local ecological models. This becomes increasingly important in areas with complex local climatological conditions (e.g. microclimates) and when the climate variable of interest has low spatial covariance. In addition to these scaling issues, there are also uncertainties in the ability of globally (or even regionally) parameterized climate models to adequately simulate features of highly localized climates. Understanding and quantifying these uncertainties is a crucial task for fully understanding the robustness of various ecological and climatological forecasts.
Our observational data comparison showcases some of the inherent risks in using non-local climate data, including biases and the potential for strong disconnects between climate and ecological variables. For example, even though all the observational data sets show a similar trend in seasonally resolved mean temperatures, none reproduce the warming trend in GDDs observed in the local climate record (OM). Additionally, temporally downscaling from monthly to daily observations for the FFD model (the CM dataset) generally resulted in the lowest fidelity simulations. For models that require high temporal resolution input data, as many localized ecological models do, available climate data may be a significant limitation because of the limited availability of quality daily data and the potential problems associated with using coarser temporal scales to calculate or represent higher resolution inputs (e.g. GDDs). The FFD simulations using GCM and RCM output also showcase the importance of showing the ensemble spread, in addition to the ensemble mean or median. Showing only the mean or median may give false confidence in the robustness of the modeling results, masking what may be a large spread in ecological variables. Recent work using Bayesian methods to quantify and propagate uncertainty in climatology and ecology (e.g. [23][24][25]) holds promise to provide even more robust methods to account for the structural and parametric uncertainty between and within empirical and dynamic models.
Based on our case study, we recommend that local ecological modeling studies: (1) use a suite of observational climate datasets if local climate data are not available; (2) use caution when selecting datasets that cannot explicitly resolve the temporal resolution of needed input variables; and (3) present not only the ensemble median or mean but also the full ensemble spread. Considering these issues will increase awareness of the uncertainties surrounding projections of ecological responses to climate change, and allow the community to more readily assess the robustness of various predictions.