Data-based modelling and environmental sensitivity of vegetation in China

A process-oriented niche specification (PONS) model was constructed to quantify climatic controls on the distribution of ecosystems, based on the vegetation map of China. PONS uses general hypotheses about bioclimatic controls to provide a “bridge” between statistical niche models and more complex process-based models. Canonical correspondence analysis provided an overview of relationships between the abundances of 55 plant communities in 0.1 ◦ grid cells and associated mean values of 20 predictor variables. Of these, GDD0 (accumulated degree days above 0 C), Cramer–Prenticeα (an estimate of the ratio of actual to equilibrium evapotranspiration) and mGDD 5 (mean temperature during the period above 5 C) showed the greatest predictive power. These three variables were used to develop generalized linear models for the probability of occurrence of 16 vegetation classes, aggregated from the original 55 types by k-means clustering according to bioclimatic similarity. Each class was hypothesized to possess a unimodal relationship to each bioclimate variable, independently of the other variables. A simple calibration was used to generate vegetation maps from the predicted probabilities of the classes. Modelled and observed vegetation maps showed good to excellent agreement ( κ = 0.745). A sensitivity study examined modelled responses of vegetation distribution to spatially uniform changes in temperature, precipitation and [CO 2], the latter included via an offset toα (based on an independent, databased light use efficiency model for forest net primary production). Warming shifted the boundaries of most vegetation classes northward and westward while temperate steppe and desert replaced alpine tundra and steppe in the southeast of the Tibetan Plateau. Increased precipitation expanded mesic vegetation at the expense of xeric vegetation. The effect of [CO2] doubling was roughly equivalent to increasing precipitation by ∼ 30 %, favouring woody v getation types, particularly in northern China. Agricultural zones in northern China responded most strongly to warming, but also benefited from increases in precipitation and [CO 2]. These results broadly conform to previously published findings made with the process-based model BIOME4, but they add regional detail and realism and extend the earli r results to include cropping systems. They provide a potential basis for a broad-scale assessment of global change impacts on natural and managed ecosystems.


Introduction
The patterns of vegetation and primary production on large spatial scales are primarily controlled by climate.This generalization applies not only to natural and semi-natural ecosystems, but equally to the viability of different agricultural systems.It is useful to model the potential geographic distribution of vegetation types in terms of bioclimatic variables expressing the requirements of plants for warmth and moisture.The same approach can be applied at the level of biomes, vegetation types or species and can be carried out for agricultural systems or specific crops.The value of modelling potential distributions -i.e.those distributions that would be H.Wang et al.: Data-based modelling and environmental sensitivity of vegetation achieved in equilibrium with climate -is that it allows directions of change in response to environmental changes to be characterized irrespective of lags in the establishment of new vegetation types, or in the responses of agricultural systems to changed conditions.Dynamic modelling of natural changes in vegetation is well established, yet there are still major differences among models (e.g.Sitch et al., 2008), and dynamic modelling of land use change is at an early stage (e.g.Rounsevell et al., 2012).The usefulness of dynamic models depends strongly on their ability to correctly predict directions of change, i.e. the potential distribution to which the dynamic processes are tending.With huge advances in the availability of relevant observations to constrain models, and in the size of problems that can now be tackled using statistical methods, there is considerable scope to develop relatively simple models, which are informed by process understanding but also firmly based on observations (see e.g.Smith et al., 2013).
Anthropogenic climate change is beginning to shift the potential and actual spatial patterns of ecosystems and species.This much is clear from thousands of observations worldwide of both expansions and contractions in species' observed ranges and phenologies (Rosenzweig et al., 2007).Climate change is also creating conditions whereby certain types of agriculture are increasingly marginal in some regions (e.g.wheat growing in southwestern Australia (Howden et al., 1999) and paddy rice planting in northern China (Li and Wang, 2010)).In other regions, new modes of agriculture may be starting to become viable.These changes so far have been subtle because climate change has been relatively small in a global context, especially when compared with natural interannual variability.But some degree of continuing climate change is unavoidable, and its effects are expected to become increasingly prominent during the coming decades (Prentice et al., 2012).It is useful to foresee at least the direction of such effects, even if their eventual magnitudes remain largely unpredictable.
The concentration of carbon dioxide ([CO 2 ]) itself has been recognized as a potentially important non-climatic factor that is already shifting vegetation patterns through its effect on the competition between woody and herbaceous plants through an increase in the water use efficiency of C 3 plants, in addition to its effects on climate through the greenhouse effect.This physiological effect of CO 2 is a prominent candidate to explain "woody thickening", the tendency for trees and shrubs to increase in abundance at the expense of grasses, as has been observed in savannas worldwide (Prentice et al., 2011).CO 2 concentration also has an enhancing effect on the growth and yield of C 3 crops, although the rise in [CO 2 ] to date is thought to have been only a relatively minor factor in increasing crop yields (compared with crop breeding and other technological advances; Easterling et al., 2007) and the positive effects may increasingly be offset by negative effects of warming, especially in hot climates (Antle et al., 2002).
So-called niche models or species distribution modelsempirical models fitted to species presence or abundance data as a function of climate variables using statistical methods -have been widely and successfully used to describe the relationships between species distributions and climate (Peterson, 2001), and more controversially to project responses to future environmental changes (Warren, 2012;Pearson and Dawson, 2003).One key limitation of niche models as usually implemented is that their responses are strongly dependent on the choice of environmental predictors (Peterson, 2001).The selection of predictors is usually somewhat arbitrary.Such models often rely on ordinary meteorological summary variables (such as mean annual precipitation) that are only indirectly related to the environment "experienced" by the biota.Alternatively, in the BIOCLIM strand of modelling, an attempt is made to represent bioclimate, but this is done through the use of a set of ad hoc combinations of variables such as "annual temperature range" and "precipitation of the warmest quarter" (Beaumont et al., 2005).The problem of equifinality in the choice of predictors is not entirely avoidable, because correlations among different aspects of climate mean that the "correct" predictors cannot be chosen unambiguously on the basis of empirical correspondences alone.It is therefore valuable to make use of basic process understanding of the mechanisms controlling species' viability (Harrison et al., 2010) to derive composite bioclimatic variables expressing different aspects of the environment.Furthermore, it is reasonable to assume that different types of environmental requirements (for example, for warmth and moisture availability) act independently.This assumption allows a considerable simplification of the modelling process (e.g.Sykes et al., 1996).A further limitation of niche models as usually applied is that they do not include the modifying effects of changes in CO 2 concentration, even though these are potentially very important (Keenan et al., 2011) and are absolutely required in order to account for the nature of observed, major vegetation changes over glacial-interglacial cycles (Harrison and Prentice, 2003;Prentice and Harrison, 2009;Prentice et al., 2011;Bragg et al., 2013).
In this paper we adopt an approach that we call "processoriented niche specification" (PONS) because it provides a "bridge" between empirical statistical modelling and complex mechanistic modelling.The PONS approach is based on the following underlying assumptions: (a) that the distributions of ecosystems and species can be represented as the consequence of a small number of bioclimatic controls; (b) that these controls can be represented by "bioclimatic variables" derived from climate data; (c) that each control (when correctly represented by a bioclimate variable) acts independently, so that distributions can be represented through the multiplication of functions describing relationships between probability of occurrence and each variable independently; and (d) that these relationships can be adequately represented by either unimodal or monotonic functions.These principles underlie earlier work by Sykes et al. (1996) and a more recent study by Gallego-Sala and Prentice (2013), who modelled the world distribution of the blanket bog biome and its response to climate change based on three independent bioclimatic limits.Here we apply a PONS approach to model the natural and managed vegetation of China, and we demonstrate a novel method by which CO 2 effects can be incorporated into a niche model.The approach is innovative in combining a well-established technique in statistical niche modelling (multiple logistic regression, which is closely related to the popular maximum entropy method; Phillips et al., 2006) with algorithms used to estimate bioclimatic variables for dynamic vegetation and biogeochemistry modelling, and a process-based method to take account of CO 2 concentration effects.By fitting models with linear and quadratic (but not interaction) terms for each predictor, our modelling approach is consistent with Boucher-Lalonde et al. ( 2012), who showed that the probabilities of occurrence of tree species in North America could be predicted with high efficiency from independent Gaussian functions of climate variables.

Study area
China, with the third largest land area of any country, contains almost the entire range of world vegetation types from rainforest to desert and from tropical to alpine vegetation (Fig. 1).Furthermore, due to its large population and rapid economic development, China is a key region where it is important to identify potential risks and opportunities both for the productivity and carbon storage of natural ecosystems and for the production of food in agricultural systems.
Temperature variables generally decrease from south to north in China.Elevation changes interrupt the latitudinal trend, notably the extremely high and cold area of the Tibetan Plateau, and some low-lying and hot areas in the northwest.Moisture supply from the Pacific and Indian oceans gradually declines from the south towards the north and the interior (Fig. 2).The natural vegetation patterns reflect these temperature and moisture gradients (Fig. 1).The diversity of agricultural systems is also closely related to climate.For example, the Loess Plateau (see Supplement Figure for the distribution of geographical regions in China) and some valleys in the interior of western China experience a moisture regime similar to that of northeastern China, but are much warmer in winter, allowing orchards to grow as well as annual crops (Fig. 1).However in north China especially, irrigation is extensive, and it extends the climatic range of temperate crops towards drier regions provided there is a local supply of water for irrigation.

Data
The baseline climatology data were derived from records of mean monthly temperature, precipitation and percent-  Hancock and Hutchinson, 2006).We selected the following 20 predictor variables, calculated as in Prentice et al. (1993) and Gallego-Sala et al. (2011), for an initial exploratory multivariate analysis: -mean temperature of the coldest (MTCO,  growing season, taking 0 • C and 5 • C respectively as the base temperature -total (PAR 0 , mol photon m −2 ; PAR 5 , mol photon m −2 ) and mean (mPAR 0 , mol photon m −2 ; mPAR 5 , mol photon m −2 ) photosynthetically active radiation during the growing season, defined as the period above 0 • C and 5 • C respectively -annual equilibrium evapotranspiration (PET, mm a −1 ), moisture index (MI, dimensionless) defined as the ratio of MAP to PET, and the Cramer-Prentice plant-available moisture index (α, dimensionless) (Prentice et al., 1993) calculated as the ratio of annual actual to equilibrium evapotranspiration.Actual evapotranspiration is calculated by a soil moisture-accounting scheme as the daily integral of the lesser of a demand term (PET) and a supply term which is 1 mm h −1 times current soil moisture (as a fraction of available water-holding capacity).The calculation is based on monthly climatological data, interpolated to a daily time step, and repeated over a sequence of years until the annual cycle of soil moisture converges.
-Three additional dimensionless variables were selected to reflect the seasonal concentration (seasonality) and the sine (Timing sin ) and cosine (Timing cos ) of the timing of maximum precipitation (Harrison et al., 2003).
-We also included two variables (NPP LUE and NPP WUE , in g C m −2 a −1 ), which are estimates of annual potential net primary production (NPP) based on the light-and water-use efficiency models (Wang et al., 2012), fitted independently to an extensive forest production data set (Luo, 1996).
The digitized vegetation map of China at the scale of 1 : 1 million was used to obtain vegetation data.Fifty-five plant communities (48 natural plant communities and seven cropping systems) were aggregated into 16 vegetation classes (Table 1) based on their bioclimatic context by k-means clustering (Hartigan and Wong, 1979) of the 20 predictor variables.We performed the clustering with k set at different values, and selected k = 12 as a compromise between detail and tractability.The k-means method, in common with most clustering algorithms, tends to avoid identifying classes with very few members.Some further adjustments were accordingly made.A few vegetation classes with anomalous climatic distributions within each of the 12 machine-identified clusters were separated.The temperate desert communities from two k-means clusters were merged together as one vegetation class."Temperate needleleaf forest" was identified as a single-type vegetation class due to its unique bioclimatic space (low to median energy, medium to high moisture).(Names of vegetation types here refer to the English-language legend of the vegetation map.) "Temperate microphyllous deciduous woodland" and "temperate grassforb community" were all considered to be temperate xeric vegetation, with quite similar bioclimatic spaces, and were grouped together.Three cultivation systems ("one crop annually and cold-resistant economic crops"; "one crop annually, cold-resistant economic and deciduous orchards"; and "three crops two years and two crops annually non-irrigation, deciduous orchards") in north China were separated from the clusters to which they had been assigned, as the climatic moisture range occupied by these types (due to irrigation) exceeds that of otherwise bioclimatically similar natural vegetation types.Except for "one crop annually short growing period cold-resistant crops", with a similar climatic space to the temperate deciduous complex, all the other cultivation systems are typical of south China and share the same bioclimatic space as the natural plant communities there.These cultivation types were therefore kept within their machineidentified vegetation classes (tropical monsoon forest complex or subtropical forest complex).Accurate fractional areas of each class were extracted in ArcGIS from the digitized vegetation map at 0.1 • grid resolution.The bioclimatic data were up-scaled from the original 0.01 • grid to the same 0.1 • grid by simple averaging.

Data analysis
Canonical correspondence analysis (CCA; Ter Braak and Prentice, 1998) was carried out based on 94 510 records for the 16 vegetation classes and (as predictors) the 20 predictor variables, for an initial exploration of the relationships between climate and vegetation.Non-vegetation grid cells (the white area in Fig. 1), such as glaciers, bare ground, and lakes, were excluded.The Akaike information criterion was used to select the three most important among the bioclimatic variables for further analysis.
Generalized linear modelling (GLM) was applied to each vegetation class separately, using the logit link function and assuming a binomial distribution of the class frequencies.This analysis is equivalent to multiple logistic regression.The input data were the frequency of the class, and values of the three selected bioclimatic predictors, at the grid cells.In each case, we fitted an initial model including linear and quadratic terms for each bioclimatic variable.In some cases, one or more terms were excluded after this initial step, and a new model fitted.Three criteria determined the exclusion of particular terms: (a) Terms whose inclusion led to unrealistic partial relationships.This situation was sometimes encountered due to high correlation between mGDD 5 and GDD 0 in both the coldest and the warmest range of climates.The inclusion of both predictors resulted in sparse alpine vegetation, alpine tundra and steppe apparently responding positively to GDD 0 , although they are more abundant in colder climates.We therefore rejected GDD 0 as a predictor in these cases.Conversely, mGDD 5 was rejected as a predictor for tropical forests, since the GLMpredicted probability curve against mGDD 5 peaked at a very low value compared with the real distribution of tropical forests.
(b) In a few cases, inclusion of quadratic terms resulted in a U-shaped (rather than Gaussian) fitted distribution to a particular variable.This result led to the rejection of mGDD 5 as a predictor for temperate desert.Sparse alpine vegetation, boreal and subalpine forest and shrubland also showed initial fitted U-shaped responses to α.In these cases just the quadratic term was rejected, leading to a realistic (sigmoid) response to α.
(c) Terms for which the coefficients were not significant at P < 0.05 were rejected.These cases were few because of the very large sample size.
With the estimated regression coefficients and intercept from the final fitted GLM, a predictive model is obtained for each vegetation class.This can be written in a simple generic form: , where P i is the GLM-predicted probability for vegetation class i and a, b, c, d are parameters specific for each vegetation class and for each of three predictor variables: α, mGDD 5 and GDD 0 .The parameter values for the finally accepted models are listed in Table 2.
A simple linear calibration was used to relate fitted probabilities optimally to observed frequencies, as follows.For each vegetation type, we performed the following ordinary linear regression of the GLM-predicted class probabilities (P i ) on the observed class frequencies (f 0 i ) with m 1 and m 2 as the regression parameters.
This regression relationship was then inverted as Eq. ( 3) to obtain the weighting factors (l i 1 and l i 2 , Table 3) to be applied to the GLM-predicted probabilities (P i ) and the calibrated predicted probabilities (f * i ), which were finally used in generating the predicted vegetation distribution map (Fig. 1): Negative values arising from this step were set to zero.The predicted vegetation class at each grid cell is then the one with highest predicted probability after weighting, i.e. the one with highest f * i .We also predicted the potential natural vegetation class at each grid cell by applying the same criterion but excluding agricultural classes from consideration.
There is a strong correlation (and therefore statistical confounding) between winter and summer temperatures in China, particularly in the eastern forest belts where both gradients run from north to south.We added one additional constraint to the model, namely that the MTCO for tropical a and b distinguish predictor terms excluded due to lack of realism in the fitted model and statistical significance, respectively.See text for further explanation of these criteria.
rainforest complex must exceed 12 • C. The calibrated predicted probability f * i for this vegetation class was reset to zero whenever this constraint was not met.The constraint has no effect on the predicted present distributions but was introduced in order to eliminate the unrealistic prediction of tropical rainforest in dry inland areas under some climate change scenarios.The constraint is consistent with the known requirement of tropical trees for warm winters.

Assessing goodness of fit
The kappa statistic (Cohen, 1960;Prentice et al., 1992) was used to quantify the similarity between the predicted and observed vegetation maps.Kappa is a suitable measure to compare two maps where the variable mapped is a multi-class, qualitative variable.Kappa ranges from zero to one.One means perfect agreement; zero means agreement that is no better than would be expected by chance, i.e. by random assignment of classes to grid cells.

Inclusion of a CO 2 effect
[CO 2 ] does not vary significantly in space, and cannot therefore be used as a predictor in the development of empirically based models for vegetation and productivity.As a measure of the effect of increased [CO 2 ] on vegetation distribution, we estimated the increase in α that would produce the same gain in NPP, according to a process-oriented light-use efficiency (LUE) model that has been fitted independently to an extensive forest production data set (Wang et al., 2012).This approach is based on the assumption that the major effect of elevated [CO 2 ] on vegetation distribution is to enhance water use efficiency (Keenan et al., 2013), which is equivalent to increasing water availability.To estimate this equivalence, we fitted a multiple regression of the logarithm of forest NPP (data as in Wang et al., 2012) against α, mGDD 5 and GDD 0 .The log transformation implies for the underlying model that the effects of climate variables on NPP are multiplicative.Then the contribution of α (dimensionless) to NPP (g C m −2 a −1 ) can be expressed as Here, a is the fitted partial regression coefficient for α, with a = 1.15, and b represents all other effects from intercept, mGDD 5 and GDD 0 together, which will then be eliminated in deriving "effective" α at elevated [CO 2 ]. Figure 3 shows the relationship between ln NPP and α.
The LUE model of Wang et al. (2012) is used to estimate the effect of elevated [CO 2 ] on NPP, indicated by the ratio (x) of NPP (g C m −2 a −1 ) at elevated (NPP ) and reference (NPP 0 ) [CO 2 ].According to this model, the ratio is dependent on the leaf-internal [CO 2 ] at reference [CO 2 ] (c i , ppm), the leaf-internal [CO 2 ] at elevated [CO 2 ] (c i , ppm) and the CO 2 compensation point ( , ppm), which is temperaturedependent (Bernacchi et al., 2003): (5) The "effective" value of α at elevated [CO 2 ] (α ) is then given by

Sensitivity analysis
Sensitivity analysis was carried out to investigate the response of the predicted vegetation pattern to the separate and combined effects of a large increase in temperature, increase or decrease in precipitation and increase of [CO 2 ], applied uniformly across the region.For comparability with previously published results using the global BIOME4 model (Wang et al., 2011), we applied the same environmental changes as in that paper.Thus the mean temperature of each month was increased by 5 K, the mean precipitation of each month was increased or decreased by 30 %, and [CO 2 ] was doubled from a reference value of 376 to 732 ppm.These changes were applied separately and in combination.

Data analysis
The triangular pattern illustrated by the CCA biplot summarizes the climate-vegetation relationship in China (Fig. 4a).
The vertex of the triangle to the right of the biplot represents the extreme of high temperature, rainfall and productivity in south China.The other two vertices represent the dry and cold extremes, respectively, exemplified by the interior deserts (upper left) and the high elevations of the Tibetan Plateau (lower left).Energy-related variables (mGDD 5 , MTWA, mGDD 0 , PAR 5 , PAR 0 , PET, GDD 0 , GDD 5 , MAT) and primary production (NPP LUE and NPP WUE ) tend to align together, pointing towards the highproductivity vertex.Moisture-related variables (MI, α, MAP) point in a direction opposite to the dry vertex.The shape of this diagram indicates the fundamental trade-off between high annual productivity (associated with climates that are both warm and wet) and tolerance of dry or cold conditions, i.e. both dry and cold conditions are incompatible with high productivity.The two precipitation timing variables point towards the high-productivity vertex, consistent with the fact that both cold and dry vegetation classes are associated with strong summer or autumn rainfall maxima.Precipitation seasonal concentration points away from the high-productivity vertex, consistent with the negative effect of a prolonged dry season on total annual productivity.As the length of the thermal growing season declines towards cold climates, both mPAR 0 and mPAR 5 increase (because the growing season is increasingly restricted to the summer, when solar radiation is at a maximum).Therefore, the direction of mPAR 0 is almost opposite to MTCO.That the direction of mPAR 5 is somewhat different to that of mPAR 0 is consistent with the fact that the cold-resistant vegetation types tend to have cloudy conditions when temperatures exceed 5 • C, whereas xeric vegetation types tend to have more sunny conditions.The three variables GDD 0 , mGDD 5 and α collectively can predict almost exactly the same pattern as in Fig. 4a (see Fig. 4b).GDD 0 expresses the direction towards highproductivity vegetation; α expresses the direction away from dry conditions; mGDD 5 expresses the direction towards cold conditions.

Model testing
Table 2 summarizes the GLMs fitted to the frequencies of each vegetation class as a function of these three variables.These models were used with the calibration procedure (see Table 3 for the calibration parameters for each vegetation class) to generate maps of the predicted geographic distribution of each class with the highest calibrated predicted probability (f * i ).Comparison of predicted and observed vegetation distributions (Fig. 1) yielded a kappa value of 0.745, judged as "substantial" agreement according to the criteria of Cohen (1960) and "good" (but very close to "excellent") agreement according to Monserud (1990).
The predicted vegetation map successfully captures the distributions and boundaries of most vegetation types in China.The distribution of the subtropical forest complex (the most widely distributed forest class in China) is slightly overestimated, extending into areas occupied by tropical rainforest along the south coast and northward into the temperate crop region in the north.The tropical monsoon forest com-plex along the south coast and the tropical rainforest complex in the lowlands to the southeast of the Tibetan Plateau and on Hainan Island are successfully predicted as separate classes.The temperate deciduous forest complex and temperate steppe are successfully predicted as the two major natural vegetation classes in north China, separated according to dryness.Temperate desert is correctly predicted as the most extensive vegetation class in northwestern China.Alpine tundra and steppe are correctly shown as occupying a large part of the Tibetan Plateau, transitioning to sparse alpine vegetation in the north and to boreal and subalpine forest and shrubland in the east; boreal and subalpine forest and shrubland are also distributed in the high-elevation area of northeastern China.
The primary agricultural systems in north China are also predicted well with temperate crops dominant across the North China Plain, cold-resistant crops and deciduous orchard on the Loess Plateau, and cold-resistant crops in a large area of northeastern China.The scattering of croplands at the foot of mountains in Xinjiang Province are also captured.Temperate crops are also (incorrectly) predicted in the Yungui Plateau of southwest China.The real vegetation there is the predicted potential vegetation, i.e. the subtropical forest complex; see upper second panel in Fig. 5.Other crop systems that are included in the natural vegetation classes (temperate deciduous forest complex, subtropical forest complex and tropical monsoon forest complex) are implicitly predicted within the distribution of these natural vegetation classes.
Predictions of potential natural vegetation in areas currently dominated by crops are as follows (Fig. 5): temperate woodland and dry grassland, temperate steppe, temperate deciduous forest complex and temperate woodland in the temperate zone, and subtropical forest complex in the subtropical zone.Temperate woodland and dry grassland, as a transition between temperate steppe and temperate deciduous forest complex, is predicted as the potential vegetation with continuous and extensive distribution in the temperate zone.

Sensitivity analysis
Results of the sensitivity experiments are shown in Fig. 5. Increasing temperature by 5 K shifts the predicted boundaries of most vegetation types northward and westward.Tropical monsoon rainforest is predicted to occupy a large area in south China and a small area in the Sichuan Basin.Across the greater part of the Tibetan Plateau, alpine tundra and steppe are predicted to be replaced by temperate steppe in the south and in the north, and the temperate deciduous forest complex, boreal and subalpine forest and shrubland in the east.In this scenario, cold-resistant crops and deciduous orchards become suitable for planting along the river valleys in the southern part of the plateau.
Changes in precipitation by ±30 % shift the boundaries of mesic versus xeric vegetation with respect to a northeast-southwest axis where the current precipitation is at intermediate levels.In the southern (high-elevation) part of this axis, increased precipitation causes alpine tundra and steppe to give way to boreal and subalpine forest while spreading slightly northward into the area currently occupied by alpine sparse vegetation.In the northern part of this axis, increased precipitation benefits the mesic vegetation (boreal and subalpine forest and shrubland, the temperate deciduous forest complex, temperate woodland and dry grassland) at the expense of xeric vegetation (temperate steppe, temperate desert).The effects of decreasing precipitation are broadly opposite to the effects of increasing precipitation.Increased precipitation also leads to more heterogeneous vegetation on the Loess Plateau, with temperate woodland and dry grassland, temperate needleleaf forest and a temperate deciduous forest mosaic all widely distributed.In the rest of China, where precipitation is either very low or high, the vegetation distribution responds to precipitation changes much less strongly.The boundaries of tropical forest and temperate desert remain almost the same whether precipitation is increased or decreased, while the northern boundary of the subtropical forest complex shifts slightly to the north or to the south.
[CO 2 ] doubling is estimated to have effects similar to those of increasing precipitation by 30 %, favouring more woody vegetation in the forest-grassland transition region (Fig. 5).In the temperate zone, [CO 2 ] doubling produces a shift from temperate steppe to temperate woodland and dry grassland, and to temperate deciduous forest complex.On the Tibetan Plateau, [CO 2 ] doubling favours boreal and subalpine forest and shrubland over alpine tundra and steppe.

Comparison with previously published results
The prediction for current vegetation distribution is broadly consistent with those made previously with the global BIOME4 model.However, apart from slightly overestimating the extent of the subtropical forest complex (called warmtemperate forest in BIOME4), the new empirical model captures some vegetation boundaries more accurately than the BIOME4 simulation.We successfully distinguished sparse alpine vegetation (tundra in BIOME4) from alpine tundra and steppe (dry tundra in BIOME4) on the Tibetan Plateau.The distributions of these classes are unrealistically simulated by BIOME4.The empirical model also more successfully captures the boundary between temperate steppe (grassland and dry shrubland in BIOME4) and the temperate deciduous forest complex (temperate forest in BIOME4).
BIOME4 overestimates the distribution area of temperate steppe at the expense of temperate forest.By refining the categories of forest vegetation, the empirical model also provides more detailed information about the most economically productive semi-natural vegetation in China.Most importantly, the ability of the empirical model to predict cropland distributions points towards a way to assess the changing suitability of agricultural systems under scenarios of changes in climate and [CO 2 ].
As also shown by BIOME4, the transition region between mesic and xeric vegetation along a northeast-southwest axis is the region most likely to experience major vegetation changes as a result of climate change.Warming is generally expected to favour woody vegetation there.The effects depend on the trajectory of precipitation changes; without an increase in precipitation, some regions suffer drought and thus a decline in forests.However the positive effect seems much stronger than the negative one, since water limitation is not severe in the majority of this region.On the Tibetan Plateau, where energy is the key limitation for woody plants, boreal and subalpine forest and shrubland are predicted to expand in response to warming, causing alpine tundra and steppe to retreat westward.In the central part, temperate woodland and dry grassland tend to expand at the expense of temperate steppe, while the subtropical forest complex expands at the expense of temperate woodland.In semi-arid regions in the north, warming benefits temperate woodland and dry grassland at the expense of temperate steppe, but also temperate steppe expands into the current distribution area of the temperate deciduous forest complex, due to drought.Changes in plant water availability, either directly due to changes in precipitation or indirectly due to [CO 2 ]-induced changes in water use efficiency, are predicted to modify the effects of warming by influencing the competition between woody and herbaceous plants.
The Tibetan Plateau is identified both here and in the earlier study with BIOME4 as a region liable to experience large changes in vegetation as a result of climate change.Both models predict that warming will cause alpine vegetation to retreat to the colder and drier areas toward the north.But BIOME4 suggests that alpine vegetation will be replaced mainly by forests, while the empirical model suggests that it will be replaced by temperate steppe.The vegetation around the eastern edge of the plateau was predicted by BIOME4 as being quite resistant to precipitation changes, but our new empirical model identifies this region as particularly sensitive to precipitation changes.
The present study indicates that the northern boundary of the tropical monsoon forest complex would move northward by as much as 4 • of latitude and even emerge in the Sichuan Basin.BIOME4 did not show this movement, probably because of the strong minimum temperature constraint applied to tropical forests in that model.The northward movement predicted here is probably more realistic than the stasis shown by BIOME4.
The empirical model makes no prediction about the vegetation on the North China Plain when temperature increase is combined with precipitation decrease under recent [CO 2 ].This is the most severe scenario for mesic vegetation.The region in question was identified as a sensitive area by BIOME4, which predicted a transition to grassland and dry shrubland.The empirical model makes no prediction because the climate under this scenario is outside the range that the empirical models could predict based on current observations.The actual vegetation is temperate croplands, which are adapted to the climate with the help of irrigation.However, the North China Plain is one of the most water-scarce regions in the world (having less than half the water availability per person than water-scarce Egypt, in relation to its population).The observed warmer and drier climate over the last four decades, combined with increasing water requirements both for industrial production and in daily life, has already exerted considerable pressure on irrigation systems.Most climate models suggest that precipitation should increase in this region, but this is not certain and not predicted by all models (Cruz et al., 2007).Thus irrigated agriculture in the North China Plain represents a potential area of vulnerability to changes in climate and water supply.

The climate sensitivity of agriculture
By including crops in our analysis, we could investigate climate-dependent shifts in agricultural zones.Generally, warming is projected to shift agricultural zones significantly northward.This is directly demonstrated for the major cropping systems in north China, and implicitly indicated by the projected shift of natural vegetation types that include some agricultural types.These findings are consistent with data from the Chinese National Bureau of Statistics (2009) showing that, during the period from the early 1980s to 2007, warming enabled a significant northward expansion of rice planting in the northernmost region, between 48 and 52 • N. By 2007 the planted area of winter wheat in northern China moved northward by nearly 100 km compared with the 1960s.Paddy rice, the new dominant crop in the original corn-planting areas, has greatly expanded in Heilongjiang Province (Li and Wang, 2010).However, our analysis also points to the strong dependence of the cropland area in north China on precipitation, and the importance of irrigation requirements that could be limited by water supply.Unlike rainfed crops, decreased runoff with climate change could become an important limitation for the shift of irrigated crops in some areas.
Agricultural systems that were included in natural vegetation classes showed relatively minor responses to precipitation change, with one exception: "one crop annually short growing period cold-resistant crops", which was included in the temperate deciduous forest complex.The effects of [CO 2 ] on crop distribution cannot realistically be assessed in this framework, however, because the main mechanism would be via the total increase of C 3 crop productivity due to CO 2 fertilization as well as water saving, rather than the effects of enhanced water use efficiency mediated by competition.
Topography could limit agricultural expansion.Although crops are predicted by the model in the Yungui Plateau, and their area of suitability predicted to expand, the actual vegetation there today is forest.This area is well known for its karst topography which makes agriculture difficult.

Regional variations in the response to uniform perturbations of climate and [CO 2 ]
According to projected climate trends summarized in Cruz et al. (2007), warming over all of China is likely to be greater during winter than summer, and the warming is likely to be especially strong on the Tibetan Plateau and in semi-arid regions.Mean precipitation will likely increase in most of China but decrease in some western areas.Thus, woody vegetation will be favoured by future climate, as well as by elevated [CO 2 ].In other words, forests with their important functions in carbon sequestration, water retention and high biodiversity will likely continue to be the predominant natural vegetation cover in a large area of China, and the area suitable for forests is likely to expand into regions currently occupied by grasslands.The Tibetan Plateau and some inland desert areas are projected to experience large vegetation changes.

Caveats
The prediction skill of our empirical model for the present climate is inevitably highly reliant on the accuracy of the gridded climate data used to develop and run the model.One problematic region is the lower-elevation area to the southeast of the Tibetan Plateau (Zangnan area, a disputed territory between China and India) where the gridded climate data are not well constrained by observations.This data problem is probably the cause of the model's underestimation of tropical rainforest in this region, and means that the projection of climate change effects here should be treated with scepticism.
Here, we have shown results of climate change in the form of stylized sensitivity experiments, rather than plausible scenarios of the future.The idea was to understand how a uniform perturbation of climate would affect vegetation patterns.It remains to be seen how realistic climate change scenarios, derived from climate models, translate into projected effects on ecosystems in China.Realistic scenarios include additional regional variations -in the climate changes themselves -and in particular, the whole region is not predicted to get uniformly drier or wetter; but rather some areas are predicted to get drier and some wetter.Only [CO 2 ] is expected to change uniformly across the country.Another study has applied future climate change projections derived from a set of seven global climate model outputs, to assess likely directions of change in vegetation distribution and productivity during the 2070s (Wang 2013).
Due to the equilibrium assumption, the approach we applied here entails some unavoidable limitations.The fitted empirical models do not predict when changes in vegetation are likely to happen.The predicted responses of vegetation distributions to climate could be achieved in reality only some decades to centuries after the new climate state has been established.Also, since we focused on the primary controls of mean climatic conditions on large-scale patterns of vegetation distribution, processes related to vegetation succession and migration, such as fire disturbances and dispersal constraints, are not modelled.But despite these limitations, the method described here makes good use of extensive observational data sets applying to the specific region of interest.The results should therefore be a reliable guide to the general direction and magnitude of changes to be expected in the region in response to the prescribed scenarios of change in [CO 2 ] and climate.

Fig. 1 .
Fig. 1.The observed (upper panel) and predicted (lower panel) vegetation distribution in China.The plant communities from the Vegetation Atlas of China at a scale of 1 : 1 million (Hou, 2001) were aggregated into 16 vegetation classes based on their bioclimatic context.

Fig. 2 .
Fig. 2. The distribution pattern of the three bioclimatic predictors used to construct models for each vegetation class: annual accumulated temperature above 0 • C (GDD 0 ), mean temperature of the period above 5 • C (mGDD 5 ), and the Cramer-Prentice index of plantavailable moisture (α).

Fig. 3 .
Fig. 3. Partial residual plot (Breheny and Burchett, 2013) of observed NPP (natural log scale) against the Cramer-Prentice α index of plant-available moisture.The partial regression line with confidence band as shown was based on a multiple regression of ln NPP against α, GDD 0 and mGDD 5 .

Fig. 4 .
Fig. 4. Biplot of plant communities and environmental variables with all sample sites (grey dots) from canonical correspondence analysis using (a) 20 predictor variables, and (b) a subset of 3 predictor variables selected using the Akaike information criterion.See text for the abbreviations of environmental variables, and Table 1 for the code numbers of plant communities.Vegetation classes are distinguished with colours as in the legend of Fig. 1.

Fig. 5 .
Fig. 5. Changes in predicted vegetation distribution at projected scenarios, with a reference distribution at the baseline scenario (left panels for both natural and agriculture vegetation types, right panels only for natural vegetation type).

Table 1 .
Allocation of observed plant communities from the Vegetation Atlas of China at a scale of 1 : 1 million (Hou, 2001) to vegetation classes, based on k-means clustering with modifications as described in the text.

Table 2 .
Parameters used in Eq. (1) for each vegetation class.All parameters are statistically significant at p < 0.001 except b 1 and b 2 for subtropical montane forest significant at p < 0.05.

Table 3 .
Parameters used in Eq. (3) for each vegetation class.