Interactive comment on “Optimising the FAMOUS climate model: inclusion of global carbon cycling”

Using a perturbed physics ensemble technique, the authors try to optimize the parametrization of the terrestrial and marine carbon cycle components of the FAMOUS model. For the terrestrial component they use both modern observations and past reconstructions to optimize the parameters, whereas for the marine carbon cycle they use only modern observations. They also present the performance of the latest version of the FAMOUS model.


Model description and motivation
The climate model used in this work is FAMOUS (Jones et al., 2005; Smith et al., 2008), which is a lower resolution version of the HadCM3 climate model (Pope et al., 2000; Gordon et al., 2000). The atmospheric component of FAMOUS has a resolution of 5° × 7.5° (compared to the 2.5° × 3.75° of HadCM3) and has 11 vertical levels, a significant reduction compared to the 19 in HadCM3. The ocean has twice the resolution of the atmosphere (i.e. 2.5° × 3.75°) and 20 vertical levels. HadCM3's ocean resolution is 1.25° × 1.25° and also has 20 vertical levels. The atmospheric time step for FAMOUS is 1 h, twice that of HadCM3, whereas the time step in the ocean is 12 h, compared to just 1 h for HadCM3. The reduction in model resolution and increase in model time steps means that FAMOUS runs approximately 10 times faster than its parent model. For example, a 1000 yr, coupled atmosphere-ocean simulation with HadCM3 takes approximately 100 days on 8 processors and generates 1 Tb of model data. An equivalent FAMOUS simulation runs in one tenth of the time and produces one quarter of the amount of output data, due to the lower spatial resolution and longer time steps in the atmosphere and ocean.
While FAMOUS does not match some of the process fidelity or high resolution of current GCMs, e.g. Collins et al. (2011), it has been developed explicitly as a coarse resolution model. Despite belonging to an older generation of climate models, the atmosphere-ocean component (which is common to FAMOUS and HadCM3) still performs in the best handful of models on larger scale climate measures (Reichler and Kim, 2008; Nishii et al., 2012).
Published by Copernicus Publications on behalf of the European Geosciences Union.
All previously published versions of FAMOUS have used the MOSES (Met Office Surface Exchange Scheme) 1 land surface model (Cox et al., 1999). However, MOSES 1 does not include carbon cycle processes or interactive vegetation, which are both important elements of a comprehensive Earth System model. In order to include these features, the newer MOSES 2.2 model (Essery et al., 2003) has been incorporated into FAMOUS. MOSES 2.2 describes the fluxes of CO2, water, heat and momentum at the interface between the land and the atmospheric boundary layer, and is capable of hosting a number of sub-gridscale tiles in each grid box, allowing a degree of heterogeneity in surface characteristics to be modelled.
MOSES 2.2 can function in two modes, either calculating surface exchange fluxes for each surface type individually and then averaging them into a grid-box mean for the atmosphere model, or aggregating the characteristics of the different surface types together before calculating a single, common exchange flux for the grid box. The latter mode is used in this work, as it has been found to produce better results in early tests of MOSES 2.2 in FAMOUS. It is possible to run MOSES 2.2 using static or dynamic vegetation, the latter using the TRIFFID dynamic vegetation model (Cox, 2001).
The subgrid land surface processes in the simulations presented here are due to five different plant functional types (PFTs) as represented by the TRIFFID dynamic vegetation model: broadleaf trees (BT), needleleaf trees (NT), C3 and C4 vegetation and shrubs. In addition to these PFTs, MOSES 2.2 also calculates fluxes due to four non-vegetation surface types: urban environments, inland water, bare soil and land ice (which is constrained to a grid-box coverage fraction of either 0.0 or 1.0 only). Future research with this configuration of FAMOUS will, in part, aim to examine climates of the past where human intervention in the structure of the land surface was negligible or zero. Therefore, the urban fraction is set to zero throughout this work.
The TRIFFID model dynamically updates the five PFT distributions (and soil carbon content) using a combination of "carbon balance" and inter-PFT competition (Cox, 2001). The carbon balance is itself derived from the MOSES surface exchange scheme (see above). Other PFT-dependent limiting factors affecting plant growth within the TRIFFID model include the presence or absence of light (and its subsequent effect on photosynthesis) and photosynthetic enzymes (Cox, 2001). The combination of the "2.2" version of MOSES with TRIFFID had not been used before in FAMOUS and this was the reason for the perturbed physics ensemble presented here.
In addition to land surface processes, the ocean carbon cycle is also simulated within the model. This sub-model is known as HadOCC, the Hadley Centre Ocean Carbon Cycle model (Palmer and Totterdell, 2001). HadOCC is an "ecosystem model" due to its explicit inclusion of phytoplankton and zooplankton populations. Phytoplankton productivity in the model is limited by the availability of nitrate (the only nutrient simulated) and light, and therefore primary production below the photic zone of the ocean is greatly reduced. In addition to plankton, total CO2, alkalinity and detrital material densities are calculated through a system of coupled differential equations describing, for example, zooplankton grazing and detrital sinking due to gravity. The interested reader is referred to the Appendix of Palmer and Totterdell (2001) for a full description of the equation system. The carbon cycle model is fully coupled to the physical ocean model and hence all compartments are subject to advective and diffusive transport. It is important to note that HadOCC is purely biological in nature, meaning that nitrate (strictly nitrate plus ammonium) is not lost or gained due to sedimentation or due to addition from rivers, for example. The flux of carbon through the NPZD (nutrient-phytoplankton-zooplankton-detritus) model is coupled to the prognostic flux of nitrogen through constant C:N, "Redfield", ratios (Palmer and Totterdell, 2001; Redfield, 1958).
Climate models contain many adjustable parameters, each with an associated uncertainty. This uncertainty comes, for some parameters, from the inability to measure the value of an observable to arbitrary accuracy. For example, $N_{L0}$ (the ratio of nitrogen to carbon in a leaf, a model constant representative of a given plant functional type, e.g. shrubs) is a measurable quantity at the plant leaf scale. The uncertainty associated with this parameter comes from upscaling site measurements to a global quantity. There is also some uncertainty from structural parameters in model parameterisations, which do not have a directly observable equivalent in the real world. For example, $LAI_{min}$ is a competition parameter which controls how plants will expand. This is not a directly observable quantity; instead, the plausible uncertainty ranges are established largely from insight from the model developers based on how variations in this parameter influence properties of the simulations that are observable, such as forest extent. Previous versions of FAMOUS have had their parameters tuned through different procedures (Jones et al., 2005; Smith et al., 2008; Gregoire et al., 2010), but the combination of a complex land surface scheme coupled to dynamic vegetation and an ocean carbon cycle has not been used before in FAMOUS. The computational efficiency of FAMOUS provides an opportunity to explore relationships between parameters and model response and hence identify the set of structural parameters in this new model which give the highest fidelity output when compared to appropriate observations. To this end, building on the tuning of atmosphere and ocean parameters by Gregoire et al. (2010), two 100 member perturbed physics ensembles were performed: one for the land surface and one for the ocean carbon cycle variables. The full coupling of the terrestrial and ocean carbon cycles is ongoing and will be described in a forthcoming paper.
For both the land surface and the ocean perturbed physics ensembles, the set up of the control run was the same. Constant, preindustrial levels of CO2 in the atmosphere (290 ppmv) were used. For all simulations using dynamic vegetation in this study, an accelerated mode was used, which enables more rapid convergence of the final distribution of PFTs under constant forcing conditions. This works by coupling the vegetation scheme to the surface exchange scheme only every 5 yr (although this time period can be altered if desired) and thereby exchanging the carbon flux output during that time with the vegetation. After each iteration of this coupling, the dynamic vegetation model is then run asynchronously using a large time step of 100 yr. This enables equilibrated states of even the slowest responding variables to be approached more rapidly. More information on the technical details of this coupling can be found in Cox (2001).

Table 1. List of parameters used in the land surface carbon cycle perturbed physics ensemble. The values of the minimum leaf area index (LAI) for C3, C4 and shrubs are not varied in this work and hence only one value is given. The three different parameters used are (1) the minimum value used in the Latin hypercube sampling scheme, (2) the "standard" value used in the simulation framework before parameter perturbation and (3) the maximum value. Note that the ranges used in this work are the same as in Booth et al. (2012).
Only the following rows of the table are recoverable; each entry lists minimum, standard, maximum for one PFT column.
(row label lost) | 1.5, 2, 3.5 | 1.5, 2, 3.5 | 1.5, 2, 3.5 | 1.5, 2, 3.5 | 1.5, 2, 3.5
$V_{crit,\alpha}$ | 0, 0.5, 1 | 0, 0.5, 1 | 0, 0.5, 1 | 0, 0.5, 1 | 0, 0.5, 1
$T_{upp}$ | 31, 36, 41
(row label lost) | 0.125, 0.25, 0.375 | 0.125, 0.25, 0.375 | 0.125, 0.25, 0.375 | 0.125, 0.25, 0.375 | 0.125, 0.25, 0.375
For the vegetation distribution, the control and all ensemble members were initialised at 1860 values and each ensemble member was run for 200 yr of physical-climate time; this was found to be more than sufficient for equilibrium to be reached, particularly bearing in mind that the dynamic vegetation model is run in an accelerated fashion.
For the ocean ensemble, a run length of 200 yr was also found to be sufficient for the variables of interest to equilibrate, even when the ocean tracers were initialised with constant values throughout the ocean. It should be noted that there is no equivalent accelerated mode for the ocean carbon cycle as is used for the land surface. To bring the deep ocean into thermal and carbon equilibrium with the surface would take several thousand years and so it is infeasible to run a 100 member ensemble where each member is run for this long. The ocean ensemble is validated using near-surface observations (5 m depth) where equilibrium is easily reached in 200 yr. Climatologies were constructed for the last 30 yr of each ensemble member for both the land surface and ocean ensembles.

Perturbed parameters - land surface
The number of structural parameters present in this version of FAMOUS is large and, since the main departure from previous versions concerns the carbon cycle (both on land and in the ocean), it was deemed appropriate to find an optimum set of parameters which best reflects the present-day status of the biosphere.
Previous work (Booth et al., 2012) used the Latin hypercube method (e.g. Gregoire et al., 2010) to efficiently sample parameter space within bounds reflecting the uncertainty with which these model parameters are known. Booth et al. were then able to demonstrate that uncertainties in the values of carbon cycle parameters can give rise to significant uncertainty in projections of future climate. The present study also uses the Latin hypercube method to vary the same parameters as Booth et al. (2012) over the same ranges of values (with the addition of $R_{grow}$), which are described in Table 1. Note that the values for all the plant functional types (PFTs) are co-varied, i.e. if the value of a certain parameter for broadleaf trees is doubled, the equivalent parameters for the 4 other PFTs will also be doubled, as in Booth et al. (2012).
The parameters in Table 1 are now described in detail.
- $N_{L0}$ - The "top leaf nitrogen concentration". This is defined as the amount of nitrogen per amount of carbon and has the units (kg N)(kg C)$^{-1}$ (Cox et al., 1999).
- $f_0$ - The ratio of CO2 concentrations inside and outside leaves at zero humidity deficit (Cox et al., 1999).
- $LAI_{min}$ - Any PFT must achieve this value of the leaf area index before it starts to contend with other PFTs for growing area (Cox, 2001).
- $Q_{10}$ - This parameter describes how the respiration rate of soil varies with temperature. This is done using a power law multiplier, the exponent of which rises by 1.0 when the temperature rises by 10 °C (Cox et al., 1999).
- The "KAPS" parameter, which describes the specific rate of soil respiration at 25 °C and at optimal soil moisture, is co-varied with $Q_{10}$ to maintain respiration at this temperature at the standard model rate.
- $V_{crit,\alpha}$ - This is a new parameter which has been integrated into the model code and is defined by $V_{crit} = V_{wilt} + V_{crit,\alpha}(V_{sat} - V_{wilt})$, where $V_{crit}$, $V_{sat}$ and $V_{wilt}$ are "by volume" soil moisture concentrations (m$^3$ of water per m$^3$ of soil). Below $V_{wilt}$, leaf stomata close; $V_{sat}$ is the soil moisture amount at the point of saturation and $V_{crit}$ is the amount above which PFTs are not water limited. The fact that $V_{crit,\alpha}$ varies between zero and one means that $V_{crit}$ varies between $V_{wilt}$ and $V_{sat}$ (Cox et al., 1999).
- $T_{upp}$ - This is one of two parameters which affect how photosynthesis varies with temperature (Cox et al., 2000), the other being $T_{low}$. As can be seen from Table 1, there is actually only one free parameter for $T_{upp}$, because the values for NT, C3, C4 and shrubs are also co-varied. In addition, the values of $T_{low}$ are as follows: $T_{low,BT} = T_{upp,BT} - 36$, $T_{low,NT} = T_{upp,BT} - 41$, $T_{low,C3} = T_{upp,BT} - 36$, $T_{low,C4} = T_{upp,BT} - 23$ and $T_{low,shrub} = T_{upp,BT} - 36$.
- Booth et al. (2012) present a variable transformation and define $T_{opt} = T_{upp} - 4.0$. This is because $T_{opt}$ is more directly observable. The full definitions of $T_{upp}$ and $T_{low}$ are retained here for completeness and to aid the understanding of the model user.
- $R_{grow}$ - The "growth respiration fraction". The total respiration, $R_p$, of plants can be divided into those amounts required for the maintenance, $R_{pm}$, and growth, $R_{pg}$, of the plant, where $R_{pg}$ is defined as $R_{pg} = R_{grow}(G - R_{pm})$, and $G$ is the "gross canopy photosynthesis" (Cox et al., 1999). A corollary of this set of definitions is that, at the standard value $R_{grow} = 0.25$, $R_{pg}$ is also equal to one third of the net primary productivity, $G - R_p$. More information on the precise definition of these parameters can be found in Cox et al. (1999).
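The derived quantities in the list above can be checked numerically. The following is a minimal sketch rather than the MOSES/TRIFFID code: the function names and sample values are illustrative assumptions. It also assumes that the critical soil moisture is a linear interpolation between the wilting and saturation values (consistent with it moving from one to the other as the scaling parameter goes from zero to one), and that growth respiration takes the form R_pg = R_grow (G - R_pm), which is the form for which the one-third corollary holds at R_grow = 0.25.

```python
# Offsets (degrees C) of T_low below T_upp,BT for each PFT, as listed in the text.
T_LOW_OFFSET = {"BT": 36.0, "NT": 41.0, "C3": 36.0, "C4": 23.0, "shrub": 36.0}


def v_crit(v_crit_alpha, v_wilt, v_sat):
    """Critical soil moisture, assumed linear between V_wilt and V_sat."""
    return v_wilt + v_crit_alpha * (v_sat - v_wilt)


def t_low(t_upp_bt):
    """Lower photosynthesis temperature of every PFT, derived from T_upp,BT."""
    return {pft: t_upp_bt - offset for pft, offset in T_LOW_OFFSET.items()}


def growth_respiration(r_grow, g, r_pm):
    """R_pg = R_grow * (G - R_pm); assumed form, see the lead-in."""
    return r_grow * (g - r_pm)


def net_primary_productivity(r_grow, g, r_pm):
    """G - R_p, with total respiration R_p = R_pm + R_pg."""
    return g - (r_pm + growth_respiration(r_grow, g, r_pm))


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to exercise the functions.
    print(v_crit(0.5, v_wilt=0.1, v_sat=0.5))
    print(t_low(36.0))
    r_pg = growth_respiration(0.25, g=10.0, r_pm=2.0)
    npp = net_primary_productivity(0.25, g=10.0, r_pm=2.0)
    print(r_pg / npp)  # ratio of growth respiration to NPP
```

With R_grow = 0.25 the ratio R_pg / (G - R_p) is 0.25 / 0.75 for any G and R_pm, which is where the factor of one third comes from; for other sampled values of R_grow the ratio is R_grow / (1 - R_grow).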
Previous work by Gregoire et al. also used an ensemble approach to identify optimal configurations of FAMOUS with respect to atmosphere and ocean parameters which are known to have a significant effect on the climatology (Gregoire et al., 2010; Jones et al., 2005; Murphy et al., 2004). It was therefore desirable that the results of this earlier work were incorporated into the present optimisation framework and, to this end, the ten highest scoring models from Gregoire et al. (2010) were sampled using a further "state parameter", β. The incorporation of this extra parameter means that it is not just the carbon cycle's uncertainties which are being perturbed in the ensemble but also those of the physical atmosphere and ocean, which have previously been shown to have a significant impact on model climate (Jones et al., 2005). The fact that only the 10 highest scoring models from Gregoire et al. are chosen for examination here means that it is only the more plausible combinations of values of the physical parameters which are sampled.
The state parameter, β, was varied continuously between 0 and 1 using the same Latin hypercube sampling technique as for all the other model parameters. However, β was then converted to an integer value between one and ten which was used to discriminate between the ten highest scoring sets of parameters from Gregoire et al. (2010). Therefore, in total, eight free parameters were varied and an ensemble of one hundred members was run. For Latin hypercube sampling, it is advantageous to have at least ten times as many ensemble members as free parameters; this condition is therefore easily fulfilled in this case. It would have been statistically advantageous to vary each parameter independently for each PFT but this would have increased the necessary size of the ensemble beyond that which was possible due to computational constraints.
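The sampling step described above can be sketched as follows. This is an illustration under stated assumptions, not the code used for the ensemble: `latin_hypercube` and `beta_to_index` are hypothetical helper names, the sample is drawn on the unit hypercube (each column would then be rescaled to its own parameter range), and the conversion of β to an integer is assumed to use ten equal-width bins, which the text does not specify.

```python
import random


def latin_hypercube(n_members, n_params, rng):
    """Return n_members points in [0, 1)^n_params, one per stratum on each axis."""
    columns = []
    for _ in range(n_params):
        # One random point inside each of n_members equal-width strata.
        strata = [(i + rng.random()) / n_members for i in range(n_members)]
        rng.shuffle(strata)  # decouple the stratum order between parameters
        columns.append(strata)
    # Transpose so that each row is one ensemble member.
    return [list(row) for row in zip(*columns)]


def beta_to_index(beta, n_configs=10):
    """Map a continuous beta in [0, 1] onto an integer in 1..n_configs."""
    return min(int(beta * n_configs) + 1, n_configs)


if __name__ == "__main__":
    rng = random.Random(0)
    sample = latin_hypercube(100, 8, rng)  # 100 members, 8 free parameters
    configs = [beta_to_index(member[-1]) for member in sample]  # last column is beta
    print({k: configs.count(k) for k in range(1, 11)})
```

Because the 100 strata of β divide evenly into ten bins, this particular mapping would assign each of the ten physical configurations to the same number of ensemble members.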

Perturbed parameters - ocean
A further ensemble, perturbing the parameters in the HadOCC sub-model, was also carried out. Table 2 shows the control values of the structural parameters in the ocean carbon cycle of FAMOUS which are varied in this work (see Table 2 of Palmer and Totterdell, 2001 for more detailed information). Since there are twenty structural parameters listed in Table 2, to vary each parameter individually would require at least two hundred simulations to be performed, which is currently impractical. Therefore, the parameters were subdivided into five categories by their compartmentalisation in the model (the "free parameter index" in Table 2): (1) C:N ratio, (2) phytoplankton-specific parameters, (3) zooplankton-specific parameters, (4) detritus-specific parameters and (5) carbonate precipitation. Each parameter represented by these five indices was co-varied and therefore the condition of having at least ten times as many ensemble members as free model parameters (i.e. 5) is met. This method of co-variation was decided upon after discussions with the HadOCC code developers (P. Halloran, Met Office Hadley Centre, personal communication, 2011) and is in line with the work of Booth et al. (2012), whose co-variation scheme is used here for the land surface parameter perturbations. All parameters in Table 2 were varied by ±50 % in the Latin hypercube-generated ensemble and, as with the land surface, an ensemble of 100 members was run.
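The co-variation scheme can be illustrated with a short sketch. The parameter names and control values below are placeholders (only the rain ratio of 0.007 is quoted in the text), and the linear mapping from a unit-interval sample to a ±50 % scaling factor is an assumption about how the perturbation might be applied.

```python
# name -> (free parameter index, control value). Names and values are
# placeholders, except the rain ratio of 0.007, which is quoted in the text.
CONTROL = {
    "c_to_n_ratio": (1, 1.0),
    "phyto_param_a": (2, 1.0),
    "phyto_param_b": (2, 2.0),
    "zoo_param_a": (3, 1.0),
    "detritus_sinking_rate": (4, 1.0),
    "rain_ratio": (5, 0.007),
}


def covaried_member(unit_sample, control=CONTROL, spread=0.5):
    """Scale every parameter by the factor belonging to its free parameter index.

    unit_sample holds one number in [0, 1] per index (for example one Latin
    hypercube row): 0 maps to -50 %, 0.5 to the control value and 1 to +50 %.
    """
    factors = [1.0 + spread * (2.0 * u - 1.0) for u in unit_sample]
    return {
        name: value * factors[index - 1]
        for name, (index, value) in control.items()
    }


if __name__ == "__main__":
    member = covaried_member([0.0, 1.0, 0.5, 0.5, 1.0])
    for name, value in member.items():
        print(name, value)
```

Parameters that share an index receive the same factor, so their ratios are preserved across the ensemble; this is what reduces twenty structural parameters to five free ones.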
The method of studying uncertainty by varying ocean biogeochemical parameters by ±50 % has been used before by Kriest et al. (2012) and was the basis for doing so in this study. The study of Scott et al. (2011) also uses a perturbed physics approach to parameter uncertainty in the HadOCC model, although in a 1-D sense. This reduction in model dimensionality enables all parameters to be varied independently rather than using a co-variation method as used here. As discussed above, this was not feasible in this study due to computational (that is, time) constraints.

Table 2 (surviving rows only). Sinking rate for detritus: free parameter index 4. Rain ratio: control value 0.007; carbon export as calcite, as a proportion of primary production; free parameter index 5.

Due to the inclusion of the state parameter, β, in the land carbon cycle simulations, some ocean parameters differ between the best land surface and ocean simulations. It has been shown, however, that these differences in the ocean diffusivity and viscosity (Gregoire et al., 2010) make no significant difference to the model climatology.

4 How the perturbed physics ensembles were evaluated

4.1 Land surface

The Amazon now
Evaluation of how well the land surface ensemble members matched observations was done by comparison with data adapted from the Advanced Very High Resolution Radiometer (Loveland et al., 2000).This dataset was constructed via a joint European-American project, coordinated through the International Geosphere-Biosphere Programme.The construction of this dataset utilised advanced quality-control techniques and was the first global database of land-surface cover categories produced from high resolution (1 km) satellite data.
Figure 1 shows which of the surface types used in TRIFFID has the highest fraction within each grid box, together with the fractional coverage of that dominant tile. From this figure it is clear that there are large areas of the world where the dominant tile fraction is significantly different from 1. The global average of the quantity given in the right-hand side of Fig. 1 is 0.63 and the spatial standard deviation is 0.18. The equivalent value for the ensemble mean is 0.72 with a spatial standard deviation of 0.12. The combination of these values (higher mean, lower variability) shows that the simulations tend to favour non-coexisting PFTs in each grid box, compared to observations. In the discussion to follow, Fig. 2 is shown, which is the same as Fig. 1 but for the ensemble member which is identified as having the most suitable set of parameters. This clearly shows that the dominant tile fraction in the simulations is significantly higher than observed (Fig. 1). For this reason, the dominant PFT in a grid box is used to evaluate how well the different ensemble members reproduce vegetation cover. Figure 1 shows that the Amazon region is a good one to concentrate on because it is a large area where the observed fraction of the dominant surface type is close to 1 and also because of the region's known effects on global climate (e.g. Werth and Avissar, 2002) and terrestrial carbon budget (e.g. Cox et al., 2000).
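The dominant-tile statistics quoted above can be reproduced in outline as follows. This is a sketch only: the grid boxes are hypothetical, the function names are illustrative, and the mean and standard deviation are taken without area weighting, which is an assumption since the text does not state how the global values were weighted.

```python
from statistics import mean, pstdev


def dominant_tile(fractions):
    """Return (surface type, fraction) of the largest tile in one grid box."""
    name = max(fractions, key=fractions.get)
    return name, fractions[name]


def dominant_fraction_stats(grid_boxes):
    """Unweighted mean and spatial standard deviation of the dominant fraction."""
    dominant = [dominant_tile(box)[1] for box in grid_boxes]
    return mean(dominant), pstdev(dominant)


if __name__ == "__main__":
    # Hypothetical tile fractions for three land grid boxes.
    boxes = [
        {"BT": 0.9, "C4": 0.1},
        {"C3": 0.5, "NT": 0.3, "soil": 0.2},
        {"soil": 1.0},
    ]
    print([dominant_tile(box) for box in boxes])
    print(dominant_fraction_stats(boxes))
```

A field in which most boxes are covered almost entirely by one surface type gives a mean close to 1 and a small spread, which is the pattern the ensemble shows relative to the observations.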
The Amazon region is defined to be 40° W-80° W, 20° S-10° N in this work and is predominantly defined by its BT coverage (Fig. 1). In this region there are 28 land grid boxes and in the observations 22 are BT, 4 are C4, 1 is bare soil and 1 is shrub. Figure 3 shows a histogram of the fractional agreement between PFTs in the ensemble and the observations, i.e. how many of the 28 grid boxes are assigned the same PFT in the ensemble members and in observations. In this instance the term "PFTs" is broadened to include bare soil cover.
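The agreement metric can be written down directly. In this sketch the observed counts are those given in the text, while the ordering of the grid boxes, the function name and the example ensemble member are hypothetical.

```python
def fractional_agreement(simulated, observed):
    """Fraction of grid boxes whose dominant surface type matches the observations."""
    if len(simulated) != len(observed):
        raise ValueError("fields must cover the same grid boxes")
    matches = sum(s == o for s, o in zip(simulated, observed))
    return matches / len(observed)


# Observed dominant types in the 28 Amazon land grid boxes (counts from the
# text; the ordering of the boxes here is arbitrary).
OBSERVED = ["BT"] * 22 + ["C4"] * 4 + ["soil"] + ["shrub"]

if __name__ == "__main__":
    all_bt = ["BT"] * 28  # hypothetical member with broadleaf trees everywhere
    print(fractional_agreement(all_bt, OBSERVED))  # 22 of 28 boxes agree
```

Since 22 of the 28 observed boxes are BT, the score is dominated by how well a member reproduces broadleaf tree cover in the region.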
Figure 3 shows that the majority (80) of the ensemble members agree with the observations in less than half of the grid boxes in the Amazon region. Of the remaining 20 members, 9 have 50 %-60 % agreement, 10 have 60 %-70 % agreement and 1 does better than 70 %. To reduce the number of ensemble members for inclusion in the search for a credible set of carbon cycle parameters, the top 10 scoring members are chosen for further investigation, which is done by examining the dominant PFT globally. Amongst the top 10 scoring simulations, there are some common biases such as the overestimation of the NT density over North America and the C3 fraction over Northern Eurasia. In addition to these, the models do not reproduce the observed NT distribution over Eurasia and, although the distribution is promising, the global density of BT is somewhat overestimated. It should be noted that over large parts of these areas, the fractional coverage of the dominant PFT is approximately 50 % or less in the observations (Fig. 1), whereas in the 10 best ensemble members, the fractional coverage is often well over 70 % and sometimes over 90 %. This highlights a characteristic feature of the PFT density calculations internal to the TRIFFID model: coexisting PFTs are minimised compared to observations.
Of the top 10 models, a further 3 are discarded due to the almost complete coverage of northern Eurasia with C3 vegetation and so in summary, 7 ensemble members (termed the α7 simulations) are left for further consideration albeit with some common biases in their reproduction of contemporary vegetation cover.
It could be argued that training the perturbed parameter ensemble specifically on the Amazon region will tend to "overfit" the ensemble to the observed broadleaf tree distribution in this region. However, it has been found that ensemble members which give a good reproduction of the observed dominant grid box PFT in the Amazon also tend to do better globally. This is illustrated in Fig. 4, which shows the dominant grid box PFT for (a) the observations, (b) HadCM3, (c-e) the three top performing α7 simulations and (f-h) three further ensemble members with decreasing fractional agreement over the Amazon region. Firstly, considering sub-figures (a-c) in Fig. 4, it is apparent that HadCM3 (which is the "parent" model of FAMOUS) performs better than the top performing α7 ensemble member. For example, the NT distribution over North America and the extent of the bare soil region representing the Sahara Desert are more in line with observations than FAMOUS. These improvements notwithstanding, there is still notable disagreement between observations and HadCM3 in several regions, such as the overestimation of BT in sub-Saharan Africa and the underestimation of C3 vegetation in western North America.

Fig. 4. Dominant grid box PFT for (a) the observations, (b) HadCM3, (c-e) the three top performing α7 ensemble members in terms of their fractional agreement with observations over the Amazon region, (f-h) three ensemble members with decreasing Amazon agreement. There is a clear correlation between the agreement over the Amazon region and that over the whole globe.
The biases common to the α7 ensemble members noted above are clearly visible in Fig. 4c-e (overestimation of NT in North America and C3 in Northern Eurasia) and the similarity of the global dominant PFT distribution in these figures is striking. This is important because it shows that training the ensemble on the Amazon region does not lead to significant differences in PFT distributions in other areas of the world and hence that the resulting PFT distributions are robust. The range of each of the parameters varied in this ensemble is indicated in Fig. 5 for the top three α7 simulations shown in Fig. 4. Figure 4f-h show example ensemble members with decreasing observational agreement over the Amazon region. Crucially, the global agreement also decreases markedly so that in the case of Fig. 4h, the agreement between the simulated and observational PFT distribution is virtually non-existent.
In summary, it has been shown that the agreement over the Amazon region between the dominant PFT in perturbed physics ensemble members and that in observations is a good indicator of the fidelity of global PFT reproduction. Globally, notable regional biases remain in the ensemble members with the leading agreement in the Amazon region, but two aspects of this remaining disagreement should be borne in mind.

1. The initial reason for choosing the Amazon region as the training area for the ensemble was that it has a dominant-PFT fractional coverage close to unity (Fig. 1), which is an emergent property of the simulations (Fig. 2 for example). The only other area of any real size with this property is the Sahara, which is by definition covered with bare soil.
2. It is likely that if the ensemble had been trained on a different area of the globe, a different PFT distribution would have resulted. The aim of this study is not per se to show that if the Amazon region's plant biosphere can be reproduced accurately then the rest of the world's PFTs will follow suit. It is known, however, that the Amazon exerts a significant influence on global climate (Werth and Avissar, 2002) and so a model which can reproduce the gross features of its land surface properties was deemed crucial, although this choice is admittedly somewhat subjective.

Sensitivity of results to perturbed parameters
The 8 individual free parameters all influence different aspects of the land surface and hence the wider climate response in the model. Selecting the 7 sets of optimal parameter combinations (the α7 simulations) tells us something about how the observed metrics can constrain these parameter ranges. If the α7 simulations all correspond to similar values of a certain parameter, then this is an indication that only a relatively small range of the currently considered plausible parameter space is consistent with observed land surface coverage. This is illustrated in Fig. 5, where the individual parameter values plotted on the vertical axis are normalised between 0 and 1, where 0 represents the lowest value of the parameter chosen by the Latin hypercube sampling, and 1 represents the highest, with all other values being linearly interpolated between the two. Note that the top three simulations (in terms of their fractional agreement with the observed dominant PFT in the Amazon region) are shown with different symbols to the other α7 ensemble members for clarity. The global PFT agreement for these three simulations is shown in Fig. 4.
Figure 5 shows that some of the credible parameter ranges obtained from the ensemble are considerably smaller than others. For example, $T_{upp}$ could take essentially any value sampled in the ensemble, whereas $f_0$ is found to be limited to higher values and $V_{crit,\alpha}$ to lower values. The methodology was not developed to place formal constraints on individual parameter ranges. Nevertheless, Fig. 5 does indicate that the comparison of simulated and observed broadleaf extent could be used to narrow the plausible range for a number of parameters. Numerically, the parameters are fractionally constrained as follows: $f_0$ (31 %), $LAI_{min}$ (88 %), $N_{L0}$ (53 %), $R_{grow}$ (62 %), $T_{upp}$ (92 %), $Q_{10}$ (78 %), $V_{crit,\alpha}$ (30 %) and β (59 %).
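One plausible reading of these percentages is the fraction of each sampled range that is spanned by the selected simulations after normalisation to [0, 1]. The sketch below implements that reading; it is an interpretation rather than the documented calculation, and the function names and example values are hypothetical.

```python
def normalise(value, lowest, highest):
    """Map a sampled value linearly onto [0, 1]."""
    return (value - lowest) / (highest - lowest)


def spanned_fraction(selected, sampled):
    """Fraction of the sampled range covered by the selected ensemble members."""
    lowest, highest = min(sampled), max(sampled)
    norm = [normalise(v, lowest, highest) for v in selected]
    return max(norm) - min(norm)


if __name__ == "__main__":
    # Hypothetical sampled values of one parameter and the subset retained.
    sampled = [31.0, 33.5, 36.0, 38.5, 41.0]
    selected = [32.0, 36.0, 40.0]
    print(spanned_fraction(selected, sampled))
```

Under this reading a value near 100 % means the selected members are spread over almost the whole sampled range (little constraint), while a small value means they cluster in a narrow part of it.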
The fact that the largest parameter uncertainty lies with $T_{upp}$ poses a challenge for projections of future carbon cycle changes, where the temperature dependence of plant photosynthesis (represented by this parameter) is the dominant uncertainty in future responses (Booth et al., 2012). This result suggests that contemporary plant distributions do not constrain the range of plausible $T_{upp}$ values, and hence do not offer a way to constrain the range of future changes. This analysis does, however, illustrate that model comparisons with observed vegetation cover may provide a stronger constraint on other parameters ($f_0$, $N_{L0}$ and $V_{crit,\alpha}$), which is novel. Indeed, this approach suggests that observed forest cover may provide a metric to reduce uncertainty ranges on these parameters, which have important roles in the climate's hydrological response.

The Amazon in the past
The Amazon rainforest has been part of the landscape of South America for millions of years. However, its structure has not remained constant throughout that time (Maslin et al., 2005). Since the reproduction of the structure of the Amazon is highly sensitive to model parameters (see Fig. 3 for example), it is important to further validate the model by perturbing the simulations in another way. This is done by changing the orbital forcing of the α7 simulations. It is known that the forest's structure was similar to today during the mid-Holocene (6000 yr ago) and so the α7 simulations were run for an orbital configuration corresponding to 6000 yr ago and compared to the equivalent for the present day. The leaf area index (LAI) is a measure of the area of leaf cover per unit area of ground (Law et al., 2008) and the differences in LAI between the mid-Holocene (and LGM) simulations and their present-day α7 equivalents are shown in Fig. 6.
It is clear from Fig. 6 that the LAI is generally increased across the Amazon for all of the mid-Holocene simulations with the exception of that shown in Fig. 6a. Maslin et al. have also shown that at the Last Glacial Maximum (LGM) 21 000 yr ago, the density of the Amazon was reduced, as represented by a reduction in LAI. Only Fig. 6h shows a considerable reduction in LAI at the LGM, as required for agreement with the work of Maslin et al., and this is in agreement with the result in Fig. 6a, which also identifies this simulation as containing a suitable set of parameters. Therefore a combination of present-day observations and paleoclimatic reconstructions of the Amazon rainforest has been used to identify a realistic set of terrestrial carbon cycle parameters suitable for use in further research.
Figure 2 shows the dominant PFT in each grid box and its fractional coverage for the best performing ensemble member identified in the preceding discussion; it is analogous to Fig. 1 which shows the equivalent data for the observations.
The biases common to the α7 ensemble members (discussed at the end of Sect. 4.1.1) are clearly seen in Fig. 2, as is the tendency for TRIFFID not to have different PFTs cohabiting in the same grid box. It should be emphasised that some of these biases may be associated with issues within MOSES/TRIFFID but other biases may be associated with problems with the control climate. For instance, FAMOUS has a tendency to make Australia too wet and hence the Australian desert area is underestimated. Unfortunately, TRIFFID cannot be run offline and hence it is not possible to explicitly separate the climate biases from TRIFFID biases.

Identification of suitable parameters
The fidelity of the ocean carbon cycle is considered by comparing the concentration of the rate-limiting nutrient in the system, nitrate, with global observations from the World Ocean Atlas (Garcia et al., 2006). The annual mean concentration at 5 m depth in the simulations is compared with the average of the surface and 10 m depth values from the observations. The quality of the model fit to the data is calculated using the Arcsine Mielke skill (AMS) score, which gives a score of 1 for perfect correlation and −1 for perfect anti-correlation. If a model field bears no resemblance to the observations then the score will be zero. Further information regarding the AMS can be found in Jones et al. (2005) and Watterson (1996), for example. The nitrate data in the World Ocean Atlas are given at 1° resolution and must therefore be regridded onto the model grid of 2.5° × 3.75° before meaningful comparisons can be made.
Fig. 7. The AMS for the ocean carbon cycle ensemble's nitrate concentration when compared against World Ocean Atlas data. The ensemble member giving rise to the highest AMS is marked with a filled circle.
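The behaviour of the AMS described above (1 for a perfect match, −1 for perfect anti-correlation, 0 for no resemblance) can be illustrated with a minimal sketch. It assumes the non-dimensional form given by Watterson (1996), M = (2/π) arcsin[1 − mse / (V_model + V_obs + (G_model − G_obs)²)], where V is the spatial variance and G the spatial mean of each field. The function name is hypothetical, and the sketch uses unweighted statistics on flat lists of grid-point values; area weighting and the regridding step are omitted, so this is not necessarily the exact calculation used for Fig. 7.

```python
import math


def arcsine_mielke_skill(model, obs):
    """Unweighted AMS of two equal-length sequences of grid-point values.

    Assumed form (after Watterson, 1996):
        M = (2/pi) * asin(1 - mse / (var_model + var_obs + (mean diff)^2))
    """
    if len(model) != len(obs) or not model:
        raise ValueError("fields must be non-empty and of equal length")
    n = len(model)
    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    var_m = sum((m - mean_m) ** 2 for m in model) / n
    var_o = sum((o - mean_o) ** 2 for o in obs) / n
    mse = sum((m - o) ** 2 for m, o in zip(model, obs)) / n
    denom = var_m + var_o + (mean_m - mean_o) ** 2
    if denom == 0.0:
        raise ValueError("both fields are constant and equal; score undefined")
    # The argument lies in [-1, 1] analytically; clamp against rounding error.
    arg = max(-1.0, min(1.0, 1.0 - mse / denom))
    return (2.0 / math.pi) * math.asin(arg)


# Illustrative (made-up) fields, not model or World Ocean Atlas data.
obs_field = [1.0, -1.0, 2.0, -2.0]
print(arcsine_mielke_skill(obs_field, obs_field))                # identical fields
print(arcsine_mielke_skill([-x for x in obs_field], obs_field))  # sign-flipped field
```

Because the mean-square error includes any offset between the two spatial means, a field with the right pattern but a uniform bias scores below 1 under this form.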
Of the 100 ensemble members, 4 gave unphysical values for the nitrate concentration in the climatologies; Fig. 7 shows the remaining 96 members' AMS values.
It is important that when a model parameter is varied to find an optimum configuration, the range of values of that parameter gives rise to a broad range of model responses. It is apparent from Fig. 7 that this condition is met for nitrate, where the AMS ranges from 0.040 to 0.72 (mean 0.51) with a standard deviation of 0.16. In contrast, if one compares the sea surface temperature from the model ensemble with observations from Rayner et al. (2003), the standard deviation is just 0.0017 around a mean of 0.85.
It should be noted here that, in reality, the productivity of the Southern Ocean is iron limited (Boyd et al., 2000). Therefore, as a further check of the validity of this method, the same AMS calculations were performed but excluding ocean points south of 60° S. Even with this restriction on the area of study, the parameter set identified as the best in Fig. 7 still provides an AMS score of 0.67, compared to a maximum of 0.72 and a minimum of 0.05. The average difference between the AMS scores for the global and no-Southern-Ocean cases is +0.02 and the standard deviation of this quantity is 0.04. Therefore, the +0.05 difference between the value of 0.72 for the global case and 0.67 for the no-Southern-Ocean case is within this range of variability. It is reassuring that, even excluding the Southern Ocean from the data analysis, the parameters found to give the best global nitrate concentration still give a high fidelity reproduction compared to the majority of the other ensemble members.
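The statement that the +0.05 difference lies within the ensemble's range of variability can be checked with a short calculation. The sketch below uses only the numbers quoted above; the function name and the choice of one standard deviation as the threshold are assumptions made for illustration.

```python
def within_spread(difference, mean_difference, std_difference, n_sigma=1.0):
    """True if `difference` lies within n_sigma standard deviations of the mean."""
    return abs(difference - mean_difference) <= n_sigma * std_difference


best_global = 0.72           # AMS of the best member, global ocean
best_no_southern = 0.67      # same member, points south of 60 S excluded
mean_diff, std_diff = 0.02, 0.04  # ensemble statistics of (global - restricted)

diff = best_global - best_no_southern
standardised = (diff - mean_diff) / std_diff

print(round(diff, 2), round(standardised, 2))
print(within_spread(diff, mean_diff, std_diff))
```

The best member's change in score is about three quarters of a standard deviation above the ensemble-mean change, so it is unremarkable relative to the other members.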
The parameters from the highest scoring member of the ensemble (as identified in Fig. 7) are given in Table 3 along with their relationship to the control value.It is encouraging that all but one of the 5 free model parameters deviate noticeably from the control value as it adds weight to the necessity of the exercise.Additionally, none of the 5 parameters are at the extremes of the distribution of parameter space (±50 %) when compared to the control simulation, which shows that the postulated range of parameters is plausible.
In addition, Doney et al. (2004) have shown that the background physical state (e.g. the ocean circulation) is perhaps more important for the realism of the ocean carbon cycle than the model parameters themselves. These studies, along with the comparison to observed ocean nitrate concentration performed here, clearly show that a more coordinated study of ocean carbon cycle parameter uncertainties is required and that the work presented here is a step towards achieving the goal of better constrained parameters affecting the global carbon budget.
Although the ensemble member with the highest AMS is clear from Fig. 7, there are several other ensemble members whose scores are very similar. In light of this, the ranges of each parameter (with respect to the control value) for the top six highest scoring ensemble members are given in the right-hand column of Table 3 and plotted in Fig. 8. This figure shows that the free parameters affecting the value of the zooplankton, detritus and carbonate precipitation parameters are poorly constrained, with their values representing the top six scoring ensemble members covering virtually the entire parameter range. However, the C : N ratios are confined to approximately the top two-thirds of explored parameter space and the phytoplankton parameters are constrained to roughly the top 25 %. This is precisely analogous to the result for the land surface ensemble which showed that the parameter f_0 is constrained to high values within its perturbed range by fitting to the dominant PFT found in the Amazon region.
The reason for this initial "singling out" of the six top performing ensemble members is that they were thought to be essentially indistinguishable in Fig. 7. However, for the avoidance of doubt, this test was extended to encompass the top ten simulations. Under these less stringent conditions, the ranges of the C : N ratios and phytoplankton parameters are essentially unchanged compared to the consideration of the top six and show that phytoplankton-specific parameters are well constrained by this work.
As a demonstration of the degree of variability present in the ensemble's reproduction of ocean nitrate, Fig. 9 shows the nitrate concentration for the observations, the top scoring ensemble member and four other ensemble members with decreasing AMS scores. The resulting five simulated nitrate distributions vary hugely in their ability to reproduce observations and the behaviour is not "linear" with Arcsine Mielke score. For example Fig. 9d has an AMS of 0.32 and shows
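The idea of measuring how much of the explored range is spanned by the top-scoring members can be sketched as follows. The function name, the representation of each member as a (score, scaling factor) pair and the example numbers are all hypothetical; the scaling factor is taken to be the parameter value relative to its control, sampled between 0.5 and 1.5 to match the ±50 % perturbation.

```python
def top_n_coverage(members, n_top, lower=0.5, upper=1.5):
    """Range of a parameter spanned by the n_top best-scoring members.

    members: list of (score, scale_factor) pairs, one per ensemble member.
    Returns (min, max, fraction of the sampled range covered).
    """
    ranked = sorted(members, key=lambda m: m[0], reverse=True)[:n_top]
    values = [value for _, value in ranked]
    lo, hi = min(values), max(values)
    return lo, hi, (hi - lo) / (upper - lower)


# Made-up example: three high-scoring members clustered near the top of the
# range, plus two poorly scoring members elsewhere.
example = [(0.72, 1.45), (0.71, 1.30), (0.70, 1.40), (0.40, 0.60), (0.10, 0.90)]
lo, hi, fraction = top_n_coverage(example, n_top=3)
print(lo, hi, round(fraction, 2))
```

A small covered fraction indicates a parameter that the skill score constrains well, whereas a fraction close to 1 indicates a parameter that the top members leave essentially free. Repeating the call with a larger `n_top` mirrors the top-six versus top-ten comparison.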

Physiological justification of suitable parameters
Now that a set of suitable parameters has been found to reproduce the observed surface nitrate concentration, it is important to be able to justify their magnitude physiologically. As previously stated, each free parameter was varied by ±50 % around the control value, i.e. the value "hard-wired" into the model code, and parameters relating specifically to each of C : N ratios, phytoplankton, zooplankton, detritus and carbonate precipitation were co-varied. This latter approach was designed purely to reduce the number of free parameters in the model from the original number (20) to a number which could be sufficiently sampled using a 100 member ensemble (i.e. 10). As shown above, the only parameters which are notably constrained by the six top scoring ensemble members are the parameters relating specifically to phytoplankton, which have been shown to lie in approximately the top 25 % of the sampled range (Table 3, Fig. 8). However, even in the top six performing ensemble members, the parameters relating to zooplankton, detritus and carbonate precipitation are barely constrained at all within their ±50 % ranges around the control value. What this means is that although the resulting nitrate concentrations for the top six ensemble members are all virtually indistinguishable (from the perspective of their AMS scores), the individual values of the parameters can vary significantly. It should be noted that since all zooplankton and detritus variables in Table 2 are co-varied, it is likely that the parameter set found to give the best AMS overall will contain individual parameters which are outside the plausible range which would have been found had each parameter been varied individually. Indeed, the goal of this study was to identify a suitable parameter set for ocean carbon cycle-climate studies rather than to explicitly elucidate the magnitude of each constituent parameter. In order to achieve this, every individual parameter should ideally be varied independently and this was not computationally feasible within the timescale of this study. In spite of this, the values of each parameter are now examined in relation to literature derived values.
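The co-variation of parameters within a group can be sketched as drawing one scaling factor per group and applying it to every parameter in that group. The sketch below is illustrative only: the parameter names and control values are placeholders rather than the HadOCC values in Table 2, and uniform random sampling is an assumption, since the sampling design is not described in this section.

```python
import random

# Placeholder groups and control values (hypothetical, for illustration only).
CONTROL = {
    "c_to_n": {"cn_a": 1.0, "cn_b": 2.0},
    "phytoplankton": {"phyto_a": 1.0, "phyto_b": 4.0},
    "zooplankton": {"zoo_a": 1.0, "zoo_b": 3.0},
    "detritus": {"det_a": 1.0, "det_b": 5.0},
    "carbonate": {"carb_a": 1.0},
}


def sample_member(control, rng, spread=0.5):
    """Draw one ensemble member with one scaling factor per parameter group.

    Every parameter in a group is multiplied by the same factor, drawn
    uniformly from [1 - spread, 1 + spread] (an assumed sampling scheme).
    """
    factors = {}
    member = {}
    for group, params in control.items():
        factor = rng.uniform(1.0 - spread, 1.0 + spread)
        factors[group] = factor
        for name, value in params.items():
            member[name] = value * factor
    return member, factors


rng = random.Random(0)  # fixed seed so the ensemble is reproducible
ensemble = [sample_member(CONTROL, rng) for _ in range(100)]
```

With this design the number of sampled dimensions equals the number of groups rather than the number of individual parameters, which is what makes a 100 member ensemble tractable, at the cost of never separating the parameters within a group.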
As mentioned above, Scott et al. (2011) performed a similar perturbed physics ensemble of HadOCC runs to that carried out here and although they use a considerably larger parameter set than the present authors (1000 sets of parameters), the simulations are run in 1 dimension and for run lengths of just 9 yr to examine the model's internal sensitivity to model parameters, without calling for model-data comparison as performed here. However, this study does provide a useful cross-validation of the parameter values identified in this work. Table 3 in the Scott paper shows the ranges of the parameters varied. It is important to note here that not only do Scott et al. (2011) not vary exactly the same parameter set as the set perturbed here, they also use some different nomenclature to the original Palmer and Totterdell (2001) paper. Table 4 provides a guide to the names of the parameters varied in this study in relation to those given in Palmer and Totterdell (2001) and in Scott et al. (2011).
Of the parameters which are common to both ensemble approaches (Table 4), only two parameter values identified in this study fall outside the range given by Scott et al. (2011) (note that the deep detritus remineralisation rate is neglected here since the range in Scott et al., 2011 is derived from its shallow equivalent, not from the literature); these are the linear zooplankton mortality rate, µ_1, and the "rain ratio". The value of the former derived here is 0.02594, which falls just outside the literature-derived range of 0.03-0.2; however, the value found here is indistinguishable from the lower bound of this range when quoted to the same level of precision. The value of the rain ratio is identified as 0.009729, which lies just outside the range quoted in Scott et al. (2011), i.e. 0.013-0.25. Now that plausible parameters have been identified for the land and ocean carbon cycles, it is necessary to examine the climatology of this new version of FAMOUS to ensure that the results obtained do indeed represent an improvement in model skill.
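The comparison of the two out-of-range values with the ranges quoted from Scott et al. (2011) can be reproduced with a small check. The function name is hypothetical, and the number of decimal places used for rounding is taken from the precision at which each lower bound is quoted (two for 0.03, three for 0.013).

```python
def check_against_range(value, low, high, decimals):
    """Return (inside, inside_when_rounded) for a value and a quoted range.

    `decimals` is the number of decimal places at which the bounds are quoted.
    """
    inside = low <= value <= high
    inside_when_rounded = low <= round(value, decimals) <= high
    return inside, inside_when_rounded


# Linear zooplankton mortality rate: 0.02594 against 0.03-0.2
mortality = check_against_range(0.02594, 0.03, 0.2, decimals=2)
# Rain ratio: 0.009729 against 0.013-0.25
rain_ratio = check_against_range(0.009729, 0.013, 0.25, decimals=3)

print(mortality, rain_ratio)
```

On this check the mortality rate coincides with the lower bound once rounded, while the rain ratio remains below its lower bound even after rounding.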

Climatology and validation
Since the first FAMOUS documentation paper (Jones et al., 2005), there have been a number of improvements made. Smith et al. (2008) described advances in the representation of sea ice and ozone as well as the introduction of the HadOCC ocean carbon cycle component. Smith (2012) shows improved upper level winds through the introduction of a Rayleigh friction term at the top of the atmosphere and also described other changes relating to, for example, ocean-solar radiation interactions and the effect of snow at coastal points due to the fractional land-sea mask in FAMOUS (e.g. Smith et al., 2008). The climatologies of runs using the newly identified carbon cycle parameter sets are now described.

Atmosphere and land surface

Near-surface air temperature
It is important to confirm that the new versions of FAMOUS described here are compatible with those published previously (Jones et al., 2005; Smith et al., 2008; Smith, 2012) and with HadCM3. This is because FAMOUS was originally calibrated against HadCM3 in order to provide an analogous climatology but with significantly reduced run times. The FAMOUS simulations in question (denoted by their unique 5 letter Met Office Unified Model simulation index) are given below and are assigned a generation number to indicate the order of their documentation date. The version of the land surface scheme, MOSES, is also given.
The generation 3 simulation, XFHCC, is the most recently documented version of FAMOUS prior to this work, although most work currently being undertaken with FAMOUS uses XDBUA (the generation 2 model) or XFXWB (Smith, 2012).
The only major structural difference between XFXWB and XFHCC (the generation 3 model used here) is the inclusion of Rayleigh Friction in the upper 3 atmospheric model levels; a change which has been shown to improve the climatology.XFHCC is therefore chosen above XFXWB as the generation 3 model to enhance traceability in the documentation of FAMOUS; noteworthy differences between XFXWB and XFHCC are described in Smith (2012).
All previously documented versions of FAMOUS have used the MOSES 1 land surface scheme and a fixed vegetation distribution and so the newly optimised description of the model represents a step change in model complexity. Figure 10 shows the 1.5 m air temperature for the simulations described above and Table 5 shows the corresponding AMS values.
One particularly apparent aspect of the FAMOUS results shown in Fig. 10 is the persistent cold bias in the Northern Hemisphere in DJF although this is significantly improved in more recent versions of the model compared to the 1st generation.Generations 3, 4a and 4b are strikingly similar in DJF with a cold bias which is shifted east compared to generation 2. In addition, the agreement between FAMOUS and HadCM3 is noticeably better in JJA compared to DJF; this is evident in all versions of the model.
Another result (not shown) is that the introduction of MOSES 2.2 (with fixed vegetation cover) whilst maintaining the un-optimised carbon cycle parameters overcompensates for the Northern Hemisphere winter cold bias and introduces a summer warm bias. Using the optimised parameter set does leave some cold bias in place (Fig. 10) but significantly reduces this "new" summer warm bias. In summary, the introduction of MOSES 2.2 provides an annual mean temperature climatology which is as good as any of the previously documented versions of FAMOUS. If the vegetation is fixed to observations of the contemporary biosphere, the optimisation procedure described above provides not only a good global AMS score, but also helps to alleviate the persistent DJF Northern Hemisphere cold bias.

Vertical temperature profile
Having studied the ability of FAMOUS to reproduce HadCM3's surface temperature distribution, the air temperature aloft is now examined with respect to the ECMWF 40 yr reanalysis (Uppala et al., 2005).The vertical temperature structure of FAMOUS was last studied in Smith et al. (2008) (their Fig. 5) and Fig. 11 shows an updated version of this figure but with simulation output plotted with respect to ERA-40 data rather than ERA-15.
The atmospheric resolution of FAMOUS is significantly reduced compared to HadCM3 (11 vertical levels compared to 19); indeed there is frequently just a single model layer at pressures lower than the tropopause (Smith et al., 2008). Therefore, the ability of the generation 4b version of FAMOUS to accurately reproduce the temperature structure of HadCM3 and ERA-40 up to 10 mbar (the lowest value pressure level available for all the simulations presented here) is very encouraging.
One reason for the improvement in upper-atmosphere temperature profiles (along with improved upper level winds as described in Smith, 2012) is the different ozone parameterisations in the separate model versions; these values are shown in Table 6.

Precipitation
Smith et al. (2008) used the CPC Merged Analysis of Precipitation (CMAP) dataset (Xie and Arkin, 1997) to validate the 2nd generation FAMOUS model, XDBUA, and this dataset is also used here. Figure 12 shows the annual mean total precipitation for the CMAP climatology, the 3rd generation model, XFHCC, and the 4th generation models XFHCU (fixed vegetation) and XFHCS (dynamic vegetation). This figure also shows the respective AMS values.
The land surface scheme of a climate model can be expected to have a significant effect on precipitation over land. For example, a significant difference between the land surface schemes of the 4th generation versions of FAMOUS and those documented previously is the introduction of plant functional types which can individually affect the fluxes of water and CO2 at the land-atmosphere interface.
In light of this, it is reassuring that in both 4th generation versions of FAMOUS the global representation of precipitation is improved compared to the 3rd generation version, as shown by the AMS scores in the subtitles to Fig. 12b-d. The main features to note are the improvement to the (positive and negative) biases over the equatorial Pacific in Fig. 12c and d and also over the Amazon basin in Fig. 12c, where the vegetation is held constant. The significant improvement to the positive bias over the Maritime Continent is also rather striking, particularly given this region's significant influence on large-scale heating and atmospheric circulation (Neale and Slingo, 2003). In tandem with these improvements, there is a small increase in the positive bias in the equatorial Atlantic in the 4th generation models but overall the global precipitation is noticeably improved with respect to the earlier version. Figure 12a clearly shows that the areas of highest rainfall are located in the ITCZ and SPCZ (Inter-Tropical and South Pacific Convergence Zones). What this means is that the precipitation anomalies with respect to the CMAP observations in Fig. 12b-d mainly highlight these areas. Figure 13 shows the same data as Fig. 12 but only for the northern and southern mid-latitudes (30°-60°) and Table 7 gives the respective AMS scores.
From Figs. 12 and 13 as well as Table 7 it can be seen that although the global, tropical and southern mid-latitude AMS is improved in the generation 4 simulations compared to the generation 3 version, this is not the case for northern mid-latitudes. This slight deterioration is due to an increase in the positive bias over western North America and an increase in the negative bias over the Northern Pacific.

Surface nitrate
Figure 14 shows the annual average surface nitrate concentration for observations from the World Ocean Atlas (Garcia et al., 2006) and for the generation 3 and 4a simulations.
There are no significant differences between the nitrate distributions for the two generation 4 models XFHCU (Fig. 14c) and XFHCS (not shown), which is expected because these simulations differ only in their representation of terrestrial vegetation. When comparing Fig. 14b and c, however, a marked improvement in FAMOUS' ability to reproduce the observed nitrate concentration is seen between generation 3 and 4, which is clearly manifested in a significant increase in the AMS score for XFHCU as shown above Fig. 14b, c. Clearly this is the expected result because the optimised ocean carbon cycle parameters used in XFHCU were tuned to the nitrate concentration in Fig. 14a. However, this does provide a good illustration of the power of the tuning method employed in this work. For example, the large positive bias in the equatorial Pacific is significantly reduced and, although the positive bias in the south Atlantic is increased, the overall Southern Ocean bias is markedly reduced. As previously mentioned, however, the Southern Ocean bias is of lesser importance here since the ocean productivity in this region is, in reality, iron limited (Boyd et al., 2000).

Vertical nitrate profiles
Although the surface nitrate distribution has been improved by the newly identified set of ocean carbon cycle parameters, the effect on the same quantity at depth is now investigated, again with respect to the World Ocean Atlas dataset. Figure 15 shows the observed quantity and the equivalent plots for the generation 3 and 4 configurations of FAMOUS, respectively. Although there is a deterioration in the negative bias around 1000 m depth in northern mid-latitudes, there is a striking improvement in the positive bias at high northern latitudes. The Southern Ocean negative bias present in the third generation model (shown above in Fig. 14) is also significantly alleviated.
In addition to the global results shown in Fig. 15, Figs. 16 and 17 show the results for the Atlantic (70° W-20° E) and Pacific (150° E-290° E) basins, respectively. In the Atlantic, the agreement at depth in the southern hemisphere is less good in the fourth generation models (c and d); however, the northern hemisphere agreement is improved at all depths. For the Pacific Ocean, there is little change in the level of agreement in the southern hemisphere except for a small deterioration in the positive bias around 1000 m depth between the equator and approximately 45° S. The negative bias at depths below 2000 m is, however, noticeably improved in the northern hemisphere.

Ocean productivity
Figure 18 shows global and zonally averaged ocean productivity data from observations (Behrenfeld and Falkowski, 1997) and the difference between the generation 3 and 4a configurations of FAMOUS and this observational data. As for the surface nitrate concentration, the generation 4b model is not shown since the results are very similar to those of the generation 4a model. The agreement between simulated and observed distributions is degraded in the equatorial oceans for the generation 4 configuration compared to the 3rd generation. However, in the extra-tropics, right up to the polar regions, the agreement between the different generations of FAMOUS themselves is striking, although the simulated values are generally less productive than observed. It should be noted that this inter-model agreement in the extra-tropics is not seen for the surface nitrate concentration (Fig. 14) where significant differences are visible in the Southern Ocean for the different model configurations.
The global productivity sum for the observations is 50.8 Pg C yr −1 , in contrast to 33.0 and 71.3 Pg C yr −1 for generations 3 and 4a, respectively. This quantity is an often quoted metric in the literature (Palmer and Totterdell, 2001; Behrenfeld and Falkowski, 1997; Cox et al., 2000, for example) and so in this regard, the degradation in agreement between modelled and observed results is only marginal in that the generation 3 model underestimates the global total by 35 % and the generation 4 model overestimates it by 40 %.
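The quoted percentages follow directly from the three global totals. A minimal check, using only the numbers given above (the function name is hypothetical):

```python
def percent_bias(model_total, observed_total):
    """Signed percentage difference of a model total relative to observations."""
    return 100.0 * (model_total - observed_total) / observed_total


observed = 50.8  # Pg C per yr, observational estimate
gen3 = percent_bias(33.0, observed)   # generation 3 model
gen4a = percent_bias(71.3, observed)  # generation 4a model

print(round(gen3), round(gen4a))
```

The two biases are of opposite sign and similar magnitude, which is the sense in which the degradation in the global total is described as marginal.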

Discussions, conclusions and future work
The two new versions of FAMOUS presented here represent an important increase in model complexity compared to previous versions of the model, with the inclusion of surface tiling into 9 sub-types and the flexibility to include dynamic vegetation response to climate forcings.The carbon cycle parameters of both the land surface and the ocean have been tuned to observations and reanalysis data and the climatologies of the new versions of the model have been shown to be noticeably improved.
Concerning the terrestrial carbon cycle, the use of a large ensemble of 100 climate simulations has enabled the determination of sensible ranges of the parameters varied in the ensemble methodology. It is clear that certain parameters are significantly better constrained than others by this work. For example, the 7 ensemble members which are seen to give the best representation of the Amazon rainforest only account for 30 % of the variation of the parameter controlling the critical soil moisture (V_crit,α), whereas the same 7 simulations encompass 92 % of the parameter range of the T_upp parameter which, in part, controls the response of photosynthesis to temperature. This last result concerning T_upp suggests that comparisons with land surface coverage do not provide a constraint on the future land carbon cycle uncertainty identified here and in Booth et al. (2012). It does raise the interesting implication, however, that comparisons of land surface coverage between observations and simulations may constrain other land carbon cycle parameters more closely tied to the hydrological response within the model. For the ocean ensemble, it has been shown that the phytoplankton parameters are better constrained than the others, with the top six performing ensemble members accounting for approximately just the top 25 % of the parameter variation.
Despite including many elements of the carbon cycle, the work presented here fixes the atmospheric concentration of CO2 at preindustrial levels. This clearly limits the degree to which the newly modelled carbon cycle processes can influence the large-scale climate of the model. Lifting this restriction whilst maintaining a realistic climate simulation, and assessing the climate and sensitivities of this fully interactive carbon cycle version of FAMOUS, is beyond the scope of this paper and will be addressed in a forthcoming publication.
This work illustrates that many parameters are under-constrained or under-determined by the simulation-observation comparisons presented here. This represents one of the key challenges in model development and is an important factor linked to uncertainty in simulated responses to future climate scenarios (e.g. Booth et al., 2012). The use of simulated-observed comparisons to constrain model parameters is more advanced in the development of atmospheric components of models (e.g. Murphy et al., 2004, where simulations are compared against a very large basket of observational metrics) but even in these cases, with the larger number of observations, many (often key) parameters remain under-determined (Sexton et al., 2012). The development of a comparable set of observational metrics for carbon cycle processes is in its infancy in comparison. It has been illustrated here that land vegetation cover, specifically Amazon forest extent, and ocean nitrate can both be used to narrow the range of plausible values for some but not all carbon cycle parameters. This is a first step. With the more central role of carbon cycle processes in current global climate models (CMIP5), we will need to identify and develop a broader set of biogeochemical observational metrics as part of the process of calibrating parameter sets, while recognising that some parameters will always remain under-determined and that this will be linked to an uncertainty in the simulated responses.
The HadOCC model is currently the only biogeochemical model which can be coupled to the Hadley Centre GCM ocean. Future work with this modelling framework will focus on significantly improving the biogeochemical cycling capabilities of the Hadley Centre model. The first stage of this biogeochemical cycling improvement work will aim to include oxygen as a fully prognostic variable and later developments will aim to increase the number of nutrients simulated in the model, which is currently limited to just one, i.e. nitrate. More recent versions of the Hadley Centre model (for example HadGEM2, which is part of the Hadley Centre's contribution to the forthcoming IPCC Fifth Assessment Report) use an upgraded version of the HadOCC model known as diat-HadOCC (Halloran, 2012; Collins et al., 2011), however
Additional future model development work with the FAMOUS biogeochemical scheme aims to incorporate further ocean processes related to long timescale responses of the ocean carbon cycle (such as weathering), analogous to the GENIE Earth system model (e.g. Ridgwell and Hargreaves, 2007). This will enable more realistic multi-centennial climate simulations to be carried out with a view to aiding a better understanding of ocean acidification under climate change, for example. Work is also underway to improve the coupling between FAMOUS and the Glimmer ice sheet model by including a detailed representation of sub-gridscale orography and snowpack behaviour.

Fig. 1. The left-hand figure shows the observed dominant plant functional type for the present-day (Loveland et al., 2000) and the right-hand figure shows the fractional coverage of the dominant type. Abbreviations: BT (broadleaf tree), NT (needleleaf tree), C3 and C4 vegetation, S (shrubs) and BS (bare soil).

Fig. 2. The left-hand figure shows the simulated dominant plant functional type for the best performing land surface ensemble member and the right-hand figure shows the fractional coverage of the dominant type.

Fig. 3. Histogram of the fractional agreement between the 100 ensemble members and the observations over the Amazon region for all PFTs. Here, "fractional agreement" gives the fraction of the 28 Amazonian grid boxes which are assigned the same PFT in the ensemble members and in observations.

Fig. 4. Dominant grid-box PFT for (a) observations, (b) HadCM3, regridded to the FAMOUS resolution, (c-e) the three top performing α7 ensemble members in terms of their fractional agreement with observations over the Amazon region, (f-h) three ensemble members with decreasing Amazon agreement. There is a clear correlation between the agreement over the Amazon region and that over the whole globe.

Fig. 5. The sensitivity of the 100 land surface ensemble members to individual parameters. The α7 simulations are shown with filled symbols and the horizontal lines represent the minimum and maximum values of each parameter covered by them. In decreasing order, the top three performing α7 simulations (in terms of their fractional agreement with the observed dominant PFT in the Amazon region) are shown by upward-facing arrows, downward-facing arrows and squares respectively.

Fig. 6. Difference between the combined-PFT LAI of the mid-Holocene and α7 runs (a-g) and the equivalent residual plots for the LGM and α7 runs (h-n).

Fig. 8. The sensitivity of the ocean ensemble members to individual parameters. The top scoring six (in terms of their AMS) simulations are shown with filled symbols and the horizontal lines represent the minimum and maximum values of each parameter covered by them.

Fig. 9. Annual mean nitrate concentration in mmol m −3 at 5 m for (a) World Ocean Atlas observations (Garcia et al., 2006) and (b-f) 5 ensemble members with AMS scores varying between the maximum and minimum for the perturbed physics ocean ensemble.

Fig. 10. Air temperature at 1.5 m with respect to HadCM3 for progressively more modern versions of FAMOUS (most recent at the bottom of the figure) for DJF (left) and JJA (right).

Fig. 12. Annual mean total precipitation rate in mm per day for (a) the CMAP climatology (Xie and Arkin, 1997) and the difference between the simulated total precipitation and CMAP for (b) the generation 3 model XFHCC, (c) the generation 4a model XFHCU and (d) the generation 4b model XFHCS. Missing data areas are set to white and the AMS scores for the 3 model generations are given in the subtitles to (b), (c) and (d).

Fig. 13. Difference between simulated and observed precipitation in the northern (a-c) and southern (d-f) mid-latitudes as shown globally in Fig. 12. Note the different contour intervals compared to Fig. 12.

Fig. 14. Annual mean nitrate concentration in mmol m −3 at 5 m for (a) World Ocean Atlas observations (Garcia et al., 2006) and the difference between the simulated and observed values for (b) the generation 3 model XFHCC and (c) the generation 4a model XFHCU. The AMS scores for simulations are given above (b) and (c).

Fig. 15. Zonal mean-depth plots for annual mean nitrate concentration in mmol m −3 for (a) World Ocean Atlas observations (Garcia et al., 2006) and the difference between the simulated and observed values for (b) the generation 3 model XFHCC, (c) the generation 4a model XFHCU, (d) the generation 4b model XFHCS.

Fig. 18. (a) Observed primary production (Behrenfeld and Falkowski, 1997) and (b-c) difference between the generation 3 model XFHCC and the generation 4a model XFHCU and observational data. Sub-figures (d-f) show the same data but for zonally averaged quantities. The units of all quantities in this figure are g C m −2 day −1 .
. The additional R grow parameter in this work is varied by 50 % either side of its standard value.

Table 2. Control structural parameters in the HadOCC ecosystem model.

Table 3. Parameter values for the highest scoring ocean carbon cycle ensemble member and their relationship to the respective control value. Also shown is the percentage of the control value which is covered by the top six scoring ensemble members, as shown in Fig. 8.

Table 5. Regional and seasonal AMS values for different members of the FAMOUS model hierarchy. These are calculated for 1.5 m air temperature with respect to HadCM3. Generation numbers are given in brackets.

Table 6. Ozone concentrations in kg kg −1 around the tropopause for the different generations of FAMOUS.

Table 7. AMS scores for precipitation for the northern and southern mid-latitudes and the tropics.