Modelling the long-term carbon cycle, atmospheric CO2, and Earth surface temperature from late Neoproterozoic to present day

Abstract Over geological timescales, CO 2 levels are determined by the operation of the long term carbon cycle, and it is generally thought that changes in atmospheric CO 2 concentration have controlled variations in Earth's surface temperature over the Phanerozoic Eon. Here we compile independent estimates for global average surface temperature and atmospheric CO 2 concentration, and compare these to the predictions of box models of the long term carbon cycle COPSE and GEOCARBSULF. We find a strong relationship between CO 2 forcing and temperature from the proxy data, for times where data is available, and we find that current published models reproduce many aspects of CO 2 change, but compare poorly to temperature estimates. Models are then modified in line with recent advances in understanding the tectonic controls on carbon cycle source and sink processes, with these changes constrained by modelling 87 Sr/ 86 Sr ratios. We estimate CO 2 degassing rates from the lengths of subduction zones and rifts, add differential effects of erosion rates on the weathering of silicates and carbonates, and revise the relationship between global average temperature changes and the temperature change in key weathering zones. Under these modifications, models produce combined records of CO 2 and temperature change that are reasonably in line with geological and geochemical proxies (e.g. central model predictions are within the proxy windows for >~75% of the time covered by data). However, whilst broad long-term changes are reconstructed, the models still do not adequately predict the timing of glacial periods. We show that the 87 Sr/ 86 Sr record is largely influenced by the weathering contributions of different lithologies, and is strongly controlled by erosion rates, rather than being a good indicator of overall silicate chemical weathering rates. We also confirm that a combination of increasing erosion rates and decreasing degassing rates over the Neogene can cause the observed cooling and Sr isotope changes without requiring an overall increase in silicate weathering rates. On the question of a source or sink dominated carbon cycle, we find that neither alone can adequately reconstruct the combination of CO 2 , temperature and strontium isotope dynamics over Phanerozoic time, necessitating a combination of changes to sources and sinks. Further progress in this field relies on >10 8  year dynamic spatial reconstructions of ancient tectonics, paleogeography and hydrology. Whilst this is a significant challenge, the latest reconstruction techniques, proxy records and modelling advances make this an achievable target.


Introduction
Atmospheric carbon dioxide appears to have been essential in the maintenance of habitable conditions throughout Earth history by providing additional radiative forcing under a less luminous ancient sun. CO 2 -related climatic stabilization is attributed to the feedbacks between planetary surface temperature and the long-term carbon cycle, which allow atmospheric CO 2 to increase in response to global cooling, and decrease in response to warming, over timescales of N100 kyrs (Walker et al., 1981;Kasting, 1989).
Simple 'box' models of the long-term carbon cycle have been developed to demonstrate and test these ideas against known temperature and CO 2 variations over the Phanerozoic Eon (Berner et al., 1983;Berner, 1991Berner, , 1994Berner and Kothavala, 2001;Royer et al., 2004), and have been combined with models of other elemental cycles to also estimate concentrations of atmospheric oxygen, ocean sulphate and the behaviour of geochemical tracers such as δ 13 C of carbonates and δ 34 S of sulphates (Bergman et al., 2004;Arvidson et al., 2006).
The most recent Earth system box models are powerful predictive tools, used to reconstruct changes in global biogeochemistry and climate for times when proxy estimates are either unavailable or unreliable (Arvidson et al., 2013;Mills et al., 2016;Lenton et al., 2018;Krause et al., 2018), or used as a framework in which to test hypotheses about processes driving climate or biosphere changes over geological timescales (e.g. Falkowski et al., 2005;Mills et al., 2011;Boyle et al., 2014;Schachat et al., 2018;Lenton et al., 2018).
In this paper we return to the core predictions of Earth system box models, for atmospheric CO 2 and global surface temperature. These are among the most easily-testable predictions, as a wealth of pCO 2 estimates exist  and during the Cenozoic at least, surface temperature proxies are able to produce signals beyond the inherent uncertainties and climatic noise (e.g. Zachos et al., 2001;Hansen et al., 2013). In the following sections we attempt to construct broad estimates for both global average surface temperature and atmospheric CO 2 concentration over the Phanerozoic and beyond, given the available information. We then test current Earth system box models against these constraints, and those provided by the geological strontium isotope record, which responds to changes in the carbon cycle. Finally, we modify current models to take into account recent revisions of tectonic forcings, revise the strength of model feedbacks, and extend predictions back into the Neoproterozoic.

Whole Phanerozoic δ 18 O record
Throughout this work we will seek to compile and model the changes to Earth's global average surface temperature, noting that an agreed record of global average surface temperature for the Phanerozoic Eon does not currently exist. The only current continuous Phanerozoic dataset pertaining to planetary temperature is the oxygen isotope composition of marine shells (Veizer et al., 1999;Veizer and Prokoph, 2015). However, the record shows a long-term trend towards more negative values further back in time. Fig. 1A shows low latitude brachiopod and planktonic foraminifera oxygen isotope data from the compilation of Veizer and Prokoph (2015), and the black line in Fig. 1B shows global average surface temperature estimates by assuming a linear relationship between δ 18 O and local seawater temperature (Visser et al., 2003;see Supplementary File), and converting to global average surface temperature by assuming a scaling factor of 1.5 between low latitude and global temperature changes (Hansen et al., 2008).
Despite the trend towards more negative values, and thus higher temperatures, the low latitude oxygen isotope record shows cooling periods that agree very well with the timings of geological evidence for glaciation (Veizer et al., 2000). The record has been de-trended by assuming a gradual evolution of the oxygen isotope composition of seawater over the Phanerozoic following either a linear (Veizer et al., 2000) or quadratic (Veizer and Prokoph, 2015) function, and these corrections are applied in Fig. 1B. Much debate surrounds this dataset (Royer et al., 2004;Shaviv and Veizer, 2004;Grossman, 2012;Veizer and Prokoph, 2015;Bernard et al., 2017), and no clear mechanism has been defined to explain the proposed evolution of ocean δ 18 O values, although it is known that hydrothermal processes can cause fractionation (Gregory and Taylor, 1981), as can changes to ice mass. Here we take the de-trended records as a starting point, and compare these to other independent estimates.

Mesozoic and Cenozoic temperature proxies
Earth surface temperature estimates are a commonly-used and seemingly-reliable metric in studies of Cenozoic climate change (Hansen et al., 2013). A detailed oxygen isotope record from benthic foraminifera (Zachos et al., 2001) has been instrumental in defining climate events such as the Paleocene-Eocene Thermal Maximum (PETM) and Early Eocene Climatic Optimum (EECO) (Zachos et al., 2008). A recent combined record of benthic foraminiferal δ 18 O (Friedrich et al., 2012), including data for the Cretaceous, is reproduced in Fig. 2A. Temperature estimates from the smoothed record are plotted in Fig. 2B, following the formula in Hansen et al. (2013;see Supplementary File), and compare well with the Cenozoic temperature record presented in that paper.
A recently-developed and potentially promising paleothermometer is the organic lipid tetraether index, TEX 86 (Schouten et al., 2002). The relationship between TEX 86 and sea surface temperature has been demonstrated for the present day ocean, although uncertainty remains in the response to dissolved oxygen concentrations (Qin et al., 2015). Fig. 2C plots TEX 86 sea surface temperature reconstructions from low latitudes, as calculated by recent studies (Zhang et al., 2014;Inglis et al., 2015;O'Brien et al., 2017), alongside estimates based on Mg/Ca ratios (Evans et al., 2018). A comparison of the Mesozoic and Cenozoic records with the de-trended Phanerozoic δ 18 O lowlat record is shown in Fig. 2D, and defines three periods of elevated surface temperatures during the mid Cretaceous, early Paleogene and early Neogene. However, the magnitude of these changes is generally less than implied by the de-trended oxygen isotope record. Recent TEX 86 measurements from early Jurassic cores (Robinson et al., 2017) indicate sea surface temperatures in excess of modern values, generally agreeing with the detrended δ 18 O lowlat curve. The generally lower temperatures recorded by TEX 86 may indicate mixing with deeper waters (e.g. Jonas et al., 2017).

Glaciation ice line record
A commonly-used qualitative Earth surface temperature proxy is the paleolatitude of ice caps (Crowley, 1998;Royer et al., 2004). This record helps define the general oscillation between greenhouse (ice free) and icehouse (permanent ice cap) periods during the Phanerozoic and has been critical in the arguments presented for low-latitude 'snowball Earth' glaciations during the Neoproterozoic Era (Kirschvink, 1992;Hoffman et al., 1998). Simple one-dimensional energy balance models have been used to demonstrate the advance of ice caps towards the equator under either a decreasing solar flux, or decreasing concentration of atmospheric carbon dioxide (Budyko, 1969;Ikeda and Tajika, 1999;Hoffman and Schrag, 2002), and demonstrate a clear positive  Veizer and Prokoph, 2015;Veizer et al., 2000). Global average temperature change is assumed to be 1.5× tropical temperature change (Hansen et al., 2008). Local SST (C) Zhang et al. (2014)  relationship between the latitude of the ice line and the global average surface temperature, before reaching the 'snowball' instability. It is clear that such models are extremely parameter-dependent and must make simplified approximations for heat transport between latitude bands that do not equate well to GCM models (Coakley and Wielicki, 1979). Thus, in order to examine the relationship between glaciation paleolatitude and global average surface temperature, we first plot the paleolatitude of glaciation (Crowley, 1998;Cather et al., 2009) and global average surface temperatures derived from benthic δ 18 O for the Cenozoic in Fig. 3. These datapoints show a clear positive relationship and lie on a strong linear fit, allowing us to speculatively apply this metric to pre-Cenozoic ice line latitudes, as shown in Fig. 4. It is important to stress that this extrapolation involves uncertainty due to the differences in continental positions and ocean heat fluxes earlier in Earth history. Nevertheless, it is important to consider the quantitative implications of this proxy, given that it is widely used to define Phanerozoic greenhouse and icehouse climates. We find that the Paleozoic and Mesozoic temperature reconstructions using this method are in reasonable agreement with the area defined by the reconstructions in Sections 2.1 and 2.2.
Combining the ice-line temperature record with the paleothermometer indices described earlier allows us to tentatively define a broad window for global average surface temperature that encompasses all of these estimates (Fig. 4b). Here, the dark shaded area is derived from the δ 18 O lowlat record assuming either a linear or quadratic variation in δ 18 O seawater , in combination with the ice line proxy for long glacial periods and the ±1 st. dev. uncertainty in the benthic δ 18 O record. The lighter shaded area represents the ±1 st. dev. uncertainty in the δ 18 O lowlat record and encompasses the scatter in the TEX 86 record and shorter-lived glacial periods. Much about the long term global temperature record remains uncertain and we see this reconstruction as embodying a multi-proxy 'best guess' that attempts to quantify the generally-understood long-term temperature changes over the Neoproterozoic and Phanerozoic. Broadly, the picture appears to be of temperatures below present day in the Cryogenian (Hoffman et al., 1998), rising to a Cambrian greenhouse, cooling towards the late Ordovician glaciation, hotter climates during the Silurian and early Devonian followed by a long descent into the Late Paleozoic Ice Age by around 300 Ma, then warm but fluctuating temperatures in the Mesozoic (e.g. Dera et al., 2011) and a long term cooling from the Cretaceous towards the present day.

Phanerozoic CO 2 proxies
A number of independent proxies have been developed to estimate atmospheric CO 2 concentrations over the Phanerozoic: The isotopic composition of carbon in carbonate minerals precipitated in paleosols, when combined with an estimate of soil-respired pCO 2 (Breecker et al., 2010), can be used to estimate atmospheric CO 2 concentration via a mass balance calculation (Cerling, 1984). The boron isotope ratio δ 11 B can be used to estimate oceanic pH due to different isotopic compositions of the major boron species, which can then be used to estimate atmospheric CO 2 (Pearson and Palmer, 2000). The carbon isotope composition of carbon in phytoplankton organic alkenone molecules can be used to infer atmospheric CO 2 , as fractionations generated by photosynthesis in the mixed layer depend to some degree on local DIC concentration, thermocline depth and nutrient availability (Jasper and Hayes, 1990;Pagani, 2002;Zhang et al., 2013). Finally, given that plant stomata respond to atmospheric pressure and CO 2 , a Phanerozoic pCO 2 record can be derived from the fossil stomatal index, which measures stomatal density against epidermal cell density (Royer, 2001).
These proxy records have been carefully collated and summarised by , and the reader is directed to that paper for more information, including various critiques of the methods. The individual records from  are reproduced in Fig. 5 with moving averages shown as solid lines and standard deviations shown as light error bars. The lower panel shows a compilation of these CO 2 proxy records for the Phanerozoic, where the uncertainty window is a smoothed fit to the reported error window every 1 Myr. This window has a larger uncertainty range than the multiple-proxy LOESS fit of Foster et al. (2017), reflecting the divergent estimates between individual proxies, but is generally similar in shape.

Relationship between Phanerozoic CO 2 and surface temperature
Having now compiled Phanerozoic CO 2 and temperature estimates, it seems worthwhile to directly investigate the apparent long-term climate sensitivity to changing CO 2 concentrations. The long-term climate sensitivity, or Earth System Sensitivity (ESS; Lunt et al., 2010) differs from the shorter term 'direct' climate sensitivity (e.g. Charney et al., 1979) by considering additional factors such as changes in land surface albedo, vegetation and hydrology. It is therefore expected to be   somewhere above the canonical 3 K per CO 2 doubling for the presentday Earth, and perhaps over 6 K per CO 2 doubling during glacial times (Royer, 2016).
To calculate the ESS for the dataset, we use the temperature function from the GEOCARBSULF and COPSE biogeochemical box models, which takes into account both the changes in solar luminosity and the CO 2 greenhouse see Supplementary File). The results are shown in Fig. 6A. Estimated ESS oscillates around a value of around 5-10 K per CO 2 doubling, consistent with results of Royer (2016). Some very large sensitivities correspond in general with glacial periods, as might be expected due to ice-albedo positive feedback, but these also represent disagreements between the median CO 2 and temperature predictions over timescales b50 Myrs. That said, taking the median values of these two independent datasets confirms a clear positive correlation between CO 2 change and climate warming over the last~420 Myrs (e.g. Royer et al., 2004). An important result is that one can use the CO 2 record to directly estimate temperature, using the GEOCARBSULF/ COPSE function and a fixed ESS of 5 K, and produce results that agree reasonably well with the temperature proxy, as shown in Fig. 6B. The only clear disagreement here is during the Triassic-Jurassic, where proxy CO 2 concentration is high but surface temperature is relatively low. But even here, taking the lower bound of proxy CO 2 concentration produces temperature estimates towards the middle of the defined window. Therefore, there exists the possibility that models such as COPSE or GEOCARBSULF, in which the global average surface temperature is a function of only the solar flux and CO 2 greenhouse, may be able to competently reconstruct both of these metrics.

Carbon cycle processes and fluxes
A number of biogeochemical box models have been designed to calculate the transfer of carbon between the surface system (ocean plus atmosphere) and sediments, based on the scheme originally proposed in the GEOCARB model (Berner, 1991). This system considers a reservoir of surface CO 2 (atmospheric CO 2 plus ocean DIC), and much larger sedimentary reservoirs of buried carbonates (C carb ) and organic carbon (C org ). The major reservoirs and fluxes of the carbon cycle are shown in Fig. 7A, corresponding to the setup used in the most recent iteration of the COPSE model .
In summary, long-term (N100 kyr) carbon inputs to the surface system occur through: direct mantle injection; the tectonic recycling and degassing of carbon-bearing sediments; and via the subaerial weathering of carbonates or fossil organic carbon. Long-term outputs from the surface system to the sediments occur via burial of organic carbon and precipitation of marine carbonates. The blue arrows in Fig. 7A show the supply of cations (e.g. Ca 2+ , Mg 2+ ) from terrestrial silicate weathering and from seafloor hydrothermal alteration ('seafloor weathering'), which contribute to overall marine carbonate burial.
Carbon cycle models aim to reconstruct variations in the surface system CO 2 reservoir by reconstructing the magnitude of all input and output fluxes over geological timescales. Fluxes are estimated by considering 'forcings' and internal variables. Forcings are primarily tectonic and evolutionary changes and are imposed on the model, whilst internal variables are quantities such as pCO 2 and surface temperature, which the model is free to calculate over time. Both forcings and variables may influence model fluxes.
In this paper we will mostly experiment with the COPSE model, but we also run the latest version of GEOCARBSULF . These models are both based on the carbon cycle shown in Fig. 7A, and both also model the long-term geochemical cycles of sulphur and  oxygen, but use very different treatments of the organic components of these cycles: COPSE calculates organic fluxes using built-in nutrient cycles Ingall, 1994, 1996;Lenton andWatson, 2000a, 2000b), where the oceanic concentrations of phosphate and nitrate are free to vary and control primary productivity; GEOCARBSULF derives organic fluxes from the isotopic records of carbon and sulphur using a mass balance approach (Berner, 1987(Berner, , 2001. There are also differences in the choice of model forcings, and the representation of model processes. Detailed reviews of both current models are available Lenton et al., 2018) and the reader is directed to these papers for further discussion and model equations.
Despite the complexities of the carbon cycle, and of individual models, the regulation of atmospheric CO 2 concentration over the Phanerozoic is generally considered to be a balance between the overall CO 2 degassing rate, and the drivers of terrestrial silicate weathering rates (e.g. Berner, 2004). There are reasonably good reasons for this simplification: 1) carbonate (e.g. limestone) weathering is a source of surface system CO 2 , but the products of weathering recombine over long timescales (N1 Myr) to form marine carbonates, 2) steady state for atmospheric oxygen requires steady state of the organic side of the carbon cycle (or an unsustainable imbalance in the sulphur cycle), 3) the rate of seafloor weathering appears to be controlled by ocean temperature and ridge generation rate, leaving it without an independent forcing (a 'slave variable').

Strontium cycle
The cycle of strontium is closely linked to the inorganic carbon cycle: strontium is delivered to the ocean following the weathering of silicates and carbonates, and substitutes for calcium during the formation of marine carbonates (Fig. 7B). For this reason, the Sr cycle has long been seen as a proxy for carbon cycle processes (Kump, 1989;Francois and Walker, 1992). Early studies tied variations in 87 Sr/ 86 Sr ratios to the weathering of different lithologies (e.g. Brass, 1976), but many modern works instead view Sr ratios as a balance between river input from terrestrial weathering (for which the isotope ratio is high) and from the mantle (for which the ratio is low) (Kennedy et al., 2006;Van Der Meer et al., 2014;Torres et al., 2014;Caves et al., 2016). Whilst river input is indeed more radiogenic, this interpretation is complicated by the differential weathering of highly radiogenic lithologies (carbonates, felsic silicates; blue lines in Fig. 7B; e.g. Galy et al., 1999), versus the weathering of less radiogenic basalts (Dessert et al., 2003;Allegre et al., 2010; red lines in Fig. 7B). The COPSE model includes a dynamic strontium cycle as shown in Fig. 7B, allowing it to output estimates of the seawater 87 Sr/ 86 Sr ratio over the Phanerozoic based on weathering rates, changes in lithology and mantle inputs (Francois and Walker, 1992;Mills et al., 2014;Lenton et al., 2018). Following similar reasoning, the GEOCARBSULF model uses the strontium isotope record to calculate the mafic fraction of terrestrial silicate weathering (Berner, 2008). Fig. 8 shows the Phanerozoic CO 2 and temperature reconstructions from the COPSE  and GEOCARBSULF ) models against the proxy estimates collated in earlier sections, and against the 87 Sr/ 86 Sr record (McArthur et al., 2012;Cox et al., 2016) for COPSE. Both models predict similar CO 2 concentrations for the Paleozoic Era, which fit reasonably well with proxies (Fig. 8A). Mesozoic CO 2 reconstructions from COPSE fall within proxy data, whilst GEOCARBSULF reconstructions are generally low. Neither model captures the apparent Paleogene maximum in pCO 2 , with GEOCARBSULF underestimating pCO 2 at this time by N700 ppm, and requiring an extremely high degassing rate (~3× present day levels) to raise model estimates in line with geochemical proxies for this period (Royer et  2014). A number of model parameters have high uncertainties, and it has been shown that a plausible change in the degree to which gymnosperm plants amplify silicate weathering rates may reconcile the GEOCARBSULF predictions for the Mesozoic . In contrast to the CO 2 predictions, the global average surface temperature reconstructions of both models vary considerably less than inferred changes from proxy data (Fig. 8B). Average surface temperature in both models does not exceed 20°C at any point during the Phanerozoic, and whilst generally low model temperatures during the Carboniferous-Permian agree with substantial glaciation at this time, the variation in temperature over the Cretaceous and Cenozoic bears little relation to the available direct estimates. This disagreement is a serious problem for these models, as the most recent~100 Myr of Earth history contains the vast majority of available geochemical evidence. If a model bears no resemblance to the better constrained variations in temperature over the Cenozoic, then its predictions for the operation of the carbon cycle over deeper time should be treated with caution.

Published Phanerozoic reconstructions
In the next section we investigate the possible reasons for the divergence of Earth system box models from available data. Basing our investigation firstly on the COPSE model, then exploring new versions of both COPSE and GEOCARBSULF. We revisit the model tectonic forcings, and the way in which weathering fluxes are linked to global erosion rates and temperature. We also take this opportunity to extend the COPSE model forcings back to 750 Ma. In the subsequent sections, 'unaltered models' refers to the most recent published COPSE  and GEOCARBSULF  models.

Tectonic forcings
Key forcings affecting the long-term carbon cycle are the relative rates of CO 2 degassing, and the relative rates of material uplift and erosion. In both the GEOCARBSULF and COPSE models, degassing rates are assumed to follow the rate of production at mid ocean ridges. This follows the reasoning that long-term plate creation and destruction rates are equal, and thus the ridge production rate is a proxy for both the rate of ridge CO 2 input and for the rate of subduction-recycling at volcanic arcs. GEOCARBSULF  uses the sea-level inversion of Gaffin (1987) to inform ridge production rates, whilst COPSE  uses a simplified version of the Gaffin method  that is based on more recent sea-level reconstructions (Snedden and Liu, 2010;Haq, 2014).
The sea-level inversion method is unable to disentangle climate effects on the sea-level curve (Conrad, 2013), and an alternative method of reconstructing plate creation and destruction rates is through reconstructing the length of subduction zones. Subduction zone lengths have been reconstructed from direct imaging (Van Der Meer et al., 2014), plate tectonic reconstructions (Cao et al., 2017;Mills et al., 2017) and through kinematic modelling (Merdith et al., 2017) and show a general agreement over the last 750 Myrs . Under the assumption of a long-term, near-constant spreading rate, the rate of plate destruction, and therefore the total ridge and arc CO 2 input, should be reasonably approximated by the total global subduction zone length.
Recently it has been proposed that extensional tectonics, such as continental rifting, represents a significant and previously-ignored long-term CO 2 source, and it has been shown that basing the GEOCARBSULF model degassing rate on rift lengths produces a better fit to Paleogene CO 2 proxies than can be achieved using subduction zone lengths (Brune et al., 2017).  Brune et al. (2017). Assuming that~37% of tectonic CO 2 input is from continental rifts (Kelemen and Manning, 2015), and the rest is from ridges and arcs, we combine these curves to modify the CO 2 degassing forcing (assuming constant rift input before 200 Ma).
An additional subduction-related forcing in COPSE and GEOCARBSULF is the relative burial depth of carbonates. The evolution of pelagic calcifiers has likely increased the amount of carbonate entering subduction zones, increasing in turn the overall rate of CO 2 degassing (Volk, 1989). The current GEOCARBSULF model  assumes that this contribution to degassing rates has risen linearly over the period 150-0 Ma, and this formulation is used in the most recent COPSE model . Given that the link between carbonate deposition and degassing is not straightforward (Edmond and Huh, 2003), and that both current models predict low temperature over the last 150 Myrs, we return this forcing to follow its original description (Berner, 1991), where it rises from a value of 0.75 at 150 Ma to reach the modern day value (of 1), by~100 Ma.
Erosion rates in both COPSE and GEOCARBSULF are informed by a long-term polynomial fit to the sediment mass compilation of Ronov (1993) and erosion-loss relationship of Wold and Hay (1990). We extend this forcing back to 750 Ma using a moving average through an updated version of this data which includes Quaternary values (Hay et al., 2006), shown in Fig. 9B.

Effect of erosion on weathering fluxes
Global rates of silicate weathering are estimated in box models from the global average surface temperature following Arrhenius' equation for the influence of temperature on chemical reaction rates, and scaling by the global rate of runoff, which is itself a function of average surface temperature (Berner, 1994). Aside from temperature and runoff, chemical weathering is also influenced by the supply of material from erosion, but the relationship between erosion rates and overall chemical weathering is complex. Low erosion rates limit chemical weathering, and a linear trend between physical and chemical weathering is observed in low-erosion catchments (West et al., 2005). High erosion rates permit greater rates of chemical weathering (Gaillardet et al., 1999), but there is no clear relationship between physical erosion and chemical weathering in high erosion catchments as the reaction rate is controlled by kinetics, rather than material availability (Gabet and Mudd, 2009;West, 2012). Additionally, as might be expected under kinetic control, high erosion rates favour the chemical weathering of more reactive lithologies such as carbonates, over less reactive silicates (e.g. Jacobson and Blum, 2003). Box models aim to represent the global weathering flux in response to changing global variables, and thus the overall erosion rate is considered in terms of the abundance of mountainous weathering environments, rather than simply the erosion rate in a single catchment. The COPSE model therefore links the global rates of silicate and carbonate weathering directly to the overall relative erosion rate. Other models have differentiated between the effects of erosion on silicate and carbonate weathering (Li and Elderfield, 2013;Shields and Mills, 2017) and we follow this interpretation here, allowing carbonate weathering a near-linear relationship with erosion rates, and silicate weathering a weaker relationship (see Supplementary File).

Strength of weathering-temperature feedbacks
The current model reconstructions displayed in Fig. 8 show that both the COPSE and GEOCARBSULF models strongly regulate surface temperature at values close to the present day. This is chiefly because changes in global average surface temperature cause a change in the silicate weathering CO 2 sink, which buffers against the original temperature change by altering atmospheric CO 2 concentrations. Overly-strong temperature regulation in these models may be due to an overestimation of the effect of average surface temperature and runoff changes on globally-integrated weathering rates. This can occur because not all silicate weathering is kinetically limited, and temperature in low latitudes, where much of the present-day runoff and weathering occurs (e.g. Andes, Himalaya, Indonesia; Hartmann et al., 2014), varies less than the average global surface temperature due to polewards heat transport. In order to test weaker temperature feedbacks in COPSE, we define an effective weathering temperature. This is defined to be the temperature in the tropics, using the earlier defined ratio of 1:1.5 between tropical and global temperature changes proposed by Hansen et al. (2008). Fig. 10 shows results of the COPSE model following the updates detailed in the previous sections. Panels A and B show model reconstructions where surface processes are driven by the global average temperature change, and panels C and D show model reconstructions where surface processes are driven instead by low latitude temperature change. We also run the model for a variety of Earth system temperature sensitivities to CO 2 (ESS, coloured lines). Changing the ESS alters the relationship between model CO 2 and surface temperature, and this effect is further enhanced when the CO 2 negative feedbacks are weakened by assuming that most chemical weathering occurs at low latitudes. Taking ESS = 5 K and using the low-latitude temperature change to drive surface reactions (green line in panels C and D) gives predictions of CO 2 and surface temperature that are largely within the combined proxy data, and we take this as the new COPSE model baseline.

GEOCARBSULF model reconstructions for CO 2 and surface temperature
Uncertainty about the operation of the Neoproterozoic carbon cycle currently prevents the GEOCARBSULF model being run beyond the Phanerozoic, as the isotope mass balance approach to calculating organic fluxes fails when the input carbonate δ 13 C is below the mantle value of~−5‰, such as during the enigmatic Shuram excursion (e.g. Rothman et al., 2003). Nevertheless, it is worthwhile to test our modifications to the COPSE model by applying them to GEOCARBSULF (Sections 5.1, 5.2, 5.3). We alter the degassing and erosion rate forcings for GEOCARBSULF to match the updated COPSE forcings, revise the relationship between uplift and weathering fluxes (which includes adding a dependency of carbonate weathering on uplift and erosion), and use the effective weathering temperature at low latitudes. We fix the Earth system temperature sensitivity, Γ, at 5 K per CO 2 doubling, as in COPSE. We also revise the forcing for 'the fraction of land undergoing chemical weathering' (f AW /f A ), to normalise the forcing to the present day (as in Lenton et al., 2018).
In addition to these modifications, we also reconsider some recent changes to GEOCARBSULF: A number of forcings were revised by  from the original Berner (2008) and Berner (2008) version of GEOCARBSULF, which alter model outcomes substantially, creating a larger discrepancy between the model and the proxy record. These forcings are: the total land area (f A ); the global runoff (f D ) due to changing paleogeography; the fraction of land undergoing chemical weathering (f AW /f A ) and the global average continental temperature (GEOG) due to changing paleogeography. Additionally, the time invariant chemical weathering parameters: the rate of chemical weathering of volcanic, compared to non-volcanic, silicate rocks (VNV); the variability of the 87 Sr/ 86 Sr of granites over time (NV); and the rate of the basalt-seawater reaction at present (f B (0)), are also revised by . GEOCARBSULF is very sensitive to the above forcings, and further analysis (not shown) confirms that much of the variation in the two versions above comes from the GEOG and volcanic weathering parameters. In this work, we opt to revert all of the parameters discussed above to their descriptions in the earlier papers of Berner. This should not be seen as a refutation of the work of Royer et al., as we see these differing parameterisations as part of the model uncertainty and our goal here is to determine, within this uncertainty, whether or not the model can produce a reasonable reconstruction of Phanerozoic temperature and CO 2 .  5.6. Comparisons of revised COPSE and GEOCARBSULF models to Phanerozoic CO 2 and surface temperature records Fig. 11 plots comparisons between the unaltered and revised COPSE and GEOCARBSULF models, and compares these to the uncertainty windows defined in this work for Phanerozoic global average surface temperature and atmospheric CO 2 concentration. We also employ some simple statistical tests of the ability of the models to simulate temperature and CO 2 records that are within the proxy limits: %outbound is a calculation of the percentage of the model output which falls outside the highest confidence windows for both temperature and CO 2 (the dark grey windows); Total error is the sum of the distances between the model outputs and the means through the proxy data windows. Both of these statistics are calculated from 1 Myr bins over 423-0 Ma, which defines the availability of the CO 2 proxy data, and covers the more reliable temperature reconstructions. If the model output is entirely contained within the proxy window then %outbound would be zero, and if the model output sits exactly on the data window mean then Total error is also zero. The CO 2 -temperature data comparison in Fig. 6 shows that it is technically possible for %outbound to be zero for both CO 2 and temperature, but it is not possible for Total error to be zero in both cases, as there is some mismatch between the means of the CO 2 and temperature proxy windows.
For COPSE, the revised CO 2 predictions for the Paleozoic are broadly similar to the unaltered model, showing a decline from Cambrian to Carboniferous, followed by a rise to a peak around 250 Ma. CO 2 reconstructions for the Mesozoic and Cenozoic now display broad peaks in the Cretaceous and Paleogene, reasonably consistent with the long-term pattern shown in proxy data, with the difference driven primarily by the consideration of rift activity in the degassing rate of CO 2 (Brune et al., 2017). The statistical metrics plotted in Fig. 11C show no clear difference in the overall ability of the unaltered and revised COPSE models to stay within the data windows for CO 2 , or to match the data means.
However, revised COPSE model temperature reconstructions are significantly closer to proxy data than those of the unaltered model. Around 80% of the model output is now within the stricter temperature range, as opposed to b50% for the unaltered model. This is because the shape of the model CO 2 output now more closely resembles the changes in temperature, and the increased Earth System Sensitivity leads to a stronger relationship between CO 2 forcing and surface temperature. This is especially evident for Cretaceous to present. This level of detailed box modelling has not previously been run for the late Neoproterozoic, and the results here confirm the expected trend of increasing temperature between the Cryogenian and Ediacaran, which is largely due to the increase in CO 2 degassing rates . However, the model does not produce the level of Cambrian warmth implied by the available proxies for this time, and does not show any significant cooling associated with the Hirnantian glaciation in the late Ordovician. The model also produces a relatively warm Cryogenian, but a number of possibilities exist for further cooling at this time, including intense weathering of low-latitude flood basalts (Cox et al., 2016), or biologically-driven chemical weathering . The modified GEOCARBSULF model improves the fit to the proxy data substantially, and shows an evolution of pCO 2 that is almost entirely within the boundaries of the data compilation. For global average surface temperature, the modified GEOCARBSULF model has similar error metrics to the modified COPSE model, but is less able to reproduce the broad humps in temperature during the Cretaceous and Cenozoic. As discussed earlier, our modification of GEOCARBSULF also includes reverting some revisions made to that model by . However, these revisions may represent an increased understanding of the processes themselves, and it is not acceptable to simply use the forcings and relationships that give the closest match to proxies if they are not supported by current research. The version of COPSE used in this paper does however incorporate the updated forcings for fD and f AW /f A , without resulting in very low temperature and CO 2 reconstructions for the Mesozoic and Cenozoic that are seen in the Royer et al. version of GEOCARBSULF. This can be attributed to the increased level of negative feedback in COPSE, which is a result of the dynamic biosphere and nutrient cycling in the model. Thus it appears that some key updated forcings and parameters in GEOCARBSULF could be consistent with the observed operation of the long-term carbon cycle, but only if the model were to also take biotic feedbacks into account. COPSE does not include the GEOG parameter that alters the continental temperature due to paleogeographic changes, and preliminary experiments to include this parameter degrade the model predictions, even with strong biotic feedbacks. Thus it may be the case that these changes to continental temperature do not affect chemical weathering in a way that is easily incorporated into a box model, or that other counterbalancing processes have not been considered.

Strontium cycling and 87 Sr/ 86 Sr from 750 Ma to present
The COPSE model reconstruction of seawater 87 Sr/ 86 Sr (Fig. 12A) is in reasonable agreement with long-term geological trends for the Phanerozoic (as was that in Lenton et al., 2018), and as in the previous model, is driven largely by changes in erosion rates. Fig. 12B shows a breakdown of the strontium fluxes from each model process. The changes in strontium input from the weathering of ancient silicates ('granite') and carbonates (both relatively radiogenic) are driven largely by the changing erosion rates.
The late Neoproterozoic 87 Sr/ 86 Sr reconstruction shows the same general trend as is observed in the data but cannot reproduce the magnitude of the rise between the Tonian and Cambrian. This is potentially due to uncertainty in the rates of basalt weathering for the late Neoproterozoic and Early Phanerozoic. In the model, the weathering of basalts depends on the amount of exposed mafic rocks, which is largely informed by records of Large Igneous Province (LIP) areas (Ernst, 2014;Mills et al., 2014). An apparent paucity of Precambrian and Paleozoic LIPs drives low and fairly stable basaltic area in the model, but may simply be a consequence of the decline in preserved crustal material with age. Future work could seek to correct for this effect. Fig. 12C shows the ratio of strontium river inputs to mantle input. If all terrestrial Sr sources had the same isotopic composition (e.g. 87 Sr/ 86 Sr river is constant) then this quantity would be expected to track the seawater 87 Sr/ 86 Sr reconstruction in Fig. 12A. This is clearly not the case, and especially so when considering only silicate weathering versus mantle input. The riverine strontium signal is a combination of at least three sources, each with their own distinct controls and isotopic signatures, and thus considerable caution is advised when considering any direct back-calculation from ocean 87 Sr/ 86 Sr to silicate weathering rates.

Summary and conclusions
6.1. Links between the long-term carbon cycle, CO 2 , and Earth surface temperature By compiling independent proxy records of global average surface temperature and atmospheric CO 2 concentration for the Phanerozoic, we have shown that when accounting for the solar flux, long-term Phanerozoic surface temperature changes can be clearly related to variations in the CO 2 greenhouse (Fig. 6). CO 2 appears to be a primary driver of climate on geological timescales (e.g. Royer et al., 2004).
Furthermore, the long-term CO 2 concentration derived from direct proxies agrees reasonably well with that predicted by biogeochemical box modelling at the Phanerozoic scale. This finding gives confidence that such models include broadly the right suite of processes, and accurately discern the relative importance of these in driving climate regulation. However, there are a number of instances where the models and proxies disagree, suggesting that either the current forcings remain poorly constrained, or additional processes become important at these times.

Model discrepancies: late Ordovician cooling
A key area of model-proxy disagreement is the apparent cooling and glacial period in the late Ordovician (Hirnantian). It has been proposed that a reduction in CO 2 concentration and temperature may be related to reductions in CO 2 degassing rates (based on lower abundances of  Young et al., 2009;Nardin et al., 2011). A reduction in degassing rates during the late Ordovician is not yet validated by any quantitative tectonic reconstruction (see Mills et al., 2017 for a compilation), and is not included in our model. There is also no clear evidence for a significant increase in volcanic weathering during this time, although this is clearly possible, given the tectonic events of the time (e.g. weathering of the Famatinian arc: Young et al., 2009). Fig. 13 shows the COPSE model reconstruction for the late Ordovician, which predicts generally static and high global average temperatures and static 87 Sr/ 86 Sr (solid lines). Dashed and dotted lines show model runs that explore adding additional volcanic basalt weathering starting in the late Ordovician (from 450 to 430 Ma). The dashed lines show the effects of adding the entire present day basalt weathering area, and the dotted lines show the addition of twice this amount. Such an event might be best explained by the emplacement of a Large Igneous Province (LIP), as hypothesised for this time period (Lefebvre et al., 2010). Enhancing silicate weathering in this manner causes CO 2 and temperature to decline, and 87 Sr/ 86 Sr to fall (e.g. Fig. 7). In order to not exceed the amount of variation observed in ocean 87 Sr/ 86 Sr, global cooling is limited to about 3°C on average, or around 2°C in the tropicsfar below the degree of cooling proposed by examining oxygen isotopes (Trotter et al., 2008). The rise of the first land plants greatly accelerating weathering, particularly of easily-weathered volcanic rocks (Lenton et al., 2012), combined with a 'tipping point' in sea-ice cover on the then ocean-covered Northern Hemisphere (Pohl et al., 2014) may be the best way to explain this sharp cold interval.

Reconciling box modelling with Neogene and Quaternary proxies
A persistent problem with long-term Phanerozoic modelling has been the inability to reproduce the events of the late Cenozoic. Due to the abundance of proxy data for CO 2 and temperature, and nontraditional isotopic proxies such as lithium and beryllium (Misra and Froelich, 2012;Willenbring and von Blanckenburg, 2010), the cooling period in the late Cenozoic has long been the testing ground for hypotheses about the long-term carbon cycle and regulation of Earth surface temperature (Raymo and Ruddiman, 1992).
Our modifications to the COPSE model produce a reasonable fit to Neogene and Quaternary records of global average surface temperature, atmospheric CO 2 concentration and ocean 87 Sr/ 86 Sr (Fig. 14). As noted earlier, they confirm the established (but frequently overlooked) view that late Cenozoic cooling does not require a long-term increase in silicate weathering rates (Kump and Arthur, 1997): The silicate weathering flux is set by the global carbon cycle balance, and increases in erosion result in this global balance being maintained at a lower temperature. A shift towards more positive 87 Sr/ 86 Sr values is the consequence of the weathering of uplifted older terranes and sediments during a period of generally increased erosion rates, but decreasing global rates of silicate weathering (as required by a decreasing CO 2 input rate).

A source and sink driven long-term carbon cycle
Recent studies have asked whether the long-term carbon cycle is primarily controlled by the carbon sources from CO 2 degassing (McKenzie et al., 2016), or by the changing carbon sink due to silicate weathering . The COPSE model is well-placed to investigate this question. Fig. 15  these models can reproduce the general long-term climate trends of the full model, as model development aims to only include the most relevant processes (see Lenton et al., 2018). In agreement with , the sink-only model can explain climate shifts before the Jurassic reasonably well, but not during the Cretaceous and Cenozoic. The source-only model can explain the general trend in temperature increases and decreases, and can do so quite well if the burial depth of carbonates ('B forcing') is ignored. However, the source-only model cannot explain variations in ocean 87 Sr/ 86 Sr. Whilst strontium isotope ratios are not a direct weathering proxy, their variations are primarily controlled by changes in carbon sinks. Thus, a source-driven carbon cycle cannot be made consistent with the 87 Sr/ 86 Sr record.

Spatial weathering regimes and the future of box modelling
Estimating long-term chemical weathering rates in a nondimensional box model is a difficult task. The global rates of chemical weathering on the present-day Earth are spatially heterogeneous (e.g. Hilley and Porder, 2008;Hartmann et al., 2014), and depend on local temperature, and rates of erosion and runoff (Gaillardet et al., 1999;West, 2012). A reasonable marriage of these parameters can be achieved in spatial models in order to estimate global silicate weathering rates and CO 2 drawdown for snapshots of the ancient carbon cycle, with impressive results (Goddéris et al., 2012(Goddéris et al., , 2014, but the number of computer CPU hours required to solve these models means that they have not been used to create continuous 500-million year records of isotope tracers, or represent non-steady-state dynamics, such as the build-up of carbon in the Earth's crust and long-term exchanges with the mantle (Hayes and Waldbauer, 2006). Long integration times also affect model development and testing, limiting the number of experiments that can be run.
There is scope to move beyond the current situation. One approach would be to asynchronously couple between running 'snapshots' of a climate model to derive spatially-explicit weathering fluxes and using a box model to span much longer intervals between the snapshots. More economically, if for a given paleogeography it can be established from the climate model that the spatial pattern of precipitation changes scales with global average temperatureas is the case for the modern climatethen a pattern-scaling approach can be taken (Huntingford et al., 2010;Huntingford and Cox, 2000). In this approach the spatial pattern of precipitation and how it changes with global temperature would be extracted from a climate model for a series of snapshots through the Phanerozoic, and then an elaborated box model would use these scaled patterns to drive weathering fluxes. This would provide a computationally efficient solution, but still poses some issues such as how to incrementally change between different paleo-geographies during the operation of plate tectonics.
In conclusion, recent advances in reconstructing tectonic drivers of the long-term carbon cycle, combined with established box models, can resolve several outstanding anomalies between model predictions and proxies for CO 2 and temperature. To advance further requires an improved modelling approach, for example, a continuous box model that can better represent physical climate and surface processes.

Acknowledgements
BJWM is funded by a University of Leeds Academic Fellowship and the UK Natural Environment Research Council (NERC) (NE/R010129/ 1). AJK is funded by a studentship from the NERC SPHERES Doctoral Training Partnership (NE/L002574/1). GAS and TML acknowledge funding through the NERC Biosphere Evolution, Transitions and Resilience project (NE/P013643/1), and through grants to GAS (NE/ R010129/1) and TML (NE/N018508/1). The authors thank Dana Royer for providing proxy data and the GEOCARBSULF code. We also thank the reviewers of this work for their thoughtful comments. Christopher R. Scotese retired from teaching at the University of Texas, Arlington, Department of Earth and Environmental Sciences in 2013. He is now a Research Associate at the Field Museum of Natural History and an Adjunct Professor in the Department of Earth and Environmental Sciences, Northwestern University. He continues to collaborate with several research groups on topics concerning the history of the Earth System, but his main focus is a book entitled: "Earth History, the Evolution of the Earth System". He is the coauthor of N100 scientific publications, and his maps and animations have been used in numerous geological textbooks, scientific research papers, and are on display in museums worldwide.
Daniel J. Hill is a lecturer in global change modelling in the School of Earth and Environment at the University of Leeds. As a paleoclimate modeller, he investigates the mechanisms and impacts of climate change across a wide range of timescales through Earth history. Having spent much of his career simulating the climate of the Pliocene Epoch, including the ocean, cryosphere, biosphere and paleogeographic change, he is now studying the climatic drivers of the Permo-Triassic mass extinction and the impact of climate on ancient civilizations and many of the climatic changes in between.