A robust initialization method for accurate soil organic carbon simulations

. Changes in soil organic carbon (SOC) stocks are a major source of uncertainty for the evolution of atmospheric CO 2 concentration during the 21 st century. They are usually simulated by models dividing SOC into conceptual pools with contrasted turnover times. The lack of reliable methods to initialize these models, by correctly distributing soil carbon amongst their kinetic pools, strongly limits the accuracy of their simulations. Here, we demonstrate that PARTY SOC , a machine-learning model based on Rock-Eval® thermal analysis optimally partitions the active and stable SOC pools of AMG, a simple and well 20 validated SOC dynamics model, accounting for effects of soil management history. Furthermore, we found that initializing the SOC pool sizes of AMG using machine-learning strongly improves its accuracy when reproducing the observed SOC dynamics in nine independent French long-term agricultural experiments. Our results indicate that multi-compartmental models of SOC dynamics combined with a robust initialization can simulate observed SOC stock changes with excellent precision. We recommend exploring their potential before a new generation of models of greater complexity becomes operational. The 25 approach proposed here can be easily implemented on soil monitoring networks, paving the way towards precise predictions of SOC stock changes over the next decades.


Introduction
Soil organic carbon (SOC) plays an important role in sustaining soil functions and associated soil ecosystem services worldwide (IPCC, 2019).It is the largest terrestrial organic carbon reservoir, with the upper two meters of soil storing 2400 Pg C, three times more carbon than the atmosphere (Jobbagy and Jackson, 2000).A mere 4 per 1000 annual decrease in SOC stocks (ca. 10 Pg C yr −1 ) may double the global annual anthropogenic CO2 emissions, while an equivalent increase may compensate them (Balesdent and Arrouays, 1999).This is the concept behind the 4 per 1000 initiative (Rumpel et al., 2018) that aims at increasing SOC stocks to fight global warming while ensuring food security, two Sustainable Development Goals of the United Nations (UN General Assembly, 2015).This initiative and other political headway have placed the question of managing SOC stocks and assessing the global SOC sequestration potential at the top of political and scientific agendas (Vermeulen et al., 2019;FAO, 2019;Amelung et al., 2020).Despite this particular attention, the prediction of SOC stock changes remains very uncertain, which makes soils a major source of uncertainty for the evolution of atmospheric CO2 concentration (Todd-Brown et al., 2014;He et al., 2016;Shi et al., 2018).Models of SOC dynamics can predict future SOC stock evolution by simulating carbon transfer into the soil mostly through plant organic matter inputs, and microbial SOC mineralization resulting in a CO2 flux from the soil to the atmosphere.They can have structures of various complexities reflecting our mechanistic understanding of SOC dynamics (Shi et al., 2018;Luo et al., 2016).However, most models dedicated to prediction, including those used in Earth System Models, have a simple structure dividing SOC into conceptual pools with contrasted turnover times (Manzoni and Porporato, 2009;He et al., 2016;Todd-Brown et al., 2014).These multi-compartmental models of SOC dynamics are the best option we currently have to foster science-based SOC preservation and sequestration actions, given the strong uncertainty of more complex models (Cécillon, 2021a;Dangal et al., 2021;Lee et al., 2020;Shi et al., 2018;Crowther et al., 2019).Predictions of SOC stocks evolution provided by such simple models are very sensitive to the initial distribution of SOC amongst the different kinetic pools (Luo et al., 2016;Smith and Falloon, 2000;Clivot et al., 2019).This makes the question of SOC kinetic pool partitioning a priority for improving the accuracy of multi-compartmental SOC dynamics models (Luo et al., 2016;Taghizadeh-Toosi et al., 2020).
The most commonly used method to initialize the size of SOC kinetic pools is to run spin-up simulations until a steady-state equilibrium for SOC is reached, eventually matching the initial SOC stock measurement (Wutzler and Reichstein, 2007;Taghizadeh-Toosi et al., 2020).However, this method has two well-known limitations.First, climatic, SOC input, and landuse or land-cover data extending over long time periods required by this approach are highly uncertain.Second, assuming steady-state equilibrium for SOC at the onset of model simulations is often unrealistic.This is due to the history of the simulated sites that often includes disturbances (e.g.fire), as well as previous changes in climate, land-use and soil management that prevent SOC pools with slow turnover times from being at equilibrium (Wutzler and Reichstein, 2007;Herbst et al., 2018;Oberholzer et al., 2014;Poeplau et al., 2011;Clivot et al., 2019).Alternative initialization procedures are needed to address these issues (Wutzler and Reichstein, 2007;Bruun and Jensen, 2002;Taghizadeh-Toosi et al., 2020).
In some models of SOC dynamics, like the AMG model (Clivot et al., 2019), a default initial SOC pool size distribution is prescribed according to basic information on land-use history (i.e.long-term cropland vs. long-term grassland; Clivot et al., 2019).This approach does not take into account the effect of recent changes in land-use or historical soil management practices on SOC pool distribution.To better reflect the effect of the frequent state of non-equilibrium of SOC on its partitioning into conceptual kinetic pools, another approach has been proposed, relating results from SOC fractionation methods with SOC kinetic pool sizes (e.g.Zimmermann et al. (2007a) or Skjemstad et al. (2004) for the RothC model; Dangal et al. (2021) for the DAYCENT model).However, this approach also suffers from important drawbacks.First, SOC fractionation procedures are tedious and cannot be implemented on large-scale studies, though this problem may be solved by using soil infrared spectroscopy or environmental variables and machine-learning (Zimmermann et al., 2007b;Viscarra Rossel et al., 2019;Sanderman et al., 2021;Cotrufo et al., 2019;Lugato et al., 2021;Baldock et al., 2013;Barthès et al., 2008;Lee et al., 2020;Dangal et al., 2021).Second, their reproducibility is questionable (Poeplau et al., 2013(Poeplau et al., , 2018)), and third, their use for initializing model SOC pool sizes has never been properly validated.A proper validation would require showing that (1) the size of measured SOC fractions matches the one of model kinetic pools, and that (2) simulations of SOC dynamics are more accurate using this initialization strategy, compared to default simulations (on independent validation sites while other model parameters remain constant).Reasonably good correspondence between measured or soil-spectroscopy-estimated SOC fractions and modelled SOC conceptual pools has been reported in a number of studies, though with some notable discrepancies (Leifeld et al., 2009b;Herbst et al., 2018;Zimmermann et al., 2007a;Dangal et al., 2021).Conversely, the studies that attempted to initialize model SOC pool sizes using a SOC fractionation scheme generally reported no improvement in the accuracy of simulations of SOC dynamics compared to a default or a spin-up initialization approach (Leifeld et al., 2009a;Nemo et al., 2016;Cagnarini et al., 2019).Only two studies showed that an initialization based on a SOC fractionation scheme yielded more accurate simulations of observed SOC dynamics, but at the cost of modifying the decomposition rate of SOC kinetic pools (Skjemstad et al., 2004;Luo et al., 2014).
An alternative approach using Rock-Eval® thermal analysis has recently been proposedunder the name PARTYSOC model to estimate SOC kinetic pool sizes (Cécillon et al., 2018(Cécillon et al., , 2021)).PARTYSOC is a machine-learning model trained on Rock-Eval® data of soil samples from long-term experiments (LTEs) where the size of the centennially stable SOC fraction can be estimated (e.g.sites including a bare fallow treatment).PARTYSOC incorporates recent key elements of the new understanding of SOC dynamics (Dignac et al., 2017), showing that the centennially stable SOC fraction has specific chemical and energetical characteristics that are measurable quickly (ca. 1 h per sample) and at a reasonable cost (less than USD 60) using Rock-Eval®; it is thermally stable (i.e.high activation energy) and it is depleted in hydrogen (Barré et al., 2016;Hemingway et al., 2019;Gregorich et al., 2015;Poeplau et al., 2019;Chassé et al., 2021).
In this study, we tested if the PARTYSOC machine-learning model, built on a totally independent data set from North-western Europe, could be used to initialize the distribution of SOC pools of the simple AMG model (Clivot et al., 2019) and improve the accuracy of its simulations.The default version of AMG is currently the most accurate model for reproducing the observed SOC stock dynamics in diverse French agricultural LTEs at the pluri-decadal scale (Martin et al., 2019).The efficient use of this model at sites covering an important pedological and climatic variability (including oceanic, continental, and tropical climate) provides further support to its robustness (Levavasseur et al., 2020;Farina et al., 2021;Saffih-Hdadi and Mary, 2008).
In this model, SOC is simply divided into two pools, the "stable SOC (CS)" that is considered as inert at the time scale of the simulation and the "active SOC (CA)" that has a mean turnover time of a few decades.A recent study (Clivot et al., 2019) determined that the optimal initial proportion of stable SOC (CS/C0) can deviate from the model's default value (0.65 in croplands), so that a more precise initialization of the CS/C0 proportion would significantly improve AMG simulations of SOC dynamics.Here, we hypothesized that the SOC pool partitioning as determined by the PARTYSOC machine-learning model (Cécillon et al., 2021(Cécillon et al., , 2018) ) would be close to the mathematically optimal one for the AMG model, therefore, improving the accuracy of its SOC dynamics simulations compared to default initialization.We tested our hypothesis on 32 treatments from nine independent French agricultural LTEs (experiment duration from 12 to 41 years with a median of 21 years) in which the AMG optimal SOC pool partitioning could be determined by ex-post optimization and for which topsoil samples collected at the onset of the experiment were available (Table 1).These LTEs were croplands established in different pedoclimates that have experienced contrasted soil management practices and land-use histories.All available initial topsoil samples were analysed with Rock-Eval® and the results were used in the European version of the PARTYSOC model, PARTYSOCv2.0EU (Cécillon et al., 2021), to compute the centennially stable SOC proportion of each topsoil sample.

Experimental sites
This work was conducted on nine French agricultural LTEs (Supplementary Material Fig. 1).Seven LTEs including 29 treatments were selected from the dataset presented in Clivot et al. (2019), from sites with availability of initial topsoil samples.
Two additional LTEs (Colmar and Feucherolles) including a total of three treatments were obtained from the dataset published in Levavasseur et al. (2020), selecting control treatments without organic amendments and with available initial topsoil samples.Basic site and topsoil characteristics are reported in Table 1 and Supplementary Material Table 1.Information necessary to run AMG simulations on a total of 32 treatments (initial soil physico-chemical properties, detailed information on management practices and observed climatic data during all experiments) were obtained from Clivot et al. (2019) for the 29 treatments of the seven sites and from Levavasseur et al. (2020) for the three treatments of the sites of Colmar and Feucherolles.1975-2010 1970-2011 2000-2018 1977-1989 1998-2019 1989-2008 1978-2005 1975-1992 1976-1997 Number

Archive soil samples from experimental sites
Our final soil sample set included 181 topsoil samples.At each site the soil was sampled to include the whole ploughing depth (Table 1).At all sites, except Boigneville where the soil was sampled in five sublayers, the ploughing layer was sampled as one homogeneous layer.Of the final samples, 71 were from starting dates of the nine LTEs, 24 from LTEs intermediate dates and 86 from LTEs final dates.All samples were air-dried or dried at 40 °C, sieved to < 2 mm and finely ground to < 250 μm using a ball mill (Retsch, Germany).

Rock-Eval® analysis of archive soil samples
All soil samples were analysed using a Rock-Eval 6® Turbo apparatus (Vinci Technologies).The samples were first pyrolyzed in an inert N2 atmosphere, then oxidized under ambient air (O2).The heating routine applied during pyrolysis was as described in Disnar et al. (2003), starting with a three-minute isotherm at 200 °C followed by a heating ramp of 30 °C min −1 up to 650 °C.For the oxidation step, a one-minute isotherm was kept at 300 °C and was directly followed by a heating ramp of 20 °C min −1 until 850 °C was reached, followed by a five-minute isotherm at 850 °C (Baudin et al. 2015;adapted from Behar et al., 2001).
Based on five generated Rock-Eval® thermograms, 18 parameters were calculated for each sample, and were then used as predictors by the random forests model.These include Total Organic Carbon (TOC; in g C kg soil −1 )the amount of organic C released during the analysis as a proportion of sample weight; Pyrolyzed organic Carbon (PC; in g C kg soil −1 )the sum of C released as HC, CO and CO2 during pyrolysis step; the ratio of PC to TOC (PC/TOC); S2 peak area (g C kg soil −1 )the hydrocarbon gas released within the range of the pyrolysis temperature ramp; the ratio of S2 to PC (S2/PC); PseudoS1 peak area (g C kg soil −1 )the sum of C released as HC, CO and CO2 during the first 200 seconds of pyrolysis (after Khedim et al., 2021); Hydrogen Index (HI; in mg HC g TOC −1 )the amount of hydrocarbons released as a ratio of TOC; the ratio of HI to Oxygen Index (HI/OIRE6)where OIRE6 is calculated as the amount of oxygen released as CO and CO2 gases normalized to TOC.Finally, various temperature parameters (T70HC_PYR, T90HC_PYR, T30CO2_PYR, T50CO2_PYR, T70CO2_PYR, T90CO2_PYR, T70CO_OX, T50CO2_OX, T70CO2_OX, T90CO2_OX; in °C) are included in the predictors set.They describe evolution steps, namely at which temperature a specific amount (e.g. 30, 50, 70 or 90%) of a given gas was released according to each thermogram (Cécillon et al., 2018).It is important to note that no pre-treatment of CaCO3-containing samples was necessary before Rock-Eval® analysis.The slow pyrolysis and oxidation steps of the Rock-Eval® method allow distinguishing carbon of organic and mineral form, since the latter is released above a given temperature.For the calculation of all of the above parameters, only the part of each thermogram corresponding to organic carbon was taken into account.For this purpose, upper temperature integration limits for Rock-Eval® temperature parameters were set at 560 °C for the CO and CO2 pyrolysis thermograms, and at 611 °C for the CO2 oxidation thermograms (Cécillon et al., 2018; Supplementary Material Fig. 2).R scripts used for computing Rock-Eval® parameters are available on the Zenodo platform (Cécillon, 2021b).

The PARTYSOC machine-learning model
The most up-to-date European version of this model, calibrated on soils from North-western Europe, used in this study, is described in detail in Cécillon et al. (2021).This model uses the 18 above-mentioned Rock-Eval® thermal analysis parameters as predictors and estimates the centennially stable SOC proportion in a topsoil sample.The model consists of a trained nonparametric machine learning algorithm, using the random forests approach to estimate centennially stable SOC proportions in unknown topsoils from centred and scaled Rock-Eval® parameters.In this study the obtained centennially stable SOC proportion of each topsoil sample, was converted to centennially stable SOC content by multiplying the predicted proportion by the total SOC content.The PARTYSOCv2.0EU model, available on Zenodo (Cécillon, 2021b), was used without any adaptation.

The AMG model of soil organic carbon dynamics
The AMG model (Andriulo et al., 1999) was developed based on the two-compartment SOC model proposed by Hénin and Dupuis (Hénin and Dupuis, 1945).It is characterized by a simple structure consisting of three carbon pools: fresh organic matter, and two SOC fractions, an active and a stable pool (Supplementary Material Fig. 3).The model allows transfer of carbon from the fresh organic matter pool either to the atmosphere through microbial mineralization or into the active pool.
Organic carbon from the active pool is also subject to mineralization, forming a second direct flux of CO2 from the soil into the atmosphere.SOM decomposition follows first order kinetics with a rate defined by the coefficient of mineralization k (yr −1 ), controlled by climatic conditions and soil characteristics.The h coefficient controls the yield of crop residues transformation into active carbon and depends on the type of fresh organic matter.No carbon exchange with the stable SOC pool is possible since it is considered inert and remains unchanged over the simulation period (here from 12 to 41 years; see Table 1).Considering the stable SOC pool as mathematically inert at this time scale is in line with consistent observations of a significant pluri-decadal persistent SOC fraction in long-term bare fallows and C3-C4 vegetation change chronosequences (Barré et al., 2010;Balesdent et al., 2018).
The AMG model can be mathematically described by two simple equations (Clivot et al., 2019): where QC is the total SOC stock (Mg C ha −1 ), QCS is the stable SOC stock (Mg C ha −1 ) defined as a fraction of initial SOC stock QC0 (s.Sect.2.6) constant for a specific treatment, QCA is the active SOC stock (Mg C ha −1 ), t is the time in years, mi is the annual C input from organic residue i (Mg C ha −1 yr −1 ), h is a coefficient representing the fraction of C inputs which is incorporated in SOM after 1 year related to the type of organic residue, and k is the mineralization rate constant associated with the active C pool (yr −1 ).
The AMG parameters (h and k) have been determined by experimental results (Clivot et al., 2019).This approach differs from most multi-compartmental SOC dynamics models for which decay rates of slower pools were calibrated indirectly, assuming an equilibrium state for SOC (Wutzler and Reichstein, 2007).The simple structure of the AMG model and the experimental determination of its decomposition rates make it less susceptible to the problem of equifinality compared to other multicompartmental models of SOC dynamics (Clivot et al., 2019;Luo et al., 2016).Furthermore, AMG has been validated with δ 13 C tracer data of long-term alternative sequences of C4 and C3 crops (Mary et al., 2020).
The version of AMG used in this study was AMGv2, described in detail in (Clivot et al., 2019).Input data necessary to run simulations of SOC stocks with AMG include crop type, annual crop yields and information regarding management of crop residues.These are used to compute annual aboveground and belowground C inputs from plants, here according to the method proposed by Bolinder et al. (2007) and adapted by Clivot et al. (2019).The coefficient of mineralization k (yr −1 ) is calculated according to soil characteristics (clay and carbonate contents, pH and C:N ratio) and climatic conditions (mean annual temperature, precipitation and potential evapotranspiration; Clivot et al., 2019).

Default CS/C0 initialization
Two default values can be used for initialization of SOC pool distribution in AMG, depending on land-use history before the onset of simulations.The initial proportion of CS/C0 equals 0.65 for sites with a long-term arable land-use history.Former long-term grassland sites are expected to have lower CS/C0 and the value of 0.40 was used in previous studies (Saffih-Hdadi and Mary, 2008;Clivot et al., 2019).Since all sites used in this study had been under arable land for at least 20 years before the onset of the experiment, a default value of 0.65 was used.

PARTYSOC-based initialization of CS/C0
The PARTYSOC-based initialization of CS/C0 was derived from data obtained with Rock-Eval® analysis of initial topsoil samples from each LTE.Here, CS/C0 was estimated using the following simple 4-step procedure: first, topsoil samples from the LTE's onset were analysed with Rock-Eval® and the 18 thermal parameters described in Sect.2.3 were calculated for each sample.Second, the thermal parameters were used as input for the PARTYSOC machine-learning model described in Sect.2.4 which was run for this sample set resulting in a sample-specific prediction of centennially stable SOC proportion.Third, the obtained values were averaged per LTE.Fourth, the site mean of the centennially stable SOC proportion was used (as CS/C0) to initialize simulations of SOC stocks for the various treatments of every site (the site standard deviation is reported on Fig. 1 and in Supplementary Material Table 2).Supported by the evident common land-use history shared by the multiple treatments of each site before the onset of simulations and as the SOC stocks and centennially stable SOC contents were very homogeneous amongst each site, we also performed simulations of 17 treatments for which soil samples from the onset of the LTE were not available.In these cases, we considered that the CS/C0 of the treatment was equal to the mean value of the respective site (Supplementary Material Table 1 and 2).

Ex-post optimization of CS/C0
Following a least squares optimization approach, the best fit of the AMG model on observed SOC stocks time series was obtained and the optimal initial SOC pool partitioning (CS/C0) was estimated accordingly for each site (Clivot et al., 2019).In sites with C3-C4 vegetation change chronosequences where δ 13 C long-term monitoring data were available, the model was adapted to simultaneously match the observed evolution of C, C3 and C4 stocks (Clivot et al., 2019) for a given treatment.

Calculation of the centennially stable SOC content
The content of the centennially stable SOC pool of each LTE at initial, intermediate and final dates was estimated through multiplication of the PARTYSOC estimates of the proportion of the centennially stable SOC at a given date by the corresponding total SOC content previously determined using elemental analysis (Clivot et al., 2019;Levavasseur et al., 2020).For example, for the onset of an LTE where t=0: CS = CS/C0•C0, where CS is the stable SOC content (g C kg soil −1 ), and C0 is the total SOC content (g C kg soil −1 ) at time t=0.

Statistics
The fit between PARTYSOC predictions of centennially stable SOC proportion and AMG ex-post optimized stable SOC proportion was assessed by a linear regression model.The same approach was applied for the evaluation of the agreement between centennially stable SOC content and AMG ex-post optimized stable SOC content of initial samples.The evaluation of the performance of the AMG model, for the different SOC pool partitioning initialization methods, was also based on simple linear regressions between simulated and observed SOC stock values.Statistical terms used to express the strength and the statistical significance of the relationships were the coefficient of determination (R 2 ), and the associated probability value (pvalue).Prediction bias and model error were expressed as mean difference (BIAS), and relative mean square error (RMSE).
The relative root mean square error (RRMSE) and the normalized root mean square error (NRMSE) were used to compare the error of different data sets (with a different range of predictions) (Smith et al., 1996;Wallach, 2006;Otto et al., 2018).The observed and simulated total SOC stock change dQC was calculated as follows for each treatment: where QCobs is the observed SOC stock at time t, QCsim is the SOC stock at time t simulated with AMG, t1 indicates the start and t2 the end of simulation period.

Accurate soil organic carbon pool partitioning
Centennially stable SOC proportion values were predicted by the PARTYSOC machine learning model (Cécillon et al., 2021) using Rock-Eval® data measured on initial topsoil samples.The mean value for each independent site was plotted against the stable SOC proportion as determined by AMG ex-post optimization (Fig. 1).The initial centennially stable SOC proportion values predicted with PARTYSOC ranged from 0.44 to 0.74, with a mean value of 0.59, whereas AMG optimal ex-post estimations of stable SOC proportion covered almost the same range, from 0.45 to 0.74, with a mean value of 0.61.The two approaches were strongly correlated (R 2 = 0.63, significant at the p < 0.05 level), with a linear regression slope close to 1 (a = 0.9) and intercept close to 0 (b = 0.04), showing an unbiased relationship between PARTYSOC estimates of the centennially stable SOC proportion and the AMG ex-post optimized stable SOC proportion at the onset of the nine LTEs.Although a slight discrepancy was observed for higher stable SOC proportion values, the results validate our hypothesis showing that centennially stable SOC proportion determined by Rock-Eval® thermal analysis and the PARTYSOC machine-learning model built on fully independent data provides a good estimate of the optimal stable SOC proportion of the AMG model for unrelated French agricultural soils.When expressed as content (g C kg soil −1 ), the fit between the PARTYSOC predictions of the centennially stable SOC determined on initial topsoil samples and the ex-post optimized stable SOC content values was excellent (R² = 0.95; Supplementary Material Fig. 4; optimal stable SOC content ranged from 4.37 to 12.75 g C kg soil −1 across the nine sites).Furthermore, the method appears to be reliable since additional Rock-Eval® measurements on topsoil samples from intermediate and final dates of the LTEs showed that the PARTYSOC predictions of the centennially stable SOC content remained remarkably constant during the experimental period at most sites (Supplementary Material Fig. 5).

More accurate soil organic carbon simulations
In a second step, we investigated if a PARTYSOC-based initialization of the SOC pool partitioning could improve the accuracy of SOC stock simulations of the AMG model.To do so, we compared SOC stock simulations obtained with three different initializations.We first ran AMG using the default initialization method for the SOC pool partitioning (CS/C0 = 0.65 since all LTEs were under cropland for at least two decades before their onset; Table 1).Then, we ran AMG simulations using the PARTYSOC-based initialization method by defining CS/C0 as the site-mean centennially stable SOC proportion determined by the PARTYSOC model.Finally, we ran AMG using the ex-post optimization method to initialize the SOC pool partitioning for each site.For all three initialization procedures, the simulated SOC stock change between the initial and last sampling date for each treatment of each site was plotted against the measured SOC stock change (Fig. 2a-c).Observed SOC stock change ranged from +6 to −24 Mg C ha −1 for the 32 treatments.In spite of a rather good mean agreement (RMSE = 5.95 Mg C ha −1 ), the AMG model initialized with the default procedure provided predictions of SOC stock change rather far from what was observed in two out of nine LTEs (Fig. 2a).Using the PARTYSOC-based initialization method improved AMG simulations compared to the default method, bringing them much closer to the observed SOC stock changes (RMSE = 3.60 Mg C ha −1 ; Fig. 2a, b).PARTYSOC-based initialization of AMG resulted to unbiased simulations (BIAS = 0.06 Mg C ha −1 ) and a strong decrease in the mean error of prediction.Unsurprisingly, AMG initialized using ex-post optimized CS/C0 proportions predicted SOC stock changes very close to the observed ones (RMSE = 2.12 Mg C ha −1 ; Fig. 2c).AMG simulations from ex-post optimized and PARTYSOC-based initializations were remarkably comparable (Fig. 2b, c).The SOC stock simulations produced with AMG for each independent treatment are presented in Supplementary Material Fig. 6.It is noteworthy that the PARTYSOC-based initialization improved the fit between observed and simulated SOC stock change, compared to AMG default initialization, especially for treatments that experienced the greatest SOC stock loss (Fig. 2a, b).In treatments that experienced no SOC stock change or a slight increase in SOC stock, the PARTYSOC-based initialization did not improve the simulations but resulted in highly reliable predictions, similarly to AMG default or optimized initialization methods (Fig. 2a-c).This is likely explained by the history of land cover and soil management practices of the different sites.
Sites presenting treatments with no change or a slight increase in SOC stocks were predominantly sites with a long cropland history (e.g.site of Boigneville; Supplementary Material Table 1), for which the default AMG CS/C0 value of 0.65 is nearly optimal.Conversely, the two sites, Kerbernez and Tartas, where the ex-post optimized CS/C0 values were far below the default value (Fig. 1) have a more complex history of land use and soil management practices.The site of Kerbernez is a former grassland (during the first half of the 20 th century; Supplementary Material Table 1) that was converted into a cropland two decades before the implementation of its arable LTE, in 1958.The site of Tartas was cultivated for a longer time before the LTE onset, however it was turned to grassland for a period in the 19 th century (Supplementary Material Table 1) and received applications of poultry manure for several years before the LTE began.In these two sites, characterized by an optimal AMG CS/C0 much lower than the default value, the PARTYSOC machine-learning model predicted values very close to the optimal CS/C0 values (Fig. 1).

Discussion
Our study demonstrates that the PARTYSOC method based on Rock-Eval® thermal analysis (Cécillon et al., 2018(Cécillon et al., , 2021) ) can estimate the initial SOC pool partitioning of the AMG model of SOC dynamics while improving its accuracy in a series of diverse and independent French LTEs.Contrary to previous studies (Skjemstad et al., 2004;Luo et al., 2014), no modifications of the decomposition rate of SOC kinetic pools were necessary to improve model predictions.The PARTYSOC initialization method never severely affected the model simulations while it strongly improved them at sites where SOC stocks were far from an equilibrium state due to historical changes in soil management or land use.Areas with past changes in land use and soil management represent a large yet poorly known part of arable land in France and Europe (Fuchs et al., 2015;Erb et al., 2017) where SOC stocks and slow-cycling SOC pools are far from equilibrium (Wutzler and Reichstein, 2007;Herbst et al., 2018;Clivot et al., 2019;Taghizadeh-Toosi et al., 2020).Therefore, by accounting for these legacy effects of site history on SOC pool partitioning, the PARTYSOC-based initialization of the AMG model should result in more accurate simulations of SOC dynamics at a national or continental scale.
Our findings combined with results reported in recent ensemble modelling studies (Martin et al., 2019;Farina et al., 2021), suggest that despite its simple structure and when properly initialized (e.g. using the PARTYSOC model) the AMG model is unsurpassed for predicting observed SOC stock changes in French agricultural LTEs, and is amongst the best available modelling frameworks of SOC dynamics in European arable land (Martin et al., 2019;Farina et al., 2021).Our results demonstrate that there is still potential to increase the accuracy of simple multi-compartmental models of SOC dynamics, bringing their simulations very close to the observed values of SOC stock changes.Developing other Rock-Eval®-based initialization methods specifically designed to match the carbon pool design of other multi-compartmental SOC dynamics models such as RothC (Coleman et al., 1997) is a promising research area.More generally, we recommend that the potential of multi-compartmental SOC dynamics models be fully explored and exploited by soil biogeochemists before a new generation of models of increased complexity becomes operational.While new models including the diversity of microbial communities and related processes are emerging (Lehmann et al., 2020;Crowther et al., 2019), the uncertain structure and parametrization of more complex models is hindering their application as robust predictive tools (Shi et al., 2018).At the same time, simple conceptual models of SOC dynamics like AMG combined with novel initialization methods and data-based approaches such as PARTYSOC show promising improvements (Cécillon, 2021a;Dangal et al., 2021;Lee et al., 2020).The low prediction error of the AMG model when its SOC pool distribution is initialized with the PARTYSOC method even challenges the ability of more complex modelling approaches to achieve better performance, given the uncertainty on observed values of SOC stock changes (Schrumpf et al., 2011).
The continental or worldwide implementation of the AMG model with the PARTYSOC-based initialization of SOC pools distribution will require additional work.First, the PARTYSOC machine-learning model (Cécillon et al., 2018(Cécillon et al., , 2021) ) will have to be validated on a wider range of pedoclimates.This method initially built on LTEs coming from North-western Europe (Cécillon et al., 2018), has now been successfully extended to new soil types and a new climate (tropical; Cécillon et al., 2021).
The good agreement between optimal AMG stable SOC proportion values and PARTYSOC predictions reported here suggests that most agricultural LTEs with accurate AMG simulations could be used as reference sites for the PARTYSOC model, lifting an important technical limitation to its geographical expansion (Cécillon et al., 2021).Second, the improved accuracy of model simulations using a PARTYSOC-based initialization will also have to be demonstrated for a wider pedoclimatic range (i.e. worldwide LTEs; such as those referenced by the International Soil Carbon Network; Nave et al., 2015).Third, Rock-Eval® data from the new application areas will be required.Rock-Eval® is a high-throughput technique that is well adapted to the analysis of large soil sample sets provided by large-scale soil monitoring programs.We recommend implementing Rock-Eval® measurements in national and continental soil monitoring networks.

Conclusions
Combining Rock-Eval® thermal analysis with the PARTYSOC machine-learning model should be considered as an emerging key approach with demonstrated ability to improve the accuracy of simulations of SOC dynamics, complementary to other SOC cycling proxies (Bailey et al., 2018;Wiesmeier et al., 2019).The progressive large-scale delivery of these complementary data related to SOC dynamics will strengthen model predictions of SOC stock changes at the national to global scale, necessary for implementing efficient climate change mitigation policies (FAO, 2020).
and S are the observed and simulated values, n is the number of observations,  ̅ and  ̅ are the means of observations and simulations, respectively, and Omax and Omin are the maximum and the minimum value observed.

Figure 1 :
Figure 1: Performance of the PARTYSOC model to predict the centennially stable SOC proportion compared to the AMG ex-post 300

Figure 2 :
Figure 2: Observed vs. simulated change in SOC stocks between the initial and final date of 32 treatments from nine French longterm experiments.The three panels show the performance of the AMG model for three different initialization approaches.Initial SOC kinetic pool sizes were defined using a, the default value for cropland (CS/C0 = 0.65), b, the centennially stable SOC proportion predicted by the PARTYSOC model and c, the AMG ex-post optimized CS/C0 proportion.Statistics refer to the linear regression