A global dataset on phosphorus in agricultural soils

Ringeval, Bruno; Demay, Josephine; Goll, Daniel S.; He, Xianjin; Wang, Ying-Ping; Hou, Enqing; Matej, Sarah; Erb, Karl-Heinz; Wang, Rong; Augusto, Laurent; Lun, Fei; Nesme, Thomas; Borrelli, Pasquale; Helfenstein, Julian; McDowell, Richard W.; Pletnyakov, Peter; Pellerin, Sylvain

doi:10.1038/s41597-023-02751-6

Download PDF

Data Descriptor
Open access
Published: 02 January 2024

A global dataset on phosphorus in agricultural soils

Scientific Data volume 11, Article number: 17 (2024) Cite this article

2318 Accesses
5 Altmetric
Metrics details

Subjects

Abstract

Numerous drivers such as farming practices, erosion, land-use change, and soil biogeochemical background, determine the global spatial distribution of phosphorus (P) in agricultural soils. Here, we revised an approach published earlier (called here GPASOIL-v0), in which several global datasets describing these drivers were combined with a process model for soil P dynamics to reconstruct the past and current distribution of P in cropland and grassland soils. The objective of the present update, called GPASOIL-v1, is to incorporate recent advances in process understanding about soil inorganic P dynamics, in datasets to describe the different drivers, and in regional soil P measurements for benchmarking. We trace the impact of the update on the reconstructed soil P. After the update we estimate a global averaged inorganic labile P of 187 kgP ha⁻¹ for cropland and 91 kgP ha⁻¹ for grassland in 2018 for the top 0–0.3 m soil layer, but these values are sensitive to the mineralization rates chosen for the organic P pools. Uncertainty in the driver estimates lead to coefficients of variation of 0.22 and 0.54 for cropland and grassland, respectively. This work makes the methods for simulating the agricultural soil P maps more transparent and reproducible than previous estimates, and increases the confidence in the new estimates, while the evaluation against regional dataset still suggests rooms for further improvement.

Environmental drivers of increased ecosystem respiration in a warming tundra

Article Open access 17 April 2024

Plant responses to changing rainfall frequency and intensity

Article 09 April 2024

Hotspots of biogeochemical activity linked to aridity and plant traits across global drylands

Article 12 April 2024

Background & Summary

Soil phosphorus (P) availability limits crop yields in many regions of the World^1,2, while anthropogenic application of P in excess triggers aquatic eutrophication in other regions³. While P in unmanaged soils can be predicted to some degree from local soil properties and climate^4,5,6, this approach cannot be used for agricultural systems because humans have altered the P cycle in these ecosystems. E.g. the change in soil P input/output related to farming practices has modified the spatial distribution of P in soils^7,8,9 with long-lasting legacy effects of past management¹⁰. Nevertheless, an analysis of the relationships between crop productivity and soil P availability or between P losses and soil P on regional to global scale requires a spatially explicit agricultural soil P dataset. At the global scale, such a dataset has only recently been developed by Ringeval et al.¹¹ and Zhang et al.¹².

Both Ringeval et al.¹¹ and Zhang et al.¹² combined datasets describing the different drivers of the global spatial distribution of soil P with a process-based model of soil P dynamics to reconstruct the past and present distributions of P in agricultural soils. Hereafter, the term “drivers” encompasses: variables related to soil P input and output (farming practices, deposition, erosion, sludge), land-use change, variables that have an effect on soil P dynamics (soil properties, climate), and biogeochemical background (i.e. the initial soil P content at the time of conversion to cropland). The soil P dynamics model was used to reconstruct how the different soil P pools characterized by different bio-availabilities evolved in time from the biogeochemical background following i) soil P input/output directly (e.g. farming practices) or indirectly (land-use change) derived from some drivers and ii) as function of some drivers that modulate the fluxes between soil P pools (e.g. climate).

Refs. ^11,12 used similar datasets to describe the main drivers. In particular, both used the estimates by Bouwman et al.¹³ for soil P input/output corresponding to farming practices, and unmanaged soil P data by Yang et al.⁴ to approach the biogeochemical background. These dataset have some limitations: e.g. the soil P input/output in grassland provided by Bouwman et al.¹³ was poorly described with the plant uptake forced to be equal to a constant fraction of soil P input. In the work of Yang et al.⁴ how the total soil P is distributed among different soil pools was only a function of soil order while it is now known that the soil order is a poor predictor¹⁴. The representation of soil P pools and fluxes between these pools differed between Ringeval et al.¹¹ and Zhang et al.¹², reflecting the gaps in our understanding of the soil P dynamics and data to constrain process-based models. Zhang et al.¹² simulated only two soil pools (one labile and one stable pool) with poor representation of soil P dynamics, whereas Ringeval et al.¹¹ used a 6-pool P model based on Hedley fractionation method. Despite a more apparent mechanistic representation of soil P transformation, the model used byRingeval et al.¹¹ was based on a very limited knowledge about soil P dynamics: e.g. the soil P buffering capacity was represented as function of soil order only and was poorly constrained by measurements¹⁵.

Because of the above-mentioned limitations, soil P maps simulated by both Ringeval et al.¹¹ and Zhang et al.¹² had low confidence and none of these studies really investigated the realism of the spatial distribution of the soil P simulated: Zhang et al.¹² focused on the P plant uptake while Ringeval et al.¹¹ assessed the contribution of each driver to the spatial variation of soil P. The simulated soil P maps were not evaluated and this was in partly explained by missing large-scale datasets of P measurements at that time. Since the initial release in 2017, the knowledge about soil P dynamics has improved, datasets representing some drivers of soil P have been updated, and data on measured soil labile P in agricultural soils started to be available, offering the opportunities to generate more reliable global soil P maps. In particular, the comparison between different measurements (isotopic exchange kinetics and Hedley fractionation) improves our understanding of the forms extracted^16,17 and our ability to represent the fluxes between these pools¹⁸. Also, parameterizations of exchanges between inorganic P pools have been improved by considering the effects of soil properties on these fluxes¹⁸. Dataset used to describe some drivers has been also improved. In particular, both total soil P and its fractions within different pools have been improved by including more predictors and more sites to train the machine learning algorithms in these studies, improving our understanding of variables driving soil P distributions in unmanaged soils⁶. Finally, regional datasets on measured soil available P based on soil monitoring networks are now available (e.g. LUCAS database¹⁹).

Here we updated the approach described in Ringeval et al.¹¹ with the aim to improve the global soil P dataset in five ways: i) use new theoretical understanding of inorganic soil P dynamics, ii) use new and better constrained datasets to describe the different drivers, iii) benchmark the new soil P maps with available available global and regional estimates, iv) extend the period of simulations over time (simulations ends in 2005 in Ringeval et al.¹¹), v) clarify the different assumptions used throughout all steps of our approach and v) make the whole approach (i.e. scripts to download the datasets, scripts to generate the input from these datasets, soil P dynamic model) available to the community. Points iv) and v) would make further updates easier.

The update of Ringeval et al.¹¹ described in this article provides the simulated size of different soil P pools for the period 1900–2018, with a spatial resolution of 0.5° for both croplands and grasslands. We also provided estimates of the uncertainty related to the description of the different drivers. Spatially gridded estimates of soil P pools will be useful to quantify the current limitation of agricultural production by P and to inform assessments of strategies aiming at reduced fertilizer input by making better use of P present in soils.

Method

General approach

Similar to Ringeval et al.¹¹, the approach used here modelled the global distribution of soil P in croplands and grasslands, with a 0.5° × 0.5° spatial resolution. No vertical discretization was considered as the approach considers only one soil layer, which is the top 0–0.3 m soil layer. For each grid-cell, soil P was distinguished into four land-cover type fractions: (i) cropland (crop), (ii) grassland (grass) defined as managed pasture + rangeland in Chini et al.²⁰, (iii) non-agricultural vegetation (nonagri), and (iv) urban land (urban). While the soil P of the last two land-cover fractions were not explicitly simulated (see below), their consideration was necessary to account for the effect of the land-use and cover change on the cropland/grassland soil P. Land-use and cover change and soil P input/output were prescribed at yearly time-step while soil P dynamics work at daily time-step. Simulations cover the 1900–2018 period.

Our modelling approach combined several global datasets describing the drivers of agricultural soil P with a soil P dynamics model to simulate the temporal evolution of the agricultural soil P from the beginning of the 20th century to the present time period. Drivers are named with four capital letters hereafter. Starting from the soil biogeochemical background used as initial conditions for the year 1900 (BIOG), our modelling approach compute the temporal evolution of soil P pools for cropland and grassland by, i) computing annually the effect of land-use and land-cover change (LUCC), ii) considering the annual soil P input/output corresponding to different drivers (chemical fertilizer, manure, plant uptake and plant residues, encompassed under the term “farming practices”, FARM; input from atmospheric deposition, DEPO; P losses through water erosion, LOSS, P input from sludges, SLUD), iii) simulating at daily time-step the soil P dynamics (i.e. how the different soil P pools exchanges together). Fluxes between soil P forms depends on two drivers: near-surface air temperature and soil temperature/water content (CLIM), and soil properties (SPRO) In total, eight drivers were considered (Tables 1–8). The datasets used to represent the different drivers resulted from the combination of different measurements (satellites, on site measurements, etc.) and/or model simulations. All data sets were regridded to a half-degree resolution.

Table 1 FARM driver and its representation in (data = GPASOIL-v0) and (data = GPASOIL-v1).

Full size table

Table 2 DEPO driver and its representation in (data = GPASOIL-v0) and (data = GPASOIL-v1).

Full size table

Table 3 SLUD driver and its representation in (data = GPASOIL-v0) and (data = GPASOIL-v1).

Full size table

Table 4 LOSS driver and its representation in (data = GPASOIL-v0) and (data = GPASOIL-v1).

Full size table

Table 5 LUCC driver and its representation in (data = GPASOIL-v0) and (data = GPASOIL-v1).

Full size table

Table 6 CLIM driver and its representation in (data = GPASOIL-v0) and (data = GPASOIL-v1).

Full size table

Table 7 SPRO driver and its representation in (data = GPASOIL-v0) and (data = GPASOIL-v1).

Full size table

Table 8 BIOG driver and its representation in (data = GPASOIL-v0) and (data = GPASOIL-v1).

Full size table

Independent of any driver, three other soil properties were used in our approach to perform change in unit, namely the bulk density of the fine earth fraction, the volumetric fraction of coarse fragments ( > 2 mm) and the soil depth. These three variables were either used to convert concentration (i.e. per kg of soil) into quantity per soil surface for both P or water, or to compute soil eroded in fraction of soil mass lost per year. In particular, soil P input/output were mainly given in the different dataset in kgP ha⁻¹ yr⁻¹ and soil P dynamics worked with P concentration in mgP (kg soil)⁻¹. Given the scarcity of dataset about the thickness of the plough layer at the global scale, we considered a globally uniform thickness of 0.3 m as in Ringeval et al.¹¹. This soil layer was also considered as encompassing a major proportion of crop roots. The soil bulk density at 0.5° latitude and longitude resolution from Soilgrids 2.0²¹ was used to approach the density of cropland/grassland of each grid-cell given the unavailability of global datasets focusing on agricultural soils, even though it is known that soil treatment has an effect on soil physical properties²². Same reasoning applied to the fraction of coarse fragments, also provided by Soilgrids 2.0. The uncertainty associated to the bulk density and fraction of coarse fragments was not considered in our study as we restricted the uncertainty analysis to the one of the drivers. In the present study, all P pools were finally expressed in kgP ha⁻¹ for top 0–0.3 m soil layer. Large-scale soil P pools are computed by using cropland and grassland fraction (see LUCC driver below) and grid-cell land area (computed from regriding the land fraction provided by HYDE 3.2²³, https://doi.org/10.17026/dans-25g-gez3).

The new estimates of the soil P distribution in cropland and grasslands are called GPASOIL-v1 and we used the name GPASOIL-v0 to describe the estimates calculated in the previous study of Ringeval et al.¹¹. Estimates result from a coupling between dataset describing drivers (called “data” in the following) and a soil P dynamics model (called “model” hereafter), and v0 or v1 can be attributed to each component (data or model) as each one has been updated from the study of Ringeval et al.¹¹ to the current one.

Soil P dynamics model

Pools and fluxes design

The soil P pools considered in (model = GPASOIL-v1) were named with the following nomenclature: P_a-b with a in {i,o,x} (i for inorganic, o for organic and x for inorganic + organifc (i + o)); and b referring to different types of soil P pools. In total, 7 pools were considered following the merging of Hedley fractions (Table 9, Fig. 1b): P_i-sol (inorganic P in soil solution), P_i-lab (labile inorganic P), P_i-sec (moderately labile inorganic P), P_i-prim (primary inorganic P), P_o-lab (labile organic P), P_o-sta (stable organic P), P_x-occ (occluded P). P_x-tot is the sum of all pools considered.

Table 9 Corresponding between pools name and Hedley fractions in (model = GPASOIL-v0) and (model = GPASOIL-v1).

Full size table

Hereafter, the following nomenclature was used to name the flux mediated by the process “proc” from pool m1 to m2: $f{P}_{proc}^{m1\to m2}$. The superscripts m1 and m2 correspond to soil P pools in case of soil P dynamics or could be used to describe soil P input/output (« out » for outside is used in that case). The name “proc” was used to describe the process that lead to the flux considered and either referred to a soil process (“occl” for occlusion, “deoccl” for deocclusion, “sorp” for sorption, “desorp” for desorption, “weat” for weathering, “mine” for mineralization), or to a soil P input/output (“chem” for chemical fertilizer, “manu” for manure, “resi” for plant residues, “upta” for plant uptake, “loss” for losses through water erosion, “depo” for atmospheric deposition, “slud” for sludges from waste treatment). As examples, $f{\,P}_{mine}^{o-sta\to i-lab}$ corresponded to the mineralization of P_o-sta to P_i-lab while $f{P}_{chem}^{out\to i-lab}$ corresponded to soil P input of chemical fertilizer to P_i-lab. $f{P}_{lucc}$ was used to name the effect of land-use change on soil P (section “Effect of land-use and land-cover change (LUCC) on cropland/grassland soil P”).

In (model = GPASOIL-v1), the equations describing soil inorganic P dynamics between P_i-sec, P_i-lab, P_x-occ, P_i-sol were given by Wang et al.¹⁸. Weathering and mineralization were simulated as in (model = GPASOIL-v0) but other mineralization rates values were here tested (see section “Parameter estimates”). In the present section, fP was expressed in mgP (kg soil)⁻¹ day⁻¹, soil P pools (P_i-sol, P_i-sec, etc.) in mgP (kg soil)⁻¹ and soil water content (W_abs) in L (kg of soil)⁻¹ to make parameters unit consistent to the units used in Wang et al.¹⁸ (Table 10). The different unit conversions were done by using the bulk density (in kg m⁻³), volumetric fraction of coarse fragments (no unit) and spatially constant soil depth (0.3 m). Fluxes and pools varied as functions of year (y), land-cover (lu, either equal to crop or grass) and space (g, for grid-cell) (i.e. fP = fP(y,lu,g)). Parameters involved in Eqs. 1–8 (e.g. ${k}^{i-sec\to x-occ}$ and b) varied as a function of grid-cell only and W_abs as a function of year and grid-cell; but for the sake of readability, we omit these in the below equations.

Table 10 Original parametrizations provided by Wang et al.¹⁸ and modified parametrizations used in this study after i) exclusion of oxalate concentrations from explanatory variablex and ii) setting k parameters as constant in time.

Full size table

As in Wang et al.¹⁸, occlusion/deocclusion are given by Eqs. 1, 2 and sorption/desorption between P_i-sec and P_i-sol are computed with Eqs. 3, 4:

$$f{P}_{occl}^{i-sec\to x-occ}={k}^{i-sec\to x-occ}.{P}_{i-sec}$$

(1)

$$f{P}_{deoccl}^{x-occ\to i-sec}={k}^{x-occ\to i-sec}.{P}_{x-occ}$$

(2)

$$f{P}_{sorp}^{i-sol\to i-sec}={k}^{i-sol\to i-sec}.{\left({P}_{i-sol}/{W}_{{\rm{abs}}}\right)}^{b}$$

(3)

$$f{P}_{desorp}^{i-sec\to i-sol}={k}^{i-sec\to i-sol}.{P}_{i-sec}$$

(4)

where P_i-sol/W_abs corresponds to the P concentration in the soil solution (in mgP L⁻¹), k (in day⁻¹) and b (unitless) are parameters. Sorption follows a Freundlich equation.

In Wang et al.¹⁸, sorption/desorption fluxes between P_i-sol and P_i-lab are computed as follows:

$$f{P}_{desorp}^{i-lab\to i-sol}={k}^{i-lab\to i-sol}.{P}_{i-lab}$$

(5)

$$f{P}_{sorp}^{i-sol\to i-lab}={k}^{i-sol\to i-lab}{\left({P}_{i-sol}/W\right)}^{b}$$

(6)

To simplify the application of the¹⁸ model at the global scale in the current study, an equilibrium between P_i-sol and P_i-lab is assumed at daily time-step, following Wang et al.¹⁵. Thus, ${k}^{i-lab\to i-sol}{P}_{i-lab}={k}^{i-sol\to i-lab}{\left({P}_{i-sol}/{W}_{{\rm{abs}}}\right)}^{b}$ i.e.:

$${P}_{i-lab}=\frac{{k}^{i-sol\to i-lab}}{{k}^{i-lab\to i-sol}}{\left({P}_{i-sol}/{W}_{{\rm{abs}}}\right)}^{b}$$

(7)

Numerical resolution of Eq. 7 was done by substituting P_i-sol,lab = P_i-sol + P_i-lab in Eq. 7 and rearranging to get:

$${k}^{i-sol\to i-lab}{\left({P}_{i-sol}\right)}^{b}+{k}^{i-lab\to i-sol}{\left({W}_{{\rm{abs}}}\right)}^{b}{P}_{i-sol}-{k}^{i-lab\to i-sol}{\left({W}_{{\rm{abs}}}\right)}^{b}{P}_{i-sol,lab}=0$$

(8)

where the unique unknown is P_i-sol. Equation 8 was used to redistribute P_i-sol,lab between P_i-sol and P_i-lab by assuming a steady-state between P_i-sol and P_i-lab. Equation 8 was solved using scipy.optimize.root solver in python3.6.

The Wang et al.¹⁸ model excludes inorganic primary P and organic pools, and their exchanges with other inorganic pools. To model weathering and mineralization in (model = GPASOIL-v1), the same equations as in (model = GPASOIL-v0) were used. The weathering is computed as follows:

$$f{P}_{weat}^{i-prim\to i-lab}={k}^{i-prim\to i-lab}\ast {g}_{1}\left({T}_{soil}\right)\ast {g}_{2}\left({W}_{rel}\right)\ast {P}_{i-prim}$$

(9)

where g₁ and g₂ (unitless) described the sensitivity to soil temperature (T_soil, in °C) and relative soil water content (W_rel, unitless), respectively. As detailed in Ringeval et al.¹¹, g₁(T) is described with a Q₁₀ of 2.4 and a reference temperature of 15 °C and g₂(W_rel) = W_rel. The relative soil water content is defined by:

${W}_{rel}={W}_{{\rm{abs}}}/{W}_{fc}$ with W_fc representing the field water capacity (in L (kg of soil)⁻¹). Dataset used to get values of W_abs, W_fc, T_soil are described in the CLIM driver.

Fluxes from organic to inorganic pools (mainly assimilated to mineralization) are given by:

$$f{P}_{mine}^{o-sta\to i-lab}={k}^{o-sta\to i-lab}\ast {h}_{1}\left({T}_{soil}\right)\ast {h}_{2}\left({W}_{rel}\right)\ast {P}_{o-sta}$$

(10)

$$f{P}_{mine}^{o-lab\to i-lab}={k}^{o-lab\to i-lab}\ast {h}_{1}\left({T}_{soil}\right)\ast {h}_{2}\left({W}_{rel}\right)\ast {P}_{o-lab}$$

(11)

where h₁ and h₂ represent the sensitivity to soil temperature and soil water content and ${k}^{o-sta\to i-lab}$ and ${k}^{o-lab\to i-lab}$ are turnover rates for stable and labile pools (in day⁻¹), respectively. The same functions as the ones used in the ORCHIDEE²⁴ dynamic global vegetation model to characterize the carbon mineralization were prescribed to h₁(T_soil) and h₂(W_rel) (i.e. Q₁₀ = 2 and reference temperature = 30 °C for h₁; ${h}_{2}\left({W}_{rel}\right)=-1.1{\left({W}_{rel}\right)}^{2}+2.4{W}_{rel}-0.29$).

At the 1^st day of each year, soil P input/output (chemical fertilizer, manure, plant residues, plant uptake, atmospheric deposition, sludges, losses by water erosion) were added/removed to/from the different soil P pools. Each soil input/output had a given composition that defines which pools were enriched or depleted, as provided later. The only exception to this rule was P_i-sol which was neither enriched nor depleted by soil P input/output in the 1^st day. Any P fluxes that should have reached or leave P_i-sol was set to occur with the P_i-lab pool instead. For instance, while it is well known that plant only takes up P in soil solution, our modeling approach did not allow to simulate daily plant growth and related P uptake from P in solution. Instead, our methodological choice was to remove annual plant uptake from P_i-lab at the 1^st day of the year, then to make P_i-lab and other soil pools interact for each following day of the rest of the year. Within each time-step (day), the following order of processes was considered: exchanges between pools/forms were first computed, then pools were updated according to these changes and finally, the equilibrium between P_i-sol and P_i-lab was computed.

To avoid over depletion of a given soil P pool, i.e. the net output flux at the model time step exceeding the pool size, we restricted this flux to the pool size. The effect of this restriction will be evaluated in the section « Technical validation ». After removing $f{P}_{upta}^{i-lab\to out}$ from P_i-lab and before computing the equilibrium between P_i-sol and P_i-lab, if (P_i-sol +P_i-lab) was negative, (P_i-sol +P_i-lab) was set to 0 and the uptake not satisfied by the soil P pools simulated was saved to a variable called $f{P}_{upns}$. This variable corresponded to the part of the uptake which was prescribed to the soil P dynamics model but that the model was not able to sustain. In that case, $f{P}_{upta}^{i-lab\to out}$ was corrected to the maximum uptake that was determined by the size of (P_i-sol +P_i-lab).

Parameter estimates

Equations 1–8 are based on Wang et al.¹⁸. To estimate the flux parameters b, ${k}^{i-lab\to i-sol}$, ${k}^{i-sol\to i-sec}$, and ${k}^{i-sec\to x-occ}$, Wang et al.¹⁸ proceed in two steps by first calibrating these parameters against isotopic exchange kinetics experiments from 147 sites. Then, they regressed such optimized parameters against pedo-climatic variables. They found that b, ${k}^{i-lab\to i-sol}$, ${k}^{i-sol\to i-sec}$, ${k}^{i-sec\to x-occ}$ correlate with soil P fractions of different inorganic pools (the term “fraction” is used here to define the ratio between a given pool and the sum of all pools considered in Wang et al.¹⁸, see the variables called f in Eqs. 12–14 and definition in Eq. 15), soil organic carbon and oxalate extractable metal oxide concentrations. Parametrizations found in Wang et al.¹⁸ are given in Table 10. Here, in order to estimate these parameters at the global scale at which oxalate extractable metal oxide concentrations is not available, we re-compute the regressions after excluding oxide concentrations from explanatory variables (Table 10).

Wang et al.¹⁸ derived other parameters (${k}^{i-sol\to i-lab}$, ${k}^{i-sec\to i-sol}$, ${k}^{x-occ\to i-sec}$) from relationships with b, ${k}^{i-lab\to i-sol}$, ${k}^{i-sol\to i-sec}$, and ${k}^{i-sec\to x-occ}$. These relationships were also used in our study:

$$\frac{{k}^{i-sol\to i-lab}}{{k}^{i-lab\to i-sol}}=\frac{{f}_{i-lab}\overline{{W}_{{\rm{abs}}}}}{b{\left({P}_{c,\infty }\right)}^{b-1}{f}_{i-sol}}$$

(12)

$$\frac{{k}^{i-sol\to i-sec}}{{k}^{i-sec\to i-sol}}=\frac{{f}_{i-sec}\overline{{W}_{{\rm{abs}}}}}{b{\left({P}_{c,\infty }\right)}^{b-1}{f}_{i-sol}}$$

(13)

$$\frac{{k}^{x-occ\to i-sec}}{{k}^{i-sec\to x-occ}}=\frac{{f}_{i-sec}}{{f}_{x-occ}}$$

(14)

where ${P}_{c,\infty }$ is the concentration of P in soil solution at steady-state, overlined W_abs means temporal average of W_abs, f_m denotes the ratio (no unit) at steady-state of pool m and the sum of inorganic pools considered in Wang et al.¹⁸ called here P_i-tot\prim, e.g. for m = i-sec,

$${f}_{i-sec}=\frac{{P}_{i-sec;\infty }}{{P}_{i-tot\backslash {\rm{prim}};\infty }}=\frac{{P}_{i-sec;\infty }}{{P}_{i-sec;\infty }+{P}_{i-sol;\infty }+{P}_{x-occ;\infty }+{P}_{i-lab;\infty }}$$

(15)

with subscript ∞ denotes pools at the steady-state. Note that Wang et al.¹⁸ made distinctions between measured (f*) and modeled (f) fractions but this distinction does not make sense in this study where fractions are only simulated. Wang et al.¹⁸ also considered W_abs equal to 10 L (kg of soil)⁻¹ representative to water content of isotopic dilution while we used here the averaged over the whole simulation (1900–2018) of W_abs. The parameter ${P}_{c,\infty }$ that describes the concentration of P in soil solution at steady-state depends only on oxalate in Wang et al.¹⁸ and we prescribed here an arbitrary constant value:

$${P}_{c,\infty }=0.1\;{{\rm{mgPL}}}^{-1}$$

(16)

which is equal to the median value of Helfenstein et al.²⁵. The sensitivity to the value chosen for ${P}_{c,\infty }$ was assessed in the “Technical validation” section. Dataset for calculating soil parameters using the relationships in Table 10 are given in the SPRO driver description (Table 7).

The rate parameters (different k) and parameter b are only grid-cell dependent and do not differ between cropland and grassland. Following parameterizations found in Wang et al.¹⁸, these parameters could theoretically vary in time with temporal variations of the different soil P fractions at equilibrium (f), organic P (P_o-tot) and the sum of total inorganic pools considered in Wang et al.¹⁸ (i.e. ${P}_{i-tot{\rm{\backslash prim}}}$) (see Table 10). We choose here to keep them constant through i) the use of time-invariant value for ${P}_{m,\infty }$ (and thus for f_m) for any soil pool m and, ii) the use of ${P}_{o-tot,\infty }$ and ${P}_{i-tot{\rm{\backslash prim}},\infty }$ in Table 10 instead of ${P}_{o-tot}$ and ${P}_{i-tot{\rm{\backslash prim}}}$, respectively. ${P}_{m,\infty }$ is assumed to have the current value of unmanaged soil P pools ${P}_{m}^{NA}$ given by He et al.¹⁴ for the same grid-cell g, as follows:

$${P}_{m,\infty }\left(g\right)={P}_{m}^{NA}\left(g\right)$$

(17)

in a similar way to the computation of initial conditions (see the section about LUCC).

To summarize, the main differences between the original Wang et al.¹⁸ model and the model used here for inorganic soil P dynamics are the following:

- we assumed an equilibrium between P_i-sol and P_i-lab (Eq. 8) instead of (Eqs. 5-6)

- our parametrization excluded oxalate as explanatory variables (thus simplified equations for b, ${k}^{i-lab\to i-sol}$, ${k}^{i-sol\to i-sec}$, and ${k}^{i-sec\to x-occ}$, as well as a constant in space ${P}_{c,\infty }$; Table 10) and we considered that all k parameters as constant in time.

The weathering rate (${k}^{i-prim\to i-lab}$₎ was set to 2.7e-7 day⁻¹ following Buendia et al.²⁶ as in (model = GPASOIL-v0). Regarding mineralization rates, ${k}^{o-sta\to i-lab}$ = 2.7e⁻⁵day⁻¹ and ${k}^{o-lab\to i-lab}$ = 2.7e⁻⁴day⁻¹ were prescribed for organic stable and organic labile pools, i.e. a residence time of 100 and 10 years, respectively, as in (model = GPASOIL-v0). Given the uncertainty related to such rates²⁷, another values were tested: ${k}^{o-sta\to i-lab}$ = 1.8e⁻⁴day⁻¹ and ${k}^{o-lab\to i-lab}$ = 1.4e⁻³day⁻¹; i.e. a residence time of respectively 15 and 2 yr. Such smaller residence time of those two parameters allow the simulated soil P pool for grassland to become stable during the 1^st half of the 20^th century (see the “Technical validation” section). (model = GPASOIL-v1.0) was used to name the model with ${k}^{o-sta\to i-lab}$ = 2.7e⁻⁵day⁻¹ and ${k}^{o-lab\to i-lab}$ = 2.7e⁻⁴day⁻¹ while (model = GPASOIL-v1.1) was used for the model with ${k}^{o-sta\to i-lab}$ = 1.8e⁻⁴day⁻¹ and ${k}^{o-lab\to i-lab}$ = 1.4e⁻³day⁻¹ (Table 11).

Table 11 Name of the soil P estimates used in this study.

Full size table

Summary of the main differences with (model = GPASOIL-v0)

Soil P pools considered in (model = GPASOIL-v0) were listed in Table 9 and both pools and fluxes considered were plotted in Fig. 1a. Main differences with (model = GPASOIL-v1) were represented in Fig. 1 and listed hereafter:

(i).
(model = GPASOIL-v1) considered an additional inorganic P in soil solution (P_i-sol) that exchanges with both P_i-lab and P_i-sec. P_i-sol is an intermediary pool between P_i-sec and P_i-lab in Wang et al.¹⁸, and thus is key to represent processes of sorption/desorption. P_i-sol would also allow a better coupling with P uptake model in further studies.
(ii).
a Freundlich equation was used to describe the fluxes leaving P_i-sol to either P_i-lab or P_i-sec (vs. a Langmuir equilibrium between P_ISEC and P_ILAB in (model = GPASOIL-v0)). Freundlich equations were assumed to be better (see the introduction of Wang et al.¹⁸).
(iii).
in (model = GPASOIL-v0), all parameters describing soil P fluxes – except those involve in the Langmuir equilibrium, which vary as function of soil orders- were constant in space (Text S1). In (model = GPASOIL-v1), parameters involved in the soil inorganic P dynamics varied in space following Wang et al.¹⁸.
(iv).
the flux from P_x-occ to P_i-sec was omitted in (model = GPASOIL-v0) but is resolved in (model = GPASOIL-v1)
(v).
(model = GPASOIL-v0) was based on yearly time-step, while (model = GPASOIL-v1) was based on daily one, and
(vi).
while soil organic P dynamics was similar between the two model versions, one alternative couple of residence time were tested in (model = GPASOIL-v1) (same residence time as in model = GPASOIL-v0 for GPASOIL-v1.0 while another one for GPASOIL-v1.1, Table 11).

Effect of land-use and land-cover change (LUCC) on cropland/grassland soil P

Within a given grid-cell, the change in the different land-cover fractions from one year to the other modified the soil P of cropland (or grassland), if there were some gross conversions from other land-cover to cropland (grassland respectively). Thus, the effect of LUCC on cropland soil P pools for year y and grid-cell g (called ${P}_{m}\left(y,crop,g\right)$ with subscript m referring to a given soil P pool) was simulated using Eq. 18 as follows:

$$\begin{array}{lll}frac(y,crop,g)\ast {P}_{m}(y,crop,g) & = & [frac(y-1,crop,g)\\ & & -{\sum }_{i\in lu\backslash \{crop\}}{\Delta }_{i}(y-1,crop,g)]\\ & & \ast {P}_{m}(y-1,crop,g)\\ & & +{\sum }_{j\in lu\backslash \{crop\}}({\Delta }_{crop}(y-1,j,g)\\ & & \ast {P}_{m}(y-1,j,g))\end{array}$$

(18)

with frac(y,crop,g) was the fraction of the grid-cell g covered by cropland, Δ_i(y,j,g) was the change in cover area from land-use j to land-use i, lu is the list of land-use considered (i.e. lu = {crop, grass, nonagri, urban}), lu\{n} means the list of land-use lu after removing {n}. A similar equation where replacing crop by grass described the effect of LUCC on grassland soil P pools. The increase/decrease in a given soil pool P_m of a given land-cover (cropland or grassland) through LUCC was called $f{P}_{lucc}$ and was positive when P_m increases from one year to the other.

The definition of frac was the following:

$$frac\left(y,lu,g\right)=Area\left(y,lu,g\right)/land\left(g\right)$$

(19)

with Area for the area of the grid-cell covered by lu and land the total land area in the grid-cell g.

The soil P content of nonagri land-cover fraction was not explicitly simulated and was defined as follows:

$${P}_{m}\left(y,nonagri,g\right)={P}_{m}^{NA}\left(g\right)$$

(20)

where ${P}_{m}^{NA}\left(g\right)$ is the current soil P pool m of natural (unmanaged) soils and grid-cell g. ${P}_{m}^{NA}\left(g\right)$ was provided by He et al.¹⁴ for m within {P_i-lab, P_i-sec, P_i-prim, P_o-lab, P_o-sta, P_x-occ}. ${P}_{i-sol}^{NA}\left(g\right)$ was not provided by He et al.¹⁴ and for this pool, we used the following equation:

$${P}_{i-sol}^{NA}\left(g\right)={P}_{c,\infty }.\overline{{W}_{{\rm{abs}}}\left(g\right)}$$

(21)

with ${P}_{c,\infty }$ defined in (Eq. 16) and $\overline{{W}_{{\rm{abs}}}}$ the soil water content, as defined earlier.

Through Eq. 20, we assumed that the P inherited from natural soils at the conversion to agriculture (the so-called biogeochemical background, BIOG) could be represented by prescribing the current P in unmanaged soils to all soils converted to agricultural soils over the last ~120 years. As explained in Ringeval et al.¹¹, this could be limiting, in particular in regions where shifting cultivation occurred, leading to modification of the P content of soils covered by natural vegetation. We also neglected soil P input corresponding to forest biomass left on soil at the time of conversion²⁸. All these simplifications had to be taken unless we explicitly simulated the change of soil P in non-agricultural areas, which would require the use of global vegetation models that rely on their own assumptions (e.g. Sun et al^.29).

In addition, P in natural soils was also used as initial conditions, i.e.:

$${P}_{m}\left({y}_{0},crop,g\right)={P}_{m}^{NA}\left(g\right)$$

(22)

$${P}_{m}\left({y}_{0},grass,g\right)={P}_{m}^{NA}\left(g\right)$$

(23)

with y₀ corresponds to the 1^st year of the simulation, chosen here equal to 1900.

The soil P content of urban land-cover fraction was not explicitly simulated and was defined by:

$${P}_{m}\left(y,urban,g\right)=0$$

(24)

Assumption of neglected soil P pools for urban land-cover was likely wrong especially for urban green areas but this assumption had no effect on cropland/grassland soil P as conversion from urban areas to agricultural areas was extremely rare.

Drivers of agricultural soil P distribution: datasets and computation

As explained earlier, the term “drivers” encompasses variables related to soil P input and output (farming practices - FARM, atmospheric deposition -DEPO, sludges - SLUD, losses through water erosion - LOSS), land-use change (LUCC), variables that have an effect on soil P dynamics (soil properties - SPRO, climate - CLIM), and biogeochemical background (BIOG). The different drivers and their representations were summarized in Tables 1–8. The datasets used in this study are called (data = GPASOIL-v1) while (data = GPASOIL-v0) is used to name the datasets used in Ringeval et al.¹¹. Tables 1–8 provide the difference between (data = GPASOIL-v0) and (data = GPASOIL-v1) for each driver. BIOG and SPRO were constant in time while other drivers varied in time. Drivers whose the computation required more information than the one given in Tables 1–8 were described below.

FARM

P in manure (for both cropland and grassland)

Two estimates of P in manure applied to cropland soil and two estimates for grassland soil were used to assess the uncertainty associated with this soil input (Section “Error estimate…” and Supplementary Table S1). For each land-use category (cropland or grassland), the first estimate relied on a half-degree resolution dataset for N in manure reaching the soil (Zhang et al.³⁰ for cropland, $f{N}_{manu,Zhang}^{out\to x-tot}$, Eq. 25, and Xu et al.³¹ for grassland, $f{N}_{manu,Xu}^{out\to x-tot}$, Eq. 26). A spatially constant P:N ratio of 0.20 (kgP kgN⁻¹) was used to convert N fluxes to P fluxes of manure.

$$f{P}_{manu,1}^{out\to x-tot}(y,crop,g)=0.20\ast f{N}_{manu,Zhang}(y,crop,g)$$

(25)

$$f{P}_{manu,1}^{out\to x-tot}(y,grass,g)=0.20\ast f{N}_{manu,Xu}(y,grass,g)$$

(26)

$f{N}_{manu,Xu}^{out\to x-tot}$ correspond to the sum of N in what was called deposition by grazing animals and application of manure by Xu et al.³¹.

The second estimate was computed thanks to Eqs. 27, 28 and relied on the P in manure produced by the livestock at the country-scale given by Demay et al.⁹, which combined data on of livestock population and P excretion rates per livestock category, without any distinction between cropland and grassland soils:

$$f{P}_{manu,2}^{out\to x-tot}\left(y,crop,g\right)=f{P}_{manu,1}^{out\to x-tot}\left(y,crop,g\right)\ast \frac{\overline{f{P}_{manu,Demay}\left(crop+grass,country\left(g\right)\right)}}{\overline{f{P}_{manu,1}^{out\to x-tot}\left(crop+grass,country\left(g\right)\right)}}$$

(27)

$$f{P}_{manu,2}^{out\to x-tot}\left(y,grass,g\right)=f{P}_{manu,1}^{out\to x-tot}\left(y,grass,g\right)\ast \frac{\overline{f{P}_{manu,Demay}\left(crop+grass,country\left(g\right)\right)}}{\overline{f{P}_{manu,1}^{out\to x-tot}\left(crop+grass,country\left(g\right)\right)}}$$

(28)

with the overbar there corresponds to a temporal average over the 1950–2017 period (i.e. the years in common for Demay et al.⁹ and $f{P}_{manu,1}^{out\to x-tot}$), country(g) is the country where the grid-cell g can be found, $f{P}_{manu,Demay}^{out\to x-tot}$ is the P manure produced at country scale provided by Demay et al.⁹ and $f{P}_{manu,1}^{out\to x-tot}\left(crop+grass,country\left(g\right)\right)$ is the country average of variables computed thanks to Eqs. 25, 26 without any distinction crop vs grass. For a given country, $f{P}_{manu,2}^{out\to x-tot}$ has the same relative spatial distribution among grid-cells as $f{P}_{manu,1}^{out\to x-tot}$ but its spatial variabilities between countries is different. The ratio of 0.20 used in Eqs. 25, 26 was given by Sun et al.²⁹ and corresponds to a P:N ratio averaged among ruminant manure (mean P:N value of 0.165 between value of 0.15 for cattle and 0.18 for goat and sheep³²) and monogastric manure (mean P:N value of 0.26 between value of 0.24 for chicken and 0.28 for swine³²), weighted by the global amount of manure for each livestock super-category (ruminants: 14.4 TgN yr⁻¹ and monogastrics: 10.1 TgN yr⁻¹) given by FAOSTAT for the year 2000. The use of a spatially and temporal invariant P:N ratio for manure was needed given the lack of gridded data about P manure applied on cropland and grassland soil. It was however a simplistic assumption as N and P were partly decoupled due to N losses in manure through NH₃ volatilization. While applied to the same data of manure application in Sun et al.²⁹ as used here, the ratio of Lun et al.³² characterized manure excretion (i.e. before volatilization). Also, the P:N values used to compute the value of 0.20 was representative to a given country (USA)³² while such ratio could change as function of the country and farming practices, following its relationship with animal nutrition³³. Finally, the use of the same ratio for both cropland and grassland was also an approximation as the proportion of monogastric vs ruminant likely differs between cropland and grassland.

P in chemical fertilizer (for both cropland and grassland)

Given the lack of global dataset about P fertilizer applied to grassland, a spatially constant P:N ratio of 0.22 for chemical fertilizer was used to derive P in chemical fertilizer for grassland ($f{P}_{chem}^{out\to i-lab}\left(y,grass,g\right)$) from N in chemical fertilizer applied to grassland ($f{N}_{chem,Xu}\left(y,grass,g\right)$) given by Xu et al.³¹:

$$f{P}_{chem}^{out\to i-lab}\left(y,grass,g\right)=0.22\ast f{N}_{chem,Xu}\left(y,grass,g\right)$$

(29)

The ratio of 0.22 corresponds to the ratio of global P chemical fertilizer for managed grassland in 2002–2010 (0.4 TgP yr⁻¹ in Lun et al.³²) and the global N chemical fertilizer for grassland for same years (1.8 TgN yr⁻¹ in Xu et al.³¹). This ratio was computed and used in Sun et al.²⁹.

P in chemical fertilizer for cropland ($f{P}_{chem}^{out\to i-lab}\left(y,crop,g\right)$) was derived from Lu and Tian³⁴, which assumed that all chemical fertilizer was applied to cropland (i.e. grassland did not receive any chemical fertilizer). Here, we corrected the chemical fertilizer applied to cropland by subtracting the chemical fertilizer applied on grassland given by Eq. 29 from Lu and Tian³⁴ estimates, i.e.:

for any grid-cell g and year y:

$$\begin{array}{lll}f{P}_{chem}^{out\to i-lab}(y,crop,g) & = & 1/frac(y,crop,g)\ast (f{P}_{chem,LuTian}(y,crop,g)\\ & & \ast frac(y,crop,g)-f{P}_{chem}^{out\to i-lab}(y,grass,g)\\ & & \ast frac(y,grass,g))\end{array}$$

(30)

with frac is the grid-cell fraction occupied by cropland or grassland, and $f{P}_{chem,LuTian}$ was taken from Lu and Tian³⁴ (in kgP ha⁻¹ yr⁻¹). This correction lead to very small change (at the global scale, $f{P}_{chem}^{out\to i-lab}$ over 2008–2018 is 18.1 Tg.yr⁻¹ for cropland vs 0.5 Tg.yr⁻¹ for grassland).

P uptake and P residues for grassland

P uptake ($f{P}_{upta}^{i-lab\to out}$) and P in plant residues ($f{P}_{resi}^{out\to i-lab}$) in grassland were derived from the actual NPP, the part of NPP which is human-appropriated through grazing/mowing (the so-called grazing intensity, GI), as well as from some spatially constant parameters as follows:

$$\begin{array}{ccc}f{P}_{upta}^{i-lab\to out}(y,grass,g) & = & \frac{1}{\gamma }\ast \frac{1}{dry}\ast (\frac{{P}_{{\rm{ \% }},above}}{100}\ast \frac{lifespa{n}_{above}}{L}\ast NP{P}_{above}(y,grass,g)\\ & & +\frac{{P}_{{\rm{ \% }},below}}{100}\ast \frac{lifespa{n}_{below}}{L}\ast NP{P}_{below}(y,grass,g))\end{array}$$

(31)

$$f{P}_{resi}^{out\to i-lab}(y,grass,g)=(1-GI(y,grass,g))\ast f{P}_{upta}^{i-lab\to out}(y,grass,g)$$

(32)

where γ is the carbon content of dry matter (in kgC (kgDM)⁻¹), dry is the ratio between dry-matter biomass and fresh-matter biomass (kgDM (kgFM)⁻¹), P_% is the P concentration of grass biomass (in gP (100gFM)⁻¹), lifespan is the life duration of plant organs (in year), L is the length of the growing season (in years), NPP is the annual NPP (expressed here in kgC ha⁻¹ yr⁻¹), GI is the so-called grazing intensity, corresponding to the part of NPP which is grazed/mowed (no unit). A distinction aboveground vs belowground was done for P_%, lifespan, and NPP. Contrary to estimates based on carbon stock (such as used for cropland below), NPP-based estimates should consider a recycling through biomass turnover³⁵, that is why we rely here on leaf lifespan (lifespan_above, in yr) and root lifespan (lifespan_root, in yr) derived from observations divided by the length of the growing season (L).

Lifespan for root show large variations among climatic zones and root diameter (fine vs coarse)³⁶. Here, we used a lifespan_below of 0.7 yr, close to the value provided by Zhang and Wang³⁷ for graminoids. The parameter lifespan_above was set to 0.8 yr slightly below the leaf longevity used in absence of stress in LPJ-GUESS for C3-C4 grasses (1 yr in Smith et al.³⁸). We set the length of the growing season (L) equal to 1 yr. γ is set to 0.45 kgC (kgDM)⁻¹, as commonly used (see e.g Monfreda et al.³⁹). The parameter dry is set to 0.20 kgDM (kgFM)⁻¹ which is the value of dry fraction of economic yield given by Monfreda et al.³⁹ for the crop category « alfalfa ».

Terms involved in Eq. 31 are prone to uncertainty, especially NPP⁴⁰ and P concentrations. Here we used the spatial distribution of the actual NPP_tot (defined as NPP_tot = NPP_above + NPP_below) given by Kastner et al.⁴¹ as they also provided the spatial distribution of GI used in the current study to estimate the P in residues. Kastner et al.⁴¹ provided different estimates and we used here one of them (the so-defined run number 1_3_2_1_1_1 in the Readme file of Kastner et al.⁴¹). In the chosen run, LPJ-GUESS without nitrogen limitation is used to approach NPP_tot and its global average reaches ~460 gC m⁻² yr⁻¹, well above the value found in Wang et al.⁴² for temperate/C3 grassland (273 gC m⁻² yr⁻¹) but well below global value derived from fields (979 ± 78 gC m⁻² yr⁻¹) found in Sun et al.⁴⁰. Thus, we used for each grid-cell g, two estimates for NPP_tot:

$$NP{P}_{tot,1}\left(y,grass,g\right)=NP{P}_{Kastner}\left(y,g\right)$$

(33)

$$NP{P}_{tot,2}\left(y,grass,g\right)=NP{P}_{Kastner}\left(y,g\right)\ast \frac{\overline{NP{P}_{Sun}}}{\overline{NP{P}_{Kastner}}}$$

(34)

with $\overline{NP{P}_{Sun}}$ and $\overline{NP{P}_{Kastner}}$ equal to 979 and 460 gC m⁻² yr⁻¹, respectively. Globally averaged GI is 0.12. NPP_below and NPP_above are computed as 0.54*NPP_tot and 0.46*NPP_tot, respectively, as found in Sun et al.⁴⁰.

Lun et al.⁴³ provided a value of 1.5e⁻¹ gP (100gFM)⁻¹ for the P concentration of the economic yield for the crop category « other crops » encompassing « forage and silage (maize, grasses nes, alfalfa, clover,…) », which is close to the value provided by Smil⁴⁴ for the harvest of the crop category « Forages » (2.0e⁻¹ gP (100gFM)⁻¹) but much higher than the P concentration found in GOLUM⁴² (and taken from Zechmeister-Boltenstern et al.⁴⁵) or found in Spohn et al.⁴⁶. Wang et al.⁴² provided a molar P:C ratio for foliage of 1/753 for temperate C3 grass and 1/1728 for tropical C4 grass (Table S1 of Wang et al.⁴²), i.e. a C3-C4 averaged molar P:C ratio of 1/1241, very close to the molar P:C ratio of 1/1260 given by Spohn et al.⁴⁶ for senescent leaves of grassland. These values correspond to a P concentration of 2.5e⁻² gP (100gFM)⁻¹, following the below equation:

$${P}_{{\rm{ \% }},above}=100\ast \gamma \ast dry\ast \frac{M(P)}{M(C)}\ast {P}_{molar,above}$$

(35)

with P_molar,above the P:C molar ratio (no unit), P_% the P concentration in gP (100gFM)⁻¹, M(P) and M(C) the atomic mass of P and C. In our study, we used this value (2.5e⁻² gP (100gFM)⁻¹) as lower boundary and the Lun et al.⁴³ value (1.5e⁻¹ gP (100gFM)⁻¹) as upper boundary, thus a mean value of 8.8e⁻² gP (100gFM)⁻¹.

Finally, we assumed the following relationship between P_%,above and P_%,below, based on the order of magnitude of leaf:root ratio of molar P:C ratio used in⁴⁵ for C3 or C4 grassland:

$${P}_{{\rm{ \% }},below}=\frac{1}{2}{P}_{{\rm{ \% }},above}$$

(36)

P uptake and P residues for cropland

P uptake at grid-cell scale for the cropland fraction ($f{P}_{uptake}^{i-lab\to out}\left(y,crop,g\right)$) was computed as the sum of the uptake by different crops weighted by the harvested area of each crop:

$$f{P}_{upta}^{i-lab\to out}(y,crop,g)=\frac{1}{{\sum }_{c\in lis{t}_{crop}}Area(y,c,g)}.{\sum }_{c\in lis{t}_{crop}}(Area(y,c,g)\ast f{P}_{upta}^{i-lab\to out}(y,c,g))$$

(37)

P in plant residues ($f{P}_{resid}^{out\to x-tot}$) was computed as:

$$f{P}_{resi}^{out\to x-tot}(y,crop,g)=\frac{1}{{\sum }_{c\in lis{t}_{crop}}Area(y,c,g)}.{\sum }_{c\in lis{t}_{crop}}(Area(y,c,g)\ast f{P}_{resi}^{out\to x-tot}(y,c,g))$$

(38)

with c a crop within the list of crop considered in Monfreda et al.³⁹ (called here list_crop) and Area(y,c,g) the area covered by the crop c (in ha). Among 172 crops listed in list_crop, 14 were not considered as we did not find any corresponding crop in FAOSTAT. It is particularly the case for some crops of the group « forage » in Monfreda et al.³⁹ (alfalfa, beets for fodder, clover, rye grass for forage and silage, etc.) but not for all, as we made some corresponding between non-forage and forage crop in FAOSTAT. E.g. we used the “maize” crop in FAOSTAT to get some information needed to be combined with the crop “maize for forage” of Monfreda et al.³⁹. The corresponding between non-forage and forage crop was disputable but it concerned here only the relative change in yield and area as compared to the year 2000 (Eqs. 43, 46). We also excluded carob as carob area in Spain shows very weird pattern in FAOSTAT (from ~130 000 ha on average over 1961–1989 to 450 ha on average over 1990–2017).

For each crop, P uptake and P residues are computed as follows:

$$\begin{array}{lll}f{P}_{uptake}^{i-lab\to out}(y,c,g) & = & \frac{Yield(y,c,g)}{100}.[{P}_{{\rm{ \% }},root}(c).\frac{dr{y}_{harvest}(c)}{dr{y}_{root}(c)}.\frac{RSR(c)}{HI(c)}\\ & & +{P}_{{\rm{ \% }},abov\backslash harvest}(c).\frac{dr{y}_{harvest}(c)}{dr{y}_{abov\backslash harvest}(c)}.\left(\frac{1}{HI(c)}-1\right)+{P}_{{\rm{ \% }},harvest}(c)]\end{array}$$

(39)

$$\begin{array}{ccc}f{P}_{resid}^{out\to x-tot}(y,c,g) & = & \frac{Yield(y,c,g)}{100}.[{P}_{{\rm{ \% }},root}(c).\frac{dr{y}_{harvest}(c)}{dr{y}_{root}(c)}.\frac{RSR(c)}{HI(c)}\\ & & +fra{c}_{resid}(c).{P}_{{\rm{ \% }},abov{\rm{\setminus }}harvest}(c).\frac{dr{y}_{harvest}(c)}{dr{y}_{abov{\rm{\setminus }}harvest}(c)}.\left(\frac{1}{HI(c)}-1\right)]\end{array}$$

(40)

with P_%,m the P concentration (in gP (100gFM)⁻¹) for the organ m with m being {root, abov\harvest, harvest} where « abov\harvest » defines the aboveground biomass excluding the harvest (also called total residues in Smil⁴⁴), dry_m the dry proportion of organ m expressed in fresh matter (in gDM gFM⁻¹), frac_resid the fraction of the aboveground biomass excluding the harvest that remains on the soil (the rest being exported from the field), Yield the economic yield (expressed in kgFM ha⁻¹), RSR the root:shoot ratio (no unit), HI the harvest index (no unit). Text S2 shows how we get these equations as well as how we get corresponding equations for crops whose the harvested part being root. Following Bentsen et al.⁴⁷, we assumed that at harvest, the different organs have the same dry proportion (i.e. dry_grain = dry_abov\harvest = dry_root). For all crops but root crops, we considered P_{%,abov\harvest} equal to the P concentration of aboveground residues given by Lun et al.⁴³ (and completed for few crop categories by Smil⁴⁴). P_%,harvest is provided by Lun et al.⁴³. RSR is derived from the aboveground fraction (frac_above) given in Monfreda et al.³⁹ (RSR = 1/frac_above −1) and HI is directly given by Monfreda et al.³⁹. frac_resid is equal to 0 for forage crop (group « forage » in Monfreda et al.³⁹) and equal to 1/2 for other crops, following Smil⁴⁴. We did not find any database available providing the P concentration of roots for different crop categories. Instead we assumed, following Ye et al.⁴⁸ (focusing on rice), that for any crop c:

$${P}_{{\rm{ \% }},root}\left(c\right)=0.75\ast {P}_{{\rm{ \% }},abov\backslash harvest}\left(c\right)$$

(41)

The specificities of root crop computation are given in Text S2.

The spatially explicit yield ($Yield\left(y,c,g\right)$ in kgFM ha⁻¹) for the crop category c is computed as follows:

$$Yield\left(y,c,g\right)=Yiel{d}_{Monfreda}\left(2000,c,g\right).\frac{Yiel{d}_{FAO,bis}\left(y,c,country\left(g\right)\right)}{Yiel{d}_{FAO}\left(2000,c,country\left(g\right)\right)}$$

(42)

with Yield_Monfreda the spatially explicit (half-degree resolution) distribution of yield given by Monfreda et al.³⁹ for years 2000 (and expressed as weight of economic yield, i.e. with standardized water content), Yield_FAO the country-scale FAO yield and Yield_FAO,bis defined as follows:

$$Yiel{d}_{FAO,bis}\left(y,c,country\left(g\right)\right)=Yiel{d}_{FAO}\left(y,c,country\left(g\right)\right)\;{\rm{if}}\;{\rm{y}} > ={\rm{1961}}$$

(43)

$$Yiel{d}_{FAO,bis}(y,c,country(g))=Yiel{d}_{FAO}(1961,country(g),c)\ast \frac{pop(y,country(g))}{pop(1961,country(g))}\;{\rm{if}}\;{\rm{y}} < {\rm{1961}}$$

(44)

with pop the country-scale population defined by HYDE 3.2²³. With Eq. 42, we combined the spatially explicit yield distribution of Monfreda et al.³⁹ (but only one value for year 2000) with the temporal varying FAOSTAT yield (but at country-scale). For each crop, the relative intra-country distribution of yield given by Monfreda et al.³⁹ for year 2000 was kept constant in time. With Eq. 44, we scaled the country-scale yield to the country-scaled human population before 1961, as it was assumed in Bouwman et al.¹³. As country spatial boundaries can change in time, we made some computations to allow consistency between each grid-cell and its belonging to a country whose the boundaries can change in time (Text S3).

The harvested area ($Area\left(y,c,g\right)$ in ha) for the crop category c for any grid-cell g and any year y is computed as follows:

$$Area\left(y,c,g\right)=are{a}_{Monfreda}\left(2000,c,g\right).\frac{Are{a}_{FAO,bis}\left(y,c,country\left(g\right)\right)}{Are{a}_{FAO}\left(2000,c,country\left(g\right)\right)}$$

(45)

with Area_Monfreda the spatially explicit distribution of harvested area given by Monfreda et al.³⁹ for years 2000, Area_FAO the country-scale FAO harvested area and Area_FAO,bis defined as follows:

$$Are{a}_{FAO,bis}\left(y,c,country\left(g\right)\right)=Are{a}_{FAO}\left(y,c,country\left(g\right)\right)\;{\rm{if}}\;{\rm{y}} > ={\rm{1961}}$$

(46)

and

$$Are{a}_{FAO,bis}\left(y,c,country\left(g\right)\right)=Are{a}_{FAO}\left(1961,c,country\left(g\right)\right)\;{\rm{if}}\;{\rm{y}} < 1961$$

(47)

Composition of manure and residues for cropland and grassland

The composition (inorganic labile, organic labile, organic stable) of manure and residues were considered constant in space and among cropland/grassland:

$$f{P}_{manu}^{out\to i-lab}(y,lu,g)=fra{c}_{i-lab,manu}\ast f{P}_{manu}^{out\to x-tot}(y,lu,g)$$

(48)

$$f{P}_{manu}^{out\to o-lab}(y,lu,g)=fra{c}_{o-lab,manu}\ast f{P}_{manu}^{out\to x-tot}(y,lu,g)$$

(49)

$$f{P}_{manu}^{out\to o-sta}(y,lu,g)=(1-fra{c}_{i-lab,manu}-fra{c}_{o-lab,manu})\ast f{P}_{manu}^{out\to x-tot}(y,lu,g)$$

(50)

$$f{P}_{resi}^{out\to i-lab}(y,lu,g)=fra{c}_{i-lab,resi}\ast f{P}_{resi}^{out\to x-tot}(y,lu,g)$$

(51)

$$f{P}_{resi}^{out\to o-lab}(y,lu,g)=fra{c}_{o-lab,resi}\ast f{P}_{resi}^{out\to x-tot}(y,lu,g)$$

(52)

$$f{P}_{resi}^{out\to o-sta}(y,lu,g)=(1-fra{c}_{i-lab,resi}-fra{c}_{o-lab,resi})\ast f{P}_{resi}^{out\to x-tot}(y,lu,g)$$

(53)

with lu in {crop,grass}, y the year and g the grid-cell considered, frac_m,l corresponding to the fraction of $f{P}_{l}^{out\to x-tot}$ reaching the pool m. Following Ringeval et al.¹¹, we prescribed: $fra{c}_{i-lab,manu}=0.8$, $fra{c}_{o-lab,manu}=0.1$, $fra{c}_{i-lab,resid}=0.4$ and $fra{c}_{o-lab,resid}=0.4$. We considered an uncertainty associated to the composition of residues. To do so, we use a random value between $fra{c}_{i-lab,resi}-50{\rm{ \% }}$ and $fra{c}_{i-lab,resi}+50{\rm{ \% }}$ as well as as between $fra{c}_{o-lab,resi}-50{\rm{ \% }}$ and $fra{c}_{o-lab,resi}+50{\rm{ \% }}$ (see the section about error estimates).

DEPO

Different classes of atmospheric deposition of P are often considered based on the sources of P in the atmosphere: mineral dust, sea salt, primary biogenic aerosol particles (PBAP), natural combustion and anthropogenic combustion⁴⁹. We considered that P deposition input enriches both the inorganic labile pool ($f{P}_{depo}^{out\to i-lab}$) and the primary inorganic pool ($f{P}_{depo}^{out\to i-prim}\left(y,lu,g\right)$), as follows:

$$\begin{array}{lll}f{P}_{depo}^{out\to i-lab}(y,lu,g) & = & fra{c}_{i-lab,dust}\ast {D}_{dust}(y,g)+fra{c}_{i-lab,other}\ast ({D}_{seasalt}(y,g)\\ & & +{D}_{PBAP}(y,g)+{D}_{natcomb}(y,g)+{D}_{anthcomb}(y,g))\end{array}$$

(54)

$$\begin{array}{lll}f{P}_{depo}^{out\to i-prim}(y,lu,g) & = & (1-fra{c}_{i-lab,dust})\ast {D}_{dust}(y,g)+(1-fra{c}_{i-lab,other})\\ & & \ast ({D}_{seasalt}(y,g)+{D}_{PBAP}(y,g)\\ & & +{D}_{natcomb}(y,g)+{D}_{anthcomb}(y,g))\end{array}$$

(55)

with D_dust, D_seasalt, D_PBAP, D_natcomb and D_anthcomb are atmospheric P deposition fluxes (in kgP ha⁻¹ yr⁻¹) of mineral dust, sea salt, PBAP, natural combustion and anthropogenic combustion, respectively. frac_lab,dust and frac_lab,other are labile fractions of atmospheric deposition of P (no unit) for respectively dust and all other sources (sea salt, PBAP, natural and anthropogenic combustion). frac_lab,dust was chosen to be equal to 0.1 while frac_lab,other was chosen equal to 0.5⁵⁰.

For any year y of the simulation and for any grid-cell g, we computed ${D}_{dust}\left(y,g\right)$, ${D}_{seasalt}\left(y,g\right)$, ${D}_{PBAP}\left(y,g\right)$, ${D}_{natcomb}\left(y,g\right)$ and ${D}_{anthcomb}\left(y,g\right)$ based on variables provided by Wang et al.^49,51. Wang et al.⁴⁹ provided, for any grid-cell g, ${D}_{dust}^{A}\left(\overline{2000-2011},g\right)$, ${D}_{seasalt}^{A}\left(\overline{2000-2011},g\right)$, ${D}_{PBAP}^{A}\left(2000,g\right)$, ${D}_{totcomb}^{A}\left(\overline{1960-2007},g\right)$ and for any World regions reg, ${E}_{anthcomb}^{A}\left(1960\le y\le 2007,reg\right)$ and ${E}_{natcomb}^{A}\left(1960\le y\le 2007,reg\right)$ while Wang et al.⁵¹ provided, for any grid-cell g, ${D}_{all}^{B}\left(1997\le y\le 2013,g\right)$, with D: atmospheric deposition of P, E: P emissions to the atmosphere, the upper letter corresponding to the version of the dataset of deposition (A: Wang et al.⁴⁹ and B: Wang et al.⁵¹), the line over years means that only the data for an averaged time-period is available while a ≤ y ≤ b means that data are available for each year between a and b, totcomb means (natcomb + anthcomb) while all means: (dust + (seasalt + PBAP + natcomb + (anthcomb).

To summarize our computation, the strategy was to consider D_dust, D_seaslat, D_PBAP, D_natcomb as static in time, and temporal average given by Wang et al.⁴⁹ were used for these variables. D_natcomb (respectively D_anthcomb) were derived from D_totcomb and the ratio between natcomb and totcomb (resp. anthcomb and totcomb) in emissions. ${D}_{anthcomb}\left(y,g\right)$ was considered varying in time and its interannual variability was estimated from the interannual variability of emissions from anthropogenic combustion. For years after 2007, Wang et al.⁵¹ were used and corrected to ensure equality with Wang et al.⁴⁹ for the years in common (1997–2007). Equations are given in Text S4.

SLUD

P in sludges from sewage treatment that reaches cropland soils P ($f{P}_{slu}^{out\to x-tot}$ in kgP ha⁻¹ year⁻¹) is estimated by combining human P excretions with the fraction of sewage sludge that is treated and the P removal efficiency of treatments plants following van Puijenbroek et al.⁵² and Demay et al.⁹:

$$\begin{array}{lll}f{P}_{slud}^{out\to x-tot}(y,crop,g) & = & \sum _{type=1,2,3}(fra{c}_{treat}(y,country(g),type)\ast NR(type))\\ & & \ast huma{n}_{excr}(y,reg(country(g)))\ast pop(y,country(g))\\ & & /Area(y,crop,country(g))\end{array}$$

(56)

with human_excr the total P in human excretion (in kgP capita⁻¹ year⁻¹) for big World regions, frac_treat is country-scale fraction of total population (no unit) with access to different types of wastewater treatment installations (types 1, 2 or 3), each one being characterized by a different removal fraction of P (NR, no unit with NR = 0.10, 0.45, 0.90 for respectively wastewater treatment installations type 1, 2, 3), pop the country-scale human population (in inhabitants), Area the country-scale cropland area (in ha).

NR, human_excr, frac_treatment are provided by van Puijenbroek et al.⁵², pop is provided by HYDE 3.2²³ and Area at country scale is computed from the dataset used in LUCC. The variable human_excr is available for the years 1970 and 2010. The variable frac_treat is available for years 1990, 2000, 2010 in van Puijenbroek et al.⁵² and we computed it for the year 1970 (see Text S5 for further technical details). Then, linear regressions are used to derive human_excr and frac_treat for any year between 1970 and 2010. Values for year 1970 (resp. 2010) were used for any years before (resp. after) 1970 (resp. 2010). The corresponding between big World regions and countries was provided by the IMAGE framework region classification (https://models.pbl.nl/image/index.php/Region_classification_map).

The Eq. 56 relies on the following assumptions:

- P in sludges is solely sourced from human food consumption (thereby excluding P release from detergents)

- all sludges produced are used in agriculture. This is a strong assumption as a few countries have specific rules banning the use of sludge in agriculture (e.g. Switzerland incinerated sludges⁵³,) but this assumption was required as we did not find any database that compiled this information per country at the global scale

- all sludges used in agriculture are spread on croplands (not on grasslands).

For the composition of P in sludges, we used the same labile vs. stable contribution as for manure without considering any uncertainty related to the composition (Supplementary Table S1).

LOSS

The annual lateral flux of P lost from soil erosion by water (in kgP ha⁻¹ yr⁻¹) was computed by combining the fraction of soil mass of the top 0–0.3 m layer lost each year through erosion (frac_loss, in yr⁻¹) with each soil P pools (kgP ha⁻¹):

$$f{P}_{loss}^{m\to out}\left(y,lu,g\right)={P}_{m}\left(y,lu,g\right)\ast fra{c}_{loss}\left(y,lu,g\right)$$

(57)

with m in {i-lab, o-lab, o-sta, i-prim, i-sec, i-occ}, lu in {crop, grass}, and P_m the soil P content of pool m simulated by our approach.

frac_loss was computed by combining the gross soil losses by water erosion with a cropland/grassland distinction (loss, in (kg of soil) ha⁻¹ yr⁻¹), the soil bulk density of the fine earth fraction (bdod) and the volumetric fraction of coarse fragment (cfvo) to estimate the soil mass of the top 0–0.3 m layer:

$$fra{c}_{loss}(y,lu,g)=\frac{loss(2000,lu,g)}{bdod(g)\ast (1-cfvo(g))\ast \Delta z\ast {10}^{+4}}$$

(58)

with bdod the bulk density for the top 0–0.3 m soil layer (in (kg of fine soil) m⁻³), cfvo is the volumetric fraction of coarse fragment (without unit), Δz = 0.3 m and 10⁺⁴ a converting factor in m² ha⁻¹. The variable loss was computed following the approach described in Borrelli et al.⁵⁴ but here applied to the land-use dataset described for year 2000 in LUCC. The geo-statistical approach proposed by^{Borrelli et al.54} allowed for an accurate spatial definition of the land uses (native resolution at 250 m cell size at the equator aggregated to 0.5° latitude × 0.5° longitude) and the consideration of the effects of the different regional cropping systems.

As in Ringeval et al.¹¹, Eq. 58 is based on the following assumptions. First, we assumed that the soil losses of Borrelli et al.⁵⁴ corresponded to a loss of topsoil (i.e. 0–0.3 m soil layer). Second, the estimates of losses for year 2000 (in (kg of soil) ha⁻¹ yr⁻¹) were considered representative of erosion rates for the whole century. Last, the estimates from Borrelli et al.⁵⁴ correspond to a gross erosion rate and not to the net flux, which results from both mobilization and processes, such as deposition, storage and burial. However, we considered that these later processes were not relevant to the system represented here, either because they largely occur in non-agricultural fractions of the grid-cell (e.g. deposition on river banks) or because they concerned soil horizons below 0.3 m (burial).

LUCC

The following land-use categories are considered in our approach: cropland (crop), grassland (grass), non-agricultural vegetation (nonagri), and urban (urban). For each year y and grid-cell g, we needed the fractions of cropland ($frac\left(y,crop,g\right)$) and of grassland ($frac\left(y,grass,g\right)$) as well as the transitions between the above mentioned 4 land-use categories (i.e. ${\Delta }_{i}\left(y,j,g\right)$) for the conversion from j to i with both i and j in {crop, grass, nonagri, urban}). These variables were estimated based on Chini et al.²⁰ after we made the corresponding between our land-use categories (left hand side) and the ones considered in Chini et al.²⁰ (right hand side):

$$cropland=c3ann+c3nfx+c3per+c4ann+c4per$$

(59)

$$grassland=pastr+range$$

(60)

$$nonagri=primf+primn+secdf+secdn$$

(61)

$$urban=urban$$

(62)

with the following meaning for categories in Chini et al.²⁰: c3ann: C3 annual crops, c3nfx: C3 nitrogen-fixing crops, c3per: C3 perennial crops, c4ann: C4 annual crops, c4per: C4 perennial crops, past: managed pasture, range: rangeland, primf: forested primary land, primn: non-forested primary land, secdf: potentially forested secondary land, secdn: potentially non-forested secondary land, urban: urban land.

CLIM

We used the simulations performed by land-surface models for the CMIP6 exercise⁵⁵ to get information about the soil liquid water content (W_rel, in fraction of the field capacity and W_abs, in L(kg of soil)⁻¹) and the soil temperature (T_soil, in °C) for the top soil layer considered in our approach (0–0.3 m) over the historical period. All CMIP-6 simulations providing the following variables at the basis of our computation of W_rel, W_abs and T_soil have been used: mrsll (liquid water content of soil layers, in kg.m-2), mrsofc or fldcapacity (field capacity, in kg m⁻²), tsl (soil temperature, in °C). The choice of mrsofc or fldcapacity to approach the field capacity of a given land-surface model depends if the field capacity varies with depth in this land-surface model. Finally, the average among 9 simulations (combination between 4 land-surface models: CNRM-ESM2-1, CNRM-CM6-1, IPSL-CM6A-LR, MIROC6; and 3 climate data used as input of the land-surface models for the historical period: land-hist, land-hist-cruNcep, land-hist-princeton; see Table 6 for the reference of each simulation) has been computed and used. The period 1850–2012 was common to the 9 simulations. For years after 2012, to prevent any bias due to the lack of some simulations, we used the 1993–2012 climatology each year.

The annual average of each variable (W_abs, W_rel and T_soil) was used in our approach. The influences of T_soil and W_rel on weathering and mineralization were modelled using different functions (h₁ and g₁ for sensitivity to T_soil, h₂ and g₂ for sensitivity to W_rel) in both (model = GPASOIL-v0) and (model = GPASOIL-v1). W_abs was used in (model = GPASOIL-v1) in some parametrizations (Eqs. 12, 13) and to translate P_i-sol into soil solution P concentration (Eqs. 3, 7). In addition, the near-surface atmospheric temperature needed in one parametrization of (model = GPASOIL-v1) (T_a in Table 10) was taken from climate data sets used as input of the land-surface models quoted above.

Error estimate from uncertainty associated with the datasets to describe the different drivers

While we recognized the existence of different sources of uncertainty (dataset, model structure, parametrization⁵⁶), we mainly focus here on how the uncertainty related to the representation of the different drivers propagated to the simulation of current cropland/grassland soil P pools. Nevertheless, the uncertainty associated to SPRO has an effect on the model parameterizations. The uncertainty associated to mineralization rates (model parameter) was investigated separately.

To investigate the propagation of the uncertainty associated with the driver datasets, we performed 100 simulations by using a random value for each variable related to a given driver (see the list of variables in the 3^nd line of Tables 1–8) with all grid-cells considered as independent. For each driver, how the random value was defined was described in Supplementary Table S1 and summarized here. The uncertainty associated with the driver LUCC was not considered given the difficulty to properly quantify it by keeping consistency between the different variables involved in LUCC (fraction and transitions between the different land-use). The coefficient of variation computed over the 100 simulations was used hereafter as uncertainty estimate.

For a given driver, when two estimates are considered (called hereafter E1 and E2), we choose to use a normal distribution with 4σ between E1 and E2, which means that ~96% of the random values chosen will fall between the two estimates. This means that we considered E1 and E2 as lower/upper boundaries and we considered the range they defines as encompassing most values possible. Instead of a normal distribution, a uniform distribution was considered for variable that we considered very uncertain (P in manure for cropland and grassland, residues composition) as we considered each estimate (E1 or E2) as reasonable as the other estimate. For a few variables related to some drivers for which only one estimate was available (called hereafter E) and no uncertainty was provided (i.e. P in chemical fertilizer for grassland, P uptake for cropland, residues composition, P in sludge), we derived E1 and E2 as (1 − x)*E and (1 + x)*E with x either equal to 30% or 50% depending on how we consider the driver uncertain, i.e. a standard-deviation equal to 15% or 25% of the mean, respectively.

Random values can lead to unmeaning values for few grid-cells. To prevent this, we set the minimum value allowed for soil input/output to zero. For variables related to other drivers (BIOG, CLIM, SPRO), we prescribed the minimum/maximum values by using the global spatial minimum/maximum of the mean value (Supplementary Table S1). For temporal varying drivers, we consider the uncertainty related to the different time-steps as independent.

In the following, we focused on the simulation performed with the mean value for each driver instead of the mean among the 100 simulations performed to assess the uncertainty. Both are close except for grid-cells with very small soil P pools. In these grid-cells, if the combination of random soil input/output lead to negative values of P_i-lab, P_i-lab is set to 0 and thus, the mean of the 100 simulations was artificially increased, leading to values larger than the simulation performed with mean drivers. As consequence, the coefficient of variation (instead of the standard deviation) among the 100 simulations was used to assess the uncertainty in soil P pools.

One additional set of 100 simulations was performed to estimate the role played by the uncertainty in BIOG on the uncertainty of P_i-lab. In this set, we used the mean value for all variables related to BIOG and considered uncertainty for all other drivers. Difference between the CV computed with this set of 100 simulations and the one computed in the set where all drivers are uncertain allows to distinguish the role played by BIOG.

Data Records

One tgz file called GPASOIL-v1_output.tgz was made available on recherche.data.gouv.fr⁵⁷ and contains the soil P pools simulated with (data = v1) and (model = v1.1), as described earlier, in a netcdf format. These soil P pools correspond to GPASOIL-v1.1 (cf. Table 11). To assess the uncertainty related to mineralization rates, we also provided the GPASOIL-v1.0 soil P pools, i.e. the pools simulated with (data = v1) and (model = v1.0). Concerning the coordinate systems, all variables are referenced to the World Geodetic System (WGS84). The cell size is 0.5 decimal degrees. Fluxes of P corresponding to soil P input/output and to soil P dynamics are also given in the same file. Output are provided for the period 1900–2018 at annual time-step for both cropland and grassland. Fluxes were expressed in kgP ha⁻¹ yr⁻¹ and pools were expressed in kgP ha⁻¹, and are both representative to the top soil layer 0–0.3 m. Two files are provided: one corresponding to the simulation with mean estimate for each driver and another one corresponding to the coefficient of variation to assess the effect of the uncertainty in driver estimate on the simulated soil P pools. README_output_upload.txt describes the netcdf files. An additional netcdf file with the land area was also provided to allow users to compute global averages of soil P pools.

Technical Validation

Our technical validation evaluated first each component of our approach, namely i) the driver estimates based on published datasets and ii) the model of soil P dynamic. Then it focused on how the update of each component (dataset, model) modified the soil P maps simulated from GPASOIL-v0 to GPASOIL-v1. This is part of the technical validation as it allows us to trace the impact of the update of each component on the reconstructed soil P. In this section, different combinations (data x model) were used (Table 11). Then, we discussed about how our best estimates (GPASOIL-v1.1) compared to previous published modelling products Zhang et al.¹² (McDowell et al.⁵⁸ and GPASOIL-v0). Finally, we compared the spatial variability simulated by GPASOIL-v1.1 to the one of published regional datasets on measured soil P.

Evaluation of the computation of some selected drivers

The estimate of the drivers considered in our approach (BIOG, FARM, etc. see Tables 1–8) was based on published datasets. While the estimate was straightforward from published datasets for some drivers (e.g. BIOG), some additional computations were required for others, in particular for P uptake (with strong differences in the computation between cropland and grassland), P in manure and P deposition. In the current section, we checked that our own computation performed well, i.e. that we succeed in applying the equations given in the section named “Drivers of agricultural soil P distribution: datasets and computation”. If possible, we compared some intermediary variables (e.g. yield per crop category, which is an intermediary variable of the computation of P uptake for cropland) or final P-related variables (e.g. P in manure) to other independent datasets.

In Fig. 2, we compared our estimates of P in biomass grazed/mowed from grassland to the estimate computed by Demay et al.⁹. P in biomass grazed/mowed from grassland was defined as the difference $f{P}_{upta}^{i-lab\to out}(grass)-$ $f{P}_{resi}^{out\to x-tot}(grass)$, or following Eq. 32 as $GI\ast f{P}_{upta}^{i-lab\to out}\left(grass\right)$. The two estimates used to assess the uncertainty ($GI\ast f{P}_{upta,1}^{i-lab\to out}\left(grass\right)$, $GI\ast f{P}_{upta,2}^{i-lab\to out}\left(grass\right)$) and their mean ($GI\ast f{P}_{upta,mean}^{i-lab\to out}\left(grass\right)$) were plotted in Fig. 2. P in biomass grazed/mowed from grassland in Demay et al.⁹ (called P harvest from grassland in this latter reference) was computed at country scale based on forage P demand of livestock (Eq. S4 of Demay et al.⁹). We found that the independent estimate of Demay et al.⁹ was very close to the mean of our estimates (black curve in Fig. 2).

P uptake for cropland was based on the spatially explicit temporal yield per cropland that we estimated using Eq. 42. Figure S1 shows how the Eq. 42 provided temporal estimates of yield for maize at grid-cell scale (Yield in kgFM ha⁻¹, line 2 of Fig. S1) by combining Yield_Monfreda, the spatially explicit yield given by Monfreda et al.³⁹ for circa the year 2000 (line 4) with Yield_FAO, the country-scale annual FAO yield (line 3). Within each country, the relative spatial variability was kept constant for any year and equal to one of Yield_Monfreda. Figure S1 focuses on the temporal averages over three time-periods (1982–1992, 1993–2003, 2004–2014) as it allows to compare our estimates with estimates of Iizumi and Sakai⁵⁹ (line 1) available over the 1982–2014 time-period. Overall, a relatively good agreement is found despite some discrepancies between our estimate and⁵⁹, explained by some mismatch in country-scale variability (e.g. USA) or global distribution of the crop area (e.g. India) between Monfreda et al.³⁹ and Iizumi and Sakai⁵⁹. Note that Iizumi and Sakai⁵⁹ assumed no temporal change in harvested area and focused only on few crops, that is why we only used this dataset for comparison in the current study.

Figure S2 focuses on the temporal variation at country scale of Yield and Area, the spatially explicit and crop-specific yield and harvested area, respectively. Yield and Area are computed thanks to Eqs. 42, 45. As Fig. S1, Figure S2 focuses on maize (c = maize in Eqs. 42, 45). We compared Yield to the dataset of Monfreda et al.⁵⁹ and Area to the dataset used to characterize LUCC²⁰ for the broad crop category encompassing maize in LUCC (namely “c4ann” in Chini et al.²⁰). LUCC is used in our study but not within the computation of the cropland P uptake and thus is independent to its computation. Figure S2 shows that our procedure allows both Area and Yield to match Monfreda et al.³⁹ at country-scale for the year 2000 and to mimic the year-to-year variability of FAOSTAT (left columns of Fig. S2). When we put together all crops belonging to the category « c4ann » of LUCC, we found that the simulated year-to-year variability of area differ to the one given by LUCC (except for Brazil) but that the order of magnitude of area per country relatively matches the one of LUCC. Crop composition of total P uptake is given in Fig. S3.

In our approach, P in manure applied to cropland and grassland soils was estimated following two different methods. While the 1^st estimate was based on half-degree resolution dataset for N in manure and N:P ratios, the 2^nd estimate was based on country-scale estimates of Demay et al.⁹ with the same inner-country spatial distribution and temporal variability as the 1^st estimate. Figure 3 shows the estimate 1 and estimate 2 for few countries and at the global scale. Demay et al.⁹ did not distinguished cropland from grassland and we plotted in Fig. 3 the sum of P in manure for the two land-covers. By construction, the country-averages of estimate 2 and Demay et al.⁹ over 1950–2017 are equal, and the temporal variability of estimate 2 is similar to the one of estimate 1. Figure 3 shows that estimates 1 and 2 greatly differed, with, e.g. 1.4 kgP/ha for estimate 1 vs 3.9 for estimate 2, over 1961–2017 at global scale (Fig. 3g). On Fig. 3, we also compared our estimates to the following independent country-scale computation (fP_{manu,indirect}, in kgP (ha of cropland + grassland)⁻¹) based on a livestock budget:

$$\begin{array}{lll}f{P}_{manu,indirect}(y,crop+grass,country) & = & 1/Area(y,crop+grass,country)\\ & & \ast [GI(y,grass,country)\\ & & \ast f{P}_{upta,mean}^{i-lab\to out}(y,grass,country)\\ & & \ast Area(y,grass,country)\\ & & +grai{n}_{feed}(y,country)\\ & & +mine{r}_{feed}(y,country)\\ & & -produc{t}_{animal}(y,country)]\end{array}$$

(63)

with $GI(y,grass,country)\ast f{P}_{upta,mean}^{i-lab\to out}(y,grass,country)\ast Area(y,grass,country)$ the P in biomass withdraw from grassland (either mowed or grazed) to feed the livestock (in kgP), gain_feed and miner_feed the grain and mineral used to feed livestock, respectively, and product_animal the P in animal products (P in milk, egg and slaughtered animals). The variables gain_feed, miner_feed and product_animal are expressed in kgP. The country-scale data required to compute grain_feed, minera_feed and product_animal were provided by FAOSTAT and Demay et al.⁹. Parameters needed to compute product_animal (average carcass weight, carcass yield over live weight, P content of animal live weight, P concentration of egg and milk) were found in Senthilkumar et al.⁶⁰. We found that $f{P}_{manu,indirect}$ was close to our mean estimate in both temporal average value (2.6 kgP ha⁻¹ for our mean estimate over 1961–2018 at global scale vs 2.2 for $f{P}_{manu,indirect}$) and temporal correlation (e.g. pearson correlation of 1.0, p < 0.05). Interestingly, the year-to-year variability of our estimates and the one of Demay et al.⁹ show some consistent patterns for China, India and Russia but less for other countries.

Figure S4a shows that our procedure to compute P deposition allowed to get a consistent global P deposition from anthropogenic combustion over the whole time-period.

Application of the Wang et al. soil P inorganic dynamics parametrizations at the global scale

(model = GPASOIL-v1) was based on Wang et al.¹⁸ parametrization for soil inorganic P dynamics. Parameters used in our approach follow the relationships with pedo-climatic variables found in Wang et al.¹⁸ with some modifications (all k parameters constant in time, oxalate excluded from parametrization leading to both simplified equation for k parameters and ${P}_{c,\infty }$ constant in space). Soil P fractions used in the relationships to compute the different parameters were given by the initial conditions chosen for the soil P pools (i.e. ${P}_{c,\infty }$ = 0.1 mgP L⁻¹ for P_i-sol and P of unmanaged soil P given by He et al.¹⁴ for other pools). Figure 4 shows that the spatial distribution of parameters involved in (model = GPASOIL-v1) fall well within the range provided by Wang et al.¹⁸ (blue boxplot vs. the range in red in Fig. 4). Nevertheless, we found that the numerical values of some parameters would sometimes require shorter time-step than daily one used here (blue boxplot vs. magenta box in Fig. 4). Indeed, magenta box in Fig. 4 shows the upper value behind which the net flux is larger than the pool it leaves at daily time-step. The issue was particularly true for ${k}^{i-sol\to i-sec}$ whose the global median in (model = GPASOIL-v1) (12.6 mgP (kg soil)⁻¹ day⁻¹ (mg P/L)^−b) is well above the value consistent with a daily time-step ($\overline{{W}_{{\rm{abs}}}}{\left({P}_{C,\infty }\right)}^{1-b}$ has a median value of 0.05 mgP (kg soil)⁻¹ day⁻¹ (mg P/L)^−b) (Fig. 4). As an equilibrium was assumed between P_i-sol and P_i-sec, this issue does not exist for neither ${k}^{i-sol\to i-lab}$ nor ${k}^{i-lab\to i-sol}$. To solve this issue, a shorter time-step should be chosen, but the time-step needed (e.g. ~5 minutes based on median values given above) is not consistent with global application. Our strategy was to not put any constraints on the parameters themselves nor on each gross flux but we prevent a net flux to be larger than the pool from which this net flux leaves.

Figure S5 shows the sensitivity to the time-steps chosen (1-day or 6-hours time-steps) on a given grid-cell. In such sensitivity tests, the flux parameters were modified by multiplying them by the ratios of the different time-steps. The effect is particularly visible on time-steps with positive soil P budget (Figs. S5, S6) for which the inconsistency between a long time-step (1 day) and ${k}^{i-sol\to i-sec}$ reduces the net flux from P_i-sol to P_i-sec, letting the soil P input within P_i-lab. The effect of the time-step chosen could be significant at grid-cell scale (e.g. + 17% larger P_i-lab for 1-day time-step than for 6-hours time-step in 2010, panels (b) and (c) of Fig. S5) or regionally (−15% lower P_i-lab in China for 1-day time-step than for 2-days time-step, Fig. S6a) but remains small at the global scale (few percent of difference in the global average of both P_i-sec and P_i-lab between 1-day time-step vs 2-days time-step, Fig. S6).

Effect of the update of each component (data and model) from GPASOIL-v0 on P_i-lab simulated in 2018

Effect of the update of the dataset to describe the drivers (from (data = GPASOIL-v0; model = GPASOIL-v0) to (data = GPASOIL-v1; model = GPASOIL-v0))

Here we focus on the effect of using new datasets to describe the different drivers with using the same model as the one of Ringeval et al.¹¹ (i.e. model = GPASOIL-v0). Figure S7 shows first that (data = GPASOIL-v1) modified the simulations of P_i-lab and P_x-tot for both cropland and grassland (comparison between 1^st and last line of Fig. S7). In particular, P_i-lab of grassland is strongly reduced with (data = GPASOIL-v1) (20 kgP ha⁻¹ at the global scale in 2005) as compared to (data = GPASOIL-v0) (208 kgP ha⁻¹). Sensitivity tests (Fig. S7) show that BIOG explains the most the difference between (data = GPASOIL-v1) and (data = GPASOIL-v0) for P_x-tot. Conversely, FARM is the main driver of the difference between (data = GPASOIL-v1) and (data = GPASOIL-v0) for P_i-lab of grassland, and a combination of BIOG and FARM explains the difference for P_i-lab of cropland. The key roles played by BIOG and FARM in the spatial distribution of simulated soil P pools were already demonstrated in Ringeval et al.¹¹.

The biggest difference in FARM between (data = GPASOIL-v0) and (data = GPASOIL-v1) for grassland concerns the P uptake which is much larger in (data = GPASOIL-v1) than in (data = GPASOIL-v0) (Fig. S8). In (data = GPASOIL-v0), the plant uptake of grassland was constrained to be equal to ~90% of the total P input¹³.

The spatial extent of simulated P_i-lab is larger in (data = GPASOIL-v1) (~36000 grid-cells and 1481 millions of ha for cropland; ~41000 grid-cells and 3237 millions of ha for grassland) than in (data = GPASOIL-v0) (~15000 grid-cells and 1048 millions of ha for cropland; ~20000 grid-cells and 2215 millions of ha for grassland). This is explained by a Boolean treatment of cropland in Bouwman et al.¹³ (with grid-cells with small cropland fraction not considered) and resulting to exclusion of around 1/3 of the real cropland area in GPASOIL-v0.

Effect of the update of soil P dynamic model (from (data = GPASOIL-v1; model = GPASOIL-v0) to (data = GPASOIL-v1; model = GPASOIL-v1))

Here, we prescribed (data = GPASOIL-v0) for the whole section and assessed the effect of using (model = GPASOIL-v1) instead of (model = GPASOIL-v0) (Figs. 5, 6 for cropland and grassland, respectively). Overall, for cropland, compared to (model = GPASOIL-v0), (model = GPASOIL-v1) tends to increase vs. decrease P_i-lab in grid-cells where values were very low vs. very high, respectively, making the spatial soil P distribution more homogeneous (Fig. S9). For grassland, (model = GPASOIL-v1) tends to increase P_i-lab almost everywhere as compared to (model = GPASOIL-v0) (38 vs 23 kgP ha⁻¹, respectively) (Fig. 6). Note that, while P_i-lab is not defined equally in (model = GPASOIL-v0) and (model = GPASOIL-v1) (as P_i-sol is only distinguished from P_i-lab in (model = GPASOIL-v1)), the comparison between the two models can nevertheless focus on P_i-lab as the maximum value for the ratio P_i-sol/P_i-lab reaches only 0.2% in (model = GPASOIL-v1). We found that (model = GPASOIL-v1) allows soil P pools simulated to be more consistent with the P plant uptake prescribed (last column of Figs. 5, 6), especially for grassland (Fig. 6).

Soil exchanges with P_x-occ play a key-role in the difference between (model = GPASOIL-v0) and (model = GPASOIL-v1). This in particularly true for grassland for which, the model compensates negative soil P budget by P taken from P_x-occ (3^rd column of Fig. 6) while de-occlusion is not allowed in (model = GPASOIL-v0). The amount of P transferred from P_x-occ to other pools (P_i-sec then P_i-lab) in (model = GPASOIL-v1) depends on the mineralization rates of organic pools (P_o-lab and P_o-sta) used in the model. Sensitivity tests showed that the P_x-occ for global grassland decreased much less in time if we increased the mineralization rates (i.e. decrease the organic residence times) (Fig. 7). Our soil P dynamic model for organic P is too simplistic (no biochemical mineralization, no link with carbon, no microbial pools²⁷) and building a more realistic model based on measurements would require a specific study. As it is likely that the temporal variations of global grassland soil P were small during the 1^st half of the 20^th century, we chose mineralization rates that allow to approach this global pattern in our simulations, i.e. ${k}^{o-sta\to i-lab}$ = 1.8e⁻⁴day⁻¹ and ${k}^{o-lab\to i-lab}$ = 1.4e⁻³day⁻¹ (i.e. a residence time of 15 yr and 2 yr, respectively) (Fig. 7). Such configuration is called GPASOIL-v.1.1. Even with increased mineralization, a decrease of P_x-occ over time remained, partly attributed to losses through erosion and to transfers to sustain the prescribed P uptake (Fig. 7c). The effect on the global average of the change in mineralization rate is large (from 149 kgP ha⁻¹ for cropland and 38 for grassland with GPASOIL-v1.0 to 187 for cropland and 91 for grassland with GPASOIL-v1.1).

We also found that the global averaged soil P pools were sensitive to the value of ${P}_{c,\infty }$ chosen. The choice of ${P}_{c,\infty }$ has a small effect on the global average for common values of ${P}_{c,\infty }$ (global averages of cropland P_i-lab of 149 kgP ha⁻¹ for default value of ${P}_{c,\infty }$ (0.1 mgP L⁻¹) vs 174 and 148 for kgP ha⁻¹ for ${P}_{c,\infty }$ equal to the 1^st and 3^rd quartile of ${P}_{c,\infty }$ found in Helfenstein et al.²⁵, respectively; 2^nd and 3^rd line of Fig. S10) but a large effect for lowest values of ${P}_{c,\infty }$ such as values that could be found in Andosols or Ferralsols (global averaged P_i-lab of 239 kgP ha⁻¹ for ${P}_{c,\infty }$ = 0.005 mgP L⁻¹, 1^st line of Fig. S10). This suggest that more work is needed to constrain further spatial variation of ${P}_{c,\infty }$ (see the “Usages Notes” section).

Best simulations, uncertainty associated and comparison to previous estimates

In the following, we focused on the GPASOIL-v1.1 simulations as we considered them as our best estimates. Figure 8 shows the spatial distribution of the soil P budget (soil P input - output) for P_i-lab. Figure 9 shows the input/output for P_i-lab (including soil P input/output and fluxes resulting from soil P dynamics) and soil P pools simulated for GPASOIL-v1.1 for both cropland and grassland for few countries and at the global scale. It underlines the transfer from P_i-lab to P_i-sec (through P_i-sol) for countries as France (in 1970s) or China (in the current time period) in cropland, when soil P input are larger than soil P output. Flux from P_i-sec to P_i-lab was simulated for cropland in Russia or for many countries in grassland. Such flux also occurred for France in the current time-period in cropland but with a smaller magnitude (average of 0.9 kgP ha⁻¹ yr⁻¹ over 2010–2018 in cropland for France), which corresponds to transfer from P inherited from past fertilizer applications to current plant uptake.

Figure 10 shows the comparison, for the top 0–0.3 m soil layer, between GPASOIL-v1.1 and other published modelling products available: GPASOIL-v0, Zhang et al.¹², and McDowell et al.⁵⁸. In parallel to the studies based on process-based model of soil P dynamics mentioned in the introduction^11,12, we also included a comparison against a global dataset of soil plant available P based on a statistical model trained on a database of measured soil P concentration⁵⁸. McDowell et al.⁵⁸ did not focus on agricultural soils but their dataset encompassed these soils, and we filtered them in the scope of our comparison. GPASOIL-v1.1, GPASOIL-v0 and Zhang et al.¹² provide soil P pools for top 0–0.3 m in kgP ha⁻¹ for cropland (and grassland) while McDowell et al.⁵⁸ provide soil P concentration for top 0–0.2 m in mgP (kg of soil)⁻¹ for any land-use categories at 1 km² resolution. We used the land-use information used in McDowell et al.⁵⁸ to filter their estimate and computed soil P for cropland and grassland only, before regriding them to half-degree regular resolution. McDowell et al.⁵⁸ provide soil P pools for cropland, rangeland, improved grassland, forest and non-productive areas. According to our definition of grassland, we gathered ‘improved grassland’ and ‘rangeland’ classes of McDowell et al.⁵⁸ to compare to our grassland soil P pools estimates. Then, we convert soil P concentration in kgP ha⁻¹ for top 0–0.3 m by using the bulk density used in our study and by assuming that the P concentration provided by McDowell et al.⁵⁸ is also representative to 0–0.3 m.

Global average of P_i-lab decreased for both cropland and grassland from GPASOIL-v0 to GPASOIL-v1.1. The relative decrease was even larger for grassland (from 208 to 91 kgP ha⁻¹) than for cropland (from 253 to 187 kgP ha⁻¹). P_i-lab reached 238 kgP ha⁻¹ in Zhang et al.¹² for cropland, and 62 and 32 kgP ha⁻¹ in McDowell et al.⁵⁸ for cropland and grassland, respectively. P_i-lab provided by McDowell et al.⁵⁸ is difficult to compare to other estimates used here as it is based on Olsen P extraction vs. pool design based on Hedley fractionation used in other studies. For cropland, we simulated larger P in South America in GPASOIL-v1.1 than in GPASOIL-v0, a lower P content in Africa in GPASOIL-v1.1 than in GPASOIL-v0 while some patterns remains stable in the two versions (high soil P in India and China, low P content in West Europe and Russia).

Differences in the global averaged soil P between GPASOIL-v1.1 and McDowell et al.⁵⁸ are likely to be caused by the shallower depth used in McDowell et al.⁵⁸ and the different P fractions measured (Olsen P) or estimated (P_i-lab). The effect of difference in soil depth considered is not straightforward, depending on the farming practices (tillage, fertilization, etc.⁶¹). However, we do know the likely effect of the Olsen-P vs P_i-lab. Labile inorganic P from Hedley is close to the Colwell P test used in Australia; both use a 16 hours soil extraction with 0.5 M ${{\rm{HCO}}}_{{3}^{-}}$. There are equations to estimate Colwell P from Olsen-P while these equations differ according to soil P sorption capacity (e.g., Colwell-P is 2.869 × Olsen-P - 2.93 for non-calcareous soils and 1.376 × Olsen-P + 8.80 for calcareous soils)⁶². After converting the values derived from McDowell et al.⁵⁸ for (more common) non-calcareous soils, we get 165 and 81 kgP ha⁻¹, for cropland and grassland respectively; meaning that the global values calculated in McDowell et al.⁵⁸ are within the same range as P_i-lab simulated here.

As the biogeochemical background was found to be a major driver of the spatial distribution of P_x-tot¹¹, and because the same biogeochemical background was considered for both cropland and grassland, P_x-tot was similar between cropland and grassland for GPASOIL-v1.1. The same reasoning could be applied to GPASOIL-v0. P_x-tot for cropland shows also some similarities between Zhang et al.¹² and GPASOIL-v0 as both relied on the same dataset to approach the initial soil P content⁴. Differences in P_x-tot between GPASOIL-v0 and GPASOIL-v1.1 were linked to differences in P_x-tot of unmanaged soil used in each study (Yang et al.⁴ for v0 vs. He et al.¹⁴ for v1).

The coefficients of variation for GPASOIL-v1.1 for P_i-lab and P_x-tot were plotted in Fig. 11 for both cropland and grassland. These coefficients were due to uncertainty in drivers estimates only. Uncertainty was larger for P_i-lab than for P_x-tot as the uncertainty in BIOG explained exclusively the uncertainty in P_x-tot while all drivers contributed to the uncertainty in P_i-lab. Uncertainty of P_x-tot was similar for both cropland and grassland (global averaged coefficient of variation of 0.17) while much larger for grassland (0.54) than for cropland (0.22) for P_i-lab. This was explained by the large uncertainty in plant uptake due to uncertainty in both grassland NPP and P concentration of grass. Using mean values for variables related to FARM (instead of random ones) makes the global averaged coefficients of variation of P_i-lab to decrease up to 0.14 for cropland and 0.24 for grassland, underlying the large contribution of FARM uncertainty in uncertainty of simulated P_i-lab.

Evaluation of the simulated soil P pools at different scales (country, continental/watershed, global scale)

The evaluation of predicted soil P is challenging due to lack of data compiling Hedley P measurements from cropland/grassland soils at the global scale. Different chemical extraction protocols are used worldwide with difficulties to find correspondence among different extractions. In addition, some chemical extraction in cropland/grassland soils at long-term field sites can be found in the literature but these sites usually encompass different P treatments (varying within their amount/rate/nature of fertilizer) with difficulties to link with farming practices occurring in the region/grid-cell including the sites. Here we relied on 3 regional datasets based on soil monitoring networks that carried out measurements on soil samples: RMQS over France⁶³, LUCAS over Europe¹⁹, and STS over USA/Canada⁶⁴. They differ in the chemical extraction used, the spatial extent and resolution, and the distinction (or not) between cropland and grassland. Characteristics of each dataset is given in Table 12.

Table 12 Information about the regional dataset used to evaluate the soil P distribution simulated in our study (GPASOIL-v1.1).

Full size table

For both RMQS and LUCAS, we averaged all sites within each 0.5 grid-cell separately for cropland and grassland, by considering land-uses types of each dataset (Table 12). For both RMQS and LUCAS, a detection limit for P-Olsen measurement was given in the dataset description (5 mgP (kg of soil)⁻¹ for RMQS, 10 mgP (kg of soil)⁻¹ for LUCAS) but we did not exclude the sites below this threshold to prevent bias in our grid-cell averages. All dataset (RMQS, LUCAS, STS) provided soil P in terms of concentration and we converted them in kgP ha⁻¹ by using the half-degree grid-cell bulk density and volumetric fraction of coarse fragments used previously in our study and by considering the P concentration given in each dataset representative to the top 0–0.3 m. The soil P content at half-degree resolution in kgP ha⁻¹ from the 3 datasets were provided in Figs. S11–S13.

As the datasets and our simulation did not rely on the same chemical extraction (pools representative to Hedley extraction in our simulation vs Olsen P for RMQS and LUCAS vs Bray and Kurtz P1 Equivalent for STS, Table 12), we focused our evaluation on the comparison of the relative spatial distribution. To do so, we first mapped our simulation and each dataset by using deciles (Figs. 12–14). As the number of grid-cells considered vary between our GPASOIL-v0 and GPASOIL-v1.1, we masked the grid-cells in the dataset that are excluded from our simulations (Figs. 12–14). Spearman correlation was computed on the absolute value (i.e. not in decile but in kgP ha⁻¹) and provided in Table 13. In the scope to compare our simulation to STS, we computed first state/province scale values from our simulation before computing relative distribution with decile. We excluded from the comparison the states/provinces for which our simulation does not provide 75% of the land in farm for the considered state/province. The treatment of our simulation to compare it to STS was plotted in Fig. S14. The year used in our simulation for the comparison to the different datasets was the exact year of the dataset for GPASOIL-v1.1 (2002–2009 average for RMQS, 2015 for LUCAS, 2015 for STS) and 2005 for GPASOIL-v0 (as it is the only one year available in this latter). The comparison distinguished cropland and grassland when possible (RMQS and LUCAS).

Table 13 Spearman correlations computed between the different dataset (RMQS, LUCAS, STS) and P_i-lab simulated by McDowell et al.⁵⁸, Zhang et al.¹², GPASOIL-v0 or GPASOIL-v1.1.

Full size table

Overall, we found that GPASOIL-v1.1 had higher correlation than GPASOIL-v0 for grassland over France when using RMQS (+0.32 for GPASOIL-v1.1 vs non-significant correlation for GPASOIL-v0), for cropland over Europe (+0.18 for GPASOIL-v1.1 vs non-significant correlation for GPASOIL-v0, both by using LUCAS) and for Canada/USA (no distinction between cropland and grassland) (+0.32 for GPASOIL-v1.1 vs non-significant correlation for GPASOIL-v0). In particular, GPASOIL-v1.1 was able to catch the higher soil P content in grassland of West of France (Fig. 12). However, a similar agreement was not found between GPASOIL-v1.1 and LUCAS after extracting French sites from LUCAS (Table 13 and Fig. S15). This can be explained by a relatively low spearman correlation between RMQS and LUCAS over France for grassland (0.47; p-value = 0.00). GPASOIL-v1.1 was also able to catch the high soil P content in states/provinces of the north centre and south centre of the USA and Canada.

The update from GPASOIL-v0 to GPASOIL-v1.1 did not improve the correlation with datasets neither for cropland in France nor for grassland in Europe (Table 13). In particular, GPASOIL-v1.1 simulated a low soil P content in cropland of north of France contrary to what was found in RMQS. A low soil P content was also found in the north of France in GPASOIL-v0 (Fig. 12). In GPASOIL-v1.1, this pattern was explained by a large P uptake of root crops (sugar beet, potatoes) in that region. Overall, the soil P pools simulated in north of Europe are sensitive to root crop parameters related to uptake and residues. GPASOIL-v1.1 has some difficulties to catch the increasing soil P along a south-east to north-west gradient found in both cropland and grassland at the European scale in LUCAS (Fig. 13). In particular, the spearman correlation between LUCAS and GPASOIL-v1.1 was significantly negative for grassland (−0.13, p-value = 0.00, Table 13). GPASOIL-v1.1 had also difficulties to reproduce low soil P in East and West of Canada (Fig. 14).

Correlations between McDowell et al.⁵⁸ and regional soil P datasets are higher than correlations found with GPASOIL-v1.1 for both RMQS and LUCAS, but not for STS. The high correlations between McDowell et al.⁵⁸ and LUCAS is explained by the fact that LUCAS was one the data source used to train the statistical model in McDowell et al.⁵⁸. Also, the fact that both RMQS and McDowell et al.⁵⁸ are based on Olsen P can contribute to explain the high correlation between them. Difficulties of McDowell et al.⁵⁸ to match STS (that did not provide separate values for cropland vs grassland) could be partly explained by the fact that we considered all “rangeland” pixels in McDowell et al.⁵⁸ as grassland, contributing to unrealistic contribution of rangeland soil P pools to agricultural pools at states/provinces scale.

At that stage, it is difficult to understand if the mismatch between GPASOIL-v1.1 and the different soil P dataset was explained by misrepresentation of some drivers (in particular FARM or SPRO), incorrect soil P dynamics modeling or uncertainties related to the use of the different dataset for the comparison (see next Section).

Usages Notes

The soil P pools available in the GPASOIL-v1.1 dataset are model estimates and not free from error due to imperfect modeling (e.g. soil organic P dynamics), inaccurate/uncertainties in inputs (e.g. use of dataset about N for manure to derive P in manure, bulk density not representative to agricultural soils, etc.), limitation in the estimate of some drivers (e.g. difficulty to represent P uptake in grassland), misreporting in agricultural census statistics at the basis of few input (e.g. crop P uptake), and constant in space information (e.g. soil plough layer). Only few sources of uncertainty are considered in our approach (uncertainty related to the datasets describing the drivers to provide the coefficients of variation, uncertainty related to the mineralization rates for GPASOIL-v1.0 and GPASOIL-v1.1).

Caution is particularly required if the user wants to focus on a restricted geographical area (e.g. ≤ continent) of the estimates.

The variables used as input of the soil P dynamic model (the so-called drivers) provided with the soil P pools simulated thanks to our approach should not be used separately to the soil P pools and the users should instead use original dataset if he.she wants to focus on a given driver.

It is worth noting that GPASOIL-v1.1 does not exactly match the spatial distribution given by regional soil P dataset. This mismatch can be explained by misrepresentation of some drivers (in particular FARM or SPRO), incorrect soil P dynamics modeling or uncertainties related to the use of the different dataset for the comparison. About the misrepresentation of some drivers, we made use of the fact that both RMQS and LUCAS provide, in addition to soil P used in the evaluation of our simulations, information about pH, soil organic carbon, and soil texture, i.e. soil properties that are involved in our soil P dynamics parameterization (SPRO driver). Figure S16 shows the comparison for these soil properties between Soilgrids 2.0 (used in our approach), RMQS and LUCAS. Some spatial patterns are not consistent between Soilgrids 2.0, RMQS and LUCAS (e.g. pH). Also, the comparison lets suggest that we would need to distinguish cropland vs grassland for soil parameters involved in the soil P dynamics model.

To improve our understanding of the mismatch between GPASOIL-v1.1 and soil P dataset, it could also be particularly interesting to investigate if the correlation between GPASOIL-v1.1 and LUCAS would be improved in case of simulations over Europe where both soil P input/output and soil dynamics parameterization were improved. European simulation of GPASOIL-v1.1 could be forced by soil P input/output of Panagos et al.⁶⁵. How the correlation with LUCAS evolved in sensitivity tests where some crop-specific parameters related to uptake and residues (in particular for root crops) are made varying should also be studied. Soil P dynamics could be improved for European simulations as LUCAS provide information about soil oxides, which can allow to use initial parameterizations of Wang et al.¹⁸ (instead of modified ones as descried in Table 10) and in particular spatially varying concentration of P in soil solution at steady-state (${P}_{c,\infty }$). Also, it could be interesting to evaluate if the soil P dynamic model, even without being fed by oxide concentrations, can match the known global distribution of soil phosphorus retention potential⁶⁶. Finally, the extension of dataset such as LUCAS or RMQS with Hedley fractionation measurements would also allow a more straightforward comparison to modelling studies.

Code availability

The scripts of the modelling approach described in this paper was made available at recherche.data.gouv.fr⁵⁷. Our code is available within the GPASOIL_scripts.tgz file along the tgz file that contain the GPASOIL-v1.0 and GPASOIL-v1.1 soil P pools.

GPASOIL_scripts.tgz contains the whole file tree and associated scripts that were used to generate GPASOIL-v1_output.tgz. The directory GPASOIL/ contains few sub-directories:

- GrabData_and_PrepDriver/ that contains all scripts (bash and python) used to generate the input of the soil P dynamic model. These input correspond to the different drivers described in the current manuscript. The input files generated (in netcdf format) are within the directory output_prepDriver/ and are here provided. The procedures to download the original dataset (if this dataset is available on the web) used to generate the input files and information are given for each driver in specific README.txt files.

- main/ contains the soil P dynamic model. Different version are provided: (model = v0), (model = v1.0) and (model = v1.1). The 2nd one (i.e. (model = v1.0)) was used to generate GPASOILv1.0 while the latter (i.e. (model = v1.1)) was used to generate GPASOILv1.1

- evaluation/ contains the script to compare the model output to regional databases. The databases are not provided. General information about the procedure to get the regional databases are provided in a specific README.txt file.

- output_main/ is the directory that receives the different soil P maps simulated.

A container generated with Singularity 3 was also provided and allows users to run the different scripts on a server or local computer without issue of incompatibility about Ubuntu distribution or Python packages. Modeling and analysis were performed in using Python (Python Software Foundation. Python Language Reference, version 3.6.10., available at http://www.python.org, last access: January 2020).

References

Hou, E. et al. Global meta-analysis shows pervasive phosphorus limitation of aboveground plant production in natural terrestrial ecosystems. Nat. Commun. 11, (2020).
Kvakić, M. et al. Quantifying the Limitation to World Cereal Production Due To Soil Phosphorus Status. Glob. Biogeochem. Cycles https://doi.org/10.1002/2017GB005754 (2018).
Carpenter, S. R. Eutrophication of aquatic ecosystems: Bistability and soil phosphorus. Proc. Natl. Acad. Sci. 102, 10002–10005 (2005).
Article CAS PubMed PubMed Central ADS Google Scholar
Yang, X., Post, W. M., Thornton, P. E. & Jain, A. The distribution of soil phosphorus for global biogeochemical modeling. Biogeosciences 10, 2525–2537 (2013).
Article CAS ADS Google Scholar
Achat, D. L., Pousse, N., Nicolas, M., Brédoire, F. & Augusto, L. Soil properties controlling inorganic phosphorus availability: general results from a national forest network and a global compilation of the literature. Biogeochemistry 127, 255–272 (2016).
Article CAS Google Scholar
He, X. et al. Global patterns and drivers of soil total phosphorus concentration. Earth Syst. Sci. Data 13, 5831–5846 (2021).
Article ADS Google Scholar
Elser, J. & Bennett, E. A broken biogeochemical cycle. Nature 478, 29–31 (2011).
Article CAS PubMed ADS Google Scholar
MacDonald, G. K., Bennett, E. M., Potter, P. A. & Ramankutty, N. Agronomic phosphorus imbalances across the world’s croplands. Proc. Natl. Acad. Sci. USA 108, 3086–3091 (2011).
Article CAS PubMed PubMed Central ADS Google Scholar
Demay, J., Ringeval, B., Pellerin, S. & Nesme, T. Half of global agricultural soil phosphorus fertility derived from anthropogenic sources. Nat. Geosci. 16, 69–74 (2023).
Article CAS ADS Google Scholar
Sattari, S. Z., Bouwman, A. F., Giller, K. E. & van Ittersum, M. K. Residual soil phosphorus as the missing piece in the global phosphorus crisis puzzle. Proc. Natl. Acad. Sci. USA 109, 6348–6353 (2012).
Article CAS PubMed PubMed Central ADS Google Scholar
Ringeval, B. et al. Phosphorus in agricultural soils: drivers of its distribution at the global scale. Glob. Change Biol. 23, 3418–3432 (2017).
Article ADS Google Scholar
Zhang, J. et al. Spatiotemporal dynamics of soil phosphorus and crop uptake in global cropland during the 20th century. Biogeosciences 14, 2055–2068 (2017).
Article ADS Google Scholar
Bouwman, L. et al. Exploring global changes in nitrogen and phosphorus cycles in agriculture induced by livestock production over the 1900–2050 period. Proc. Natl. Acad. Sci. USA 52, https://doi.org/10.1073/pnas.1012878108 (2013).
He, X. et al. Global patterns and drivers of phosphorus fractions in natural soils. Biogeosciences 20, 4147–4163 (2023).
Article ADS Google Scholar
Wang, Y. P., Law, R. M. & Pak, B. A global model of carbon, nitrogen and phosphorus cycles for the terrestrial biosphere. Biogeosciences 7, 2261–2282 (2010).
Article CAS ADS Google Scholar
Helfenstein, J. et al. Combining spectroscopic and isotopic techniques gives a dynamic view of phosphorus cycling in soil. Nat. Commun. 9, 3226 (2018).
Article PubMed PubMed Central ADS Google Scholar
Helfenstein, J. et al. Estimates of mean residence times of phosphorus in commonly considered inorganic soil phosphorus pools. Biogeosciences 17, 441–454 (2020).
Article CAS ADS Google Scholar
Wang, Y. et al. Toward a Global Model for Soil Inorganic Phosphorus Dynamics: Dependence of Exchange Kinetics and Soil Bioavailability on Soil Physicochemical Properties. Glob. Biogeochem. Cycles 36, (2022).
Jones, A., Fernandez-Ugalde, O. & Scarpa, S. LUCAS 2015 topsoil survey: presentation of dataset and results. vol. https://doi.org/10.2760/616084, JRC121325 (EUR 30332 EN, Publications Office of the European Union: Luxembourg, 2020).
Chini, L. et al. Land-use harmonization datasets for annual global carbon budgets. Earth Syst. Sci. Data 13, 4175–4189 (2021).
Article ADS Google Scholar
Poggio, L. et al. SoilGrids 2.0: producing soil information for the globe with quantified spatial uncertainty. SOIL 7, 217–240 (2021).
Article CAS ADS Google Scholar
Bronick, C. J. & Lal, R. Soil structure and management: a review. Geoderma 124, 3–22 (2005).
Article CAS ADS Google Scholar
Klein Goldewijk, K., Beusen, A., Doelman, J. & Stehfest, E. Anthropogenic land use estimates for the Holocene – HYDE 3.2. Earth Syst. Sci. Data 9, 927–953 (2017).
Article ADS Google Scholar
Krinner, G. et al. A dynamic global vegetation model for studies of the coupled atmosphere-biosphere system. Glob. Biogeochem. Cycles 19, n/a–n/a (2005).
Article Google Scholar
Helfenstein, J., Jegminnat, J., McLaren, T. L. & Frossard Soil solution phosphorus turnover: derivation, interpretation, and insights from a global compilation of isotope exchange kinetic studies. Biogeosciences 15, 105–114 (2018).
Article CAS ADS Google Scholar
Buendía, C., Kleidon, A. & Porporato, A. The role of tectonic uplift, climate, and vegetation in the long-term terrestrial phosphorous cycle. Biogeosciences 7, 2025–2038 (2010).
Article ADS Google Scholar
McConnell, C. A., Kaye, J. P. & Kemanian, A. R. Reviews and syntheses: Ironing out wrinkles in the soil phosphorus cycling paradigm. Biogeosciences 17, 5309–5333 (2020).
Article CAS ADS Google Scholar
Beck, M. A. & Sanchez, P. A. Soil phosphorus movement and budget after 13 years of fertilized cultivation in the Amazon basin. Plant Soil 184, 23–31 (1996).
Article CAS Google Scholar
Sun, Y. et al. Global evaluation of the nutrient-enabled version of the land surface model ORCHIDEE-CNP v1.2 (r5986). Geosci. Model Dev. 14, 1987–2010 (2021).
Article CAS ADS Google Scholar
Zhang, B. et al. Global manure nitrogen production and application in cropland during 1860–2014: a 5 arcmin gridded global dataset for Earth system modeling. Earth Syst. Sci. Data 9, 667–678 (2017).
Article ADS Google Scholar
Xu, R. et al. Increased nitrogen enrichment and shifted patterns in the world’s grassland: 1860–2016, 13, Earth Syst. Sci. Data, 11, 175–187, https://doi.org/10.5194/essd-11-175-2019 (2019).
Lun, F. et al. Global and regional phosphorus budgets in agricultural systems and their implications for phosphorus-use efficiency. Earth Syst. Sci. Data 10, 1–18 (2018).
Article ADS Google Scholar
Velthof, G. L., Bannink, A. & Oenema, O. Relationships between animal nutrition and manure quality.
Lu, C. & Tian, H. Global nitrogen and phosphorus fertilizer use for agriculture production in the past half century: shifted hot spots and nutrient imbalance. Earth Syst. Sci. Data 9, 181–192 (2017).
Article ADS Google Scholar
Sun, Y. et al. Diagnosing phosphorus limitations in natural terrestrial ecosystems in carbon cycle models. Earths Future 5, 730–749 (2017).
Article CAS PubMed PubMed Central ADS Google Scholar
Wang, J. et al. Vegetation type controls root turnover in global grasslands. Glob. Ecol. Biogeogr. 28, 442–455 (2019).
Article CAS Google Scholar
Zhang, X. & Wang, W. The decomposition of fine and coarse roots: their global patterns and controlling factors. Sci. Rep. 5, 9940 (2015).
Article CAS PubMed PubMed Central ADS Google Scholar
Smith, B. et al. Implications of incorporating N cycling and N limitations on primary production in an individual-based dynamic vegetation model. Biogeosciences 11, 2027–2054 (2014).
Article ADS Google Scholar
Monfreda, C., Ramankutty, N. & Foley, J. A. Farming the planet: 2. Geographic distribution of crop areas, yields, physiological types, and net primary production in the year 2000: GLOBAL CROP AREAS AND YIELDS IN 2000. Glob. Biogeochem. Cycles 22, n/a–n/a (2008).
Article Google Scholar
Sun, Y. et al. Field‐Based Estimation of Net Primary Productivity and Its Above‐ and Belowground Partitioning in Global Grasslands. J. Geophys. Res. Biogeosciences 126, (2021).
Kastner, T. et al. Land use intensification increasingly drives the spatiotemporal patterns of the global human appropriation of net primary production in the last century. Glob. Change Biol. gcb.15932, https://doi.org/10.1111/gcb.15932 (2021).
Wang, Y. et al. GOLUM-CNP v1.0: a data-driven modeling of carbon, nitrogen and phosphorus cycles in major terrestrial biomes. Geosci. Model Dev. 11, 3903–3928 (2018).
Article CAS ADS Google Scholar
Lun, F. et al. Influences of international agricultural trade on the global phosphorus cycle and its associated issues. Glob. Environ. Change 69, 102282 (2021).
Article Google Scholar
Smil, V. Phosphorus in the environment: natural flows and human interferences. Annu. Rev. Energy Environ. 25, 53–88 (2000).
Article Google Scholar
Zechmeister-Boltenstern, S. et al. The application of ecological stoichiometry to plant–microbial–soil organic matter transformations. Ecol. Monogr. 85, 133–155 (2015).
Article Google Scholar
Spohn, M. Increasing the organic carbon stocks in mineral soils sequesters large amounts of phosphorus. Glob. Change Biol. 26, 4169–4177 (2020).
Article ADS Google Scholar
Bentsen, N. S., Felby, C. & Thorsen, B. J. Agricultural residue production and potentials for energy and materials services. Prog. Energy Combust. Sci. 40, 59–73 (2014).
Article Google Scholar
Ye, Y. et al. Carbon, Nitrogen and Phosphorus Accumulation and Partitioning, and C:N:P Stoichiometry in Late-Season Rice under Different Water and Nitrogen Managements. PLoS ONE 9, e101776 (2014).
Article PubMed PubMed Central ADS Google Scholar
Wang, R. et al. Significant contribution of combustion-related emissions to the atmospheric phosphorus budget. Nat. Geosci. 8, 48–54 (2015).
Article CAS ADS Google Scholar
Mahowald, N. et al. Global distribution of atmospheric phosphorus sources, concentrations and deposition rates, and anthropogenic impacts. Glob. Biogeochem. Cycles 22, n/a–n/a (2008).
Article Google Scholar
Wang, R. et al. Global forest carbon uptake due to nitrogen and phosphorus deposition from 1850 to 2100. Glob. Change Biol. 23, 4854–4872 (2017).
Article ADS Google Scholar
van Puijenbroek, P. J. T. M., Beusen, A. H. W. & Bouwman, A. F. Global nitrogen and phosphorus in urban waste water based on the Shared Socio-economic pathways. J. Environ. Manage. 231, 446–456 (2019).
Article PubMed Google Scholar
Donatello, S. & Cheeseman, C. R. Recycling and recovery routes for incinerated sewage sludge ash (ISSA): A review. Waste Manag. 33, 2328–2340 (2013).
Article CAS PubMed Google Scholar
Borrelli, P. et al. An assessment of the global impact of 21st century land use change on soil erosion. Nat. Commun. 8, (2017).
van den Hurk, B. et al. LS3MIP (v1.0) contribution to CMIP6: the Land Surface, Snow and Soilmoisture Model Intercomparison Project – aims, setup and expected outcome. Geosci. Model Dev. 9, 2809–2832 (2016).
Article ADS Google Scholar
Valade, A. et al. Modeling sugarcane yield with a process-based model from site to continental scale: uncertainties arising from model structure and parameter values. Geosci. Model Dev. 7, 1225–1245 (2014).
Article ADS Google Scholar
Ringeval, B. GPASOIL-v1, a global dataset on phosphorus in agricultural soils. Recherche Data Gouv. https://doi.org/10.57745/XZTW7Z (2023).
McDowell, R. W., Noble, A., Pletnyakov, P. & Haygarth, P. M. A Global Database of Soil Plant Available Phosphorus. Sci. Data 10, 125 (2023).
Article CAS PubMed PubMed Central Google Scholar
Iizumi, T. & Sakai, T. The global dataset of historical yields for major crops 1981–2016. Sci. Data 7, 97 (2020).
Article PubMed PubMed Central Google Scholar
Senthilkumar, K., Nesme, T., Mollier, A. & Pellerin, S. Conceptual design and quantification of phosphorus flows and balances at the country scale: The case of France. Glob. Biogeochem. Cycles 26, GB2008 (2012).
Article ADS Google Scholar
Tian, J. et al. Accumulation and distribution of phosphorus in the soil profile under fertilized grazed pasture. Agric. Ecosyst. Environ. 239, 228–235 (2017).
Article CAS Google Scholar
Moody, P. W., Speirs, S. D., Scott, B. J. & Mason, S. D. Soil phosphorus tests I: What soil phosphorus pools and processes do they measure? Crop Pasture Sci. 64, 461 (2013).
Article CAS Google Scholar
Jolivet, C., Boulonne, L. & Ratié, C. Manuel du réseau de Mesures de la Qualité des Sols. Unité InfoSol - INRA Orléans - Fr. (2006).
TFI. Soil Test Levels in North America, 2020 Summary Update. (2020).
Panagos, P. et al. Improving the phosphorus budget of European agricultural soils. Sci. Total Environ. 853, 158706 (2022).
Article CAS PubMed ADS Google Scholar
Batjes, N. H. Global distribution of soil phosphorus retention potential.
Bouwman, A. F. & Kram. Integrated modelling of global environmental change: an overview of IMAGE 2.4. (Netherlands Environmental Assessment Agency, 2006).
Xu, R. et al. Half-degree gridded manure and fertilizer nitrogen inputs in global grassland systems during 1860–2016, PANGAEA, https://doi.org/10.1594/PANGAEA.892940 (2018).
Demay, J. Supplementary Information - Half of global agricultural soil phosphorus fertility derived from anthropogenic sources. https://doi.org/10.57745/LEPJCS (2023).
Kastner, T. et al. Data supplement for ‘Land use intensification increasingly drives the spatiotemporal patterns of the global human appropriation of net primary production in the last century’, https://doi.org/10.5281/ZENODO.5519104 (2022).
Lu, C. & Tian, H. Half-degree gridded nitrogen and phosphorus fertilizer use for global agriculture production during 1900-2013, https://doi.org/10.1594/PANGAEA.863323 (2016).
Zhang, B. et al. Manure nitrogen production and application in cropland and rangeland during 1860 - 2014: A 5-minute gridded global data set for Earth system modeling, https://doi.org/10.1594/PANGAEA.871980 (2017).
Klein Goldewijk, Dr. ir. C.G.M. Anthropogenic land-use estimates for the Holocene; HYDE 3.2. DANS. https://doi.org/10.17026/dans-25g-gez3 (Utrecht University, 2017).
Van Oost, K. et al. The Impact of Agricultural Soil Erosion on the Global Carbon Cycle. Science 318, 626–629 (2007).
Article PubMed ADS Google Scholar
Hurtt, G. C. et al. Harmonization of land-use scenarios for the period 1500–2100: 600 years of global gridded annual land-use transitions, wood harvest, and resulting secondary lands. Clim. Change 109, 117–161 (2011).
Article ADS Google Scholar
Chini, L. et al. Land-Use Harmonization Datasets for Annual Global Carbon Budgets. https://essd.copernicus.org/preprints/essd-2020-388/essd-2020-388.pdf, 10.5194/essd-2020-388 (2021).
Decharme, B., Martin, E. & Faroux, S. Reconciling soil thermal and hydrological lower boundary conditions in land surface models: LOWER BOUNDARY CONDITIONS OF SOIL IN LSM. J. Geophys. Res. Atmospheres 118, 7819–7834 (2013).
Article ADS Google Scholar
He, X. et al. Global patterns and drivers of phosphorus pools in natural soils version 2.0. 24653728 Bytes https://doi.org/10.6084/M9.FIGSHARE.16988029.V2 (2023).
Yang, X. & Post, W. M. Phosphorus transformations as a function of pedogenesis: A synthesis of soil phosphorus data using Hedley fractionation method. Biogeosciences 8, 2907–2916 (2011).
Article CAS ADS Google Scholar
Panagos, P. et al. European Soil Data Centre 2.0: Soil data and knowledge in support of the EU policies. Eur. J. Soil Sci. 73 (2022).
Tóth, G., Guicharnaud, R.-A., Tóth, B. & Hermann, T. Phosphorus levels in croplands of the European Union with implications for P fertilizer use. Eur. J. Agron. 55, 42–52 (2014).
Article Google Scholar
Sattari, S. Z., Bouwman, A. F., Martinez Rodríguez, R., Beusen, A. H. W. & van Ittersum, M. K. Negative global phosphorus budgets challenge sustainable intensification of grasslands. Nat. Commun. 7, 10696 (2016).
Article CAS PubMed PubMed Central ADS Google Scholar

Download references

Acknowledgements

This research was financially supported by INRAE (Institut national de recherche pour l’agriculture, l’alimentation et l’environnement), the Agroecosystems division of INRAE and by the French funding account CASDAR (Compte d’Affectation Spécial “Développement Agricole et Rural”) under grant agreement n° 20AIP1626879 (PhosphoBio project). We thank NASA as the source of the material used for the LUCC driver. For the driver CLIM, we acknowledge the World Climate Research Programme, which, through its Working Group on Coupled Modelling, coordinated and promoted CMIP6. We thank the climate modeling groups for producing and making available their model output, the Earth System Grid Federation (ESGF) for archiving the data and providing access, and the multiple funding agencies who support CMIP6 and ESGF. We thank Mark Irvine for his help with the computing aspects, Tovo Rabemanantsoa for his help with the datasets, Bertrand Decharme for his help with the CMIP6 output, Christelle Aluome for her help with the licence of the different datasets, and Francesca Degan (Arvalis) for helpful discussion. This work has been realized with the support of MESO@LR-Platform at the University of Montpellier. DSG and XH acknowledge support from the EJP SOIL ICONICA project, DSG acknowledges support from the ANR CLAND Convergence Institute.

Author information

Authors and Affiliations

ISPA, Bordeaux Sciences Agro, INRAE, 33140, Villenave d’Ornon, France
Bruno Ringeval, Josephine Demay, Laurent Augusto, Thomas Nesme & Sylvain Pellerin
Université Paris Saclay, CEA-CNRS-UVSQ, LSCE/IPSL, Gif-sur-Yvette, France
Daniel S. Goll & Xianjin He
CSIRO Oceans and Atmosphere, Aspendale, Vic., Australia
Ying-Ping Wang
Key Laboratory of Vegetation Restoration and Management of Degraded Ecosystems, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, China
Enqing Hou
Institute of Social Ecology, University of Natural Resources and Life Sciences, Vienna, Austria
Sarah Matej & Karl-Heinz Erb
Department of Environmental Science and Engineering, Fudan University, Shanghai, 200438, China
Rong Wang
College of Land Science and Technology, China Agricultural University, Beijing, 100193, China
Fei Lun
Department of Science, Roma Tre University, 00146, Rome, Italy
Pasquale Borrelli
Department of Biological Environment, Kangwon National University, Chuncheon, 24341, Republic of Korea
Pasquale Borrelli
Soil Geography and Landscape Group, University of Wageningen, Wageningen, 6700AA, The Netherlands
Julian Helfenstein
AgResearch, Lincoln Science Centre, Private Bag 4749, Christchurch, 8140, New Zealand
Richard W. McDowell
Faculty of Agriculture and Life Sciences, Lincoln University, Lincoln, PO Box 84, 7647, Christchurch, New Zealand
Richard W. McDowell & Peter Pletnyakov

Authors

Bruno Ringeval
View author publications
You can also search for this author in PubMed Google Scholar
Josephine Demay
View author publications
You can also search for this author in PubMed Google Scholar
Daniel S. Goll
View author publications
You can also search for this author in PubMed Google Scholar
Xianjin He
View author publications
You can also search for this author in PubMed Google Scholar
Ying-Ping Wang
View author publications
You can also search for this author in PubMed Google Scholar
Enqing Hou
View author publications
You can also search for this author in PubMed Google Scholar
Sarah Matej
View author publications
You can also search for this author in PubMed Google Scholar
Karl-Heinz Erb
View author publications
You can also search for this author in PubMed Google Scholar
Rong Wang
View author publications
You can also search for this author in PubMed Google Scholar
Laurent Augusto
View author publications
You can also search for this author in PubMed Google Scholar
Fei Lun
View author publications
You can also search for this author in PubMed Google Scholar
Thomas Nesme
View author publications
You can also search for this author in PubMed Google Scholar
Pasquale Borrelli
View author publications
You can also search for this author in PubMed Google Scholar
Julian Helfenstein
View author publications
You can also search for this author in PubMed Google Scholar
Richard W. McDowell
View author publications
You can also search for this author in PubMed Google Scholar
Peter Pletnyakov
View author publications
You can also search for this author in PubMed Google Scholar
Sylvain Pellerin
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

B.R. designed the study, wrote the different scripts, and performed the first analysis of the results. J.D., X.H., E.H., S.M., K.E., R.W., F.L., P.B., R.W.M., P.P. provided help about their respective dataset, D.S.G., Y-P.W., L.A. provided help with the soil P dynamic model. All co-authors participated in the analysis of the results and the writing.

Corresponding author

Correspondence to Bruno Ringeval.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Table S1

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Ringeval, B., Demay, J., Goll, D.S. et al. A global dataset on phosphorus in agricultural soils. Sci Data 11, 17 (2024). https://doi.org/10.1038/s41597-023-02751-6

Download citation

Received: 13 February 2023
Accepted: 14 November 2023
Published: 02 January 2024
DOI: https://doi.org/10.1038/s41597-023-02751-6

Subjects

Abstract

Similar content being viewed by others

Environmental drivers of increased ecosystem respiration in a warming tundra

Plant responses to changing rainfall frequency and intensity

Hotspots of biogeochemical activity linked to aridity and plant traits across global drylands

Background & Summary

Method

General approach

Soil P dynamics model

Pools and fluxes design

Parameter estimates

Summary of the main differences with (model = GPASOIL-v0)

Effect of land-use and land-cover change (LUCC) on cropland/grassland soil P

Drivers of agricultural soil P distribution: datasets and computation

FARM

P in manure (for both cropland and grassland)

P in chemical fertilizer (for both cropland and grassland)

P uptake and P residues for grassland

P uptake and P residues for cropland

Composition of manure and residues for cropland and grassland

DEPO

SLUD

LOSS

LUCC

CLIM

Error estimate from uncertainty associated with the datasets to describe the different drivers

Data Records

Technical Validation

Evaluation of the computation of some selected drivers

Application of the Wang et al. soil P inorganic dynamics parametrizations at the global scale

Effect of the update of each component (data and model) from GPASOIL-v0 on Pi-lab simulated in 2018

Effect of the update of the dataset to describe the drivers (from (data = GPASOIL-v0; model = GPASOIL-v0) to (data = GPASOIL-v1; model = GPASOIL-v0))

Effect of the update of soil P dynamic model (from (data = GPASOIL-v1; model = GPASOIL-v0) to (data = GPASOIL-v1; model = GPASOIL-v1))

Best simulations, uncertainty associated and comparison to previous estimates

Evaluation of the simulated soil P pools at different scales (country, continental/watershed, global scale)

Usages Notes

Code availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Supplementary information

Supplementary Table S1

Supplementary Information

Rights and permissions

About this article

Cite this article

Share this article

Search

Quick links

Effect of the update of each component (data and model) from GPASOIL-v0 on P_i-lab simulated in 2018