Datasets on bulk density and coarse fragment content from the French soil quality monitoring network

Various stakeholders, such as modelers, policy makers, farmers, and environmental regulators need reliable soil bulk density and coarse fragment content data. These two soil parameters are necessary to calculate soil carbon and nutrients stocks, to estimate water availability for plants, or to assess soil compaction. However, measuring these two parameters is labor intensive and time consuming. Therefore, many agricultural and environmental studies often miss these two soil parameters. Here, we provide four datasets, one with bulk density and coarse fragment contents of topsoil and subsoil, measured in two campaigns of the French Soil Quality Monitoring Network (RMQS for its acronym in French), a second one with the average values for bulk density and coarse fragments of the two campaigns at 0–30 cm and 30–50 cm. The third and the fourth ones are the raw data needed to calculate the two first datasets divided by campaign. In addition, the R script for calculating the depth-weighted values per soil layer is provided

As part of the French Soil Quality Monitoring Network (RMQS; Réseau de Mesure de la Qualité des Sols in French), two sampling campaigns for bulk density and coarse fragment content measurements were conducted in soil profiles distributed in a systematic sampling grid covering the French mainland territory.The samples were collected mainly by the ring and the excavation methods (a few sites were measured by alternative methods).In both campaigns, six samples were distributed down to 50 cm.Moreover, three other samples were taken from 50 cm down to 1 m depth only during the second campaign.The samples were dried in oven at 105 °C, and soil mass, fine earth, and coarse fragment content were measured.Data source location • Institution: INRAE • City/Town/Region: Orléans, Val de Loire.

Data accessibility
Repository name: GisSol Dataverse https://entrepot.recherche.data.gouv.fr/dataverse/gissolFour distinct datasets have been made available, each with a unique digital object identifier (DOI).Each dataset is accompanied by its unique access link, through which the data can be retrieved.The R script of the function to calculate the weighted means is shared within a version control system.

Value of the Data
• Soil physical parameters such as bulk density (BD) and coarse fragment (CF) content are important in soil science because they are needed to calculate stocks, for example soil organic carbon in the context of climate change and the assessment of soils capacity to mitigate carbon dioxide emissions; trace metals in the context of soil pollution assessment, or nutrient stocks [1][2][3].• Bulk density and CF content data are also useful to estimate available water capacity for plants using various pedotransfer functions [ 4 ].Bulk density is also often used as an indicator of soil compaction [ 5 ], whereas CF may be useful to assess limitations to soil tillage [ 6 ], or to cultivate some crops, such as potatoes, carrots, beetroots, etc. [ 7 ].
• Measuring these two parameters is labor intensive and time consuming [ 8 ].Soil monitoring programs very often miss these two soil physical parameters [ 9 ].• Currently, gathering data on these two parameters has become increasingly important due to the need of calculation of soil organic carbon (SOC) or nutrient stocks, of carbon and nutrients balances and fluxes, and of relevant indicators for environmental policies.• Therefore, there is an increasing demand of reliable data for both BD and CF contents from various stakeholders, such as modelers, policy makers, farmers, and environmental regulators.

Background
Soil BD is defined as the ratio between the dry mass of soil and its volume.BD is usually expressed in g cm −3 or Mg m −3 .There are various methods for measuring BD.The cylinder and the excavation methods are among the most commonly used.The selection of a technique for a specific type of soil depends on the occurrence of CF and root density.The cylinder method is advisable for soils characterized by a low CF content and low root density, whereas the excavation method is employed in soils with a high CF content and a substantial presence of thick roots [ 8 ].The CF content is defined as the soil fraction with a size larger than 2 mm [ 10 ].While the fraction smaller than 2 mm is known as the fine earth fraction where SOC is stored [ 3 ].The CF fraction may include roots larger than 2 mm but these are not considered as SOC but as plant biomass instead [ 11 ].Often, the largest part of CF is formed by mineral coarse fragments which are considered to be free of organic carbon.Thus, correcting soil BD for the CF content is important to avoid overestimations of SOC stocks [ 11 ].

Data Description
Four datasets are provided ( Fig. 1 ).The first dataset (file RMQS1_bulk_density_raw_data.tab) contains the raw data with the results of the individual soil physical samples for campaign 1.The second dataset (file Raw_data_rmqs_2.tab)contains the results of the individual soil physical samples for campaign 2 (see below for details on both campaigns).The third dataset (file Dataset_A.tab)provides the weighted average of BD, fine earth (FE), and CF contents for the topsoil and subsoil in the two campaigns.These weighted mean values are calculated in way that they match the thickness of the composite layers of the RMQS program for each campaign in order to compute stocks using the chemical analyses (see details in experimental design, materials and methods section).The fourth dataset (file Dataset_B.tab)includes the average values of BD, CF, and fine earth (FE) mass per total volume from both campaigns, calculated separately for the 0-30 cm and 30-50 cm layers.The data of the second campaign comprises the year 2016 until 2022.

Raw data
The raw data of the first and second campaigns are made available separately.The raw data of the first campaign can be found through a DOI and contains all the physicochemical properties measured in the first RMQS campaign.While the raw data of the second campaign is provided in a dataset exclusively dedicated to the soil physical data of campaign 2. The data structure of both datasets, containing the raw data for the respective campaign, is consistent.The two datasets have information about the campaign, the cell number within the RMQS grid, identification of each sampling site within a cell, the spatial coordinates, the sampling date, the identifier of the soil layer of the composite sample, land use categories, the inferior and upper limits of each individual sample, the upper and lower limits of the layer which the individual samples are attached to, the methods employed, and the values of BD, CF, and FE.More details can be found in the readme file accompanying each dataset.A includes weighted means of bulk density, coarse fragment contents, and fine earth in topsoil and subsoil in campaign 1 and 2. Dataset B contains the average values of campaigns 1 and 2 of bulk density, coarse fragment contents, and fine earth at 0-30 cm and 30-50 cm.In addition, the project repository includes the R script of the function used to calculate the weighted average for a specific soil layer.

Dataset A
The summary statistics of Dataset A are shown in Table 1 and the spatial representation in Figs.2-4 .For campaign 1, some sites were not sampled due to inaccessibility or absence of sampling soil in the given area (e.g. military lands, mountainous rocky soils, strongly-anthropized area, steep and unsafe access; Fig. 2 ).For both campaigns there were more sites with data for the topsoil layer than for the subsoil ( Figs. 2 -4 ).One reason is that for most of the forest sites, during the first campaign, the BD sampling was conducted only for the topsoil due to time and budget constraints.Another reason is that a rather large proportion of sites simply did not have a developed subsoil layer ( Fig. 3 ).

Dataset B
The summary statistics of Dataset B are shown in Table 2 and the spatial representation in Fig. 5 .

Soil sampling sites
The French soil quality monitoring network (RMQS) is comprised of approximately 2170 locations regularly distributed across the mainland territory, following a systematic random sampling grid with a resolution of 16 km [ 12 ].The sampling sites were initially chosen to be at the central point of each grid cell.However, in instances where accessibility was a challenge (in the event, for example, of refusal by the land owner), alternative sites within a 1-km radius of the cell center were selected.The first RMQS campaign was conducted between 20 0 0 and 20 09, and the second campaign is currently ongoing, scheduled from 2016 to 2027.The sampling during the first campaign was carried out by zone, successively sampling departments and/or regions according to the arranged contracts.The strategy for the sampling process in the second campaign was completely revised to become globally annualized: the sites sampled each year are now distributed geographically in a regular way throughout the national territory.At each of these sampling sites, two types of samples were obtained.First, a composite sample was obtained by aggregating 25 soil cores following an unaligned sampling scheme using a 20-m by 20-m grid ( Fig. 6 A) divided into 100 quadrats of 2 m by 2 m.The quadrats with the number 1 were used for sampling in the first campaign, the ones with number 2 in the second campaign, while the ones with numbers 3 and 4 will be used in the third and fourth campaigns, respectively ( Fig. 6 A).Adjacent to the 20-m by 20-m grid, a soil pit was dug to a depth of 1 m when possible or until parent material for profile description and for the second type of samples, i.e. volumetric sampling, specifically for BD measurement and CF content assessment.The location of the soil pit was changed from the southern side of the grid during the first campaign to the western side in the second campaign, as illustrated in Fig. 6 A. This resulted in a distance of approximately 20 m between the two soil pits dug at each campaign.
Three soil physical samples were collected from the topsoil layer (0-30 cm) and an additional three from the subsoil layer, totaling three samples for each composite layer up to 50 cm (when soil profile was deep enough) for both campaigns ( Fig. 6 B).In the second campaign, the three soil physical samples per composite layer were specifically confined to pedogenetic soil horizons, ensuring that samples did not span across different horizons.Composite sampling was conducted within two soil layers: topsoil at 0-30 cm and subsoil at 30-50 cm.In some cases, there was a gap between the lower limit of the topsoil layer and the upper limit of the subsoil layer due to the presence of plowing horizons detected in the profile description conducted in the soil pit, especially in croplands.Thus, the inferior limit of the topsoil layer and the upper limit of the subsoil layer was not always 30 cm.
The composite samples were used for analyzing various soil chemical parameters, including soil organic carbon, pH, soil nutrients, heavy metals, and more.The current work does not include the data obtained from the composite samples.However, this reference is made to the Fig. 5. Average bulk density, fine earth, and coarse fragment content of campaigns 1 and 2 of the French Soil Quality Monitoring Network (RMQS) at 0-30 cm and 30-50 cm.Upper panel: bulk density at 0-30 cm and 30-50 cm.Middle panel: fine earth at 0-30 cm and 30-50 cm.Lower panel: coarse fragment content at 0-30 cm and 30-50 cm.Blank spaces represent sites with no formed soil.This figure was obtained using dataset B. Grey points represent sites with missing or incomplete data.composite layers because the BD and CF values are calculated to match the thickness of each of the layers for later computation of stocks of elements, for instance, whose concentration is done on the composite samples.

Methods for bulk density measurement
The methods employed to determine soil BD were the use of the ring or cylinder and excavation methods, using either water or sand (as per NF ISO 11272, 2017; NF X31-503, 1992).The cylinder or ring method involved the use of a 500 cm 3 ring, which was either applied at the soil surface or within the walls of the soil pit, particularly in cases involving deeper parts of the soil profile.In cases where the cylinder was employed for deeper layers, a horizontal surface was carefully prepared using a trowel, followed by the insertion of the cylinder with the aid of a rubber hammer.Fig. 6.A) Systematic unaligned grid used for composite sampling, and two soil pits used for soil profile description.The position of the soil pit was changed from the southern side of the grid in the first campaign to the west side in the second campaign.B) Right: representation of a soil profile described in the soil pits for the second campaign.For the first campaign, physical measurements were done up to 50 cm only.Center: distribution of the soil physical samples within the soil profile.Right: Composite layers sampled in the grid.Taken from Jolivet et al. [ 12 ].
On the other hand, the excavation method was adopted for soils characterized by high contents of rocks, gravel, roots, or in terrains with a sloping topography.When used in subsurface layers, a flat surface was created with a trowel, and a metallic rounded template with an inner diameter of approximately 13 cm was used to ensure the surface's evenness.Subsequently, this template served as a guide for excavation.A cavity, ranging from about 10 0 0 to 150 0 cm 3 in volume, was carefully dug with a trowel, and all the soil was collected in a plastic bag.Another plastic bag was then fitted against the cavity ʼs walls, and its volume was estimated by filling it with a known quantity of either water or calibrated sand.The single difference between excavation using water and sand lies in the fact that the latter can be applied when achieving a completely flat surface is not possible.
The collected soil samples were transported to the European Conservatory of Soil Samples in Orléans, France, where the moist mass was recorded.After weighing the moist mass, the samples were subjected to oven-drying at 105 °C, followed by weighing and sieving through a 2 mm mesh.The weight of the coarse fragments ( > 2 mm), after oven-drying at 105 °C, was also recorded.In some cases of the second campaign, the CF were not weighed.The decision of not weighing the CF was adopted when the cylinder method was used in both campaigns and when the mass of each of the three samples the considered layer was < 50 g in the first campaign.

Fine earth, bulk density, and coarse fragment fraction
The calculation of bulk density for each sample involved dividing the total soil mass ( m tot ), comprising dried fine earth mass ( m fine ) and coarse fragment mass ( m coarse ), by the volume of the sample ( V tot ): The density of the fine earth (FE) was calculated as the ratio of m fine divided by V tot : To avoid confusion, please note that the definition of FE is different from the bulk density of the fine soil (BD fine soil ) as defined in Eq. ( 3) of the article by Poeplau et al. [ 11 ].
The coarse fragment fraction ( coarse mass ) was calculated as the ratio between the mass of the coarse fragments and m tot :

Data treatment of physical measurements
In dataset A, depth-weighted averages of BD, FE, and coarse mass were calculated for topsoil and subsoil for both campaign 1 and campaign 2. The thickness of both layers corresponds to the composite layers sampled from the 20-m by 20-m grid.In most of the cases the thickness of topsoil was 30 cm, and 20 cm for the subsoil.As mentioned above, in some cases the thickness of both layers was different.Thus, for dataset A we refer to topsoil and subsoil and avoid referring to a general value for the upper and lower limits of each composite layer.For the calculation of the weighted averages, we redefined the depth intervals in cases where there was an overlap of individual soil physical samples collected from the same depth ranges ( Fig. 7 ).This adjustment was made to prevent an over-representation of depth intervals covered by multiple soil samples.For each of these newly defined depth intervals, we computed an average, and subsequently, the weighted mean was calculated, with weights based on the length of each respective depth interval ( Fig. 7 ).For dataset B, we redefined the depth intervals from 0 to 50 cm using the individual samples of campaign 1 and 2. Next, the weighted averages were calculated for the layers 0-30 cm and 30-50 cm.The R code of the function that was used to calculate the weighted averages is given in the project repository, as well as the R scripts to calculate datasets A and B.

Limitations
The mass (and volume) of living roots is neglected in the current datasets.As discussed by Poeplau et al. [ 11 ], roots are considered as part of plant biomass and not as SOC.In addition, CF ≥ 20 cm are not measured in the RMQS program, hence this is a factor of overestimation of the fine earth mass in some cases.

Fig. 1 .
Fig. 1.Structure of the project repository.Raw data of campaign 1 & 2 are needed to calculate datasets A and B. DatasetA includes weighted means of bulk density, coarse fragment contents, and fine earth in topsoil and subsoil in campaign 1 and 2. Dataset B contains the average values of campaigns 1 and 2 of bulk density, coarse fragment contents, and fine earth at 0-30 cm and 30-50 cm.In addition, the project repository includes the R script of the function used to calculate the weighted average for a specific soil layer.

Fig. 2 .
Fig. 2. Bulk density in topsoil and subsoil in France in campaigns 1 and 2 of the French Soil Quality Monitoring Network (RMQS).Blank spaces represent sites with no developed soil.This figure was obtained using dataset A. Grey points represent sites with no soil physical data or incomplete data.

Fig. 3 .
Fig. 3. Coarse fragment content (expressed as mass percentage) in France in topsoil and subsoil in campaigns 1 and 2 of the French Soil Quality Monitoring Network (RMQS).Blank spaces represent sites with no developed soil.Grey points represent sites with missing or incomplete coarse fragment content data.This figure was obtained using dataset A.

Fig. 4 .
Fig. 4. Fine earth density in topsoil and subsoil in France in campaigns 1 and 2 of the French Soil Quality Monitoring Network (RMQS).Blank spaces represent sites with no developed soil.Grey points represent sites with missing or incomplete fine earth data.This figure was obtained using dataset A

Fig. 7 .
Fig. 7. Diagram of the calculation of the weighted average of the soil physical data.The depth intervals are recalculated when there is an overlap of individual samples.Next a weight is attributed to the new depth intervals to allow the calculation of the values for the topsoil and subsoil layers.
calculate the two first datasets divided by campaign.In addition, the R script for calculating the depth-weighted values per soil layer is provided © 2024 The Authors.Published by Elsevier Inc.This is an open access article under the CC BY license ( http://creativecommons.org/licenses/by/4.0/ )

Table 1
Mean values, standard deviation, minimal values, first, second and third quantiles, maximal values and number of observations of bulk density, fine earth and coarse fragment content in topsoil and subsoil in campaigns 1 and 2 of the French Soil Quality Monitoring Network (RMQS).

Table 2
Mean values, standard deviation, minimal values, first, second and third quantiles, maximal values and number of observations of bulk density, fine earth and coarse fragment content at 0-30 cm and 30-50 cm.The values are the average of campaigns 1 and 2 of the French Soil Quality Monitoring Network (RMQS).This table was produced using Dataset B.