Semantics about soil organic carbon storage: DATA4C+, a comprehensive thesaurus and classification of management practices in agriculture and forestry

Abstract. Identifying the drivers of soil organic carbon (SOC) stock changes is of the utmost importance to contribute to global challenges like climate change, land degradation, biodiversity loss, or food security. Evaluating the impacts of land use and management practices in agriculture and forestry on SOC is still challenging. Merging datasets or making databases interoperable is a promising way forward, but it still faces several semantic challenges. So far, a comprehensive thesaurus and classification of management practices in agriculture and forestry has been lacking, especially one focusing on SOC storage. Therefore, the aim of this paper is to present a first comprehensive thesaurus for management practices driving SOC storage (DATA4C+). The DATA4C+ thesaurus contains 224 classified and defined terms related to land management practices in agriculture and forestry. It is organized as a hierarchical tree reflecting the drivers of SOC storage. It is intended for scientists in agronomy, forestry, and soil sciences, with the aim of uniformizing the description of practices influencing SOC in their original research. It is accessible in Agroportal (http://agroportal.lirmm.fr/ontologies/DATA4CPLUS, last access: 24 March 2022) to enhance its findability, accessibility, interoperability, and reuse by scientists and others such as laboratories or land managers. Future uses of the DATA4C+ thesaurus will be crucial to improve and enrich it, but also to raise the quality of meta-analyses on SOC, and ultimately help policymakers to identify efficient agricultural and forest management practices to enhance SOC storage.


Introduction
Soil organic carbon (SOC) represents about 25 % of the potential of natural climate solutions (NCSs) to mitigate climate change (Bossio et al., 2020). Maintaining or increasing SOC stocks can play a significant role in tackling global challenges like climate change, but also land degradation, biodiversity loss, or food security (IPCC, 2019). Identifying and addressing the drivers of SOC stock changes is therefore crucial to contribute to the Sustainable Development Goals (e.g., SDGs 2, 13, and 15) adopted by the United Nations in 2015 (UN General Assembly, 2015). Wiesmeier et al. (2019) reported a large number of drivers at various scales, from climate to soil physico-chemistry, including land use and management practices. Land use and management practices shape carbon inputs and outputs at the plot scale as well as the quality of carbon inputs, and may modify the turnover of soil organic matter (SOM) and SOC stocks (e.g., Fujisaki et al., 2018; Paustian et al., 2016; Poeplau and Don, 2015; Powlson et al., 2016). Evaluating the efficiency of management practices (e.g., no tillage, organic amendments) and improving our understanding of the processes involved in SOC storage is still challenging and debated (Chenu et al., 2019; Erb et al., 2017). Consequently, large datasets are necessary for a statistically robust analysis of SOC storage and its drivers. In that perspective, the number of systematic reviews or meta-analyses is growing (e.g., Beillouin et al., 2021; Bolinder et al., 2020; Cardinael et al., 2018; Fujisaki et al., 2018). Data-driven soil research and the inference of soil knowledge directly from data by using computational tools and modeling techniques are becoming more and more popular (Wadoux et al., 2020). Merging datasets or making databases interoperable to obtain global datasets is another promising way forward (e.g., Lawrence et al., 2020; Malhotra et al., 2019; Wieder et al., 2020). Open science (OCDE, 2015) and the FAIR (findability, accessibility, interoperability, reusability) guiding principles (Wilkinson et al., 2016) offer opportunities to explore this path.
However, drivers such as land use and management practices must meet two conditions before they can be used in systematic reviews, meta-analyses, or interoperable databases on SOC storage. They have to (1) have standard definitions and (2) be homogeneously described. Harden et al. (2018) highlighted the need for a harmonized description of land use and management practices. Todd-Brown et al. (2022) emphasized the role that semantics should play to overcome the challenges above. Indeed, there are currently two major limitations for these drivers of SOC change: subjectivity of the semantics and limited scope of the terms. Many global-scale studies do not clearly define the management practices and use subjective terms like "improved management" or "best management practices" (Batjes, 2019; Paustian et al., 2016; Smith et al., 2020). Consequently, comparisons between studies might be impossible, as improvement or best management practices are highly context-dependent (i.e., agronomic, climatic, socioeconomic, or time context) (Rosenstock et al., 2016). Conversely, meta-analyses or original studies that evaluate the effect of specific land management practices on SOC storage provide a detailed description of the land use and management practices, but their scope is generally limited to one land cover type or one broad category of land management practice, or they focus on a climatic zone, a region, or a country (Cardinael et al., 2018; Corbeels et al., 2019; Li et al., 2018; Poeplau and Don, 2015; Maillard and Angers, 2014).
Several standards are available for the description of land cover (e.g., Food and Agriculture Organization (FAO) Land Cover Classification System, System of Environmental-Economic Accounting (SEEA, 2012), LUCAS (Eurostat, 2015)) and more recently of land use (e.g., Intergovernmental Panel on Climate Change, SEEA) (Jansen and DiGregorio, 2002; Pesce et al., 2018). Three standards for farming practices are listed by the Agrisemantics map of data standards (Pesce et al., 2018). As far as we know, there has been no attempt to deal with these shortcomings to be able to understand, quantify, or extrapolate processes and drivers of SOC storage in agriculture and forestry using large databases. Therefore, the objectives of this study were: (i) to compile a comprehensive thesaurus, i.e., a list of standards and specifically defined terms, for management practices driving SOC storage; (ii) to keep such a thesaurus easy to use for non-scientists such as soil test laboratories or land managers; and (iii) to define a classification of these drivers to further enhance interoperability of databases on SOC. The aim of this paper is to present a first comprehensive thesaurus and classification of management practices in agriculture and forestry with a focus on soil organic carbon, called DATA4C+.

Identification of SOC drivers related to land management practices
In the present work, land management practices covered cropland, grassland, and forestry practices established at the field scale, without any change in land use. We identified land management practices which are recognized in the scientific literature to influence SOC change. The literature search was conducted based on expert knowledge. A first list of meta-analyses was established by the authors, allowing the identification of relevant land management practices (e.g., Cardinael et al., 2018; Mayer et al., 2020; Smith et al., 2020; see Supplement 1 for some examples and Supplement 2 for the full list). The focus was put on meta-analyses, as homogeneous definitions are a prerequisite to conduct such analyses. In addition, the list of land management practices gathered from the meta-analyses was completed with technical and institutional reports (e.g., Chotte et al., 2019; Pellerin et al., 2020; Sanz et al., 2017; Smith et al., 2007), which are rarely indexed in search engines like Scopus, Web of Science, or Google Scholar. Finally, this list of practices was extensively discussed among the group of authors, which led to the selection of practices beyond the initial ones.
Only land management practices explicitly described were retained. Therefore, management practices labeled as "improved" were discarded. Agroforestry was considered in this study as a land management practice, since it is defined as an agroecosystem where "forest species of trees and other wooded plants are purposely grown on the same land as agricultural crops or livestock, either concurrently or in rotation" (FAO, 2015).

Definition of drivers
Definitions of land cover classes, land-use classes, and land management practices were found in data standards (e.g., World Census of Agriculture, FAO, 2015), thesauri (e.g., Agrovoc), and the scientific literature collected during the previous step of driver identification. When a definition was lacking in the primary data source, it was collected from thematic glossaries (e.g., IPCC, 2019; "Landmark Glossary"; "WOCAT Glossary").

Classification of land management practices
As there is currently no comprehensive thesaurus for land management practices which directly or indirectly affect SOC dynamics, we classified the single management practices gathered in the previous steps into a hierarchical tree. This hierarchical tree was built from existing classifications of land management practices found in the literature. These classifications usually rely on the manipulation of several components of the agroecosystem which often affect C inputs to and C outputs from soils, such as plant management, water management, or soil tillage management (Supplement 1). We considered, in the hierarchical tree, only single land management practices. Integrated land management practices (e.g., conservation agriculture, organic agriculture) were not included as a whole but described by their single components (e.g., conservation agriculture means no tillage, permanent soil cover, rotation/crop diversification).
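The decomposition of integrated practices into single practices can be illustrated with a minimal sketch. Only the conservation agriculture entry is taken from the text; the dictionary layout and the function name `decompose` are hypothetical and do not reflect how the thesaurus itself is stored.

```python
# Hypothetical lookup table: integrated practice -> single practices.
# Only the "conservation agriculture" entry comes from the text; the
# structure and names are illustrative assumptions.
INTEGRATED_PRACTICES = {
    "conservation agriculture": [
        "no tillage",
        "permanent soil cover",
        "rotation/crop diversification",
    ],
}


def decompose(practice: str) -> list[str]:
    """Return the single practices that describe `practice`.

    A practice that is not a known integrated practice is assumed to be
    a single practice already and is returned on its own.
    """
    key = practice.strip().lower()
    return list(INTEGRATED_PRACTICES.get(key, [key]))


if __name__ == "__main__":
    print(decompose("Conservation agriculture"))
    print(decompose("No tillage"))
```

Describing a dataset with the single components rather than the integrated label keeps each record comparable with studies that tested only one of those components.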

Design and quality control of the thesaurus
From October 2019 to October 2020, participants of the project DATA4C+ (https://www.data4c-plus-project.fr/en, last access: 24 March 2022) carried out the editing phase of the thesaurus. Participants were junior and senior scientists from three French research institutions (i.e., Cirad, INRAE, IRD) that joined their expertise about organic carbon dynamics in temperate and tropical soils. A first version of the thesaurus and classification was shared and discussed among them in October 2020. The consolidation phase was carried out from November 2020 to June 2021. A second version of the thesaurus and classification was shared, discussed, and validated among participants of the project in July 2021. From July 2021 to September 2021, editors of the thesaurus checked its consistency before its first available online version, as presented in this paper (see Fig. 1).

Land management practices
Land management practices were classified in three main categories according to land use: (i) land management practices in annual and perennial croplands, (ii) land management practices in grasslands, and (iii) land management practices in forests and tree plantations. We chose to classify the land management practices inside large categories of land use rather than land cover for several reasons. Land-use categories are well harmonized between different standards (FAO, IPCC, SEEA, World Census of Agriculture; see Gong et al., 2009), whereas the matching of land cover categories between the main standards is less straightforward (see, for instance, Herold et al. (2009) and Yang et al. (2017) for the harmonization of the FAO Land Cover Classification System with other land cover standards). Land-use categories are well suited to greenhouse gas (GHG) balance accounting thanks to the IPCC framework (Bernoux et al., 2010; IPCC, 2006). Furthermore, some management practices may induce a change in land cover without a change in land use, such as plant management practices like agroforestry. Within these categories, several subcategories were created regarding plant, biomass (through grazing and animal management in grasslands, residue management in croplands, biomass fluxes in forests), and amendment management, but also erosion, water, fire, and land clearing management in the case of agroecosystems established after land clearing. These subcategories are mainly inspired by Smith et al. (2020). They rely on management techniques from the point of view of the land managers, which is commonly used in the literature for the classification of land management practices that affect SOC dynamics (Supplement 1). Another classification of land management practices could be specifically based on the mechanisms affecting SOC dynamics, i.e., modification of carbon inputs and/or modification of SOM turnover. However, this approach would be less practical for a non-scientific audience. Furthermore, there are still knowledge gaps regarding the processes involved in SOC sequestration after the establishment of several management practices (Chenu et al., 2019).

The DATA4C+ thesaurus: technology, content, and browsing
The DATA4C+ thesaurus is freely available at http://data4c-plus.net/admin/thesaurus/index (last access: 24 March 2022). The DATA4C+ thesaurus is connected to a PostgreSQL® database. The web interface uses jsPlumbTree, a jQuery plugin that renders a collapsible and expandable tree structure representing the hierarchical relationships between nodes. In addition, the plugin uses the jsPlumb library to draw connection lines between nodes using Bézier curves. The tree is drawn dynamically from left to right and top to bottom when connecting to the database.
Each term of the database is defined by four attributes, as follows:
data-id. Term identifier. It must be unique throughout the tree.
data-parent. Identifier of the parent node.
data-first-child. Identifier of the first child node.
data-next-sibling. Identifier of the next sibling node.
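The four identifiers listed above are enough to rebuild the whole tree: the children of a node are found by following its first-child link and then the chain of next-sibling links. The following is a minimal sketch of that traversal, not the project's implementation. The `Node` class, the `label` field, and the identifiers `n0` to `n3` are hypothetical; the labels reuse the three main land-use categories of the thesaurus.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One term, mirroring the four identifiers (field names are assumptions)."""
    data_id: str
    label: str  # hypothetical display label, not one of the four identifiers
    data_parent: Optional[str] = None
    data_first_child: Optional[str] = None
    data_next_sibling: Optional[str] = None


def children(nodes: dict[str, Node], node_id: str) -> list[str]:
    """List child ids in order: first child, then each next sibling."""
    result = []
    current = nodes[node_id].data_first_child
    while current is not None:
        result.append(current)
        current = nodes[current].data_next_sibling
    return result


def render(nodes: dict[str, Node], node_id: str, depth: int = 0) -> list[str]:
    """Return the subtree rooted at `node_id` as indented text lines."""
    lines = ["  " * depth + nodes[node_id].label]
    for child_id in children(nodes, node_id):
        lines.extend(render(nodes, child_id, depth + 1))
    return lines


# Hypothetical identifiers; labels follow the three main categories.
NODES = {
    n.data_id: n
    for n in [
        Node("n0", "Land management practices", data_first_child="n1"),
        Node("n1", "Annual and perennial croplands", "n0", None, "n2"),
        Node("n2", "Grasslands", "n0", None, "n3"),
        Node("n3", "Forests and tree plantations", "n0"),
    ]
}

if __name__ == "__main__":
    print("\n".join(render(NODES, "n0")))
```

Storing only the first child and the next sibling keeps each record at a fixed size regardless of how many children a term has, at the cost of a linear walk to enumerate them.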
The DATA4C+ thesaurus was developed by Cirad. All the source programs are available on the forge at https://gitlab.cirad.fr/jean-baptiste.laurent/data4c (Laurent and Thevenin, 2022) and can be freely accessed on request under the CC BY-SA 4.0 FR license. To facilitate reuse of the DATA4C+ thesaurus, it can be downloaded in Simple Knowledge Organization System (SKOS) format (W3C, 2009). The DATA4C+ thesaurus is accessible in Agroportal (http://agroportal.lirmm.fr/ontologies/DATA4CPLUS, last access: 24 March 2022) to enhance its findability, accessibility, interoperability, and reusability by scientists in agronomy, forestry, and soil sciences. It may also be used by other end users such as soil test laboratories to describe the soil samples analyzed or by land managers to describe and report their practices (e.g., for carbon farming programs). Additionally, the comma-separated values (CSV) file of the DATA4C+ thesaurus is available on the data repository of Cirad (https://dataverse.cirad.fr, last access: 24 March 2022) under the CC-BY 4.0 FR license with the DOI https://doi.org/10.18167/DVN1/HMCPMF.
The DATA4C+ thesaurus classifies 224 defined terms related to land management practices in agriculture and forestry. It is organized as a hierarchical tree reflecting the drivers of SOC storage. To access the definition of a given term, the user must find the term in the tree and click on it. A pop-up then appears with the definition of the term and the source of the definition (Fig. 2). A link to the source of the definition (URL or DOI) is given for each term. Clicking on this link opens a new web page.
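As an illustration of how a SKOS export might be reused, the sketch below reads preferred labels and broader-term links from a small RDF/XML snippet using only the Python standard library. The snippet is invented for the example: the `http://example.org/data4c/` URIs and the "Tillage" parent concept are hypothetical, and the actual DATA4C+ export may be serialized differently (for instance as Turtle, or with `rdf:Description` elements), in which case a dedicated RDF library would be more appropriate.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

# Invented sample; URIs and the parent concept are assumptions.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/data4c/tillage">
    <skos:prefLabel xml:lang="en">Tillage</skos:prefLabel>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.org/data4c/no-tillage">
    <skos:prefLabel xml:lang="en">No tillage</skos:prefLabel>
    <skos:broader rdf:resource="http://example.org/data4c/tillage"/>
  </skos:Concept>
</rdf:RDF>
"""


def read_concepts(rdf_xml: str) -> dict[str, dict]:
    """Map each concept URI to its preferred label and broader-concept URIs."""
    root = ET.fromstring(rdf_xml)
    concepts = {}
    for concept in root.findall("skos:Concept", NS):
        uri = concept.get(RDF + "about")
        label = concept.findtext("skos:prefLabel", default="", namespaces=NS)
        broader = [
            b.get(RDF + "resource") for b in concept.findall("skos:broader", NS)
        ]
        concepts[uri] = {"label": label, "broader": broader}
    return concepts


if __name__ == "__main__":
    for uri, info in read_concepts(SAMPLE).items():
        print(uri, info)
```

The `skos:broader` links carry the same parent-child hierarchy as the tree shown in the web interface, which is what makes the thesaurus usable by other tools.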

Less subjectivity of land use and management practices will improve reuse of data and quality of meta-analyses
The terms "improved management practice" or "conventional agricultural" are currently used in the scientific literature despite their subjectivity (Sumberg and Giller, 2022). Using such terms implicitly means comparing one practice to another practice and describing the improved actions, which is hardly ever done. The DATA4C+ thesaurus gives a framework to describe the practices. This is vital to produce robust meta-analyses. For instance, the term "improved management of pastures" encompasses diverse agronomic practices (e.g., introduction of leguminous species, switching from mineral to organic fertilizers, no burning for land clearing, reduced grazing intensity). The description of each of these agronomic practices is specific: species names and plant density for the introduction of leguminous species; type, amount, and date of application of fertilizers for the switch from mineral to organic fertilizers; and amount of biomass left on site for no burning for land clearing. Moreover, their impacts on SOC stocks differ widely, as highlighted by Maia et al. (2009), Conant et al. (2017), or Fujisaki et al. (2018).

More genericity in the description of management practices will improve reuse of data and quality of meta-analyses
The DATA4C+ thesaurus intends to facilitate data sharing for the evaluation of soil carbon storage through land management practices, thanks to the genericity of the proposed terms. We evaluated the DATA4C+ thesaurus against land management practices used in several meta-analyses (Table 1). In many situations, the terms used in the meta-analyses match the terms used in the thesaurus adequately. However, some studies use levels of detail not covered by the thesaurus, such as the species family of plants sown in the fields (Bai et al., 2019), or several tillage techniques (Jian et al., 2020), that can be grouped into the larger categories used in the thesaurus (intermediate-intensity tillage or high-intensity tillage). These very detailed levels were not covered in the thesaurus because their effect on SOC dynamics has not yet been sufficiently evaluated. Indeed, the effect of soil tillage on soil carbon storage is still debated by soil scientists (Chenu et al., 2019), and the use of numerous categories of tillage practices may weaken the significance of the observed trends. In the thesaurus, we used classes of tillage intensity based on the study of Haddaway et al. (2021), which distinguished high-intensity tillage from intermediate-intensity tillage depending on whether the soil is inverted during tillage and on the tillage depth. This offers, in our opinion, transparent criteria to characterize tillage intensity.
On the other hand, several studies use broader categories than in the present thesaurus, which may prevent reuse of the dataset. This is the case for land management practices in grasslands studied by Conant et al. (2017), where categories such as "grazing" and "fire" are not further detailed, despite the wide response range of soil carbon stocks according to the intensity of grazing, for instance (Abdalla et al., 2018).
Concerning meta-analyses of SOC, Beillouin et al. (2022) identified issues of low transparency, reproducibility, and updatability. Improving the quality and reliability of synthesis papers is of utmost importance, as they are increasingly used to inform policy decisions with possibly large environmental and socioeconomic implications (Krupnik et al., 2019). Nosek et al. (2015) noted that advances must be made to give full and unbiased access to scientific data in line with open science practices. In that perspective, the transparency and the genericity of the terms defined in the DATA4C+ thesaurus, mostly inventoried in original papers and technical and institutional reports, will contribute to increasing the quality of data and ultimately to merging and analyzing data from various sources.
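The matching in Table 1 expresses some practices from meta-analyses as combinations of thesaurus terms joined by AND and OR. A minimal sketch of how such a combination could be checked against a dataset record is given below. It assumes that the OR-joined terms form a group and that the groups are joined by AND, which is our reading of the table rather than a stated rule. The names `MATCHING`, `satisfies`, and `matching_practices` are hypothetical.

```python
# Each expression is a list of OR-groups that must all be satisfied (AND).
# The two entries follow Table 1; the grouping of OR before AND is an
# assumption, as are the lower-cased term spellings.
MATCHING = {
    "Straw retention and application of chemical fertilizers": [
        {"mulched residues", "shredded residues", "buried residues"},
        {"mineral fertilization"},
    ],
    "Application of manure and chemical fertilizers": [
        {"solid manure", "liquid manure"},
        {"mineral fertilization"},
    ],
}


def satisfies(record_terms: set[str], expression: list[set[str]]) -> bool:
    """True if the record has at least one term from every OR-group."""
    return all(group & record_terms for group in expression)


def matching_practices(record_terms: set[str]) -> list[str]:
    """Return the meta-analysis practices that a record's terms satisfy."""
    terms = {t.strip().lower() for t in record_terms}
    return sorted(
        name for name, expr in MATCHING.items() if satisfies(terms, expr)
    )


if __name__ == "__main__":
    print(matching_practices({"Liquid manure", "Mineral fertilization"}))
```

A record described with thesaurus terms can thus be mapped back to the broader practice labels used in earlier meta-analyses, whereas the reverse mapping is not possible without the detailed terms.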

Future development of the DATA4C+ thesaurus: uses and accrual
The DATA4C+ thesaurus is expected to be used by scientists in agronomy, forestry, and soil sciences, with the aim of uniformizing the description of practices influencing SOC in their original research. As it was developed to be simple and easy to use, the thesaurus may also be used by several end users, like land managers (e.g., to report their practices for carbon farming), or by laboratories to describe the soil samples analyzed (e.g., metadata on the sample). The generated data will therefore be easier to retrieve and integrate, in particular to perform meta-analyses. Another perspective will be to mobilize the DATA4C+ thesaurus to feed models on SOC dynamics with more site-specific data. However, such a perspective would need to enrich the DATA4C+ thesaurus with vocabulary related to annual carbon inputs to enhance carbon inputs to soil (e.g., Bolinder et al., 2007). Accrual of the DATA4C+ thesaurus could also be focused on emerging practices and empirical farmers' practices, which are poorly studied by researchers. Promotion and peer reviewing of the updated versions of the DATA4C+ thesaurus will be performed by the Scientific and Technical Committee of the 4 per 1000 Initiative (https://4p1000.org/, last access: 24 March 2022). Versioning of the DATA4C+ thesaurus will be done at http://data4c-plus.net/admin/thesaurus/index (last access: 24 March 2022), in Agroportal (http://agroportal.lirmm.fr/ontologies/DATA4CPLUS, last access: 24 March 2022), and on the data repository of Cirad (https://doi.org/10.18167/DVN1/HMCPMF). Suggestions of accrual can be sent to the corresponding author or to the following email address: data4c@cirad.fr.

Conclusions
The DATA4C+ thesaurus is the first attempt to compile and classify the land use and management practices in agriculture and forestry that influence SOC storage. Future uses of the DATA4C+ thesaurus will be crucial to improve and enrich it, but also to raise the quality of meta-analyses on SOC, and ultimately help policymakers to identify efficient agricultural and forest management practices to improve SOC storage. In that sense, the DATA4C+ thesaurus is a contribution to SDG 17 "Partnerships for the goals" (i.e., goals 17.6 and 17.7).
Financial support. Support was provided by the French National Research Agency (ANR, https://anr.fr/, last access: 24 March 2022) through the project DATA4C+ "Towards interoperability of databases to be more FAIR" (https://www.data4c-plus-project.fr/en, last access: 24 March 2022) (grant no. ANR-19-DATA-0005-01). The DATA4C+ project is led by the CIRAD-INRAE-IRD consortium. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the paper.
Review statement. This paper was edited by Jeanette Whitaker and reviewed by Jonathan Sanderman and one anonymous referee.

Figure 1. Summary of the different steps to build the DATA4C+ thesaurus.
Table 1. Matching evaluation of land management practices assessed in meta-analyses against land management practices in the DATA4C+ thesaurus.
Meta-analysis | Category | Practice assessed in the meta-analysis | Matching in the DATA4C+ thesaurus
[...] (2016) | Crop fertilization | Unbalanced application of chemical fertilizers | Partially covered: mineral fertilization practice is included but not the appreciation of balanced vs. unbalanced application
[...] (2016) | Crop fertilization | Balanced chemical fertilization | Partially covered: mineral fertilization practice is included but not the appreciation of balanced vs. unbalanced application
[...] (2016) | Crop fertilization | Straw retention and application of chemical fertilizers | Mulched residues OR Shredded residues OR Buried residues AND Mineral fertilization
[...] (2016) | Crop fertilization | Application of manure and chemical fertilizers | Solid manure OR liquid manure AND mineral fertilization
Jian et al. (2020) | Tillage group | Disk tillage | High- or intermediate-intensity tillage depending on the depth
Jian et al. (2020) | Tillage group | Sweep | High- or intermediate-intensity tillage depending on the depth
Jian et al. (2020) | Tillage group | Tandem disk | High- or intermediate-intensity tillage depending on the [...]
[...] | [...] | [...] | [...] the thesaurus
[...] | [...] | Organic farm with cover crop as green manure | Organic agriculture AND cover crop
[...] | [...] | Organic farm with no tillage | Organic agriculture AND no till

https://doi.org/10.5194/soil-9-89-2023 SOIL, 9, 89-100, 2023