Quality Assessment of Meta-Analyses on Soil Organic Carbon

Soil organic carbon (SOC) plays a vital role in the global carbon cycle and is a potential sink for carbon dioxide. Agricultural management practices can support carbon sequestration and therefore offer potential removal strategies, whilst improving overall soil quality. Meta-analysis summarizes results from primary articles by calculating an overall effect size and reveals the sources of variation across studies. The number of meta-analyses published in the field of agriculture is continuously rising. At the same time, more and more articles refer to their synthesis work as a meta-analysis, despite applying less than rigorous methodologies. As a result, poor quality meta-analyses are published, which may lead to questionable conclusions and recommendations to scientists, policymakers and farmers.

developed in a particular area, and which is then applied on different sites (Schillaci et al., 2021) can diminish the precision of final results.

Available guidelines and their applicability
So far, there are no collaborations or guidelines for publishing systematic reviews or meta-analyses on agricultural or soil issues. In contrast, healthcare (The Cochrane Collaboration) and social sciences (The Campbell Collaboration) established such collaborative networks to develop high quality reviews already in the 1990s (Gurevitch et al., 2018; Collaboration for Environmental Evidence, 2018). These collaborations focus on specific disciplines, and some of their tools, such as trainings or the Cochrane Handbook for Systematic Reviews of Interventions, are partly applicable to agricultural and soil research (Table S1). Moreover, other voluntary guidelines are available, which aim to support researchers in, e.g., reporting or producing meta-analyses. Checklists for evaluating social science research synthesis (Cooper et al., 2019a) or evidence-based minimum item sets for reporting in systematic reviews and meta-analyses, such as PRISMA (Page et al., 2021), support synthesis consumers and authors. PRISMA-EcoEvo is a PRISMA extension for syntheses in ecology and evolutionary biology, which can be used for reporting, planning, registration and reviewing (O'Dea et al., 2021).
Moreover, for meta-analyses in ecology, a checklist of quality criteria is available (Koricheva and Gurevitch, 2014). The Collaboration for Environmental Evidence (CEE) provides guidelines and standards for evidence synthesis in environmental management, which can be used for conducting, commissioning or using the findings of systematic reviews and systematic maps in environmental management. Further, reporting standards (ROSES), a checklist for appraisal of confidence of evidence reviews (CEESAT) and free-to-access online training courses are offered by CEE. The collaboration even brought forth "Environmental Evidence", a journal facilitating the publication of evidence synthesis in environmental management (https://environmentalevidencejournal.biomedcentral.com/). Lastly, reviews by Philibert et al. (2012), Beillouin et al. (2019) and Krupnik et al. (2019) assessed the quality of agronomic meta-analyses or compared different meta-analytical methods with the help of quality criteria. However, they are formulated rather generally.
Although all these guidelines are available, they each use different criteria which are sometimes not reported exhaustively (Koricheva and Gurevitch, 2014), making it difficult to apply them across disciplines (Nakagawa and Cuthill, 2007; Lortie et al., 2015), for example for the quality assessment of meta-analyses in agricultural and soil sciences.
Additionally, as mentioned above, soil and agricultural scientists encounter specific issues, different from those in ecology or medicine, when aiming to synthesize research outcomes meta-analytically. The guidelines and standards for evidence synthesis in environmental management and the CEESAT checklist by CEE clearly benefit scientists and other consumers of soil and agricultural meta-analyses, but they mainly focus on systematic reviews and maps and contain elements not mandatory in meta-analysis, e.g., registration, gathering a maximum of available relevant literature or performing critical appraisal. Moreover, the guideline is exhaustive and demands time and effort from inexperienced readers. Many who are not aiming to become experts in the method themselves might not find the time for such elaborate reading.

Why we need meta-analytical guidelines in agricultural and soil research
The contribution of agriculture to global anthropogenic greenhouse gas (GHG) emissions (Tubiello et al., 2015) and the possibilities of sequestering carbon in the form of SOC through improved soil management (Smith, 2012; Paustian et al., 2016; Smith et al., 2005) are topics that have occupied soil and agricultural researchers over the last decades. Since 2000, the number of articles published on SOC has increased yearly (Fig. 1), as climate change pushes the scientific community to search for mitigation and adaptation opportunities in numerous ways, such as through agronomic practices.
Carbon sequestration in soils has gained increased resonance on the EU political agenda (EU Green Deal, Farm to Fork Strategy, EU Soil Strategy for 2030), especially since the launch of the "4 per mille initiative - Soils for Food Security and Climate" at COP21 and the publication of the global potentials of this initiative (Minasny et al., 2017).
Simultaneously, the number of meta-analyses published in the field of agriculture is continuously rising. We searched the Web of Science Core Collection for all available entries on "meta-analysis AND agriculture" since the year 2000 (Fig. 1, search conducted January 13th, 2022). Between 2000 and 2010, there was little change in the number of meta-analyses published; a steady rise can only be seen since 2010. The increasing amount of available information, not only in agriculture and SOC research but across all scientific fields, is creating the need to synthesize data into a form which is easier to comprehend and allows the detection of overarching patterns (Culina et al., 2018). Unfortunately, as a consequence of the rising popularity of this method, more and more publications refer to their synthesis work as meta-analyses, despite applying less than rigorous methodologies. Often, the term is misapplied to publications synthesizing information from primary studies, regardless of the methodologies used (Gurevitch et al., 2018). In fact, only studies using well-established statistical procedures (most importantly suitable effect-size calculation, correct study weighting by the inverse of variance, analysis of possible heterogeneity and appropriate statistical models which account for the structure of the meta-analytical data) should use the term "meta-analysis" to describe their synthesis method (Vetter et al., 2013; Gurevitch et al., 2018). When applying "non-standard metrics", that is, methods other than effect sizes as defined by Borenstein et al. (2009), to quantitatively synthesize primary studies, articles should not be called a "meta-analysis" or claim that "effect sizes" were calculated, as these terms are specific to the meta-analytical methodology (Borenstein et al., 2009; Koricheva and Gurevitch, 2014; Cooper et al., 2019c). It is important to promote this clear definition to allow the distinction between a "true" meta-analysis and other forms of synthesis work, such as correlation analyses or analyses through machine learning.
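The defining steps named above (effect sizes, weighting by the inverse of variance, and a check for heterogeneity) can be illustrated with a minimal Python sketch. It assumes a simple fixed-effect model and uses invented effect sizes and variances; the function name `pool_fixed_effect` is hypothetical, and real syntheses would usually rely on dedicated software and models that reflect the data structure.

```python
import math

def pool_fixed_effect(effects, variances):
    """Inverse-variance weighted mean effect with Cochran's Q and I^2 (in %)."""
    weights = [1.0 / v for v in variances]          # weight = 1 / variance
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_w
    se = math.sqrt(1.0 / total_w)                   # standard error of the pooled effect
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, se, q, i2

# Invented example data: three studies with effect sizes and their variances.
effects = [0.10, 0.20, 0.30]
variances = [0.01, 0.04, 0.01]
pooled, se, q, i2 = pool_fixed_effect(effects, variances)
print(f"pooled={pooled:.3f} se={se:.3f} Q={q:.2f} I2={i2:.1f}%")
```

Studies with smaller variances receive larger weights, which is why missing standard deviations make correct weighting impossible.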
The previously mentioned reviews by Philibert et al. (2012) and Krupnik et al. (2019), who analyzed the quality of meta-analyses in agronomy, found that the overall quality of meta-analyses in this field is low. Philibert et al. (2012) concluded that more than half of the publications in the searched databases mentioned meta-analysis as a method but did not carry the method out. Further issues regarding effect size metrics, weighting, and heterogeneity analysis were found.
The more recent review by Krupnik et al. (2019), which analyzed meta-analyses studying the effects of conservation and organic agriculture on yield, also reported deficiencies in heterogeneity testing and weighting. Similarly, Beillouin et al. (2019), who studied meta-analyses on crop diversification, found issues with weighting, sensitivity analysis and database presentation. These results imply that the methodology applied in agronomic meta-analyses is variable and often does not follow standard metrics. The authors of the reviews concluded that there is a need for improvement of meta-analyses in agronomy.
Finally, it is a misconception that a high number of citations always equals quality (Aksnes et al., 2019; Leydesdorff et al., 2016). Koricheva and Gurevitch (2014) found that even in high-impact journals, cases of incorrect usage of the term "meta-analysis" can be encountered. This suggests that not only authors but also peer reviewers and journal editors occasionally misunderstand the rules under which a meta-analysis must be conducted. O'Leary et al. (2016) analyzed the effects of journal impact factor on review quality and concluded that a high impact factor does not guarantee high quality of reviews; they therefore did not recommend using impact factor as a proxy for review quality.
All this provides reason to assume that core criteria, necessary in conducting meta-analyses, are not clear to many researchers in the field of agricultural and soil sciences. As a result, poor quality meta-analyses are published, which might report questionable conclusions and recommendations to other scientists, policymakers and farmers. Moreover, the interest in SOC sequestration and subsequent increase in related publications raises the question whether there are meta-analyses synthesising this knowledge. If so, does their quality show similar trends to agricultural meta-analyses reviewed in the past by Philibert et al. (2012), Beillouin et al. (2019) and Krupnik et al. (2019)?

Objectives
This study aims to quantitatively analyze 31 meta-analyses, studying the effects of different management practices on SOC, relevant for European cropland, published between 2005 and 2020. We compiled a quality criteria-set suitable for soil and agricultural sciences by adapting existing meta-analytical guidelines from other disciplines. The set is supported by a scoring scheme, which allows a quantitative analysis. A subsequent evaluation of the management practices studied in these SOC meta-analyses gives information on which agricultural operations require more or improved research. Finally, the aim was to demonstrate how to conduct a quick assessment of meta-analyses relevant for decision making. We chose a chapter of the IPCC "Special Report - Climate Change and Land" (Jia et al., 2019) and analyzed the quality of cited meta-analyses by using the most critical criteria of the compiled criteria-set.

Quality criteria-set
The quality criteria-set is based on the previous work of many experienced researchers with expert knowledge on meta-analysis (Table S1). The "Checklist of quality criteria for meta-analysis for research synthesis, peer reviewers and editors" by Koricheva and Gurevitch (2014) was used as a basis for the composition of the 17 quality criteria (Table 1). Their checklist is also built upon the previous efforts of other scientists who established quality criteria-sets in the fields of ecology, environmental management, conservation biology and agronomy. Other literature, such as "Introduction to Meta-Analysis" by Borenstein et al. (2009), "Handbook to Meta-analysis in Ecology and Evolution" by Koricheva, Gurevitch and Mengersen (2013), and "Handbook of Research Synthesis and Meta-Analysis" by Cooper, Hedges and Valentine (2019c), further supported the criteria construction and acted as sources for in-depth explanation of those criteria, providing the reader with additional information (Table S2).
The 17 quality criteria were structured according to three groups: "Literature Search and Inclusion / Exclusion Criteria", "Meta-analysis", and "Results and Database Presentation". Additionally, the "quality criteria" were further divided into "sub-criteria" to provide more detailed guidance. Each quality criterium or, if available, sub-criterium was specified with the help of the column "Is criterium applied in meta-analysis (to which extent)", which offers the reader possible options, based on the availability of data or items within the analyzed meta-analysis. Each option ends with a numerical "Score", which indicates its quality. All individual scores can be summed into a total score with a maximum of 30; the higher the total score, the better the overall quality of the meta-analysis. Furthermore, the quality and sub-criteria were specified in the column "Description" to provide the reader with more detailed information. The final column offers references to relevant literature, supporting the authors' decisions on criteria formulation and scoring. In the supplementary material (Table S2) an extended version of this column can be found, where direct quotes of cited experts are provided.
Of these 17 quality criteria, we defined three as so-called "cut-off" criteria (criteria 6-8 in Table 1), namely "Effect size", "Standard deviation extracted (or computed from statistics)" and "Studies weighted by 1/variance". When these criteria are not fulfilled by a meta-analysis, the most essential and relevant steps in this specific synthesis method are not met. These "cut-off" criteria aim to help consumers of soil and agricultural meta-analyses to identify the defining elements of the article and judge whether it is a "true" meta-analysis or not. As we wanted to highlight criterium eight "Studies weighted by 1/variance" and credit meta-analyses which did weight all studies correctly, we attributed a maximum obtainable score of four to this criterium.
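The scoring logic described above can be sketched in a few lines of Python. This is only an illustration: the function `assess` is hypothetical, the per-criterium scores are invented, and the full marks assumed for criteria 6 and 7 are placeholders (only the maximum of four for criterium 8 and the overall maximum of 30 are stated in the text; Table 1 holds the actual values).

```python
CUT_OFF = (6, 7, 8)   # effect size, SDs extracted, weighting by 1/variance
MAX_TOTAL = 30        # maximum total score stated in the text

def assess(scores, full_marks):
    """Return (total score, True if all cut-off criteria are fully met).

    scores:     {criterium number: awarded score}
    full_marks: {cut-off criterium number: score meaning 'fully met'}
    """
    total = sum(scores.values())
    if total > MAX_TOTAL:
        raise ValueError("total score exceeds the maximum of 30")
    is_true_meta_analysis = all(scores.get(c, 0) >= full_marks[c] for c in CUT_OFF)
    return total, is_true_meta_analysis

# Assumed full marks: 4 for criterium 8 is from the text, 2 for criteria 6 and 7 is a placeholder.
full_marks = {6: 2, 7: 2, 8: 4}

# Invented scores for two hypothetical articles.
partial = assess({1: 2, 6: 2, 7: 1, 8: 1}, full_marks)
complete = assess({1: 1, 6: 2, 7: 2, 8: 4}, full_marks)
print(partial, complete)
```

Keeping the cut-off check separate from the total makes explicit that a high total score alone does not show whether the defining steps were met.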

Inclusion criteria, exclusion criteria and search strategy
First, inclusion (IC) and exclusion criteria (EC) were defined to create a framework for the literature screening (Table 2).
Studies were included when they (IC1) used the term "meta-analysis" in their title, abstract or author keywords. (IC2) Land uses included were arable land or cropland, also in combination with others such as agroforestry or grassland. (IC3) The assessment of the effects of one or several management practices on SOC needed to be the aim of the study. Moreover, (IC7) European experiments needed to be a part of the (global) meta-analyses, as we wanted to collect and evaluate syntheses relevant for Europe. Articles were excluded when, for example, modelling was used to obtain SOC results (EC1). Articles were only included when they fulfilled all seven inclusion criteria.

soil AND (agriculture OR management) AND (SOC OR OC OR "soil organic carbon" OR "organic carbon").

In total, 552 articles were found (344 in Web of Science and 208 in Scopus); automatic (conducted with Mendeley and JabRef software) and manual duplicate removal reduced the results by 167 articles (Fig. 2). The results were compared with the meta-analyses identified by Bolinder et al. (2020), who synthesized meta-analyses studying the effects of several management practices on SOC changes in agroecosystems. This led to the identification of one further study which complied with our inclusion and exclusion criteria (Table 2) and therefore was included in our evaluation. The resulting 386 articles were exported into Excel and screened by title, abstract and full text according to the pre-defined inclusion and exclusion criteria. In total, 31 meta-analyses relevant for the scope of our study were found. Many articles were excluded because they did not contain the word "meta-analysis" in their title, abstract or keywords, SOC was not the response variable of interest, or the studies investigated did not include European sites. Figure 2 shows a flow diagram of the complete screening process. The full information on the literature gathering, all 386 retrieved articles plus the screening decisions, can be found in the supplementary material (Table S3 and S4, respectively). The complete reference list of the 31 meta-analyses can be found in the appendix (Table A1).

Quality analysis
The 31 retrieved meta-analyses were evaluated by two authors for their quality according to the quality criteria-set in Table 1. Each article was read thoroughly to ascertain whether certain criteria were fulfilled or not. Total scores for each meta-analysis were calculated, with a maximum reachable score of 30. The complete dataset containing the scores for each meta-analysis and all calculations can be found in the supplementary material (Table S2, S5). SigmaPlot version 14.5 and Microsoft Excel version 1808 were used for plotting of figures and tables and for calculations.

Management categories
The retrieved data also offered the possibility to analyze the "state of knowledge" on meta-analyses studying management effects on SOC. The aim was to assess how many meta-analyses were conducted on a certain management practice and whether their quality was sufficient to stop the production of new meta-analyses on the respective practices. This information will aid future research by guiding it towards knowledge needs and avoiding redundant work. We therefore grouped the meta-analyses according to the management practices they studied. Eleven management categories were formed and are described in Table 3. These categories aim to structure the collected SOC meta-analyses and allow a simplified investigation. As some meta-analyses studied the effects of more than one practice, they were added to all respective categories.

Table 3 (rows 10 and 11):
10. High input system: system that aims at increasing carbon by e.g., irrigation, winter crops, etc., according to IPCC (1997); Ogle (2005)
11. Set-aside: effect of setting aside land from crop production and planting trees or grasses; Ogle (2005)

Finally, the total number of articles per category was calculated and the meta-analyses with the highest scores were identified.
Simultaneously, information on treatment and control, the geographical scale and soil depth was extracted. As the overall score does not give information on whether the "cut-off" criteria were fulfilled, we extracted this information as well. We presented the overall effect sizes of the meta-analyses only when both these elements were fulfilled.
Overall treatment effects on SOC are shown as percentage change from the control; when results were displayed as log response ratio (LnR), we calculated percentages with Eq. (1):
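A minimal sketch of this back-transformation, assuming Eq. (1) takes the standard form for log response ratios, percentage change = (e^LnR - 1) x 100. The function name `percent_change` and the example values are illustrative only.

```python
import math

def percent_change(ln_r):
    """Convert a log response ratio into percentage change from the control.

    Assumes the standard back-transformation (exp(LnR) - 1) * 100.
    """
    return (math.exp(ln_r) - 1.0) * 100.0

# Illustrative values: no effect, and a treatment mean 10% above the control.
no_effect = percent_change(0.0)
ten_percent = percent_change(math.log(1.10))
print(no_effect, round(ten_percent, 6))
```

Because the transformation is non-linear, confidence limits should be back-transformed individually rather than by transforming the interval width.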

Quick assessment of meta-analyses relevant for policy making -An example
To provide readers with an example of the impacts of meta-analytical quality on policy and decision making, we screened Chapter 2, "Land-climate interactions", of the Intergovernmental Panel on Climate Change (IPCC) "Special Report - Climate Change and Land" (Jia et al., 2019) for cited articles which used the term "meta-analysis" in the title. We chose this report by the IPCC because its outputs are highly relevant for combating the global climate crisis and are often the basis of policy-making (IPCC, 2019), and because this exact chapter is deeply connected to the contents of this review. In total, 16 articles were retrieved and checked against the "cut-off" criteria of the quality criteria-set (Table S6).

Results
The investigation of the 31 meta-analyses studying management effects on SOC, published between 1990 and 2020, found that Ogle et al. (2005) published the first article on this topic. Nevertheless, the synthesis did not qualify as a formal meta-analysis, as no effect size was calculated. The first formal meta-analysis on SOC was published by Luo et al. (2010), who looked at the effect of no-till versus conventional tillage. Overall, the number of SOC meta-analyses published between 2005 and 2020 increased over time (Fig. 3A). Scores, which were calculated based on the fulfillment of the quality criteria, also rose over the 15-year period and correlated significantly with the publication year (y = -1993.9 + 0.9954x).

Figure 3 (caption fragment): see Table A1 and Table S2. The dashed line indicates the maximum score of 30.

Literature search and inclusion / exclusion criteria
The 17 quality criteria are clustered into three groups (Table 1). The first one, "Literature search and inclusion / exclusion criteria", consists of five quality criteria; the first criterium, "Literature Search", was satisfied by more than half of the meta-analyses (Fig. 4). In nearly a quarter of the analyses, authors checked the reference lists of other existing meta-analyses and reviews for available literature. Therefore, the usefulness of this method seems to be widely underestimated.
By comparing retrieved literature to other existing publications, we can not only gain confidence in our search strategy, but also encounter information which might be difficult to find otherwise (e.g., grey literature).
Inclusion and exclusion criteria, as well as a description of treatment and control, were presented by almost all meta-analyses (we only analyzed whether treatment and control were described, not whether they were comparable across included studies). Moderators were described by over half of the SOC meta-analyses. A description of moderators, including their range (for continuous explanatory variables) or groups (for categorical explanatory variables), is necessary to present the way in which moderator analysis will be conducted. Results for the sub-criteria can be found in the supplementary material (Table S5).

Meta-analysis
The "Meta-analysis" group consisted of nine quality criteria (Table 1), which were satisfied by the SOC meta-analyses to varying extents. Effect sizes were calculated according to standard metrics by 74% of meta-analyses (Fig. 5A). Almost half of the meta-analyses used the log response ratio for effect size calculation and about a third applied raw mean difference or standardized mean difference. Standard deviations (SDs) were extracted (or computed from available statistics) from all primary studies by 16% of meta-analyses and partly by 42% (Fig. 5B); in the latter, SDs were correctly extracted for some studies but ignored or roughly estimated for the rest, e.g., by calculating the mean SD from available SDs or reassigning it as 1/10 of the mean.
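Computing an SD "from available statistics" usually means converting a reported standard error or confidence interval. The sketch below shows two common conversions; which statistics the assessed meta-analyses actually used is not stated here, so the function names and example values are assumptions. The confidence-interval version uses the normal quantile 1.96, which is only an approximation for small samples, where a t quantile would be more appropriate.

```python
import math

def sd_from_se(se, n):
    """SD of a group from its standard error of the mean and sample size."""
    return se * math.sqrt(n)

def sd_from_ci(lower, upper, n, z=1.96):
    """SD of a group from a confidence interval around its mean.

    z=1.96 assumes a 95% interval and a normal approximation.
    """
    se = (upper - lower) / (2.0 * z)
    return sd_from_se(se, n)

# Illustrative values: SE of 0.5 with four replicates, and a 95% CI of [8.04, 11.96].
sd_a = sd_from_se(0.5, 4)
sd_b = sd_from_ci(8.04, 11.96, 4)
print(sd_a, round(sd_b, 6))
```

Conversions like these keep a study usable for inverse-variance weighting, in contrast to the rough estimates described above.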
Weighting each study by 1/variance was done by 13% of meta-analyses (Fig. 5C). Nineteen percent of SOC meta-analyses weighted only some studies by the inverse of variance, as they only extracted or computed SDs from some studies (and therefore received a score of "1" for criterium 7; for a detailed description of the criterium for weighting, see quality criterium number eight in Table 1). Accordingly, weighting was not done in over two thirds of analyses. We classified these three criteria (effect size estimate, SDs extracted and weighting by 1/variance) as "cut-off" criteria (6-8 in Table 1). When these are not fulfilled, a meta-analysis does not qualify as such. In our quality assessment, we acknowledged when authors partially weighted by the inverse of variance (as they only partially extracted SDs) with one point for each. Nevertheless, we urge authors to extract SDs for each study (or compute them from available statistics) and to weight studies by the inverse of variance in order to conduct a high-quality meta-analysis. In Figures 6 and 7, satisfaction of criteria 9 to 14 and 15 to 17 (respectively) is displayed in the form of stacked bars, which show the percentage of meta-analyses that fulfilled the "cut-off" criteria (n = 4) and of those that did not (n = 27; a total of 31). In the following, we describe only the results for all 31 SOC meta-analyses. For the individual results, please refer to the figures. Corresponding data used for the calculation of these results can be found in the supplementary material (Table S7).
Subgroup analysis and meta-regression, which identify the source of variation between studies, were performed by almost half of the meta-analyses (Fig. 6). Models applied and software used were reported more frequently. Only about 25% of meta-analyses accounted for non-independence of effect sizes, while the rest failed to do so. Bulk density was measured in 35% of meta-analyses; the other 65% used pedotransfer functions to estimate this parameter, thereby introducing a source of uncertainty in SOC stock estimation. Lastly, sensitivity analysis of the meta-analytical results was rarely done.

Results and database presentation
Figure 7 shows the results for the group "Results and database presentation". Almost half of the meta-analyses displayed their results in the form of figures or tables. Summarized effect sizes and confidence intervals or moderator analysis were presented graphically or in tabular form by 65% and 68% of meta-analyses, respectively. Forest plots were presented by 6% of meta-analyses. Meta-data was presented in over two thirds of analyses, whereas a full database was made available to the readers in 13% and partly in 3% of cases (for further explanation see criterium 17 in Table 1). Information on the calculation of these results can be found in the supplementary material (Table S7).

Overarching findings
When looking at the overall results across the three quality criteria groups, quality varied greatly between the 31 analyses, with a maximum score of 29, a minimum score of 2 and a median of 14. Haddaway et al. (2017) produced a meta-analysis of high quality which received the highest score according to our assessment. However, they used raw mean difference to calculate effect sizes, which may not be the most suitable metric for meta-analyses in the soil and agricultural field. In Sect. 4.2 "Meta-analysis" we go into more detail on this issue. There were seven meta-analyses with scores up to five; the majority achieved scores between five and 15. Ten meta-analyses reached scores between 15 and 20, whereas only three reached a score above 20. Only four out of 31 meta-analyses are "true" meta-analyses, as they used standard metrics for effect size calculation and weighted all studies by the inverse of the variance (Fig. 8).

Analyzing management categories
Management practices studied in the meta-analyses were counted in order to assess their incidence. We found that almost half of the 31 meta-analyses studied the effects of tillage on SOC (in some cases besides other management practices) (Table 4). Other practices studied frequently were "organic agriculture" and "cover crop cultivation" (6 times each). Data on "crop residue", "fertilization", "amendments", "biochar" and "diversification" were synthesized less often. The effects of "combined practices", "high input" and "setting aside" on SOC were each assessed once. We found that meta-analyses which passed the "cut-off" criteria are available for four out of the 11 management categories (tillage, cover crop, crop residue, amendment). For tillage, we decided to show the three meta-analyses with the best scores (Bai et al., 2019; Haddaway et al., 2017; Li et al., 2020), as several analyses of above-average quality were available. Nevertheless, only Haddaway et al. (2017) fulfilled the criteria for effect size calculation, SDs and weighting, whilst also achieving an overall high score, and is therefore the one publication providing a high-quality meta-analysis on the effects of management practices on SOC. In the categories "organic", "fertilization", "biochar", "diversification", "combined", "high input" and "set-aside", no meta-analyses conducted according to the standards are currently available. In the last column of Table 4, overall effect sizes for SOC can be found. As Haddaway et al. (2017) calculated effect sizes by raw mean difference, it was not possible to transform their results from stocks into percentages. For the five management categories where no meta-analysis weighted studies by the inverse of variance ("fertilization", "diversification", "combined", "high input system" and "set-aside"), overall effect sizes for SOC change are not displayed. When looking at the retrieved data on SOC changes per management category (Table 4), it is apparent that the largest increases of SOC compared to the controls were achieved in the categories "organic", "cover crop", "amendments" and "biochar".

Our quick analysis of the IPCC special report (Jia et al., 2019) found that out of 16 articles, more than 50% did not qualify as "true" meta-analyses, as five did not calculate effect sizes according to standard metrics and four failed to extract SDs and to weight by the inverse of variance. Seven articles did in fact conduct meta-analysis correctly. Six of these used the log response ratio to calculate effect sizes and one used the standardized mean difference. These seven meta-analyses extracted SDs for each study and weighted by the inverse of variance. Calculations and references of all 16 analyzed articles can be found in Table S6.

Discussion
Previous guidelines and expert knowledge on meta-analysis from other disciplines were adapted to construct an easy-to-use criteria-set for the quantitative quality assessment of meta-analyses in soil and agricultural research. With the help of these criteria, we analyzed 31 meta-analyses studying the effects of different management practices on SOC. Moreover, the retrieved meta-analyses were structured according to 11 categories of agricultural management practices, which allowed us to assess and analyze the state of knowledge on these categories. Hence, recommendations for future meta-analytical research and general improvement of applied methodology can be given. We found major deficiencies in the reporting of literature searches, application of standard metrics for effect size calculation, correct weighting by the inverse of variance, extraction of independent effect sizes and database presentation. The quality of meta-analyses rose over time (15-year period) and correlated significantly with publication year (R² = 0.382). Similar trends were observed in quality assessments of meta-analyses in the medical (Jamshidi et al., 2018) and environmental (Beillouin et al., 2019) fields.
In the following, we compare the results of the quality assessment of meta-analyses on SOC with the findings of four quality assessments of meta-analyses and quantitative reviews in agronomy and ecology. We included the study by Philibert et al. (2012), focusing on agri-environment and biodiversity, the review of Krupnik et al. (2019), looking at conservation and organic agriculture, the study by Beillouin et al. (2019), studying crop diversification, and the excellent evaluation of meta-analyses in plant ecology by Koricheva and Gurevitch (2014). To simplify the discussion, not all information for the 17 quality criteria was extracted from the reviews. Instead, we selected quality criteria to be discussed according to (1) the information available in most of the reviews, which allowed a comparison of results, and (2) relevance (e.g., effect size metrics), as certain quality criteria are more important than others.

Literature search and inclusion and exclusion criteria
The comparison of reviews for the criterium "Literature search reported" showed that our study found higher compliance (53%) with this criterium than those of Philibert et al. (2012) or Koricheva and Gurevitch (2014) (Table 5). Beillouin et al. (2019) reported that 46% of meta-analyses presented the search string and 86% the eligibility criteria. Krupnik et al. (2019) found that all analyzed meta-analyses presented the literature search sufficiently. This high compliance may be caused by the small number of studies (n = 17) or the definition of less demanding criteria by the authors.
A quality criterium, which is of special significance to the soil and agricultural field, is the inclusion of grey literature.
Here, exceptionally large amounts of data are available, as governmental research activities are not focused on publishing results in scientific journals. Therefore, although the inclusion of grey literature is not compulsory, it is highly encouraged (Culina et al., 2018). When conducting a meta-analysis on an international or global scale, analysts will find that grey literature is often available in national languages only, which complicates and restricts its inclusion. Nevertheless, the most essential part of searching for literature, whether scientific or grey, is complete reporting.
Our results show that the reporting of search strategies is often limited. As a consequence, essential information needed to reproduce the study is lacking, and possible differences in outcomes between meta-analyses studying the same effects cannot be fully explained. If a synthesis is not replicable, it cannot be fully trusted, as mistakes in methodological procedures are possible (Haddaway et al., 2020; Parker et al., 2016). In another review, Hungate et al. (2009) showed how important complete reporting of the search and screening strategy is. Lack of transparency prompted criticism of the results of meta-analyses. Non-identical time frames over which literature was gathered, differences in inclusion criteria and, in our eyes most importantly, limited search methods can influence the number of articles found and included in a meta-analysis. This indicates the need to draw up quality criteria, to disseminate good practices across research fields and to improve the power of meta-analytical results.

Meta-analysis
Effect size calculation is an essential and mandatory part of meta-analysis (Koricheva and Gurevitch, 2013). Therefore, the term "meta-analysis" should only be used when data are quantitatively synthesized as described in the textbooks of Borenstein et al. (2009), Cooper, Hedges and Valentine (2019c) and Koricheva et al. (2013). The investigation of the compliance of our SOC meta-analyses with the criterion "Effect size calculated according to standard metrics" showed that about three quarters of meta-analyses calculated effect sizes according to such metrics. Koricheva and Gurevitch (2014) came to similar conclusions in their review of meta-analyses in plant ecology (Table 5). Further, only about half of the SOC meta-analyses used the log response ratio for effect size calculation.
These findings indicate that correct calculations of effect sizes are not applied consistently in the fields of SOC and plant ecology, although they represent the most fundamental and critical part of a meta-analysis. Among the several possible choices of effect size metrics, we recommend using log response ratios when conducting soil and agricultural meta-analyses. They are easy to interpret, and effect sizes are not affected by different variances of control and experimental groups. Overall, they are more suitable for meta-analyses studying agricultural management effects on soil parameters such as SOC than the standardized mean difference (Hedges' d). When using the standardized mean difference, the results are more difficult to interpret (especially for policy makers or farmers) than log response ratios, which can be back-transformed to percent changes from the control.
In Sect. 3.3 "Results and database presentation", we mentioned that, in our opinion, raw mean difference (also called unstandardized mean difference) is not recommended for calculating effect sizes in the field of soil and agricultural research. Unlike the response ratio, raw mean difference does not consider variations in control levels, which are often highly variable across field experiments, particularly on a global scale. In the case of SOC studies, control levels may vary between 10 and 100 t C ha⁻¹, which makes the raw mean difference between treatment and control meaningless as an index of effect size. It may yield similar effect sizes for relatively large and relatively small responses, as illustrated in Figure 9.
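The contrast between the two metrics can be sketched numerically. The following minimal Python example uses three hypothetical studies (the control and treatment stocks are invented for illustration, chosen to span the 10 to 100 t C ha⁻¹ range mentioned above); the function names are our own and do not come from any particular package.

```python
import math


def log_response_ratio(treatment_mean, control_mean):
    """Natural log of the ratio of treatment mean to control mean."""
    return math.log(treatment_mean / control_mean)


def raw_mean_difference(treatment_mean, control_mean):
    """Unstandardized difference between treatment and control means."""
    return treatment_mean - control_mean


def percent_change(lnrr):
    """Back-transform a log response ratio to percent change from control."""
    return (math.exp(lnrr) - 1.0) * 100.0


# Hypothetical (control, treatment) SOC stocks in t C ha-1; not real data.
studies = [(10.0, 15.0), (50.0, 55.0), (100.0, 105.0)]

results = []
for control, treatment in studies:
    lnrr = log_response_ratio(treatment, control)
    results.append(
        {
            "control": control,
            "raw_mean_difference": raw_mean_difference(treatment, control),
            "lnrr": lnrr,
            "percent_change": percent_change(lnrr),
        }
    )

for row in results:
    print(row)
```

In this sketch all three studies have the same raw mean difference, while the log response ratio shrinks as the control level rises, which is the pattern described for Figure 9.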
Therefore, raw mean difference can only be applied when all experiments studied in the meta-analysis use the same scale (Borenstein et al., 2009). Moreover, raw mean difference usually does not result in a normal distribution of effect sizes, which is a prerequisite for the analysis. Although this metric is easy to use, it is only suitable for meta-analyses in which controls do not vary widely across studies. That, however, is hardly possible to achieve given the diversity of pedo-climatic conditions.
Weighting is essential, as different studies have different precision, and more precise studies with larger sample sizes need to be weighted more heavily in an analysis. The weighting should be done by the inverse of variance. Weighting in other ways, for example by sample size, can lead to several problems, such as the introduction of unknown biases (as in e.g., Maillard and Angers, 2014; Han et al., 2016). When studies are not weighted at all (as in e.g., King and Blesh, 2018), the within- and between-study variation is not separated. Therefore, common- and random-effects models cannot be used, leading to difficulties in assessing heterogeneity (Gurevitch et al., 2018). All these possible biases can distort the results of meta-analyses and therefore lead to false conclusions. According to findings by Hungate et al. (2009), mean estimates of effect sizes can differ depending on the weighting function used. Weighting by sample size or not weighting resulted in comparable effect size estimates, which were often larger than those obtained when weighting by the inverse of variance. Our assessment showed that only 13% of SOC meta-analyses weighted by the inverse of variance, whereas Philibert et al. (2012) found 37% compliance. Koricheva and Gurevitch (2014) reported that three quarters of meta-analyses weighted by 1/variance. The meta-analyses studied by Krupnik et al. (2019) weighted by sample size and are therefore not correctly conducted according to our criteria-set. Beillouin et al. (2019) found that 40% of meta-analyses studying diversification effects weighted by 1/variance (and in some cases by sample size).
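Inverse-variance weighting can be illustrated with a short sketch. It assumes the commonly used sampling variance of the log response ratio, SD_t²/(n_t·mean_t²) + SD_c²/(n_c·mean_c²), and a common-effect pooling step; the function names and the example numbers are hypothetical, and a real analysis would normally rely on an established meta-analysis package and a random-effects model.

```python
import math


def lnrr_variance(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Sampling variance of the log response ratio (delta-method form)."""
    return sd_t**2 / (n_t * mean_t**2) + sd_c**2 / (n_c * mean_c**2)


def inverse_variance_pool(effects, variances):
    """Common-effect pooled estimate and its standard error.

    Each study is weighted by 1/variance, so more precise studies
    contribute more to the pooled mean.
    """
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    return pooled, standard_error


# Hypothetical effect sizes (lnRR) and variances; not real data.
effects = [0.0, 1.0]
variances = [1.0, 0.25]

pooled, se = inverse_variance_pool(effects, variances)
unweighted = sum(effects) / len(effects)
print(pooled, se, unweighted)
```

Here the more precise second study pulls the pooled estimate away from the unweighted mean, which shows why the choice of weighting function can change the mean effect size.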
When using a random- or mixed-effects model, effect sizes might show a certain amount of variability that cannot be explained by sampling error alone, raising the question of whether moderator effects may have influenced the results. A moderator is a third variable that conditions the relationship between two others. Therefore, moderator analysis must be conducted to identify such effects (Lipsey, 2019). In agricultural and soil sciences, abiotic factors (climatic zone, temperature, soil pH, clay content, etc.) as well as other applied management practices can moderate the results and should subsequently be accounted for (Valkama et al., 2015). Moderators can be analyzed by subgroup analysis or meta-regression. Subgroup analysis is suitable for categorical moderators which can be described in the form of groups, e.g., climate zone (tropical, continental, Mediterranean, etc.). In contrast, meta-regression is suitable for continuous moderators (e.g., duration of experiment, soil pH, etc.). We found that moderator analysis in the form of a Q-test was performed by about half of the analyzed SOC meta-analyses. The reviews of Philibert et al. (2012), Koricheva and Gurevitch (2014) and Beillouin et al. (2019) showed that meta-analyses in agri-environment, plant ecology and crop diversification complied almost twice as often with this criterion.
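The Q-test used in subgroup analysis can be sketched as follows. The example assumes a common-effect partition of Cochran's Q into within-group and between-group components; the data, group labels and function names are hypothetical. Q_between would then be compared against a chi-square distribution with (number of groups − 1) degrees of freedom, a step omitted here to keep the sketch free of external libraries.

```python
def weighted_mean(effects, variances):
    """Inverse-variance weighted mean of effect sizes."""
    weights = [1.0 / v for v in variances]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)


def cochran_q(effects, variances):
    """Cochran's Q: weighted sum of squared deviations from the pooled mean."""
    mean = weighted_mean(effects, variances)
    return sum((y - mean) ** 2 / v for y, v in zip(effects, variances))


def subgroup_q(effects, variances, groups):
    """Partition total heterogeneity into within- and between-group parts."""
    q_total = cochran_q(effects, variances)
    q_within = 0.0
    labels = sorted(set(groups))
    for label in labels:
        idx = [i for i, g in enumerate(groups) if g == label]
        q_within += cochran_q(
            [effects[i] for i in idx], [variances[i] for i in idx]
        )
    q_between = q_total - q_within
    df_between = len(labels) - 1
    return q_total, q_within, q_between, df_between


# Hypothetical effect sizes grouped by climate zone; not real data.
effects = [0.1, 0.1, 0.5, 0.5]
variances = [0.01, 0.01, 0.01, 0.01]
groups = ["tropical", "tropical", "continental", "continental"]

q_total, q_within, q_between, df_between = subgroup_q(effects, variances, groups)
print(q_total, q_within, q_between, df_between)
```

In this constructed case all heterogeneity lies between the two climate zones, so Q_within is zero and Q_between equals Q_total.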
Another issue frequently found in meta-analyses is the non-independence of effect size estimates, which occurs when effect sizes are not extracted independently but are related to each other, for example observations from different soil layers, from different treatment levels, or from nearby sites which share the same pedo-climatic conditions. This non-independence can lead to the underestimation of the standard error of the mean effect and can subsequently bias the evaluation of the effects' statistical significance. Therefore, meta-analysts should be aware of the sources of non-independence and should select only one effect size among several related effect sizes (Gurevitch and Hedges, 1999; Nakagawa et al., 2017). An example would be the inclusion of only the treatment effect of cover crop mix A on SOC, compared to a control with no cover crops, although the results of several other mixes (B, C and D) are available too. As these treatments were conducted in different plots but on the same site, they share the same control and pedo-climatic characteristics and are therefore not independent. The same applies to several observations (e.g., SOC) taken from multiple sub-layers/horizons or varying treatment levels (e.g., fertilization experiments). It should also be acknowledged that, in order to conduct a high-quality meta-analysis, the number of included independent studies/experiments from primary articles should be sufficient to allow the calculation of a rigorous overall effect estimate and to study the source of variation across studies. Hedges et al. (1999) classified sample size requirements as follows: n ≥ 50, a large body of primary data; 20 ≤ n ≤ 50, intermediate; n ≤ 20, small. It is recommended to include at least 50 independent studies in a meta-analysis to obtain reasonably accurate 95% confidence intervals for effect sizes.
Lastly, the degree of sensitivity of meta-analytical results should be assessed. When results are sensitive to factors such as publication bias, these factors need specific attention (Koricheva and Gurevitch, 2014). Funnel plots can support the interpretation of statistics by visualizing bias and highlighting outliers (Borenstein et al., 2009); the analysis should then be repeated without these outliers to see whether the overall results are affected (Rothstein et al., 2013).
Another possibility is testing via the fail-safe number. The computation of this number allows us to determine how many additional studies it would take to reduce the overall effect to a non-significant one (Rosenthal's method) or to an arbitrary minimal level (Orwin's method) (Borenstein et al., 2009). Philibert et al. (2012) reported that less than 10% of meta-analyses conducted sensitivity analysis. About 30% of SOC meta-analyses fulfilled this criterion. Beillouin et al. (2019) and Krupnik et al. (2019) found that about 40% conducted sensitivity analysis, whereas Koricheva and Gurevitch (2014) found a higher compliance of their meta-analyses or reviews with this criterion.
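Both fail-safe numbers can be computed in a few lines. The sketch below assumes the usual textbook forms: Rosenthal's number as (ΣZ)²/z_α² − k with a one-tailed z_α of 1.645, and Orwin's number as k·(mean effect − criterion effect)/(criterion effect − mean effect of the added studies). Function names, default values and inputs are hypothetical.

```python
def rosenthal_failsafe_n(z_scores, z_alpha=1.645):
    """Number of null-result studies needed to make the combined
    (Stouffer) test non-significant at the one-tailed level z_alpha."""
    k = len(z_scores)
    return (sum(z_scores) ** 2) / (z_alpha**2) - k


def orwin_failsafe_n(mean_effect, k, criterion_effect, added_effect=0.0):
    """Number of studies with mean effect `added_effect` needed to bring
    the overall mean effect down to `criterion_effect`."""
    return k * (mean_effect - criterion_effect) / (criterion_effect - added_effect)


# Hypothetical inputs; not real data.
z_scores = [1.645, 1.645, 1.645, 1.645]
n_rosenthal = rosenthal_failsafe_n(z_scores)
n_orwin = orwin_failsafe_n(mean_effect=0.4, k=10, criterion_effect=0.1)
print(n_rosenthal, n_orwin)
```

A small fail-safe number relative to the number of included studies suggests that the overall result is sensitive to unpublished or missing studies.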

Results and database presentation
In the group "Results and database presentation", the presentation and availability of results and of the full database, which provide all information necessary to reproduce an analysis, were compared. Extracted data should be provided to an extent sufficient to inform readers about all subsequent synthesis work (Woodcock et al., 2014).
The results of the moderator analysis should be displayed in the form of figures or tables. For subgroup analysis, a summary forest plot (see Gurevitch et al. 2018, Figure 1c) is suitable. This plot should not be confused with the classic forest plot, which shows all calculated effect sizes, the corresponding confidence intervals and the summary effect size. Meta-regression can be displayed in the form of, e.g., a bubble plot (see Gurevitch et al. 2018, Figure 1d). [...] redundant work. Full datasets promote the use of the data by others and enable updating and detection of errors (Koricheva and Gurevitch, 2014). Of all five reviews, our findings complied least with this criterion (Table 5). Only 16% of SOC meta-analyses reported databases including all relevant information to allow recalculation of effect sizes. Overall, results were poor. Philibert et al. (2012) obtained similar results, Koricheva and Gurevitch (2014) and Beillouin et al. (2019) found higher compliance, and Krupnik et al. (2019) identified the highest compliance (over 70%) with the criterion. This might be explained by the small sample size or less demanding criteria, as in our analysis of the criterion "Literature search reported".

Management categories
The results (Table 4) show that the management category "Tillage" was studied by 15 meta-analyses, with the highest score of 29 achieved by the meta-analysis of Haddaway et al. (2017), who provided an in-depth and high-quality synthesis of no-till/reduced tillage versus conventional tillage effects on SOC at a global level, using raw mean difference as effect size. A review of agricultural meta-analyses recently published by Young et al. (2021) found 14 meta-studies looking at the effects of no-till on SOC. Beillouin et al. (2021), who provide findings of available meta-analyses studying various land management practices on a global scale, identified over 20 studies on tillage effects. Therefore, we suggest that the topic is well covered for the moment and that no further global meta-analysis is needed until there is a substantial number of new publications or new potential moderator effects of interest. Nevertheless, according to our findings, high-quality meta-analyses and systematic reviews studying tillage effects on SOC in specific pedoclimatic zones or continents, such as Europe, are still missing.
The maximum score (16) in the organic management category was reached by the publication of García-Palacios et al. ( [...] analyzing cover crop effects on SOC. In the category "crop residue", the maximum score of 21 was reached by the meta-analysis of Li et al. (2020). Literature search reporting, effect size calculation and moderator analysis were done well, but effect sizes were not extracted independently, outliers were not assessed, and a full database was not provided. Maximum scores in all other management categories did not exceed 18. We therefore conclude that there is a need for further and improved meta-analyses on all management categories, except no-till/reduced tillage versus conventional tillage.

Impact of meta-analysis quality on policy making
In our quick quality assessment of meta-analyses cited in chapter 2 of the IPCC "Special Report - Climate Change and Land" (Jia et al., 2019), we found that over 50% of studies (nine out of 16) which used the term "meta-analysis" in their title were in fact not true meta-analyses, as they did not fulfil the "cut-off" criteria. As not even the key criteria for conducting a meta-analysis were followed by these articles, the quality of the overall study, and therefore the reliability of their results, is uncertain. In a study by O'Leary et al. (2016), 92 reviews were assessed for their value for decision-making with the help of the Collaboration for Environmental Evidence Synthesis Assessment Tool (CEESAT) (Woodcock et al., 2014; Collaboration for Environmental Evidence, 2020), which contains elements for analysing transparency, objectivity and comprehensiveness. They found that the evidence reviews performed poorly, with a median score of 2.5 (of a possible 39). Further, many of these reviews showed low reliability in methodology, which increases the risk that current knowledge is not adequately reflected. They concluded that "such reviews thus have the potential to misinform decision-making, especially if selectively used by stakeholders with particular priorities" (O'Leary et al., 2016, p. 80).
Scientific literature is used increasingly for environmental management decision making (Dicks et al., 2014).
In particular, documents that synthesize the results of multiple studies and peer-reviewed publications are primary sources of information for respondents (Seavy and Howell, 2010). Although science is by far not the only factor influencing policy decisions, there have been cases in which scientific findings have had a crucial impact on policy changes (Pullin and Knight, 2012). Therefore, researchers are obligated to ensure that their evidence reviews (such as meta-analyses) accurately reflect the primary evidence base and are reliable and transparent (O'Leary et al., 2016).

How to fix the problem
The described limitations call for advances in meta-analyses conducted in soil and agricultural research. Firstly, to improve the overall quality, it is crucial to support education at university level and implement training for interested scientists and stakeholders. Gurevitch et al. (2018) stressed that such training should be part of the curriculum for higher-degree students. Furthermore, they point out that not only scientists but also editors, reviewers and science-policy practitioners would greatly benefit from knowledge of meta-analytical methodology, as it would enable them to assess the quality of meta-analyses and interpret results.
Secondly, readers of meta-analyses should check for the presence of key elements assuring transparency and replicability of the article (Lortie et al., 2015). Krupnik et al. (2019) argue that scientists and policy makers need to evaluate meta-analyses critically regarding treatment definition, data collection and analysis. Results of meta-analyses on highly politicized agronomic topics should be interpreted especially carefully. We fully agree with these claims and support the appeal to be critical when it comes to meta-analytical outcomes. The proposed quality criteria-set should aid this demanding process.
An issue that meta-analysts frequently face is that many primary publications do not report SDs, which are needed to calculate the variance and subsequently weight studies by its inverse. As a result, many studies cannot be included in the meta-analysis, thereby reducing the amount of valuable information needed to obtain rigorous results. To solve this issue, a new tool named "EX-TRACT" was recently developed (Acutis et al., 2022). The easy-to-use Excel© worksheet application allows users to obtain pooled error standard deviations (sw) from ANOVA and Multiple Comparison Test (MCT) outcomes. By using this tool, the number of studies which can be included in a meta-analysis can be doubled (Acutis et al., 2022), and discarding primary literature which fits the scope can be avoided.
Another available and highly useful tool allows the computation of the SOC stock and its SD for a single soil layer based on SOC concentration and bulk density (also from multiple sub-layers) (Tadiello et al., 2022). The Excel© workbook automatically computes the means of stocks and SDs, saving the results in a ready-to-use database. This is especially helpful when conducting a meta-analysis. Since SOC observations in original articles are often presented for multiple sub-layers but not for the complete soil profile, meta-analysts tend to extract all available observations per study, leading to non-independence of effect sizes. With the help of this tool, it is possible to "fuse" the results from all layers into one independent effect size.
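The idea of fusing sub-layers into one profile value can be sketched as follows. This is not the procedure implemented in the workbook of Tadiello et al. (2022); it is a simplified illustration under stated assumptions: stock (t C ha⁻¹) = SOC (%) × bulk density (g cm⁻³) × layer thickness (cm), coarse fragments ignored, bulk density treated as error-free, and layer errors treated as independent so that variances add. All names and numbers are hypothetical.

```python
import math


def layer_stock(soc_percent, bulk_density, thickness_cm):
    """SOC stock of one layer in t C ha-1 (coarse fragments ignored)."""
    return soc_percent * bulk_density * thickness_cm


def profile_stock(layers):
    """Sum layer stocks and combine their SDs into one profile value.

    Each layer is (soc_percent, soc_sd_percent, bulk_density, thickness_cm).
    Assumption: layer errors are independent, so variances are additive.
    """
    total = 0.0
    variance = 0.0
    for soc, soc_sd, bulk_density, thickness in layers:
        total += layer_stock(soc, bulk_density, thickness)
        # SD of the layer stock when only SOC concentration is uncertain.
        variance += layer_stock(soc_sd, bulk_density, thickness) ** 2
    return total, math.sqrt(variance)


# Hypothetical two-layer profile (0-10 cm and 10-30 cm); not real data.
layers = [
    (1.0, 0.10, 1.3, 10.0),
    (0.5, 0.05, 1.5, 20.0),
]

stock, stock_sd = profile_stock(layers)
print(stock, stock_sd)
```

Sub-layer observations from the same plot are often positively correlated, in which case the independence assumption would understate the profile SD; the sketch only illustrates how several layers collapse into a single observation per study.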
The publication of protocols prior to a meta-analysis would benefit the method by allowing constructive criticism and suggestions for improvement by the scientific community (Moher et al., 2015; Brandt et al., 2013). Gurevitch et al. (2018) described that the pre-registration of planned meta-analyses, which are then peer-reviewed and published before the actual analysis is conducted, can aid the reduction of selective reporting and publication bias. Systematic review protocols for environmental sciences from the journal "Environmental Evidence" or the initiative "ROSES" are available and can be used for the construction of meta-analytical protocols. Protocols can then be published in suitable journals, e.g., MethodsX.
Lastly, another valuable asset for improving the quality of future meta-analyses in soil science would be the creation of a European meta-analysis hub, which focuses on (1) the development of high-quality products, (2) the assessment of quality and (3) the creation of a European database. The database should comprise all available information from former meta-analyses on soil and agricultural research, providing researchers with valuable data. With the help of this database, new meta-analyses studying management practices relevant for the pedoclimatic zones present in Europe could be conducted.
This is important, as the inclusion of global experiments into an analysis can lead to over-diversification and therefore to the combination of "apples and oranges", which is not expedient.

Conclusions
Quality assessment of meta-analyses, especially in the complex agricultural setting, allows the detection of rigorous synthesis efforts and their distinction from work of lower quality. Meta-analyses in soil and agricultural research may encounter specific issues which differ from those in other fields such as medicine, environment or ecology. Therefore, we adapted meta-analytical guidelines from other disciplines to construct an easy-to-use criteria-set, which is suited to quantitatively assess the quality of meta-analyses in agriculture and soil sciences. With the help of these criteria, we further investigated the quality of 31 meta-analyses studying the effects of agricultural management practices on SOC. By doing so, we aimed to present the application of the criteria-set and analyze the quality of quantitative reviews within this prominent topic. Our analysis showed that the overall quality of analyses improved over time, but only one achieved a high score. Deficits were found in literature search, statistical analyses and data presentation. The correct weighting of effect sizes by 1/variance was found to be a challenge for many authors. In some cases, the term "meta-analysis" is still falsely used to describe quantitative syntheses of any style, independent of the methodology applied. The analysis also revealed that, out of 11 identified management categories studied by the meta-analyses, only the effects of no-till/reduced tillage versus conventional tillage on SOC have been studied sufficiently in the form of a high-quality meta-analytical synthesis.
Our results indicate that the quality of meta-analyses in agricultural and soil sciences is, despite all efforts, still not satisfactory. As the information presented in summarizing research articles is frequently used by decision makers, this can also have negative impacts on evidence-based policymaking. It is high time that the agricultural and soil scientific community adopts rigorous meta-analytical methodologies and improves the quality of its output. We believe that the method is a viable and indispensable tool in the quantitative synthesis of agricultural and soil research, and that only with combined efforts and collaborations between stakeholders across disciplines will we be able to overcome the presented challenges.

Appendix
Table A1: Assessed SOC meta-analyses and their identification numbers.

Figure 1: Number of meta-analyses in agriculture and primary research articles on soil organic carbon published between 1 January 2000 and 31 December 2021 (search conducted on 13 January 2022 on Web of Science Core Collection, searched in "Topic", results taken from the WoS "Analyse Results" tool; Boolean search string for MA in agriculture: meta-analysis AND agriculture, carbon; Boolean search string for articles on SOC: "soil organic carbon").

Figure 2: Flow diagram of literature search and screening. Adapted from: Page et al. (2021).
[...] R² = 0.382) (Fig. 3B) (normal distribution of scores tested with Shapiro-Wilk test; P = 0.115). If the observed rise in quality is projected into the future, without any intervention, a score of 30 will only be reached by the year 2033. As the meta-analysis by Haddaway et al. (2017) (ID = 10; score = 29) is an outlier which influences the regression result, we also calculated how the prognosis would change if we removed this meta-analysis. The new regression line (y = -1907.6 + 0.9523x; R² = 0.548) estimates that scores of 30 will be reached in 2034.

Figure 3: (A) Number of SOC meta-analyses published per year. (B) Scores of SOC meta-analyses over time (2005-2020) and corresponding regression line. Numbers beside dots indicate meta-analysis ID (ID and linked author information in Table

Figure 4: Compliance of meta-analyses with the criteria in group "Literature search and inclusion / exclusion criteria".

Figure 5: Compliance of meta-analyses with "Cut-off" criteria in the group "Meta-analysis": (A) Ratio of effect size metrics used by the meta-analyses. (B) Ratio of meta-analyses which extracted or computed standard deviations. (C) Ratio of meta-analyses which weighted by the inverse of variance.

Figure 6: Compliance of meta-analyses with the criteria 9-14 in the group "Meta-analysis".

Figure 7: Compliance of meta-analyses with the criteria in the group "Results and database presentation".

Figure 8: Scores of individual SOC meta-analyses displayed as scores per group, sorted from lowest to highest achieved score. Meta-analysis ID and full reference information appear in Table A1. Dashed line indicates the maximum reachable score of 30. Filled circles indicate "true" meta-analyses, which used standard metrics for effect size calculation and weighted each study by inverse variance. Open circles indicate meta-analyses which weighted some studies by inverse variance.
quality assessment of meta-analyses, relevant for policy making

Figure 9: Example of the relationship between the SOC levels in control and effect sizes measured as response ratio or raw mean difference for three studies.Response ratio indicates increasing effect size with decreasing control level.Raw mean difference indicates equal effect sizes for all experiments and does not consider variation in control levels.Triangles indicate an increase or decrease of values; rectangle indicates constant values.

Table 4: Meta-analyses with the highest score per management category, effect size and weighting used, and overall SOC responses.
The color scheme in green, yellow and grey aims to support the visual presentation of these outcomes. Meta-analyses which did not calculate effect sizes according to standard metrics or did not weight by the inverse of variance do not qualify as meta-analyses; their results are therefore not presented. The maximum achievable score is 30.
*Results for when all comparisons were included or only comparisons with SD values were included, respectively.

Table 5: Comparison of quality assessment of meta-analyses in disciplines of soil science, agronomy and plant ecology.
Topic | Author | Number of meta-analyses or quantitative reviews under assessment | Literature search reported | Effect size calculated according to standard metrics | Weight | Subgroup analysis and meta-regression | Sensitivity analysis | Full database
), which lacked in-depth reporting of the search strategy and independency of effect sizes, used studies where pedotransfer functions were applied, did not check for outliers, only extracted SDs partly, and thus weighted partly by 1/variance. Regarding the effect of cover crops on SOC, Jian et al. (2020) produced the meta-analysis which reached the highest score (21) out of six meta-analyses in this category. Reporting of literature searches and effect size calculations was conducted well, but the study failed to calculate moderator effects and to conduct sensitivity analysis, had non-independent effect sizes, and included studies with pedotransfer function application. Lessmann et al. (2022), who evaluated the global variation in SOC sequestration through improved cropland management, found six meta-studies