The methodological and reporting characteristics of Campbell reviews: A systematic review

Abstract Background The Campbell Collaboration undertakes systematic reviews of the effects of social and economic policies (interventions) to help policymakers, practitioners, and the public make well‐informed decisions about policy interventions. In 2010, the Cochrane Collaboration and the Campbell Collaboration developed a voluntary co‐registration policy to make full use of the shared interests and diverse expertise of review groups within the two organizations. To promote the methodological quality and transparency of Campbell intervention reviews, the Methodological Expectations of Campbell Collaboration Intervention Reviews (MECCIR) were introduced in 2014 to guide Campbell reviewers. However, there has not been a comprehensive review of the methodological quality and reporting characteristics of Campbell reviews. Objectives This review aimed to assess the methodological and reporting characteristics of Campbell intervention reviews and to compare the methodological quality and reporting completeness of Campbell reviews published before and after the implementation of MECCIR. A secondary aim was to compare the methodological quality and reporting completeness of reviews registered with Campbell only versus those co‐registered with Cochrane and Campbell. Search Methods We searched the Campbell Library to identify all completed intervention reviews published between 1 January 2011 and 31 January 2018. Selection Criteria One researcher downloaded and screened all records, excluding non‐intervention reviews based on their titles and abstracts. A second researcher checked the full text of all excluded records to confirm the exclusions. Discrepancies were resolved by agreement between the two researchers. 
Data Collection and Analysis We developed the abstraction form based on the mandatory reporting items for methods, results, and discussion in the MECCIR reporting standards Version 1.1, and on additional epidemiological characteristics identified in a similar study of systematic reviews in health. We also judged the methodological quality and completeness of reporting of each included review. For methodological quality, we used the AMSTAR 2 (A MeaSurement Tool to Assess systematic Reviews 2) instrument; for reporting completeness, we used the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta‐Analyses) checklist. We rated reporting as either complete/partial or not reported. We described characteristics of the included reviews with frequencies and percentages, and medians with interquartile ranges (IQRs). We used Stata version 12.0 to fit multiple linear regressions for continuous data and ordered logistic regressions for ordered data to investigate associations between prespecified factors and both methodological quality and completeness of reporting. Main Results We included 96 Campbell reviews: 46 were published between January 2011 and September 2014 (pre‐MECCIR) and 50 between October 2014 and January 2018 (post‐MECCIR). Twenty‐two of 96 (23%) reviews were co‐registered with Cochrane. For overall methodological quality, 16 (17%) reviews were rated as high, 40 (42%) as moderate, 24 (25%) as low, and 16 (17%) as critically low using AMSTAR 2. Reviews published after the release of MECCIR had better methodological quality ratings than those published before MECCIR (odds ratio [OR] = 6.61, 95% confidence interval [CI] [2.86, 15.27], p < .001). The percentages of reviews of high or moderate quality were 76% (post‐MECCIR) and 39% (pre‐MECCIR). Reviews co‐registered with Cochrane were rated as having better methodological quality than those registered only with Campbell (OR = 5.57, 95% CI [2.13, 14.58], p < .001). 
The percentages of reviews of high or moderate quality were 77% for co‐registered reviews versus 53% for reviews registered only with Campbell. Twenty‐five of 96 reviews (26%) completely or partially reported all 27 PRISMA checklist items. The median number of items reported across reviews was 25 (IQR, 22–26). Reviews published after the release of MECCIR reported 2.80 more items than those published before MECCIR (95% CI [1.74, 3.88], p < .001); reviews co‐registered with Campbell and Cochrane reported 1.98 more items than reviews registered only with Campbell (95% CI [0.72, 3.24], p = .003). An increasing trend over time was observed for both the percentage of reviews of high or moderate methodological quality and the median number of PRISMA items reported. Authors' Conclusions Many features expected in systematic reviews were present in Campbell reviews most of the time. Methodological quality and reporting completeness were both significantly higher in reviews published after the introduction of MECCIR in 2014 than in those published before. However, this may also reflect a general improvement over time in the reporting of systematic review methods, or associations with characteristics that were not assessed, such as funding or the experience of review teams. Reviews co‐registered with Cochrane had higher methodological quality and more complete reporting than reviews registered only with Campbell.

| What is this review about?
The Campbell Collaboration undertakes systematic reviews of the effects of social and economic policies to help policymakers, practitioners, and the public make well-informed decisions about policy interventions. There has not been a comprehensive review of the methods and reporting characteristics of Campbell reviews. Nor have the methodological quality and completeness of reporting of Campbell reviews been assessed. These factors, which are assessed in this review, are important to ensure the transparency, reliability, and usability of Campbell systematic reviews.

| What is the aim of this review?
We collected information about the epidemiological, methodological, and reporting characteristics of Campbell reviews of the effects of policies published between 2011 and 2018, and assessed their methodological quality and completeness of reporting. We also assessed whether the methodological quality and completeness of reporting were better following the release of the MECCIR standards and in reviews that were co-registered with Cochrane.

| What studies are included?
This review included 96 Campbell reviews that evaluated the effectiveness of any intervention. Forty-six were published between January 2011 and September 2014 and 50 published between October 2014 and January 2018. Twenty-two of the 96 reviews were co-registered with Cochrane.
| What are the main findings of this review?
Fifty-nine percent of the Campbell reviews were of high or moderate quality based on the AMSTAR 2. Twenty-five of 96 (26%) Campbell reviews completely or partially reported all 27 PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) checklist items. The median number of items reported across reviews was 25 (interquartile range [IQR], 22–26). The methodological quality of Campbell reviews improved after the introduction of MECCIR, which might reflect a trend of improving quality over time. Seventy-six percent of reviews published after the introduction of MECCIR had high or moderate methodological quality (AMSTAR 2) compared with 39% of reviews published before MECCIR.
Campbell reviews co-registered with Cochrane had better methodological quality compared with those only registered on Campbell. Seventy-seven percent of reviews registered in both Campbell and Cochrane were of high or moderate quality compared with 53% of reviews only registered with Campbell.

| What do the findings of this review mean?
Campbell reviews are generally well reported, with a median of 25 of the 27 PRISMA items reported, and 59% of the included reviews are of moderate or high methodological quality according to AMSTAR 2. Campbell reviews published after MECCIR were of better quality than those published before it, and reviews co-registered with Cochrane scored better on quality indices than reviews registered only with Campbell.
This report may be used as a baseline for assessing the effects of strategies to address the limitations identified by this review (such as interpreting risk of bias, reporting the funding of individual studies, explaining the selection of study designs, and citing the prepublished protocol, among other items). Future studies could explore specific methods, such as search strategies and economic considerations, in more detail.

| How up-to-date is this review?
This review included all the Campbell effectiveness reviews published between January 2011 and January 2018.

| BACKGROUND
Systematic reviews aim to summarize "the best available research on a specific question by synthesizing the results of several studies" (Campbell Collaboration, 2018). They use transparent procedures to find, evaluate, and synthesize the results of relevant research whilst minimizing bias.
They are increasingly popular across a wide range of sectors to inform policy and practice. Systematic reviews can help policymakers develop evidence-informed policy and help practitioners keep up-to-date with relevant content knowledge (IOM, 2011; Oliver, 2015). In addition, granting agencies increasingly require the use of systematic reviews to justify new research. The trustworthiness of a systematic review depends on the extent to which the review authors used robust methods and on the quality of reporting of those methods (Steffen, 2010). Moher et al. (2007) and Page et al. (2016) demonstrated poor conduct and highly variable reporting of systematic reviews in health-related fields: only 7% of the included systematic reviews searched for unpublished data, less than half assessed the risk of publication bias, and the completeness of reporting was highly variable (Page et al., 2016). In the social sciences, Schalken and Rietbergen (2017) found that reporting of quantitative systematic reviews in industrial and organizational psychology was poor; for example, only 1.7% of reviews reported assessing the quality of individual studies.
The Campbell Collaboration was established in 2001 to promote positive social and economic change through conducting and disseminating systematic reviews and other evidence syntheses. It undertakes systematic reviews to help policymakers, practitioners, and the public make well-informed decisions about policy interventions (V. Welch, 2018). The Campbell Collaboration has made several efforts to promote and improve its work. In 2010, Cochrane and the Campbell Collaboration developed a voluntary co-registration policy intended to make full use of the shared interests and diverse expertise of review groups within the two organizations, avoid unnecessary duplication of effort by producing a single set of documents, and make reviews available to a wider audience (Campbell Systematic Reviews, 2010). This policy requires a well-coordinated editorial process to produce reviews that meet more than one group's standards.
To promote the methodological quality and transparency of Campbell reviews of intervention effects, the Methodological Expectations of Campbell Collaboration Intervention Reviews (MECCIR) were introduced in 2014 (Chandler et al., 2017). Campbell review teams are strongly encouraged to conduct and report reviews following MECCIR. When submitting the protocol and the completed review, reviewers are expected to provide the respective editor with a checklist affirming adherence to the MECCIR standards.
To date, we have little knowledge about the overall quality of methods and reporting of Campbell reviews. As such we conducted this study to investigate these aspects of Campbell reviews as well as to assess their methodological quality and completeness of reporting.

| OBJECTIVES
The review has three main objectives:
1. To collect the methodological and reporting characteristics of Campbell reviews.
2. To assess the methodological quality and reporting completeness of Campbell reviews.
3. To determine the influence of MECCIR and co-registration with Cochrane on the methodological quality and completeness of reporting of Campbell reviews.

| Title registration and protocol of the systematic review
The title registration and the protocol (Wang et al., 2019) for this systematic review were published in Campbell Systematic Reviews on 22 January 2019.

| Criteria for considering studies for this review
We included completed Campbell intervention reviews published from January 2011 to January 2018. If a review had been updated, we selected the most recent version. For reviews with objectives other than effectiveness (e.g., cost-effectiveness), we used only data on intervention effectiveness. To explore how the MECCIR standards may have influenced the methodology and reporting of Campbell reviews, we compared all Campbell reviews published before the introduction of MECCIR (January 2011 to September 2014) with all those published after it (October 2014 to January 2018). We did not apply a language restriction, since all Campbell reviews are published in English.
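The pre-/post-MECCIR grouping above is a simple date rule, sketched here in Python. This assumes each review's publication date is known to the day and that the pre-MECCIR window runs through 30 September 2014 inclusive; the function name `meccir_period` is hypothetical and not part of the authors' workflow.

```python
from datetime import date

# Window boundaries taken from the inclusion criteria; day-level cut-offs are assumptions.
WINDOW_START = date(2011, 1, 1)
PRE_MECCIR_END = date(2014, 9, 30)   # last day of the pre-MECCIR period
WINDOW_END = date(2018, 1, 31)       # end of the search window

def meccir_period(published: date) -> str:
    """Classify a review as 'pre-MECCIR' or 'post-MECCIR' by its publication date.

    Raises ValueError for dates outside the January 2011 to January 2018 window.
    """
    if not WINDOW_START <= published <= WINDOW_END:
        raise ValueError(f"{published} is outside the inclusion window")
    return "pre-MECCIR" if published <= PRE_MECCIR_END else "post-MECCIR"


print(meccir_period(date(2014, 9, 15)))  # pre-MECCIR
print(meccir_period(date(2014, 10, 1)))  # post-MECCIR
```

Raising an error for out-of-window dates, rather than silently assigning a period, mirrors the fact that such reviews were not eligible at all.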

| Search methods for identification of studies
We accessed the Campbell Systematic Reviews portal through the official website of the Campbell Collaboration, limited the search to reviews published between 1 January 2011 and 31 January 2018, and downloaded the completed reviews from all review groups.

| Selection of studies
One researcher downloaded and screened all of the records to identify intervention effectiveness reviews (i.e., a review evaluating the effects of a social or policy intervention) and exclude nonintervention reviews based on reviews' title and abstract. A second researcher independently reviewed the full text of all the excluded records to confirm the exclusions. In the case of discrepancies, the two researchers arrived at the final decision through discussion.

| Data extraction and management
Four reviewers independently pilot tested the data extraction form.
All reviewers abstracted two reviews in each pilot extraction and a third reviewer (X. W.) checked all the data for accuracy and consulted with a fourth reviewer (J. M. G.) where necessary. For the remaining extractions, two pairs of reviewers independently extracted data; discrepancies were resolved via discussion or adjudication by a third reviewer (X. W.).
We developed the abstraction form to collect information about the methods and reporting of included reviews, based on the mandatory reporting items for methods and results in the MECCIR reporting standards Version 1.1 (MECCIR, 2017) and on additional review characteristics used in a similar methodological study of reporting quality (Page et al., 2016).
We used Microsoft Excel 2018 to collect the extracted data.

Quality assessment

Methodological quality. We used the A MeaSurement Tool to Assess systematic Reviews 2 (AMSTAR 2) instrument (Shea et al., 2017) to evaluate the methodological quality of the included reviews. Four reviewers independently pilot tested the assessment tool. AMSTAR 2 contains 16 items (Box 1), including seven identified by the tool's authors as "critical" (meaning that weaknesses in these items are critical and should reduce confidence in the findings of a review). To assess the item related to the protocol (i.e., item 2: establish the protocol in advance and justify any significant deviations from it), we referred to the review's protocol, if available. Individual items were rated as "yes," "partial yes," or "no." For the items concerning meta-analysis (items 11, 12, and 15), a "no meta-analysis conducted" response was added for reviews that did not pool data from individual studies (e.g., because they included no or insufficient studies, or outcomes were too variable to combine). A "not applicable" response was added for items 8-10 and 13-14 for reviews with no included studies (i.e., empty reviews) (Appendix A).
We rated the overall confidence in each review as high, moderate, low, or critically low, based on weaknesses in the seven critical items (Box 2).
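The mapping from item-level ratings to the four overall confidence categories in Box 2 can be sketched as a small Python function. The seven critical items listed are those designated by Shea et al. (2017). Treating only a "no" as a flaw or weakness, and ignoring "partial yes", "no meta-analysis", and "not applicable" responses, is an assumption of this sketch, as is the hypothetical name `overall_confidence`.

```python
from typing import Dict

# Critical AMSTAR 2 items (Shea et al., 2017): 2 protocol, 4 literature search,
# 7 justification of excluded studies, 9 risk-of-bias technique,
# 11 meta-analytic methods, 13 risk of bias considered when interpreting results,
# 15 publication bias.
CRITICAL_ITEMS = frozenset({2, 4, 7, 9, 11, 13, 15})

def overall_confidence(ratings: Dict[int, str]) -> str:
    """Map item-level AMSTAR 2 responses (item number -> response) to an overall rating.

    Assumption: only a "no" counts, as a critical flaw for critical items and as a
    non-critical weakness otherwise; all other responses are not counted.
    """
    critical_flaws = sum(1 for item, r in ratings.items() if r == "no" and item in CRITICAL_ITEMS)
    weaknesses = sum(1 for item, r in ratings.items() if r == "no" and item not in CRITICAL_ITEMS)
    if critical_flaws > 1:
        return "critically low"
    if critical_flaws == 1:
        return "low"
    if weaknesses > 1:
        return "moderate"
    return "high"  # no critical flaw and at most one non-critical weakness


all_yes = {item: "yes" for item in range(1, 17)}
print(overall_confidence(all_yes))                          # high
print(overall_confidence({**all_yes, 10: "no", 16: "no"}))  # moderate
print(overall_confidence({**all_yes, 13: "no"}))            # low
print(overall_confidence({**all_yes, 13: "no", 15: "no"}))  # critically low
```

Checking critical flaws before non-critical weaknesses reflects the ordering in Box 2: a single critical flaw caps the rating at "low" regardless of how many other items are met.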

Completeness of reporting
We used the PRISMA checklist (Liberati et al., 2009) to evaluate the reporting quality of the included reviews. For each review, we judged each PRISMA item as "completely reported" or "not reported." Some PRISMA items have multiple components; for these, we added a "partially reported" category to indicate that only some of the components were reported. For example, the protocol item was rated as partially reported if a review only mentioned working from a protocol without saying where to access it, and as completely reported if the review mentioned working from a protocol and provided access to it (e.g., a citation).
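The three-way judgment for multi-component PRISMA items described above can be sketched as follows, assuming each item is broken down into a list of components that are each judged reported or not. The function name `reporting_category` and the two-component protocol example are hypothetical illustrations.

```python
from typing import Sequence

def reporting_category(components: Sequence[bool]) -> str:
    """Classify one PRISMA item from the reported (True) / not reported (False) status of its components."""
    if not components:
        raise ValueError("an item needs at least one component")
    n_reported = sum(components)
    if n_reported == len(components):
        return "completely reported"
    if n_reported == 0:
        return "not reported"
    return "partially reported"  # only possible for items with two or more components


# Hypothetical protocol item: (1) states a protocol was used, (2) says where to access it.
print(reporting_category([True, False]))  # partially reported
print(reporting_category([True, True]))   # completely reported
```

Single-component items can only ever be "completely reported" or "not reported", which matches the text: the partial category was added only for items with multiple components.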

| Data synthesis
We summarized extracted data and quality assessments as frequencies and percentages for dichotomous data and as medians and IQRs for continuous data. For methodological quality, we summarized the proportion of reviews in each rating category. For reporting completeness, we counted the number of PRISMA items reported in each review and report the median and IQR.
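The frequency summary of overall ratings described above amounts to a count and a percentage per category. The sketch below, with the hypothetical function name `summarize_ratings`, feeds in the category counts reported in this review's results (16, 40, 24, and 16 of 96 reviews) and reproduces the published percentages.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Overall AMSTAR 2 categories, from highest to lowest confidence.
LEVELS = ("high", "moderate", "low", "critically low")

def summarize_ratings(ratings: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Return {level: (count, percentage rounded to a whole number)} for overall ratings."""
    counts = Counter(ratings)
    total = sum(counts.values())
    return {level: (counts[level], round(100 * counts[level] / total)) for level in LEVELS}


# Overall ratings as reported in the results of this review (96 reviews).
sample = ["high"] * 16 + ["moderate"] * 40 + ["low"] * 24 + ["critically low"] * 16
print(summarize_ratings(sample))
# {'high': (16, 17), 'moderate': (40, 42), 'low': (24, 25), 'critically low': (16, 17)}
```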
To explore whether AMSTAR 2 methodological quality ratings were influenced by MECCIR or co-registration with Cochrane, we used ordered logistic regression to test the association between quality and these factors, and reported associations as odds ratios with 95% confidence intervals. When analyzing specific items, we excluded reviews rated "no meta-analysis" or "not applicable" for that item.
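The ordered logistic model described above can be written out as a proportional-odds model. This is a sketch under stated assumptions: the coding of ratings from 1 (critically low) to 4 (high), the indicator names $\mathrm{post}_i$ and $\mathrm{coreg}_i$, and the assumption that both factors enter a single model are ours; the sign convention follows Stata's `ologit`, in which $e^{\beta_k} > 1$ means higher odds of a better quality category. The last line back-calculates the coefficient and an approximate standard error from the MECCIR estimate reported in the abstract (OR = 6.61, 95% CI 2.86 to 15.27); because those inputs are rounded, the derived values are approximate.

```latex
\begin{aligned}
\operatorname{logit}\Pr(Y_i \le j) &= \kappa_j - \bigl(\beta_1\,\mathrm{post}_i + \beta_2\,\mathrm{coreg}_i\bigr), \qquad j = 1, 2, 3 \\
\mathrm{OR}_k &= e^{\beta_k}, \qquad 95\%\ \mathrm{CI} = \exp\!\bigl(\hat\beta_k \pm 1.96\,\widehat{\mathrm{SE}}(\hat\beta_k)\bigr) \\
\hat\beta_1 &= \ln 6.61 \approx 1.89, \qquad
\widehat{\mathrm{SE}}(\hat\beta_1) \approx \frac{\ln 15.27 - \ln 2.86}{2 \times 1.96} \approx 0.43
\end{aligned}
```

The log-scale distances from the point estimate to each confidence limit are both about 0.84, as expected when the interval is computed on the log-odds scale and then exponentiated.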
To determine whether the number of PRISMA items reported in reviews was influenced by the introduction of MECCIR (pre- and post-September 2014) and by co-registration with Cochrane, we summed the PRISMA items reported in each review, scoring each item as 1 if completely reported or not applicable, 0.5 if partially reported, and 0 if not reported. We then used multiple linear regression and report coefficients with 95% confidence intervals.
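The per-review score defined above, and the median and IQR used to summarize it, can be sketched in Python. The weights come from the text; the function names are hypothetical. The quartiles use Python's `statistics.quantiles` with its default "exclusive" method, which may give slightly different IQR limits from Stata's percentile definition.

```python
import statistics
from typing import Dict, Iterable, List, Tuple

# Item weights described in the text.
ITEM_SCORES: Dict[str, float] = {
    "completely reported": 1.0,
    "not applicable": 1.0,
    "partially reported": 0.5,
    "not reported": 0.0,
}

def prisma_score(item_ratings: Iterable[str]) -> float:
    """Sum the weighted ratings of one review's PRISMA items (maximum 27)."""
    return sum(ITEM_SCORES[r] for r in item_ratings)

def median_iqr(scores: List[float]) -> Tuple[float, Tuple[float, float]]:
    """Median and interquartile range (25th, 75th percentiles) of review scores."""
    q1, _, q3 = statistics.quantiles(scores, n=4)
    return statistics.median(scores), (q1, q3)


# Hypothetical review: 24 items complete, 2 partial, 1 not reported.
review = ["completely reported"] * 24 + ["partially reported"] * 2 + ["not reported"]
print(prisma_score(review))  # 25.0

# Hypothetical scores for six reviews.
print(median_iqr([22, 23, 25, 25, 26, 26]))  # (25.0, (22.75, 26.0))
```

Per-review scores computed this way would be the outcome variable in the linear regression described above.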
All the analyses were conducted using Stata version 12.0.

| Differences between protocol and review
Our protocol planned subgroup analyses according to the introduction of MECCIR and co-registration. Instead, we used logistic regression to examine the potential interaction between these two factors.

| Results of the search
We ran the search on 31 January 2018 and identified 98 full Campbell reviews published since 2011. After reading the full texts, we included 96 reviews and excluded two non-intervention reviews. The screening flow chart can be found in Appendix C. Reviews in our sample included a median of 18 studies (IQR, …). Ninety-three reviews (97%) considered experimental studies (i.e., randomized controlled trials [RCTs], quasi-RCTs, or other controlled experimental studies) and 13 (14%) considered observational studies as an eligible study design (Table 1).

| Included studies
Citations of the included reviews are available in supporting information Appendix D.

| Excluded studies
We excluded two non-intervention reviews (see "Characteristics of excluded studies" section).

Box 2. The overall rating of confidence of each review according to AMSTAR 2

High
No more than one non-critical weakness: the systematic review provides an accurate and comprehensive summary of the results of the available studies that address the question of interest

Moderate
More than one non-critical weakness: the systematic review has more than one weakness but no critical flaws. It may provide an accurate summary of the results of the available studies that were included in the review

Low
One critical flaw with or without non-critical weaknesses: the review has a critical flaw and may not provide an accurate and comprehensive summary of the available studies that address the question of interest

Critically low
More than one critical flaw with or without non-critical weaknesses: the review has more than one critical flaw and should not be relied on to provide an accurate and comprehensive summary of the available studies

Methods and reporting characteristics of included reviews (Table 2)

Protocol. A protocol was mentioned in 85 (89%) of the included reviews; however, only 54 (56%) clearly cited the protocol in the completed review, while 31 (32%) stated that they worked from a protocol but did not provide a reference to it (e.g., "XXX and XXX contributed to the development of this protocol"). Nine of the 11 reviews that did not mention a protocol had one available in the Campbell Library.
Title and abstract. Sixty-four (67%) reviews were identified as a "systematic review" in the title, including four that were described as a "systematic review and meta-analysis." All reviews had a structured abstract.
Methods. All included reviews reported the eligibility criteria according to the PICOS (population, intervention, comparison, outcome, setting) framework. All reviews included outcomes about the potential benefits of the intervention, but only 33 (34%) reported potential harms. Most reviews (93, 97%) specified the eligible study designs; 80 (83%) indicated that only experimental studies were eligible, and 13 (14%) specified that observational studies were also eligible. Fifty-four (56%) reviews stated they would include published and unpublished studies. Seventeen (10%) reviews restricted eligibility to studies published in specific languages (English with or without other specific languages), and 34 (35%) indicated no limitations on language.
All reviews listed the information sources, but the detail of the search strategy varied. For example, the dates of search coverage were reported completely (i.e., both start and end dates were reported for all databases) in 53 (55%) of the reviews. Eighty-five (89%) reviews reported the full Boolean search logic, while some reviews described only free-text terms.
For the process of screening, extraction, and risk of bias assessment, 77 (80%), 89 (93%), and 60 (63%) reported the methods used, respectively. For statistical methods, 89 (93%) reviews reported their analysis plan and 77 (80%) described the methods for assessing statistical heterogeneity. Sixty-four (67%) reviews reported their proposed methods to assess for publication bias and 81 (84%) reviews described planned approaches for additional analyses.
Results. Eighty-seven (91%) reviews included a study flow diagram detailing the results of the screening process; 13 (14%) referred to …

TABLE 1 Epidemiological characteristics of the included studies (96 reviews)
Discussion and other information. The main findings and the strength of the evidence were summarized in the discussion section of 85 (89%) reviews. The relevance of the main findings to key stakeholders, such as policymakers, planners, and practitioners (e.g., police, teachers, social workers), was mentioned in 92 (96%) reviews, and implications for future research were described in 93 (97%) reviews. Eighty-eight (91%) reviews reported limitations, but only 59 (61%) described limitations at both the study and review level; limitations at the review level were mentioned less frequently than limitations of included studies.

Methodological quality assessment
Nine out of 16 AMSTAR 2 items were completely or partially addressed in more than 80% of the reviews (Figure 1). Fewer than 70% of reviews addressed five methodological items:
• Item 3: "Whether review authors explain their selection of the study designs for inclusion in the review?" (37, 39%).
• Item 10: "Whether review authors report on the sources of funding for the studies included in the review?" (14, 15%).
• Item 12: "If meta-analysis was performed, whether review authors assess the potential impact of risk of bias in individual studies on the results of the meta-analysis or other evidence synthesis?" (32, 33%).
• Item 14: "Did the review authors provide a satisfactory explanation for, and discussion of, any heterogeneity observed in the results of the review?" (58, 60%).
• Item 15: "If they performed quantitative synthesis, whether review authors carry out an adequate investigation of publication bias…"

For the methodological quality, 16 (17%) reviews were rated as high quality, 40 (42%) as moderate, 24 (25%) as low, and 16 (17%) as critically low. The most common AMSTAR 2 critical items not addressed were consideration of ROB when interpreting the results of the review and assessment of the presence and impact of publication bias. The percentages of high or moderate quality reviews were 76% post-MECCIR versus 39% pre-MECCIR (Table 3), and 77% for reviews co-registered with Cochrane versus 53% for reviews registered only with Campbell.
We also graphed the percentage of Campbell reviews at each quality level by publication year (Figure 4), which showed an overall improvement over time, especially after 2012.
We also explored factors associated with ratings on specific AMSTAR 2 items using ordered logistic regression or ordinary logistic regression. After adjusting for the introduction of MECCIR, Campbell reviews co-registered with Cochrane were more likely to be scored as fully meeting the criteria for items about the protocol, selection in duplicate, risk of bias, sources of funding for the included studies, accounting for risk of bias, and explanation for heterogeneity (Table 5a). After adjusting for co-registration, reviews published after MECCIR were more likely to be scored as fully meeting the criteria for items about the protocol, rationale for selection of study design, risk of bias, and explanation for risk of bias (Table 5b). Fourteen and eight co-registered reviews were published before and after MECCIR, respectively.

FIGURE 1 AMSTAR assessment of each item. NA, not applicable (for reviews which included no studies); NoMA, no meta-analysis

TABLE 3 Overall confidence of AMSTAR 2 assessment (96 reviews) (overall, and according to pre-/post-MECCIR and whether co-registered with Cochrane)

Completeness of reporting
Based on multiple linear regression analysis, reviews published after the release of MECCIR reported 2.80 more items than those published before MECCIR (95% CI [1.74, 3.88], p < .001). Reviews co-registered with Campbell and Cochrane reported 1.98 more PRISMA items than reviews registered only with Campbell (95% CI [0.72, 3.24], p = .002) (Table 6).
We also graphed the median number of PRISMA items reported by publication year (Figure 3), which showed a steady improvement in reporting completeness from 2012 to 2015.

| Summary of main results
We evaluated the methodological and reporting characteristics of 96 Campbell reviews: 46 were published before MECCIR and 50 after the MECCIR standards were introduced. Twenty-two reviews were co-registered with Cochrane. Over 90% of these reviews were carried out by researchers from high-income countries; the remaining two reviews were from South Africa (an upper-middle-income country).
When considering reporting, a median of 25 out of the 27 PRISMA items were reported. Since all PRISMA items except the title are included in MECCIR, it is not surprising that these items were well reported. For specific characteristics, 33% of reviews did not identify themselves as a systematic review in the title, possibly because all Campbell reviews were published in Campbell Systematic Reviews. … report on whether unpublished studies were included in the review, despite its relevance for systematic reviews (Trespidi et al., 2011).
According to AMSTAR 2, 59% of the Campbell reviews were rated as high or moderate quality. We identified several areas for methodological improvement, including clarifying the rationale for including specific study designs, the technique for assessing risk of bias, reporting the sources of funding for the included studies, and assessing the potential impact of ROB in individual studies on the evidence synthesis. Whilst MECCIR recommends that risk of bias in RCTs be assessed using the Cochrane risk of bias tool (Higgins et al., 2011), it provides no guidance for assessing ROB in non-RCTs (MECCIR, 2017). Since many Campbell reviews included study designs other than RCTs, Campbell review authors may need to pay more attention to ROB-related standards for different study designs. AMSTAR 2 includes two new items (funding sources of included studies and impact of ROB in individual studies); these were achieved in only 15% and 24% of the reviews, respectively, a relatively low proportion compared with the other items. This might imply that these methodological issues did not receive enough attention before (Shea et al., 2017).
Both methodological quality and completeness of reporting were better after the introduction of MECCIR, possibly because MECCIR clarified expectations for authors and editors and provided structured tools for assessing the reporting and methodology of reviews which could be used by both editors and authors. These improvements may also reflect general improvements in the reporting of systematic reviews over time due to external factors such as increasing awareness of the importance of transparency and reproducibility.
Our findings also showed that reviews co-registered with Cochrane were of better methodological quality and had more complete reporting than those registered only with Campbell. This may reflect the coordinated editorial and peer review processes that take place under the collaboration between Campbell and Cochrane.

| Overall completeness and applicability of evidence
This methodological review includes all Campbell intervention reviews published from January 2011 to January 2018.
The evidence provided here will help future Campbell reviews (and potentially non-Campbell reviews) address known deficiencies in the methodological quality and completeness of reporting of policy reviews. As we did not include non-intervention reviews, we do not know whether they have similar characteristics and cannot comment on the applicability of these findings to non-intervention reviews.
We also did not aim to compare Campbell reviews with non-Campbell reviews in this study. As such, we are unable to comment on the similarities and differences in methodological and reporting characteristics between Campbell reviews and non-Campbell reviews in the social sciences.

| Potential biases in the review process
To ensure the reliability of the data extraction and quality assessment, we appraised the studies in duplicate and consulted a third reviewer when there were discrepancies. The observed improvement in methodological quality and completeness of reporting following the introduction of MECCIR may be confounded by general improvements in these over time. Other characteristics may also be driving the effects seen post-MECCIR: for example, author teams may have been more experienced, reviews may have had more funding and therefore more resources to prepare their reports, and more Campbell resources may have been available. Future studies are needed to analyze the potential impact of these factors. In addition, for the completeness of reporting assessments we scored partially reported items as 0.5, although some items have more than two components and these components may differ in importance. Therefore, our estimates may not reflect the full range of reporting completeness across Campbell reviews. Other studies have similar problems (Page & Moher, 2017).
… reported in 23% of reviews (Schalken & Rietbergen, 2017).
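The 0.5 scoring rule and its limitation can be made concrete with a small sketch. It assumes a review's completeness score is the mean of its item scores over applicable items (the exact aggregation is not stated here), and the helper `rate_item` is hypothetical: it shows how an item with three components earns the same 0.5 whether one or two components are reported.

```python
# Item scores as described in the text: complete = 1, partial = 0.5, not reported = 0.
SCORES = {"complete": 1.0, "partial": 0.5, "not reported": 0.0}


def rate_item(components_reported, components_total):
    """Hypothetical rule: any strict subset of an item's components counts as 'partial'."""
    if components_reported == components_total:
        return "complete"
    if components_reported == 0:
        return "not reported"
    return "partial"


def completeness(item_ratings):
    """Mean item score, assuming every listed item applies to the review."""
    return sum(SCORES[r] for r in item_ratings) / len(item_ratings)


# One and two of three components reported collapse to the same 'partial' rating.
review = [rate_item(3, 3), rate_item(1, 3), rate_item(0, 2), rate_item(2, 3)]
print(completeness(review))  # prints 0.5
```

A weighted scheme, for example scoring the fraction of components reported, would separate those two cases, but it would require judgements about the relative importance of components that the checklist does not provide.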
Performance was low on several AMSTAR 2 items, including explaining the rationale for the study designs eligible for inclusion, selecting studies in duplicate, stating the sources of funding for included studies, assessing the potential impact of risk of bias in individual studies on meta-analyses, and investigating publication bias. Similar findings have been observed in other systematic reviews (Almeida et al., 2019; Anaya et al., 2019; Cortese et al., 2019; Pussegoda et al., 2017; Yan et al., 2019).
When compared to 45 Cochrane reviews evaluated in a similar methodological study by Page et al. (2016)

| AUTHORS' CONCLUSIONS
Our study provides comprehensive information about the methodological and reporting characteristics of Campbell systematic reviews across the social sciences. First, we included Campbell reviews without restriction on research field: reviews on Social Welfare, International Development, Education, and Crime and Justice were all included. Second, we examined both methodological quality and reporting quality, which can help researchers and the Campbell Collaboration improve future work and provide more transparent and reliable evidence to policymakers, practitioners, and the public.
The methodological quality of Campbell reviews was good compared with other reviews (Almeida et al., 2019; Anaya et al., 2019; Cortese et al., 2019; Page & Moher, 2017; Pussegoda et al., 2017; Yan et al., 2019). AMSTAR 2 ratings show that 59% of Campbell reviews were of high or moderate quality, and the proportion of reviews rated low or critically low decreased over time. This improvement may partly reflect the introduction of MECCIR, but it might also be related to other factors, such as general improvements in review conduct over time.

| Implications for practice
Our study could help Campbell reviewers, peer reviewers, and editors become aware of areas for improvement, in particular reporting the search dates, citing the protocol and reporting deviations from it, and providing the characteristics of the included studies.
These findings may also be useful for refining the Campbell Revman

| Implications for research
This review provides an initial baseline assessment of the overall reporting characteristics and methodological quality of Campbell reviews. Future studies could usefully appraise specific processes in more detail, such as search strategy development and the handling of economic evaluation.

2. Did the report of the review contain an explicit statement that the review methods were established prior to the conduct of the review and did the report justify any significant deviations from the protocol? *
For Partial Yes: The authors state that they had a written protocol or guide that included ALL the following:
▯ review question …
For Yes: As for partial yes, plus the protocol should be registered and should also have specified: …
▯ Yes

6. Did the review authors perform data extraction in duplicate?
For Yes, either ONE of the following:
▯ at least two reviewers achieved consensus on which data to extract from included studies
▯ OR two reviewers extracted data from a sample of eligible studies and achieved good agreement (at least 80 percent), with the remainder extracted by one reviewer.
▯ Yes

10. Did the review authors report on the sources of funding for the studies included in the review?
For Yes:
▯ Must have reported on the sources of funding for individual studies included in the review. Note: Reporting that the reviewers looked for this information but it was not reported by study authors also qualifies.
▯ Yes ▯ No

11. If meta-analysis was performed, did the review authors use appropriate methods for statistical combination of results? *
RCTs
For Yes:
▯ The authors justified combining the data in a meta-analysis
▯ AND they used an appropriate weighted technique to combine study results and adjusted for heterogeneity if present.
▯ AND investigated the causes of any heterogeneity
▯ Yes ▯ No ▯ No meta-analysis conducted ▯ Not applicable (no study included) #
NRSI
For Yes:
▯ The authors justified combining the data in a meta-analysis
▯ AND they used an appropriate weighted technique to combine study results, adjusting for heterogeneity if present
▯ AND they statistically combined effect estimates from NRSI that were adjusted for confounding, rather than combining raw data, or justified combining raw data when adjusted effect estimates were not available
▯ AND they reported separate summary estimates for RCTs and NRSI when both were included in the review
▯ Yes ▯ No ▯ No meta-analysis conducted ▯ Not applicable (no study included) #

12. If meta-analysis was performed, did the review authors assess the potential impact of RoB in individual studies on the results of the meta-analysis or other evidence synthesis?
If a review planned the analysis but could not conduct it for a technical reason (e.g., too few included studies), we rated it Yes.
For Yes:
▯ included only low risk of bias RCTs
▯ OR, if the pooled estimate was based on RCTs and/or NRSI at variable RoB, the authors performed analyses to investigate possible impact of RoB on summary estimates of effect.
▯ Yes ▯ No ▯ No meta-analysis conducted ▯ Not applicable (no study included) #

13. Did the review authors account for RoB in individual studies when interpreting/discussing the results of the review? *
For Yes:
▯ included only low risk of bias RCTs
▯ OR, if RCTs with moderate or high RoB, or NRSI were included, the review provided a discussion of the likely impact of RoB on the results
▯ Yes ▯ No ▯ Not applicable (no study included) #

14. Did the review authors provide a satisfactory explanation for, and discussion of, any heterogeneity observed in the results of the review?
For Yes:
▯ There was no significant heterogeneity in the results
▯ OR if heterogeneity was present, the authors performed an investigation of sources of any heterogeneity in the results and discussed the impact of this on the results of the review
▯ Yes ▯ No ▯ Not applicable (no study included) #

15. If they performed quantitative synthesis, did the review authors carry out an adequate investigation of publication bias and discuss its likely impact on the results of the review? *
For Yes:
▯ performed graphical or statistical tests for publication bias and discussed the likelihood and magnitude of impact of publication bias
▯ Yes ▯ No ▯ No meta-analysis conducted ▯ Not applicable (no study included) #

16. Did the review authors report any potential sources of conflict of interest, including any funding they received for conducting the review?
For Yes:
▯ The authors reported no competing interests OR
▯ The authors described their funding sources and how they managed potential conflicts of interest
▯ Yes ▯ No

* Critical items.
# We added a "Not applicable" option for these items when no study was included in a review.
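Item ratings are combined into an overall rating according to the AMSTAR 2 guidance (Shea et al., 2017): high means no or one non-critical weakness; moderate, more than one non-critical weakness; low, one critical flaw; and critically low, more than one critical flaw. The critical items are 2, 4, 7, 9, 11, 13, and 15, consistent with the items marked * in the checklist. The Python sketch below applies that rule. Treating "Partial Yes", "No meta-analysis conducted", and "Not applicable" as neither flaws nor weaknesses is an assumption, since review teams handle these ratings differently.

```python
# AMSTAR 2 critical domains (Shea et al., 2017).
CRITICAL_ITEMS = {2, 4, 7, 9, 11, 13, 15}

# Assumption: only a "No" rating counts as a flaw (critical item) or weakness (other item).
NOT_A_WEAKNESS = {"Yes", "Partial Yes", "No meta-analysis conducted", "Not applicable"}


def overall_confidence(ratings):
    """Map item ratings {item_number: rating} to an overall AMSTAR 2 rating."""
    critical_flaws = sum(
        1 for item, r in ratings.items()
        if item in CRITICAL_ITEMS and r not in NOT_A_WEAKNESS
    )
    weaknesses = sum(
        1 for item, r in ratings.items()
        if item not in CRITICAL_ITEMS and r not in NOT_A_WEAKNESS
    )
    if critical_flaws > 1:
        return "Critically low"
    if critical_flaws == 1:
        return "Low"  # with or without non-critical weaknesses
    if weaknesses > 1:
        return "Moderate"
    return "High"  # no or one non-critical weakness


example = {2: "Yes", 6: "No", 10: "No", 11: "Yes", 13: "Yes", 15: "No meta-analysis conducted"}
print(overall_confidence(example))  # prints Moderate
```

Because a single critical flaw caps a review at "Low" regardless of its other items, reviews that skip a critical item such as 13 can be rated below reviews with more non-critical weaknesses.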