Abbreviated and comprehensive literature searches led to identical or very similar effect estimates: a meta-epidemiological study

Objectives: The objective of this study was to assess the agreement of treatment effect estimates from meta-analyses based on abbreviated or comprehensive literature searches. Study Design and Setting: This was a meta-epidemiological study. We abbreviated 47 comprehensive Cochrane review searches and searched MEDLINE/Embase/CENTRAL alone or in combination, with or without checking references (658 new searches). We compared one meta-analysis from each review with meta-analyses recalculated from the abbreviated searches. Results: The 47 original meta-analyses included 444 trials (median 6 per review [interquartile range (IQR) 3–11]). Conclusion: Abbreviated literature searches often led to effect estimates identical or very similar to those of comprehensive searches, with slightly wider confidence intervals. Relevant deviations may occur. © 2020 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
A "systematic review attempts to collate all empirical evidence that meets prespecified eligibility criteria to answer a specific research question" [1]. To achieve this objective, the current paradigm is to search several electronic databases and other literature sources. The Methodological Expectations of Cochrane Intervention Reviews (MECIR) states that "searches for studies should be as extensive as possible" [2], and Cochrane recommends searching at least MEDLINE, Embase, and the Cochrane Central Register of Controlled Trials (CENTRAL), as well as additional sources (e.g., checking reference lists, gray literature) [2–4]. Methodological standards of health technology assessment agencies (such as NICE [5], IQWiG [6], EUnetHTA [7], and AHRQ [8]) make similar recommendations. However, it is unlikely that even highly comprehensive search strategies in electronic databases, such as those conducted in Cochrane reviews, find all the relevant evidence: large proportions of clinical trial results are never published, and these have systematically different, typically more "negative" findings than those that can be identified through extensive literature searches [9,10]. Thorough searches in other sources (e.g., trial registries, conference abstracts, or regulatory reports) might help to identify some of this hidden evidence. Yet, relevant evidence is likely to still be systematically missed [11].
As even the most comprehensive search strategies will not detect all relevant studies, the question remains how much effort should be put into searching electronic databases. When several trials have already been found, randomly missing some would, on average, reduce the precision of the treatment effect estimates but would not systematically change the point estimates or bias the results [12]. Some systematic reviews, and especially rapid reviews, abbreviate their searches by limiting the data sources (e.g., the number of literature databases, gray literature, contacting experts) or the specifications of the search (e.g., publication year, study type, language). Presumably, such abbreviated searches do not retrieve all available evidence [13–17].
Previous metaresearch (which we identified by using the related-article function in PubMed and citation tracking of selected key literature, not through a comprehensive systematic search) aimed to determine which and how many data sources should be searched in systematic reviews and mainly investigated the "recall" (sensitivity) of search strategies [18–21]. Recall is often defined as the number of relevant studies retrieved by a search divided by the number of all relevant studies, either in a single database or as identified with a "gold standard", for example, by using multiple databases, searching gray literature, or using other search techniques (e.g., contacting experts working in the research field) [6,22]. Various studies that assessed the recall of different search strategies concluded that multiple databases should be searched when doing a systematic review [18–21]. However, studies included in systematic reviews contribute differently to the summary of findings [23]. It is plausible that more impactful, larger studies are published in more prestigious journals that can be found in typical databases such as PubMed. Such studies are probably also more prominently discussed and cited in related research. As a consequence, large studies might be easier to find than smaller studies even when simpler or abbreviated searches are used. Hence, finding or not finding a study with an abbreviated search may not have a great impact on the effect estimates and, in turn, on health care decision-making. A metric such as recall, which relies only on the number of studies found or missed, does not reflect the impact that missed studies may (or may not) have on the treatment effect estimates of evidence syntheses that serve as the basis of health care decision-making.
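The difference between recall and the impact of missed studies can be sketched in Python. The trial identifiers and weights below are hypothetical, and `recall` and `weight_share` are illustrative helper names, not functions from the study; the sketch only shows that two searches with equal recall can capture very different shares of a meta-analysis' weight.

```python
def recall(retrieved_ids, relevant_ids):
    """Share of all relevant studies that a search retrieved (recall/sensitivity)."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("recall is undefined without relevant studies")
    return len(relevant & set(retrieved_ids)) / len(relevant)


def weight_share(retrieved_ids, weights):
    """Share of the total meta-analytic weight carried by the retrieved studies."""
    retrieved = set(retrieved_ids)
    total = sum(weights.values())
    return sum(w for study, w in weights.items() if study in retrieved) / total


# Hypothetical gold standard of four relevant trials with their weights (in %)
weights = {"trial_A": 50, "trial_B": 30, "trial_C": 10, "trial_D": 10}
search_1 = {"trial_A", "trial_B", "trial_D", "irrelevant_1"}  # misses a small trial
search_2 = {"trial_B", "trial_C", "trial_D", "irrelevant_2"}  # misses the largest trial

for name, hits in (("search_1", search_1), ("search_2", search_2)):
    print(name, recall(hits, weights), weight_share(hits, weights))
```

Both hypothetical searches have a recall of 0.75, yet the first retrieves 90% of the weight and the second only 50%.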
To our knowledge, there is only one other meta-epidemiological analysis of abbreviated searches that assessed this impact on effect estimates: in a simulation study, Marshall et al. aimed to determine the impact of searching only PubMed (whose largest component is MEDLINE) on more than 2,500 Cochrane reviews [24]. They found that summary odds ratios (ORs) of binary outcomes would not change in 71% of the reviews and that the treatment effects would differ by more than 20% on the OR scale, which they considered a moderate change, in only about 10% of the reviews. This study assumed that a search in PubMed would identify all studies indexed in PubMed (i.e., a 100% database recall), regardless of the search strategy [24]. However, because of differences in keywords, search syntax, and database structures, the efficiency of search strategies often varies across data sources [22,25]. Similarly, Hartling et al. [26], who aimed to assess the impact of searching only some of the databases on the results of meta-analyses in Cochrane reviews, also used database indexation as a proxy for finding relevant studies. Although they found that limiting the number of databases most often did not change the results of meta-analyses [26], a more realistic evaluation of the impact of abbreviating searches would require replicating the searches of systematic reviews. Our aim was to assess how treatment effect estimates of meta-analyses of the main outcomes of a random sample of Cochrane reviews would change if an abbreviated variant of the comprehensive search strategy was applied.

Key findings
What this adds to what is known?
Searching multiple data sources may increase the number of studies, study participants, and observed events contributing to meta-analyses, but abbreviated literature searches often give identical or very similar treatment effect estimates.
Deviations from results based on comprehensive searches that may be relevant for clinical decision-making can occur with any abbreviated search approach.
What is the implication and what should change now?
When performing a systematic review, searching at least two databases reduces the risk of missing information that could affect the treatment effect estimate.
What this study adds to the existing literature?
This study examines the effect of abbreviated searches on treatment effect estimates and their precision.

Overall methodology
Details of the rationale and design are described in the project protocol [27]. This article emerges from a two-part project. In the first part [28], we assessed the impact of abbreviating literature searches on Cochrane review authors' conclusions. For that, we searched the Cochrane Library with the search terms "quality of evidence" OR "summary of findings". We randomly selected 60 Cochrane reviews from the fields of mental health, osteoarthritis, chronic respiratory diseases, cardiovascular diseases, and cerebrovascular diseases that were published between 2012 and 2016. The reviews had to fulfill these eligibility criteria: (1) the authors were able to draw a conclusion, (2) a summary of findings table and the data to reproduce the meta-analysis were present, and (3) the literature searches in each database were conducted in 2012 or later and reported in enough detail to reproduce them.
We replicated each review's searches for the three most frequently used biomedical literature databases: MEDLINE (via Ovid MEDLINE or PubMed, in accordance with the original search strategy), Embase (via Embase.com), and CENTRAL (via the Cochrane Library). These database searches, as well as reference list searching, had all been conducted in the original reviews. Replicating these searches and recombining them resulted in 60 times 14 (i.e., 840) abbreviated searches: MEDLINE, Embase, or CENTRAL only; MEDLINE and Embase, MEDLINE and CENTRAL, or CENTRAL and Embase; MEDLINE, Embase, and CENTRAL; and each of these seven variants combined with a search of the included studies' reference lists. We restricted Ovid MEDLINE and PubMed searches to MEDLINE records only and excluded MEDLINE records from the Embase search results to ensure a clear distinction between the different databases and interfaces. For the reference list searches, we first identified all eligible references from the replicated database searches in Scopus. We then exported their reference lists to check for any references to studies included in the original review that had not been identified by the database searches.
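The count of 14 abbreviated variants per review follows from taking every non-empty combination of the three databases (3 + 3 + 1 = 7) and pairing each with and without reference list checking. A minimal Python sketch of that enumeration, using only the standard library; `abbreviated_variants` is an illustrative name, not part of the study's code:

```python
from itertools import combinations

DATABASES = ("MEDLINE", "Embase", "CENTRAL")


def abbreviated_variants(databases=DATABASES):
    """Every non-empty database combination, each with and without reference list checking."""
    variants = []
    for k in range(1, len(databases) + 1):
        for combo in combinations(databases, k):
            for with_reference_lists in (False, True):
                variants.append((combo, with_reference_lists))
    return variants


variants = abbreviated_variants()
print(len(variants))       # 14 variants per review
print(len(variants) * 60)  # 840 searches for the 60 reviews of the first part
print(len(variants) * 47)  # 658 searches for the 47 reviews analyzed here
```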
We checked the original search strategies and corrected minor errors in spelling, syntax, and operators in a few cases before we reran the searches. As we included only Cochrane reviews, we assumed that the respective quality assessments of the primary studies had been well conducted and therefore did not conduct our own formal assessments. Risk of bias assessments of the primary studies can be found in each of the included Cochrane reviews (Supplementary material 2).

Data selection
For this second part of the project (which is based on the methods and data of the first part), we determined which of the 60 Cochrane reviews reported at least one primary or secondary outcome in the main summary of findings table that was binary (i.e., event-based, reporting an OR, risk ratio, or hazard ratio; Supplementary material 2). This excluded 13 reviews (references in Supplementary material 1). If several pertinent outcomes were reported in one review, we chose the one with the largest number of contributing studies or, if several outcomes still qualified, the one mentioned first in the summary of findings table (Supplementary material 2).

Data extraction
From each Cochrane review, we extracted bibliographic information and crude event data or effect estimates of the studies combined in the selected meta-analysis. When we conducted the 47 times 14 (i.e., 658) abbreviated searches, we checked which of the studies that were originally included in the Cochrane review would also have been identified by the abbreviated search at the time of the original search.
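Checking which originally included studies an abbreviated variant would have found amounts to taking the union of the records retrieved by the variant's databases and intersecting it with the review's included studies. The Python sketch below assumes hypothetical study identifiers and search results, and it leaves out reference list checking, which in the study depended on the records the database searches returned; `studies_retained` is an illustrative name.

```python
def studies_retained(included, found_by_source, variant_sources):
    """Originally included studies that at least one source of an abbreviated variant retrieves."""
    retrieved = set()
    for source in variant_sources:
        retrieved |= found_by_source.get(source, set())
    return set(included) & retrieved


# Hypothetical review with four included studies and what each database search returned
included = {"S1", "S2", "S3", "S4"}
found_by_source = {
    "MEDLINE": {"S1", "S2", "other_record"},
    "Embase": {"S2", "S3"},
    "CENTRAL": {"S1", "S2", "S3"},
}
print(sorted(studies_retained(included, found_by_source, ["MEDLINE"])))
print(sorted(studies_retained(included, found_by_source, ["MEDLINE", "Embase"])))
```

In this made-up example, S4 is missed by every database variant, as can happen with studies that only other sources (e.g., unpublished data) supply.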

Meta-analyses based on original and abbreviated searches
We recalculated each of the 47 original treatment effect estimates as summary odds ratios (sORs) using the crude event data or effect estimates of the primary studies as reported in the Cochrane reviews. For consistency, we used this recalculation as the reference for each of the following meta-epidemiological analyses (instead of the summary effect reported in the summary of findings table, which may have been derived with diverse meta-analytical methods). In each of these meta-analyses, we then left out the results of studies that could not be found by the abbreviated searches. We used DerSimonian-Laird random-effects meta-analyses for our main analysis and the Hartung-Knapp-Sidik-Jonkman method and the fixed-effect approach in sensitivity analyses [29,30]. Two reviews did not report event data that we could meta-analyze [31,32]. Instead, we relied on their reported relative risks or hazard ratios and deemed these close approximations of ORs (which we considered reasonable when event rates are 10% or less [33], as in our sample).
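A minimal Python sketch of a DerSimonian-Laird random-effects meta-analysis of ORs, recomputed after leaving out a trial that an abbreviated search missed. The trial data are hypothetical, and the handling of zero cells (adding 0.5 to every cell) and of trials without events in either arm (treated as non-estimable) are our assumptions; the study's analyses were run in Stata and R, and this is not the authors' code.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def log_odds_ratio(events_exp, n_exp, events_ctl, n_ctl, cc=0.5):
    """Log OR and its variance from one trial's 2x2 table.

    Assumption: 0.5 is added to every cell when any cell is zero, and trials
    without events in either arm are treated as non-estimable.
    """
    if events_exp == 0 and events_ctl == 0:
        return None
    a, b = events_exp, n_exp - events_exp
    c, d = events_ctl, n_ctl - events_ctl
    if 0 in (a, b, c, d):
        a, b, c, d = a + cc, b + cc, c + cc, d + cc
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def dersimonian_laird(tables):
    """DerSimonian-Laird random-effects pooled OR with 95% CI and SE of the log OR."""
    estimates = [e for e in (log_odds_ratio(*t) for t in tables) if e is not None]
    if not estimates:
        return None  # nothing left to pool
    y = [e[0] for e in estimates]
    v = [e[1] for e in estimates]
    w = [1 / vi for vi in v]  # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    k = len(y)
    tau2 = 0.0
    if k > 1:
        c_dl = sum_w - sum(wi ** 2 for wi in w) / sum_w
        tau2 = max(0.0, (q - (k - 1)) / c_dl)  # between-trial variance
    w_re = [1 / (vi + tau2) for vi in v]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {"sOR": math.exp(pooled), "lower": math.exp(pooled - Z_95 * se),
            "upper": math.exp(pooled + Z_95 * se), "se": se}


# Hypothetical trials: (events_exp, n_exp, events_ctl, n_ctl)
trials = {"T1": (10, 100, 20, 100), "T2": (5, 50, 8, 50), "T3": (30, 300, 45, 300)}
comprehensive = dersimonian_laird(trials.values())
abbreviated = dersimonian_laird([trials[t] for t in ("T1", "T3")])  # T2 not retrieved
print(comprehensive)
print(abbreviated)
```

Returning `None` when no estimable trial remains mirrors the situation in which an abbreviated search leaves nothing to meta-analyze.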

Meta-epidemiological analyses
First, we quantified the loss of information that abbreviated searches caused compared with the original comprehensive search strategies by determining the median number of trials, events, and participants lost per meta-analysis. We also reported the corresponding proportions of trials, events, and participants obtained compared with the comprehensive search.
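The loss-of-information metrics can be sketched per meta-analysis and then summarized by their median across meta-analyses. The trial data below are hypothetical, and `information_retained` is an illustrative name; the sketch assumes each trial is described by its total number of events and participants.

```python
from statistics import median


def information_retained(trials, retained_ids):
    """Loss of trials, events and participants for one meta-analysis after an abbreviated search.

    `trials` maps a trial id to (events, participants); `retained_ids` holds the trials still found.
    """
    kept = {t: trials[t] for t in retained_ids if t in trials}
    events_all = sum(e for e, _ in trials.values())
    n_all = sum(n for _, n in trials.values())
    events_kept = sum(e for e, _ in kept.values())
    n_kept = sum(n for _, n in kept.values())
    return {
        "trials_lost": len(trials) - len(kept),
        "events_lost": events_all - events_kept,
        "participants_lost": n_all - n_kept,
        "prop_trials_found": len(kept) / len(trials),
        "prop_events_found": events_kept / events_all if events_all else float("nan"),
        "prop_participants_found": n_kept / n_all,
    }


# Two hypothetical meta-analyses and the trials one abbreviated search still found
ma_1 = information_retained({"T1": (30, 200), "T2": (13, 100), "T3": (75, 600)}, {"T1", "T3"})
ma_2 = information_retained({"U1": (5, 80), "U2": (9, 120)}, {"U1", "U2"})
print(median(r["trials_lost"] for r in (ma_1, ma_2)))  # median trials lost per meta-analysis
```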
Second, we assessed how often the treatment effect estimates (sORs) based on each abbreviated search and the estimates based on the comprehensive search (1) were identical (i.e., exactly the same effect estimate and confidence interval); (2) had point estimates in the same direction and the same level of statistical significance (i.e., both 95% confidence intervals crossed, or both did not cross, the null [OR = 1]); or (3) had point estimates in the same direction but differed in the level of statistical significance (i.e., gain or loss of statistical significance). We also assessed (4) how often the point estimates pointed in different directions and had a confidence interval indicating a statistically significant effect (e.g., the original meta-analysis indicates benefit of the experimental treatment, but the meta-analysis based on the abbreviated search indicates harm, or vice versa) and (5) how often the treatment effect estimates could no longer be calculated (i.e., the abbreviated search did not retrieve any of the trials included in the Cochrane review, or retrieved only trials with zero events). We deemed situations (4) and (5) the most critical, as we would expect these to have the most impact on decision-making.
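The five agreement categories can be expressed as a small classification function. This Python sketch is an illustration under stated assumptions: estimates are dictionaries with hypothetical keys `sOR`, `lower`, and `upper`; "identical" is checked with a small floating-point tolerance; and, as a simplification of category (4), any reversal in the direction of the point estimate is counted there regardless of significance.

```python
import math


def significant(est):
    """True if the 95% CI excludes the null (OR = 1)."""
    return est["upper"] < 1 or est["lower"] > 1


def classify(comprehensive, abbreviated, tol=1e-9):
    """Agreement category (1-5) of an abbreviated-search result with the comprehensive one.

    Estimates are dicts with keys sOR, lower, upper; None means not estimable.
    Simplification: any reversal in the direction of the point estimate is
    counted as category 4.
    """
    if abbreviated is None:
        return 5  # no estimable data left
    if all(math.isclose(comprehensive[k], abbreviated[k], rel_tol=tol)
           for k in ("sOR", "lower", "upper")):
        return 1  # identical estimate and CI
    if (comprehensive["sOR"] < 1) != (abbreviated["sOR"] < 1):
        return 4  # opposite direction
    if significant(comprehensive) == significant(abbreviated):
        return 2  # same direction, same significance
    return 3  # same direction, significance gained or lost


# Hypothetical comprehensive result with a statistically significant benefit
comp = {"sOR": 0.70, "lower": 0.55, "upper": 0.89}
print(classify(comp, {"sOR": 0.80, "lower": 0.60, "upper": 1.07}))  # 3: significance lost
```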
Third, we measured the absolute deviation between the treatment effect estimate based on the comprehensive search and the estimate from each abbreviated search (i.e., the absolute difference between the two sORs on the log scale, back-transformed to the OR scale for reporting). On the OR scale, the absolute deviation is by definition at least 1 and is reported as an x-fold deviation. For example, the absolute deviation would be 1.25-fold when the estimate based on the comprehensive search is OR = 1 and the abbreviated search gives OR = 0.8 or OR = 1.25. For each abbreviated search, we summarized the absolute deviation across all reviews by the median, interquartile range (IQR), and range. At least 50% of the reviews deviated by no more than the median absolute deviation, and at least 75% by no more than the upper limit of the IQR.
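The x-fold absolute deviation is the back-transformed absolute difference of the two log sORs, so deviations of equal size in either direction give the same value. A minimal Python sketch reproducing the 1.25-fold example from the text; `absolute_deviation` is an illustrative name:

```python
import math


def absolute_deviation(sor_comprehensive, sor_abbreviated):
    """x-fold absolute deviation: |log sOR_abbr - log sOR_comp|, back-transformed (always >= 1)."""
    return math.exp(abs(math.log(sor_abbreviated) - math.log(sor_comprehensive)))


print(round(absolute_deviation(1.0, 0.8), 6))   # 1.25
print(round(absolute_deviation(1.0, 1.25), 6))  # 1.25
```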
Fourth, we quantified the change in treatment effect size that arose from abbreviated searches compared with the comprehensive searches and illustrated which of the two search approaches resulted in more favorable treatment effect estimates for the experimental treatment assessed in each Cochrane review. We assumed that the second of the two treatments compared in each Cochrane review was the control treatment: in 28 of the 47 reviews (60%), this was placebo, usual care, or no treatment, and 19 reviews (40%) had an active comparator. We tested any impact of the order of comparators on the ratio of odds ratios (ROR) in a sensitivity analysis leaving out the reviews with active or mixed comparisons. We recoded the two positive outcomes and the corresponding metrics so that an sOR < 1 consistently indicates more favorable effects with the experimental treatment (e.g., an sOR for survival of 1.25 became an sOR of 0.8 for mortality). We calculated the RORs by subtracting the log sOR based on the comprehensive search from the log sOR of the respective abbreviated search and back-transforming the difference to the OR scale [34]. Given this orientation, an ROR less than 1 indicates that the abbreviated search measured a more favorable result for the treatment than the comprehensive search.
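The recoding and the ROR calculation can be sketched in a few lines of Python. The survival ORs below are hypothetical, and `oriented` and `ratio_of_odds_ratios` are illustrative names; the sketch relies on the fact that the OR for an outcome is the reciprocal of the OR for its complement (e.g., survival versus mortality).

```python
import math


def oriented(sor, higher_is_better=False):
    """Recode a summary OR so that values < 1 favor the experimental treatment.

    For an outcome where a higher OR is desirable (e.g. survival), the reciprocal
    gives the OR for the complementary outcome (e.g. mortality).
    """
    return 1 / sor if higher_is_better else sor


def ratio_of_odds_ratios(sor_comprehensive, sor_abbreviated):
    """ROR = exp(log sOR_abbreviated - log sOR_comprehensive)."""
    return math.exp(math.log(sor_abbreviated) - math.log(sor_comprehensive))


# Hypothetical survival outcome: comprehensive sOR 1.25, abbreviated sOR 1.5
comp = oriented(1.25, higher_is_better=True)  # 0.8 on the mortality scale
abbr = oriented(1.5, higher_is_better=True)   # about 0.667 on the mortality scale
print(round(ratio_of_odds_ratios(comp, abbr), 4))  # abbreviated sOR is smaller, so ROR < 1
```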
Finally, we explored how the precision of the estimates changed by calculating the ratio of standard errors (abbreviated over comprehensive search) per meta-analysis. Values > 1 indicate larger standard errors (less precision) with abbreviated searches.
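For intuition about why missing trials usually widens confidence intervals, the following Python sketch computes the standard error ratio for a fixed-effect (inverse-variance) pooled log OR, whose SE is the square root of one over the sum of weights. The within-trial variances are hypothetical. Under a random-effects model the ratio can occasionally fall below 1, because leaving out a trial can also lower the estimated between-trial variance.

```python
import math


def pooled_se_fixed(variances):
    """SE of an inverse-variance (fixed-effect) pooled log OR: sqrt(1 / sum of weights)."""
    return math.sqrt(1 / sum(1 / v for v in variances))


def se_ratio(se_comprehensive, se_abbreviated):
    """Values > 1 mean a less precise estimate with the abbreviated search."""
    return se_abbreviated / se_comprehensive


variances_all = [0.04, 0.09, 0.16]  # hypothetical within-trial variances of log ORs
variances_found = [0.04, 0.16]      # the trial with variance 0.09 was missed
ratio = se_ratio(pooled_se_fixed(variances_all), pooled_se_fixed(variances_found))
print(round(ratio, 3))
```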
We summarized the RORs and the ratios of standard errors per abbreviated search across all reviews by the mean (and standard deviation), the median (IQR), and the range. We used the Wilcoxon signed rank test to compare the numbers of events and participants between studies that were found and studies that were missed by abbreviated searches. P-values < 0.05 were considered statistically significant. Analyses were performed using Stata/IC 14.2 (StataCorp, College Station, TX, USA) and RStudio version 1.2.1335 (R version 3.6.1).
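The descriptive summaries per abbreviated search can be sketched with Python's standard `statistics` module. The ROR values are hypothetical; note that quartiles here use the module's default ("exclusive") interpolation, which may differ slightly from the defaults in Stata or R, so this is an illustration rather than a reproduction of the study's numbers.

```python
from statistics import mean, median, quantiles, stdev


def summarize(values):
    """Mean (SD), median (IQR) and range of per-review RORs or SE ratios for one abbreviated search.

    Quartiles use statistics.quantiles' default 'exclusive' method; Stata or R may interpolate differently.
    """
    q1, _, q3 = quantiles(values, n=4)
    return {"mean": mean(values), "sd": stdev(values), "median": median(values),
            "iqr": (q1, q3), "range": (min(values), max(values))}


# Hypothetical RORs from eight reviews: most unchanged, a few deviating
rors = [1.00, 1.00, 1.00, 1.00, 0.95, 1.00, 1.08, 1.00]
print(summarize(rors))
```

With mostly unchanged estimates, the median and both quartiles equal 1.00 while the mean, SD, and range still reveal the occasional deviations.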

Patient involvement
No patients were involved in this research.

Results
The 47 original meta-analyses (Supplementary material 1) included a total of 444 randomized controlled trials (RCTs) with data on 360,045 participants with 29,255 events (with a median of 6 trials per meta-analysis, IQR 3 to 11; median of 1,371 participants, IQR 685 to 8,041; and median of 209 events, IQR 62 to 773). Across all trials for which the Cochrane reviews reported crude events (432 of 444), the median event rate was 0.09 (IQR 0.04 to 0.21). The reviews' topics were cardiovascular disease (21 of 47), chronic respiratory diseases (8 of 47), osteoarthritis (7 of 47), mental health (6 of 47), and cerebrovascular disease (5 of 47). Confidence intervals did not cross the null in 13 of 47 meta-analyses (28%).

Loss of information
In half of all meta-analyses (median), searching only one data source led to the loss of at least one trial; in a quarter of the reviews, it led to the loss of at least two trials (Table 1, Figs. 1 and 2A). With any of the abbreviated searches, at least 86% of the trials, 98% of the events, and 96% of the participants included in the selected meta-analyses were still found in more than half of all meta-analyses (the highest proportion of missing information across the 47 meta-analyses occurred when searching Embase only, which missed 14% of trials, 2% of events, and 4% of participants).
Overall, the loss of information was highest with abbreviated searches in a single data source only (Table 1, Fig. 2A–D).

Comparison of effect estimates
Depending on the abbreviated search, treatment effect estimates were identical to those based on comprehensive searches in 34% (16 of 47) to 79% (37 of 47) of the 47 meta-analyses (Figs. 1 and 3). They were not identical but in the same direction with the same level of statistical significance in 13% (6 of 47) to 51% (24 of 47) of the meta-analyses. They were in the same direction but differed in the level of significance in 2% (1 of 47) to 6% (3 of 47) of the meta-analyses [31,35,36]. Treatment effect estimates were in the opposite direction in 4% (2 of 47) to 9% (4 of 47) of the meta-analyses, which concerned six different meta-analyses overall [37–42]. After the abbreviated searches, not enough data were left to estimate treatment effects in 2% (1 of 47) to 4% (2 of 47) of the meta-analyses; a total of three reviews were affected [43–45]. Overall, abbreviated searches led in 6% (3 of 47) to 13% (6 of 47) of the meta-analyses to effects that were in the opposite direction or could no longer be estimated. In one review, only the comprehensive search provided enough data to conduct a meta-analysis [45].

Absolute deviation and ratio of odds ratios
Across all abbreviated searches, the treatment effect size estimates did not deviate at all in 50% of the meta-analyses (i.e., 1.00-fold deviation; exception: searching Embase only, with a 1.01-fold deviation; Table 2). With any abbreviated search, the treatment effect size estimates in 75% of the reviews deviated by no more than 1.07-fold (upper IQR limit for searching Embase only; Table 2, Fig. 4). For all abbreviated searches, there was one meta-analysis in which the abbreviated search gave substantially deviating results (absolute deviation from the original meta-analysis up to 2.39-fold; outcome: withdrawal due to adverse events). In this meta-analysis, two of the five RCTs could not be found with any of the search strategies because they were unpublished data from the industry sponsor, and they constituted 82.5% of the weight of the meta-analysis [42].
Treatment effect estimates of abbreviated searches were not consistently smaller or larger than those based on comprehensive searches: the median ROR was 1.00 (IQR 1.00 to 1.00) across all abbreviated searches, that is, there was no change in the treatment effect size estimates in most meta-analyses and abbreviated searches (Table 2). The mean ROR was similar across all abbreviated searches, irrespective of whether the searches were based on a single data source or on multiple data sources.

Precision
Using abbreviated searches introduced some imprecision. Standard errors were on average between 1.02- and 1.06-fold larger than with comprehensive searches, but there were no clear differences between abbreviated searches based on single or multiple data sources (Table 2).

Characteristics of studies that were missed by abbreviated searches
Studies that were not found by searching only MEDLINE were smaller and had fewer events than those that were found (median 100 vs. 135 participants; P = 0.003; median 8.5 vs. 14 events; P = 0.017). This was not the case for Embase (median 129 vs. 127 participants; P = 0.575; median 16 vs. 13 events; P = 0.685). Studies that were not found by searching only CENTRAL were smaller (median 101 vs. 131 participants; P = 0.026) but had similar numbers of events (median 13 vs. 14 events; P = 0.145).

Sensitivity analyses
Results from the sensitivity analyses (i.e., using the Hartung-Knapp-Sidik-Jonkman method or the fixed-effect approach, and excluding meta-analyses with active or mixed comparators from the ROR analyses) were similar and supported the main findings (Supplementary material 3).

Discussion
Our analysis of 658 abbreviated searches showed that effect estimates based on abbreviated searches often led to the same or similar main statistical results as those obtained through the comprehensive searches of Cochrane reviews. However, in up to one in seven reviews (i.e., 6–13%), the direction of the effect estimate changed or no result could be provided at all when relying on abbreviated searches. This may have a substantial impact on decision-making, but it may also be an acceptable trade-off for users of rapid reviews [46]. On average, treatment effect estimates of abbreviated searches were not consistently smaller or larger than the original estimates in Cochrane reviews. We could not identify an abbreviated search that clearly outperformed the other abbreviated variants. In fact, even the most comprehensive of the abbreviated searches could not obtain all the information that the original Cochrane search strategy retrieved. However, abbreviated searches based on at least two data sources found many more studies than searches based on a single source and resulted in a smaller loss of information by including more trial participants and events. This is in line with previous research using technical metrics such as recall to compare search strategies [24]. Our results show that effect estimates do not necessarily change when some of the theoretically available data are not included in an evidence synthesis.
In the first part of this project [28], we analyzed the impact of abbreviated searches on the overall conclusion of a review. We found that abbreviated searches can be an acceptable alternative to comprehensive searches if the decision at hand does not require the highest possible certainty. We also found that single-database searches were unreliable for drawing conclusions and should hence be avoided in evidence syntheses [28]. Overall, the present results are in line with these previous results. In contrast, the above-mentioned analysis by Marshall et al. [24] found that the treatment effects would not change in most reviews when searching only PubMed. This discrepancy may be explained by the assumptions underlying their simulation study: instead of assuming that a search in PubMed would identify all indexed studies, we actually tested this on a large scale by replicating 47 searches in 14 different abbreviated variants.
Several limitations of our analysis merit closer attention. First, we searched specifically in MEDLINE (not in all sources covered by PubMed, e.g., PubMed Central) and in the Embase-specific source of Embase.com (i.e., excluding MEDLINE records that are available in Embase but are not independently indexed for Embase by the interface provider Elsevier). Systematic reviewers using PubMed may find some of the articles that our MEDLINE-specific search did not find, and if they used Embase without restrictions, they would probably also find some of the MEDLINE articles. We avoided this overlap to allow a cleaner comparison of the contributions of each data source. Thus, we may have observed larger disagreements between the abbreviated search variants than would be seen in more typical applications of PubMed or Embase. However, this would not alter our overall interpretation.
Second, for the reference list searches, we used Scopus to make this step feasible in the several hundred literature searches. However, although Scopus is a very large database, not all relevant references are indexed, and this procedure may also have resulted in an overestimation of the disagreement between abbreviated searches with reference list search compared with comprehensive searches. Again, this would not change our overall interpretation.
Third, all of the analyses are based on binary endpoints with focus on major clinical topics. It is possible or even likely that any differences resulting from different search strategies would be more pronounced in less prominent endpoints of reviews, for example, adverse effects. Three of the endpoints in our analysis were related to withdrawal due to adverse events (Supplementary material 1). We have not explored the impact on frequently underreported continuous outcomes (such as quality of life). Outcomes with a perceived lower relevance or research results for more specific clinical fields may be reported more often in lower impact journals that are less likely to be indexed in the main literature databases. Thus, abbreviated searches may more often generate disagreeing results for more specific topics and less prominent outcomes than our results suggest. However, we did not assess potential predictors of disagreeing findings.
Fourth, we did not assess the impact of databases beyond MEDLINE, Embase, and CENTRAL, or the impact of alternative search methods beyond reference list searching. Cochrane reviews usually use a broad range of information retrieval strategies, for example, searching context-specific databases, searching other resources (e.g., clinical trial registries), or contacting experts. Hence, we cannot make inferences about abbreviated search techniques using data sources other than MEDLINE, Embase, CENTRAL, and reference lists.
Fifth, we only analyzed the impact of different abbreviated searches on the treatment effects from 47 Cochrane systematic reviews. We focused on five major health topics (mental health, osteoarthritis, chronic respiratory diseases, cardiovascular diseases, and cerebrovascular diseases), and our random sample may not reflect the diversity of all systematic review topics and outcomes. Assessing a larger sample from a wider spectrum of fields could have provided more precise and generalizable estimates. However, manually replicating and conducting even more than the several hundred searches we already conducted for the overall project would not have been feasible.
Sixth, we included systematic reviews published up to 2016 and no more recent reviews. However, the databases considered in this project, MEDLINE, Embase, and CENTRAL, continue to be standard sources for systematic reviews, and there have been no fundamental changes in the typical information retrieval processes of systematic reviews that would lead us to believe that the results do not apply to more recent reviews.
Finally, we determined the impact of abbreviating searches compared with the complex search strategies of Cochrane reviews as gold standard, reflecting a typical application of abbreviated searches. Because even Cochrane reviews may not retrieve all available evidence, we are not able to quantify the impact of abbreviating searches compared with theoretically perfect searches that would identify all existing evidence.
In light of these limitations, we encourage other researchers to repeat our methods, for example, in other health fields and with more studies, to better understand the applicability of the results. More certainty regarding our research questions would help systematic reviewers decide how extensive their searches need to be: not only what is feasible, but also what is necessary to ensure a minimum level of certainty about the possible impact of searching a limited number of databases on the effect estimates.
Overall, we conclude that abbreviated literature searches often lead to identical or very similar treatment effect estimates in systematic reviews, but relevant differences may occur occasionally. Treatment benefits or harms found with more comprehensive searches would typically remain visible with abbreviated searches. Statistical significance may sometimes be gained or lost, but in only up to one in seven reviews would favorable effects seem unfavorable (or vice versa) or all information supporting decisions be lost. It was not clear which type of abbreviated search would be preferable. More comprehensive searches should be considered when higher certainty is required for decision-making.