Development and application of an outcome-centric approach for conducting overviews of reviews

There are gaps in current guidance concerning how to conduct overviews of systematic reviews in an outcome-centric manner. Herein we summarize the methods and lessons learned from conducting 4 outcome-centric overviews, on the topics of resistance training, balance and functional training, sedentary behaviour, and sleep duration, to help inform the Canadian 24-Hour Movement Guidelines for Adults aged 18-64 years and Adults aged 65 years or older. We defined "critical" and "important" outcomes a priori. We used AMSTAR 2 to assess review quality and sought 1 systematic review per outcome. If multiple reviews were required to address subgroups for an outcome, we calculated the corrected covered area (CCA) to quantify overlap. We report our methodology in a PRISMA table. Across the 4 overviews, authors reviewed 1110 full texts; 45 were retained (critically low to high quality per AMSTAR 2), representing 950 primary studies enrolling over 5 385 500 participants. Of 46 outcomes, we identified data for 35. Nineteen outcomes required >1 review (CCA range: 0% to 71.4%). Our outcome-centric overviews addressed unique aspects of overviews, including the selection and quality assessment of included reviews, and overlap. Lessons learned included the consistent application of methodological principles to minimize bias and optimize reporting transparency.

Novelty
• Overviews of reviews synthesize systematic reviews in a rigorous and transparent manner.
• Outcome-centric systematic reviews assess the quality of evidence for primary studies contributing to an outcome.
• This manuscript describes the development and application of extending the concept of outcome-centric systematic reviews to the design and conduct of outcome-centric overviews.


Introduction
Guideline developers require rigorous synthesis of the best available evidence to inform recommendations. Often, guideline development efforts occur within restricted time periods or with limited human and financial resources. Practical, rigorous, and transparent decisions must inform how guideline developers synthesize and interpret the best available evidence efficiently. High-quality guidelines use evidence from systematic reviews (Brouwers et al. 2010), either existing (published) or generated de novo. Systematic reviews synthesize primary studies that fit prespecified eligibility criteria to answer a specific research question by using established and explicit methods to reduce bias and random error (Cook et al. 1997; Higgins and Green 2011).
For many research questions, multiple systematic reviews have been published. Recently, systematic review methodology has been extended to overviews of systematic reviews (Hunt et al. 2018; Pollock et al. 2018) (also known as umbrella reviews, meta-reviews, reviews of reviews, and syntheses of systematic reviews (Hunt et al. 2018)). Overviews synthesize systematic reviews in rigorous and transparent ways, such that the unit of analysis is a systematic review rather than a primary study. Overviews can summarize evidence regarding the same intervention or exposure where different outcomes are addressed in different systematic reviews (Becker and Oxman 2011). Overviews thus allow for brokering existing work and provide a synthesis of existing systematic reviews that are relevant to a specific research question (Becker and Oxman 2011; Hunt et al. 2018). Methodology for conducting overviews of systematic reviews is in its infancy and is rapidly evolving (Lunny et al. 2018). Although the Cochrane Handbook recently updated its chapter on the conduct of overviews of reviews, there are gaps in current guidance concerning how to conduct overviews of reviews in an outcome-centric manner. In an outcome-centric approach, the quality of the evidence is rated separately for each outcome across the studies contributing to the estimate of effect, rather than rating the quality of each study as a single unit. The Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) working group recommends an outcome-centric approach (Guyatt et al. 2008), which has important implications for guideline development from overviews. This outcome-centric approach allows a transparent summary of the overall quality of evidence across outcomes.
The newly released Canadian 24-Hour Movement Guidelines for Adults aged 18-64 years and Adults aged 65 years or older (hereafter called Guidelines) were informed in part by evidence from 4 overviews (El-Kotob et al. 2020; McLaughlin et al. 2020; Saunders et al. 2020). Overviews were used because of the anticipated large body of existing systematic reviews for some topic areas of interest and because of restricted timelines and fixed resources. During the development of the Guidelines, we developed a novel methodological approach for conducting overviews in an outcome-centric manner. The objective of this paper is to summarize the methods we developed for conducting overviews of reviews using an outcome-centric approach. We identify the relevant literature that helped to inform our approach, provide the rationale for our methodological decisions, and outline the lessons we learned from conducting overviews of reviews.

Protocol development and methods for ongoing decision-making
We developed a Content Working Group to discuss issues related to all of the Guideline-related reviews, which included some de novo reviews. Group members included the primary or senior authors for the overviews. During our monthly scheduled meetings, we discussed questions related to protocol development and methodological conduct identified by overview authors. We conducted additional informal discussions via email or by teleconference to maintain momentum. We consulted external methodological experts and considered published overviews and guidance for conducting overviews. Standardized approaches to the overviews, based on principles of minimizing bias and optimizing transparency of decision-making, were followed and are outlined throughout this manuscript. We developed our research questions for each overview of reviews at a meeting of the Guideline Consensus Panel in October 2018. Our approach was informed by standard methodology to conduct (Higgins and Green 2011) and report (Liberati et al. 2009) systematic reviews. We reviewed the literature for guidance on conducting overviews (Becker and Oxman 2011; Hunt et al. 2018; Lunny et al. 2016a, 2017, 2018; Pollock et al. 2018), and empirical studies documenting the methodological conduct (Hartling et al. 2012; Lunny et al. 2016b; Pieper et al. 2012) and reporting (Hartling et al. 2012; Lunny et al. 2020; Pieper et al. 2012) of overviews. We had a fixed 12-month time frame to complete the overviews so that the principal findings could be presented for consideration at the subsequent Consensus Panel meeting.

Eligibility criteria
We summarize the research questions and all of the key elements of the eligibility criteria for each overview in Table 1, organized by population, intervention or exposure, comparison, outcomes, and study designs. Please also see the individual overviews for further details of each item below (El-Kotob et al. 2020; McLaughlin et al. 2020; Saunders et al. 2020). According to the ProFaNE taxonomy, resistance training is defined as "contracting the muscles against a resistance to 'overload' and bring about a training effect in the muscular system. The resistance is an external force, which can be one's own body placed in an unusual relationship to gravity (e.g. prone back extension) or an external resistance (e.g. free weight)" (Lamb et al. 2011). Gait, balance, and functional training or 3D exercise (e.g., Tai Chi or dance) were defined according to the ProFaNE taxonomy: "Gait training involves specific correction of walking technique (e.g., posture, stride length and cadence) and changes of pace, level and direction. Balance training involves the efficient transfer of bodyweight from one part of the body to another or challenges specific aspects of the balance system (e.g. vestibular systems). Balance retraining activities range from re-education of basic functional movement patterns to a wide variety of dynamic activities that target more sophisticated aspects of balance. Functional training uses functional activities as the training stimulus and is based on the theoretical concept of task specificity" (available from http://www.profane.eu.org/taxonomy.html). To progress beyond screening, we required candidate reviews to conduct a comprehensive literature search (AMSTAR 2 item 4: literature search) and to report the risk of bias of individual studies included in the review (item 9: risk of bias) (Shea et al. 2017).
Although the AMSTAR 2 includes 7 "critical" items (Shea et al. 2017), we focused on these 2 items to ensure that the relevant literature was appropriately and comprehensively sought and critically appraised, since we intended to broker quality appraisals as conducted by individual review authors. Systematic reviews with or without meta-analysis were eligible. Each lead author determined the primary study designs suitable for inclusion within their systematic reviews. Throughout the remainder of this document, we use the term "review" to denote a systematic review and "overview" to denote a systematic review of systematic reviews.

Population
Across all overviews, the population of interest was community-dwelling adults aged 18 years and older, including apparently healthy adults. If outcomes for the population of interest were not reported separately, we considered systematic reviews with a mixed population if 80% or more of participants were from studies performed in our population of interest or if the sample average fell within the eligibility criteria.
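As an illustration only, the mixed-population rule above can be expressed as a small check. This is a hypothetical sketch: the function and argument names are ours, not from the overview protocols, and the minimum-age comparison stands in for "the sample average fell within the eligibility criteria".

```python
def eligible_mixed_population(pct_target_population: float,
                              sample_mean_age: float,
                              min_age: float = 18.0) -> bool:
    """Mixed-population rule: include a review if >=80% of participants
    come from studies in the target population, OR if the sample average
    (here, mean age) fits within the eligibility criteria."""
    return pct_target_population >= 80.0 or sample_mean_age >= min_age

# A review where only 75% of participants are from the target population,
# but whose mean age (42 years) fits the age criterion, remains eligible:
print(eligible_mixed_population(75.0, 42.0))  # True
```

Either branch of the rule is sufficient on its own, which is why the check uses `or` rather than `and`.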

Intervention/exposure/comparison
The 4 overviews addressed resistance training (El-Kotob et al. 2020), balance and functional training activities (McLaughlin et al. 2020), sedentary behaviour, and sleep duration.

Outcomes
The GRADE approach relies on evidence related to a predefined group of priority outcomes to inform recommendations (Guyatt et al. 2008); thus, the process is outcome-centric. "Critical" and "important" outcomes for each research question were identified at the Consensus Panel meeting. Our Content Working Group adopted the same definitions for similar outcomes (e.g., health-related quality of life) across overviews. For our outcome-centric overviews, we sought individual reviews that synthesized data for our outcomes of interest. Table 1 lists the 46 (26 unique) "critical" and "important" outcomes by overview.

Subgroups of interest
The subgroups of interest were age (<65 vs. ≥65 years old, because of our remit to develop guidance for adults and older adults), dose (e.g., volume, frequency, or intensity, as applicable for each review, such as duration of sleep or intensity of resistance training), and type (e.g., power or traditional strength training, screen-based and nonscreen-based sedentary behaviours).

Search strategy
A 2-step strategy was used to develop the search for systematic reviews. First, we modified topic searches developed for the previous Canadian pediatric 24-hour movement guidelines (Tremblay et al. 2017) to reflect an adult population and combined them with the Scottish Intercollegiate Guidelines Network search string for systematic reviews (https://www.sign.ac.uk/what-we-do/methodology/search-filters/). In the second step, we used the initial 50 results to refine the search as necessary and translated the search into other databases (Ovid platform for Medline, Embase, and PsycInfo; Ebsco platform for CINAHL). Searches were conducted the week of December 18, 2018, and updated on August 14, 2019, and reflected publications from the 10-year period between 2009 and 2019. The search strategies are all located at the following unique and persistent identifier: http://hdl.handle.net/1974/27648. We de-duplicated and imported all bibliographic records from the search into reference management software (Reference Manager, Thomson Reuters, San Francisco, CA, USA).

Study selection
Independently and in duplicate, pairs of reviewers screened titles and abstracts. If 1 reviewer included an abstract, it automatically progressed to full-text review. We required both reviewers to agree on excluded abstracts. We then retrieved all potentially relevant full-text articles and supplementary data for further review. We resolved disagreements by consensus or by consultation with a third reviewer. We documented all reasons for excluding full-text reviews and used Covidence (Veritas Health Innovation, Melbourne, Australia) to document all citation decisions. Figure 1 outlines the generic Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram for our overviews of reviews.

Fig. 1. Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) diagram adapted for reporting overviews of reviews. *Note: If no reviews reported critical outcomes, we rescreened records that reported <80% mixed populations for primary studies. Candidate reviews may have reported on more than 1 outcome of interest. The sum of reviews per outcome may be more than the number of candidate reviews. Similarly, the sum of excluded reviews across outcomes may be more than the number of reviews for each outcome, and more than the number of candidate reviews. AMSTAR 2, A MeaSurement Tool to Assess systematic Reviews 2; RoB, Risk of Bias.
In keeping with our outcome-centric approach, we sought 1 systematic review to inform each outcome by age, dose, or type of intervention/exposure. To handle potential situations where multiple reviews reported an outcome, we developed and applied the a priori hierarchy described below and in Fig. 2. First, we prioritized reviews directly reporting each outcome (i.e., measures of the actual outcome). If reviews with direct outcome measures were not identified, we sought reviews that reported on the most pertinent indirect marker (e.g., for the direct outcome "cardiovascular disease", we prioritized the surrogate outcome "blood pressure" over "lipid profile"). We established a hierarchy of indirect outcomes a priori and documented these decisions in each overview. Second, we prioritized reviews reporting the association between the outcome and each of the following subgroups: age, dose, or type of exposure, as described above.
If we identified multiple eligible reviews (based on the first 2 steps), the review of the highest quality based on a full AMSTAR 2 assessment (i.e., assessed as having "high", "moderate", "low", or "critically low" confidence in the review) was selected. If more than 1 review had the same (highest) AMSTAR 2 quality rating, we prioritized the most recent review by publication date. If the above process did not yield sufficient information to address the subgroups for an outcome, we retained multiple reviews. This process was repeated for every critical and important outcome in our overviews.
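The tie-breaking portion of the selection hierarchy above (steps 3 and 4: highest AMSTAR 2 rating, then most recent publication date) can be sketched as a sort over candidate reviews. This is a minimal illustration with made-up data; the dictionary keys and the numeric ranking of AMSTAR 2 confidence ratings are our own devices, not part of the published protocols.

```python
# Rank AMSTAR 2 confidence ratings so that higher quality sorts last (for max).
AMSTAR2_RANK = {"critically low": 0, "low": 1, "moderate": 2, "high": 3}

def select_review(candidates):
    """Pick 1 review for an outcome: highest AMSTAR 2 rating first,
    with the most recent publication year breaking ties."""
    return max(candidates,
               key=lambda r: (AMSTAR2_RANK[r["amstar2"]], r["year"]))

candidates = [
    {"id": "Review A", "amstar2": "moderate", "year": 2018},
    {"id": "Review B", "amstar2": "high", "year": 2015},
    {"id": "Review C", "amstar2": "high", "year": 2019},
]
print(select_review(candidates)["id"])  # Review C: highest rating, most recent
```

Because the key is a tuple, quality always dominates recency: Review C beats Review B only because both are "high" quality, and both beat the more recent but lower-quality Review A.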
If we identified no reviews for critical outcomes, we reassessed reviews where <80% of the population in the included studies met our eligibility criteria for the target population and repeated the process above. If these reviews reported primary study data, we conducted a meta-analysis or narrative synthesis with the subset of studies that met our eligibility criteria. If we did not identify any reviews for an outcome, we considered conducting de novo reviews for "critical" outcomes, but not for outcomes deemed "important", since these outcomes have less significance to inform decision-making (Guyatt et al. 2008).

Data extraction, risk of bias, and quality assessments
We used spreadsheets (e.g., Excel (Microsoft, Redmond, Wash., USA) or Google Sheets (Google, Mountain View, Calif., USA)) and Covidence (Veritas Health Innovation, Melbourne, Australia) for data extraction, which was completed by 1 reviewer and verified by another reviewer for each overview. By outcome, we extracted standard information from each review, including characteristics of the review (e.g., first author name, publication year, country, funding source, and databases and time frames searched); characteristics of primary studies within a review (e.g., number and publication years of the included primary studies; participant characteristics (pooled sample size, age, sex); intervention/exposure; comparator(s); outcome(s); setting); and the risk of bias results and quality assessment tools for primary studies (as assessed by review authors). We considered risk of bias and quality assessments in 2 stages. First, we used AMSTAR 2 (Shea et al. 2017) to assess the quality of included reviews. Second, for included reviews, we extracted the quality assessments of the evidence or of the primary studies reported by the authors for each outcome.

Data synthesis
By outcome, we reported evidence from randomized and nonrandomized studies separately. We reported the available summary estimates and confidence intervals, and the number of primary studies and participants that contributed to each estimate in summary of findings tables in each overview. We reported both pooled estimates and results from primary studies that were not included in pooled estimates but described in systematic reviews. In instances where we retained multiple reviews for an outcome, we assessed the degree of overlap in primary studies contributing to that outcome across reviews using the corrected covered area (CCA) (Pieper et al. 2014). The CCA represents the proportion of repeated occurrences of primary studies in other reviews, adjusted by the number of unique primary studies (see Pieper et al. 2014 for further details). The extent of primary study overlap among the reviews was interpreted and reported in each overview as either slight (0%-5%), moderate (6%-10%), high (11%-15%), or very high (>15%) (Pieper et al. 2014).

Fig. 2.
Decision rules for retaining a review(s) in an outcome-centric overview of reviews. See Methods, Study Selection, for further details.

Research reporting
We used PRISMA to guide the reporting in our overviews (Liberati et al. 2009). Table 2 outlines our approach to an outcome-centric review and extended reporting within the PRISMA checklist.

Analysis
Across the 4 overviews, we estimated the workload per overview by calculating search efficiency (the number of included reviews as a percentage of the records initially identified) and the time from review registration to initial manuscript submission (Borah et al. 2017). We documented the availability of data for critical and important outcomes, and the prevalence and quantity of overlap across outcomes. Finally, to summarize the processes associated with review and primary study quality assessment, we documented the AMSTAR 2 assessments across outcomes and the types of risk of bias assessments used for primary studies in individual reviews. We calculated counts and percentages for binary data and means and standard deviations (SD) for continuous data.

Results
Across the 4 overviews, authors screened 9432 records and reviewed 1110 full-text publications, of which 141 reported our outcomes of interest. We retained 45 reviews to address our questions, representing 0.5% overall search efficiency (Table 3). The mean (SD) time from review registration to manuscript submission was 61.3 (6.3) weeks. These 45 reviews represented 950 primary studies enrolling over 5 385 500 participants. Of 21 critical outcomes, we identified reviews for 19. For 2 critical outcomes in our resistance training overview, we extracted a subset of data from primary studies meeting our population inclusion criteria (El-Kotob et al. 2020). We did not identify a review for 11 of 24 important outcomes, representing at least 1 outcome in each of the 4 overviews (24% of all outcomes). Of the 34 outcomes for which we identified data, 17 (10 critical, 7 important) required more than 1 review to address our subgroups of age, dose, or type of exposure. The CCA for overlap ranged from 0% to 65.7% for critical outcomes and 0% to 71.4% for important outcomes.
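The 0.5% search-efficiency figure above is simply the retained-review count divided by the records initially identified; a quick arithmetic check using the counts reported in this section:

```python
# Counts reported across the 4 overviews.
records_identified = 9432  # titles/abstracts screened
reviews_retained = 45      # reviews retained to address the questions

search_efficiency = 100 * reviews_retained / records_identified
print(f"search efficiency: {search_efficiency:.1f}%")  # search efficiency: 0.5%
```

The raw ratio is about 0.48%, which rounds to the 0.5% reported.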
Of the 45 retained reviews, the AMSTAR 2 ratings were distributed as follows: critically low (n = 1, 2%), low (n = 10, 22%), moderate (n = 29, 64%), and high (n = 5, 11%). Common AMSTAR 2 limitations included lack of protocol registration (item 2), no list of excluded studies or justification (item 7), lack of duplicate study review (item 5) or data abstraction (item 6), missing detail about the rationale for included studies (item 8), and unreported sources of funding for included studies (item 10). Please see individual reviews for further information.
The 4 overviews documented 6 different types of primary study designs and reported 14 different tools to assess risk of bias (Table 3). For randomized trials alone, authors documented 6 different types of risk of bias assessments (Table 3).

Discussion
We developed and applied an outcome-centric approach for conducting overviews of reviews, with the goal of rigorously identifying the best available evidence while prioritizing available resources within a limited timeframe and reducing research waste. While overviews share many methodological similarities with systematic reviews, we identified gaps in how to conduct outcome-centric overviews. A recent audit identified 7 unique areas that arise during the conduct of overviews, including identification of review methods that raise concerns about bias or quality, and reporting of overlapping information and data (Lunny et al. 2020). In this paper, we provide guidance around unique aspects of outcome-centric overviews, including protocol development to minimize bias, how to select reviews for an outcome, and how to address overlap. This manuscript complements efforts to develop reporting guidelines for overviews that are currently underway (Pollock et al. 2019).

Rationale for using overviews during guideline development
As with many fields of research, the availability of systematic reviews in sleep, sedentary behaviour, and physical activity is growing rapidly. In the medical literature, approximately 75 new randomized controlled trials and 11 systematic reviews were published per day in 2010 (Bastian et al. 2010); the number of systematic reviews doubled to approximately 22 per day by 2016 (Page et al. 2016). The time required to conduct a systematic review, from protocol registration to publication, is approximately 66 weeks, with a trimmed mean of 63 papers reviewed in full text yielding 15 included primary studies (Borah et al. 2017). Since 2000, the number of overviews of reviews has also been increasing (Hartling et al. 2012; Pieper et al. 2012). For our 4 overviews, the time from protocol registration to manuscript submission was a mean of 61.3 weeks, with a mean of 275 papers reviewed in full text, yielding a mean of 12 reviews per overview and a mean of 251 primary studies per review. Thus, our overviews leveraged more data in a time frame similar to that of a single systematic review and reduced research waste. Given the volume of existing systematic reviews, conducting de novo primary reviews for our research questions was not warranted, especially considering the number of research questions we had, our limited resources, and our limited timeframe.

Development of methodological guidance
Our outcome-centric overview is similar to a typical systematic review in the development and registration of a protocol; design of the research question(s); identification of critical and important outcomes; electronic search strategy; duplicate title, abstract, and full-text review; and duplicate data extraction (Liberati et al. 2009). Below, we outline some of the unique aspects of our approach and ongoing methodological challenges for future research where possible.

Protocol development to minimize bias
We set 2 minimum methodological criteria for reviews to progress from screening to further consideration: adequacy of the literature search (AMSTAR 2, item 4) and reporting of the risk of bias for individual studies included in the review (AMSTAR 2, item 9) (Shea et al. 2017). We based these criteria on previous methodological studies of overviews documenting poor reporting (Hartling et al. 2012; Lunny et al. 2020; Pieper et al. 2012). We reasoned that the absence of these 2 core criteria placed a review at important risk of selection bias and hampered our understanding of the risk of bias in the underlying evidence base. It is possible that reviews excluded at this stage reported data for some of the 10 intermediate outcomes for which we did not identify data from our cohort of included reviews.

Selecting a review
Our outcome-centric overviews sought 1 review for each outcome. If we identified multiple reviews documenting our outcomes of interest, we prioritized review rigour, as assessed by the highest AMSTAR 2 rating among reviews. We believed that if we could choose among different reviews, it was most important to have confidence in the overall conduct of the review. If multiple reviews had the same AMSTAR 2 rating, we then chose the most recently published review, with the aim of capturing the most current evidence. Other objective a priori criteria, such as recency of search, number of included primary studies, or review publication date, could also inform selection among multiple reviews. In our view, the criteria that best inform review selection may be context-specific. For instance, if an intervention is older and well studied, "number of included primary studies" may be an important consideration; if an intervention is relatively new and registered trials are ongoing, recency of search may be more important. In any case, establishing and documenting the rationale for these decision rules a priori is important to avoid introducing bias during study selection. However, the number of primary studies or publication date alone does not explicitly consider other risks of bias in the conduct of candidate reviews, and we suggest an overall AMSTAR 2 assessment to complement these criteria. Further research on methods to rigorously select systematic review(s) for an outcome is needed.

Table 2 (excerpt). Outcome-centric extensions to selected PRISMA items:
Study selection: Across the critical and important outcomes, aim to identify 1 systematic review per outcome. Complete the following steps for each outcome: (1) identify all reviews reporting the outcome; (2) prioritize direct measures (of the actual outcome) over indirect measures; if indirect measures are used, develop a hierarchy; (3) if multiple reviews report the same outcome, complete a full AMSTAR 2 assessment and choose the highest quality review; (4) if multiple reviews have the same (highest) AMSTAR 2 quality, choose the most recent by publication date; (5) determine whether additional reviews are required to address subgroups for the outcome; (6) if no reviews report critical outcomes, reassess the cohort of excluded reviews that did not meet the 80% eligibility criterion (above) to determine whether a subset of primary studies reported the outcome.
Data collection process (item 10): Describe the method of data extraction from reports (e.g., piloted forms, independently, in duplicate) and any processes for obtaining and confirming data from investigators (same as PRISMA); additionally, report the quality of reviews across outcomes and the risk of bias across reviews for each outcome.
Additional analysis (item 23): Give results of additional analyses, if done (e.g., sensitivity or subgroup analyses, metaregression; see item 16).
Summary of evidence (item 24): Summarize the main findings, including the strength of evidence for each main outcome; consider their relevance to key groups (e.g., healthcare providers, users, and policy makers) (same as PRISMA).
Limitations (item 25): PRISMA directs authors to discuss limitations at the study and outcome level (e.g., risk of bias) and at the review level (e.g., incomplete retrieval of identified research, reporting bias); for overviews, discuss limitations at the systematic review and outcome level, and at the overview level.
Conclusions (item 26): Provide a general interpretation of the results in the context of other evidence and implications for future research (same as PRISMA).

Addressing overlap
The currently available guidance for overviews suggests calculating overlap for all primary studies from all included reviews in an overview (Pieper et al. 2014;Pollock et al. 2018). In contrast, our outcome-centric overview assessed overlap by outcome when multiple reviews were required to address subgroups of age, dose, or type of intervention or exposure. Consider the following applied example for 2 reviews: Review A reported mortality and cardiovascular disease (CVD) by age; Review B reported mortality by dose. In our outcome-centric approach, we included Reviews A and B for mortality and reported the CCA; we included review A for CVD, with no overlap issues to report.
We measured overlap using the CCA, which was originally developed with a group of 60 systematic reviews to quantify the overlap among primary studies included in an overview (Pieper et al. 2014). In our outcome-centric overviews, the degree of overlap could be high because the same studies addressed different subgroups for the review (e.g., modification of effect by age vs. modification of effect by exposure dose). For example, in our sleep overview, the mortality outcome had a CCA of 65.7% because the authors included 2 reviews of very similar studies (1 of short sleep duration and 1 of long sleep duration) to address the dose subgroup. Therefore, we calculated and reported the CCA to characterize the overlap in primary studies contributing to the estimates of effect, while avoiding strictly "double-counting" data contributing to the same outcome.

Methodological challenges
A priori, for each outcome, we specified that 80% of the primary studies contributing to the estimate were required to meet our inclusion criteria. We included this guideline to optimize the generalizability of results to our target population. For 2 critical outcomes in 1 overview (El-Kotob et al. 2020), we reassessed studies initially excluded under this 80% guideline, identified the reviews with the highest proportion of primary studies meeting our inclusion criteria, and synthesized data from this subset of studies. We conducted a de novo synthesis of a subset of data only for critical, and not for important, outcomes.
Once we identified the retained reviews for our outcomes, we encountered heterogeneity in the risk of bias assessments. The reviews reported 14 unique risk of bias tools, including multiple tools for randomized and observational studies. While we anticipated that randomized trials and observational studies would use different risk of bias assessments, we did not anticipate different risk of bias assessments within similar study designs. A priori, we chose to retain the assessments as reported by authors and did not harmonize these assessments across reviews. As a result, we observed multiple types of risk of bias assessments within and across outcomes. Potential strategies to reduce this variability include restricting inclusion criteria to specific types of risk of bias assessments or re-abstracting risk of bias assessments using 1 tool for the primary studies contributing to an outcome. Further study of strategies to manage different risk of bias metrics in the primary studies of included reviews is required.
Finally, by AMSTAR 2 assessments, we identified methodologic limitations in our included reviews. The quality of an overview is highly dependent on the rigorous conduct of systematic reviews. Ongoing knowledge translation efforts to improve the rigorous design, conduct, and reporting of systematic reviews amongst authors are critical. The EQUATOR network is an excellent resource for reporting checklists (https://www.equator-network.org/). Instruction in evidence synthesis should be core research methodology in all research training programs. Taken together, these methodological challenges highlight an overarching challenge: overviews are ultimately limited by the coverage, methods, reporting, and overall quality of their included systematic reviews (Pollock et al. 2016).

Lessons learned
Upon reflection, we identified several lessons learned in the design, conduct, and analysis of overviews. First, it was critical to engage a multidisciplinary team, including content experts, information scientists, and methodologists with experience in the conduct of traditional systematic reviews. Whilst it may not be possible to anticipate all methodological challenges, adoption of principles of methodological rigour, strategies to minimize bias, and reporting transparency consistently guided our decisions. Early protocol development and registration, regular meetings among team members, and careful documentation of methodological decisions provided a framework for a consistent approach across reviews.
We recommend adopting a focused and structured a priori approach to prioritize outcomes, identify and retain reviews, and establish criteria for choosing amongst multiple reviews reporting the same outcome. Similarly, we suggest establishing strategies to address gaps in evidence for outcomes based on importance, time, or resource constraints. Reviews may report different methods for risk of bias assessment; we suggest deciding upon strategies to address these different approaches across outcomes. Finally, we suggest use of the PRISMA checklist (Liberati et al. 2009), which helped us design, conduct, and report our overviews and our process for overviews in a rigorous and transparent manner.

Strengths and limitations
Our outcome-centric methodological approach has several strengths. We built on established, rigorous methodology for the conduct of systematic reviews, embracing principles of transparent reporting and minimization of bias. Our 2-step screening approach minimized bias by excluding poorly conducted systematic reviews concurrently with initial assessments of population, intervention/exposure, comparison, and outcome criteria. We identified the most rigorous systematic reviews for each outcome. We advanced the assessment of overlap by applying the concepts of CCA to the studies contributing to the estimate of effect of an outcome, rather than to a complete cohort of studies. Among our 4 overview exemplars, we identified areas for immediate improvement in reporting systematic reviews and primary studies.

Our approach also has limitations. We restricted our search for systematic reviews to the most recent 10-year window, and primary studies published after the search dates of the retained systematic reviews were not captured. We did not update the primary evidence by outcome for the retained reviews because of time and resource constraints. We retained the most rigorous systematic reviews by outcome; however, some reviews still had important methodology gaps as assessed by AMSTAR 2. Systematic review authors used different scales to assess risk of bias, and we did not reassess the quality of the primary data against a common metric. The results from any rigorous systematic review remain dependent on the quality of the primary studies contributing to an estimate of effect.
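The outcome-centric CCA calculation described above can be sketched briefly. This is an illustrative implementation (the function name is our own), assuming the standard CCA formula CCA = (N − r) / (rc − r), where N is the total number of study inclusions counted across reviews, r is the number of unique primary studies, and c is the number of reviews; in the outcome-centric application, the study sets are restricted to the primary studies contributing to a single outcome's estimate of effect rather than each review's complete cohort of studies.

```python
def corrected_covered_area(review_studies):
    """Compute the corrected covered area (CCA) for a set of reviews.

    review_studies: a list of sets, one per review, each containing the IDs
    of the primary studies contributing to the outcome of interest.
    Returns the CCA as a proportion (multiply by 100 to report a percentage).
    """
    c = len(review_studies)                       # number of reviews
    if c < 2:
        return 0.0                                # overlap needs >= 2 reviews
    N = sum(len(s) for s in review_studies)       # inclusions with multiplicity
    r = len(set().union(*review_studies))         # unique primary studies
    return (N - r) / (r * (c - 1))                # (N - r) / (rc - r)

# Example: 2 reviews sharing 1 of 3 unique primary studies for an outcome.
reviews = [{"study_a", "study_b"}, {"study_b", "study_c"}]
print(round(100 * corrected_covered_area(reviews), 1))  # 33.3 (% overlap)
```

A CCA of 0% indicates no shared primary studies across the reviews for that outcome, consistent with the 0%–71.4% range we observed.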

Conclusions
Our outcome-centric approach advances the existing methodology for the design and conduct of overviews of reviews. Across 4 overviews for the Guidelines, we transparently reported our approach to implementing an outcome-centric overview. We described solutions to specific overview methodology challenges, including review selection and overlap. Given the increasing interest in overviews of reviews, our principles may be generalizable to other situations where decision makers require rigorous syntheses of large bodies of evidence within fixed resources (e.g., time, personnel, funding).