Methodological issues of systematic reviews and meta-analyses in the field of sleep medicine: A meta-epidemiological study

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Introduction
In the era of evidence-based medicine, credible evidence is the foundation upon which trustworthy decisions are built [1]. Systematic review and meta-analysis (SRMA) serves as an important source of evidence to support decision-making [2]. A systematic review summarizes findings from all available studies on the same topic, with or without a meta-analysis, and is expected to provide comprehensive evidence [3][4][5]. However, whether the evidence from a SRMA is credible largely relies on its design and conduct. Relevant considerations include, but are not limited to, 1) how the literature was searched and screened, 2) how the data were collected and analysed, and 3) how the results were interpreted and discussed. SRMAs with methodological issues may generate non-credible results and mislead clinical practice [6].
In order to make a valid evaluation of methodological quality, several instruments have been developed, including Sacks' checklist [7], the Overview Quality Assessment Questionnaire [8], the Assessment of Multiple Systematic Reviews (AMSTAR) [9], and the updated version of AMSTAR (AMSTAR 2.0) [10]. These instruments are widely used to assess methodological issues in SRMAs. Jadad et al evaluated 50 SRMAs on the treatment of asthma and found that serious methodological flaws remained even after peer review [11]. Xu et al investigated 529 dose-response meta-analyses and found that 87.9% of them were poorly designed and conducted [12]. These studies reveal that a large proportion of SRMAs may have poorly implemented the safeguards that validate their conclusions.
Since SRMA was introduced to the field of sleep medicine, an increasing number of SRMAs have been published over the past decades, some of which have been used as evidence in clinical guidelines (e.g. [13][14][15]) that govern physicians' decisions, patients' behaviours, and administrators' policies. What makes things worrisome is the question of how well these SRMAs were designed and conducted, and therefore whether the evidence they produced is credible. In this review, we conducted a comprehensive assessment to examine these methodological issues and to investigate potential mechanisms to improve SRMAs conducted in the field of sleep medicine.

Protocol
A protocol for the meta-epidemiological study was developed in advance to formulate the design and conduct of this study (appendix 1). A meta-epidemiological study is defined as a methodological survey that "aims to evaluate trends and patterns in the literature with the overarching goal of improving the design, methods and conduct of future research" [16][17][18]. The protocol contained details of the review question, eligibility criteria, literature search, screening, quality assessment, data collection, and data analysis. Some changes were made: first, we limited the inclusion criteria to systematic reviews with meta-analysis on healthcare interventions, as suggested by the reviewers; second, we replaced the pre-defined subgroup analysis with a regression analysis, considering that the interaction test for potential differences in effects among groups is underpowered when there are 3 or more categories [19]; further, we added a post-hoc sensitivity analysis for the regression to test the robustness of the results.

Eligibility criteria
We included systematic reviews with meta-analyses, or meta-analyses alone, on healthcare interventions published in the major academic journals of sleep medicine.
We focused on healthcare interventions because AMSTAR 2.0 was designed to assess such systematic reviews [10]. Systematic reviews had to contain a quantitative synthesis, as the appropriate use of meta-analysis is part of the outcome of interest. The definition of a systematic review has been clearly documented in the Cochrane handbook [20]. A meta-analysis refers to a statistical and quantitative synthesis of the available findings of similar studies on the topic in question, and is generally regarded as a type of systematic review [21][22][23][24]. Overviews, scoping reviews, and narrative reviews were not considered since they differ from SRMAs [25]. Pooled analyses that did not use a regular literature search of at least one database were also not considered. Studies that consisted of original data plus a systematic review/overview/scoping review, again, were not considered. The primary outcome of the current review was the methodological flaws within the eligible studies. The secondary outcome was the association between baseline characteristics (see the data analysis section) and methodological weaknesses.

Literature search and screen
The literature search was conducted by one experienced researcher (XC). We searched for SRMAs published in academic journals of sleep medicine indexed in the PubMed, Medline, and Embase databases from inception to 22 October 2019. We identified 23 related journals from the SCImago Journal & Country Rank (https://www.scimagojr.com/), such as "Sleep", "Sleep medicine reviews", and "Sleep medicine". Of these we excluded four predatory journals (e.g. Journal of sleep disorders & therapy) based on Beall's list, leaving 19 journals. For these non-predatory journals, we used indexing status as an additional criterion (MedLine versus non-MedLine). A full list of the journals and the search strategy are presented in the appendix (appendix 1). Grey literature was not considered as we only aimed at peer-reviewed SRMAs. We did not review the reference lists of eligible SRMAs since the sample was expected to be sufficient and representative.
The Endnote X7 software was used to find duplicates. The Rayyan online app (https://rayyan.qcri.org/) was used for literature screening; it allows blinding of the raters to ensure the process was independent. Titles and abstracts were first screened by the lead author (XC), and records that were clearly not SRMAs were removed; a post hoc double-check of these excluded records was performed by another author (LFK). The full texts of the remaining records were screened by two researchers (XC and LY) separately to make a further decision. Any disagreements were recorded and discussed until consensus was reached. Cohen's kappa statistic was used to assess inter-rater agreement [26,27].
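As an illustration of the agreement check above, Cohen's kappa compares the raters' observed agreement with the agreement expected by chance from their marginal frequencies. The sketch below is a minimal pure-Python illustration; the screening decisions shown are invented, not the study's data:

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(rater1)
    # observed proportion of agreement
    po = sum(a == b for a, b in zip(rater1, rater2)) / n
    # chance agreement from each rater's marginal frequencies
    c1, c2 = Counter(rater1), Counter(rater2)
    pe = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / n ** 2
    return (po - pe) / (1 - pe)

# hypothetical include/exclude decisions by two screeners for 10 records
r1 = ["inc", "inc", "exc", "exc", "inc", "exc", "exc", "inc", "exc", "exc"]
r2 = ["inc", "exc", "exc", "exc", "inc", "exc", "inc", "inc", "exc", "exc"]
print(round(cohens_kappa(r1, r2), 2))  # -> 0.58, moderate agreement
```

A kappa around 0.6, as reported later for this study's full-text screening, is conventionally read as moderate-to-substantial agreement.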

Data collection
Baseline characteristics such as the first author's name, number of authors, year of publication, region of affiliation of the first author, number of included studies, use of a reporting guideline, funding information, type of main meta-analysis used, and journal of publication of each SRMA were extracted. This was done by one researcher (LY) and double-checked by another researcher (XC). This information could be directly extracted and therefore no missing data were expected.
Meta-analyses were categorized as either a standard meta-analysis or a special type of meta-analysis. A standard meta-analysis was defined by the use of classical synthesis methods based on head-to-head comparisons; special types of meta-analysis were those that involved more sophisticated assumptions and comparisons, including diagnostic meta-analysis, dose-response meta-analysis, network meta-analysis, activation likelihood estimation meta-analysis, meta-analysis of prevalence, meta-analysis of means, meta-analysis of correlations, and meta-analysis of nucleotide polymorphisms [28][29][30][31][32][33][34][35][36][37][38][39][40]. A detailed description of the different types of meta-analyses is presented in Table S1.

Evaluation of methodological issues
The AMSTAR 2.0 instrument (https://amstar.ca/Amstar-2.php) was used to evaluate the potential methodological issues of eligible SRMAs. The global methodology rating of a SRMA has routinely been judged by how many critical and non-critical weaknesses were identified; for example, high quality has been denoted as the presence of none or only one non-critical weakness, and critically low quality as two or more critical weaknesses [10]. However, such a judgement is somewhat arbitrary and anchors the assessment to the tool used. To make the assessment universally valid, we used the relative quality rank as an alternative measure of the global methodological rating. This was done by enumerating the items implemented out of 16 and creating a relative quality rank by dividing each enumerated count of safeguards by the maximum count across the SRMAs. The best SRMA thus has a rank of 1 (which serves as the anchor) and all lesser values are below this (range zero to 1).
There were two possible responses ("Yes" or "No") for each item, except for items 2, 4, 7, 8, and 9, where three possible responses ("Yes", "Partial Yes", "No") were available to rate the extent of a SRMA's adherence to the criterion. If an item was rated as "No", it was regarded as a weakness of the SRMA. If no information was provided, a "No" response was rated [10,43]. For item 8 (describes the characteristics of included studies adequately), there were no clear indicators to distinguish "Partial Yes" (described all components but not in detail) from "Yes" (described all in detail); we contacted the principal investigator of AMSTAR for clarification but did not receive a response. Therefore, to make a conservative evaluation, we rated all eligible SRMAs that described the required components as "Partial Yes" for item 8. The enumerated counts treated "Yes" as 1, "Partial Yes" as 0.5, and "No" as 0.
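The scoring and ranking scheme described above can be sketched as follows. The three SRMAs and their item ratings are hypothetical, and only four items are shown rather than the 16 of AMSTAR 2.0, purely for brevity:

```python
SCORE = {"Yes": 1.0, "Partial Yes": 0.5, "No": 0.0}

def safeguard_count(ratings):
    """Enumerated count of safeguards: 'Yes'=1, 'Partial Yes'=0.5, 'No'=0."""
    return sum(SCORE[r] for r in ratings)

def relative_ranks(all_ratings):
    """Divide each count by the maximum count across SRMAs, so the best
    SRMA gets a rank of 1 (the anchor) and all others fall below it."""
    counts = [safeguard_count(r) for r in all_ratings]
    best = max(counts)
    return [c / best for c in counts]

# hypothetical AMSTAR 2.0 ratings for three SRMAs on four items
srmas = [
    ["Yes", "Yes", "Partial Yes", "No"],   # count 2.5
    ["Yes", "Partial Yes", "No", "No"],    # count 1.5
    ["Yes", "Yes", "Yes", "Partial Yes"],  # count 3.5 (anchor)
]
print([round(r, 2) for r in relative_ranks(srmas)])  # -> [0.71, 0.43, 1.0]
```

Because every rank is expressed relative to the best observed SRMA rather than to a fixed cutoff, the scheme orders studies continuously instead of collapsing them into a few quality categories.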
The lead author (XC) took charge of the assessment of methodological issues using the AMSTAR 2.0 tool. To ensure the quality of the process, at most 15 SRMAs were scheduled for assessment each day. A careful cross-checking process was utilised after the evaluation of all eligible SRMAs was completed. These records were then double-checked by another researcher (LY). Any disagreements were discussed with two other methodologists (LFK and SD).

Data analysis
The baseline information of the SRMAs (e.g. number of authors, region) was qualitatively summarized. A bar chart was used to describe the adherence to each item with the proportion of each response ("Yes", "Partial Yes", and "No"). For methodological issues, we focused on two separate aspects: a) the numbers of total weaknesses and critical weaknesses for each SRMA and b) the relative rank across all items and across critical items for each SRMA.
To investigate potential measures for improving methodological validity, we established a weighted least squares regression of the relative quality rank against four predefined variables: 1) region of affiliation of the first author (America, Europe, and Asia-Pacific), 2) year of publication, 3) number of authors, and 4) use of a reporting guideline [9,10]. The number of authors was categorized by quartiles. We did not use funding information as a predictor because it was already contained in AMSTAR 2.0, which would break the i.i.d assumption of the regression analysis [44].
Considering that SRMAs published in the same journal may cluster on methodological issues, cluster-robust error variances were used in the regression analysis [45].
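The analysis itself was run in Stata. Purely as an illustration of the cluster-robust ("sandwich") idea, the sketch below fits an ordinary least squares slope for a single regressor without intercept and sums residual scores within clusters (journals) before squaring, so that correlated errors within a journal inflate the standard error. All data and names are invented, and the study's actual model was a weighted least squares regression with several predictors:

```python
def ols_cluster_robust(y, x, groups):
    """OLS slope (single regressor, no intercept) with a cluster-robust
    sandwich variance: scores are summed within clusters before squaring."""
    sxx = sum(xi * xi for xi in x)
    beta = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    resid = [yi - beta * xi for xi, yi in zip(x, y)]
    meat = 0.0
    for g in set(groups):
        # cluster score: residuals within one journal move together
        score = sum(xi * ei for xi, ei, gi in zip(x, resid, groups) if gi == g)
        meat += score * score
    var = meat / (sxx * sxx)
    return beta, var ** 0.5

# toy data: quality rank (y) against a predictor (x), clustered by journal
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
journals = ["J1", "J1", "J2", "J2"]
beta, se = ols_cluster_robust(y, x, journals)
print(round(beta, 3), round(se, 5))
```

In Stata this corresponds to specifying `vce(cluster journal)` on the regression command; the point estimate is unchanged, only the variance estimate accounts for within-journal correlation.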
A post hoc sensitivity analysis was employed, considering that the detection of publication bias (item 15) could be difficult or undefined for SRMAs with special types of meta-analyses (e.g. network meta-analysis) owing to current methodological constraints; item 15 of AMSTAR 2.0 may not be well suited to these SRMAs. Therefore, we recomputed the relative quality ranks after removing SRMAs with special-type meta-analyses and repeated the regression analysis to see whether the results remained stable. The analyses were conducted using Stata 14.0/SE (StataCorp, College Station, TX) with the confidence level set at 0.95.

Baseline characteristics
The literature search identified 1,630 records, of which 936 were identified as duplicates. We further excluded 104 records by screening the titles and abstracts (appendix 1). Of the remaining 590 records screened by full text, 353 were SRMAs.
Of these, we identified 163 that focused on healthcare interventions, and these were included in the analysis (Figure 1). The kappa statistic was 0.66 between the two raters. A detailed description of the screening process, the list of included studies, and the list of excluded studies with reasons for exclusion are available in appendix 1.
Baseline characteristics of the included studies are presented in Table 1. The majority of the meta-analyses within these SRMAs were standard meta-analyses (n=157, 96.32%); only 6 (3.68%) were special-type meta-analyses.
Of the SRMAs with special-type meta-analyses, 5 were network meta-analyses and 1 was an activation likelihood estimation meta-analysis. About half (48.47%) of the SRMAs referred to the use of a reporting guideline (e.g. PRISMA [46]).
The median number of included studies per SRMA was 13 (IQR: 8 to 23), and most SRMAs included more than 10 studies (n=107, 65.64%). In terms of funding, 87 (53.37%) were supported by non-profit (government or institute) funding, 4 (2.45%) were supported by for-profit (industry) funding, 25 (15.34%) did not receive funding, and 47 (28.83%) did not report funding information.

Detailed methodological issues
The details of evaluation of the methodological issues for each SRMA are presented in appendix 3. Figure 2 presents the adherence to each methodological item.

Issue 1. Research questions and inclusion criteria
Most of the SRMAs (n=152, 93.25%; 95%CI: 88.32%, 96.19%) presented a clear research question and inclusion criteria in light of the population, intervention, comparison, and outcome (PICO). However, some SRMAs failed to do so (n=11, 6.75%; 95%CI: 3.81%, 11.68%): seven failed to provide a clear comparison, four did not clearly specify the population, and one specified neither the intervention nor the comparison.

Issue 2. Protocol registration
Protocol registrations were identified for 28 (17.18%; 95%CI: 12.16%, 23.71%) SRMAs. Ten SRMAs reported that a protocol had been developed in advance but failed to provide it, and we rated these as "No". Of the 28 with an accessible protocol, eight failed to develop a meta-analysis plan and a plan for investigating sources of heterogeneity, or to justify changes from the protocol.

Issue 3. Study designs for inclusion
Only 12 (7.36%; 95%CI: 4.26%, 12.42%) SRMAs reported the reasons why certain study designs were included: one explained this in the abstract, and 11 in the introduction or methods section. The majority (n=151, 92.64%; 95%CI: 87.58%, 95.74%) of the SRMAs failed to report a reason.
Of the 12 SRMAs, three stated why only randomized controlled trials were included, seven reported why both randomized controlled trials and non-randomized studies of interventions were included, and the rest explained why only non-randomized studies were included.

Issue 4*. Literature search (Critical Item)
In total, 19 (11.66%; 95%CI: 7.59%, 17.49%) of the SRMAs used a comprehensive literature search strategy that satisfied all the required components (rated as "Yes"). In addition, 125 (76.69%; 95%CI: 69.63%, 82.52%) of the SRMAs searched two or more databases, provided keywords or a search strategy, and justified any limitations, which met the minimal requirement (rated as "Partial Yes"). However, 11.66% (95%CI: 7.59%, 17.49%; n=19) of the SRMAs failed to use a comprehensive literature search (rated as "No"): 13 of them searched only one database and six did not provide keywords or a search strategy.

Issue 5. Duplicate study selection, literature screen
There were 73.01% (95%CI: 65.72%, 79.24%; n=119) of the SRMAs that stated the study selection process was conducted by two reviewers independently (rated as "Yes"). Notably, of these 119 meta-analyses, only eight provided objective evidence (e.g. a kappa statistic) that the process involved two reviewers.

Issue 6. Duplicate data extraction
Similarly, 63.19% (95%CI: 55.56%, 70.21%; n=103) of the SRMAs stated that the data extraction process was conducted by two reviewers independently (rated as "Yes"). Again, only two of the 103 SRMAs provided objective evidence (a kappa statistic) that the process involved two reviewers. More than one-third (n=60, 36.81%; 95%CI: 29.79%, 44.44%) of the SRMAs failed to perform data extraction in duplicate.

Issue 8. Description of included studies
There were 24 (14.72%; 95%CI: 10.1%, 20.97%) SRMAs that failed to adequately describe all components of the included studies. In detail, 13 of the 24 failed to specify the study designs of the included studies, four did not provide any description of PICOS, four did not describe the comparators, one did not describe the outcomes, and two failed to describe at least two components (IC: 1; ICO: 1).

Issue 9*. Risk of bias assessment (Critical Item)
In total, 39 (23.93%; 95%CI: 18.03%, 31.03%) of the SRMAs failed to achieve the minimal requirements: five did not report the assessment results, one did not report which tool was used, and 33 did not assess the risk of bias.

Issue 11*. Methods for statistical combination (Critical Item)
For the statistical methods of combination, we identified 19 (11.66%; 95%CI: 7.59%, 17.49%) SRMAs with methodological issues in pooling the data. The main problem was that most of them (n=15) incorrectly combined different types of studies (e.g. cohort and cross-sectional studies); three of these also had other problems, for example not considering confounding and heterogeneity. In addition, two did not report how the data were synthesized, one used a fixed-effect model without considering heterogeneity, and one did not report how adjustments for confounding were handled.

Issue 13*. Results interpretation with risk of bias (Critical Item)
Most of the SRMAs (n=130, 79.75%; 95%CI: 72.60%, 85.47%) did not discuss risk of bias when interpreting the results, regardless of whether bias was incorporated into the results. We documented 33 (20.25%; 95%CI: 14.80%, 27.07%) SRMAs that did consider risk of bias in the results interpretation, of which 15 were those that had adjusted results for bias as reported above. Thus, most of these SRMAs failed to assess the potential impact of risk of bias on the results.

Issue 14. Exploring and explanation of heterogeneity
There were 107 (65.64%; 95%CI: 58.06%, 72.50%) SRMAs that either had low between-study heterogeneity, or had some heterogeneity but attempted to explore its sources and discussed its potential impact on the conclusions.

Issue 15*. Investigation of publication bias (Critical Item)
There were 76 (46.63%) SRMAs that investigated publication bias and discussed its likely impact on the results (rated as "Yes"). In addition, 23 (14.11%; 95%CI: 9.59%, 20.28%) investigated publication bias but failed to discuss its potential influence (rated as "Partial Yes"). As many as 64 (39.26%; 95%CI: 32.09%, 46.92%) did not include an investigation of publication bias (rated as "No"). Among the 64, 58 did not assess publication bias and six did not provide the results of such an assessment. Nine SRMAs recorded the reason that the number of included studies was too small to assess publication bias.

Issue 16. Report sources of conflict of interest
Source of conflict of interest was reported by the majority of the SRMAs (n=158, 96.93%; 95%CI: 93.02%, 98.68%).

Rating of each issue and global confidence
For total items, the best SRMA had a safeguard count of 12.5 and was thus regarded as the anchor for the relative ranks. The median relative rank was 0.64, with a first quartile of 0.52 and a third quartile of 0.72 (Figure 4). This indicates that even the top-quartile SRMAs had up to a third (0-28%) of methodological safeguards missing.
For the six critical items, the best SRMA had a count of 6, meaning that all six critical items were well adhered to. The median relative rank was 0.5, with a first quartile of 0.33 and a third quartile of 0.58 (Figure 4). This indicates that even the top-quartile SRMAs had up to almost half (0-42%) of the critical methodological safeguards missing.

Discussion
In this review, we comprehensively evaluated the methodological shortcomings of published SRMAs of healthcare interventions in the field of sleep medicine. Our results suggest that most of these SRMAs have 7 or more methodological issues, of which, on average, 2 are critical. These issues mainly concerned study inclusion and exclusion and risk of bias assessment and interpretation. By summarizing the relative ranks, we found that the majority of the SRMAs were of much lower quality than the best SRMA, and this was more serious for the six critical items. Even for the best SRMA, there were still 3 methodological items that were not well adhered to.
Results from the regression analysis suggest that SRMAs published in recent years tend to have higher quality ranks, indicating that the methodological quality of SRMAs has improved over the years. We further observed that SRMAs with a first author from Europe or the Asia-Pacific region tended to have higher quality ranks, especially for critical items. However, we did not observe a stable improvement over the years in the methodological quality of critical items.
We found that use of a reporting guideline did not help to increase the methodological validity of SRMAs, for either global quality or critical items. This could be expected because reporting guidelines were primarily designed to help authors remember all the items that need to be reported, rather than to guide the conduct of a SRMA [48]. However, some of the methodological issues were highly correlated with reporting problems, for example the description of the inclusion criteria, study selection, baseline characteristics, and conflicts of interest. The role of reporting guidelines in methodological validity should be further investigated through well-designed experimental studies.

The findings of the current study concur with reviews from other fields (e.g. urology, bariatrics, general surgery) [49][50][51][52][53][54]. For example, Corbyons et al conducted a survey of SRMAs published in urology and found that the methodological quality of these studies was suboptimal [49]; Storman et al found that 99% of published systematic reviews/meta-analyses in bariatric surgery were of critically low methodological quality [50]. These findings reveal that many SRMAs may have serious methodological issues.
In this review, we did not use the rating scheme recommended by AMSTAR 2.0 to rate the methodological confidence of eligible SRMAs; instead, the relative rank method was utilized. In addition to the reason mentioned earlier (i.e. subjective judgment), the rating scheme of this instrument is not sensitive enough to distinguish the confidence of SRMAs with critically low quality: all SRMAs with two or more critical issues are rated as critically low. Yet a SRMA with 2 critical issues might be more credible than one with 3 critical issues. The relative rank method provides a better solution for rating confidence and avoids such problems.
We did not consider protocol registration to be a critical item. Our previous study suggested that developing a protocol in advance, although of benefit to reporting, may not represent the methodological quality of SRMAs well [55]. This whole concept needs revisiting to assess its fitness for purpose.
Based on the current findings and our experience, we propose some recommendations about the dos and don'ts of SRMAs beyond the AMSTAR 2.0 instrument: 1) when starting a SRMA, it is helpful to design and conduct it according to a well-designed instrument (e.g. AMSTAR 2.0 [10]); 2) when including both observational and experimental studies in a SRMA, it is not recommended to pool data from these two types of studies together, as the former would introduce a risk of reverse causality; 3) it is highly recommended to explain the selection of the effect estimator (e.g. odds ratio, risk ratio) used to measure effects in the meta-analysis and how the estimators were dealt with when different estimators were used across studies; 4) it is recommended to use two or more weighting methods as sensitivity analyses when the effect is small but statistically significant; 5) if applicable, a dose-response gradient should be investigated; 6) when measuring publication bias, P-value driven methods (e.g. Egger's test, the rank correlation test [56]) are discouraged as they depend on the number of studies included in a meta-analysis; instead, non-P-value driven methods (e.g. the LFK index [57]) should be used.
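For reference, the P-value driven approach criticized in point 6 can be sketched briefly: Egger's test regresses the standardized effect (effect/SE) on precision (1/SE), and an intercept far from zero suggests funnel-plot asymmetry. The sketch below is a simplified pure-Python illustration with invented effect sizes; it returns only the intercept, not the significance test that makes the method P-value driven:

```python
def egger_intercept(effects, ses):
    """Intercept of Egger's regression: (effect/se) ~ a + b * (1/se).
    An intercept far from zero suggests small-study (publication) bias."""
    z = [e / s for e, s in zip(effects, ses)]  # standardized effects
    p = [1.0 / s for s in ses]                 # precisions
    n = len(z)
    mp, mz = sum(p) / n, sum(z) / n
    b = sum((pi - mp) * (zi - mz) for pi, zi in zip(p, z)) / \
        sum((pi - mp) ** 2 for pi in p)
    return mz - b * mp

# hypothetical symmetric meta-analysis: the same effect at every precision
effects = [0.5, 0.5, 0.5, 0.5]
ses = [0.10, 0.20, 0.30, 0.40]
print(round(egger_intercept(effects, ses), 6))  # ~0: no asymmetry signal
```

With only four studies, the significance test on this intercept has very little power, which illustrates why the test's conclusions depend so strongly on the number of included studies.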
In this review, we employed a comprehensive evaluation of SRMAs in sleep medicine; to the best of our knowledge, this is the first study focusing on the methodological issues of SRMAs in this field. We collected nearly all published SRMAs of healthcare interventions in the field of sleep medicine; therefore, our findings have a high level of representativeness. We acknowledge that our review had some limitations. The literature search was based on 19 academic journals of sleep medicine, so some related studies published in other journals (e.g. general journals) were undoubtedly not included, which may introduce some selection bias into the results. A previous study documented that the methodological quality of meta-analyses may differ between general and specialist journals [58].

However, it is difficult to identify meta-analyses on this topic (sleep) in other journals. Further, our study focused on the methodological validity of SRMAs while neglecting the quality of the individual studies included in these SRMAs, which is also very important. In addition, as mentioned earlier, some methodological considerations may not be well reflected and covered in AMSTAR 2.0, which may affect the validity of the current survey. Moreover, the screening and assessment processes, although strict, may still be at risk of systematic errors since both were somewhat subjective. These limitations merit attention when interpreting the results.

Conclusions
In conclusion, the methodological validity of SRMAs of healthcare interventions in the field of sleep medicine was suboptimal when measured by AMSTAR 2.0.
Although it has improved over time, methodological confidence was lacking for most of these SRMAs. Based on the current findings, we advocate a critical evaluation of the methodological validity of a SRMA before it is used as clinical evidence.

Practice Points
1. Most systematic reviews have 7 or more methodological issues, of which, on average, 2 are critical. These issues can seriously impact the credibility of the evidence.
2. Relative ranks are likely a better quality-assessment scheme than the absolute judgments commonly used.

Research Agenda
Future studies should: 1. Undertake a critical evaluation of the methodological validity of a systematic review before using it as clinical evidence.
2. Put more focus on the methodological validity of systematic reviews and meta-analyses rather than on simple checklist adherence by inexperienced researchers.

Table 2. Regression analysis of the relative quality ranks of the methodology against four pre-defined variables.
Table 3. Sensitivity analysis of systematic reviews with meta-analysis after excluding 6 special-type meta-analyses.

Region
Reviewer #1: I thank the authors for their responses and appreciate the effort that went into revisions of this work. There remain points to be clarified and issues resolved before the reader can make an informed decision as to the findings of this research.
I still have concerns about methodology part of this research. In response to my earlier concern, you clarified that this is an epidemiological survey on methodological quality of published meta-analytic reviews of healthcare intervention in the field of sleep medicine, where meta-analyses of RCT, NRCT, and "not reported" original studies were included. You further stated that "We conducted the study according to AMSTAR 2.0 where relevant items were applicable. There is no need for us, in principle, to follow AMSTAR when reporting this paper." The clarifications are needed to the following items: 1. Re statement: "The current study does not aim to assess healthcare interventions, but rather a survey on methodological quality. Therefore, we only defined the "P" and "O". "P" was systematic reviews with meta-analysis of healthcare intervention, "O" was the methodological quality." AND "By going through the list, several studies were narrative reviews (References 97, 304, 347, 352, etc). What was your rationale to include them?" a. Please clarify inconsistences in inclusion/exclusion criteria-systematic reviews with meta-analyses, OR meta-analyses OR both. This decision has significant implication for your study searches, selection, appraisal, and results. Response: We have revised the description of the inclusion criteria, now it says "We included systematic reviews with meta-analyses, or meta-analyses alone of healthcare intervention..." b. Please provide definition of "survey on methodological quality" as it concerns methods used in epidemiological research applied in this study. This should also be reflected in the title and in manuscript's method section, as per AMSTAR 2.0.

Response:
We have changed the term "survey on methodological quality" to "meta-epidemiological study", and added the definition of "meta-epidemiological study". This change is reflected in the title and methods section in the revised manuscript.
In a recent study (J Comp Eff Res. 2020;9(7):497-508.), the term "meta-epidemiological study" refers to a study type that aims to evaluate trends and patterns in the literature with the overarching goal of improving the design, methods and conduct of future research.
2. You indicated that you used The AMSTAR 2.0 instrument to evaluate the methodological issues of each study included in this research. The AMSTAR was developed to enable appraisal of systematic reviews of randomised and non-randomised studies of healthcare interventions.
a. Please provide rationale for using this framework to appraise narrative reviews that applied quantitative synthesis of multiple studies, and studies that included mixed data (i.e., longitudinal studies, cross-sectional., etc.). Response: Narrative reviews often use qualitative summary or a meta-ethnographic technique to qualitatively synthesize the findings (https://guides.library.uab.edu/c.php?g=63689&p=409774). They do not involve a quantitative synthesis. In another word, a review with quantitative synthesis is either a systematic review with meta-analysis, or a pooled analysis (differs from meta-analysis, as it does not have a regular literature search). As we have clarified a pooled analysis was not included.
There were a few SR/MAs that included mixed data such as longitudinal or cross-sectional studies, but for the healthcare intervention part they only used experimental studies in the analysis (e.g. Sleep Breath. 2016;20(2):719-731.). The longitudinal studies mentioned in the included SR/MAs were based on a pre-post design, so they can be considered quasi-experimental (i.e. NRSI).
The different study designs do not affect the appropriateness of using AMSTAR 2.0. Several items relate to this point (e.g. items 3, 11, and 12), and there are very clear criteria for assessing RCTs and NRSIs separately. For example, under item 11, both cohort and cross-sectional studies should adjust for potential confounding, and studies with different designs should not be combined. A longitudinal study is an analytic design, while a cross-sectional study is observational, so the two cannot be combined. As we reported in the results, "The main problem was that most of them (n=18) incorrectly combined different types of studies together (e.g. cohort and cross-sectional study)…"

3. Some studies included in your review were systematic reviews with meta-analyses and some were meta-analyses. You also stated in response to my earlier comment (revision 1) that "References 97, 304, 347, and 352 included quantitative synthesis of multiple studies (i.e. meta-analyses)" and were therefore included. Please elaborate on how the rating of the proportion of weakness for each item of AMSTAR 2.0 varied according to the different study methods.

Response:
Please see our response to Q1a. A systematic review may not include a meta-analysis, whereas a meta-analysis is always a type of systematic review. In our study, we included systematic reviews with meta-analyses or meta-analyses alone, which means they were all systematic reviews and all contained meta-analyses. In essence, they are the same.

4. The revised version of the manuscript STILL includes numerous reviews within the 169 selected that do not concern healthcare interventions. Some of them are listed below: articles. Please elaborate on the rationale for presenting this table in the main text.

Response:
We presented Table 1 in the main body because we believe it will help readers understand the different types of meta-analyses. Following your suggestion, we have now moved it to the supplementary file (Table S1).
For the citations in the examples, we did not use the 169 articles; as you can see in the appendix data file, the majority of the 169 articles were generic meta-analyses and only 6 were special-type meta-analyses (5 network meta-analyses and 1 coordinate-based meta-analysis), so they were not suitable for illustrating the different types of meta-analysis.

6. The process of study selection, the analyses, the visual presentation of the results, and the citations of the included studies can only be found in the supplementary material. This makes it difficult for readers to follow the analyses and results reported by the researchers. Transparent reporting and linkage of the information underlying your reporting, including but not limited to the raw data aggregated in Table 2 and Table S1 (baseline characteristics, standard versus special-type meta-analyses, study origin, reporting guidelines, funding information, etc.), should be available. For example, when a reader faces a statement like "There were 90 (53.25%) SRMAs that adequately used a risk of bias assessment for all of the important biases…. The remaining 42 (24.85%) failed to achieve minimal requirements, of which, 5 did not report the assessment results", he or she would like to connect these statements to specific studies, which is currently not possible. Reporting numbers alone has no practical application, and the extent of the risk of bias remains unknown. It is expected that the included studies be cited in the main body of the manuscript and linked to the reporting of the results.

Response:
We now present the raw data on the baseline characteristics of the studies in the supplementary file (appendix 3). In addition, the process of study selection (Figure 1) and the sensitivity analyses (Table 3) are now in the main text.
The included studies were not cited in the main body because of the journal's restriction on the number of references (a maximum of 100).
7. Your limitation section should be revisited. Qualitative studies, non-systematic reviews, publications in journals other than the major ones, and systematic reviews of observational data fell outside the scope of your research, so it is unclear why you reported these features in your limitation section. The focus of your limitation section should be on the credibility of your research: the search criteria, the validation of keywords for the searches, the omission of attention to the risk of bias assessment of the individual studies included in each systematic review with meta-analysis, the type of intervention, etc., and the overall design and conduct of the "methodological survey", so that the reader is informed of how well your results, including pooled estimates, were protected against being misleading.

Response:
We have revised the limitation section accordingly; we now focus on the credibility and validity of our study.