Quality assessment of kidney cancer clinical practice guidelines using AGREE II instrument

Abstract
Background: Evidence-based guidelines are expected to provide clinicians with explicit recommendations on how to manage health conditions and to bridge the gap between research and clinical practice. However, the quality of existing clinical practice guidelines (CPGs) varies. This study aimed to evaluate the quality of kidney cancer CPGs.
Methods: We systematically searched PubMed, Embase, China Biology Medicine disc, and relevant guideline websites from their inception to April 2018 and identified CPGs that provided recommendations on kidney cancer. Four independent reviewers assessed the eligible CPGs using the Appraisal of Guidelines for Research and Evaluation (AGREE II) instrument. Inter-rater consistency was assessed using intraclass correlation coefficients (ICCs).
Results: A total of 13 kidney cancer CPGs were included. The mean scores for the AGREE II domains were as follows: scope and purpose, 76.9%; clarity and presentation, 76.4%; stakeholder involvement, 62.8%; rigor of development, 58.7%; editorial independence, 53.7%; and applicability, 49.4%. Two CPGs were rated as “recommended,” 8 as “recommended with modifications,” and 3 as “not recommended.” Kidney cancer CPGs used 7 different grading systems to rate the level of evidence and the strength of recommendation.
Conclusions: Overall, the quality of kidney cancer CPGs is suboptimal. The AGREE II assessment highlights the need to improve CPG development processes, editorial independence, and applicability in this field. A standardized grading system is needed to provide clear information about the level of evidence and the strength of recommendation in future kidney cancer CPGs.


Introduction
An estimated 62,700 Americans were diagnosed with kidney cancer and 14,240 died of the disease in 2016. [1] The vast majority (more than 90%) of kidney cancers are renal cortical tumors known as renal cell carcinoma (RCC). [2] RCC accounts for approximately 3.8% of all new cancers in the Western world, and its detection rate has increased by approximately 1.7% per year over the past 10 years. [3] Since 2005, a number of new targeted agents have been introduced for the treatment of this disease. [4] Although many of these therapies have shown promising outcomes, with improved progression-free and overall survival, the diagnosis, treatment, and management of kidney cancer remain major challenges for clinicians. Therefore, local, national, and international organizations have developed kidney cancer clinical practice guidelines (CPGs) to standardize clinical practice and improve the effectiveness of management. Ideally, evidence-based guidelines are expected to provide clinicians with explicit recommendations on how to manage health conditions and to bridge the gap between research and clinical practice. [5] However, existing CPGs vary in quality and comprehensiveness, which hampers the standardization of care, guideline adaptation, and implementation, particularly in resource-limited settings. The usefulness of guidelines depends primarily on their quality, methodological rigor, and transparency of development. [6] It is therefore important to determine whether recommendations are indeed based on high-quality evidence. [7,8] At present, no published study has compared and evaluated the strengths and weaknesses of all available CPGs for the treatment of kidney cancer.
We aimed to assess and summarize the quality of all currently available international kidney cancer CPGs through a critical review using the Appraisal of Guidelines for Research and Evaluation (AGREE) II instrument. [9] We sought to identify gaps limiting evidence-based practice and to highlight potential opportunities for improvement.

Materials and methods
We conducted a comprehensive evaluation of kidney cancer CPGs using the AGREE II instrument. The study was performed in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guidelines [10] and related studies. [11-13] Because this study is a review of previously published literature, ethics committee approval was not required.

Search strategy
PubMed, Embase, and China Biology Medicine disc databases were systematically searched up to April 2018. We combined the terms "kidney cancer," "renal cell carcinoma," and "renal tumor" with a filter to identify guideline documents (practice guideline[pt] OR guideline[pt] OR guideline*[ti]). We also searched the websites of guideline development organizations: Guidelines International Network (http://www.g-i-n.net/), National Institute for Health and Care Excellence (https://www.nice.org.uk/guidance), National Guideline Clearinghouse (https://guidelines.gov/), Scottish Intercollegiate Guidelines Network (http://www.sign.ac.uk/), Clinical Practice Guidelines Portal (https://www.clinicalguidelines.gov.au/), New Zealand Guidelines Group (https://www.health.govt.nz/), BCGuidelines (http://www.bcguidelines.ca/alphabetical), and the AQuMed database (http://www.aezq.de/aezq/publications). In addition, we searched the Google search engine and checked the reference lists of all related guidelines to identify additional potentially eligible guidelines.
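The Methods name the search terms and the guideline filter but do not report the full Boolean string. The following Python sketch shows how such a PubMed query could be assembled and encoded as an NCBI E-utilities esearch request. It rests on stated assumptions: the three topic terms are joined with OR and combined with the filter using AND, the helper names are hypothetical, and the `mindate` value of 1800 is only a stand-in for "from inception". The code builds the request URL and sends nothing.

```python
from urllib.parse import urlencode

# Topic terms named in the Methods; joining them with OR is an assumption.
TOPIC_TERMS = ["kidney cancer", "renal cell carcinoma", "renal tumor"]
# Guideline filter as reported in the Methods.
GUIDELINE_FILTER = "practice guideline[pt] OR guideline[pt] OR guideline*[ti]"
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(terms: list[str], guideline_filter: str) -> str:
    """Quote each topic term, OR them together, then AND with the filter."""
    topic = " OR ".join(f'"{term}"' for term in terms)
    return f"({topic}) AND ({guideline_filter})"


def esearch_request_url(query: str, maxdate: str = "2018/04") -> str:
    """Encode (but do not send) an esearch request limited by publication date."""
    params = {
        "db": "pubmed",
        "term": query,
        "datetype": "pdat",  # restrict on publication date
        "mindate": "1800",   # hypothetical stand-in for "from inception"
        "maxdate": maxdate,  # search cut-off reported as April 2018
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    query = build_query(TOPIC_TERMS, GUIDELINE_FILTER)
    print(query)
    print(esearch_request_url(query))
```

Under these assumptions the assembled query is `("kidney cancer" OR "renal cell carcinoma" OR "renal tumor") AND (practice guideline[pt] OR guideline[pt] OR guideline*[ti])`. Embase and China Biology Medicine disc use their own syntax, so the string would need translating for those databases.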

Inclusion and exclusion criteria
The inclusion criteria were as follows: the complete guideline text was available in English; the guideline contained recommendations regarding kidney cancer interventions; and the guideline was published after 2008. If a guideline had been updated, only the most recent version was assessed. For every included guideline, we thoroughly searched for accompanying technical and supporting documents to better inform our assessments. Duplicate guidelines, guidelines for patients, editorials, secondary or multiple publications, and short summaries were excluded.

Guideline screening and data extraction
Two authors (L.M.X. and Y.P.J.) independently screened the search results to identify eligible guidelines and extracted basic information from the included guidelines. Disagreements were resolved by consulting a third expert adjudicator (L.Y.X.).

Quality appraisal of guidelines
Four independent reviewers evaluated the quality of each kidney cancer CPG using the AGREE II instrument, [14] which comprises 23 items rated on a 7-point Likert scale and grouped into 6 domains. Each domain captures a distinct dimension of CPG quality: scope and purpose, stakeholder involvement, rigor of development, clarity and presentation, applicability, and editorial independence. Items were scored from 1 (absence of the item) to 7 (item reported with exceptional quality). The standardized score for each domain, ranging from 0% to 100%, was calculated as follows:
$$\text{Standardized domain score} = \frac{\text{actual score} - \text{minimum possible score}}{\text{maximum possible score} - \text{minimum possible score}} \times 100\%$$
The AGREE II protocol [14] does not define an overall score for deciding whether a CPG should be recommended. In this study, each guideline was classified as "recommended" for overall scores >60%, "recommended with modifications" for scores between 30% and 60%, and "not recommended" for scores <30%. [15]
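To make the scoring rules above concrete, the following Python sketch computes a standardized domain score and applies the three-tier classification. It assumes the AGREE II user manual convention that the actual score is the sum of all item ratings across all appraisers in a domain, so the minimum and maximum possible scores are 1 and 7 times the number of items times the number of appraisers. Because the text does not state exactly how each guideline's overall score was derived, `classify` simply takes that percentage as input, and treating exactly 30% and 60% as "recommended with modifications" is our reading of "between 30% and 60%". Both function names are hypothetical.

```python
from typing import Sequence


def standardized_domain_score(ratings: Sequence[Sequence[int]]) -> float:
    """Standardized AGREE II domain score on a 0-100% scale.

    ratings[a][i] is appraiser a's 1-7 rating of item i within one domain.
    """
    n_appraisers = len(ratings)
    n_items = len(ratings[0])
    if any(len(row) != n_items for row in ratings):
        raise ValueError("every appraiser must rate every item in the domain")
    if any(not 1 <= r <= 7 for row in ratings for r in row):
        raise ValueError("AGREE II item ratings run from 1 to 7")

    actual = sum(sum(row) for row in ratings)  # summed over items and appraisers
    minimum = 1 * n_items * n_appraisers       # every item rated 1 by everyone
    maximum = 7 * n_items * n_appraisers       # every item rated 7 by everyone
    return (actual - minimum) / (maximum - minimum) * 100


def classify(overall_pct: float) -> str:
    """Three-tier rating with the cut-offs used in this review [15]."""
    if overall_pct > 60:
        return "recommended"
    if overall_pct >= 30:
        return "recommended with modifications"
    return "not recommended"


if __name__ == "__main__":
    # Hypothetical ratings: 4 appraisers x 3 items (e.g., scope and purpose).
    example = [[7, 6, 5], [6, 6, 6], [5, 7, 6], [6, 5, 7]]
    score = standardized_domain_score(example)
    print(f"{score:.1f}%")  # 83.3%
```

In the hypothetical example, the actual score is 72 against bounds of 12 and 84, so the domain score is (72 - 12) / (84 - 12) × 100 ≈ 83.3%.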

Strength of recommendation and level of evidence
We extracted the level of evidence and the strength of recommendations of each kidney cancer guideline if they adopted evidence grading systems.

Statistical analysis
We calculated the standardized score of each domain for each included CPG and determined the number of recommendations and their percentage distribution across quality-of-evidence and strength-of-recommendation classes. Agreement among the 4 appraisers' scores was assessed using intraclass correlation coefficients (ICCs) with 95% confidence intervals (CIs) for each domain across all included guidelines. [16] As described in a previous study, [17] ICCs of 0.01 to 0.20 were considered minor, 0.21 to 0.40 fair, 0.41 to 0.60 moderate, 0.61 to 0.80 substantial, and 0.81 to 1.00 very good. P < .05 was considered statistically significant, and all tests were 2-sided. Statistical analyses were conducted using Excel 2010 and SPSS version 21.0 (SPSS Inc., Chicago, IL).

Results
Figure 1 shows the flow of guideline identification and selection. The initial search yielded 1313 titles and abstracts, of which 126 were excluded as duplicates and 1108 were excluded after abstract review. A total of 79 articles were then assessed in full text, of which only 13 [2,4,18-28] met the inclusion criteria.
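The screening counts reported above reconcile as follows; the figure of 66 full-text exclusions is derived here by subtraction and is not reported separately in the text:

$$1313 - 126 - 1108 = 79 \ \text{(full texts assessed)}, \qquad 79 - 13 = 66 \ \text{(excluded at full-text review)}$$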
3.3. CPG quality assessment (AGREE)
3.3.1. Consistency. The ICC values indicated moderate to very good agreement among the 4 appraisers, ranging from 0.57 to 0.92 (Table 2). Agreement was lowest in the "applicability" domain (0.57) and highest in the "rigor of development" domain (0.92); the ICC for the overall assessment was 0.79, indicating substantial inter-reviewer agreement on item scores. Domain scores of the AGREE II quality assessment are presented in Table 2.
3.3.8. Overall assessment. This assessment concerns "the rating of body quality of the guidelines and whether the guideline would be recommended for use in practice." Based on the appraisal of the individual domains and the overall scores, 2 kidney cancer guidelines scored >60% overall and were rated as "recommended" by the appraisers; 8 were rated as "recommended with modifications"; and 3 as "not recommended" (Table 3).
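The study reports ICCs computed in SPSS but does not state the ICC model, type, or unit of analysis. The following pure-Python sketch shows how the agreement statistic and the verbal bands from the Statistical analysis section could be computed. It assumes Shrout and Fleiss's two-way random-effects, absolute-agreement, single-rater form ICC(2,1), with rated units (for example, item-by-guideline scores) in rows and the 4 appraisers in columns. The function names are hypothetical, and the 95% CIs that the study also reported are omitted.

```python
from statistics import mean
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """Two-way random-effects, absolute-agreement, single-rater ICC(2,1).

    ratings[i][j] is the score rater j gave to rated unit i
    (n units in rows, k raters in columns, no missing values).
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Sums of squares for units (rows), raters (columns), and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def agreement_label(icc: float) -> str:
    """Map an ICC to the verbal bands cited in the Statistical analysis [17]."""
    if icc < 0.01:
        return "not covered by the cited bands"
    bands = [(0.20, "minor"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if icc <= upper:
            return label
    return "very good"


if __name__ == "__main__":
    # ICC values reported under Consistency (applicability, overall, rigor).
    for value in (0.57, 0.79, 0.92):
        print(value, agreement_label(value))
```

Applied to the values reported under Consistency, `agreement_label` maps 0.57 to "moderate", 0.79 to "substantial", and 0.92 to "very good".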
3.3.9. Level of evidence and strength of recommendation.

Discussion
This study evaluated the quality of kidney cancer CPGs published after 2008; 13 kidney cancer CPGs were included. Two guidelines were rated as "recommended," 8 as "recommended with modifications," and 3 as "not recommended." Seven grading systems were used by kidney cancer CPGs to rate the level of evidence and the strength of recommendation. Some kidney cancer CPGs were published before 2008 and have not been updated, [29,30] but, according to the Institute of Medicine (IOM) statements on CPGs, [31] the recommendations in those guidelines are outdated and should not be used in practice. Hence, we did not include these CPGs in this review. Among the 13 included kidney cancer CPGs, the highest mean scores were achieved in scope and purpose, stakeholder involvement, and clarity and presentation, whereas the main weaknesses were in applicability, editorial independence, and rigor of development.
The appraised CPGs obtained the lowest score in the applicability domain, suggesting that guideline developers have not paid sufficient attention to potential barriers to the practical implementation of recommendations. Therefore, we suggest that new guidelines be pilot-tested for applicability before their release into clinical practice to ensure feasibility. Guideline groups should formulate recommendations and address implementation barriers with as much specificity as the evidence permits. [32] The guideline developed by the AUA (2017) [2] was recommended (scoring 68.8%) in our appraisal and can serve as a good example for this domain in future guideline development.
Kidney cancer CPGs also performed poorly in the editorial independence domain: information on potential conflicts of interest was scarce or absent, especially in the guidelines developed by SEOM, CIRSE, SOS, and AOS. Because conflicts of interest are the most common source of bias and are often under-reported, CPG developers should explicitly declare whether potential conflicts of interest (such as those between editorial board members and pharmaceutical or medical device manufacturers) could affect guideline drafting, and should describe a rigorous vetting process and transparent, accessible rules for review. Recent studies have reported that CPG developers were influenced by pharmaceutical or medical device manufacturers, so it is important to know the extent to which these interactions may have affected the recommendations. [33,34]
The rigor of development domain focuses on the methodological process of guideline development and can reflect the quality of CPGs better than the other 5 domains. Although the vast majority of guidelines contained references, many did not explicitly describe their literature search and selection methods and were ambiguous about how evidence was appraised and how recommendations were formulated. This step is crucial for determining whether the recommendations truly rest on the best available evidence. The low scores may reflect poor methodology and reporting, unfamiliarity with the criteria for CPG development, or the absence of external peer review and updating procedures.
As is widely recognized, adapting existing guidelines to clinical practice may be a more valid and cost-effective means of achieving high-quality guidelines worldwide. [35] To this end, the majority of guidelines applied grading systems to rate the quality of evidence, so as to communicate a clear message quickly and concisely and help guideline users, readers, and stakeholders understand the confidence in the effect estimates and the strength of the recommendations. The confidence in an effect estimate reflects the extent to which that confidence is adequate to support a particular recommendation, and the strength of a recommendation reflects the extent of collective confidence that adherence to it will do more good than harm. [36,37] However, we found that kidney cancer CPGs used different grading systems with different coding schemes to rate evidence and recommendations, which could confuse guideline users. Therefore, it is important to develop a standardized grading system that provides clear information about the level of evidence and the strength of recommendation for kidney cancer CPG users. Encouragingly, some guideline organizations, such as the American Urological Association (AUA), have begun to adopt the GRADE system in place of older systems in the new versions of their guideline development handbooks. [2,26] Our study has several strengths. First, the strength of recommendations and the level of evidence of each kidney cancer guideline were carefully extracted whenever the guideline adopted an evidence grading system, which may indicate the overall quality of kidney cancer guidelines. Second, our authors have different academic backgrounds, including methodological and medical expertise, which supports the reliability of our conclusions.
[4] 68.00 60.00 50.00 60.00 28.00 46.00 Recommended with modifications
CIRSE, 2016 [20] 71.00 64.00 51.00 84.70 67.70 25.00 Not recommended
SEOM, 2017 [18] 63.00 24.00 39.00 64.00 3.00 25.00 Not recommended
SOS, 2015 [25] 67
Table 4 Grading systems used in the included guidelines.
[21] GuideLines Into DEcision Support (GLIDES)
Level of evidence: High, intermediate, low, insufficient
High: The available evidence reflects the true magnitude and direction of the net effect, and further research is very unlikely to change either the magnitude or direction of this net effect.
Moderate: The available evidence reflects the true magnitude and direction of the net effect. Further research is unlikely to alter the direction of the net effect; however, it might alter the magnitude of the net effect.
Low: Low confidence that the available evidence reflects the true magnitude and direction of the net effect. Further research may change the magnitude and/or direction of this net effect.
Insufficient: Evidence is insufficient to discern the true magnitude and direction of the net effect. Further research may better inform the topic. The use of the consensus opinion of experts is reasonable to inform outcomes related to the topic.
Strength of recommendation: Strong, moderate, weak
Strong: There is high confidence that the recommendation reflects best practice. This is based on strong evidence for a true net effect; consistent results with no or minor exceptions; minor or no concerns about study quality; and/or the extent of Expert Panelists' agreement. Other compelling considerations (discussed in the guideline's literature review and analyses) may also warrant a strong recommendation.
Moderate: There is moderate confidence that the recommendation reflects best practice. This is based on good evidence for a true net effect; consistent results, with minor and/or few exceptions; minor and/or few concerns about study quality; and/or the extent of Expert Panelists' agreement. Other compelling considerations (discussed in the guideline's literature review and analyses) may also warrant a moderate recommendation.
Weak: There is some confidence that the recommendation offers the best current guidance for practice. This is based on limited evidence for a true net effect (eg, benefits exceed harms); consistent results, but with important exceptions; concerns about study quality; and/or the extent of Expert Panelists' agreement. Other considerations (discussed in the guideline's literature review and analyses) may also warrant a weak recommendation.
Grade C: RCTs with serious deficiencies of procedure or generalizability, or extremely small sample sizes, or observational studies that are inconsistent, have small sample sizes, or have other problems that potentially confound interpretation of data.
Our study has some limitations. First, we included only guidelines published in English; guidelines in other languages were not included, which may limit the generalizability of the results. Second, the AGREE II instrument emphasizes the methods of guideline development and the transparency of reporting but cannot assess the potential impact of recommendations on patient outcomes. [38,39]

Conclusions
Our analysis of current kidney cancer CPGs revealed that their methodological quality was acceptable, but there is still considerable room for improvement, especially in editorial independence, applicability, and rigor of development. Developers of kidney cancer CPGs should base recommendations on high-quality evidence while minimizing bias through methodological rigor, openness, and transparency. Where possible, CPGs should highlight the need for additional studies to close gaps in clinical care that have a significant effect on patient outcomes.
Table 4 (continued). Grading systems used in the included guidelines.
[20] CIRSE: 1a, 1b, 2a, 2b, 3a, 3b, 4, 5
1a: Evidence from systematic review or meta-analysis of randomized controlled trials
1b: Evidence from at least one randomized controlled trial
2a: Systematic reviews (with homogeneity) of retrospective cohort studies
2b: Individual retrospective cohort study or low-quality randomized controlled trial
3a: Systematic review (with homogeneity) of case-control studies
3b: Individual case-control study
4: Case series
5: Evidence from a panel of experts
SOS, 2015 [26]: EL-1, EL-2, EL-3