Methodology quality and grading recommendations in cancer screening guidelines: a systematic review

Introduction: Cancer prevention and screening guidelines are intended to support high-quality, effective screening in clinical practice. We systematically reviewed cancer screening guidelines for recommended cancer prevention and screening interventions, in order to provide recommendations for the development and implementation of high-quality guidelines. Methods: We searched PubMed, Embase, Web of Science, China Knowledge Resource Integrated Database, Wanfang Data, and SinoMed for cancer screening and prevention recommendations, mainly on lung, breast, gastric, liver, colorectal, and prostate cancer, and then searched the websites and databases of other organizations, including the United States Preventive Services Task Force, the American Cancer Society, the American College of Physicians, the National Comprehensive Cancer Network, the National Institute for Health and Care Excellence, the European Society for Medical Oncology, and the National Guidelines Clearinghouse, up to June 30, 2018. An abstraction form for coding information according to AGREE II was developed, and four researchers completed it independently. The primary outcome was the quality of each recommendation; the secondary outcome was a benefit-harm “comparability” rating, based on how benefits and harms were presented. Results: None of the 19 guides was of high quality, because none reached 60 points in domain 3, and only 2 guides scored above 50 points. Ten recommendations and 9 guidelines, published from 2010 to 2018, were included. The majority of guidelines (67%) were supported by a systematic review, and most guidelines had explicit criteria for rating the quality and strength of evidence (64% and 73%, respectively). 7 guidelines clearly mentioned benefits and harms, and 2 only the of the terms of a relative risk reduction (larger while the of an absolute risk Conclusions: Our findings led us to consider potential contributors to the lack of clarity in guidelines. 
We recommend the use of “summary of findings” tables, an approach proposed in a series of papers from the GRADE guidelines group, as the best method of summarizing and presenting outcome information.


INTRODUCTION
Clinical practice guidelines (CPGs) are systematically developed statements to assist practitioner and patient decisions about appropriate healthcare in specific clinical circumstances. The Institute of Medicine's standards for guideline development state that "a clear description of potential benefits and harms should be provided for each recommendation in a clinical practice guideline" [1]. CPGs are intended to facilitate more consistent, effective and efficient medical practice and to improve health outcomes [2,3]. They are aimed at directly influencing decision making by patients, clinicians and policy makers. They are also becoming the basis of quality-of-care measures that are likely to affect urologist reimbursements, with pay-for-performance measures on the horizon [4][5][6].
Clinicians' perceptions about the magnitude of benefits and harms help shape their recommendations when discussing cancer screening with patients, so it is useful and necessary for clinicians that guidelines be of good quality and reliable.
When the qualities of clinical guidelines are not presented in a balanced fashion, testing and treatment decisions can be adversely affected [7]. However, no evidence-based cancer screening guidelines have yet been issued in China [8]. How these important clinical resources present recommendations has not previously been evaluated for cancer screening in China. To assess the quality of cancer screening guidelines and to offer recommendations to Chinese guideline developers, we systematically examined how guidelines present quality information according to AGREE II and RIGHT.

METHODS
We set up an investigative group of 16 researchers and assistants responsible for training, searching, evaluating and discussing guidelines, and drafting the manuscript. The group held a kick-off and planning meeting in April 2018. The group leader (LJ) and creators in-chief (DM, CWQ, SHC, CYL) discussed the implementation process.
During the meeting, the search strategy prepared for guidelines related to cancer screening was piloted with a preliminary search of the United States Preventive Services Task Force (USPSTF), and the final version was agreed.
We included a target cancer if its incidence and mortality ranked within the top ten in the International Agency for Research on Cancer database and it was among the cancers of greatest concern [9]. We also documented whether life-years or quality-adjusted life-years lost because of the target cancer were mentioned. Finally, lung, breast, gastric, liver, colorectal, and prostate cancer screening guidelines were selected for evaluation in our systematic review.

Source of materials
For the purposes of this study, all recent clinical practice guidelines/recommendations from organizations around the world were searched electronically. We included guidelines that reported sufficient evidence based on systematic review and meta-analysis and that graded the evidence using Grading of Recommendations Assessment, Development and Evaluation (GRADE) or any other evidence grading tool. All guidelines/recommendations discussing the high-risk evaluation, technology, benefits and harms of cancer screening with the population/patients were also included as "positive recommendations" or "positive guidelines". We excluded statements with insufficient evidence and those focusing solely on behavioral counseling or on child/adolescent populations. Recommendations/guidelines can often appear in multiple forms, including a summary, a full document, and a manuscript published in a medical journal. We sought to find and evaluate the most complete version available. If several versions of a guideline/recommendation from an organization were available, we formally reviewed only the most recently published version.

Instruments
The methodological quality of all guidelines was assessed with the AGREE II instrument across six main domains: 1) scope and purpose; 2) stakeholder involvement; 3) rigor of development; 4) clarity of presentation; 5) applicability; and 6) editorial independence. Each domain is divided into smaller categories called items, with a Services Task Force Recommendation" available from the USPSTF. After training, they could review the eligible guideline documents in their entirety, including abstracts and online appendices and tables, to abstract each specific recommendation within the document. LJ and CYL also trained these four assistants to use the AGREE II instrument in its Chinese and English versions. The assistants also had to complete the online AGREE II overview tutorial and practice exercise.
We pilot-tested the tool on two separate sets of guidelines before the full evaluation and grading. Each assistant independently scored the guidelines using the AGREE II items and then assigned an overall quality rating to each guideline using the same 7-point scale. They also indicated whether they recommended its use by clinicians in practice, recommended its use with some modifications, or did not recommend its use. When assistants encountered problems, they discussed them promptly with the PI and creators in-chief. Two researchers (LJ and LN) then reviewed each recommendation/guideline to ensure that all were correct.

Data Extraction and Management
Two assistants (LJ and YM) used EndNote X to manage and store the full text of each eligible recommendation/guideline. The goal of this study was to evaluate quality, especially the methods used by guideline developers. Two research assistants (WXQ and FXS) independently used a standardized form, discussed and developed by the creators in-chief, to retrieve quality information according to AGREE II and both quantitative and qualitative data from each guideline/recommendation. Two researchers (LJ and CYL) then reviewed the abstracted information to check the extraction. Differences were highlighted and resolved by other group researchers, consulting the guideline document as needed.

Grading of the items and domains
Appraisers referred to the AGREE II user manual for the criteria for each item. A grade was assigned for each item using a 7-point scale indicating the level of agreement with each statement about the methodological quality of the guideline/recommendation. A score of 1 was given if none of the criteria for an item were met or the item was reported very poorly, while a score of 7 was given if the item met all the criteria and was well reported. The domain score was calculated by summing the item scores in a given domain and transforming the total into a percentage of the maximum score that domain could obtain. For this review, a high-quality guideline/recommendation required a score of at least 60% for rigor of development (domain 3) as well as at least 60% in any two other domains. Discrepancies were resolved by consensus after discussion among reviewers.
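The domain scoring and the high-quality threshold described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually run in this review. It assumes the scaled domain score defined in the AGREE II user manual, which subtracts the minimum possible score before dividing by the possible range, whereas the text above describes the calculation only as a percentage of the maximum score. The function names and the example ratings are hypothetical.

```python
from typing import Dict, List


def scaled_domain_score(ratings: List[List[int]]) -> float:
    """Scaled AGREE II domain score (0-100) for one domain of one guideline.

    ratings[a][i] is appraiser a's 1-7 rating for item i of the domain.
    Assumption: min-max scaling as in the AGREE II user manual.
    """
    n_appraisers = len(ratings)
    n_items = len(ratings[0])
    obtained = sum(sum(row) for row in ratings)
    min_possible = 1 * n_items * n_appraisers  # every item rated 1
    max_possible = 7 * n_items * n_appraisers  # every item rated 7
    return 100.0 * (obtained - min_possible) / (max_possible - min_possible)


def is_high_quality(domain_scores: Dict[int, float], threshold: float = 60.0) -> bool:
    """Apply this review's rule: domain 3 at or above the threshold,
    plus at least two other domains at or above the threshold."""
    rigor_ok = domain_scores[3] >= threshold
    others_ok = sum(
        1 for domain, score in domain_scores.items()
        if domain != 3 and score >= threshold
    )
    return rigor_ok and others_ok >= 2


# Hypothetical ratings: four appraisers, three items in the domain.
example_ratings = [[5, 6, 6], [6, 6, 7], [2, 4, 3], [3, 3, 2]]
example_score = scaled_domain_score(example_ratings)

# Hypothetical domain scores for one guideline (domains 1 to 6).
example_domains = {1: 72.0, 2: 45.0, 3: 61.0, 4: 80.0, 5: 30.0, 6: 50.0}
print(round(example_score, 1), is_high_quality(example_domains))
```

Under this scaling, a guideline whose items are all rated 1 scores 0% rather than about 14%, so whether a guideline clears the 60% threshold can depend on which of the two calculations is used.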

Analysis
We used descriptive statistics (eg, means, standard deviations, frequencies) to summarize the guideline data and the quality information based on AGREE II. We tabulated quality by guideline type, guideline year, and target cancer to look for any patterns that might emerge.
All assessments were based on the published full text versions of the guidelines/recommendations and on any supporting documentation as referenced.
For our hypothesis, we performed descriptive statistics and nonparametric tests using SAS 9.4 and calculated the intraclass correlation and the variation for each of the 23 AGREE II items as a measure of interrater reliability. Intraclass correlation was considered poor, fair, good and excellent for values of less than 0.40, 0.40 to 0.59, 0.60 to 0.74 and 0.75 to 1.0, respectively.
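The reliability bands above can be written as a simple lookup, shown here together with one way of computing an intraclass correlation. This is a hedged sketch only: the text does not say which form of intraclass correlation was computed in SAS, so the code assumes a one-way random-effects, single-rater coefficient, and the function names and ratings are hypothetical.

```python
from typing import List


def icc_one_way(ratings: List[List[float]]) -> float:
    """One-way random-effects, single-rater intraclass correlation.

    ratings[i][j] is the score given to guideline i by appraiser j.
    Every guideline must be scored by the same number of appraisers,
    and the scores must not all be identical.
    """
    n = len(ratings)        # number of guidelines
    k = len(ratings[0])     # number of appraisers per guideline
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def classify_icc(icc: float) -> str:
    """Map an intraclass correlation to the bands used in this review."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


# Hypothetical item scores: three guidelines, two appraisers each.
example = [[1, 2], [3, 4], [5, 6]]
value = icc_one_way(example)
print(round(value, 3), classify_icc(value))
```

Other forms of the coefficient (for example two-way models) can give different values for the same ratings, so the form used should be reported alongside the result.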

Summary and Characteristics of Guidelines
In total, 10 recommendations or consensus statements and 9 guidelines were included. Table 1 lists them by target cancer, publication (or update) year, guideline title, and organization, and shows whether each was based on a systematic review or meta-analysis, whether it used GRADE, and whether it considered benefits and harms. The reported correlation was 0.81. Only four guides used GRADE for evidence quality grading, and three used other evidence grading methods or revised existing grading methods. 14 guides considered the benefits and harms of the screening program.

Methodological quality evaluation results
The quality of the 19 guides included in the study was not high, and only 2 guides had an average score of 50 (see Table 2). The methodological quality of the guides from China was poor, especially in domains 3 and 6. Few guidelines used a systematic approach to retrieving and selecting evidence, and most did not undergo external expert review. Regarding disclosure of conflicts of interest, Chinese guidelines have not yet developed a practice of reporting the relevant content. Although the publication dates span a wide period, they are concentrated in the past two years. On average the guidelines had been published for 3.6 years, so some information is outdated and some areas have not been well updated. Even when the quality of a guide is high, its guiding role in clinical practice is greatly reduced if it was released or last updated too long ago [11]. Each guideline describes the target population or high-risk groups and satisfies the basic conditions for publication, but the scientific rigor of the guidelines is uneven.
Only a small number of guidelines systematically collected, screened, and evaluated the existing literature using systematic review methods and developed recommendations with evidence-based support. Only a few guidelines graded the included evidence to help readers make trade-offs. In future guideline development, scholars need to do more in terms of systematic reviews and evidence grading.
Guidelines are a basis for medical and health work, and their quality directly affects the quality of health care services and plays an important role in the health service system [12]. Beyond the issues related to systematic review, the included guidelines still have many problems. The ultimate goal of a guide is to benefit the target population, but 14 guidelines did not take into account the views and choices of the target population and lacked ethical support. Guides are developed to serve health practice and to provide specific, practical guidance, so applicability is one of the important indicators for evaluating a guide. Only 3 of the included guides referred to advice or tools for applying the recommendations, and 11 guides reported the potential resource implications of applying them, so overall applicability was not high. Guideline development requires a certain amount of funding, and conflicts of interest between funders and guideline developers are the most common source of bias in the development process; as a guiding document for health practice, a guideline must be objective and independent. Disclosure of conflicts of interest is therefore increasingly important for improving the transparency of guidelines. Only 9 of the 15 guides reported funding sources and conflicts of interest, and most Chinese guidelines did not mention conflict-related content.
Presenting high-quality quantitative estimates is possible for the cancer prevention services we reviewed. One reason may be a belief among guideline developers that recommendations should be as simple as possible. However, that reasoning does not justify omitting this information from the text and appendices. Another reason might be a fear among guideline developers that too much transparency may lead individual clinicians and patients to decide against utilizing a preventive service when guideline developers believe the service is valuable. Finally, there may be a belief that from a population perspective any cancer screening or prevention service with a statistically significant improvement in mortality is worthwhile. More broadly, the GRADE group now recommends that guideline developers present outcome information in summary tables. The guidelines reviewed in this study were largely written prior to these GRADE recommendations. It is possible that future cancer screening and prevention guidelines may put more emphasis on the way benefits and harms are presented.
Ideally, our study will motivate guideline developers to consistently include outcome summary tables similar to the ones the GRADE group recommends.
Prior to quantifying estimates of absolute effect for different outcomes, guideline groups must identify what the important outcomes are and determine the quality of evidence for each outcome. Determining estimates of absolute effect for each important outcome is a critical next step. The focus of our paper is how guidelines present these outcome estimates.
This paper focuses on how benefits and harms are presented to clinicians. An in-depth discussion of the extent to which clinicians should present numerical information to patients, and of how best to communicate this information to patients, is beyond the scope of this paper. It is often argued that patients do not understand numbers and thus that it might be better to avoid providing risk information because it might be too confusing. However, many articles have provided advice on how to make complicated risk statistics easier for patients (and likely their providers) to understand.
Although absolute effects are difficult to know precisely, firm recommendations should not be made without guideline developers and clinicians at least estimating how big the absolute benefits and harms are most likely to be. Groups recommending interventions for asymptomatic individuals should strive to clearly present absolute estimates for the chance of benefit and harm. The GRADE approach for creating summary of findings tables within guidelines, which includes how to present measures of absolute effect for both benefits and harms, could help standardize this task. Without access to transparent risk information on all clinically important outcomes, clinicians and policy makers cannot properly judge the tradeoffs. Moreover, without such information clinicians cannot fully engage patients in true shared decision-making.
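To illustrate why absolute estimates matter, the short sketch below converts a relative effect into absolute terms. It is an illustration under stated assumptions, not an analysis from this review: the baseline risk and relative risk are hypothetical numbers chosen only for the arithmetic, and the function name is likewise hypothetical.

```python
def absolute_effects(baseline_risk: float, relative_risk: float, per: int = 1000) -> dict:
    """Convert a relative risk into absolute measures.

    baseline_risk: risk of the outcome without screening (0-1).
    relative_risk: risk with screening divided by risk without it.
    per: group size used to express events avoided.
    """
    screened_risk = baseline_risk * relative_risk
    arr = baseline_risk - screened_risk  # absolute risk reduction
    return {
        "relative_risk_reduction": 1.0 - relative_risk,
        "absolute_risk_reduction": arr,
        "events_avoided_per_group": arr * per,
        "number_needed_to_screen": (1.0 / arr) if arr > 0 else float("inf"),
    }


# Hypothetical inputs: 5 deaths per 1,000 without screening, relative risk 0.8.
summary = absolute_effects(baseline_risk=0.005, relative_risk=0.8)
for name, value in summary.items():
    print(f"{name}: {value:.4g}")
```

With these hypothetical inputs, a 20% relative risk reduction corresponds to about 1 death avoided per 1,000 people screened, which is the kind of contrast a summary of findings table makes visible.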

Ethics approval and consent to participate
An ethical exemption was applied.

Consent to publish
The "consent to publish" was submitted as additional supporting files.

Availability of data and materials
This section was presented within the additional supporting files.

Funding
National Natural Science Foundation of China 81602930.

Conflict of interest
None.