A critical review of traumatic brain injury clinical practice guidelines

Abstract
The aim of this study was to assess the quality of clinical practice guidelines for traumatic brain injury (TBI) and to investigate their evidence grading systems. A systematic search of relevant guideline websites and literature databases (including PubMed, NGC, SIGN, NICE, GIN, and Google) was undertaken from inception to May 2018 to identify and select TBI guidelines. Four independent reviewers assessed the eligible guidelines using the Appraisal of Guidelines for Research and Evaluation (AGREE II) instrument. The degree of agreement was evaluated with the intraclass correlation coefficient (ICC). From 1802 records retrieved, 12 TBI guidelines were included. The mean ± SD scores for each AGREE II domain were as follows: scope and purpose, 74.2 ± 9.09; stakeholder involvement, 54.6 ± 11.6; rigor of development, 70.1 ± 13.6; clarity and presentation, 78.4 ± 11.5; applicability, 60.5 ± 13.6; and editorial independence, 61.7 ± 14.8. Ten guidelines were rated as “recommended.” The ICC values ranged from 0.73 to 0.95. Seven grading systems were used by TBI guidelines to rate the level of evidence and the strength of recommendation. Most TBI guidelines received a high-quality rating; however, a standardized grading system should be adopted to provide clear information about the level of evidence and strength of recommendation in TBI guidelines.


Introduction
Traumatic brain injury (TBI) is one of the leading causes of death and disability in both developing and developed countries, with the highest incidence among young people <30 years of age. [1,2] Clinical practice guidelines (CPGs) have been developed by various organizations in different countries to improve outcomes for patients with TBI, and the brain trauma community's approach to guideline development has evolved as the science and application of evidence-based medicine have advanced.
During the past 20 years, >30 TBI guidelines have been developed and updated by different organizations. [3] However, TBI guidelines vary in quality, comprehensiveness, and grading system, which makes standardization of care, adaptation, and implementation difficult. In addition, a major criticism of TBI guidelines is that they may not be appropriate for use in all locations because of differences in available resources. Although some previous studies have evaluated the quality of existing TBI guidelines, they either focused only on subsets of TBI severity, such as mild TBI, [3-5] or reviewed only a limited number of TBI guidelines. [6] None of them examined the grading systems adopted by TBI guidelines, even though these systems are important for helping guideline users, readers, and stakeholders understand the confidence in the effect estimates and the strength of the recommendations. Moreover, according to the Institute of Medicine (IOM) standards for guidelines, [7] some older TBI guidelines (published before 2007) evaluated in these studies had been abandoned in clinical practice because their recommendations were outdated.
Hence, we conducted a systematic review to assess and summarize the quality of all currently available international TBI guidelines using the Appraisal of Guidelines for Research and Evaluation (AGREE) II instrument. [6] We also compared the codes used for quality of evidence and strength of recommendation across TBI guidelines.

Study design and protocol
This study was a comprehensive review of clinical guidelines using the AGREE II instrument and was performed in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement. [8]

Identification of guidelines
Systematic searches were performed in the PubMed database from inception to May 31, 2018. We also searched the websites of guideline development organizations, NICE (https://www.nice.org.uk/guidance) and SIGN (http://www.sign.ac.uk/), and guideline databases such as GIN (http://www.g-i-n.net/) and NGC (https://www.guideline.gov/). In addition, we searched the Google search engine and the reference lists of all obtained guidelines to identify further potential guidelines.
Two reviewers (DBS and WKP) independently evaluated the search results to determine inclusion or exclusion of references and extracted the general characteristics of each guideline. Disagreements were resolved by consensus or by consulting a third expert adjudicator (GTK).

Selection of guidelines
The inclusion criteria were as follows: the complete guideline text was available in English; the guideline contained recommendations regarding TBI interventions; and the guideline was published after 2007. If a guideline had been updated, only the most recent version was assessed and older versions were excluded. For every guideline ultimately included, we thoroughly searched for accompanying technical and supporting documents to better inform our assessments. The following were excluded: duplicate guidelines, guidelines for patients, editorials, translations of guidelines, secondary or multiple publications, and short summaries.
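The selection rules above amount to a filter followed by a "keep only the newest version" step. The Python sketch below illustrates that logic under stated assumptions. The `GuidelineRecord` fields, the document-type labels, and the example records are all hypothetical and are not taken from the review's screening data. "Published after 2007" is read literally as a publication year greater than 2007.

```python
from dataclasses import dataclass

# Hypothetical labels for the document types the review excluded.
EXCLUDED_TYPES = {"patient version", "editorial", "translation",
                  "secondary publication", "short summary"}


@dataclass(frozen=True)
class GuidelineRecord:
    """Hypothetical screening record for one retrieved document."""
    series: str                  # guideline name shared by all of its versions
    year: int                    # publication year
    english_full_text: bool      # complete text available in English
    tbi_interventions: bool      # contains recommendations on TBI interventions
    doc_type: str = "guideline"


def select_guidelines(records, published_after=2007):
    """Apply the inclusion/exclusion criteria, then keep the newest version of each series."""
    eligible = [
        r for r in records
        if r.english_full_text
        and r.tbi_interventions
        and r.year > published_after
        and r.doc_type not in EXCLUDED_TYPES
    ]
    newest = {}
    for r in eligible:
        # Duplicates and older versions share a series name; the latest year wins.
        if r.series not in newest or r.year > newest[r.series].year:
            newest[r.series] = r
    return sorted(newest.values(), key=lambda r: r.series)


records = [
    GuidelineRecord("Guideline A", 2011, True, True),
    GuidelineRecord("Guideline A", 2016, True, True),           # update of A
    GuidelineRecord("Guideline B", 2005, True, True),           # published too early
    GuidelineRecord("Guideline C", 2014, False, True),          # no English full text
    GuidelineRecord("Guideline D", 2015, True, True, "editorial"),
]
for g in select_guidelines(records):
    print(g.series, g.year)  # prints: Guideline A 2016
```

Applying the filter before the deduplication step means that an outdated version can never displace an eligible newer one.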

Quality appraisal of guidelines
We used the latest version of the AGREE II instrument to evaluate each TBI CPG meeting our inclusion criteria. According to the AGREE II handbook, each CPG was scored on 23 items within 6 domains. Domain 1 (scope and purpose) comprises 3 items: guideline objectives, health questions, and the population to whom the guideline applies. Domain 2 (stakeholder involvement) comprises 3 items: guideline development group, views and preferences of the target population, and target users. Domain 3 (rigor of development) includes 8 items: systematic methods used to search for evidence, criteria for selecting evidence, strengths and limitations of the evidence, methods for formulating the recommendations, health benefits and side effects of recommendations, explicit links between recommendations and supporting evidence, external review by experts, and procedures for updating the guideline. Domain 4 (clarity and presentation) includes 3 items: specific and unambiguous recommendations, different options for management, and identifiable key recommendations. Domain 5 (applicability) includes 4 items: facilitators and barriers, advice/tools for putting recommendations into practice, resource implications, and monitoring or auditing criteria. Domain 6 (editorial independence) comprises 2 items: editorial independence from the funding body and conflicts of interest of guideline development group members.
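For readers unfamiliar with AGREE II scoring, the domain structure above can be combined with the scaling rule in the AGREE II user manual: (obtained score − minimum possible score) / (maximum possible score − minimum possible score) × 100. The minimum and maximum are the totals reached if every appraiser rates every item 1 or 7, respectively. The sketch below is illustrative only. The function name and the ratings are invented, and it is not the software used in this review.

```python
# Items per AGREE II domain, as listed in the paragraph above (23 in total).
ITEMS_PER_DOMAIN = {
    "scope and purpose": 3,
    "stakeholder involvement": 3,
    "rigor of development": 8,
    "clarity and presentation": 3,
    "applicability": 4,
    "editorial independence": 2,
}


def scaled_domain_score(domain, ratings):
    """Scaled AGREE II domain score (%) for one guideline.

    ratings: one list per appraiser, holding that appraiser's 1-7 score
    for every item of the domain.
    """
    n_items = ITEMS_PER_DOMAIN[domain]
    for appraiser in ratings:
        if len(appraiser) != n_items or not all(1 <= s <= 7 for s in appraiser):
            raise ValueError("each appraiser needs one 1-7 score per item")
    obtained = sum(sum(appraiser) for appraiser in ratings)
    minimum = 1 * n_items * len(ratings)   # every item rated 1 by everyone
    maximum = 7 * n_items * len(ratings)   # every item rated 7 by everyone
    return 100 * (obtained - minimum) / (maximum - minimum)


# Invented ratings from 4 appraisers for the 2-item editorial independence domain.
example = [[5, 6], [6, 6], [4, 5], [7, 6]]
print(round(scaled_domain_score("editorial independence", example), 1))  # 77.1
```

Here the obtained total is 45, the minimum is 8, and the maximum is 56, so the scaled score is 37/48, or about 77.1%.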
In this study, each TBI guideline was scored by 4 independent reviewers (DBS, WKP, MWJ, and LY) according to the AGREE II user manual. Among the 4 reviewers, DBS and WKP are senior neurosurgeons, and MWJ and LY are methodologists in guideline development. LY has extensive experience in applying AGREE II and has published studies using AGREE II to assess clinical guidelines. [9,10] DBS and WKP were trained to use the AGREE II instrument through the online tutorials on the AGREE website.
The user manual defines each item and assists the user in determining a guideline's score for that item. Items were scored on a scale ranging from 1 (absence of the item) to 7 (item reported with exceptional quality). Domain scores were calculated by summing the item scores within each domain across all reviewers and then standardizing the total as a percentage of the maximum possible score. The AGREE II protocol does not calculate an overall score to determine whether a CPG is recommended. Instead, in this study a guideline was rated as recommended if >4 domains scored >50%. [6]

Strength of recommendation and level of evidence
The strength of recommendations and the level of evidence of each TBI guideline were extracted if the guideline adopted an evidence grading system.

Data analysis
We performed descriptive statistical analyses, calculating the total score given by each reviewer and the score for each domain. The number of recommendations and their percentage distribution across quality-of-evidence and strength-of-recommendation classes were determined. Agreement between reviewers' scores was tested using 2-way analysis of variance, with single-rater 2-way intraclass correlation coefficients (ICCs) and 95% confidence intervals (CIs) calculated for each domain across all guidelines. [11] Following a previous study, [12] agreement of 0.01 to 0.20 was deemed minor, 0.21 to 0.40 fair, 0.41 to 0.60 moderate, 0.61 to 0.80 substantial, and 0.81 to 1.00 very good. A P value <.05 denoted statistical significance, and all tests were 2-sided. Statistical analyses were conducted using SPSS version 19.0 (SPSS Inc., Chicago, IL).
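The single-rater, 2-way ICC and the agreement bands described above can be sketched in plain Python. This sketch assumes the absolute-agreement form ICC(2,1) of Shrout and Fleiss. The text does not state whether SPSS was run with the absolute-agreement or the consistency definition, and the sketch omits the 95% CIs. The example matrix is the 6-target, 4-judge table from Shrout and Fleiss (1979), not data from this review.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores[i][j] is rater j's score for rated target i.
    """
    n = len(scores)          # rated targets (rows)
    k = len(scores[0])       # raters (columns)
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    # Residual sum of squares after removing target and rater effects.
    ss_err = sum(
        (scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n) for j in range(k)
    )
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Agreement bands quoted in the text, as (upper bound, label); a value is
# assigned to the first band whose upper bound it does not exceed.
BANDS = [(0.20, "minor"), (0.40, "fair"), (0.60, "moderate"),
         (0.80, "substantial"), (1.00, "very good")]


def agreement_band(icc):
    if icc < 0.01:
        return "below the reported scale"
    for upper, label in BANDS:
        if icc <= upper:
            return label
    return "very good"  # guards against rounding slightly above 1.0


shrout_fleiss = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
                 [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
value = icc_2_1(shrout_fleiss)
print(round(value, 2), agreement_band(value))  # 0.29 fair
```

Under this banding, the reported range of 0.73 to 0.95 falls into the "substantial" and "very good" categories.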

CPG quality assessment (AGREE)
Consistency.
The ICC values, which indicate overall agreement between reviewers, were generally high: for TBI guideline appraisal using AGREE II, they ranged from 0.73 to 0.95 (Table 2). Agreement among the 4 reviewers was lowest in the "applicability" domain (0.73) and higher in the other 5 domains (all ≥0.75), indicating good inter-reviewer agreement on item scores. Domain scores from the AGREE II quality assessment are shown in Table 2.

Domain 1-scope and purpose.
This domain focuses on the overall objectives, expected benefits or outcomes, and target population of the guidelines. The mean ± SD score of TBI guidelines in this domain was 74.2 ± 9.09, and all guidelines scored >50. The lowest score, 60.1, was for the guidelines for the treatment of minor and severe TBI (RASH, 2008). The highest, 88.9, was for the Brain Trauma Foundation and American Association of Neurological Surgeons guideline (BTF/AANS, 2017).

Domain 2-stakeholder involvement.
This domain covers the degree of involvement of relevant professionals, consideration of the views and preferences of the target population, and the definition of target users. Scores varied considerably, with a mean ± SD of 54.6 ± 11.6. Five (38%) TBI guidelines scored <50%, the lowest being 40.2 (China, 2009).

Domain 5-applicability.
This domain evaluates whether facilitators of and barriers to implementing the guidelines were considered and whether monitoring criteria were provided. The mean ± SD score in this domain was 60.5 ± 13.6. Three TBI guidelines scored <50, the lowest being 39.1 (Brain Trauma Foundation, BTF, 2012).

Table 1 The characteristics of included TBI guidelines.

Columns: Guideline ID; Origin; Institution/guideline development group.

Overall assessment.
This assessment concerns the rating of the overall quality of each guideline and whether it would be recommended for use in practice. Based on the appraisal of the individual domains and the overall scores, 10 TBI guidelines had >4 domains scoring >50% and were rated as "recommended" by the appraisers (Table 2).
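The decision rule applied in this review, under which a guideline is "recommended" when more than 4 of its 6 domains score above 50%, can be stated as a short function. The domain scores below are invented for 2 hypothetical guidelines.

```python
def is_recommended(domain_scores, cutoff=50.0, more_than=4):
    """Rule used in this review: recommend a guideline when more than 4 of
    its 6 scaled AGREE II domain scores exceed 50%."""
    passing = sum(1 for score in domain_scores.values() if score > cutoff)
    return passing > more_than


# Invented scaled domain scores (%) for two hypothetical guidelines.
guideline_x = {"scope": 80.0, "stakeholder": 55.0, "rigor": 72.0,
               "clarity": 85.0, "applicability": 48.0, "independence": 62.0}
guideline_y = {"scope": 70.0, "stakeholder": 42.0, "rigor": 65.0,
               "clarity": 75.0, "applicability": 39.0, "independence": 58.0}
print(is_recommended(guideline_x))  # True  (5 domains > 50%)
print(is_recommended(guideline_y))  # False (4 domains > 50%)
```

Because the comparison is strict, a domain scoring exactly 50% does not count toward the threshold. This matches the ">50%" wording.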
Level of evidence and strength of recommendation.

Discussion
This critical review investigated the quality of TBI guidelines published after 2006. Some TBI guidelines published before 2006 have not been updated. [25-28] Their recommendations are outdated and, according to the IOM standards for clinical guidelines, [7] can no longer be used in practice. Hence, we did not include these guidelines in this review.
Our study included 12 TBI guidelines. Across all TBI guidelines, the highest mean scores were achieved in clarity and presentation, scope and purpose, and rigor of development, whereas the main weaknesses were stakeholder involvement, applicability, and editorial independence. The Brain Trauma Foundation/American Association of Neurological Surgeons (BTF/AANS, 2017), Scandinavian Neurotrauma Committee (SCN, 2013), and Eastern Association for the Surgery of Trauma (EAST, 2015) guidelines were the 3 CPGs with the best results. All of the TBI guidelines evaluated in this study were developed in high-income countries (HICs) and are therefore of limited applicability in resource-limited settings. In addition, the distribution of levels of evidence and strengths of recommendation varied considerably among TBI guidelines.
Overall, strong scores in the clarity and presentation, scope and purpose, and rigor of development domains have also been reported in other systematic reviews evaluating TBI CPGs. [4,29,30] This is likely attributable to the scientific rigor of guideline development, which typically involves a highly methodical approach. [31] In general, guidelines that were more recently developed or updated, and those that had undergone numerous updates, most consistently achieved the highest AGREE II scores.
Our analysis indicates an overall improvement in the above domains in the most current CPGs, consistent with other studies. [21-24] In a 2011 critical review of mild TBI guidelines by Tavender et al, [4] the NSW 2006 TBI guideline scored lower on all AGREE II domains except scope and purpose than its updated 2011 version. [19] Notably, newer versions of TBI guidelines also benefit from newer and more rigorous evidence, as well as from the availability of AGREE II as a framework for designing guidelines. Nevertheless, a frequent weakness in our results, within the rigor of development domain, was the lack of procedures for updating the guidelines. Given the trend toward improved guideline quality with newer revisions, developing a quality improvement list may help to ensure the quality of future TBI guidelines.
AGREE II results may also reflect how completely key information is reported in guidelines, indicating that more attention should be paid to improving the reporting quality of guidelines. In 2016, the AGREE working group developed a new checklist for improving the reporting quality of CPGs, [32] which TBI guideline developers may wish to consult in the future.
Older reviews demonstrated limited stakeholder involvement in TBI guideline development, a trend that has persisted in newer versions. [4,33] Although guideline development has progressively improved, the domains of stakeholder involvement, applicability, and editorial independence remain weak, particularly with regard to piloting interventions, addressing potential costs and barriers to implementation, and auditing for quality improvement. Recent studies suggest that successful implementation of guidelines can improve patient outcomes. [34-36] However, a guideline's applicability to a given locale is critically important if it is to be implemented in a way that improves patient care. Such applicability depends on factors such as the availability and cost of resources, provider skills, and population needs and values. Attention to stakeholder involvement and applicability is therefore imperative, because these domains are intrinsically linked to CPG implementation and to translation into other settings, such as low- and middle-income countries (LMICs).
It has been suggested that adapting existing guidelines to local situations may be a more valid and cost-effective way of achieving high-quality guidelines worldwide. [37] However, the variety of codes used for level of evidence and strength of recommendation could make this goal harder to reach. Most guidelines, especially evidence-based guidelines, apply grading systems to communicate a clear message quickly and concisely. This helps guideline users, readers, and stakeholders to understand the confidence in the effect estimates and the strength of the recommendations. Confidence in an effect estimate reflects the extent to which that confidence is adequate to support a particular recommendation. The strength of a recommendation reflects the extent of collective confidence that adherence to it will do more good than harm. [38,39] In our study, however, different grading systems with different sets of codes were used to communicate grades of evidence and recommendations in TBI guidelines, which could confuse guideline users. Therefore, a standardized grading system should be established to provide TBI guideline users with clear information about the level of evidence and the strength of recommendation. Encouragingly, some guideline organizations have begun to adopt the GRADE system in place of their older systems in the new versions of their guideline development handbooks. These include the Scottish Intercollegiate Guidelines Network (SIGN) and the National Health and Medical Research Council (NHMRC). [40,41]

www.md-journal.com

Table 3 Grading systems used in the included guidelines.
Codes of evidence and recommendation:
1++ High-quality meta-analyses, systematic reviews of RCTs, or RCTs with a very low risk of bias
1+ Well-conducted meta-analyses, systematic reviews, or RCTs with a low risk of bias
1- Meta-analyses, systematic reviews, or RCTs with a high risk of bias
2++ High-quality systematic reviews of case control or cohort studies; high-quality case control or cohort studies with a very low risk of confounding or bias and a high probability that the relationship is causal
2+ Well-conducted case control or cohort studies with a low risk of confounding or bias and a moderate probability that the relationship is causal
2- Case control or cohort studies with a high risk of confounding or bias and a significant risk that the relationship is not causal

Strengths and limitations
Our study has some strengths. First, our author team comprised clinical experts and methodologists with extensive experience in evaluating clinical guidelines, which improved the reliability of our findings. Second, the different domains were appropriately weighed to derive the overall assessment and recommendation. Nonetheless, our study also has limitations. First, excluding guidelines published in languages other than English or in other forms (e.g., books, booklets, or government documents) might have led to under-representation of guidelines from less developed countries. Second, the AGREE II instrument focuses on the methods of guideline development and the transparency of reporting, and cannot assess the potential impact of recommendations on patient outcomes. [42,43] Third, our study could not establish a causal relationship between poor performance and the characteristics of TBI guidelines. Matching current guidelines to future guideline updates (in a cohort study) would allow a better assessment of guideline quality than our cross-sectional assessment.

Conclusions
Most TBI guidelines received a high-quality rating, with the highest domain scores in clarity and presentation, scope and purpose, and rigor of development. Our findings call for a standardized grading system to provide clear information about the level of evidence and the strength of recommendation in TBI guidelines.