Blood-Based Protein Biomarkers for the Management of Traumatic Brain Injuries in Adults Presenting to Emergency Departments with Mild Brain Injury: A Living Systematic Review and Meta-Analysis

Accurate diagnosis of traumatic brain injury (TBI) is critical to effective management and intervention, but can be challenging in patients with mild TBI. A substantial number of studies have reported the use of circulating biomarkers as signatures for TBI, capable of improving diagnostic accuracy and clinical decision making beyond current practice standards. We performed a systematic review and meta-analysis to comprehensively and critically evaluate the existing body of evidence for the use of blood protein biomarkers (S100 calcium binding protein B [S100B], glial fibrillary acidic protein [GFAP], neuron specific enolase [NSE], ubiquitin C-terminal hydrolase-L1 [UCH-L1], tau, and neurofilament proteins) for diagnosis of intracranial lesions on CT following mild TBI. We explored the effects of potential confounding factors and differences in diagnostic performance among the included markers, and assessed the appropriateness of study design, analysis, and quality, as well as the demonstration of clinical utility. Studies published up to October 2016 were identified through searches of MEDLINE®, Embase, EBM Reviews, the Cochrane Library, the World Health Organization (WHO) International Clinical Trials Registry Platform (ICTRP), and clinicaltrials.gov. Following screening of the identified articles, 26 were selected as relevant. We found that measurement of S100B can help inform decision making in the emergency department, possibly reducing resource use; however, there is insufficient evidence that any of the other markers is ready for clinical application. Our work revealed serious problems in the design, analysis, and reporting of many of the studies, and identified substantial heterogeneity and research gaps. These findings emphasize the importance of methodologically rigorous studies focused on a biomarker's intended use, and of defining standardized, validated, and reproducible approaches.
The living nature of this systematic review, which will summarize key updated information as it becomes available, can inform and guide future implementation of biomarkers in the clinical arena.


Introduction
Traumatic brain injury (TBI) is among the most common neurological disorders worldwide, and its global incidence continues to rise. 1,2 According to the Centers for Disease Control and Prevention (CDC) in the United States, rates of TBI-related emergency department (ED) visits have increased by 70% over the past decade. Most of these TBIs are classified as mild (mTBI), posing a substantial everyday workload. Clinical diagnosis remains a challenge, and CT is considered the diagnostic cornerstone used in the ED to rule out post-traumatic brain lesions and complement clinical assessment of patients with a possible mTBI. 3 However, it is generally acknowledged that CT is not always available, exposes patients to radiation, and is relatively costly in terms of ED logistical burden and healthcare expenditure, given that only a small proportion of patients (~10%) are diagnosed as having actual traumatic intracranial lesions. 3,4 The need to manage patients with possible mTBI more effectively and efficiently, reducing unnecessary CT scans and medical costs without compromising patient care and safety, has driven the quest for sensitive blood-based markers as objective parameters that can be easily and rapidly measured in the systemic circulation. Identification of biomarker signatures associated with distinct aspects of TBI pathophysiology may also be of clinical value for a more accurate characterization and risk stratification of TBI, thereby optimizing medical decision making and facilitating individualized and targeted therapeutic intervention. As such, over the past decades, a focused effort has been made to identify novel blood biomarkers for TBI, and a growing number of candidates have been described and proposed, [5][6][7][8] leading to the recent incorporation of S100B into the Scandinavian Neurotrauma Guidelines. 9 Nonetheless, at present, the role of body fluid biomarkers in TBI is primarily confined to research studies, and high-quality evidence is paramount to meet regulatory requirements and support their adoption and routine use in clinical practice.
Meta-analysis can exploit the data collected in separate studies and provide the statistical power to obtain more precise estimates of sensitivity and specificity, to determine the influence of potential confounding factors on biomarker diagnostic performance, and to detect differences in accuracy between marker tests. Hence, we conducted a systematic review and meta-analysis to comprehensively summarize and critically evaluate the existing body of evidence for the use of blood protein biomarkers for diagnosis of brain injury, as assessed by CT, in adult patients presenting to the ED after mild head trauma.
We focused on markers for which promising scientific evidence of analytical and clinical validity is available and which, therefore, are likely to be rapidly transferable to clinical practice: namely, S100 calcium binding protein B (S100B), glial fibrillary acidic protein (GFAP), neuron specific enolase (NSE), ubiquitin C-terminal hydrolase-L1 (UCH-L1), and tau and neurofilament proteins. As TBI biomarker research and the associated technological and analytical advances are dynamic, we felt that a living systematic review (a high-quality online review that is updated as new research becomes available) 10 would best fit our purpose. The ''living'' nature of this work will allow the potential inclusion and investigation of novel markers, marker combinations, and more refined diagnostic time windows as the relevant body of evidence accumulates.

Methods
This review is being prepared as a ''living systematic review,'' initiated in the context of the CENTER-TBI project (www.center-tbi.eu). [10][11][12] Following a predefined protocol registered on the PROSPERO database (registration number CRD42016048154), we conducted a systematic review and meta-analysis according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. 13

Information sources
We searched Ovid MEDLINE® (1946 to October 2016), OVID Embase (1980  Additional studies were identified by reviewing the reference lists of published clinical trials and of relevant narrative and systematic reviews. Abstracts from relevant scientific meetings were also examined, and experts in the field were consulted for any further studies. Citations were uploaded into a web-based systematic review program (Covidence, Alfred Health, Melbourne, Australia) (http://www.covidence.org/).

Study selection
Two reviewers independently screened the title and abstract of each citation identified by the search strategy. In the second stage, the full text was reviewed and eligible studies were selected. Any disagreement between the two reviewers was resolved through discussion or, where necessary, arbitration by a third party. Studies were included if they met the following prespecified eligibility criteria: enrollment of adult patients presenting to the ED with a history of possible brain injury meeting the study authors' definition of mTBI; report of the admission head CT findings; at least one quantitative measurement of the circulating biomarkers of interest (S100B, GFAP, NSE, UCH-L1, tau, and neurofilament proteins) on admission; and relevant accuracy data.
We also included studies with mixed populations, that is, studies that additionally enrolled participants with moderate or severe TBI (Glasgow Coma Score [GCS] <13) or pediatric patients. Studies were included irrespective of their geographic location and language of publication. We excluded studies using non-quantitative methods to assess biomarker concentrations (e.g., Western blot or explorative proteomics). Studies with fewer than 50 participants were excluded, because they were likely to be underpowered, which would compromise the reliability of their findings.

Data extraction and assessment of methodological quality
Two reviewers independently extracted data using a standardized data abstraction form. We abstracted relevant information related to the study design, patient characteristics (demographic and clinical data, including indices of injury severity, presence of extracerebral injuries and polytrauma, and CT findings), biomarker characteristics (concentrations, sampling time, cutoffs, and measures of diagnostic accuracy [sensitivity and specificity]), analytical aspects of biomarker testing, and study limitations. Details regarding the definition of mTBI and CT abnormality were also extracted.
In the case of multiple studies from the same research group, authors were contacted to ensure that there was no overlap in patient populations. We also contacted authors for clarification of study sample, missing data, or ambiguity in the cutoffs used. If biomarker measurements were taken at multiple time points, we used the sample on admission for analysis.
The methodological quality of the included studies was independently assessed by two reviewers using a modified version of the tool for quality assessment of studies of diagnostic accuracy included in systematic reviews (QUADAS-2), 14 as recommended by the Cochrane Collaboration. Discrepancies were resolved through discussion or arbitration by a third reviewer.

Statistical analysis and data synthesis
The analysis includes a structured narrative synthesis. We constructed evidentiary tables identifying the results pertinent to diagnostic capabilities of the different biomarkers (detection of intracranial lesions as assessed by CT) and study characteristics for all included studies. We conducted exploratory analyses by plotting estimates of sensitivity and specificity from each study on forest plots and in receiver operating characteristic (ROC) space.
Where adequate data were available, we performed meta-analyses for each biomarker to summarize data and obtain more precise estimates of diagnostic performance. For studies with diverse thresholds, we meta-analyzed pairs of sensitivity and specificity using the hierarchical summary ROC (HSROC) model, which allows for variation in threshold between studies and also accounts for between-study variation and any potential correlation between sensitivity and specificity. 15 For these analyses, we used the NLMIXED procedure in SAS software (version 9.4; SAS Institute 2011, Cary, NC). For studies that reported data at common prespecified cutoff values, we calculated pooled estimates of sensitivity and specificity, which are clinically interpretable, using a random effects bivariate regression approach. 16 We explored heterogeneity through visual examination of the forest plot and the SROC plot for each biomarker. However, we did not perform the planned meta-regression (including each potential source of heterogeneity as a covariate in the bivariate model), because there were too few studies, individual data were lacking, and/or studies varied importantly, with factors that could have diverging effects on biomarker accuracy estimates often present simultaneously.
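The review pooled sensitivity and specificity jointly with a bivariate random effects model, and used the HSROC model when thresholds varied. As a much simpler illustration of what random-effects pooling does, the sketch below applies a univariate DerSimonian-Laird estimator to per-study proportions on the logit scale. This is a swapped-in technique, not the authors' method: it pools sensitivity (or specificity) on its own and ignores the correlation between the two, which is precisely what the bivariate model accounts for. The 0.5 continuity correction, the function name, and the study counts are hypothetical.

```python
import math


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def dersimonian_laird_logit(events, totals, z=1.96):
    """Univariate random-effects pooling of per-study proportions on the logit scale.

    events[i] / totals[i] is one study's proportion, e.g. TP / (TP + FN) for
    sensitivity or TN / (TN + FP) for specificity.
    """
    ys, vs = [], []
    for e, n in zip(events, totals):
        # Assumption: add 0.5 to both cells of every study so that a study with
        # 0% or 100% sensitivity still has a finite logit and variance.
        a, b = e + 0.5, (n - e) + 0.5
        ys.append(math.log(a / b))
        vs.append(1.0 / a + 1.0 / b)

    # Fixed-effect (inverse-variance) estimate and Cochran's Q
    w = [1.0 / v for v in vs]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))
    df = len(ys) - 1

    # DerSimonian-Laird between-study variance (tau^2) and I^2
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance
    w_re = [1.0 / (v + tau2) for v in vs]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "pooled": inv_logit(y_re),
        "ci": (inv_logit(y_re - z * se), inv_logit(y_re + z * se)),
        "tau2": tau2,
        "I2": i2,
    }


# Hypothetical: CT-positive patients per study, and how many were above the cutoff
tp = [18, 40, 9, 55]
ct_positive = [19, 42, 10, 57]
print(dersimonian_laird_logit(tp, ct_positive))
```

When studies disagree more than sampling error allows, tau² grows and the random-effects weights become more equal across studies, which widens the confidence interval around the pooled value.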
Sensitivity analyses were performed to check the robustness of the results. We used Cook's distance to identify particularly influential studies, and checked for outliers using scatter plots of the standardized predicted random effects. Then, the robustness of the results was checked by refitting the model excluding any outliers and very influential studies. Sensitivity analyses were also conducted to investigate the impact on biomarker performance of studies including mixed populations, bias in the selection of participants, high prevalence of abnormal CT findings, and different definitions of TBI as assessed by CT.
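The review identified influential studies with Cook's distance and standardized predicted random effects, then refitted the model without them. A cruder leave-one-out check conveys the same idea: recompute the pooled estimate with each study removed in turn and see how far the summary moves. The sketch below is a simplification under stated assumptions. It uses a fixed-effect inverse-variance pooling of logit specificity (not the bivariate model, and not Cook's distance), a 0.5 continuity correction, and hypothetical study labels and counts.

```python
import math


def pooled_specificity(tn, ct_negative):
    """Fixed-effect inverse-variance pooled proportion on the logit scale."""
    num = den = 0.0
    for e, n in zip(tn, ct_negative):
        a, b = e + 0.5, (n - e) + 0.5  # continuity correction (assumption)
        weight = 1.0 / (1.0 / a + 1.0 / b)
        num += weight * math.log(a / b)
        den += weight
    return 1.0 / (1.0 + math.exp(-num / den))


def leave_one_out(tn, ct_negative, labels):
    """Change in the pooled estimate when each study is dropped in turn."""
    full = pooled_specificity(tn, ct_negative)
    shifts = {}
    for i, label in enumerate(labels):
        rest_tn = tn[:i] + tn[i + 1:]
        rest_n = ct_negative[:i] + ct_negative[i + 1:]
        shifts[label] = pooled_specificity(rest_tn, rest_n) - full
    return full, shifts


# Hypothetical: study "D" is large and has a much lower specificity than the rest
labels = ["A", "B", "C", "D"]
tn = [30, 35, 32, 60]
ct_negative = [100, 100, 100, 1000]
full, shifts = leave_one_out(tn, ct_negative, labels)
most_influential = max(shifts, key=lambda k: abs(shifts[k]))
print(f"pooled specificity {full:.2f}; most influential study: {most_influential}")
```

A large, low-specificity study pulls the pooled value down, so dropping it produces the biggest upward shift; this is the pattern one would look for before refitting without an outlier.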
Data processing and statistical analyses were conducted using Review Manager version 5.3 (Cochrane Collaboration, Copenhagen, Denmark) and STATA version 13.0 (StataCorp, College Station, TX), including the user-written commands METANDI and MIDAS.

Quality of the evidence
The Grading of Recommendations Assessment, Development, and Evaluation (GRADE) 17 approach was used to assess the overall quality of evidence of the included biomarker tests. The results were summarized using GRADEPro software (Version 3.2, 2008).

Description of studies
Our search strategy identified a total of 7260 citations. Removal of duplicates resulted in 5567 distinct citations, of which 90 full-text articles were assessed for eligibility, and 26 articles 3,18-42 were included in the systematic review (Fig. 1, flow diagram of search and eligibility results, and Table 1). Tables 2 and 3 show the main characteristics of the included publications, and additional details are provided in Tables S1 and S2 (see online supplementary material at http://www.liebertpub.com).
Two of the 26 included articles reported biomarker results from the same patient cohort. 34,43 All studies were published in 2000 or later. With the exception of one study published in French, 21 and one published in Italian, 24 all studies were published in English.
The total number of patients with TBI in the included studies was 8127, ranging from 50 28,37 to 1560 42 per study (median 170, interquartile range 104-258). Of those, 865 had positive CT scans; the mean study-level prevalence of positive CT was 17% (median 13%; range 5-51%) (Table 2). Table S2 shows the criteria used for the definition of TBI/mTBI and positive CT scans (reference standard) in the different studies. In nine articles, the presence of a skull fracture was considered a traumatic CT abnormality.
The reported mean or median age of the included patients ranged from 32 38 to 83 years, 39 with 10 studies including children and/or adolescents (patient age <18 years). The total subject pool was largely male (median 63% across the studies), with the exception of the study by Thaler and colleagues, which was 68.7% female. 39 Two cohort studies included mild to severe TBI patients (GCS 3-15), 29,38 and two other cohorts included mild to moderate TBI patients (GCS 9-15). [34][35][36]40 Six studies enrolled TBI patients with multiple trauma and/or extracranial injuries (Table 2). Nine of the included articles reported biomarker concentrations from different types of control cohorts, including healthy individuals and non-brain-injured trauma patients (see Table 3 for details).
Most of the studies defined the specific time frame from injury to blood draw as an inclusion criterion, with the majority of the samples collected within 6 h of injury (16 studies) and with mean or median time ranging from 24.3 min 33 to 5 h (Table 3). 28 In one study, samples were collected within 12 h, 31 and in two studies, they were collected within 24 h. 29,38 A single marker was evaluated in most of the studies (n = 21), while one study simultaneously assessed three markers. 40 Of the eligible studies, 22 reported data on S100B (total number of TBI patients 7754), 4 reported data on GFAP (total number of TBI patients 783), 3 reported data on NSE (total number of TBI patients 314), and 2 reported data on UCH-L1 (total number of TBI patients 347). Fewer data were available for tau (one study that included only 50 patients), 28 and we found no studies evaluating neurofilament proteins that met our inclusion criteria.

Methodological quality
The assessments of the methodological quality and risk of bias of the included studies are presented in Figure 2 and Figure S1 (see online supplementary material at http://www.liebertpub.com). Enrollment of participants that was neither consecutive nor random, the use of vague definitions of mTBI, and inclusion of an unrepresentative spectrum of patients (pediatric populations or patients with GCS <13) may lead to selection and spectrum bias, limiting the conclusions that can be drawn by distorting the accuracy estimates and compromising the applicability of the results.
In half of the studies, thresholds were not prespecified, and ROC analyses were used to determine optimal cutoffs, likely resulting in an overestimation of the diagnostic accuracy of the biomarker evaluated. In addition, including skull fracture as a CT abnormality may inflate the accuracy estimates of S100B, whereas, when a brain-specific marker is the index test, patients with skull fractures may be misclassified as false negatives. Finally, across different domains, a substantial number of studies were judged to be at unclear risk of bias because of substandard reporting. We investigated the effect of these factors in sensitivity and subgroup analyses.
Negative Control Group: healthy individuals (e.g., healthy volunteers, voluntary blood donors, outpatients for routine blood work) who were checked for their health and potential head trauma status. Positive Control Group: patients with moderate to severe brain injury. Orthopedic Control Group: non-brain-injured patients presenting to the ED with a single-limb orthopedic injury without blunt head trauma. MVA Control Group: patients presenting to the ED after a motor vehicle crash without blunt head trauma.
BM, biomarker; CV, coefficient of variation; ECLIA, electrochemiluminescence immunoassay; ED, emergency department; ELISA, enzyme-linked immunosorbent assay; GCS, Glasgow Coma Score; GFAP, glial fibrillary acidic protein; H0, within 3 h after the clinical event; H+3, 3 h after the first sampling; IFMA, immunofluorometric assay; ILMA, immunoluminometric assay; IQR, interquartile range; LIA, luminescence immunoassay; LLOD, lower limit of detection; LLOQ, lower limit of quantification; LOD, limit of detection; mTBI, mild traumatic brain injury; MVA, motor vehicle accident; NA, not applicable; NR, not reported; NSE, neuron specific enolase; pts, patients; RIA, radioimmunoassay; S100B, S100 calcium binding protein B; SEM, standard error of the mean; TBI, traumatic brain injury; UCH-L1, ubiquitin C-terminal hydrolase-L1; ULOQ, upper limit of quantification.
In terms of the assays/platforms used, most of the studies (13/22) used an automated electrochemiluminescence immunoassay (ECLIA) on an Elecsys® analyzer (Roche Diagnostics), while one used the Cobas 6000 analyzer (Roche Diagnostics). Four studies were conducted using an automated immunoluminometric assay (ILMA) on a Liaison® analyzer (Diasorin), and one on the LIA-mat® (Sangtec® 100); one study used a radioimmunoassay (Sangtec), and one used an enzyme-linked immunosorbent assay (ELISA) platform (Banyan Biomarkers, Inc.) (Table 3). In one study, the analytical performance of the two automated immunoassays (i.e., the Diasorin and Roche Diagnostics assays) was compared; although the methods were not interchangeable, they correlated strongly and appeared usable in a similar manner. 27
Performance of S100B at a 0.10-0.11 µg/L cutoff value
To obtain clinically relevant estimates of the performance of S100B, we pooled the results from the 16 studies using the cutoff value of 0.10-0.11 µg/L. The individual sensitivities and specificities of the studies included in this meta-analysis ranged from 72% to 100% and from 5% to 77%, respectively (Fig. 5). The following summary estimates were obtained: sensitivity 96% (95% CI 92-98%), specificity 31% (95% CI 27-36%), positive likelihood ratio 1.4 (95% CI 1.3-1.5), and negative likelihood ratio 0.12 (95% CI 0.06-0.25). Figure 5 shows the pooled sensitivity and specificity (the solid red spot in the middle) and the 95% confidence and prediction regions (the inner and outer ellipses, respectively).
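As a consistency check on the summary figures above, the likelihood ratios follow from sensitivity and specificity as LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The sketch below plugs in the pooled point estimates (sensitivity 96.3%, the unrounded value reported in the outlier analysis, and specificity 31%). It reproduces only the point estimates; the confidence intervals come from the bivariate model and cannot be obtained this way. The function name is illustrative.

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios from one sensitivity/specificity pair."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


lr_pos, lr_neg = likelihood_ratios(0.963, 0.31)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")  # LR+ = 1.4, LR- = 0.12
```

An LR+ close to 1 means a positive S100B result barely raises the probability of a lesion, whereas an LR- near 0.1 lowers it substantially, which is why the marker is framed as a rule-out test.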
There was a significant level of heterogeneity in the results, greater for specificity than for sensitivity (Fig. 5). Sensitivity was >80% in all studies but one. 41 Specificity was >30% in most studies, and in the remaining studies, the low specificity was accompanied by very high sensitivity. However, because of important variation across studies, with factors that could have contrasting effects on the accuracy estimates (time from injury to sampling, presence of extracranial injuries, mixed populations) often present simultaneously (Fig. S2), and because of the lack of individual data and/or the insufficient number of studies, we were unable to compare patient characteristics and investigate the effect of the planned sources of heterogeneity (see online supplementary material at http://www.liebertpub.com). Poor reporting of patient and study information also contributed to unknown sources of heterogeneity.
One study (Zongo and colleagues) was an outlier. 42 Excluding this study had a negligible effect on sensitivity (96.3% vs 96.1%), but specificity increased from 31% to 33%. This could be explained by the fact that this study, which included the greatest number of patients, measured S100B in plasma, increasing the probability of false positive results (Fig. S3) (see online supplementary material at http://www.liebertpub.com).
To explore the effect of risk of bias in the patient selection domain on the summary estimates, we excluded eight studies considered at high (n = 1) or unclear (n = 7) risk of bias. The exclusion of these studies slightly improved sensitivity (98%) (Fig. S4) (see online supplementary material at http://www.liebertpub.com). A sensitivity analysis was also undertaken to assess the impact of studies containing mixed populations on our findings. We excluded one study (Welch and colleagues), 40 because the authors included patients with moderate TBI (GCS 9-12). There was no impact on our findings. Four studies enrolled a mixed pediatric and adult population. Exclusion of these studies as well as those in which this information was unclearly reported made no difference to our results (Fig. S4).
The prevalence of positive CT findings was relatively high (>11%) in seven studies. Excluding these studies resulted in a slight increase in sensitivity and a slight decrease in specificity (98% and 29%, respectively). Finally, eight studies considered skull fracture as a CT abnormality. To explore the impact of the type of reference standard on the summary estimates, we excluded these studies as well as those in which this information was unclearly reported. This exclusion slightly changed sensitivity and specificity (93% and 35%, respectively) (Fig. S4).
Quality of evidence of S100B
The quality of the evidence for the use of blood S100B levels to diagnose brain injury as assessed by CT scan in patients with mild TBI was moderate (Fig. 6).

GFAP
Eligible studies reporting the accuracy of GFAP for detecting intracranial lesions on CT scan comprised three cohorts with mild to moderate TBI patients and one cohort with mild to severe TBI patients.
FIG. 4.
(A, B) Summary receiver operating characteristic (ROC) plots for S100 calcium binding protein B (S100B) and glial fibrillary acidic protein (GFAP) for detection of CT abnormalities. (C, D) Study estimates of sensitivity and specificity with 95% confidence intervals plotted in ROC space for neuron specific enolase (NSE) and ubiquitin C-terminal hydrolase-L1 (UCH-L1) for detection of CT abnormalities. Each square represents an individual study; the size of the symbol is proportional to the number of patients in each study. The hierarchical summary ROC (HSROC) model was used to estimate a summary curve using Proc NLMIXED in SAS.

FIG. 5.
Summary receiver operating characteristic plot of the sensitivity and specificity of S100 calcium binding protein B (S100B) at a 0.10-0.11 µg/L cutoff value for detecting intracranial lesions on CT. Each circle represents an individual study; the size of the symbol reflects the number of patients in the study; the red solid spot in the middle is the summary sensitivity and specificity; the inner ellipse represents the 95% confidence region, and the outer ellipse represents the 95% prediction region.
The individual sensitivities ranged from 67% to 100%, whereas the specificities ranged from 0% to 89%. Sensitivities were fairly homogeneous, whereas specificities were clearly heterogeneous. The thresholds used, ranging from 0 ng/mL 40 to 0.6 ng/mL, 29 were not prespecified, and were determined from ROC analyses. The summary ROC curve of the accuracy of GFAP across all four studies, regardless of the threshold used, is shown in Figure 3.
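To show what "determined from ROC analyses" typically means, the sketch below picks the cutoff that maximizes Youden's J (sensitivity + specificity - 1) over every observed value. Youden's J is an assumption here: the included studies may have used other optimality criteria. The function name, the eight concentrations, and the CT labels are hypothetical.

```python
def youden_cutoff(values, ct_positive):
    """Pick the cutoff that maximizes Youden's J = sensitivity + specificity - 1.

    A value >= cutoff counts as test-positive; ct_positive holds 1 (lesion on CT) or 0.
    """
    n_pos = sum(ct_positive)
    n_neg = len(ct_positive) - n_pos
    best = None
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, ct_positive) if y == 1 and v >= cutoff)
        tn = sum(1 for v, y in zip(values, ct_positive) if y == 0 and v < cutoff)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    _, cutoff, sens, spec = best
    return cutoff, sens, spec


# Hypothetical concentrations (ng/mL) and CT results for eight patients
values = [0.05, 0.07, 0.08, 0.09, 0.12, 0.20, 0.30, 0.50]
ct_positive = [0, 1, 0, 0, 0, 0, 1, 1]
print(youden_cutoff(values, ct_positive))
```

Because the cutoff is tuned on the same patients whose accuracy is then reported, the resulting sensitivity and specificity tend to be optimistic; a prespecified cutoff, or validation in an independent cohort, avoids this.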
The planned comparison between S100B and GFAP diagnostic performance was not possible, because of the limited number of studies and different spectrum of patients available for GFAP.

NSE
The accuracy of NSE for discriminating TBI patients with intracranial lesions on CT scanning from those without lesions was evaluated in three studies (314 patients). 33,41 Figure 2 shows a forest plot of the individual study estimates of sensitivity and specificity. The sensitivities ranged from 56% to 100%, whereas the specificities ranged from 7% to 77%. The studies showed considerable variation in the threshold adopted, ranging from 9 to 14.7 µg/L (Table 3).

Tau
The accuracy of circulating tau (cleaved tau [C-tau]) for diagnosis of CT abnormalities was evaluated only in one small study (50 patients). 28 The sensitivity was 50%, whereas the specificity was 75%. Among the 10 patients with abnormal findings on CT enrolled in this study, 5 (50%) had no detectable C-tau levels.

Discussion
In this systematic review, we have provided a comprehensive and thorough examination of the literature on protein biomarker diagnostic signatures for traumatic brain lesions, to define how best to take advantage of these tests in daily ED patient care. We found that, of the six biomarkers explored, current evidence supports only the measurement of S100B to help inform decision making in patients presenting to the ED with suspected intracranial lesions following mild TBI, possibly reducing resource use. There is as yet insufficient evidence that GFAP, NSE, and UCH-L1 are ready for clinical application, despite their unequivocal association with TBI. Further, tau was evaluated in only one small study, and no eligible study evaluated neurofilament proteins, so no meaningful conclusions can be drawn for these markers. Importantly, serious problems were observed in many of the studies, ranging from unfocused design and inappropriate target groups to biased reporting and inadequate analysis. These points are elaborated below.

S100B
Our findings demonstrate the clinical utility of S100B for the intended use of allowing physicians to be more selective in their use of CT without compromising the care of patients with mTBI. More specifically, the 16 studies applying the same prespecified cutoff of 0.10-0.11 µg/L yielded a pooled sensitivity of 96% (95% CI 92-98%) and a pooled specificity of 31% (95% CI 27-36%). Assuming a pretest probability of 10% 44 means that, of 1000 tested patients, 100 will have a final diagnosis of intracranial lesion. The pooled estimates imply that, of these 100, between 92 and 98 will test positive (true positives) and 2-8 will test negative (false negatives). Of the 900 with a negative CT, between 243 and 324 will test negative (true negatives) and between 576 and 657 will test positive (false positives) (Fig. 6).
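The per-1000 figures above follow from simple arithmetic on the pooled estimates, and the sketch below reproduces them. The function name is illustrative, and pairing the lower (or upper) CI bounds of sensitivity and specificity is only a way to span the ranges quoted in the text, not a joint confidence region. It also computes the negative predictive value implied by the point estimates at 10% prevalence.

```python
def natural_frequencies(n, prevalence, sensitivity, specificity):
    """Expected 2x2 counts for n tested patients at a given pretest probability."""
    diseased = round(n * prevalence)  # patients with a lesion on CT
    healthy = n - diseased            # patients with a negative CT
    tp = round(diseased * sensitivity)
    tn = round(healthy * specificity)
    counts = {"TP": tp, "FN": diseased - tp, "TN": tn, "FP": healthy - tn}
    # Probability of no lesion given a negative S100B result
    counts["NPV"] = tn / (tn + counts["FN"])
    return counts


scenarios = {
    "point estimate": (0.96, 0.31),
    "lower 95% CI bounds": (0.92, 0.27),
    "upper 95% CI bounds": (0.98, 0.36),
}
for name, (sens, spec) in scenarios.items():
    print(name, natural_frequencies(1000, 0.10, sens, spec))
```

With the point estimates this gives 96 true positives, 4 false negatives, 279 true negatives, and 621 false positives, that is, an NPV of about 98.6%.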
Even though this high sensitivity and excellent negative predictive value look promising, information on which lesions could be missed, and on the consequences if they were left untreated, is particularly relevant to the broad acceptance and adoption of S100B by the medical community. Accordingly, there is an ongoing debate about the risk of sending home a misdiagnosed patient with a potentially life-threatening condition such as an epidural hemorrhage. From the available data, 3,19,30,32,39,42 we were unable to identify specific types of injury that were systematically missed, although subdural hematomas were slightly more frequently misclassified as false negatives. We speculate that this may be because the location and/or extent of the brain lesion, as well as the pathoanatomical and neurovascular features of the different injuries, alter or delay the leakage of S100B into the circulation. Importantly, one study 30 demonstrated that lesions requiring surgery (one subdural hematoma and one epidural hematoma) were missed by S100B, indicating that this marker, if used alone as a diagnostic tool, is not completely reliable. Given that distinct patterns of injury are linked to patient-specific variability, efforts must be made to develop advanced multiparameter-based solutions integrating marker signatures and patient features. Such multimodal prediction models could be more suitable for accurate diagnosis, characterization of injury types, and risk stratification of mTBI patients. 45 It will also be critical to estimate the independent and complementary value of biomarkers, and to determine whether they provide added diagnostic utility when combined with careful clinical assessment or when integrated into existing clinical decision rules for the selective use of CT, such as the CT in Head Injury Patients (CHIP) model, 46 the New Orleans criteria, 4 or the Canadian CT Head Rule. 47 Unless a biomarker-based approach yields incremental diagnostic value and clearly demonstrates its superiority over standard, readily available patient characteristics, broad acceptance in medical practice is unlikely. 48
Reliability and reproducibility of S100B results also require critical consideration of the comparability of, and potential variability in, biomarker measurements obtained with assays from different manufacturers. We found that a relatively uniform and standardized approach had been adopted for S100B determination, with 14 studies using the ECLIA Elecsys® Roche and 2 studies using the ILMA LIA-mat Sangtec 100. These two automated immunometric assays have been shown to correlate well, with almost identical diagnostic capability, 27 which argues against this factor having influenced our conclusions. A comparable level of consistency in the analytical methods and assays used is not available for any of the other biomarkers considered in this review.
Our review showed that the results across S100B studies using the prespecified cutoff were consistent in terms of sensitivities and specificities, with only one outlier showing an exceptionally low specificity (12%). 42 A plausible explanation for this anomaly is that this study used plasma samples to measure S100B. This interpretation fits well with previous evidence showing that the anticoagulant interferes with S100B immunoreactivity, so that plasma values are ~20% higher than serum values. 49 Consequently, in the study of Zongo and colleagues, applying the cutoff prespecified for serum inevitably resulted in a systematic increase in false positive results. 42 This observation, while complicating the analysis of S100B blood levels, points to the need for a more thorough understanding of pre-analytical factors as potential confounders and sources of variability, and supports the adoption of different cutoff values depending on the sample type used. Intriguingly, it also suggests that plasma could be more suitable, and possibly preferable, for measuring S100B in mild TBI patients, given the very low concentrations in this population. However, even after removing the outlier, considerable heterogeneity remained, so the pooled results should be interpreted with caution.
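To make the sample-type issue concrete, the sketch below assumes, for illustration only, that plasma S100B reads a uniform ~20% higher than serum, and rescales the serum cutoff accordingly. The purely proportional bias, the constant and function names, and the patient values are all hypothetical; an actual plasma cutoff would have to be derived from a method-comparison study and validated clinically.

```python
SERUM_CUTOFF_UG_L = 0.10  # prespecified serum S100B cutoff (µg/L)
PLASMA_TO_SERUM = 1.20    # assumption: plasma values ~20% higher, uniformly


def plasma_equivalent_cutoff(serum_cutoff, ratio=PLASMA_TO_SERUM):
    """Rescale a serum cutoff for plasma under a purely proportional bias (hypothetical)."""
    return serum_cutoff * ratio


def n_test_positive(values, cutoff):
    """Number of results at or above the cutoff."""
    return sum(1 for v in values if v >= cutoff)


# Hypothetical plasma S100B results (µg/L) from six CT-negative patients
plasma_ct_negative = [0.08, 0.09, 0.105, 0.11, 0.115, 0.13]
plasma_cutoff = plasma_equivalent_cutoff(SERUM_CUTOFF_UG_L)
print("false positives, serum cutoff:", n_test_positive(plasma_ct_negative, SERUM_CUTOFF_UG_L))
print("false positives, rescaled cutoff:", n_test_positive(plasma_ct_negative, plasma_cutoff))
```

In this toy example, applying the serum cutoff to plasma flags four of six CT-negative patients, whereas the rescaled cutoff flags one, mirroring the systematic excess of false positives described above.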
Investigations from multiple research groups have provided evidence that a series of factors other than the brain injury itself may influence circulating biomarker levels and, therefore, diagnostic accuracy. These factors encompass biomarker characteristics such as molecular weight and injury-specific release and clearance mechanisms (Table S1); 50,51 patient features, including presence of extracranial injuries or polytrauma, intoxication, location of the injury, and even genetic factors; pre-analytical and laboratory-dependent procedures, including all steps from management of equipment to execution of assays and manufacturing processes; and post-analytical data handling. 19,[52][53][54] We were not able, however, to systematically investigate these potential sources of heterogeneity, because of substantial variation across studies, suboptimal reporting of patient and study information, and the coexistence in the same study of factors with contrasting or controversial effects on the accuracy estimates. Taken together, these findings demonstrate that future research must be refined through improvements in study design, as well as in the standards and characterization of patient selection (see box on page 17).
In this regard, we were surprised to note that, to date, no attempt has been made to specifically investigate the effect of comorbidities and sex on the diagnostic performance of S100B or any other marker. Sex is recognized as a primary determinant of biological variability, responsible for anatomical, neurochemical, and functional brain connectivity differences, and it heavily influences neurobiological and neuropathophysiological responses. 55 It is also associated with important differences in hormones, metabolism, and the immune system, which in turn may interfere with the determination of circulating TBI biomarkers. 56 Factoring sex into research designs and analyses is a theme under active debate, and is considered fundamental to rigorous and relevant biomedical research. Hence, we emphasize that this is a critical knowledge gap for future investigation, especially in light of mounting evidence of a changing gender pattern caused by the shift of the TBI population toward older age, a population also at risk of multiple comorbid conditions (see Thaler and colleagues). 39 Systematic reviews and meta-analyses of individual participant data (IPD) may represent a powerful approach to overcome some of these gaps and limitations, 57 an approach supported by current initiatives to share clinical data and establish common repositories, such as the Federal Interagency Traumatic Brain Injury Research (FITBIR) database (https://fitbir.nih.gov/). 58
Clinical application of S100B implies that choosing the right assessment time point (time between injury and sampling) 59 is an integral part of the test. Based on the results of S100B kinetics studies, guidelines have specified that S100B should be sampled within 3 h 9,60 to 6 h 9 post-injury to detect intracranial lesions.
A recent study supported a 3 h window for safe rule-out of acute intracranial lesions in clinical practice, showing that a second blood sample drawn 3 h after the first one was not informative and led to a non-trivial loss of sensitivity of ~6% (eight patients with positive CT findings would have been missed). 27 We were unable to address this specific issue further in this review because of heterogeneity in study design. In addition to delays in sampling after injury, the delay between sample collection and processing/analysis, and the storage conditions during this delay, could be important modulators of S100B stability and assay results. Age, gender, and comorbidities, alone or in combination, can also substantially affect the kinetics of S100B. 61 Future studies should determine whether these variables need to be taken into account, and how they influence biomarker results and their interpretation.
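The sensitivity figures above follow from the definition of sensitivity over the P patients with a positive CT: if the later sample detects n_missed fewer of them (net) than the first sample, sensitivity falls by n_missed/P. The back-calculation of P below is only a rough illustration from the rounded figures quoted above, not a number reported in the cited study.

```latex
\mathrm{Se} = \frac{\mathrm{TP}}{P}, \qquad
\Delta\mathrm{Se} = \frac{\mathrm{TP}_{1}}{P} - \frac{\mathrm{TP}_{1} - n_{\mathrm{missed}}}{P} = \frac{n_{\mathrm{missed}}}{P}
\quad\Longrightarrow\quad
P \approx \frac{n_{\mathrm{missed}}}{\Delta\mathrm{Se}} = \frac{8}{0.06} \approx 133
```

Because ~6% is rounded, the implied number of CT-positive patients is only approximate (roughly 123 to 145 for a loss between 6.5% and 5.5%).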
The results of our study expand and corroborate those of previous systematic reviews and meta-analyses, 62-64 and confirm that implementation of S100B might reduce the number of CT scans by ~30%. 3 This also has broad financial implications for healthcare costs. However, none of the studies in our review explored the cost-effectiveness of biomarker use, and the few economic studies and data in the literature are conflicting. An earlier study by Ruan and colleagues 65 reported a limited effect of S100B on healthcare resources, with a potential economic impact only in specific clinical scenarios (i.e., a CT scanning rate >78%, or a biomarker turnaround time at least 96 min faster than that of CT scan results). Conversely, in a more recent cost analysis conducted in a Swedish regional hospital, clinical use of S100B as incorporated into the Scandinavian guidelines substantially reduced healthcare costs, especially with strict adherence to management recommendations (€71 per patient). 66 These results are not generalizable and must be carefully interpreted in their specific contexts, because of differences across countries, healthcare systems, hospital settings, and ensuing care patterns. To refine cost calculations, future studies should take these factors into consideration, as well as CT overutilization and the socioeconomic costs associated with the increased cancer risk from CT scans. A clear demonstration of cost savings and added benefits beyond those of current management strategies for mTBI is essential for TBI biomarkers to be adopted and widely used by the medical community.
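The link between test accuracy and avoided CT scans can be made explicit with a simple expected-value calculation: if CT is performed only for biomarker-positive patients, the fraction of scans avoided equals the probability of a negative test, (1 - prevalence) × specificity + prevalence × (1 - sensitivity). The sketch below assumes that every patient would otherwise be scanned and uses hypothetical inputs, not pooled estimates from this review.

```python
# Sketch: expected consequences of using a rule-out biomarker before CT.
# Assumes every patient would otherwise be scanned; inputs are hypothetical.


def ct_triage_outcomes(prevalence, sensitivity, specificity, n_patients=1000):
    """Expected scans avoided and lesions missed when only test-positive patients get CT."""
    if not all(0.0 <= p <= 1.0 for p in (prevalence, sensitivity, specificity)):
        raise ValueError("prevalence, sensitivity, and specificity must be in [0, 1]")
    # Test-negative patients are either true negatives or false negatives.
    p_test_negative = (1 - prevalence) * specificity + prevalence * (1 - sensitivity)
    return {
        "fraction_ct_avoided": p_test_negative,
        "scans_avoided": n_patients * p_test_negative,
        # False negatives: patients with a CT-visible lesion who would not be scanned.
        "missed_lesions": n_patients * prevalence * (1 - sensitivity),
    }


result = ct_triage_outcomes(prevalence=0.10, sensitivity=0.95, specificity=0.30)
print(f"CT scans avoided: {result['fraction_ct_avoided']:.1%}")
print(f"Missed lesions per 1000 patients: {result['missed_lesions']:.1f}")
```

Under these assumptions, the number of scans actually saved also scales with how often CT would otherwise be ordered, which fits the observation above that the economic impact reported by Ruan and colleagues depended on the baseline CT scanning rate.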

GFAP
Recent narrative reviews have outlined the potential of GFAP for identifying patients with intracranial lesions after head trauma, 7 but none of these used systematic review methods or meta-analyses. In the meta-analysis reported here, we included four studies, in which GFAP showed sensitivities ranging from 67% 29 to 100% 36,40 and specificities ranging from 0% 40 to 100%. 29 Although promising, these results must be approached with caution, because the studies included patients with severe and moderate TBI who are not representative of the target population of the test (the median prevalence of abnormal CT findings across the studies was 22%), and thresholds were not prespecified; both factors may have inflated the accuracy estimates. 67 For diagnostic validation, it will be fundamental to establish reliable and valid thresholds. GFAP also needs to be tested in larger clinical studies focused on its intended use. 68,69 To this end, it has been argued that studies investigating the implementation of biomarker measurements in mTBI management guidelines (to avoid unnecessary CT scans) should be limited to patients currently recommended for such examination (GCS 14-15), thereby excluding patients with a GCS score of 13, for whom biomarker assessment would not add to clinical examination. 9 As mentioned earlier, defining these setting-specific characteristics is also critical for performing reliable cost analyses and determining the primary economic advantage of using blood biomarkers as a pre-head CT screening tool. A meaningful comparison between the diagnostic performance of GFAP and S100B was precluded by substantial differences in study populations. In this context, we note that the TBI biomarkers discussed in this review are usually evaluated individually. Further work should more consistently assess multiple biomarkers simultaneously in the same patients, providing a framework for direct, within-study comparisons of test accuracy.
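The concern about thresholds that were not prespecified can be illustrated with a small simulation. The sketch picks the cutoff that maximizes Youden's J (sensitivity + specificity - 1) in one sample and then applies it unchanged to a fresh sample drawn from the same hypothetical distributions; on average, the in-sample J is higher. The normal marker distributions, sample sizes, and seed are arbitrary assumptions, and Youden's J is only one of several data-driven cutoff criteria; the included studies did not necessarily use it.

```python
import random
from statistics import mean


def sens_spec(values, labels, cutoff):
    """(sensitivity, specificity) for 'positive if value >= cutoff'; labels are booleans."""
    n_pos = sum(labels)  # True counts as 1
    n_neg = len(labels) - n_pos
    tp = sum(1 for v, y in zip(values, labels) if y and v >= cutoff)
    tn = sum(1 for v, y in zip(values, labels) if not y and v < cutoff)
    return tp / n_pos, tn / n_neg


def youden_optimal_cutoff(values, labels):
    """Data-driven cutoff maximizing Youden's J = sensitivity + specificity - 1."""
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(values)):
        se, sp = sens_spec(values, labels, cutoff)
        j = se + sp - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


def simulate_sample(n_lesion, n_no_lesion, rng):
    """Hypothetical marker values: lesion-positive patients shifted up by 1 SD."""
    values = [rng.gauss(1.0, 1.0) for _ in range(n_lesion)]
    values += [rng.gauss(0.0, 1.0) for _ in range(n_no_lesion)]
    return values, [True] * n_lesion + [False] * n_no_lesion


rng = random.Random(2016)  # fixed seed so the sketch is reproducible
derivation_j, validation_j = [], []
for _ in range(200):
    # Choose the cutoff in a small derivation sample...
    x_dev, y_dev = simulate_sample(10, 30, rng)
    cutoff, j_in_sample = youden_optimal_cutoff(x_dev, y_dev)
    # ...then apply it unchanged to an independent sample.
    x_val, y_val = simulate_sample(10, 30, rng)
    se, sp = sens_spec(x_val, y_val, cutoff)
    derivation_j.append(j_in_sample)
    validation_j.append(se + sp - 1)

print(f"mean Youden J, same sample used to pick cutoff: {mean(derivation_j):.2f}")
print(f"mean Youden J, independent sample:              {mean(validation_j):.2f}")
```

The gap between the two averages is the optimism that prespecified thresholds or independent validation are meant to remove; it grows as the number of CT-positive patients shrinks.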

NSE and UCH-L1
The relative dearth of studies evaluating the diagnostic accuracy of NSE, UCH-L1, and tau in the ED for identifying patients with intracranial lesions following mTBI precluded meta-analyses. The diagnostic value of NSE remains uncertain, with studies showing remarkable variation and inconsistency. The accuracy of UCH-L1 for detecting intracranial lesions on CT was evaluated in two studies that yielded an optimal sensitivity (100%) but modest specificities (21-39%). As with GFAP, the thresholds used were not prespecified, and the studies included patients with mild to moderate TBI (GCS 9-15). Hence, further studies are required to confirm the reproducibility of these findings and to determine clinical utility in daily bedside care.
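A point estimate of 100% sensitivity says little on its own when few CT-positive patients are available. When all n lesion-positive patients test positive, the exact (Clopper-Pearson) two-sided 95% confidence interval for sensitivity is [(0.025)^(1/n), 1]. The sketch below computes that lower bound for a few hypothetical values of n, since the number of CT-positive patients in the two UCH-L1 studies is not stated here.

```python
def exact_lower_bound_all_detected(n_positive, confidence=0.95):
    """Clopper-Pearson lower confidence limit for sensitivity when every one of
    n_positive lesion-positive patients tests positive (observed sensitivity 100%).

    With x == n, the lower limit L solves L**n == alpha / 2 (two-sided interval).
    """
    if n_positive < 1:
        raise ValueError("need at least one lesion-positive patient")
    alpha = 1 - confidence
    return (alpha / 2) ** (1 / n_positive)


# Hypothetical numbers of CT-positive patients.
for n in (5, 10, 20, 50):
    lower = exact_lower_bound_all_detected(n)
    print(f"n = {n:2d}: observed sensitivity 100%, 95% CI lower bound {lower:.1%}")
```

For example, with 10 lesion-positive patients the data remain compatible with a true sensitivity of about 69%, which is why replication in larger samples matters.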

Tau and neurofilament proteins
There is insufficient evidence to support the clinical validity of initial circulating c-Tau or neurofilament protein concentrations for the management of patients with mTBI.

Implications for research and practice: Strengths and weaknesses of the review
Current understanding recognizes the complexity of TBI pathobiology, which most probably requires multifaceted, multimodal approaches that integrate biomarkers and traditional clinical characteristics to allow more powerful and accurate characterization and risk stratification of mTBI, 45,70 a premise that is currently insufficiently reflected in the literature. In addition, if different biomarkers do indeed reflect different pathophysiological processes, 51 providing independent information on imaging abnormalities and outcome and having different diagnostic windows, a panel of biomarkers may substantially increase diagnostic specificity for the end-point of interest. 71,72 Unfortunately, only a few such studies are available to date. More data are needed to evaluate whether a multi-marker approach, based on a panel of biomarkers with distinct time-dependent discriminatory accuracy, performs better for the detection and characterization of TBI.
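The trade-off behind combining markers can be shown with two hypothetical markers, A and B, and simple Boolean rules. Requiring both markers to exceed their cutoffs (AND) can only keep or raise specificity relative to either marker alone, at a possible cost in sensitivity, whereas calling a patient positive if either marker is elevated (OR) does the opposite. The data and cutoffs are invented, and published panels often use statistical models (e.g., logistic regression) rather than Boolean rules; this is a simplified stand-in.

```python
# Sketch with invented data: Boolean combinations of two hypothetical markers.
# Each patient is (marker_a, marker_b, ct_lesion).
PATIENTS = [
    (1.5, 1.4, True), (2.0, 1.2, True), (1.2, 2.5, True), (0.8, 1.6, True),
    (1.3, 0.5, False), (0.4, 1.2, False), (0.6, 0.7, False),
    (1.1, 1.05, False), (0.3, 0.2, False), (1.4, 0.9, False),
]
CUT_A, CUT_B = 1.0, 1.0  # hypothetical cutoffs


def evaluate(rule, patients):
    """Return (sensitivity, specificity) of rule(a, b) -> bool against CT findings."""
    tp = fn = tn = fp = 0
    for a, b, lesion in patients:
        positive = rule(a, b)
        if lesion and positive:
            tp += 1
        elif lesion:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


RULES = {
    "A alone": lambda a, b: a >= CUT_A,
    "B alone": lambda a, b: b >= CUT_B,
    "A AND B": lambda a, b: a >= CUT_A and b >= CUT_B,
    "A OR B": lambda a, b: a >= CUT_A or b >= CUT_B,
}

results = {name: evaluate(rule, PATIENTS) for name, rule in RULES.items()}
for name, (se, sp) in results.items():
    print(f"{name:8s} sensitivity={se:.2f} specificity={sp:.2f}")
```

For a rule-out test before CT, lost sensitivity is costly, so whether a panel helps depends on how the markers are combined and on validating the combination in independent data.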
Further, we should be cautious in using CT as the gold standard for judging the performance of circulating biomarkers. There is increasing recognition that, compared with MRI, X-ray CT has poor sensitivity for structural lesions in TBI such as microbleeds and diffuse axonal injury. 73,74 It follows that a biomarker result that is false positive with respect to CT-visible abnormality cannot be assumed to be false positive with respect to structural injury, because some of these false positives may be associated with abnormalities on MRI or other advanced neuroimaging, persistent post-concussive symptoms, or long-term neurological, cognitive, and/or neuropsychiatric complications. 75-78 At the same time, these considerations suggest a broader clinical application of a biomarker-based strategy for the diagnosis and management of mTBI. Biomarkers could be used to guide prognostic grouping, to refine risk stratification, and to inform management and treatment decisions, including indications for advanced MRI techniques (diffusion tensor imaging [DTI], susceptibility weighted imaging [SWI], functional connectivity MRI [fcMRI]), enrollment into clinical trials, and closer monitoring and follow-up of mTBI patients.
From a clinical perspective, biomarkers are not useful if they do not provide real-time decision support for the diagnosis of mTBI at the bedside in the ED. A promising route to rapid incorporation into routine patient care is the development of an automated multiplex point-of-care (POC) device capable of providing accurate measurements to the clinician at reasonable cost and with short turnaround times (~15-20 min). 52,53 The studies discussed in this review focus primarily on adult patients. There is, however, growing interest in using biomarkers to optimize the diagnosis and management of pediatric mTBI, because of the high risk of TBI in children ≤4 years of age, the difficulty of functional assessment, and radiation exposure at a young age with the ensuing increased cancer risk. 75,79,80 Future studies and systematic reviews taking current and new evidence into account are urgently needed to elucidate the role of biomarkers and establish their clinical utility in this special and vulnerable population.
Several potential limitations merit consideration. Patient selection is a critical aspect of reviews of test accuracy, as it can alter the spectrum of disease and non-disease and the prevalence in the population, strongly affecting test accuracy. 67 Given the heterogeneous and polymorphous nature of TBI, particularly at the milder end of the spectrum, the included studies adopted inconsistent, sometimes controversial, definitions of mTBI. For example, focal neurological deficit was used either as an inclusion or as an exclusion criterion (Table S2). This diagnostic uncertainty may have introduced various biases. Although we could not resolve this issue in this review, because we had to rely on the criteria listed in the included studies, we were able to assess the robustness of the findings using a sensitivity analysis, which even showed an improvement in S100B performance (Fig. S4).
More broadly, with respect to patient selection and study design, our group endorses the importance of methodological rigor, and advocates the use of standardized protocols and a prespecified data analysis plan, both as a means to reduce related biases and inadequate reporting, and as a mandatory prerequisite for the successful validation and implementation of TBI diagnostic biomarkers. Careful sample size planning that takes into account assay precision, clinical significance, and regulatory considerations is also necessary. Involvement of regulatory bodies in driving forward harmonization and standardization is considered essential. A major step forward in this direction is the recently established collaboration between researchers and the United States Food and Drug Administration (FDA) in the context of the TBI Endpoints Development initiative (https://tbiendpoints.ucsf.edu/). Further, despite the broad adoption of the STARD statement (Standards for Reporting of Diagnostic Accuracy studies) by the scientific community, 81 we found a number of studies with poor or inconsistent reporting of important information, including patient and specimen characteristics, assay methods, handling of missing data, and statistical analysis methods, in addition to suboptimal descriptions of study findings, which hampered our assessment of the potential for bias and our interpretation of the results. These observations are important in raising awareness of key reporting issues in many TBI diagnostic studies. The STARDdem Initiative recently proposed an adaptation of the STARD statement with guidance pertinent to studies of cognitive disorders, which is expected to contribute to the development of Alzheimer biomarkers. 82 A similar initiative for TBI biomarker studies could increase the transparency and quality of the information provided by such studies, enabling evaluation of internal and external validity and, consequently, more effective translation and application of their findings to clinical practice.
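For the statistical side of sample size planning, one widely used approach for diagnostic accuracy studies (often attributed to Buderer) sizes the study so that sensitivity and specificity are estimated with a chosen confidence-interval half-width, then inflates the required numbers by prevalence. It addresses the precision of accuracy estimates, not analytical assay precision, and the inputs below (expected sensitivity 0.97, specificity 0.35, prevalence 8%, half-widths 0.03 and 0.05) are hypothetical.

```python
import math
from statistics import NormalDist


def accuracy_study_size(sens, spec, prevalence, half_width_sens, half_width_spec,
                        confidence=0.95):
    """Normal-approximation sample size for estimating sensitivity and specificity.

    n_pos = z^2 * Se * (1 - Se) / d_se^2 patients with the target condition,
    n_neg = z^2 * Sp * (1 - Sp) / d_sp^2 patients without it; the total enrolled
    is the larger of n_pos / prevalence and n_neg / (1 - prevalence).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_pos = z**2 * sens * (1 - sens) / half_width_sens**2
    n_neg = z**2 * spec * (1 - spec) / half_width_spec**2
    total = max(n_pos / prevalence, n_neg / (1 - prevalence))
    return {
        "ct_positive": math.ceil(n_pos),
        "ct_negative": math.ceil(n_neg),
        "total_enrolled": math.ceil(total),
    }


plan = accuracy_study_size(sens=0.97, spec=0.35, prevalence=0.08,
                           half_width_sens=0.03, half_width_spec=0.05)
print(plan)
```

In this example the sensitivity requirement dominates the total because only about 8% of enrolled patients are assumed to be CT-positive.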
Harmonization and standardization of assays that reliably quantify biomarkers with high analytical precision are critical to ensure that measurements are reproducible and consistent across analytical platforms and laboratories.

Conclusion
Based on this review, we found that measurement of S100B can support informed decision making in the ED with respect to selecting adults with mTBI for CT scanning, possibly safely reducing resource use. Conversely, there is little evidence for clinical application of GFAP, UCH-L1, NSE, tau, or neurofilaments. However, much work remains to evaluate factors that may influence biomarker levels, and a critical assessment of the implications for actual management, clinical impact, and health economics is required. We also found serious problems in the design, reporting, and analysis of many of the studies, emphasizing the need for the research community to establish methodological standards and acquire extensive high-quality data for TBI biomarker validation. This is an essential prerequisite for drawing firm conclusions about the performance of tests based on these biomarkers and their clinical utility.
Finally, through an extensive and critical review of the existing TBI biomarker literature, and state-of-the-science discussions with key opinion leaders and subject matter experts, members of our work group collaborated to evaluate the evidence necessary to demonstrate the clinical utility of TBI biomarkers, to identify critical gaps for advancing the field, and to lay the foundation for a "living" TBI biomarker registry capable of providing an up-to-date list of, and information on, biomarker studies and their results (see Box). By fostering collaboration, developing the high levels of evidence needed to support analytical validity and clinical utility, and improving the quality of assessments of novel candidate biomarkers, such a strategy should establish the solid ground needed to change biomarker research from data that inform into data that transform, turning knowledge into new medical practice.

-Increase transparency and quality of reporting by calling on investigators to adopt optimal/consolidated guidelines for reporting biomarker work (http://www.stard-statement.org/).
-Reduce biases by implementing critical appraisal tools for evaluating the quality of research (http://www.quadas.org/).
-Develop internationally accepted common reference standards and reference methods to reduce variability and to ensure that biomarker results are reliable, reproducible, and comparable across analytical platforms/laboratories and clinical studies, enabling the establishment of generally applicable, exact diagnostic cutoffs.
2. Additional Knowledge Needed to Improve Reliability in the Use of Blood Biomarkers and to Ensure a Successful Validation and Implementation in Clinical Practice
-Assess relationships between specific types and patterns of injury and biomarker kinetics.
-Factor primary biological and clinical variables, including sex and comorbidities, into research design and analyses to exhaustively understand their influence on biomarker pathophysiology and levels.
-Investigate pre-analytical factors thoroughly, and evaluate the adoption of sample-type-specific cutoff values and alternative/complementary sampling time points.

3. Exploration of Novel Opportunities and Strategies for Expanding and Informing Biomarker Clinical Research as a Basis for Developing Multimodal Multidimensional Models to Diagnose Mild TBI
-Simultaneous assessment of multiple biomarkers to compare accuracy and evaluate the performance of multi-marker panels for the detection and characterization of TBI.
-Sharing of clinical data and establishment of common repositories to support individual participant data meta-analyses (IPD-MAs) for more robust development of diagnostic models tailored to specific (sub)populations or settings, and testing their generalizability and usefulness.
-Systematic and rigorous evaluation, quantification, and demonstration of the incremental diagnostic value of TBI biomarkers over standard, readily available patient characteristics, and existing prediction rules for the selective use of CT.
-Combination of brain injury biomarkers and patient characteristics yielding independent and incremental diagnostic information toward a powerful multi-parameter platform to assist and enhance clinical decision making (triage for CT scanning) in patients with mTBI at the bedside in the emergency department.