Prediction of Global Functional Outcome and Post-Concussive Symptoms after Mild Traumatic Brain Injury: External Validation of Prognostic Models in the Collaborative European NeuroTrauma Effectiveness Research in Traumatic Brain Injury (CENTER-TBI) Study

The majority of traumatic brain injuries (TBIs) are categorized as mild, according to a baseline Glasgow Coma Scale (GCS) score of 13–15. Prognostic models that were developed to predict functional outcome and persistent post-concussive symptoms (PPCS) after mild TBI have rarely been externally validated. We aimed to externally validate models predicting 3–12-month Glasgow Outcome Scale Extended (GOSE) or PPCS in adults with mild TBI. We analyzed data from the Collaborative European NeuroTrauma Effectiveness Research in Traumatic Brain Injury (CENTER-TBI) project, which included 2862 adults with mild TBI, with 6-month GOSE available for 2374 and Rivermead Post-Concussion Symptoms Questionnaire (RPQ) results available for 1605 participants. Model performance was evaluated based on calibration (graphically and characterized by slope and intercept) and discrimination (C-index). We validated five published models for 6-month GOSE and three for 6-month PPCS scores. The models used different cutoffs for outcome and some included symptoms measured 2 weeks post-injury. Discriminative ability varied substantially (C-index between 0.58 and 0.79). The models developed in the Corticosteroid Randomisation After Significant Head Injury (CRASH) trial for prediction of GOSE < 5 discriminated best (C-index 0.78 and 0.79), but were poorly calibrated. The best performing models for PPCS included 2-week symptoms (C-index 0.75 and 0.76). In conclusion, none of the prognostic models for early prediction of GOSE and PPCS has both good calibration and discrimination in persons with mild TBI. In future studies, prognostic models should be tailored to the population with mild TBI, predicting relevant end-points based on readily available predictors.


Introduction
Traumatic brain injury (TBI) is a major health concern with >50,000,000 new cases reported globally every year. 1,2 Approximately 70-90% of patients with TBI present with a Glasgow Coma Scale (GCS) score of 13-15, which falls in the mild TBI category. 3 Although the majority of these patients recover shortly after the incident, a notable percentage continue to have persistent complaints. These complaints can interfere with daily life and social and work activities, 4,5 and approximately 50% of persons with mild TBI do not return to their pre-injury level of functioning 6 months after injury. [6][7][8] The most prominent post-injury disturbances are cognitive, emotional, somatic, and behavioral symptoms, often referred to as post-concussive symptoms, 9 or if the sequelae of symptoms persist over time, post-concussion syndrome (PCS). The concept of PCS has been questioned in recent years, 10 and, therefore, some authors refer to the multiple concurrent post-concussive symptoms several months after TBI as persistent post-concussive/post-concussion symptoms (PPCS). [11][12][13][14] The prevalence of 6-month PPCS after mild TBI varies substantially among studies, partly because of differences in diagnostic criteria, and is typically between 10% and 40% in civilian samples presenting to hospitals. 4,[15][16][17][18] Considering the high percentage of functionally impaired persons with mild TBI, and the economic burden of prolonged treatment and decreased productivity, 19 it is important to promptly identify persons who are at high risk of long-term consequences. Therefore, a well-performing prognostic model for outcome prediction after mild TBI is important to assist patients and health-care providers in making well-informed treatment decisions. Before implementation of a model for decision making in clinical practice can be considered, it is crucial to assess its performance in an external validation study.
In recent years, there have been initiatives toward external validation of prognostic models for mild TBI, 6,20 but validation studies are still scarce. The Collaborative European NeuroTrauma Effectiveness Research in Traumatic Brain Injury (CENTER-TBI) project provides an excellent opportunity for external validation of existing models in a large prospective cohort of contemporary TBI patients from 18 countries across Europe, and Israel. 21 The aim of this study was to examine the performance of existing models for prediction of outcome following mild TBI. We searched for published predictors and prognostic models for functional outcome (Glasgow Outcome Scale Extended [GOSE]) and PPCS for mild TBI and validated selected prognostic models using the CENTER-TBI database.

Study population
The study population consisted of patients from the prospective longitudinal observational CENTER-TBI study (Core data, version 2.0). Data were collected from December 2014 to December 2017 in 58 centers across Europe and Israel. Ethical approval was granted for each recruiting site and informed consent was obtained for all patients by the patients and/or their legal representative/next of kin. Institutions participating in CENTER-TBI were mainly referral centers for neurotrauma. Patients who were not seen in study hospitals were not included. Inclusion criteria for the core study were a clinical diagnosis of TBI, presentation within 24 h after injury, and an indication for computed tomography (CT) scanning. The exclusion criterion was any severe pre-existing neurological disorder that could confound outcome assessments. 21 The core data set included three strata that were differentiated according to care path: patients seen in the emergency room (ER); patients primarily admitted to the intensive care unit (ICU), and patients primarily admitted to the hospital ward (non-ICU).
For this study, 2862 adults (≥16 years of age) with mild TBI, as defined by a baseline GCS of 13-15, were included; 2374 of the records had information on 6-month GOSE, and 1605 had information on some or all 6-month Rivermead Postconcussion Symptoms Questionnaire (RPQ) 22 items, measuring PPCS.

Measurements
Predictors. Sociodemographic, pre-injury and injury characteristics were based on information in hospital charts. Imaging, blood sampling, and neurological assessment were performed in the ER. Post-concussive and psychological symptoms were assessed at 2-3 weeks post-injury (range 10-27 days) in patients admitted to the ER, and in some centers (participating in an additional imaging sub-study), also in patients admitted to a hospital ward other than the ICU. The following instruments were used: RPQ for post-concussive symptoms, PTSD Checklist for Diagnostic and Statistical Manual of Mental Disorders, Fifth edition (DSM-5) (PCL-5) 23 to screen for post-traumatic stress disorder (PTSD), Patient Health Questionnaire (PHQ-9) 24 to screen for depression, and Generalized Anxiety Disorder (GAD-7) 25 to screen for anxiety.
Outcome. The GOSE is widely used as a primary outcome measure in TBI studies. 26 The GOSE provides eight categories of outcome: dead (1), vegetative state (2), lower severe disability (3), upper severe disability (4), lower moderate disability (5), upper moderate disability (6), lower good recovery (7), and upper good recovery (8). The highest score (8) represents a complete return to a pre-injury level of functioning. 27 The GOSE was assessed 6 months post-injury, and when outside the time window (range 5-8 months), it was imputed based on GOSE measurements at other time points (approximately 30%, described in Steyerberg and coworkers). 2 The RPQ is the most frequently employed self-reported symptom inventory measuring PPCS. 28 The RPQ consists of 16 cognitive, somatic, and emotional symptoms that can be rated from "not experienced at all" (0) to "severe problem" (4), and it was administered 6 months post-injury.
The self-report instruments were administered in 18 languages. Prior to the data collection, instruments existing only in English were translated and linguistically validated in the respective languages according to the guidelines of Acquadro. 29 The linguistic validation procedure consisted of multiple steps, including forward translation, cognitive debriefing, backward translation, harmonization, and finalization of translated versions. A manuscript dedicated to the linguistic validations is currently in preparation. Psychometric properties of the instruments have been investigated using criteria of the classical test and item response theory (other publications in preparation). [30][31][32]

Selection of prognostic models
Eligible prognostic models were selected based on a rapid review with pre-defined search strategy and pre-defined inclusion criteria. Prognostic models and predictors of GOSE or PPCS were identified by a search in MEDLINE®, Embase, and the Cochrane Library, which included studies published until May 2019 (Table S1), and reference lists of systematic reviews. [33][34][35] Prognostic models were included if they were developed to predict GOSE or PPCS at 3-12 months post-injury in patients with GCS 13-15 at baseline. Models that were developed in populations that included other TBI severities were also selected if at least a proportion of patients had a GCS between 13 and 15. Moreover, models had to fulfill at least one of the following quality criteria to be considered eligible: (1) large sample size (n > 500), (2) >10 outcome events for each candidate predictor considered, and (3) the use of shrinkage and/or some form of internal validation. 36 We extracted predictors of outcome from eligible models and from all studies that explored prediction of 3-12-month GOSE and PPCS in persons with mild TBI.

Statistical analyses
The external validity of the models was assessed with measures of calibration and discrimination. Calibration is the agreement between predicted and observed outcome values and was measured by the calibration intercept and the calibration slope, and visualized by a calibration plot. The calibration intercept expresses calibration-in-the-large: whether the outcomes are systematically overestimated (intercept <0) or underestimated (intercept >0). The calibration slope indicates whether the strength of the associations between predictors and outcomes is underestimated (slope >1; "underfit") or overestimated (slope <1; "overfit"). A calibration plot graphically compares mean observed with mean predicted outcomes. In a perfect scenario, the calibration intercept and slope would be 0 and 1, respectively, and combinations of predicted and observed outcomes would be on the 45 degree line. Discrimination refers to the ability to classify patients with a poor versus a good outcome based on a prognostic model, and was assessed by the area under the receiver operating characteristic curve (AUC), which is equal to the concordance (C) index in logistic regression models. The AUC or C-index ranges between 0.50 (no discrimination, equal to chance) and 1.0 (perfect discrimination).
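As an illustration of these three measures, the following minimal Python sketch computes a calibration intercept, a calibration slope, and a C-index from observed binary outcomes and predicted probabilities. It is not the code used in this study (the analyses were performed in R); the function names are hypothetical, and the logistic recalibration models are fitted with a simple Newton-Raphson routine written for this example.

```python
import numpy as np


def logit(p):
    """Log-odds of predicted probabilities (clipped away from 0 and 1)."""
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1 - 1e-12)
    return np.log(p / (1 - p))


def fit_logistic(X, y, offset=None, n_iter=50):
    """Logistic regression by Newton-Raphson; `offset` is a fixed linear term."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    offset = np.zeros(len(y)) if offset is None else np.asarray(offset, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta + offset)))
        grad = X.T @ (y - mu)
        hess = X.T @ (X * (mu * (1 - mu))[:, None])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta


def calibration_intercept(y, p):
    """Calibration-in-the-large: intercept fitted with logit(p) as an offset.

    A negative value means that risks are overestimated on average.
    """
    ones = np.ones((len(y), 1))
    return fit_logistic(ones, y, offset=logit(p))[0]


def calibration_slope(y, p):
    """Coefficient of logit(p) in a logistic model of the observed outcome.

    A value below 1 means predictor effects are overestimated (overfit).
    """
    lp = logit(p)
    X = np.column_stack([np.ones(len(lp)), lp])
    return fit_logistic(X, y)[1]


def c_index(y, p):
    """Share of (event, non-event) pairs ranked correctly; ties count 0.5."""
    y = np.asarray(y)
    p = np.asarray(p, dtype=float)
    diff = p[y == 1][:, None] - p[y == 0][None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size
```

With predictions that are systematically too high, `calibration_intercept` returns a negative value; with predictions that are too extreme, `calibration_slope` returns a value below 1.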
The C-index obtained in validation studies is influenced by differences in both the regression coefficients (slope) and the case-mix heterogeneity. To disentangle their influence on the discriminative ability of logistic regression models, we used the model-based concordance (mbc), which is only influenced by differences in case-mix heterogeneity. 37 All models were validated using patients with GCS 13-15 with all information on the relevant predictors available in the CENTER-TBI data set ("complete case analysis"). When predictors were not registered in CENTER-TBI, and therefore were completely unavailable, their predictor effect was set to 0, and only discrimination and calibration slope were assessed. As a sensitivity analysis, models were also validated in all patients with GCS 13-15, using imputation to complete missing data in predictors ("imputation analysis" in one complete data set).
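The idea behind the mbc can be sketched as follows: it is the C-index that would be expected in the validation sample if the predicted probabilities were correct, so it depends only on how spread out the predictions are. The Python function below is a minimal sketch under the assumption of a binary outcome and a pairwise definition of concordance; the function name and the handling of tied predictions (counted as 0.5) are choices made for this example, not details taken from the study.

```python
import numpy as np


def model_based_concordance(p):
    """Expected C-index if the predicted probabilities `p` were correct.

    For each ordered pair (i, j), the probability that j has the event and
    i does not is (1 - p_i) * p_j. A pair is expected to be concordant when
    the patient with the higher predicted risk is the one with the event.
    """
    p = np.asarray(p, dtype=float)
    pi = p[:, None]
    pj = p[None, :]
    prob_j_only = (1 - pi) * pj   # j has the event, i does not
    prob_i_only = pi * (1 - pj)   # i has the event, j does not
    concordant = (
        np.where(pi < pj, prob_j_only, 0.0)
        + np.where(pi > pj, prob_i_only, 0.0)
        # Assumption of this sketch: tied predictions count as half concordant.
        + np.where(pi == pj, 0.5 * (prob_j_only + prob_i_only), 0.0)
    )
    informative = prob_j_only + prob_i_only
    off_diagonal = ~np.eye(len(p), dtype=bool)  # exclude pairs (i, i)
    return concordant[off_diagonal].sum() / informative[off_diagonal].sum()
```

A sample in which all patients receive the same predicted risk gives 0.5, and a wider spread of predicted risks gives a higher value. Comparing this quantity with the observed C-index separates the effect of case-mix heterogeneity from the effect of invalid regression coefficients.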
All analyses were performed in R (3.5.3, R Foundation for Statistical Computing, Vienna, Austria, 2019) using the rms package for model validation 38 and the mice package for imputation of missing values. 39 The calibration plot was created using the val.prob.ci.2 function. 40 The study was conducted and reported according to the criteria of the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement. 41

Model selection
Based on the literature search criteria (Table S1), 417 abstracts were screened. Based on the full-text screen, 43 articles described predictors of 3-12-month PPCS (n = 29), GOSE (n = 11), or both (n = 3), and 5 articles presented prognostic models for prediction of GOSE (n = 9) and PPCS (n = 3) (Tables 1, S2, and S3). The most frequent predictors in prognostic models were age, GCS, extracranial injuries, and alcohol intoxication. We validated five models predicting GOSE and three models predicting PPCS (Tables 2 and 3). An additional three models were deemed unsuitable for validation because >70% of predictor variables were not available (CT and Combined Nijmegen model 7 ) or because the model equation was not available (emergency department [ED] UPFRONT model 8 ).

Eligible models predicting GOSE
Models for predicting 6-month GOSE were the Basic and CT models from the Corticosteroid Randomisation After Significant Head Injury (CRASH) trial; 56 clinical models for mild TBI and isolated TBI from the Nijmegen Radboud University Brain Injury Cohort Study (RUBICS); 57 and the ED+ model from the UPFRONT study 8 (Table 2). All models predicted dichotomized GOSE, but with differently defined end-points: severe disability or death (GOSE <5), disability or death (GOSE <7), or complete/upper good recovery (GOSE = 8), respectively (Table 2). They contained different predictors, but all models included a measure of injury severity (GCS, Injury Severity Score [ISS], or major extracranial injury), and most models also included age and alcohol intoxication. In addition to admission characteristics, the UPFRONT model included 2-week symptoms (Table 2), which were assessed with different instruments than in the CENTER-TBI study (and therefore rescaled in validation). The predictors neck pain at the ER and coping styles from the UPFRONT model were not assessed in the CENTER-TBI study. The CRASH models were developed in an adult population with GCS 3-14, which partly includes mild TBI. In our study, they were validated in adults with GCS 13-15 and GCS 13-14. Other models were developed only in the population with GCS 13-15. The UPFRONT model was developed in patients with loss of consciousness (LOC) <30 min and post-traumatic amnesia (PTA) <24 h, and no major psychiatric disorders (Table 2). These inclusion criteria were not used in our validation, but were applicable to the majority of the validation population and therefore were not expected to impact the results.
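The three end-points can be made concrete with a small Python sketch that dichotomizes a GOSE score (1-8) according to each definition. The end-point labels and the function name are hypothetical and were chosen for this example only.

```python
# Hypothetical labels for the three dichotomizations described in the text.
GOSE_ENDPOINTS = {
    "severe_disability_or_death": lambda gose: gose < 5,   # GOSE <5
    "disability_or_death": lambda gose: gose < 7,          # GOSE <7
    "complete_recovery": lambda gose: gose == 8,           # GOSE = 8
}


def dichotomize_gose(gose, endpoint):
    """Return True if the GOSE score meets the named end-point definition."""
    if gose not in range(1, 9):
        raise ValueError("GOSE must be an integer between 1 and 8")
    if endpoint not in GOSE_ENDPOINTS:
        raise ValueError(f"unknown end-point: {endpoint}")
    return GOSE_ENDPOINTS[endpoint](gose)
```

A patient with lower good recovery (GOSE = 7), for example, counts as a favorable outcome under the first two definitions but does not count as a complete recovery under the third, which illustrates why performance measures are not directly comparable across models.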

Eligible models predicting PPCS
Models predicting 6-month PPCS were developed in the Transforming Research and Clinical Knowledge in TBI (TRACK-TBI) pilot study, UPFRONT, and Nijmegen studies (Table 3). The endpoint was differently defined or measured by different instruments (Table 3). In the TRACK-TBI pilot study, 46 PPCS were assessed with the RPQ and dichotomized according to International Classification of Diseases (ICD) criteria for PCS; that is, a score ≥2 on at least three of the following symptoms: headache, dizziness, fatigue, irritability, sleep disturbances, poor concentration, forgetfulness, poor memory, frustration, or depression. In the UPFRONT study, dichotomization of PPCS was done in a similar way, but measured using the Head Injury Severity Checklist (HISC). The Nijmegen study defined high PPCS as a score ≥2 on 13 out of all 16 RPQ items. The TRACK-TBI Pilot model only included admission characteristics as predictors, whereas the other models also included symptoms measured approximately 2 weeks post-injury (Table 3). These symptoms were assessed by different instruments (Table S6). In addition, there were some differences between development and validation studies in the definition and measurement of pre-injury mental and physical health, headache, and nausea (Table S6). The TRACK-TBI Pilot study excluded patients with major psychiatric, neurological, or life-threatening diseases; UPFRONT included patients who sustained LOC or PTA, and without substance addiction; and Nijmegen included patients aged 18-60 and with LOC <30 min (Table 3). The validation population was not restricted based on age, psychiatric disorder, LOC, or PTA. Substance addiction and LOC >30 min were reported for only a small number of patients in the validation population. As sensitivity analyses, the validation population for the UPFRONT model was restricted to sustained LOC and/or PTA, and the validation population for the Nijmegen model was restricted to age group 18-60 (Table S9).
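The two RPQ-based end-point definitions can be sketched in Python as follows. This is an illustration only: the symptom keys are hypothetical labels for the items named in the text, items missing from the input are treated as not endorsed, and the Nijmegen rule is read here as "at least 13 items", which is an assumption of this sketch.

```python
# Hypothetical keys for the ten symptoms listed for the ICD-based definition.
ICD_SYMPTOMS = (
    "headache", "dizziness", "fatigue", "irritability", "sleep_disturbances",
    "poor_concentration", "forgetfulness", "poor_memory", "frustration",
    "depression",
)


def count_endorsed(rpq_scores, items=None, min_score=2):
    """Count items rated at or above `min_score` (RPQ ratings run 0-4).

    `rpq_scores` maps item names to ratings; missing items count as 0,
    which is an assumption made for this example.
    """
    if items is None:
        items = list(rpq_scores)
    return sum(1 for item in items if rpq_scores.get(item, 0) >= min_score)


def ppcs_icd(rpq_scores, min_score=2, min_symptoms=3):
    """ICD-based rule: at least three listed symptoms rated >= min_score."""
    return count_endorsed(rpq_scores, ICD_SYMPTOMS, min_score) >= min_symptoms


def ppcs_high_all_items(rpq_scores, min_score=2, min_items=13):
    """Rule over all RPQ items: at least `min_items` rated >= min_score."""
    return count_endorsed(rpq_scores, None, min_score) >= min_items
```

Raising `min_score` from 2 to 3 switches the ICD-based rule from mild-or-worse to moderate-or-worse symptoms, so the same patient can be classified differently under the two scoring criteria.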
Subsamples without available 6-month outcomes did not differ from the overall cohort in the majority of baseline characteristics, but patients with completed 6-month RPQ were somewhat more educated, and had more CT abnormalities at baseline, a higher proportion of PTA and LOC, and a slightly lower percentage of psychiatric disorders (Table S7).
The sample with both 2-3 week symptoms and 6-month outcomes available differed from the total cohort: patients were mostly discharged after ER and had a median age of 51 years, and there was a higher proportion of females, more patients with GCS 15, and a smaller proportion of patients with major extracranial injuries and with CT abnormalities (Table S7). More than 70% of persons achieved good recovery (GOSE ≥7), with 49% of persons completely returning to their pre-injury level of functioning (GOSE = 8). Nevertheless, 11% experienced severe disability or had died (GOSE <5) at 6 months, 43% had mild to severe PPCS (ICD classification for PCS), and 22% had moderate to severe PPCS (ICD classification for PCS). Distributions of some predictors and outcomes differed in the CENTER-TBI compared with the development studies, particularly for models from the CRASH trial (Table S6).

Model performance in CENTER-TBI study
Models predicting GOSE. The CRASH models showed poor calibration and good discrimination for the outcome GOSE <5, which was observed in only 11% of patients. The percentage of death/unfavorable outcome was overestimated (Basic model: 20% vs. 11%, calibration intercept = -0.82; Table 4), particularly for the CT model (27% vs. 11%; calibration intercept = -1.38; Table 4). In a population with GCS 13-14; that is, the patient selection that was used in the development study, calibration-in-the-large was better (calibration intercept = -0.26 for Basic, -1.13 for CT, Table 4). The calibration slope was close to 1, indicating similar effects of predictors compared with the CRASH trial. Models showed good discriminative ability, especially the CT model (C-index = 0.79; Table 4). The discriminative ability of the CRASH models was somewhat reduced by the more homogeneous patient population of CENTER-TBI compared with the development population, as expressed by the expected C-index if the model was correct (mbc = 0.79-0.80 vs. C-index of 0.81-0.83 in the development data, Table 4).
The Nijmegen clinical models showed relatively good calibration, with slight underestimation of proportions of unfavorable outcome (GOSE <7; 26% vs. 28%; 22% vs. 26%; Table 4). The slopes suggested smaller effects of predictors (slope = 0.82-0.83; Table 4) and slightly worse discriminative ability in the CENTER-TBI than in the Nijmegen study (C-index = 0.66-0.69). The mbc indicated a somewhat more heterogeneous patient case-mix in the CENTER-TBI study than in the Nijmegen study (mbc = 0.72-0.70 vs. C-index 0.71-0.69; Table 4), which increased the ability to correctly discriminate between patients with GOSE <7 and those with GOSE ≥7.
For the ED+ model, calibration-in-the-large was not assessed because several predictors were not registered in CENTER-TBI. Discrimination was assessed, but it was expected to be lower because of the absence of several predictors in the CENTER-TBI data. The ability to discriminate patients with complete recovery (GOSE = 8) was lower than in the development study (C-index = 0.70; Table 4). Analyses of C-indices and slope suggested smaller effects of predictors in CENTER-TBI and substantial overfitting (slope = 0.5; Table 4). If the regression coefficients were valid for the CENTER-TBI sample, the model would have a good discriminative ability (mbc = 0.80; Table 4), even slightly better than in the development study (C = 0.77, Table 4) because of the more heterogeneous case-mix in the CENTER-TBI study.
Models predicting PPCS. The TRACK-TBI Pilot model correctly estimated the proportion of patients with PPCS, defined as having three or more mild to severe symptoms at 6 months (42% vs. 42%; Table 5), but showed overfitting (slope <0.5; Table 5) and poor discriminative ability (C-index = 0.58; Table 5). The mbc was substantially higher than the observed C-index (mbc = 0.74 vs. 0.58, Table 5) and was equivalent to the C in the development study (C = 0.74, Table 5). This pattern suggested that predictor effects (regression coefficients) differed between studies, whereas case-mix heterogeneity was comparable.
Models for prediction of PPCS that included 2-week post-injury symptoms were validated in a smaller sample of CENTER-TBI patients, for whom that information was available, and they performed well (C-index 0.75-0.76). For the UPFRONT model, calibration-in-the-large was not assessed because of an unmeasured predictor (neck pain) in the CENTER-TBI study. The discriminative ability (C-index = 0.75; Table 5) and the effects of predictors were equivalent to UPFRONT (slope = 1.0; Table 5). The Nijmegen model was well calibrated, but it slightly overestimated the proportion of persons with high PPCS at 6 months (19% vs. 15%; Table 5). The calibration slope was close to 1, indicating similar effects of predictors. Discrimination was even slightly higher than in the development study (C-index = 0.76; Table 5) because of the somewhat more heterogeneous patient case-mix of the CENTER-TBI study compared with Nijmegen.
Calibration plots are shown in Figures S1-S7. The performance of models was consistent in analyses after imputation of missing values, except for models containing 2-week post-injury symptoms, which showed lower performance (Table S8). The sensitivity analyses with the additional inclusion criteria used in the development study showed somewhat better performance of the UPFRONT and Nijmegen models for PPCS (Table S9).

Discussion
This study identified predictors and prognostic models for 3-12 month GOSE and PPCS in persons with mild TBI, and examined the performance of five models for predicting 6-month GOSE outcome and three models for predicting 6-month PPCS in an independent data set of mild TBI patients from the CENTER-TBI study. Overall, the definitions of unfavorable outcome differed among studies, and the ability of the models to distinguish between favorable and unfavorable outcome varied substantially (C-index 0.58-0.79). The CRASH models predicting severe disability or death discriminated best, but they were poorly calibrated to the population of mild TBI patients. For prediction of PPCS, the models that included 2-week post-injury symptoms showed the best discriminative ability and were well calibrated. In models with reasonable discriminative ability, the most frequent predictors were age, GCS, and extracranial injuries for GOSE, and pre-injury health and post-injury symptoms for PPCS.
The CRASH models discriminated well but they largely overestimated the percentage of persons with poor outcome, and used an end-point (GOSE <5) that might not be appropriate for the mild TBI population. They were developed for mostly moderately to severely injured patients, whereas the validation population consisted of mildly injured patients. In the previous external validation in persons with mild TBI, 6 CRASH models showed good discriminative ability and miscalibration in the population with GCS 13-14, consistent with this study, but discriminated poorly in the total mild TBI population. The Nijmegen model (2008) for GOSE showed somewhat lower discriminative ability and some overfitting in our study, and low performance in the previous external validation, 6 which could be partly because of the high number of candidate predictors and lack of internal validation in the model development.
The performance of the UPFRONT model could not completely be assessed in the CENTER-TBI data.
The model for PPCS based on admission characteristics (TRACK-TBI, 2017) showed poor performance, consistent with the previous external validation. 20 A relatively small sample size for the development of the prognostic model, particularly the effective sample size for a binary outcome, might have led to unstable regression coefficients, and, consequently, differences in performance between development and validation studies. 46 In addition, true differences in populations might also have contributed to the differences in the effects of predictors among studies. The performance of models containing post-injury symptoms (UPFRONT, 2017; Nijmegen, 2008) was in line with their performance in the development studies. Nevertheless, the CENTER-TBI sample in which these models have been validated (both 2-3 week post-injury symptoms and 6-month PCS scores available) had lower injury severity, younger age, lower percentage of CT abnormalities, and higher GOSE than the overall mild TBI population in the CENTER-TBI study. Therefore, the performance of the models may have been different in the total population with mild TBI in CENTER-TBI.
Although post-injury symptoms substantially improve prediction of outcomes, they are measured several days or weeks post-injury, which does not routinely happen across hospital centers and for all persons with mild TBI. The majority of centers only follow persons who were admitted to hospital, and frequently schedule appointments a month or later following injury, 61 when symptoms are already persisting. The clinical applicability of a model containing predictors measured after discharge is therefore debatable for some hospital settings, and when the intention is to make predictions at the time of presentation/admission. Symptoms measured weeks after injury may be particularly helpful for making decisions about rehabilitation and specialized care. A model based on measures of medical history, injury characteristics, and early symptoms, which are easily obtainable and have shown associations with outcomes following TBI in previous studies, may be more universally useful for the early prediction of outcome. For example, protein biomarkers are currently considered to have potential for diagnosis and prediction in the context of TBI. 1,62,63 However, their prognostic value for longer-term outcomes following mild TBI is yet to be established.
In addition to difficulties in the selection of appropriate predictors, problematic practices and lack of agreement in assessment and definition of outcomes hinder development of prognostic models for both GOSE and PPCS. The models for functional outcome used different cutoffs of GOSE to define the end-point, which could partly explain the variability in performance among them. It may be more difficult to discriminate between persons with mild TBI who have incomplete and those who have complete return to pre-injury functioning (e.g., GOSE <8) than between persons with and without disability (GOSE <7 or even GOSE <5), 6 and different predictors may be relevant for predicting upper good overall recovery (GOSE = 8) versus disability/death (GOSE <5). In addition, using GOSE as an ordinal outcome seems to have added value over dichotomization. 64 It is of note that the overall utility of using GOSE as an outcome measure in persons with mild TBI has been disputed, because the measure may not be sensitive enough to capture different health disturbances despite good overall functioning. Usage of a broad battery of different measures in CENTER-TBI and TRACK-TBI studies, which cover health-related generic and disease-specific quality of life, return to work and daily activities, and cognitive and psychological functioning, provides new opportunities for prognostic modeling of outcome following mild TBI. 1,65,66 Moreover, composite measures based on several instruments, and encompassing different symptoms together with global functioning, have been proposed as an alternative to GOSE. 55 Nevertheless, our study confirms that a significant percentage of persons with mild TBI do not return to baseline global functioning 6 months post-injury. 6,8
Similarly, there is no agreement regarding the clinical criteria or operational definition of PCS or PPCS. 17 The Common Data Elements (CDEs) initiative, which aims to standardize data collection in TBI, recommends the RPQ for assessing post-concussive symptoms, but does not provide further guidance. 67 For example, PCS can be mapped to ICD-10 based on several RPQ items, 17,20,46,68 thereby using different scoring criteria (mild or worse and moderate or worse symptoms); composed from all RPQ items, 58 or based on a cutoff of the total RPQ score. 69 According to the classification methods and criteria in use, associations with predictors and other outcome measures (such as GOSE) vary substantially. 17 In our study, models for PPCS used different definitions of outcome and/or different instruments for measuring post-concussive symptoms. Therefore, a sensible and uniform definition of the PCS or PPCS end-point is a prerequisite of a good model.
A limitation of this study is that some of the predictors from validated models were not assessed in the CENTER-TBI study (e.g., early neck pain and coping styles), which prevented assessment of calibration intercepts and could have influenced other performance indices. Moreover, some predictors and outcomes were measured by different tools and instruments (e.g., medical history, psychological symptoms). The differences emphasize the importance of incorporating newly discovered predictors into the CDEs and using uniform instruments in TBI research. Additionally, the prognostic models we validated were selected based on our search strategy and eligibility criteria, and do not necessarily represent all existing prognostic models for mild TBI.
Further, a substantial percentage of CENTER-TBI patients did not have an assessment of 2-3 week post-injury symptoms and 6-month outcomes; therefore, the models that included 2-3 week symptoms were validated in a smaller and more favorable subsample. The response rate at 6 months was, however, in line with that in other observational studies in the field, and comparable with the response rate in the development studies. Patients with and without 6-month RPQ differed in some baseline characteristics, but without a clear pattern that would suggest a substantial systematic influence on the validation results. Further, the CENTER-TBI core study included mainly neurotrauma referral centers, recruitment was not consecutive, and patients without an indication for CT were not considered eligible. Therefore, the participants might not be representative of mild TBI patients in other hospital and non-hospital settings. The self-report instruments were administered in several languages and in several European countries, but the linguistic and cultural comparability was good (unpublished data). A major strength of this study is the use of a large sample of contemporary patients from different countries and numerous medical centers. In addition, all important indices relevant for external validation studies are reported. 70

Conclusion
We assessed the performance of several prognostic models for GOSE and PPCS. None of the models predicting GOSE have both good discriminative ability and good calibration in persons with mild TBI. Models for PPCS based on admission characteristics perform poorly, whereas models that included post-injury symptoms perform better in terms of discrimination and calibration. TBI-related and psychological symptoms collected at 2 weeks improve prediction and should be collected when possible. Novel predictors obtainable at admission, such as biomarkers, could be incorporated in future model developments. Future studies should improve prediction following mild TBI by developing models that (1) distinguish well between persons who will have longer-term negative outcomes and those who will not, (2) are calibrated to the population of mild TBI, (3) use relevant cutoffs and end-points for persons with mild TBI, such as return to normal life without TBI-related symptoms, and (4) use predictors available at admission or before discharge, which are feasible to collect in clinical practice for early detection of persons with longer-term consequences. These models could be extended with symptoms collected at 2-3 weeks for later stage outcome prediction.
The CENTER-TBI study (EC grant 602150) has been conducted in accordance with all relevant laws of the European Union if directly applicable or of direct effect and all relevant laws of the country where the recruiting sites were located, including but not limited to, the relevant privacy and data protection laws and regulations (the "Privacy Law"), the relevant laws and regulations on the use of human materials, and all relevant guidance relating to clinical studies from time to time in force including, but not limited to, the ICH Harmonised Tripartite Guideline for Good Clinical Practice (CPMP/ICH/135/95) ("ICH GCP") and the World Medical Association Declaration of Helsinki entitled "Ethical Principles for Medical Research Involving Human Subjects". Informed consent by the patients and/or the legal representative/next of kin was obtained, according to the local legislations, for all patients recruited in the core data set of CENTER-TBI and documented in the electronic case report form (e-CRF). Ethical approval was obtained for each recruiting site. The list of sites, ethical committees, approval numbers, and approval dates can be found on the Web site https://www.centertbi.eu/project/ethical-approval

Author Disclosure Statement
No competing financial interests exist.

Supplementary Material
Supplementary Figure
Supplementary Table S1
Supplementary Table S2
Supplementary Table S3
Supplementary Table S4
Supplementary Table S5
Supplementary Table S6
Supplementary Table S7
Supplementary Table S8
Supplementary Table S9