Longitudinal Internal Validity of the Quality of Life after Brain Injury: Response Shift and Responsiveness

The Quality of Life after Brain Injury (QoLIBRI) questionnaire was developed and validated to assess disease-specific health-related quality of life (HRQoL) in individuals after TBI. The present study aims to determine its longitudinal validity by assessing its responsiveness and response shift from 3 to 6 months post-injury. Analyses were based on data from the European longitudinal observational cohort Collaborative European NeuroTrauma Effectiveness Research in Traumatic Brain Injury study. A total of 1659 individuals recovering from TBI were included in the analyses. Response shift was assessed using longitudinal measurement invariance testing within the confirmatory factor analyses framework. Responsiveness was analyzed using linear regression models that compared changes in functional recovery as measured by the Glasgow Outcome Scale–Extended (GOSE) with changes in the QoLIBRI scales from 3 to 6 months post-injury. Longitudinal tests of measurement invariance and analyses of discrepancies in practical significance indicated the absence of response shift. Changes in functional recovery status from three to six months were significantly associated with the responsiveness of the QoLIBRI scales over the same time period. The QoLIBRI can be used in longitudinal studies and is responsive to changes in an individual’s functional recovery during the first 6 months after TBI.


Introduction
Health-related quality of life (HRQoL) refers to the assessment of an individual's perceived mental, physical, and social well-being and the overall ability to perform daily life activities [1]. Ideally, HRQoL instruments provide standardized information about the status and extent of limitations in a patient's subjective experience of a medical condition or its treatment [2], capturing valuable insight into self-perceived health status and recovery patterns [3]. Measures of HRQoL can be generic or disease-specific. While generic measures allow for comparisons between different conditions and the general population, disease-specific instruments are more sensitive to selected health conditions, more precise in assessing their symptoms [4], and allow for better prognosis, therapy, and rehabilitation recommendations than generic HRQoL measures. Due to this gain in nuanced information, disease-specific HRQoL is a recommended outcome that should be assessed in complex, heterogeneous diseases, such as traumatic brain injury (TBI) [3,5].

of an instrument varies across the literature, with the significance of change being potentially susceptible to the evaluator's subjective judgment [44,45].
The present study aims to evaluate the longitudinal internal validity of the QoLIBRI. The assessment is conducted in two steps: 1. Evaluation of response shift from three to six months after TBI using the longitudinal measurement invariance testing approach; 2. Assessment of responsiveness as the ability of the QoLIBRI to detect changes in the patient's functional recovery status, as measured by the Glasgow Outcome Scale-Extended (GOSE) [46], 3 to 6 months after injury.
Evidence of the absence of response shifts and the presence of responsiveness would suggest that the QoLIBRI can be used for longitudinal assessment of disease-specific HRQoL after TBI.

Materials and Methods
The present study utilizes data (core data set 3.0) from the prospective, longitudinal, observational Collaborative European NeuroTrauma Effectiveness Research in Traumatic Brain Injury study (CENTER-TBI; EC grant 602150; clinicaltrials.gov NCT02210221), which aimed to improve the characterization and classification of TBI. Inclusion criteria were a clinical diagnosis of TBI, clinical indication for a CT scan, presentation within 24 h after injury, and informed consent, which was obtained according to local and national requirements. To avoid confounding outcome assessments, individuals with severe preexisting neurological disorders were excluded from the study [47]. Overall, N = 4509 (99.9% civilians) eligible patients included in the core study were stratified into 3 strata based on the clinical care pathways: emergency room (ER; discharge after ER admission), ward (admission to a hospital ward), and intensive care unit (ICU; admission to the ICU). Details on core sample characteristics are described elsewhere [48].
The present study focused on individuals after TBI who were at least 16 years old and had a GOSE status of 3 or higher at 3 and 6 months after injury. Responsiveness analyses were conducted with individuals for whom the QoLIBRI total score and 6 scale scores could be calculated at 3 and 6 months after injury (Sample 1: N = 1659). Confirmatory factor analyses and measurement invariance analyses were conducted only with complete QoLIBRI item data at 3 and 6 months post-injury (Sample 2: N = 1390). For more details on study sample attrition, see Figure 1.

Sociodemographic Data
Sociodemographic data used in the present study were collected at enrollment and included the age, sex, marital status, educational level, and employment status of the study participants.

Injury-Related Data
The severity of TBI was assessed using the Glasgow Coma Scale (GCS) [49], with values of 13-15 indicating mild, 9-12 moderate, and 3-8 severe TBI. The GCS is often assessed several times within the first 24 h after injury (at the scene of the accident; in the first hospital, if the patient did not arrive directly at the study hospital; at the ER of the study hospital; and post-stabilization). The GCS score was centrally imputed using IMPACT methodology [50]: The first post-stabilization value was taken and, if absent, the next available value back in time towards values at the accident scene. The GCS score was combined with the presence of intracranial abnormalities (ICA) detected by CT. Individuals were grouped into 4 categories according to the following cut-offs: uncomplicated mild (GCS ≥ 13 without ICA), complicated mild (GCS ≥ 13 with ICA), moderate (9 ≤ GCS ≤ 12), and severe TBI (GCS ≤ 8).
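The grouping rule above can be written as a small function. This is only an illustrative sketch of the stated cut-offs; the function name and argument names are hypothetical and do not come from the study's analysis code.

```python
def classify_tbi_severity(gcs: int, has_ica: bool) -> str:
    """Map a GCS score (3-15) and the CT finding to one of the four groups.

    `has_ica` is True when intracranial abnormalities were seen on CT.
    The ICA finding only matters for GCS >= 13 (mild TBI).
    """
    if not 3 <= gcs <= 15:
        raise ValueError("GCS must lie between 3 and 15")
    if gcs >= 13:
        return "complicated mild" if has_ica else "uncomplicated mild"
    if gcs >= 9:
        return "moderate"
    return "severe"


print(classify_tbi_severity(14, has_ica=True))
```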
Injury severity was measured with the Injury Severity Score (ISS) [51], which is based on the three most severely injured of twelve body regions. The ISS is calculated as the sum of the squared scores of the three body regions with the highest scores. The maximum ISS is 75. If any region is assigned a score of 6, the ISS is automatically set to 75. Other TBI-related factors included the cause of injury (fall, road traffic accident, violent/other), clinical care pathways (emergency room (ER), admission, intensive care unit (ICU)), and the length of the hospital stay (in days).
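As a worked illustration of the ISS rule described above, the following sketch takes one severity score per body region and returns the ISS. The function name and the assumption that region scores range from 0 to 6 are ours; the study's own computation may differ in detail.

```python
def injury_severity_score(region_scores):
    """Compute the ISS from per-region severity scores (assumed range 0-6).

    The three highest region scores are squared and summed; a score of 6
    in any region sets the ISS to its maximum of 75.
    """
    scores = list(region_scores)
    if any(not 0 <= s <= 6 for s in scores):
        raise ValueError("region scores must lie between 0 and 6")
    if 6 in scores:
        return 75
    top_three = sorted(scores, reverse=True)[:3]
    return sum(s * s for s in top_three)


# Example: regions scored 4, 3, 2, and 1 give 16 + 9 + 4.
print(injury_severity_score([3, 4, 2, 1]))  # 29
```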
Functional recovery status after TBI was rated using the GOSE [46] at three and six months after injury. The GOSE scores range from 1 to 8, covering the following functional recovery status: 1 = death, 2 = vegetative state, 3-4 = severe disability, 5-6 = moderate disability, and 7-8 = good recovery. The GOSE score was computed as a composite score combining information from the interview or, if not available, from the postal questionnaire (GOSE-Q [52] completed either by individuals after TBI or caretakers) or based on interviewer ratings for survivors. Since the GOSE-Q cannot distinguish between a vegetative state (score of 2) and a lower severe disability (score of 3), both categories were collapsed into one. Only patients with values of three and above participated in our study. Based on changes in the recovery status between three and six months post-injury, individuals after TBI were divided into three groups: those whose recovery status had improved were assigned to the "improved" group, those with unchanged recovery status to the "stable" group, and the remainder to the "worsened" group.
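The three change groups can be derived from the two GOSE ratings as sketched below. We assume here that change is defined on the GOSE score itself (rather than on the collapsed disability categories); the function name is hypothetical.

```python
def recovery_group(gose_3m: int, gose_6m: int) -> str:
    """Assign a change group from GOSE scores at 3 and 6 months.

    Only scores of 3 to 8 are accepted, mirroring the study's inclusion
    of patients with GOSE values of three and above.
    """
    for score in (gose_3m, gose_6m):
        if not 3 <= score <= 8:
            raise ValueError("GOSE must lie between 3 and 8 in this sample")
    if gose_6m > gose_3m:
        return "improved"
    if gose_6m < gose_3m:
        return "worsened"
    return "stable"


print(recovery_group(5, 7))  # improved
```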
The QoLIBRI [26,27] was used to assess TBI-specific HRQoL. The instrument comprises 37 Likert-type scaled items with five response options ("Not at all", "Slightly", "Moderately", "Quite", and "Very") forming six domains (Cognitive, Self, Autonomy & Daily Life, Social Relationships, Emotions, and Physical Problems). The total score can be calculated with more than one-third of the responses and scaled to vary from 0 (worst possible HRQoL) to 100 (best possible HRQoL).

Sample Characteristics
We tested if patients who responded to the QoLIBRI at 3 and 6 months post-injury assessment differed from other TBI patients within six months after injury concerning age, sex, and injury-related characteristics such as clinical care pathways, TBI, and injury severity. The Welch two-sample t-test was applied for metric variables and Pearson's chi-squared tests or permutation-based chi-squared tests (n < 5 observations per cell; N = 5000 permutations) for categorical ones. The significance level was set to α = 0.05.

Confirmatory Factor Analysis
An optimal instrument structure at the 3- and 6-month assessments was tested via confirmatory factor analysis (CFA) with the robust weighted least squares estimator (WLSMV) using the lavaan package [53] in R [54]. Model fit was assessed with the scaled chi-square statistic, comparative fit index (CFI), root mean square error of approximation (RMSEA) with a 90% confidence interval, and standardized root mean square residual (SRMR). The standard cut-offs indicating good model fit for the CFI (>0.95), RMSEA (<0.06 for an excellent and 0.05 < RMSEA < 0.10 for a mediocre fit), and SRMR (<0.08) [55–57] have not yet been validated for the WLSMV estimator. Therefore, we tested four competing factorial structures (1. one factor; 2. two correlated factors, one underlying the positively and one the negatively formulated items; 3. six correlated factors corresponding to the QoLIBRI scales; 4. six correlated factors and one common higher-order factor) and compared all fit indices across models. For the scaled chi-square difference test, the α-level was set to 0.05.
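To make the RMSEA index more concrete, the sketch below applies the textbook point-estimate formula, sqrt(max(χ² − df, 0) / (df · (N − 1))), to the chi-square values reported in the Results for the six-factor model (df = 614, N = 1390 in Sample 2). Whether the software used in the study applies exactly this formula to the scaled statistic is an assumption on our part; the sketch only shows that the reported values are consistent with it.

```python
import math


def rmsea_point_estimate(chi_square: float, df: int, n: int) -> float:
    """Textbook RMSEA point estimate from a chi-square statistic."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Six-factor model, values as reported in the Results section.
print(round(rmsea_point_estimate(4121.54, 614, 1390), 3))  # 0.064 (3 months)
print(round(rmsea_point_estimate(3775.89, 614, 1390), 3))  # 0.061 (6 months)
```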

Response Shift
Response shift of each QoLIBRI scale was determined by testing for measurement invariance across the two time points within the framework of CFA for ordinal variables [58]. The content interpretation was performed based on Oort (2005) [59].
First, we tested a configural model with the same number of latent factors and the same pattern of zero and non-zero loadings across two time points. Failure of the model would indicate that participants reconceptualized their understanding of the HRQoL construct by the follow-up, attributing items to other factors.
Next, a loading model, constraining loadings of the configural model to be equal across time points, was investigated. The scaled chi-square test (α = 0.05) was used for model comparison. A worsened model fit would mean a response shift due to reprioritization: by the follow-up, the importance of some items has changed for the construct estimation.
Then, in the threshold model, we constrained the loading model's thresholds to be equal across time points. A worsened model fit would mean a response shift due to recalibration (i.e., a change in the interpretation of response options): even if a participant reports the same level of HRQoL, he or she would score differently at the second time point. Invariance on the threshold level would be sufficient to conclude that the observed differences in the score means are due to the differences in the latent factor means [60].
Finally, in the residual model, we additionally constrained residual variances to be equal across time points. In the case of invariance, one can conclude that the responses showed no response shift, and all observed differences in the score means, variances, and covariances came from the corresponding differences in the latent factors.
In all models, latent factors were allowed to covary freely across time points. Residuals were allowed to correlate freely with themselves but not with other residuals across time points. The model specification was based on the marker item approach, in which marker variables are selected based on the smallest difference in loadings over time [58]. In cases where the loading, threshold, or residual model is non-invariant, Liu et al. (2017) [58] suggest testing whether violations of invariance have practical significance. The estimated parameters of each invariance model can be used to calculate the probabilities of choosing a particular response category under the corresponding model. If, for example, the loading model holds but the threshold model does not, the practical significance of the invariance violation can be assessed through the differences in estimated response probabilities under the loading and threshold models for each item and all time points. Differences not exceeding 5% were considered negligible.
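The practical-significance check can be sketched as follows. For an ordinal item, we assume a normally distributed latent response with a model-implied mean and standard deviation, cut into five categories by four thresholds; the category probabilities under two competing models are then compared against the 5% criterion. All parameter values below are hypothetical and chosen only for illustration; they are not estimates from the study.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def category_probabilities(thresholds, mean, sd):
    """P(Y = k) for an ordinal item whose latent response is N(mean, sd^2).

    `thresholds` must be in increasing order; four thresholds yield
    probabilities for five response categories.
    """
    cuts = [normal_cdf((t - mean) / sd) for t in thresholds]
    bounds = [0.0] + cuts + [1.0]
    return [upper - lower for lower, upper in zip(bounds, bounds[1:])]


def max_discrepancy(probs_a, probs_b):
    """Largest absolute difference in category probabilities."""
    return max(abs(a - b) for a, b in zip(probs_a, probs_b))


# Hypothetical thresholds for one item at one time point.
thresholds_loading_model = [-1.60, -0.90, -0.10, 0.80]    # freely estimated
thresholds_threshold_model = [-1.55, -0.90, -0.05, 0.80]  # constrained equal

p_loading = category_probabilities(thresholds_loading_model, mean=0.3, sd=1.0)
p_threshold = category_probabilities(thresholds_threshold_model, mean=0.3, sd=1.0)

gap = max_discrepancy(p_loading, p_threshold)
print("negligible" if gap <= 0.05 else "practically relevant")
```

In an actual analysis, this comparison would be repeated for every item and both time points, with the thresholds, means, and variances taken from the fitted invariance models.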

Responsiveness
Responsiveness of an instrument reflects the extent to which changes in the measure relate to corresponding changes in an external reference measure over a defined time course. Regression analyses have proven useful in assessing responsiveness, as the regression coefficient (b) provides an easily interpretable index, and a goodness-of-fit assessment can be employed to check for the plausibility of the model [45]. Therefore, to evaluate the responsiveness of the QoLIBRI, linear regressions were calculated to find associations between the change in the QoLIBRI total and scale scores from three to six months post-injury and the change in the GOSE score from three to six months, using the R package 'lme4' [61]. The GOSE change was included as an ordinal variable using two orthogonal contrasts to detect linear and non-linear (i.e., quadratic) associations with the QoLIBRI scores. Age, sex, marital status, education level, employment status, TBI severity, ISS, cause of injury, clinical care pathways, and length of hospital stay were used as covariates. Multiple imputation with the R package 'mice' [62] was used to handle missing covariate data. The level of significance was set to α = 0.05 for the total score and was Bonferroni-corrected for the scale scores (α = 0.05/6 = 0.008).
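The two orthogonal contrasts and the Bonferroni-corrected significance level can be illustrated as follows. The sketch assumes that GOSE change enters the model as three ordered groups (worsened < stable < improved), which is our reading of the two-contrast coding; the coefficients shown are the standard orthonormal polynomial contrasts for three levels.

```python
import math

# Orthonormal polynomial contrasts for three ordered levels
# (worsened, stable, improved): one linear and one quadratic trend.
linear = [-1 / math.sqrt(2), 0.0, 1 / math.sqrt(2)]
quadratic = [1 / math.sqrt(6), -2 / math.sqrt(6), 1 / math.sqrt(6)]


def dot(u, v):
    """Inner product of two equally long coefficient vectors."""
    return sum(a * b for a, b in zip(u, v))


# Orthogonality: the linear and quadratic trends carry separate information.
print(abs(dot(linear, quadratic)) < 1e-12)  # True

# Bonferroni correction for the six scale-level tests.
alpha_total = 0.05
alpha_scales = alpha_total / 6
print(round(alpha_scales, 3))  # 0.008
```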
The reliability of change scores was evaluated by receiver operating characteristic (ROC) curves with the R package 'pROC' [63], which provide information on the sensitivity and specificity of the QoLIBRI change scores in discriminating between stable and improved patients. The area under the curve (AUC) was calculated; its values can vary between 0.5 and 1.0, with 1.0 indicating perfect discrimination and 0.5 discrimination not better than by chance [64].
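How a cut-off on the change score trades sensitivity against specificity can be sketched with a few lines of code. The example picks the cut-off that maximizes the sum of sensitivity and specificity (the Youden criterion), which is our assumption about how an optimal cut-off might be chosen; the data are made up for illustration.

```python
def sensitivity_specificity(scores, improved, cutoff):
    """Sensitivity and specificity when `score >= cutoff` predicts improvement."""
    tp = sum(1 for s, y in zip(scores, improved) if y and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, improved) if y and s < cutoff)
    tn = sum(1 for s, y in zip(scores, improved) if not y and s < cutoff)
    fp = sum(1 for s, y in zip(scores, improved) if not y and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, improved):
    """Cut-off among observed scores with maximal sensitivity + specificity."""
    candidates = sorted(set(scores))
    return max(candidates,
               key=lambda c: sum(sensitivity_specificity(scores, improved, c)))


# Hypothetical change scores and improvement status for nine patients.
change_scores = [-4.0, -1.0, 0.5, 1.0, 2.0, 3.0, 5.5, 7.0, 9.0]
is_improved = [False, False, True, False, False, True, True, False, True]

cutoff = best_cutoff(change_scores, is_improved)
print(cutoff, sensitivity_specificity(change_scores, is_improved, cutoff))
```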
Finally, we compared QoLIBRI total and scale scores between three and six months post-TBI using the Wilcoxon signed rank test for dependent samples to provide an overview of changes in the reported scores. The effect size r was calculated using the 'wilcox_effsize' function from the R package 'rstatix' [65]. Interpretation of values was based on conventional cut-offs: 0.10 ≤ r < 0.30 (small effect), 0.30 ≤ r < 0.50 (medium effect), and r ≥ 0.50 (large effect) [66]. Visualization was performed using strip plots showing trajectories of changes in HRQoL from three to six months.
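The effect size r for a Wilcoxon test is commonly obtained as the standardized test statistic divided by the square root of the sample size, r = |Z| / sqrt(N). The sketch below shows this computation together with the cut-offs listed above; the function names and the label for values below 0.10 are our own.

```python
import math


def wilcoxon_effect_size(z: float, n: int) -> float:
    """Effect size r = |Z| / sqrt(N), with N the number of paired observations."""
    return abs(z) / math.sqrt(n)


def interpret_r(r: float) -> str:
    """Label an effect size using the conventional cut-offs from the text."""
    if r >= 0.50:
        return "large"
    if r >= 0.30:
        return "medium"
    if r >= 0.10:
        return "small"
    return "below the small-effect cut-off"


# Hypothetical example: Z = -4.0 with 400 pairs.
r = wilcoxon_effect_size(-4.0, 400)
print(r, interpret_r(r))  # 0.2 small
```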

Study Participants
A total of 1659 patients (64.9% male) with a mean age of 49.61 years (SD = 19.15) were included in the present study (Sample 1, see Figure 1). The majority (approx. 77%) were admitted to either a hospital ward or an ICU after injury, while the remaining 23% were discharged after visiting the ER. Based on the GCS and information on intracranial abnormalities on the CT scans, 32.9% of the sample were diagnosed with uncomplicated mild TBI (i.e., GCS ≥ 13 and no abnormalities), 30.2% with complicated mild TBI (i.e., GCS ≥ 13 and visible intracranial abnormalities), 6.8% with moderate, and 14.2% with severe TBI. Six months after TBI, 13.6% of participants showed a worsened functional state as rated by the GOSE, while 55.3% showed a stable and 31.1% an improved functional status. The patients' mean ISS score was 18.04 (SD = 14.85). Participants included in the analysis sample differed significantly from those not included in all characteristics except for age. For more details, see Table 1. For the sample characteristics of the reduced sample (Sample 2, see Figure 1) as well as for the comparison between included and excluded individuals, see Appendix A, Table A1.
Note. n = absolute frequencies, p = p-value, M = mean, SD = standard deviation. Welch's t-test was used for all continuous variables due to non-normal distribution; χ²-tests and permutation-based χ²-tests (n < 5 observations per cell; N = 5000 permutations) were used for categorical data.

Confirmatory Factor Analysis
Table 2 provides the CFA results for 3 and 6 months post-injury, respectively. At both time points, model fit indices improved markedly with the 6-factor and second-order models, with CFIs above 0.949 and RMSEA values under 0.069. The scaled chi-square test identified that the 6-factor structure fit the QoLIBRI data best at 3 months (χ²(614) = 4121.54, p < 0.001, CFI = 0.954, RMSEA = 0.064, 90% CI [0.062; 0.066]) and at 6 months post-injury (χ²(614) = 3775.89, p < 0.001, CFI = 0.963, RMSEA = 0.061, 90% CI [0.059; 0.063]). Therefore, this model was used for further analyses.

Response Shift
Table 3 presents the results from the longitudinal measurement invariance test for the QoLIBRI scales from three to six months after TBI. For each scale, we estimated four invariance models. In all cases, the CFI was high (≥0.971), and the SRMR did not exceed the cut-off of 0.06. Both indices showed minimal variation across invariance models for each scale, suggesting an adequate fit. However, the RMSEA of the scales Self (configural and loading model) and Social Relationships (configural model) was slightly increased (>0.10), indicating a worse model fit.
The scaled chi-square difference test indicated for all scales that adding loading invariance constraints did not significantly worsen the model fit when compared to the configural baseline model. For three of the six scales (Daily Life & Autonomy, Social Relationships, and Physical Problems), the scaled chi-square difference test indicated that the threshold model fit the data significantly worse than the loading invariance model. As the threshold invariance model did not hold, an analysis of practical significance was conducted for all three scales [58]. Predicted probabilities for each of the scales showed only minimal differences between invariance models cross-sectionally and longitudinally. Analysis of the practical significance of the invariance violation showed that the discrepancies in the estimated probabilities of choosing a particular response option under competing models (e.g., loading and threshold invariance models for the Social Relationships scale) did not exceed 2% in absolute terms (see Supplemental Material S2, Practical Significance-Discrepancies between Invariance Models). According to Liu and colleagues [58], small discrepancies in the predicted probabilities (<0.05) can be neglected as they only represent relatively few individuals.
Therefore, we can assume that violations of threshold invariance for the Daily Life & Autonomy, Social Relationships, and Physical Problems scales were not caused by a response shift from three to six months post-injury, and threshold invariance can be assumed.
For the remaining scales (Cognition, Self, and Emotions), the threshold invariance model held up compared to the loading invariance model. Consequently, the residual (unique factor) invariance model was tested. Scaled chi-square difference analysis indicated non-invariant residuals for all three scales. Again, an analysis of the practical significance of the failure of the residual invariance model was conducted. Predicted probabilities for each of the scales showed only minimal differences from three to six months (see Supplemental Material S2, Practical Significance-Discrepancies between Invariance Models). Consequently, their contribution to the latent construct HRQoL assessed by the QoLIBRI remained unchanged between the two measurement occasions, and observed differences in the scores over time can be attributed to true changes in the HRQoL.
The most pronounced changes between three and six months were observed on the Daily Life & Autonomy scale, where the participants became more likely to choose a higher response category. For instance, the predicted probability of choosing the answer "very" to the question "How satisfied are you with your ability to carry out domestic activities?" increased from 0.41 to 0.47 in the loading invariance model. For the question "How satisfied are you with the extent of your independence from others?", the predicted probability of the response category "very" changed from 0.36 to 0.42 from 3 to 6 months in the loading invariance model. Furthermore, the item "How satisfied are you with your ability to get out and about?" showed an increase in the endorsement of the higher response category from 0.46 to 0.52. For details, see Supplemental Material S2, Practical significance (Daily Life & Autonomy). These items represent the largest deviations in predicted probabilities for all scales over time, indicating that the scale may be particularly sensitive to change.

Responsiveness
According to the change in GOSE score between 3 and 6 months after TBI, recovery improved in 31.1% (n = 516) of our sample, 55.3% (n = 918) were classified as stable, and 13.6% (n = 225) had worsened recovery status. Change in the GOSE score, considered as a linear effect, contributed significantly to the change in the QoLIBRI total score, B = 4.129, t(1637) = 6.073, p < 0.001. At the scale level, a significant effect of GOSE change was observed for all scales except the Emotions scale. The proportion of variance in the change of the QoLIBRI score explained by the change in the GOSE scores was 4% for the total score and varied from 1% (Emotions scale) to 6% (Daily Life & Autonomy scale). Results showed no significant influence of the change in the GOSE score, considered as a quadratic effect. For details, see Table 4 and Appendix A, Table A2.
Note. B = unstandardized regression coefficient, SE = standard error, b = standardized regression coefficient, t = t-value, p = p-value. Values in bold are significant at 5% (total score) or at 0.8% (scale scores; Bonferroni-adjusted, marked with *).
The average scores for the change in the QoLIBRI in relation to the change in the GOSE scores are shown in Figure 2. The most pronounced increase in the QoLIBRI score in relation to the GOSE score was observed in the Daily Life & Autonomy scale, followed by the Physical scale for those with improved functional recovery. For those with a worsened recovery status, the greatest decrease in the QoLIBRI scores was found in the Cognition and Self scales.
Additional ROC analysis showed that based on the QoLIBRI change scores, 60.4% of the individuals after TBI were correctly classified as presenting an improved vs. a not improved (i.e., stable or worsened) recovery status. The cut-off for the QoLIBRI change score, which maximizes sensitivity (55.4%) and specificity (63.9%) by distinguishing between improved and non-improved patients, was 2.5. For details, see Figure 3. The correctness of patients' classification for scales ranged from 55.1% (Social Relationships) to 62.7% (Daily Life & Autonomy). The results of ROC analyses on the scale level are shown in Appendix A, Figure A1.
We observed a significant difference in the QoLIBRI total score from 3 to 6 months after TBI (V = 518408, p < 0.001, r = 0.12 corresponding to a small effect). The effect was mainly driven by the difference in the Daily Life & Autonomy and Physical Problems scales, both showing significant improvement from 3 to 6 months (p < 0.001). For visualization, see Figure 4 (QoLIBRI total score) and Figure A2 (for the scale scores).

Discussion
The present study aimed to assess the longitudinal internal validity of the QoLIBRI, which measures TBI-specific HRQoL. This is the first study to assess response shift and responsiveness of the QoLIBRI for individuals between 3 and 6 months post-TBI. This time frame is critical for patients, as recovery mostly occurs in the first six months. For example, Gardner et al. [67] found that the majority of individuals after TBI achieve good to moderate recovery within the first half year after injury, with 70.9% following a gradual trajectory between the time points. Other studies also suggest that at least moderate recovery is reached within six months across all TBI severity groups [68,69].
However, most research has focused on functional recovery, thus neglecting domains that are additionally relevant to better understanding patients' needs and facilitating the recuperation process [35]. In this context, especially the administration of PROMs (e.g., HRQoL measurement) can be considered as a comprehensive, economical, and reliable source of information complementary to the GOSE. Assessing both potential response shifts and responsiveness is critical to learning about actual changes over time when using PROMs longitudinally. In the present study, the absence of response shift and the demonstrated responsiveness of the QoLIBRI to the GOSE-assessed recovery status suggest that the instrument is useful for follow-up assessments during at least the first six months after injury. Some further aspects will be discussed in the following paragraphs.

Response Shift
For three QoLIBRI scales (i.e., Daily Life & Autonomy, Social Relationships, and Physical Problems), the longitudinal loading invariance model was attained, indicating that changes over time in the expected means measured by these QoLIBRI scales are entirely attributable to changes in the common latent factors over time [58]. Thus, the latent construct "HRQoL" estimated by the QoLIBRI remained unchanged between two measurement occasions, and observed differences in the scores over time can be attributed to true changes in the HRQoL. The other three scales (i.e., Cognition, Self, and Emotions) reached threshold invariance, pointing out that the observed differences in the score means are due to the differences in the latent factor means [60].
To better understand the discrepancies in response behavior from three to six months, a sensitivity analysis of the practical significance of the failure of invariance was conducted for all scales. The predicted probabilities revealed minor changes in response categories from three to six months post-TBI, with the largest variations on the Daily Life & Autonomy scale, where participants tended to endorse higher response categories indicating better HRQoL at six months post-TBI compared to three months. This may be explained by the progressive recovery process increasing their satisfaction with the autonomy they have gained during recovery. In our study, items measuring satisfaction with the level of ability to perform domestic activities, the level of independence from others, and the ability to get out and about showed increased endorsement six months after TBI. These findings are supported by previously published research suggesting considerable improvement in daily living activities during the first year post-injury [70].
In addition, discrepancies in the predicted probabilities between the retained and rejected measurement invariance models were calculated for each scale. However, none of the discrepancies between invariance models exceeded the 5% threshold, indicating that these differences can be neglected and the assumption of longitudinal measurement invariance can be retained.
The absence of response shift may also be explained by our study sample characteristics, as most of our patients had experienced a mild TBI. The intensive recovery processes for mild TBI occur within the first weeks/months after injury [71]. In our sample, almost 60% of the individuals after TBI presented a good recovery at 3 months post-injury, which is comparable to other studies [67,68]. Moreover, patients after moderate and severe TBI who participated in the study might have felt generally better already when entering the study than those who did not take part. Therefore, adaptation and coping processes, which generally cause response shifts [43], might not have been pronounced in the individuals in our study sample.

Responsiveness
Linear regression modeling indicated that the linear change in the GOSE from three to six months was significantly associated with the change in HRQoL. This is in line with a recent study showing that the QoLIBRI was one of the most sensitive instruments for recovery status at three different time points (3, 6, and 12 months after TBI) across different patient groups [35]. Furthermore, some other studies have shown a significant association between unfavorable recovery and reduced (TBI-specific) HRQoL [29,48,72–74].
The only exception among the QoLIBRI scales was the change in the Emotions scale, which showed no significant association with the change in recovery status after Bonferroni adjustment for multiple testing. This may be explained by the fact that the GOSE focuses rather on functional (dis)ability, independence, social and leisure activities, and return to normal life after TBI and neglects the emotional status. As emotional well-being is crucial for the healing and recovery process and improvement of HRQoL in individuals after TBI [75], its assessment and, if necessary, treatment are mandatory in individuals after TBI. However, recent rehabilitation studies indicate a lack of services and treatment for post-TBI mental health conditions at all levels of severity [76,77].
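The change-score regression and the Bonferroni adjustment discussed in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code of the study: the GOSE and QoLIBRI values, the p-values, and the helper names `fit_change_regression` and `bonferroni_flags` are all hypothetical, and the toy data are far cleaner than real data would be.

```python
def fit_change_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (intercept, slope, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # proportion of variance explained
    return intercept, slope, r_squared


def bonferroni_flags(p_values, alpha=0.05):
    """Flag p-values that remain significant at alpha / number of tests."""
    adjusted_alpha = alpha / len(p_values)
    return [p < adjusted_alpha for p in p_values]


# Hypothetical scores for six participants (illustrative values only).
gose_3m = [5, 6, 7, 4, 8, 6]
gose_6m = [6, 6, 8, 5, 8, 7]
qolibri_3m = [60.0, 72.0, 80.0, 55.0, 90.0, 70.0]
qolibri_6m = [66.0, 71.0, 86.0, 60.0, 91.0, 77.0]

# Change scores from three to six months.
gose_change = [after - before for before, after in zip(gose_3m, gose_6m)]
qolibri_change = [after - before for before, after in zip(qolibri_3m, qolibri_6m)]

intercept, slope, r_squared = fit_change_regression(gose_change, qolibri_change)
print(intercept, slope, round(r_squared, 3))

# Hypothetical p-values for three scales, checked against alpha / 3.
print(bonferroni_flags([0.001, 0.004, 0.03]))
```

The slope expresses the expected change in the QoLIBRI score per one-category change in the GOSE, and `r_squared` corresponds to the proportion of variance in the QoLIBRI change explained by the GOSE change.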
Overall, the QoLIBRI and its scales appear to be sensitive to positive and negative changes in the participants' HRQoL. An additional ROC analysis indicated that an increase of 2.5 points or more in the QoLIBRI score may indicate significantly improved functional status, and a corresponding decrease may indicate deterioration. This cutoff correctly identified a significant change in recovery status in 60% of participants.
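How such a cutoff translates into classification performance can be shown with a short sketch. The change scores and improvement labels below are hypothetical, the definition of "improved" as a reference label derived from functional status is an assumption made for illustration, and `evaluate_cutoff` is not a function from the study.

```python
def evaluate_cutoff(change_scores, improved, cutoff=2.5):
    """Evaluate 'change score >= cutoff' as a marker of functional improvement.

    Returns a dict with sensitivity, specificity, and overall accuracy.
    """
    if len(change_scores) != len(improved):
        raise ValueError("scores and labels must have the same length")
    tp = fp = tn = fn = 0
    for score, truth in zip(change_scores, improved):
        predicted = score >= cutoff
        if predicted and truth:
            tp += 1
        elif predicted and not truth:
            fp += 1
        elif not predicted and truth:
            fn += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("labels must contain both improved and not improved cases")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(improved),
    }


# Hypothetical QoLIBRI change scores and reference labels (illustrative only).
changes = [6, -1, 3, 1, 0, 7, 2, 4, -3, 5]
improved = [True, False, True, True, False, True, False, False, False, True]

print(evaluate_cutoff(changes, improved))
```

Repeating this evaluation over a grid of candidate cutoffs yields the points of an ROC curve, from which a cutoff can be selected.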
Finally, since the QoLIBRI can be considered responsive to changes in recovery from TBI and since we can assume that it measures true changes in TBI-specific HRQoL longitudinally (i.e., in a time frame of six months after injury), we can conclude that HRQoL improves significantly between three and six months, especially in terms of autonomy in daily living and physical problems.

Strengths and Limitations
To our knowledge, this is the first study systematically analyzing response shifts and responsiveness of the QoLIBRI and its scales. The main advantage is the large sample size, which also reflects the epidemiological distribution of the TBI severity in the general TBI population, allowing us to draw reliable conclusions. Some limitations should nevertheless be mentioned. Although advantageous for comparing with the general TBI population, the uneven distribution of TBI severity may pose some problems. The relatively small number of individuals after moderate and severe TBI does not allow for additional investigation of response shifts and responsiveness in these groups. Considering that higher TBI severity may be associated with a more pronounced decrease in TBI-specific HRQoL [78], additional analyses within the severe TBI group would be beneficial to gain more insight into potential changes in their HRQoL over time and the ability of the QoLIBRI to capture them. Furthermore, as participants included in the sample differed significantly from those not included concerning all characteristics except for age, the results should be interpreted with caution. Finally, the proportion of variance explained (4%) in the change in QoLIBRI score by the change in GOSE suggests that other factors not considered in the present study also contribute to changes in HRQoL. Therefore, the results should not be overinterpreted.
Future research using instruments other than the QoLIBRI to assess TBI-specific HRQoL (e.g., TBI-QOL [79]) and data from other studies for external validation should be conducted to provide further evidence of the longitudinal validity of the QoLIBRI. Furthermore, the inclusion of additional time points after TBI would be beneficial to gain more insight into the variability of changes in TBI-specific HRQoL. Due to the design of the CENTER-TBI study, individuals after TBI who were admitted to an ER and subsequently discharged were only included in follow-up analyses up to six months after injury. Analyses beyond this time frame would therefore have excluded this substantial group (i.e., 23%) and introduced sample bias due to the overrepresentation of severe or complex cases [80]. Thus, we decided to limit our analyses to the first six months after TBI. In addition, analyses of responsiveness to other relevant clinical comorbid conditions (e.g., depression, anxiety, post-traumatic stress disorder, post-concussion symptoms) may provide a more complete picture of how changes in TBI-specific HRQoL are related to outcomes other than post-TBI functional recovery. Initial studies that consider these outcome domains simultaneously point to their relevant impact on TBI-specific HRQoL [35,74]. Additional evidence of recovery using objective approaches such as CT and/or MRI, which were not available in the present study sample, would be beneficial for a more accurate, externally validated assessment of recovery.

Conclusions
Given the long-term impact of TBI on an individual's life, the heterogeneous pathways to symptom resolution, and their potential negative impact on HRQoL, it is crucial to monitor outcomes such as HRQoL over time and not to rely solely on changes in functional status (i.e., the GOSE). To evaluate developmental trends in a straightforward manner, the longitudinal validity of the instruments, including the absence of response shift and adequate responsiveness, has to be established. Our results indicate that the QoLIBRI can detect a true change in the underlying HRQoL construct and is sensitive to changes in functional recovery status. The QoLIBRI, therefore, can be considered a valuable instrument to gain nuanced insight into the longitudinal development of recovery patterns and self-perceived health status in patients affected by TBI.

Institutional Review Board Statement:
The CENTER-TBI study (EC grant 602150) has been conducted in accordance with all relevant laws of the European Union (EU) if directly applicable or of direct effect and all relevant laws of the country where the recruiting sites were located. Informed consent by the patients and/or the legal representative/next of kin was obtained, in accordance with local legislation, for all patients recruited in the Core Dataset of CENTER-TBI and documented in the electronic case report form (e-CRF). For the full list of sites, ethical committees, and ethical approval details, see the official CENTER-TBI website (https://www.center-tbi.eu/project/ethical-approval, accessed on 4 November 2021).

Informed Consent Statement:
Written informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
All relevant data are available upon request from CENTER-TBI, and the authors are not legally allowed to share them publicly. The authors confirm that they received no special access privileges to the data. CENTER-TBI is committed to data sharing and, in particular, to responsible further use of the data. To this end, a data sharing statement is in place: https://www.center-tbi.eu/data/sharing (accessed on 27 April 2023). The CENTER-TBI Management Committee, in collaboration with the General Assembly, established the Data Sharing policy and Publication and Authorship Guidelines to ensure the correct and appropriate use of the data, as the dataset is hugely complex and requires the help of experts from the Data Curation Team or Bio-Statistical Team for correct use. We therefore encourage researchers to contact the CENTER-TBI team about any research plans and the Data Curation Team for help with the appropriate use of the data, including sharing of scripts. Requests for data access can be submitted online: https://www.center-tbi.eu/data (accessed on 27 April 2023). The complete Manual for data access is also available online: https://www.center-tbi.eu/files/SOP-Manual-DAPR-2402020.pdf (accessed on 27 April 2023).

Acknowledgments:
The authors would like to thank all study participants and investigators of the CENTER-TBI study. The authors would also like to thank Anastasia Gorbunova for the very helpful preparatory work she did for this publication.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.