Psychometric Properties of the German Version of the Quality of Life after Brain Injury Scale for Kids and Adolescents (QOLIBRI-KID/ADO) Using Item Response Theory Framework: Results from the Pilot Study

Health-related quality of life (HRQOL) is an important indicator for recovery after pediatric TBI. To date, there are a few questionnaires available for assessing generic HRQOL in children and adolescents, but there are not yet any TBI-specific measures of HRQOL that are applicable to pediatric populations. The aim of the present study was to examine psychometric characteristics of the newly developed Quality of Life After Brain Injury Scale for Kids and Adolescents (QOLIBRI-KID/ADO) questionnaire capturing TBI-specific HRQOL in children and adolescents using an item response theory (IRT) framework. Children (8–12 years; n = 152) and adolescents (13–17 years; n = 148) participated in the study. The final version of the QOLIBRI-KID/ADO, comprising 35 items forming 6 scales, was investigated using the partial credit model (PCM). A scale-wise examination for unidimensionality, monotonicity, item infit and outfit, person homogeneity, and local independency was conducted. The questionnaire widely fulfilled the predefined assumptions, with a few restrictions. The newly developed QOLIBRI-KID/ADO instrument shows at least satisfactory psychometric properties according to the results of both classical test theoretical and IRT analyses. Further evidence of its applicability should be explored in the ongoing validation study by performing multidimensional IRT analyses.


Introduction
Pediatric traumatic brain injury (TBI) is a leading cause of death and disability among children and adolescents worldwide [1]. It represents a substantial burden for those affected and their next of kin. Short-or long-time consequences cover a broad range of functional [2], cognitive [3], developmental [2], and behavioral [4] problems and impair health-related quality of life (HRQOL) [5]. In particular, moderate to severe TBI in children and adolescents can hamper family functioning longitudinally [6].
HRQOL is an important indicator for recovery after adult and pediatric TBI [7]. One can distinguish between generic and disease-specific HRQOL instruments. Generic instruments typically cover a broader area of health [7], whereas specific instruments focus on a

Ethical Approval
The QOLIBRI-Kid/Ado study was conducted in accordance with all relevant laws of Germany including but not limited to the ICH Harmonized Tripartite Guideline for Good Clinical Practice ("ICH GCP") and the World Medical Association Declaration of Helsinki ("Ethical Principles for Medical Research Involving Human Subjects"). The study attained

Ethical Approval
The QOLIBRI-Kid/Ado study was conducted in accordance with all relevant laws of Germany including but not limited to the ICH Harmonized Tripartite Guideline for Good Clinical Practice ("ICH GCP") and the World Medical Association Declaration of Helsinki ("Ethical Principles for Medical Research Involving Human Subjects"). The study attained ethical clearance at each recruitment center and informed consent for all participants according to the German law for data protection (General Data Protection Regulation, DSGVO). The Ethics Committee of the University Medical Center Göttingen approved the study (application no.: 19/4/18).

Sociodemographic and Injury-Related Data
Participants' characteristics were gathered at the time of study enrollment. Sociodemographic data comprised information on gender and age. Injury-related information included TBI severity (mild, moderate, severe), number of lesions (no lesions vs. at least one), and time since injury in years. With the Kings Outcome Scale for Childhood Head Injury (KOSCHI) [23], functional recovery after TBI was assessed using the following categories: 1 = 'dead', 2 = 'vegetative state', 3a = 'lower severe disability', 3b = 'upper severe disability', 4a = 'lower moderate disability', 4b = 'upper moderate disability', 5a = 'good recovery', or 5b = 'intact recovery'.

QOLIBRI-Kid/Ado
The newly developed QOLIBRI-KIDO/ADO questionnaire [19] comprises 35 items associated with 6 scales assessing 'Cognition' (7 items), Self' (5 items), 'Autonomy and Daily Life' (7 items), 'Social' (6 items), 'Emotions' (4 items), and 'Physical' (6 items) domains. Items are answered on a five-point Likert scale format ('not at all', 'slightly', 'moderately', 'quite', 'very'). Whereas the first four scales assess the satisfaction in the respective domains (i.e., 'How satisfied are you with . . . '), the last two scales focus on feelings of being bothered (i.e., 'How bothered are you with . . . ') in the respective arears. The calculation of the scale and the total scores comprises a transformation to a 0-100 scale for better interpretation, with higher values indicating better HRQOL. For more details on the scales and items of the questionnaire, see Appendix A, Table A1-QOLIBRI-KID/ADO: Scales and Items.

Statistical Analyses
First, descriptive statistics for sociodemographic, injury-related, and scale characteristics for both age groups (i.e., children and adolescents) and the total sample were obtained. Then, response patterns based on age groups (i.e., children and adolescents), TBI severity groups (i.e., mild, moderate, and severe TBI), and study setting (i.e., offline vs. online) were investigated. The distribution of the response behavior indicated whether all five response categories were equally chosen by the participants of our study sample.
Considering the ordered ordinal nature of item responses, we applied a unidimensional partial credit model (PCM) [24] to investigate psychometric properties of the QOLIBRI-KID/ADO on the scale level. The PCM is an extension of a dichotomous Rasch model comprising estimation of thresholds, which are also known as item step difficulty parameters [25]. The item parameter (β) measuring item difficulties was estimated using conditional maximum likelihood [25]. Person parameter (θ) measuring latent trait (i.e., the extent of TBI-related HRQOL) estimation occurred using joint maximum likelihood [25].
Category characteristic curves (CCC) and person-item maps were used to visualize probabilities of response categories' endorsements in relation to person parameters and, subsequently, to detect potential threshold-ordering problems. Person-item maps provided summarized information of the item parameters on the scale level. Model assumptions comprising unidimensionality (also referred to as item homogeneity), person homogeneity, local independence, and monotonicity were investigated and evaluated according to the criteria listed below.
Unidimensionality assumption postulates that all items of one scale measure the same latent trait as reflected by the person parameter (θ). Unidimensionality was assessed using Martin-Loef's likelihood ratio test (LRT) [26] applying internal split criterion (i.e., the median). Non-significant test values indicate no violation of unidimensionality assumption.
Monotonicity assumption states that, with a higher level of the latent trait (θ), the probability of choosing a higher response category increases monotonically [27,28]. Contrary to previous assumptions [27], the monotonicity assumption also applies to polytomous response formats [28]. Therefore, this was investigated by relating the probability of answering to the latent trait. For this purpose, the so-called restscore was used. The restscore of an item i implies the differences between the total score and the score of the item i, which is used as an approximation for the person score on the latent trait [29]. The monotonicity assumption was retained if the value did not exceed 0.03 [29].
The items infit and outfit statistics [30] were calculated to assess the consistency of the response behavior. The infit statistic is sensitive with respect to low variation among participants concerning item endorsement. In contrast, the outfit statistic is sensitive with respect to outlying observations; for example, when participants with a high latent trait endorse frequently low response categories. Evaluation of the results applied the following cut-off values: the expected value was 1 (no deviation between observed and estimated patterns of the response matrix), though values < 0.5 were acceptable considering possible low separation ability of the items; values ranging from 0.5 to 1.5 were satisfactory; values from 1.5 to 2.0 were acceptable; and values > 2 indicated a problem [31].
Person homogeneity was assessed using the Andersen's LRT [32] with three different external split criteria (i.e., age: children vs. adolescents; TBI severity: mild vs. moderate/severe; study setting: offline vs. online). This approach allows for the identification of items displaying differential item functioning (DIF) [25]. Non-significant test values indicated no difference between the examined groups, allowing for simulations evolution of the HRQOL across these groups.
Local independence was measured using the Q 3 statistic, which represents the correlation of the residuals among items, i.e., the variance explained by the person parameters (θ) after outpartialization. Under local independence, we expected the value of Q 3 to be close to zero. In the present study, the more robust statistic, i.e., adjusted aQ 3 (Q 3 ij − Q 3 mean ), was reported. A value < 0.20 indicated no violations [33]. In line with other research studies on PROMs, we additionally considered values ranging from 0.20 to 0.30 as acceptable [34,35].

Sample Characteristics
The study sample contained 152 children (62% males) and 148 adolescents (57% males). Most participants sustained a mild TBI (72%) with no lesions visible on a CT scan (69%). The majority experienced TBI between 4 and 10 years ago, with 76% having been fully functionally recovered when entering the study. For details, see Table 1.

Response Behavior
Participants were more likely to endorse response categories reflecting higher levels of satisfaction or lower levels of being bothered. Therefore, ratings ranging from 'moderately' to 'not all' in four scales (Cognition, Self, Autonomy & Daily Life, and Social: <5%) and ratings 'moderately' to 'very' in the scales Emotions and Physical (5-20% dependent on item) were less frequently chosen. Consequently, some items lacked endorsements in the category 'not at all' across both age groups (Cognition: 'Orientation', Self: 'Accomplishment') and in some age and TBI severity groups. For more details, see Appendix B, Tables A2-A7.

Threshold Disorder
Overall, 15 out of 35 items exhibited no threshold disorders, indicating that the probability of choosing a higher response category increased with an increasing latent trait. For other items, the order of the thresholds was violated mostly at one point and in some cases at two. The Cognition scale was the most affected (six out of seven items), whereas the Self and the Emotion scales each contained only one item that did not meet the requirements. For details on the number of disordered thresholds per item, see Table 2. For threshold parameters, see Appendix C, Table A8. Person-item maps provided in Figure 2 visualize threshold parameters in relation to the latent trait for QOLIBRI-KID/ADO scales. The distribution of the person parameters (upper part of respective scale figure) indicates a slight rightward trend toward reporting higher HRQOL in the scales measuring satisfaction. The estimated person parameters of both bothered scales were largely normally distributed. For CCCs of the items, please see Online Supplement S1-Category Characteristic Curves. Person-item maps provided in Figure 2 visualize threshold parameters in relation to the latent trait for QOLIBRI-KID/ADO scales. The distribution of the person parameters (upper part of respective scale figure) indicates a slight rightward trend toward reporting higher HRQOL in the scales measuring satisfaction. The estimated person parameters of both bothered scales were largely normally distributed. For CCCs of the items, please see Online Supplement S1-Category Characteristic Curves.  Person-item maps for the QOLIBRI-KID/ADO scales. Upper parts of the respective scales' graphs depict person parameter (θ) distribution. Lower parts present item thresholds (unfilled circles) and item location parameters or average item parameters (filled circles). Blue-colored items additionally marked with an asterisk (*) indicate thresholds disorder. The order of the items corresponds to their position on the latent trait dimension (from lowest to highest).

Unidimensionality
The unidimensionality assumption could be retained for four scales (Cognition, Self, Autonomy & Daily Life, and Social), showing non-significant results (p > 0.05) according to the Martin-Loef's LRT using an internal split criterion. However, the Emotion and Physical scales revealed a significant violation of the unidimensionality assumption (p = 0.008 and p = 0.027, respectively). For more details, see Table 2.

Monotonicity
In general, all items of the QOLIBRI-KID/ADO fulfilled monotonicity requirements. Two exceptions were observed concerning the item 'Anger' (0.08) (Emotion scale) and the item 'Seeing/Hearing' (0.05) (Physical scale), which displayed problems in two and one response categories, respectively. For more details, see Table 2.

Item Fit Statistics
Most of the items (32 out of 35) showed no irregularities regarding both item infit and outfit statistics. Exceptions were the items 'Appearance' (infit: 0.765, p = 0.040; outfit: 0.786, p = 0.048) and 'Energy' (infit: 1.302, p = 0.006; outfit: 1.304, p = 0.002) from the Self scale as well as the infit value of the item 'Family relationship' (infit: 1.285, p = 0.004) from the Social scale. However, none of the values exceeded the cut-off of 1.5. For more details, see Table 2.

Person Homogeneity and DIF
According to the non-significant results of the Andersen's LRT (p > 0.05), all scales showed no evidence of violation of the person homogeneity assumption, indicating that there is no DIF between children and adolescents. However, due to missing responses in some response categories either in both or in one of the investigated age groups, complete scale analyses were only possible for Emotions and Physical scales. Testing for person homogeneity in Self, Autonomy & Daily Life, and Social scales, was carried out only for those items with exhausted responses.
An investigation of TBI severity groups indicated no presence of DIF in all scales except for the Physical scale, which revealed significant differences (p < 0.001). In addition, analyses for the Social scale could not be performed because at least one response was missing for each item either in the group after mild TBI or in the group after moderate/severe TBI.
The assumption could be maintained for the study setting investigations, indicating no difference in response behavior between children interviewed offline and those participating in the study via video calls. All scales except for the Social scale showed non-significant LRT Andersen's results (p < 0.05). The latter was affected by missing responses in one or both groups in five out of six items, so DIF could not be tested. For more details, see Table 2.

Local Independence
The Cognition and Emotion scales fully met the requirements of local independence, with adjusted Q3 scores not reaching the 0.20 value. The other scales exceeded the value of 0.20, but not the cut-off value of 0.30. Residual correlations of the following items were responsible for the slightly increased values: 'Appearance' and 'Self-Esteem' (−0.23) of the Self scale; 'Getting out and About' and 'Social Activities' (−0.22) of the Autonomy and Daily Life scale; 'Attitudes of Others' and 'Demands from Others' (0.25) of the Social scale; as well as 'TBI effects' and 'Pain' (0.21), 'TBI effects' and 'Clumsiness' (0.21), 'Clumsiness' and 'Seeing/Hearing' (0.27), and 'TBI effects' and 'Other Injuries' (0.29) of the Physical scale. For more details, see Table 2.

Discussion
The present study examined psychometric properties of the newly developed QOLIBRI-KID/ADO instrument assessing disease-specific HRQOL in children and adolescents after TBI, applying IRT methods. In general, the results suggest that the instrument widely fulfils the assumptions of the polytomous Rasch model. The results show that the QOLIBRI-KID/ADO is equally applicable to children and adolescents. Therefore, aggregate analyses of both age groups as well as longitudinal HRQOL measurements from childhood to adolescence are possible. However, different TBI severity groups in pediatric samples should be treated separately since children and adolescents after mild vs. moderate/severe TBI showed a different probability of endorsing the items measuring physical TBI-related HRQOL. Some further results should be discussed in more detail.
Response categories representing higher HRQOL were more often selected by the participants. This is in line with previous findings in children [43][44][45] and adults [46,47] from TBI and general population samples showing that the distribution of the generic and TBI-specific (HR)QOL scores tends to be skewed in favor of a higher quality of life rating. A closer look at the response patterns within the TBI groups reveals the tendency of the more severely injured participants reporting lower HRQOL. This is again consistent with previously reported research indicating that lower HRQOL associates with greater TBI severity [48]. The distribution of the TBI severity groups in our sample was unequal, with the vast majority having suffered a mild TBI. Additionally, most participants experienced a TBI two to ten years prior to study enrolment. These two facts could partly explain the relatively high level of TBI-specific HRQOL, resulting in the underutilization of response categories that capture impairment.
Disruption of the thresholds refers to the fact that the probability of selecting a higher response category is not always associated with an increase in the latent trait (i.e., TBIspecific HRQOL). A closer look at the order of the thresholds reveals that, in most cases, one threshold per item displayed irregularities. Thereby, the probability of endorsing the second-lowest category was higher than choosing the lowest one. This pattern can again be attributable to the non-exhausted selection of responses and high prevalence of the mild TBI cases in the study sample. However, the analyses of monotonicity indicate that the probability of endorsing a higher response category monotonically increased with a higher level of the latent trait across all items on the scale level, with only very few exceptions in the lowest and highest category of the item 'Anger' and the second highest category of the item 'Seeing/Hearing'.
Item fit statistics of three items showed almost no violations. Despite significant pvalues, they can still be considered as satisfactory according to the proposed cut-off values. The slightly increased infit and outfit values can also be caused by a relatively low number of observations in some response categories [31].
Combining the results discussed above with the findings on the utilization of the response categories, we conclude that the QOLIBRI-KID/ADO scale scores are widely able to capture TBI-specific HRQOL, with higher values indicating a higher expression of the latent trait. However, further investigation of the psychometric properties of the QOLIBRI-KID/ADO is warranted in more severely injured children and adolescents and in more acute stages of the TBI. In addition, further studies should address the topic of the number of responses in greater detail. Comparing different possible response formats (e.g., three response categories vs. five) in pediatric TBI samples should strengthen the applicability of the response scales of the instrument.
The positively worded scales assessing satisfaction fulfilled the assumption of unidimensionality, whereas the negatively worded items of the Emotional and Physical scales measuring being bothered with symptoms, impairments, and TBI-related problems did not. These results can probably be explained by different subdomains covered by the items of the scales. In addition to the questions directly related to TBI (e.g., 'TBI effects'), the Physical scale contains items measuring headaches, other pain, and other injuries. Because impairments in these areas may have a cause other than TBI, this could be reflected in the violation of unidimensionality. For example, chronic headaches and migraines in youths [49], other pains [50] or other injuries (e.g., burns [51] or fractures [52,53]) have been shown to negatively affect generic HRQOL, which may also be relevant for TBI-specific HRQOL. The same explanation can be applied to the Emotion scale, which, in addition to emotional aspects that may be related to affective disturbances, asks about perceived disturbances due to anger. Furthermore, the scale length (i.e., four items) might have influenced the results since the median split, which is dividing the items in two equal groups for further analyses, was used as an internal criterion for unidimensionality testing.
Although no differences in item parameters were found between the two age groups, items that were excluded due to missing responses in some categories could not be considered. The results of analyses using logistic ordinal regression of differential item functioning (LORDIF) reported by Steinbuechel et al. [19] suggested no differences in responses between children and adolescents when using all items of the questionnaire. Therefore, an aggregation of children's and adolescents' responses seems appropriate. Differences in responses between participants with mild and moderate-to-severe TBI regarding the Physical scale may be explained by differences in HRQOL scores according to injury severity [48]. Because the assumption of person homogeneity was only partially maintained in the present study, the authors suggest that the QOLIBRI-KID/ADO needs further validation in future studies for both age and TBI severity groups using the IRT framework. The absence of DIF in relation to the study setting indicates that, at least in our sample, the responses of children and adolescents after TBI were not affected by these (mainly forced by the COVID-19 pandemic) circumstances. This finding may be useful in encouraging researchers to conduct studies online when there is no opportunity for on-site testing. However, careful consideration should be given to whether participants are able to participate online and whether the methods (e.g., measurement tools, questionnaires, and surveys, etc.) are appropriate for an online study setting.
Finally, most items of the QOLIBRI-KID/ADO scales were locally independent. However, responses of some items seem to not be fully independent from each other. In particular, items measuring the feeling of being bothered by consequences of TBI may be associated with other problem areas (i.e., suffering from pains, being clumsy, and having problems caused by other injuries). Given that different types of pain can occur after a TBI and impact the HRQOL [54], this finding is not surprising. Therefore, special attention should be paid to these items in the final validation study. Furthermore, analyses of subgroups (e.g., based on TBI severity) or latent classes may facilitate the identification of characteristics to explain these dependencies.

Strengths and Limitations
The present study is the first study investigating psychometric properties using the IRT framework, which has several aforementioned advantages over classical test theoretical investigations. However, the results should be interpreted with caution. The children and adolescents originally completed questionnaires with 83 and 87 items, respectively, which were used in the pilot study phase. Thus, the final reduced 35-item version of the QOLIBRI-Kid/Ado was not administered directly to the study participants. Longer scales may affect response quality; for example, in terms of motivation [55]. Therefore, a final validation with the 35 items extracted for the final version of the QOLIBRI-KID/ADO is currently ongoing. Thus, we consider the analyses of the present study to be preliminary. Further studies applying the QOLIBRI-KID/ADO should pay particular attention to assumptions met with some restrictions, especially the problem of threshold disorders and the unidimensionality of the scales measuring bother.
In addition, the response rate was rather low (approximately 6% of those contacted responded and fully met the inclusion criteria). Although the distribution of sociodemographic and clinical characteristics of the TBI population is largely adequately reflected (i.e., predominantly mild TBI cases, more males than females, high recovery rates), the study may have suffered from self-selection bias. Therefore, further validation of this newly developed instrument with samples that more accurately reflect the pediatric TBI population is highly recommended, especially for more severe TBI. This also applies to the patient groups that were excluded due to the study exclusion criteria (i.e., patients with severe systemic diseases, epilepsy, or severe polytrauma). Further research is strongly encouraged to better understand the impact of TBI on HRQOL in these groups.

Future Perspectives
In the final validation study of the QOLIBRI-KID/ADO, the structure of the questionnaire could be investigated using multidimensional IRT approaches, such as the multidimensional random coefficients multinomial logit model [56] or multidimensional mixed Rasch models (e.g., Fischer and Molenaar, 1995 [57]). Such multidimensional models would allow the interdependencies between the scales and the DIF to be examined, while considering the latent structure of the QOLIBRI-KID/ADO. This would allow us to account for the different HRQOL facets being measured by the instrument. Finally, psychometric evidence for the proxy version of the QOLIBRI-KID/ADO using both the CTT and IRT frameworks would provide an alternative way to examine HRQOL after pediatric TBI, especially when children are unable to respond on their own.

Conclusions
The newly developed QOLIBRI-KID/ADO instrument displays satisfactory psychometric properties according to the results of both the item response and classical test theoretical analyses. Therefore, the application of the instrument in the current form is justified. Further evidence of its applicability should be explored in the final validation study while considering implications that arose from the pilot investigations. Meanwhile, the QOLIBRI-KID/ADO may already find its application in clinical practice and research to assess HRQOL in children and adolescents.

Institutional Review Board Statement:
The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of the University Medical Center Göttingen has approved the study (application no.: 19 April 2018).

Data Availability Statement:
The data presented in this study are available on request from the corresponding authors. Data are not publicly available due to data protection reasons.