Psychometric properties of the quality of life in short statured youth (QoLISSY) questionnaire within the course of growth hormone treatment

Background The Quality of Life of Short Stature Youth (QoLISSY) questionnaire is a patient- and parent-reported outcome measure assessing health-related quality of life (HRQOL) in short stature youth. This study evaluates the psychometric properties of the QoLISSY questionnaire within a German prospective trial of short statured children treated with human growth hormone (hGH).
Method The instrument was administered to children with idiopathic growth hormone deficiency (IGHD) and children born small for gestational age (SGA) before and after 12 months of hGH treatment. Children with idiopathic short stature (ISS) served as a reference group receiving no treatment. Psychometric testing included scale distribution characteristics, reliability (internal consistency), criterion and convergent validity (correlations with the generic KIDSCREEN-10 Index, inter-correlations among QoLISSY subscales), known-group validity (treatment status, height SDS), and responsiveness analysis (ability to detect change).
Results One hundred fifty-two parents and 66 children/adolescents completed both HRQOL assessments. The QoLISSY demonstrated good reliability with Cronbach's alpha > .70. Moderate significant correlations between QoLISSY domains and the KIDSCREEN-10 Index supported criterion validity. Statistically significant differences in HRQOL were observed between treatment groups at baseline, with children who were about to start treatment reporting significantly lower HRQOL than children who would not receive treatment. No significant differences were found between levels of short stature based on height SDS (≤ −2 SDS vs. > −2 SDS). Furthermore, the instrument detected significant differences in HRQOL change between the treated and the untreated group in patient-reports.
Conclusions The scales showed satisfactory reliability, adequate validity and the ability to detect change in self-reported HRQOL during hGH treatment. The findings support QoLISSY's further use in clinical trials, offering the opportunity to adequately assess HRQOL from the patients' and caregivers' perspective to improve patient-centered care.


Background
In recent years, health-related quality of life (HRQOL) has become an important health outcome indicator for use in clinical trials as well as in epidemiological analysis and health service research. Furthermore, it received increased recognition as a relevant health indicator in children with chronic conditions [1]. Using patient-reported outcome (PRO) instruments to assess HRQOL can highlight patients' unmet needs, improve communication between clinicians and patients, capture the individual and societal impact of a disease, and provide additional information about intervention outcomes directly from the patient's perspective. Hence, HRQOL instruments are being increasingly used in adults and children to assess information about the subjective perception of health and to evaluate the effects of treatments from the patients' and caregivers' perspectives [2][3][4][5][6][7].
The multidimensional concept of HRQOL describes the subjective perception of health status and includes physical, social, emotional and mental domains of wellbeing and functioning [2,8]. In addition to generic HRQOL instruments, covering the full range from excellent to poor health, disease-specific measures are necessary to capture the burden and experience of defined health conditions [9][10][11][12]. Particularly when assessing HRQOL in chronic and rare diseases, such as short stature in youth, generic PRO measures might not be sensitive enough to detect aspects of diagnosis and treatment that affect the patients' HRQOL, underlining the need for disease-specific measures [2,7]. Assessments of HRQOL in short stature youth have been conducted using both generic and disease-specific instruments such as the PedsQL Generic Core Scales [13,14] and the disease-specific Quality of Life of Short Stature Youth (QoLISSY) questionnaire [3,12,15]. Although research has shown that a generic HRQOL instrument can detect differences in HRQOL and psychosocial functioning between short stature children and children with normal stature [13,14,16], short stature-specific instruments have been shown to detect changes in HRQOL in children who received treatment, when compared to untreated children, while generic instruments were not sensitive enough to detect these changes [17]. This finding underlines the importance of assessing HRQOL in longitudinal studies with disease-specific instruments, because if significant differences in HRQOL are no longer detected at least one year after children start treatment, then a positive effect of treatment on HRQOL can be reasonably assumed.
Short stature is defined as a height more than 2 standard deviations (SD) below the mean height of the corresponding population [18,19] and has been associated with lower HRQOL [20,21]. To improve patients' height, short statured children can be treated with human growth hormone (hGH) within the accepted medical indications [22][23][24]. According to the European Medicines Agency (EMA), this treatment option is only available for defined diagnoses, including idiopathic growth hormone deficiency (IGHD) and being born small for gestational age (SGA), but not for children with idiopathic short stature (ISS) [25]. Primary clinical outcomes of hGH treatment, such as the increase in height, have been complemented by HRQOL as a relevant endpoint [26]. Results from studies examining the effects of hGH treatment on HRQOL are contradictory; some studies report an increase in HRQOL due to treatment [17,27], while others report no changes in HRQOL [28]. Furthermore, the use of different assessment approaches (generic and disease-specific) within these studies makes comparison challenging [29]. When evaluating the change of HRQOL in intervention studies, the responsiveness (ability to detect change) of HRQOL instruments is an important characteristic.
To examine whether the condition-specific QoLISSY questionnaire is a valid, reliable and responsive instrument that measures HRQOL of short stature youth in a longitudinal trial setting and can be used as a health outcome indicator of hGH interventions, this study aimed to evaluate the psychometric performance of the QoLISSY questionnaire within a prospective observational study. Thus, reliability, validity and responsiveness to change were investigated in a sample of children and adolescents with IGHD and SGA at the start of and within 1 year after the initiation of hGH treatment, with HRQOL documented in untreated children/adolescents with ISS for comparison purposes.

Participants and study procedure
After ethical approval was obtained from the respective ethics committees of the participating centers (medical chambers of Hamburg, Saxony, Hessen and Nordrhein, and the ethics committees of the University Hospitals of Erlangen-Nürnberg, Cologne, Magdeburg and Munich), eleven pediatric endocrinologists from various children's hospitals and medical practices in Germany agreed to recruit patients for the present study. Newly diagnosed children and adolescents with IGHD or SGA and their parents were included before hGH treatment was initiated. Children diagnosed with ISS served as a reference group in this study. Children aged 8-12 years and adolescents with a late diagnosis (> 13 years) were invited to complete HRQOL instruments. For the parent-report, parents of children aged 8-12 years and > 13 years were recruited, as well as parents of younger children aged 4-7 years. Participants were excluded from the study if they were diagnosed with any other condition that results in short stature (e.g., skeletal dysplasia or chromosomal abnormalities), if they did not fall into the respective age groups, if they were cognitively impaired, or if they lacked adequate linguistic competency to complete HRQOL assessments. Informed consent was obtained from all participants before the start of the study. Sample size calculations, using the software PASS 2008, suggested that, with a power of 80% to detect changes in the QoLISSY total score (sum of the scales physical, social and emotional) and allowing for drop-outs, the desired sample size was N = 160 children/adolescents each in the intervention and control group and N = 240 parents. Children in the intervention group (diagnosed with SGA or IGHD) were treated with daily subcutaneous injections of hGH, whereas children in the reference group (diagnosed with ISS) received no treatment.
The parents and their children were asked to provide HRQOL assessments before the start of hGH treatment (baseline, T0) and 12 months after the start of treatment (T1). The questionnaires were handed out to the participants in the respective center and were completed on-site at both points of measurement. Thereafter, the staff sent the questionnaires by post to the study center in Hamburg. In addition to HRQOL questionnaires, sociodemographic and clinical data such as age, gender, body weight and height were assessed by a clinician at both timepoints. Height standard deviation scores (SDS) were calculated with the LMS formula [30] based on reference data by Kromeyer-Hauschild [31]. Children diagnosed with ISS not undergoing hGH treatment followed the same procedure.
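As an illustration of the height SDS calculation mentioned above, the following is a minimal sketch that assumes the standard form of the LMS transformation, z = ((x/M)^L − 1)/(L·S), with z = ln(x/M)/S when L = 0. The function names and the L, M and S values in the usage line are hypothetical placeholders and are not taken from the reference data [31].

```python
import math


def lms_sds(value: float, l: float, m: float, s: float) -> float:
    """Return the standard deviation score (z-score) of `value`.

    l, m and s are the Box-Cox power, the median and the coefficient of
    variation for the child's age and sex, read from a reference table.
    """
    if value <= 0 or m <= 0 or s <= 0:
        raise ValueError("value, m and s must be positive")
    if abs(l) < 1e-12:
        # Limiting case of the formula when the power L is zero.
        return math.log(value / m) / s
    return ((value / m) ** l - 1.0) / (l * s)


def height_group(sds: float) -> str:
    """Classify a height SDS with the cut-off used in this study (-2 SDS)."""
    return "short stature" if sds <= -2.0 else "normal stature"


# Hypothetical L, M, S values for illustration only (not from [31]).
example_sds = lms_sds(value=120.0, l=1.0, m=130.0, s=0.04)
print(round(example_sds, 2), height_group(example_sds))
```

In practice, L, M and S would be looked up (and interpolated) by age and sex in the chosen reference tables before calling the function.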

Measures
HRQOL was assessed with the generic KIDSCREEN-10 index and with the condition-specific QoLISSY questionnaire.
The QoLISSY questionnaire was developed in accordance with the guidance on PRO instrument development of the US Food and Drug Administration [32] within a cross-cultural study aiming to assess HRQOL in children and adolescents with ISS and IGHD. The questionnaire is available as a self-report for short statured children and adolescents aged 8-18 years and as an observer-report for parents of children aged 4-18 years. The instrument assesses HRQOL in the core domains physical (6 items), social (8 items) and emotional (8 items), which are summed up in the total score (22 items). Additionally, three domains measure predictors of quality of life (coping, 10 items; beliefs, 4 items; and treatment, 14 items) in child- and parent-report. The parent version contains two supplementary domains that refer to the parents' worries about their child's future (future, 5 items) and the impact of the child's condition on the parents' wellbeing (effects on parents, 11 items). The response scale is a five-point Likert scale ranging from "not at all/never" to "extremely/always". Psychometric results from the original QoLISSY study showed satisfactory reliability, with Cronbach's alpha ranging from α = .82 (coping scale) to α = .92 (total QoL score) in self-report and from α = .86 (physical scale) to α = .95 (total QoL score) in parent-report [33]. The psychometric performance of the QoLISSY questionnaire has also been shown to be satisfactory in further cross-sectional validation studies in the US, Italy, Belgium and the Netherlands [34][35][36][37]. At present, QoLISSY has also been validated for use in children born small for gestational age (SGA) and in children with achondroplasia [38,39].
The KIDSCREEN-10 index is a unidimensional generic measure to assess HRQOL in children and adolescents (8-18 years) from the child and parent perspective on a five-point Likert scale. Psychometric properties of the index, as assessed in a prior study with a sample of children and adolescents within the original QoLISSY study, were satisfactory in self- and parent-reports, with good validity and reliability (Cronbach's alpha in child-report α = .81, proxy-report α = .80) [40,41].

Statistical analysis
All data analyses were conducted using the Statistical Package for the Social Sciences, v.21 [42]. The significance level was set at p < .05, and all participant data were pseudonymized for the analysis. First, descriptive statistics were calculated for sociodemographic and clinical variables. The homogeneity of sample characteristics across the diagnostic groups at baseline was examined by χ² tests (categorical variables) or independent-samples analysis of variance (continuous variables).
Raw QoLISSY scores were transformed into 0 to 100 scores with higher values representing higher HRQOL. The QoLISSY total score was calculated by summing up the core domains physical, social and emotional. Mean scale scores were computed and missing data that were random and less than 20% of the values were replaced with the individual mean score for each variable.
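The scoring step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the official scoring syntax: items are assumed to be coded 1 to 5 and already recoded so that higher values mean higher HRQOL, the 0 to 100 transformation is assumed to be linear, and the missing-data rule is read as person-mean imputation within a scale when fewer than 20% of its items are missing. The function name `scale_score` is hypothetical.

```python
from typing import Optional, Sequence


def scale_score(items: Sequence[Optional[int]],
                max_missing: float = 0.20) -> Optional[float]:
    """Return a 0-100 scale score, or None if too many items are missing.

    `items` holds one respondent's answers to one scale (1-5, None = missing).
    Averaging the answered items is equivalent to replacing each missing
    item with the respondent's own mean on that scale.
    """
    answered = [x for x in items if x is not None]
    if not answered:
        return None
    missing_share = (len(items) - len(answered)) / len(items)
    if missing_share >= max_missing:
        return None
    raw_mean = sum(answered) / len(answered)
    # Linear rescaling of the 1-5 item mean onto 0-100.
    return (raw_mean - 1) / 4 * 100


print(scale_score([4, 5, 3, None, 4, 4]))  # one of six items missing
```

Returning `None` when the missing share reaches the limit keeps unscorable scales visible instead of silently imputing them.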
Descriptive QoLISSY statistics were calculated at scale level at baseline (T0) and 12 months after the onset of hGH treatment (T1) for child- and parent-report, including mean (M), standard deviation (SD), skewness, kurtosis and floor/ceiling effects. Floor and ceiling effects were considered to be present if more than 15% of respondents achieved the lowest or highest possible score, respectively [43].
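The 15% rule for floor and ceiling effects can be checked with a few lines of code. The sketch below assumes scale scores on the 0 to 100 metric, so that 0 is the lowest and 100 the highest possible score; the function name and the example data are hypothetical.

```python
from typing import Dict, Sequence


def floor_ceiling(scores: Sequence[float], lowest: float = 0.0,
                  highest: float = 100.0,
                  threshold: float = 0.15) -> Dict[str, object]:
    """Report the share of respondents at the scale limits.

    An effect is flagged when more than `threshold` of respondents
    score exactly at the lowest (floor) or highest (ceiling) value.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    floor_share = sum(1 for s in scores if s == lowest) / n
    ceiling_share = sum(1 for s in scores if s == highest) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > threshold,
        "ceiling_effect": ceiling_share > threshold,
    }


# Hypothetical scores: two of ten respondents at the minimum.
print(floor_ceiling([0.0, 0.0] + [50.0] * 8))
```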
To evaluate the reliability of the QoLISSY scales, Cronbach's alpha coefficients (α) were calculated for each scale at baseline and at T1 for both patient- and parent-reports, considering α > .70 as an indicator of good internal consistency [44].
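For readers who want to reproduce the internal consistency check, the following sketch computes Cronbach's alpha from the usual formula α = k/(k − 1) · (1 − Σ item variances / variance of the sum score). It assumes complete cases (one row per respondent, one column per item); the function name and the example responses are hypothetical.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for complete item responses.

    `rows` holds one sequence of item scores per respondent.
    """
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("each respondent needs the same >= 2 items")
    item_vars: List[float] = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("sum scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical two-item scale answered by four respondents.
alpha = cronbach_alpha([[1, 2], [2, 1], [3, 4], [4, 3]])
print(alpha, alpha > 0.70)
```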
Inter-rater reliability was examined at the individual and the group levels [45], using, respectively, intraclass correlation coefficients (ICC; two-way mixed model, absolute agreement, 95% confidence interval [CI]) and multivariate analyses of covariance (MANCOVA) for repeated measures, entering the rater (parent vs. child) as the within-subject factor and children's age, gender, height deviation group and treatment status (only at T1) as covariates. ICC reference values were: ICC < .40 as poor agreement, ICC between .41 and .60 as moderate agreement, ICC between .61 and .80 as good agreement, and ICC > .81 as excellent agreement [46].
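The individual-level agreement analysis can be illustrated with a small sketch. It assumes the single-measures form of the absolute-agreement ICC, computed from two-way ANOVA mean squares as (MSR − MSE) / (MSR + (k − 1)·MSE + (k/n)·(MSC − MSE)); the text does not state whether single or average measures were reported, so this choice is an assumption. The labelling function places the band boundaries at .40, .60 and .80, which is also an assumption for values that fall between the published bands. Function names and example data are hypothetical.

```python
from typing import List, Tuple


def icc_absolute_agreement(pairs: List[Tuple[float, float]]) -> float:
    """Single-measures ICC for absolute agreement between two raters.

    Each tuple holds (child score, parent score) for one family.
    """
    n, k = len(pairs), 2
    if n < 2:
        raise ValueError("need at least two rated subjects")
    grand = sum(c + p for c, p in pairs) / (n * k)
    row_means = [(c + p) / k for c, p in pairs]
    col_means = [sum(c for c, _ in pairs) / n, sum(p for _, p in pairs) / n]

    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), error.
    ss_rows = k * sum((rm - grand) ** 2 for rm in row_means)
    ss_cols = n * sum((cm - grand) ** 2 for cm in col_means)
    ss_total = sum((c - grand) ** 2 + (p - grand) ** 2 for c, p in pairs)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denominator = ms_rows + (k - 1) * ms_error + (k / n) * (ms_cols - ms_error)
    return (ms_rows - ms_error) / denominator


def agreement_label(icc: float) -> str:
    """Map an ICC to the agreement bands described in the text."""
    if icc > 0.80:
        return "excellent"
    if icc > 0.60:
        return "good"
    if icc > 0.40:
        return "moderate"
    return "poor"


# Hypothetical data: parents rate one point higher than their children.
icc = icc_absolute_agreement([(1.0, 2.0), (2.0, 3.0), (3.0, 4.0)])
print(round(icc, 3), agreement_label(icc))
```

The example shows why absolute agreement was a sensible choice for parent-child comparisons: a constant offset between raters lowers the coefficient even though the rank order is identical.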
Criterion validity was examined by calculating Pearson correlation coefficients (r) between the QoLISSY scales and the KIDSCREEN-10 Index at both points of measurement for the child- and parent-report. Although the KIDSCREEN instruments can be considered gold standard instruments to assess HRQOL in children and adolescents, they measure HRQOL at a generic level, while the QoLISSY questionnaire assesses HRQOL at a disease-specific level; therefore, moderate positive correlations (r > .30) were considered indicators of good criterion validity [44]. To assess convergent validity as part of construct validity, inter-correlations between the QoLISSY scales were also calculated using Pearson's correlation coefficient.
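The correlation criterion can be sketched in a few lines. The code below computes Pearson's r from its definition and applies the r > .30 threshold named in the text; the function names and the example vectors are hypothetical, and significance testing is left out.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equally long samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length (>= 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


def supports_criterion_validity(r: float, threshold: float = 0.30) -> bool:
    """Apply the r > .30 rule used for criterion validity in this study."""
    return r > threshold


# Hypothetical scale scores for five respondents on two instruments.
r = pearson_r([40.0, 55.0, 60.0, 72.0, 90.0], [50.0, 48.0, 65.0, 70.0, 85.0])
print(round(r, 2), supports_criterion_validity(r))
```

The same function applies to the inter-correlations among QoLISSY scales and to correlations between change scores (T1 minus baseline).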
To examine known-groups validity, multivariate analyses of covariance (MANCOVA) were performed at both measurement points for the child- and parent-report to compare the HRQOL dimensions across treatment status (children with IGHD or SGA who were treated vs. untreated children with ISS) and height deviation groups of normal stature (> −2 SDS) vs. short stature (≤ −2 SDS), while controlling for children's age. For the QoLISSY total score and when multivariate effects were significant, univariate analyses of covariance (ANCOVA) were performed to examine which QoLISSY scales were significantly different between groups.
Finally, to test the responsiveness of the QoLISSY questionnaires to detect changes in HRQOL dimensions within the course of hGH treatment, a repeated measures MANCOVA was performed. The measurement points (baseline vs. T1) were entered as the within-subjects factor, the treatment status (children with IGHD or SGA who were treated vs. untreated children with ISS) as the between-subjects factor, and children's age and the time difference between both measurement points as covariates. A repeated measures ANCOVA was performed for the QoLISSY total score and for the generic KIDSCREEN-10 Index, respectively. Furthermore, a repeated measures MANCOVA was calculated to test the HRQOL changes from baseline to 1-year follow-up (T1) between patients who reached normal height and patients with current short stature at T1. Children's age and the time difference between both measurement points were entered as covariates. Effect sizes were interpreted following Cohen (1988), considering a small effect when ηp² ≥ .01, a medium effect when ηp² ≥ .06, and a large effect when ηp² ≥ .14 [47]. In addition, Pearson correlation analyses between height SDS increase (i.e., height SDS at T1 − height SDS at baseline) and the change in the QoLISSY total score (i.e., QoLISSY total score at T1 − QoLISSY total score at baseline) were performed with child and parent data to determine the relationship between height change and HRQOL change.
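The effect-size benchmarks above can be applied directly to reported test statistics. As a hedged illustration, the sketch below uses the general identity ηp² = (F · df_effect) / (F · df_effect + df_error) to recover partial eta squared from an F value and its degrees of freedom, and then labels it with the Cohen thresholds named in the text. The function names are hypothetical, and the label "negligible" for values below .01 is an assumption. The example values are F statistics reported in the Results of this paper.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its df."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be >= 0 and both df must be positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def cohen_label(eta_p2: float) -> str:
    """Label an effect size with the thresholds .01, .06 and .14."""
    if eta_p2 >= 0.14:
        return "large"
    if eta_p2 >= 0.06:
        return "medium"
    if eta_p2 >= 0.01:
        return "small"
    return "negligible"


# F statistics as reported in the Results: (F, df_effect, df_error).
for f_value, df1, df2 in [(9.72, 1, 50), (7.70, 1, 141), (3.24, 5, 43)]:
    eta = partial_eta_squared(f_value, df1, df2)
    print(f"F({df1}, {df2}) = {f_value}: eta_p2 = {eta:.2f} ({cohen_label(eta)})")
```

Such a check is useful when reading the tables, because it shows whether a reported F value, its degrees of freedom and the reported ηp² are mutually consistent.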

Sample description
A total of 154 participants were recruited at baseline, with 66 children/adolescents and 152 parents completing both HRQOL assessments. With a drop-out rate of 15.6% (N = 24), 130 of these participants also completed the HRQOL assessments at T1, with 70 patient-reports and 126 parent-reports available. Since patients grew older during the study, more patient-reports were available at T1. The time between the baseline assessment and T1 ranged from 6 to 18 months (M = 12.54, SD = 2.2). Clinical and sociodemographic sample characteristics of the participants at both measurement points are presented in Table 1. At baseline, SGA patients were significantly younger than patients diagnosed with IGHD or ISS (F(2) = 9.21, p < .01). This pattern continued at T1, with SGA patients being significantly younger than patients diagnosed with ISS (F(2) = 6.74, p < .01). Furthermore, SGA patients were significantly smaller at baseline (F(2) = 12.74, p < .01) and at T1 (F(2) = 5.59, p < .01) compared to patients with IGHD or ISS. With regard to height SDS, patients with SGA and IGHD had significantly lower values than patients with ISS at baseline (F(2) = 8.70, p < .01). Moreover, a significantly higher percentage of patients with ISS had already reached a normal height by definition at baseline (SDS > −2), compared to children/adolescents with IGHD or SGA before treatment (χ²(2) = 13.92, p < .01). At T1 there were no significant differences in height SDS (F(2) = 7.58, p = .47) or in height deviation groups (χ²(2) = .85, p = .65) across diagnoses. Also, there were no significant differences in gender (χ²(2) = 5.93, p = .052) or treatment length (F(2) = .516, p = .59) across diagnostic groups.

Descriptive statistics and reliability
Most QoLISSY dimensions showed a slight skew to the right, indicating a tendency towards high HRQOL (scores range from 0 to 100, with higher values representing higher HRQOL) in child- and parent-report at baseline and T1 across all diagnostic groups. However, the parent-reported coping and effects on parents scales at baseline, the parent-reported treatment scale at T1 and the child-reported beliefs scale at T1 showed a slight skew to the left, indicating lower HRQOL. Cronbach's alpha values exceeded .70, except for the child-reported coping scale at T1 (α = .65), indicating good internal consistency across raters and assessment points. The highest consistency was found for the total score (α > .90). Considering the threshold of 15% of respondents scoring at the lowest or highest possible categories, no ceiling or floor effects were found for any of the patient- or parent-reported QoLISSY scales (Table 2).
Regarding inter-rater reliability, the ICC values indicated moderate parent-child agreement, except for the coping and treatment scales, where greater disagreement between children's and parents' HRQOL assessments was found. At a group level, the MANCOVA for repeated measures showed no significant multivariate differences between raters at baseline, Pillai's trace = .11, F(5, 47) = 1.19, p = .33, ηp² = .11, and at T1, Pillai's trace = .30, F(6, 30) = 2.10, p = .08, ηp² = .30. The univariate effects are presented in Table 3. In addition, no significant multivariate effects of the interaction between rater and children's sex at […]

Criterion and convergent validity
Table 4 summarizes the correlations between the QoLISSY scales and the KIDSCREEN-10 Index. For patient-reports, moderate positive correlations were found between the disease-specific and generic HRQOL, with coefficients ranging from r = .28 (beliefs scale at T1) to r = .46 (coping scale at T1); for parent-reports, weak to moderate positive correlations were found, with coefficients ranging from r = .13 (beliefs scale at T1) to r = .41 (emotional scale at baseline).
Regarding convergent validity, as assessed with inter-correlations among the QoLISSY subscales, moderate to strong positive correlations were found across the physical, social, emotional and beliefs scales in both patient- and parent-reports, as well as across the future and effects on parents scales of the parent-reported version. In both patient- and parent-reports, the coping scale was weakly to moderately positively correlated with all QoLISSY scales and with the total score, while the treatment scale did not correlate significantly with any of the scales or with the total score. However, it should be noted that the treatment subscale was only applied in the treated sample, including children with IGHD and SGA, whereas the other QoLISSY subscales were tested in a more heterogeneous sample that included non-treated patients with ISS.

Known-groups validity
The MANCOVA for patient-reported QoLISSY scales, controlling for children's age and height deviation group, […] 17. However, a significant univariate effect was found in the physical, social and emotional scales, with children who were about to start treatment reporting a significantly lower HRQOL at baseline compared to the ISS children not receiving treatment. A significant univariate effect was also found for the total score, F(1, 158) = 7.44, p = .008, ηp² = .11 (see Table 5).
Regarding parent-reports, a significant multivariate effect of treatment status was found on the QoLISSY scales, Pillai's trace = .12, F(7, 117) = 2.31, p = .03, ηp² = .12. Subsequent univariate analyses showed significant differences between the treated and the untreated group in the physical, social, emotional, coping and future domains, with parents of children in the treated group reporting a lower HRQOL in these domains at baseline, compared to parents of children who remained untreated. In addition, a significant effect of treatment status was found in the total score, F(1, 141) = 7.70, p = .006, ηp² = .05, with parents of treated children reporting significantly lower HRQOL than parents of children with ISS (p < .01) (see Table 5).

Responsiveness
The repeated measures MANCOVA for the patient-reported QoLISSY scales, controlling for the effects of age and treatment length, yielded no significant multivariate main effects of time, Pillai's trace = .18, F(5, 43) = 1.84, p = .13, ηp² = .18, or treatment status, Pillai's trace = .13, F(5, 43) = 1.33, p = .27, ηp² = .13. However, the multivariate interaction effect between time and treatment status was statistically significant, Pillai's trace = .27, F(5, 43) = 3.24, p = .01, ηp² = .27. The subsequent univariate analyses (Table 7) revealed that treated children/adolescents in the intervention group (diagnoses IGHD or SGA) reported a significant increase in physical, social and emotional HRQOL from baseline through one year of treatment, while untreated patients of the control group (diagnosis ISS) reported a decrease in these HRQOL domains over time. These differences between the treated and the untreated sample were supported by the univariate repeated measures ANCOVA for the QoLISSY total score, which also revealed a significant interaction effect between time and treatment status, F(1, 50) = 9.72, p < .01, ηp² = .16, while controlling for the effects of age at baseline, F(1, 50) = .03, p = .86, ηp² = .00, and treatment length, F(1, 50) = 3.97, p = .05, ηp² = .07.
In contrast to the significant results in patient-reports, the parent-reports revealed no multivariate main effects of treatment status, Pillai's trace = .14, F(7, 88) = 2.07, p = .06, ηp² = .14, or interaction effects between time and treatment status, Pillai's trace = .06, F(7, 88) = .85, p = .55, ηp² = .06. However, there was a significant multivariate main effect of time, Pillai's trace = .19, F(7, 88) = 2.91, p = .01, ηp² = .19, which reflected a significant improvement from baseline to T1 on the effects on parents scale for both intervention and control groups, F(1, 94) = 7.75, p < .01, ηp² = .08. The univariate analyses for the interaction effect are presented in Table 7. Although differences were not statistically significant, the results show a trend: HRQOL scores improved in all QoLISSY domains in the treated sample, while scores decreased in the social, coping, beliefs, future and effects on parents scales and in the total score in the untreated sample.
Finally, the analysis of the relationship between height change and HRQOL change showed a significant positive correlation between height SDS increase (height difference between T1 and baseline) and total HRQOL gain (difference between QoLISSY total score at T1 and at baseline) in both the child-reports (r = −0.38, p ≤ 0.01) and the parent-reports (r = −0.18, p = 0.04). This suggests that a larger increase in body height is associated with greater improvement in total HRQOL, which reflects physical, social and emotional HRQOL.

Discussion
Among the different health outcomes that can be measured, HRQOL has become a widely recognized endpoint because HRQOL instruments can reflect treatment benefits that matter to patients [7]. However, for rare conditions such as endocrine short stature, the use of generic HRQOL instruments is limited. Although generic instruments are well validated and allow for comparison across populations, they often fail to detect small but clinically significant changes in HRQOL over time because they lack the disease-specific aspects of affected patients' lives that have an impact on their HRQOL [7]. This is supported by the results of this study. While the generic KIDSCREEN instrument was not able to detect changes in HRQOL in this sample, the disease-specific QoLISSY questionnaire detected changes in HRQOL over the course of hGH treatment between the treated and the untreated group in the patient-report. Treated children/adolescents diagnosed with IGHD or SGA reported a significant increase in physical, social and emotional HRQOL from baseline through one year of treatment, while untreated patients of the control group diagnosed with ISS reported a decrease in these HRQOL domains over time. However, effect sizes for these results were rather small. Although differences were not statistically significant in the parent-report, the results show a trend: parents of children treated with hGH reported an increase in their children's HRQOL, while parents of untreated children reported a decrease in most domains over the same period. In addition, the correlation analysis between changes in the QoLISSY total score and height SDS gain shows that increased height is associated with improved physical, social and emotional HRQOL. Similar outcomes were found in other studies showing that disease-specific instruments are able to detect changes in HRQOL in populations affected by rare health conditions. Lem et al. (2012) also found that only the disease-specific instrument (TACQOL-S) showed an improvement in HRQOL in children treated with hGH compared to untreated children, while the generic instrument did not reveal any HRQOL changes. Thus, to measure the impact of treatment and capture change over time, a psychometrically sound, disease-specific instrument is needed [7,48].
Results of the repeated measures MANCOVA testing the HRQOL changes from baseline to 1-year follow-up (T1) between patients who reached normal height and patients with current short stature at T1 showed no significant results in the current study. This might indicate that reaching a normal height, either because of treatment or because of normal development, did not have a significant effect on overall HRQOL or any of its specific domains. Although the QoLISSY questionnaire was not able to detect these longitudinal HRQOL changes from baseline to 1 year after treatment between patients who reached normal height and patients with current short stature after treatment (see Table 8), the questionnaire was able to detect differences between height deviation groups at baseline (Table 6; social HRQOL in self- and parent-report). On the one hand, this might indicate that the change in HRQOL from baseline to 1 year after treatment does not depend on the treatment or the increase in height; on the other hand, the questionnaire might lack sensitivity to detect clinically significant changes. Further studies are necessary to clarify this issue.
In addition to the debate between using generic or disease-specific HRQOL instruments, another aspect under discussion is the benefits of using self-reports vs. proxy-reports (ex. parents) when assessing HRQOL in pediatric populations. Self-reports are generally the preferred source of HRQOL data because of the concern of inaccurate reporting by proxies [12,49]. However, when comparing the reported data between the child-and parent-report of the QoLISSY instrument in this study, analyses of ICC values indicated moderate parent-child agreement, except for the coping and treatment subscale, where the highest disagreement was observed. These findings underline, that especially psychosocial aspects of children's HRQOL (such as coping strategies) cannot be observed and reported by a proxy as reliably as more obvious aspects (e.g. physical aspects) [12,50,51]. Thus, this study provides on the one hand support to use both self-and observed-reported HRQOL data in order to gain a comprehensive and more complementary view of the child's experience of illness. On the other hand, when using child reported data in longitudinal studies, the cognitive development process of the children [52] needs to be considered. Opinions, feelings and attitudes might change quickly resulting in score changes that influences the responsiveness when assessing HRQOL.   [53]. Also in this study, cognitive development changes might have influenced the responsiveness positively, since the child-report yielded significant changes over time in HRQOL, while the parent-report did not. Internal consistency, as assessed by Cronbach's alpha, was satisfactory in all scales with α > .70, except for the coping scale at T1 in child-report (α = .65). The Cronbach's alpha values found in this study were similar to those observed in the original QoLISSY study [33], with the exception of the coping scale in the child-report, in which α was > .70 in the original study. 
A possible explanation for the inadequate Cronbach's alpha value for the coping scale might be that the item content of this scale is irrelevant to, or does not correctly portray, the experience of a patient population that receives treatment because of short stature, thus leading to inconsistent responses. Furthermore, the lower Cronbach's alpha value might suggest poorer interrelatedness between the items of this scale, reflecting that coping is a heterogeneous construct consisting of many facets and various coping strategies that are difficult to capture within one scale. In addition, the lower value might reflect difficulties in understanding item wordings, which might have led to inconsistent responses.
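To make the link between item interrelatedness and Cronbach's alpha concrete, the following is a minimal sketch of the standard alpha formula, α = k/(k − 1) · (1 − Σ item variances / variance of the total score). The function name and the toy item scores are illustrative assumptions and are not data or code from this study.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    item_scores: list of respondents, each a list of k item scores
    (hypothetical layout chosen for this sketch).
    """
    k = len(item_scores[0])
    if k < 2 or len(item_scores) < 2:
        raise ValueError("need at least two items and two respondents")

    # Sample variance of each item (column) across respondents.
    columns = list(zip(*item_scores))
    item_variance_sum = sum(variance(col) for col in columns)

    # Sample variance of each respondent's total score.
    totals = [sum(row) for row in item_scores]
    total_variance = variance(totals)

    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Toy data (not from the study): four respondents, two items.
toy_scores = [[2, 3], [4, 4], [3, 5], [5, 5]]
alpha = cronbach_alpha(toy_scores)
```

When items covary weakly, the sum of the item variances approaches the variance of the total score and alpha drops, which is consistent with the interpretation that a heterogeneous coping scale yields a lower value.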
The scale inter-correlations of the QoLISSY questionnaire were also very similar to the values found in the original development and validation study of the QoLISSY questionnaire [9], including nonsignificant or low inter-correlations of the coping and treatment scales with all other scales of the instrument. With the exception of a few scales, most QoLISSY scales showed significant moderate correlations (r > .30) with the KIDSCREEN-10 Index, supporting the criterion validity of the instrument and suggesting that both instruments measure the construct of HRQOL. Nevertheless, the moderate correlations also point to a discrepancy between the two measures, which further supports the importance of using disease-specific instruments that can capture unique aspects experienced by affected people that generic instruments would miss.
Comparisons of HRQOL scores between participants who were treated with hGH and participants who were not treated demonstrated known-group validity for the total score in child- and parent-report: at baseline, children who were about to start treatment, and their parents, reported lower HRQOL than children who remained untreated. After treatment, no significant differences in HRQOL between the treated and the untreated sample were found in child- and parent-reports. In contrast to the results of the known-group validity analysis in the original development and validation study of the QoLISSY questionnaire [47], QoLISSY was not able to distinguish between severity levels of short stature (height SDS) at either point of measurement in the current sample. This might be because most children in the sample were short statured (height SDS ≤ − 2) at baseline. Furthermore, this study was designed as a longitudinal study, and thus a cross-sectional analysis such as that conducted for known-group validity might not be as appropriate as a responsiveness analysis that combines both measurement points. As discussed above, QoLISSY proved able to detect changes in child-reported HRQOL within the course of treatment.
One limitation of this study is that a confirmatory factor analysis was not calculated because of the small sample size. Nevertheless, the factor structure of the QoLISSY instrument was previously ascertained when identifying the final scale structure of the instrument. The initial confirmatory factor analysis yielded the three core quality of life domains (physical, social, emotional) that make up the QoLISSY total score, while the other domains were used as determinants [33]. In general, the small sample size and the drop-out rate might have biased the results. However, keeping in mind that endocrine short stature is a rare disease, this study included a relatively large sample. Still, the sample is very selective because all participants contacted growth clinics seeking treatment options. Therefore, it can be assumed that the sample is highly motivated and not necessarily representative of the overall target population. Since this study was an observational study based on "natural" treated and untreated groups complying with standard clinical practice and appropriate prescribing guidelines (determined by the European Medicines Agency), we were not able to include a normal statured, age-matched, untreated control group with IGHD or SGA for comparison, as this would be considered unethical. Furthermore, the sample description (Table 1) shows that some patients, especially patients in the control group with the diagnosis ISS, had already reached a normal height by definition (> − 2 SDS) at baseline. We used the common German reference values of Kromeyer-Hauschild et al. (2001) to calculate the height SDS, which might have resulted in slightly different scores than those the clinician used when diagnosing the patient. Thus, the sample also included a few patients who had normal height at baseline, which might have biased the results. In general, significant results should be interpreted with caution because of the small sample and small effect sizes.
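As an illustration of why different reference values can shift a patient across the − 2 SDS threshold, the following sketch computes a height SDS as (height − reference mean) / reference SD and applies the ≤ − 2 cut-off. The function names and all reference numbers are hypothetical assumptions for this example; they are not taken from the Kromeyer-Hauschild reference tables, and published references may use a different computation.

```python
def height_sds(height_cm, ref_mean_cm, ref_sd_cm):
    """Height standard deviation score relative to an age- and sex-specific
    reference mean and SD (simple z-score; an assumption of this sketch)."""
    if ref_sd_cm <= 0:
        raise ValueError("reference SD must be positive")
    return (height_cm - ref_mean_cm) / ref_sd_cm


def is_short_stature(sds, cutoff=-2.0):
    """Classify as short statured when height SDS is at or below the cut-off."""
    return sds <= cutoff


# Hypothetical child measured at 120.5 cm, scored against two hypothetical
# reference sets that differ slightly in their mean and SD.
sds_reference_a = height_sds(120.5, ref_mean_cm=131.0, ref_sd_cm=5.0)
sds_reference_b = height_sds(120.5, ref_mean_cm=130.0, ref_sd_cm=5.0)

short_under_a = is_short_stature(sds_reference_a)
short_under_b = is_short_stature(sds_reference_b)
```

In this toy case the same measured height is classified as short stature under one reference set and as normal height under the other, which mirrors how a clinically diagnosed patient could appear with a height SDS above − 2 in the study data.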
Another limitation is that, owing to the design of the study, our responsiveness analyses were limited to 12 months of treatment. However, greater changes in HRQOL might also appear some time after treatment has ended.

Conclusions
These psychometric analyses of the QoLISSY questionnaire support the reliability, validity, and ability of the instrument to detect change over time and thus support its usefulness to measure HRQOL in a pediatric endocrine