Introduction

Sustained health budget constraints necessitate the comparison of alternative programmes and interventions in terms of their costs and consequences [1]. Cost–utility analysis (CUA) remains the form of economic evaluation preferred by decision-making agencies around the world to inform the allocation of finite resources [2,3,4,5,6]. In CUA, the preferred measure of health outcome is the quality-adjusted life-year (QALY), which combines preference-based health-related quality of life (HRQoL) weights, or health utility values, associated with health states with the length of time spent in those states [7]. The generic nature of health utilities (and thus QALYs) allows healthcare interventions to be compared across disparate health conditions and populations. Health utilities are indexed on a cardinal scale on which 0 represents death and 1 represents perfect health [1].
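The following Python sketch shows the QALY calculation concretely by summing utility-weighted time across a sequence of health states. It assumes a constant utility within each state and applies no discounting. The function name `qalys` and the example profile are hypothetical and are not taken from any study in this review.

```python
from typing import Sequence, Tuple


def qalys(profile: Sequence[Tuple[float, float]]) -> float:
    """Return QALYs for a sequence of (utility, years) pairs.

    Assumes utility is constant within each health state (0 = death,
    1 = perfect health) and applies no discounting.
    """
    total = 0.0
    for utility, years in profile:
        if years < 0:
            raise ValueError("duration in a health state cannot be negative")
        total += utility * years  # utility-weighted time spent in this state
    return total


# Hypothetical profile: 2 years at utility 0.5, then 4 years at utility 0.75.
print(qalys([(0.5, 2.0), (0.75, 4.0)]))  # 4.0
```

The same six years lived in perfect health would yield 6.0 QALYs, so the hypothetical profile represents a loss of 2.0 QALYs relative to full health.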

Valuation methods for utility assessment can be divided into two broad categories: direct and indirect. Direct methods combine the valuation and measurement processes into a single step and include the standard gamble (SG) technique and the time trade-off (TTO) approach [1]. The visual analogue scale (VAS) is another direct valuation method, although some economists do not regard it as a health utility measurement approach because its valuation procedure is not choice-based and does not involve decision-making under uncertainty [8]. Indirect valuation methods use multi-attribute utility instruments (MAUIs) such as the EQ-5D [9], Health Utilities Index (HUI) [10], SF-6D [11], Quality of Well-Being Scale (QWB) [12] and Assessment of Quality of Life (AQoL or AQoL-5D) [13]. MAUIs ask respondents to describe their health state using a health status classification system containing several dimensions, each with multiple levels. Algorithms, or tariff sets, elicited from representative populations using direct valuation methods such as the TTO, are then applied to convert these responses into health utility values.
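Each trade-off-based direct method converts an indifference point into a utility value. The sketch below uses the conventional textbook formulas for a health state regarded as better than dead:

- In the TTO, indifference between t years in the state and x years in full health gives u = x/t.
- In the SG, indifference between the state for certain and a gamble offering full health with probability p (and death with probability 1 − p) gives u = p.

The function names are hypothetical. Procedures for states worse than dead differ between valuation protocols and are deliberately not covered.

```python
def tto_utility(years_full_health: float, years_in_state: float) -> float:
    """Time trade-off: indifference between `years_in_state` in the health
    state and `years_full_health` in full health gives u = x / t."""
    if years_in_state <= 0:
        raise ValueError("years_in_state must be positive")
    if not 0 <= years_full_health <= years_in_state:
        raise ValueError("expected 0 <= x <= t for a state no worse than dead")
    return years_full_health / years_in_state


def sg_utility(p_full_health: float) -> float:
    """Standard gamble: indifference between the state for certain and a gamble
    with probability p of full health (1 - p of death) gives u = p."""
    if not 0 <= p_full_health <= 1:
        raise ValueError("probability must lie in [0, 1]")
    return p_full_health


print(tto_utility(7.5, 10.0))  # 0.75
print(sg_utility(0.85))        # 0.85
```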

Unique methodological challenges arise when utility assessments are conducted for childhood (age < 18 years) health conditions or states [14, 15]. For example, infants (age < 2 years), as well as pre-adolescents (age < 12 years) and adolescents (age ≥ 12 and < 18 years) with developmental delays or cognitive deficits arising from neurodevelopmental disorders, lack the comprehension needed to complete direct or indirect valuation tasks and therefore require proxy assessment. Moreover, because of bio-psychosocial development during childhood, the dimensions relevant to HRQoL change rapidly with age [16]. Consequently, the classification systems embedded in adult-specific MAUIs (targeted at those aged ≥ 18 years), such as the EQ-5D and SF-6D, may not incorporate health dimensions relevant to the developmental stages of childhood. MAUIs with childhood-specific classification systems have been developed to meet this challenge, including the EQ-5D-Y (Youth) [17], 16-Dimensional Health-Related Measure (16D) [18], 17-Dimensional Health-Related Measure (17D) [19], AQoL-6D [20] and Child Health Utility 9-Dimensions (CHU9D) [21]. In addition, several MAUIs, such as the HUI2, HUI3 and QWB, have classification systems compatible with both adult and childhood health states. Many measures with classification systems designed specifically for or compatible with childhood, such as the EQ-5D-Y, HUI3 and QWB, rely on tariffs derived from adult populations. A further methodological concern therefore relates to potential differences between children and adults in how they value the health states described by MAUIs [22].

There has been increasing recognition of these methodological challenges in international health technology assessment (HTA) guidelines. The 2013 National Institute for Health and Care Excellence (NICE) methods guidance recognised that its preferred measure, the EQ-5D, lacked a classification system designed for use in children and recommended use of the EQ-5D-Y for children aged 7–12 years, although no separate tariff set existed for this measure [2]. The 2016 US Panel on Cost-effectiveness in Health and Medicine acknowledged the challenges surrounding childhood utility measurement and discussed the relative advantages of alternative instruments, including the HUI2/3, EQ-5D-Y and CHU9D, but did not recommend a preferred approach [6]. Despite these developments, adult-specific health utilities are still frequently applied to childhood health states in economic evaluations. Montgomery and Kusel [23] reviewed all NICE health technology appraisals in England up to June 2015 and identified 29 submissions directly related to paediatric health, only six of which applied childhood-specific utilities. It remains unclear how developments in national HTA guidelines have influenced the priorities and design of primary studies of childhood health utilities, including the choice of valuation method.

In line with the preferences of several HTA agencies, CUA is now the leading analytic approach for economic evaluation of health interventions targeting children. In 2009, the number of CUAs in the Paediatric Economic Database Evaluation (PEDE) overtook the number of cost-effectiveness analyses (CEAs) [24]. Although health utility data are vital inputs to CUAs, analysts rarely estimate utility values using primary research methods unless the economic evaluation is conducted alongside a prospectively designed study with individual-level data [25]. Decision-analytic models typically contain several health states of interest, each requiring utility estimates sourced from primary studies or systematic reviews. In such cases, the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) guideline [26] recommends that analysts identify and extract health utilities from multiple sources in the published literature and synthesise them where appropriate.

Catalogues and reviews of health utility values for childhood populations are available, but have largely focussed on a relatively small number of conditions, such as acute lymphoblastic leukaemia [27, 28], asthma [29], neurodisability [30] and childhood obesity [31], or on a single valuation source, such as EQ-5D [32], HUI3 [33] and SG [34]. The PEDE project is an alternative source of childhood health utility values and contains 2112 values from 857 CUAs in paediatric populations published between 1980 and 2016 [35]. However, only 8.5% (n = 73) of these CUAs conducted primary estimation of utility values [36]. Furthermore, published CUAs represent only one source of childhood health utility values.

This paper addresses three aims against a backdrop of rapid development of measurement approaches and methods guidance in this area. First, it systematically describes the patterns of primary studies and samples measuring childhood health utilities, by study design and sample characteristics, as well as patterns in the source of tariffs when MAUIs compatible with or specific to children are used. Second, linear trends in the numbers of studies and samples, their associations with the number of paediatric CUAs and changes in proportions of studies and samples before and after key transition timepoints are explored. Third, associations between valuation methods and sample age and other methodological factors are explored.

Methods

Systematic review

The systematic review followed PRISMA guidelines [37] and covered all studies published by 30th June 2017. The search strategy (Supplementary Material Table S1), inclusion and exclusion criteria and databases included—PubMed, Embase of OVID Medline, Web of Science, PsycINFO, Cochrane Library, CINAHL and EconLit—were based on the study by Kwon and colleagues [38], which covered a period up to 31 December 2015 and focused only on static descriptions of included studies and samples alongside a meta-regression of childhood utility values. The search strategy was developed and piloted prior to implementation and included an intersection of health utility, valuation method and childhood search terms. Non-English language articles were excluded. The PEDE database was also searched to identify CUAs published between 1980 and 2016 that incorporated primary estimation of health utilities.

Articles meeting the inclusion criteria were English-language primary studies reporting health utilities for childhood populations or for childhood conditions or descriptors using direct or indirect valuation methods. Two reviewers (JK and SWK) independently assessed titles and abstracts. Articles approved by both reviewers proceeded to the next stage; those approved by only one were referred to a third reviewer (SP) for arbitration. At the second stage, the same two reviewers assessed full-text articles, with disagreements referred to the third reviewer. Conference abstracts were included if they reported original health utilities. Studies reporting primary VAS values were included despite disagreement about the validity of VAS values for QALY construction [39].
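The dual-review screening rule can be written as a small decision function. This is a hypothetical sketch with illustrative names. The text only states what happens with two approvals and with one, so the sketch assumes that an article approved by neither reviewer is excluded.

```python
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed to next stage"
    ARBITRATE = "refer to third reviewer"
    EXCLUDE = "exclude"


def screen(reviewer_1_approves: bool, reviewer_2_approves: bool) -> Decision:
    """Apply the dual-review rule to one article at one screening stage."""
    approvals = int(reviewer_1_approves) + int(reviewer_2_approves)
    if approvals == 2:
        return Decision.PROCEED
    if approvals == 1:
        return Decision.ARBITRATE
    # Assumption: an article approved by neither reviewer is excluded.
    return Decision.EXCLUDE


def final_decision(r1: bool, r2: bool, arbiter_approves: bool = False) -> bool:
    """True if the article proceeds; the arbiter is consulted only on disagreement."""
    decision = screen(r1, r2)
    if decision is Decision.ARBITRATE:
        return arbiter_approves
    return decision is Decision.PROCEED


print(screen(True, False))                # Decision.ARBITRATE
print(final_decision(True, False, True))  # True
```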

Data extraction

From each study that met the inclusion criteria, the variables listed in Supplementary Material Table S2 were extracted using a proforma. These included bibliographic details, study design, study setting, valuation method, tariff applied to MAUIs (including source population and valuation method), respondent type, administration mode, sample target age(s), whether the health state was experienced or hypothetical, sample size and geographic setting. An International Classification of Diseases 10th revision (ICD-10) code was allocated to each sample according to the health condition studied.

Descriptive analyses

The distribution of mean and median utility and VAS scores (VAS rescaled to 0–1) extracted from included samples was calculated. The numbers of studies and samples by publication year were counted, together with the annual number of CUAs included in the PEDE database. Valuation methods applied to studies within each ICD-10 chapter were described. The numbers of samples were counted by:

- respondent type;
- administration mode;
- target age(s) of sample;
- geographical factors, including continent of origin and national income level (income classification taken from the World Bank [40]).

Study designs and valuation methods were grouped into key categories. For applications of MAUIs compatible with or specific to childhood populations, the following were described by MAUI: the tariffs applied, the source population of their values (age group and setting/nationality) and the valuation method used to derive them (e.g. VAS, SG, TTO).
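The text does not specify how VAS scores were rescaled to 0–1. This sketch assumes a common approach: a linear transformation of the scale's endpoints, for example dividing an EQ-5D VAS score on its 0–100 scale by 100. The helper `rescale_vas` is hypothetical.

Such rescaling changes only the units. VAS endpoints (best and worst imaginable health on the EQ-5D VAS) are not the same anchors as the death and perfect-health anchors of the utility scale.

```python
def rescale_vas(score: float, lower: float = 0.0, upper: float = 100.0) -> float:
    """Linearly map a VAS score from [lower, upper] onto [0, 1].

    Assumes the scale endpoints are the anchors (e.g. 0 and 100) and that
    the score lies within them.
    """
    if upper <= lower:
        raise ValueError("upper anchor must exceed lower anchor")
    if not lower <= score <= upper:
        raise ValueError("score lies outside the scale anchors")
    return (score - lower) / (upper - lower)


print(rescale_vas(72))              # 0.72 (0-100 scale)
print(rescale_vas(6.5, 0.0, 10.0))  # 0.65 (hypothetical 0-10 scale)
```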

Statistical analyses

Linear regression was used to test for linear trends in the numbers of PEDE-based CUAs, utility studies and utility samples by year of publication. Controlling for these annual trends, associations between the number of PEDE-based CUAs and the numbers of utility studies and samples were also estimated.

We tested the hypothesis that the composition of childhood utility data changed significantly, categorised by study- and sample-level characteristics: study design, health condition, valuation method, age of target population, respondent type, administration mode and valuation of hypothetical health states. Tests of proportions at the 5% significance level were used to assess whether the proportion in each category changed significantly before and after a pre-specified transition point of 2009. This was the year in which CUAs became the predominant analytic approach for paediatric economic evaluations within the PEDE database [24].

In a separate analysis restricted to UK studies and samples, 2013 was specified as the transition point. In that year, NICE identified the EQ-5D-Y as its reference instrument for utility assessment in children aged 7–12 years, while reaffirming the EQ-5D as its reference instrument for those aged 13 years and over [2]. The hypothesis was that the guidance had a significant effect on the design of UK-based primary studies, marked by a greater proportion of samples using the EQ-5D-Y for pre-adolescent populations (age < 12 years) and smaller proportions using non-reference direct or indirect valuation methods.

Finally, two-way Pearson’s chi-square tests at the 5% significance level were used to assess associations between valuation method and sample age, respondent type, administration mode and valuation of hypothetical health states. The hypothesis tested was that the methodological factors selected by researchers are significantly associated with one another.
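The text does not state which test of proportions was used. As a sketch under that uncertainty, the pooled two-sample z-test below compares the share of samples in a category before and after a transition year and uses the normal approximation for a two-sided P value. The function name and example counts are hypothetical.

```python
import math


def two_proportion_ztest(x_before: int, n_before: int, x_after: int, n_after: int):
    """Pooled two-sample z-test for a change in proportion.

    Returns (difference in proportions, z statistic, two-sided P value).
    Relies on the normal approximation, so expected counts should not be small.
    """
    p_before = x_before / n_before
    p_after = x_after / n_after
    pooled = (x_before + x_after) / (n_before + n_after)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_before + 1 / n_after))
    if se == 0:
        raise ValueError("proportion is 0 or 1 in both periods; test undefined")
    z = (p_after - p_before) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return p_after - p_before, z, p_value


# Hypothetical counts: 30 of 200 samples before 2009 vs 60 of 200 afterwards.
diff, z, p = two_proportion_ztest(30, 200, 60, 200)
print(f"{diff:.1%} point change, z = {z:.2f}, P = {p:.4f}")
```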

Results

Systematic review

Figure 1 presents the PRISMA flow diagram for the systematic review. Articles were mainly excluded because they:

- were secondary studies using decision-analytic models;
- were previous systematic reviews (which were retained for manual searching);
- targeted adult populations; or
- focused on non-preference-based health outcome measures.

A total of 274 articles were included for data extraction. Manual searching yielded 45 further articles, and a search of PEDE yielded a further 16. In total, data were extracted from 335 articles. Supplementary Material Table S3 summarises them by health condition, intervention type (where applicable), country of study population, valuation method, respondent type, administration mode and age of target population.

Fig. 1

PRISMA flow diagram. Note: PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses; HRQoL, health-related quality of life; PEDE, Paediatric Economic Database Evaluation

Descriptive analyses

Distribution of sample means and medians

Descriptive statistics for health utilities were extracted from 3974 samples contained within the 335 studies. Most studies (306 of 335) contained two or more samples, delineated by health condition/state, sociodemographic factors (e.g. gender, age) or methodological factors. Figure 2a depicts the frequency distribution of 3573 mean utility and VAS scores, with each bar stratified by valuation method. This excludes 191 samples that reported only mean change in utility scores or regression coefficients. Fig. 2b does the same for 870 median utility and VAS scores (some samples reported both mean and median scores).

The distributions of mean and median scores differ visibly, with much greater negative skew among median scores. Among samples reporting mean scores, 0.34% (12 of 3573) showed a ceiling effect, namely a mean score of 1. The corresponding proportion was 7.36% (64 of 870) among samples reporting median scores.

On visual inspection, trade-off-based direct valuation methods (TTO, SG and their variants) appear to be more concentrated at the upper end of both the mean and median utility score ranges. These methods accounted for 17.9% of all samples with mean utility scores higher than 0.800, but only 10.5% of all samples with mean scores ≤ 0.800. The corresponding proportions for median utility scores were 33.0% (> 0.800) and 14.0% (≤ 0.800).
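This paragraph reports two summary measures: the share of samples at the ceiling, and the share of trade-off-based samples above and below 0.800. Both can be reproduced in a few lines of Python. The records and method labels in the example are hypothetical. Only the final line uses counts reported above: 12 of 3573 means and 64 of 870 medians equal to 1.

```python
from typing import Iterable, List, Tuple

# Hypothetical labels for the trade-off-based direct valuation methods.
TRADE_OFF = {"TTO", "SG", "chained gamble", "adjusted SG"}


def ceiling_share(scores: Iterable[float]) -> float:
    """Proportion of sample-level scores exactly at the ceiling of 1."""
    values = list(scores)
    return sum(1 for s in values if s == 1.0) / len(values)


def trade_off_shares(records: List[Tuple[float, str]], threshold: float = 0.8):
    """Share of trade-off-based samples among scores > threshold and <= threshold."""

    def share(methods: List[str]) -> float:
        if not methods:
            return float("nan")
        return sum(m in TRADE_OFF for m in methods) / len(methods)

    above = [method for score, method in records if score > threshold]
    at_or_below = [method for score, method in records if score <= threshold]
    return share(above), share(at_or_below)


print(ceiling_share([1.0, 0.95, 1.0, 0.8]))  # 0.5
print(trade_off_shares([(0.92, "TTO"), (0.85, "HUI3"), (0.70, "EQ-5D"), (0.60, "SG")]))
print(f"{12 / 3573:.2%}, {64 / 870:.2%}")  # 0.34%, 7.36%
```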

Fig. 2

a Distribution of mean utility and VAS scores (n = 3573) by valuation method. Note: VAS, visual analogue scale; TTO, time trade-off; SG, standard gamble; MAUI, multi-attribute utility instrument; NPB, utility mapped from non-preference-based instrument. b Distribution of median utility and VAS scores (n = 870) by valuation method. Note: VAS, visual analogue scale; TTO, time trade-off; SG, standard gamble; MAUI, multi-attribute utility instrument

Distribution of samples by health condition and valuation method

Table 1 summarises the distribution of included samples by ICD-10 chapter. All ICD-10 chapters relevant to childhood health were covered by samples included in the systematic review. The ICD-10 mental and behavioural disorders chapter contained the highest number of samples (n = 698), followed by general childhood population health (n = 501) and cancer (n = 442). Across all ICD-10 chapters, 180 unique ICD-10 codes were used to label samples.

Table 1 Number of samples by ICD-10 chapter and the most frequently used valuation methods by ICD-10 chapter (% of samples in chapter)

Twenty-three unique valuation methods were identified. These were grouped into six key categories:

(i) VAS: EQ-5D VAS (number of samples = 348; 8.8%), EQ-5D-Y VAS (n = 232; 5.8%) and stand-alone VAS (n = 252; 6.3%).

(ii) Trade-off-based direct valuation methods: TTO (n = 171; 4.3%), SG (n = 227; 5.7%) and chained gamble and adjusted SG (n = 143; 3.6%).

(iii) Adult-specific MAUIs: EQ-5D (n = 424; 10.7%), SF-6D (n = 34; 0.9%), AQoL-5D (n = 16; 0.4%) and 15D (n = 2; 0.05%).

(iv) MAUIs compatible with both childhood and adult populations: QWB (n = 224; 5.6%), HUI2 (n = 482; 12.1%), HUI3 (n = 822; 20.7%), modified HUI (n = 8; 0.2%) and ABC-UI (Aberrant Behaviour Checklist Utility Index) [43] (n = 1; 0.03%). The modified HUI comprises a 10-dimension variant of HUI [41] and HUI3 with ‘worst imaginable health’ as 0 instead of death [42].

(v) Childhood-specific MAUIs: EQ-5D-Y (n = 108; 2.7%), CHU9D (n = 231; 5.8%), 16D (n = 73; 1.8%), 17D (n = 39; 1.0%), AQoL-6D (n = 50; 1.3%), PAHOM (Pediatric Asthma Health Outcome Measure) [44] (n = 69; 1.7%) and CH-6D (Child Health-6 Dimensions) [45] (n = 3; 0.08%).

(vi) Utility indices mapped from non-preference-based clinical measures [46,47,48,49,50] (n = 15; 0.4%).
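The six-category grouping can be expressed as a nested mapping from category to method to sample count. The sketch below uses the counts reported in this paragraph, entering chained gamble and adjusted SG jointly as reported, and checks that they sum to the 3974 extracted samples. The dictionary layout and function name are illustrative.

```python
from typing import Dict, Tuple

# Sample counts per valuation method, as reported in the text, grouped into six categories.
METHOD_COUNTS: Dict[str, Dict[str, int]] = {
    "VAS": {"EQ-5D VAS": 348, "EQ-5D-Y VAS": 232, "Stand-alone VAS": 252},
    "Trade-off-based direct": {"TTO": 171, "SG": 227, "Chained gamble / adjusted SG": 143},
    "Adult-specific MAUI": {"EQ-5D": 424, "SF-6D": 34, "AQoL-5D": 16, "15D": 2},
    "Childhood and adult MAUI": {
        "QWB": 224, "HUI2": 482, "HUI3": 822, "Modified HUI": 8, "ABC-UI": 1,
    },
    "Childhood-specific MAUI": {
        "EQ-5D-Y": 108, "CHU9D": 231, "16D": 73, "17D": 39,
        "AQoL-6D": 50, "PAHOM": 69, "CH-6D": 3,
    },
    "Mapped from non-preference-based measure": {"Mapping": 15},
}


def category_shares(
    counts: Dict[str, Dict[str, int]]
) -> Tuple[int, Dict[str, Tuple[int, float]]]:
    """Return the grand total and, per category, (sample count, share of all samples)."""
    totals = {category: sum(methods.values()) for category, methods in counts.items()}
    grand_total = sum(totals.values())
    return grand_total, {cat: (n, n / grand_total) for cat, n in totals.items()}


grand_total, shares = category_shares(METHOD_COUNTS)
print(grand_total)  # 3974
for category, (n, share) in shares.items():
    print(f"{category}: n = {n} ({share:.1%})")
```

Run as written, it gives the following category totals:

- VAS: 832 (20.9%)
- trade-off-based methods: 541 (13.6%)
- adult-specific MAUIs: 476 (12.0%)
- MAUIs compatible with both populations: 1537 (38.7%)
- childhood-specific MAUIs: 573 (14.4%)
- mapping: 15 (0.4%)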

Table 1 describes the three most frequently used valuation methods in each ICD-10 chapter. HUI2 or HUI3 was used by 89.1% of cancer samples, consistent with these measures having originally been developed for paediatric cancer patients and survivors [51, 52]. Childhood-specific MAUIs were the most frequently used valuation method in only 3 of 20 chapters: CHU9D (27.4% for general health), PAHOM (35.8% for respiratory system disorders) and 16D/17D (20.0% for digestive system disorders). MAUIs compatible with both childhood and adult populations were the most frequently used methods in eight chapters: HUI2 for chapter 2; HUI3 for chapters 6, 8, 16 and 17 and for combined chronic diseases; and QWB for chapter 19.

Distribution of samples by methodological factors, age and geographical setting

Table 2 presents the number of samples by respondent type, administration mode, target age of children and geographical setting. More than a third of samples (n = 1498; 37.7%) used self-assessment by children. A further 456 samples (11.5%) allowed children to generate responses together with proxies. The remaining samples relied on proxy assessment, most commonly by parents (n = 1091; 27.5%).

Administration modes fell broadly into two categories: (i) self-administered surveys, completed in school or clinical settings, by mail or online, or through Delphi elicitation of clinicians [53]; and (ii) interviewer-administered surveys conducted face to face or by telephone.

Target age groups spanned the whole of childhood. Six percent of samples contained infants (aged < 2 years). Some samples (n = 41) had a minimum age of 18 years even though they specified adolescents as the target group. A substantial proportion of samples (14.4%) did not report any information on age.

Samples were heavily concentrated in high-income countries (91.6%). The US produced the highest number of samples (n = 970; 24.4%), followed by Canada (n = 674; 17.0%) and the UK (n = 634; 16.0%).

Table 2 Number of samples by respondent type, mode of administration, age of children and geography (% of all samples)

MAUIs and tariff application

Table 3 lists the tariffs used in applications of MAUIs compatible with or specific to childhood populations, together with their valuation populations and methods.

All 16D and 17D samples that provided this information applied tariffs derived from children or their proxies. For the 16D, the tariff was elicited from Finnish schoolchildren aged 12–16 years using VAS [19]. For the 17D, it was elicited from proxies (Finnish parents of children aged 8–11 years) using VAS [20]. For the AQoL-6D, 72.0% of samples applied adolescent-derived tariffs [54], while 28.0% applied the general adult-derived tariff [55]. Adolescent-derived tariffs [56, 57] were used in 46.3% of CHU9D samples and general adult-derived tariffs [58] in 52.8%. Only a single EQ-5D-Y sample applied a childhood-derived tariff [59], while 78.7% of EQ-5D-Y samples applied general adult-derived tariffs from 10 countries [60,61,62,63,64,65,66,67].

Overall, 54.3% of samples using childhood-specific MAUIs applied childhood-derived tariffs, 40.0% applied adult-derived tariffs and 5.7% gave no information on the underpinning tariff. For MAUIs compatible with both childhood and adult populations, 26.4% of samples applied childhood-derived tariffs, while 67.5% applied adult-derived tariffs. The HUI2 has both childhood-derived [68,69,70] and adult-derived [71] tariffs, whereas the HUI3, QWB and ABC-UI have only adult-derived tariffs [72,73,74].

Table 3 Multi-attribute utility instruments (MAUIs) developed for children and tariff valuation population and method

Statistical analyses

Linear trend in utility studies and samples and PEDE CUAs

Figure 3 shows trends in the numbers of utility studies and samples alongside the number of paediatric CUAs in the PEDE database. It also marks 2009 and 2013, the transition points for the two periodic comparisons analysed below. Upward trends are visible in all three series and were confirmed by tests for linear trend. The coefficients for year of publication were:

- number of utility studies: 1.15 (95% confidence interval (CI) 0.96 to 1.36; P < 0.001);
- number of utility samples: 12.70 (95% CI 9.32 to 16.08; P < 0.001);
- number of PEDE CUAs: 3.63 (95% CI 3.01 to 4.25; P < 0.001).

The number of utility studies increased at a steady rate across the whole review period, whereas the numbers of utility samples and PEDE CUAs increased markedly in the late 2000s.
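A test for linear trend of this kind amounts to an ordinary least-squares regression of the annual count on publication year, where the slope is the expected change in count per year. The stdlib-only sketch below returns the slope and its standard error. The function name and example series are hypothetical. A confidence interval additionally requires a t critical value with n − 2 degrees of freedom (about 3.182 for the five-point example).

```python
import math
from typing import Sequence, Tuple


def linear_trend(years: Sequence[float], counts: Sequence[float]) -> Tuple[float, float]:
    """OLS slope of counts on year and its standard error (n - 2 df)."""
    n = len(years)
    if n != len(counts) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    slope = sxy / sxx
    # Residuals use centred years to avoid cancellation with large year values.
    rss = sum((y - mean_y - slope * (x - mean_x)) ** 2 for x, y in zip(years, counts))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se


# Hypothetical annual counts of utility studies.
slope, se = linear_trend([2000, 2001, 2002, 2003, 2004], [1, 3, 2, 5, 4])
t_crit = 3.182  # two-sided 95% t critical value for 3 degrees of freedom
print(f"slope = {slope:.2f} (95% CI {slope - t_crit * se:.2f} to {slope + t_crit * se:.2f})")
```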

Fig. 3

Number of studies and samples, 1990–June 2017, and CUAs in PEDE, 1990–2016. Note: category 2017 denotes papers published up to 30 June 2017; PEDE, Paediatric Economic Database Evaluation; CUAs, cost–utility analyses; NICE, National Institute for Health and Care Excellence

Association between PEDE CUAs and utility studies and samples

Linear regression of the number of utility studies on the number of CUAs, controlling for linear time trend, showed no association (coefficient on CUAs: 0.114; standard error (SE): 0.763; P value: 0.097). There was similarly no association between the number of utility samples and the number of CUAs (coefficient: 0.624; SE: 1.189; P value: 0.605). Nor was any association found between the proportion of paediatric economic evaluations that were CUAs and the numbers of primary utility studies and samples.
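Controlling for a linear time trend can be illustrated with the Frisch–Waugh–Lovell result. In a regression of utility-study counts on CUAs and year, the coefficient on CUAs equals the slope obtained after removing the linear year trend from both series. The sketch below computes that partial coefficient and its standard error, using n − 3 degrees of freedom for a model with intercept, year and CUAs. The function names and data are hypothetical. The sketch illustrates the approach and does not reproduce the authors' analysis.

```python
import math
from typing import List, Sequence, Tuple


def detrend(values: Sequence[float], years: Sequence[float]) -> List[float]:
    """Residuals from an OLS regression of values on year (with intercept)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values)) / sxx
    return [y - mean_y - slope * (x - mean_x) for x, y in zip(years, values)]


def partial_coefficient(
    outcome: Sequence[float], exposure: Sequence[float], years: Sequence[float]
) -> Tuple[float, float]:
    """Coefficient on `exposure` controlling for a linear year trend, with its SE."""
    df = len(years) - 3  # intercept, year and exposure
    if df <= 0:
        raise ValueError("need more than three observations")
    ry = detrend(outcome, years)
    rx = detrend(exposure, years)
    sxx = sum(r * r for r in rx)
    if sxx == 0:
        raise ValueError("exposure is collinear with the year trend")
    beta = sum(a * b for a, b in zip(rx, ry)) / sxx
    rss = sum((b - beta * a) ** 2 for a, b in zip(rx, ry))
    return beta, math.sqrt(rss / df / sxx)


# Hypothetical series built so that studies = 2 + 0.5 * (year - 2000) + 3 * CUAs exactly.
years = list(range(2000, 2008))
cuas = [1, 4, 2, 8, 5, 9, 7, 12]
studies = [2 + 0.5 * (y - 2000) + 3 * c for y, c in zip(years, cuas)]
print(partial_coefficient(studies, cuas, years))  # coefficient close to 3, SE close to 0
```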

Periodic change in composition of utility studies and samples

Table 4 summarises the numbers of studies and samples by categories of study design, health condition, valuation method, target age, respondent type, administration mode and valuation of hypothetical health states. Supplementary Material Tables S4–S7 present the results across 5-year publication intervals. Comparison of two broader periods, 1990–2008 and 2009–June 2017, showed a significant decline in the proportion of studies that were patient case series (10.2% point decrease; P = 0.013).

Table 4 Periodic change in proportion of studies and samples in each study design and sample characteristic category

In terms of health condition, comparisons between the two periods revealed significant changes in proportion of samples in all ICD-10 chapters except musculoskeletal system disorders. Categories that saw the greatest change were cancer (18.7% point decrease; P < 0.001), general health (15.1% point increase; P < 0.001), mental and behavioural disorders (12.1% point increase; P < 0.001) and injury (11.4% point decrease; P < 0.001).

There were also significant changes in valuation method, respondent type, administration mode and valuation of hypothetical health states.

For valuation method, childhood-specific MAUIs saw the greatest increase (15.6% point increase; P < 0.001), whilst trade-off-based direct valuation methods saw the greatest decrease (15.7% point decrease; P < 0.001). However, use of adult-specific MAUIs (4.5% point increase; P < 0.001) and of VAS approaches (7.4% point increase; P < 0.001) also increased significantly. Use of MAUIs compatible with both childhood and adult populations decreased significantly (12.0% point decrease; P < 0.001).

There was a significant shift towards self-assessment of health status by children (24.0% point increase; P < 0.001) and towards self-administered surveys in school and clinic settings (29.9% point increase; P < 0.001). Similarly, the proportion of samples valuing experienced rather than hypothetical health states increased significantly (16.8% point increase; P < 0.001).

As for age of target population, there was a significant shift in focus to pre-adolescent populations (4.2% point increase; P = 0.011). There was also a significant fall in the proportion of samples failing to report age (7.1% point decrease; P < 0.001).

Influence of HTA guidance on primary samples

Table 5 presents similar tests of proportions comparing two periods: 1990–2012, before publication of the 2013 NICE HTA guidance, and 2013–June 2017. The tests cover 355 UK samples containing pre-adolescents (mean/median or minimum age 12 years or below) and 207 UK samples not containing pre-adolescents.

Samples containing pre-adolescents increased significantly as a proportion of all samples, from 45.0% in 1990–2012 to 85.1% in 2013–June 2017 (40.1% point increase; P < 0.001). Seventy-two samples in 1990–2012 did not report the age of the target population, compared with none in 2013–June 2017.

We hypothesised that publication of the NICE guidance would increase use of the reference instruments in the relevant age groups and reduce use of non-reference instruments. Evidence for this hypothesis was mixed. Among samples containing pre-adolescents, use of the EQ-5D-Y increased from 3.9% of samples in 1990–2012 to 16.9% in 2013–June 2017 (13.0% point increase; P < 0.001). Use of the non-reference HUI2/3 and trade-off-based methods fell from 75.4% and 5.8% of samples, respectively, in 1990–2012 to no samples in 2013–June 2017. However, use of the EQ-5D also increased, from 11.1% to 32.4% (21.3% point increase; P < 0.001). So did use of the non-reference CHU9D (34.5% point increase; P < 0.001) and of VAS approaches (12.3% point increase; P < 0.001).

Table 5 UK samples by valuation method and period before and after 2013 NICE guideline

Associations between methodological factors

Table 6 displays the associations of valuation method with target age and with three other methodological factors: respondent type, administration mode and valuation of hypothetical health states. In all four tests, the null hypothesis of no association was rejected (all P < 0.001).

The results suggest that VAS approaches are more likely to be used in pre-adolescent samples and to be completed by children themselves using self-administered surveys. However, VAS approaches are also more likely to be used to assess hypothetical rather than experienced health states.

Trade-off-based direct valuation methods are more likely to be used in older adolescent samples and/or to involve proxy assessment. They are also more likely to be interviewer-administered and to be used for hypothetical health states.

The associative patterns varied across the three categories of MAUIs. Adult-specific MAUIs were more likely to be used in adolescent samples, whereas MAUIs compatible with or specific to childhood populations were both more likely to be used in pre-adolescent samples. Adult-specific MAUIs were more likely to involve self-assessment by children, and MAUIs compatible with both populations were more likely to involve proxy assessment. All MAUI categories were more likely to be self-administered than interviewer-administered and to be used to assess experienced rather than hypothetical health states.
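Pearson's chi-square test of association compares the observed counts in a contingency table (for example, valuation-method category by age group) with the counts expected under independence. The stdlib-only sketch below computes the statistic, its degrees of freedom and an upper-tail P value. The P value uses the standard series and continued-fraction evaluations of the incomplete gamma function. The function names and example table are hypothetical. No continuity correction is applied; by contrast, `scipy.stats.chi2_contingency` applies Yates' correction by default when there is one degree of freedom.

```python
import math
from typing import Sequence, Tuple

_EPS = 1e-14
_TINY = 1e-300


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    log_prefactor = a * math.log(z) - z - math.lgamma(a)
    if z < a + 1:
        # Series for the regularised lower incomplete gamma P(a, z); return 1 - P.
        term = total = 1.0 / a
        n = 1
        while abs(term) > abs(total) * _EPS and n < 10_000:
            term *= z / (a + n)
            total += term
            n += 1
        return max(0.0, 1.0 - math.exp(log_prefactor) * total)
    # Continued fraction (modified Lentz) for the regularised upper gamma Q(a, z).
    b = z + 1.0 - a
    c = 1.0 / _TINY
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = _TINY if abs(d) < _TINY else d
        c = b + an / c
        c = _TINY if abs(c) < _TINY else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < _EPS:
            break
    return math.exp(log_prefactor) * h


def pearson_chi2(table: Sequence[Sequence[int]]) -> Tuple[float, int, float]:
    """Pearson chi-square statistic, degrees of freedom and P value for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            if expected == 0:
                raise ValueError("table has an empty row or column")
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical 2 x 2 table: VAS vs other methods (rows) by pre-adolescent vs adolescent samples.
stat, df, p = pearson_chi2([[10, 20], [20, 10]])
print(f"chi2 = {stat:.3f}, df = {df}, P = {p:.4f}")
```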

Table 6 Association between valuation method and (1) age of sample; (2) respondent type; (3) administration mode and (4) valuation of hypothetical states

Discussion

This study represents the most comprehensive systematic review of primary studies reporting health utilities for childhood conditions, covering all studies published up to 30 June 2017. Across 3974 samples from 335 studies, we observed all ICD-10 chapters relevant to childhood health, 23 valuation methods, all childhood ages, 12 respondent types, 8 administration modes and 42 country settings.

There were strong upward linear trends in the numbers of utility studies and samples and of PEDE CUAs. After controlling for linear trend, there was no statistically significant association between the number of PEDE CUAs and the numbers of utility studies and samples.

Taking 2009 as a key transition point (when CUA became the predominant analytic approach for paediatric economic evaluations [24]), the study found significant changes in the composition of primary samples across health condition, valuation method and other methodological factors. There was also evidence of more refined approaches in primary utility research, reflected in greater coverage of pre-adolescent populations, more complete reporting of target age and more frequent valuation of experienced health states.

For the subset of UK samples, using 2013 as the transition point (when the NICE HTA guidance [2] was published), there was weak evidence that primary utility research adhered to national guidance. Finally, tests of association found that sample age, respondent type, administration mode and valuation of hypothetical health states varied significantly by valuation method.

Previous systematic reviews of primary studies assessing childhood health utilities have largely focused on specific diseases [27,28,29,30,31] or valuation methods [32]. Few reviews have assembled data across heterogeneous childhood health conditions, valuation methods and other methodological factors.

A review by Tarride and colleagues [54] included 77 studies measuring utilities published before 2008. It limited its scope to asthma, cancer, combined chronic diseases, diabetes and skin disease, and to a small number of valuation methods, namely HUI, EQ-5D, SG and TTO. Temporal patterns in the included studies and samples were not explored.

A review by Thorrington and Eames [55] included 90 studies published in 1994–2013 measuring childhood health utilities using direct or indirect valuation methods. Its analysis of temporal patterns produced findings similar to ours: adult-specific valuation methods increased in use despite the availability of childhood-specific measures, whilst childhood-derived tariffs were seldom applied. However, the data were not categorised by health condition or age of sample, and no statistical tests were performed to evaluate variation over time or associations between factors.

Finally, a review by Kwon and colleagues [38] used the same search strategy to identify 272 studies published by 31 December 2015, and therefore overlaps substantially in data sources with this study. However, its descriptive and statistical analyses focused on non-temporal patterns in studies and samples. Through meta-regression, it also examined associations of mean utility values with health conditions and methodological factors. It made no comparison with PEDE CUAs and did not assess the impact of HTA methods guidance on primary research.

To our knowledge, this review is the first to summarise the distribution of all mean and median utility scores relevant to childhood health. The results show a clear disparity in distributional features between sample means and sample medians. The ceiling effect, known to be significant in distributions of individual-level utility data [56], was visible in the distribution of medians but small in that of means. Negative skew in individual-level utility data pulls means below medians, which is likely why the ceiling effect is less pronounced in the distribution of means. Caution is therefore warranted when using aggregate or pooled outcomes in economic evaluations. Their distributional features depend on the central statistic chosen and may also differ from those of the underlying individual-level data [57].

The distributional skew also varied by valuation method, with means and medians from trade-off-based direct valuation methods (TTO, SG and variants) concentrated at the upper end of the utility range. This is consistent with several meta-regression studies [58,59,60,61] of health utilities for diverse conditions, which also found higher utilities from TTO and SG than from other valuation approaches. One possible explanation is biases in making trade-offs, such as loss aversion and risk aversion [62].
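A single hypothetical sample shows how the mean and median can diverge. When most respondents report perfect health and a minority report much lower values, the distribution is negatively skewed. The median then sits at the ceiling of 1, while the mean is pulled below it. The utility values below are invented purely for illustration.

```python
from statistics import mean, median

# Hypothetical individual-level utilities for one sample: most respondents
# report perfect health (1.0), while a few report much lower values.
utilities = [1.0, 1.0, 1.0, 1.0, 1.0, 0.9, 0.8, 0.5, 0.2]

print(median(utilities))          # 1.0   (the median sits at the ceiling)
print(round(mean(utilities), 3))  # 0.822 (the mean is pulled below it)
```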

The review is also the first to categorise all primary samples by ICD-10 chapter and code and to document their compositional changes over time. It is of interest to ascertain how well the distribution of utility samples by ICD-10 chapter reflects the prevalence of health needs in children. Were and colleagues [63] noted a transition in childhood morbidity patterns from infectious diseases towards congenital anomalies, injuries and non-communicable diseases, including obesity, diabetes, cancer and respiratory diseases, while Laski and colleagues [64] highlighted mental health as an emerging priority for adolescent populations. These observations are reflected in our review data: between the two primary time periods, 1990–2008 and 2009–June 2017, the proportion of samples for infectious diseases fell, while the proportions for endocrine, nutritional and metabolic disorders, mental and behavioural disorders, respiratory system disorders and congenital malformations increased significantly, although those for cancer and injuries fell. The distribution of samples by health condition can also be compared with the focus of paediatric CUAs. Although the latter may reflect policy interest more than underlying health needs, primary data collection should support such secondary research. Sullivan and Ungar [65] catalogued the distribution of paediatric CUAs by ICD-9/10 chapter. Infectious diseases were the dominant topic, accounting for 40.6% of all paediatric CUAs between 1980 and 2013, followed by nervous system disorders (8.9%), respiratory system disorders (7.8%) and diseases of the blood (5.9%). This contrasts with our utility data, in which these categories accounted for only 4.9%, 2.9%, 4.8% and 2.1% of samples, respectively. Mental health accounted for the largest share of utility samples (17.6%) but just 4.3% of paediatric CUAs. This lack of concordance is consistent with the absence of an association between the number of paediatric CUAs and the numbers of utility studies and samples after controlling for linear trend. Future studies should assess the extent to which primary utility data facilitate CUAs, and the extent to which CUAs motivate primary utility research.

Another contribution of the review is its documentation of changes in valuation methods over time. There was a clear shift away from trade-off-based direct valuation methods towards MAUIs, which require only simple responses on a classification system, with valuation performed separately using tariffs derived from representative populations. The proportion of samples using trade-off-based direct valuation methods fell from 23.7% in 1990–2008 to 8.0% in 2009–June 2017, whereas the proportion using MAUIs of any type increased from 59.9% to 67.9% over the same periods. This may reflect concern over the feasibility of applying cognitively demanding assessments to children [15], as well as the growing number of national guidelines that recommend MAUIs over direct valuation methods [2,3,4,5,6]. The largest growth occurred for childhood-specific MAUIs, which increased from 4.4% of samples in 1990–2008 to 20.0% in 2009–June 2017. Tests of association also showed that samples using childhood-specific MAUIs were more likely to be self-assessed, self-administered and focused on experienced health states. Whether this represents an unambiguous improvement in childhood health utility assessment remains an open question. Childhood-specific MAUIs have psychometric properties suited to children [15,16,17,18,19] and are recommended by some decision-making bodies [2, 6]. However, their use raises concerns over the comparability of their utilities with those of adult-specific MAUIs, particularly when used as inputs into life-course decision models. Moreover, a review of childhood-specific MAUIs found no study mapping between childhood- and adult-specific utilities [22].
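The two-way tests of association referred to here can be illustrated with a minimal Pearson chi-square test on a 2x2 table. The counts are hypothetical (rows: childhood-specific MAUI versus other methods; columns: self-assessed versus proxy-assessed samples), and the function name `chi_square_2x2` is introduced only for this sketch. The p-value uses the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Returns the test statistic and its p-value on 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 df, chi-square = Z**2, so P(X > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts (illustrative only, not the review's data).
# Rows: childhood-specific MAUI, other methods; columns: self-assessed, proxy-assessed.
counts = [[60, 20],
          [150, 170]]
stat, p = chi_square_2x2(counts)
print(f"chi-square = {stat:.2f}, p = {p:.2g}")
```

For real analyses, `scipy.stats.chi2_contingency` performs the same test on larger tables; by default it applies Yates' continuity correction to 2x2 tables, so its statistic will be somewhat smaller than the uncorrected value computed here.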

The above concern over comparability may explain why the publication of HTA guidance by NICE had a mixed effect on primary utility assessments. Among UK samples containing pre-adolescent populations, the proportion using the adult-specific EQ-5D increased by more than the proportion using the childhood-specific EQ-5D-Y after the guidance was published in 2013. Adlard and colleagues [66], after reviewing 43 paediatric CUAs set in the UK and published between 2004 and 2010, noted the paucity of childhood-specific utility data and recommended increased funding for primary research that follows NICE guidance. There is only weak evidence that this has occurred since 2013. It is conceivable that NICE guidance has encouraged the use of an adult-specific reference instrument (i.e. the EQ-5D) in both primary and secondary research, particularly for life-course models that prioritise comparability of utility scores across the whole age spectrum.

This study also catalogues all tariffs applied in MAUI-based studies, together with the populations from which they were derived and their valuation methods. Around one-quarter (23.7%) of samples using MAUIs provided no information on the tariff applied, and this proportion increased from 19.2% in 1990–2008 to 26.0% in 2009–June 2017. Among samples that used MAUIs compatible with or specific to childhood populations, 60.0% applied adult-derived tariffs, a proportion that increased from 50.3% in 1990–2008 to 64.6% in 2009–June 2017. Only one sample using the EQ-5D-Y applied a childhood-derived tariff, while no childhood-derived tariff was available for the HUI3. However, it is difficult to conclude that childhood-derived tariffs are more appropriate than adult-derived tariffs, even for MAUIs compatible with or specific to childhood populations. Children are not autonomous legal, social and economic agents and do not bear the burden of financing public healthcare. Hence their preferences, even when they conflict with those of the general adult public, may arguably carry less weight. That said, analysts should recognise that childhood- and adult-derived tariffs may produce enough variation in utility values to alter policy decisions [67]. Where both tariffs are available, analysts should therefore include utility values from each in sensitivity analyses to assess the impact of the choice of tariff population.
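A minimal sketch of the tariff sensitivity analysis recommended above follows. All utilities, durations, the incremental cost and the willingness-to-pay threshold are invented for illustration and are not taken from the review or from any published tariff. QALYs are computed as utility multiplied by years in each health state, without discounting, and the incremental cost-effectiveness ratio (ICER) is recomputed under each tariff applied to the same MAUI responses.

```python
# Hypothetical utilities for two health states under two tariff sets applied to
# the same MAUI responses (illustrative only, not published tariff values).
TARIFFS = {
    "adult-derived": {"mild": 0.85, "severe": 0.55},
    "childhood-derived": {"mild": 0.80, "severe": 0.70},
}

# Hypothetical years spent in each state per arm (undiscounted for simplicity).
TIME_IN_STATE = {
    "intervention": {"mild": 4.0, "severe": 1.0},
    "comparator": {"mild": 2.0, "severe": 3.0},
}
INCREMENTAL_COST = 6_000.0  # hypothetical, in arbitrary currency units
THRESHOLD = 20_000.0        # hypothetical willingness to pay per QALY


def qalys(time_in_state, utilities):
    """QALYs as the sum over states of utility multiplied by years in that state."""
    return sum(utilities[state] * years for state, years in time_in_state.items())


def icer(utilities):
    """Incremental cost per QALY gained, intervention versus comparator."""
    delta_q = (qalys(TIME_IN_STATE["intervention"], utilities)
               - qalys(TIME_IN_STATE["comparator"], utilities))
    return INCREMENTAL_COST / delta_q


# One-way sensitivity analysis: swap the tariff, hold everything else fixed.
for name, utilities in TARIFFS.items():
    ratio = icer(utilities)
    verdict = "below" if ratio <= THRESHOLD else "above"
    print(f"{name}: ICER = {ratio:,.0f} per QALY ({verdict} threshold)")
```

With these made-up inputs, the adult-derived tariff gives an ICER of about 10,000 per QALY and the childhood-derived tariff about 30,000, placing the intervention on opposite sides of the hypothetical threshold.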

One related research area is the assessment of the quality of utility research and its improvement over the review period. However, tests of association between sample-level characteristics showed that the choice in one methodological area (e.g. use of proxy assessment) was strongly associated with other methodological choices (e.g. use of a cognitively challenging trade-off-based method) and with target population age. This makes it difficult to assess quality, and its improvement, in terms of methodological choices (e.g. the proportion of samples using proxy assessment), all the more so given the absence (to our knowledge) of any validated quality assessment tool for primary utility studies. That said, the increase over the review period in the proportion of samples reporting target population age was an unambiguous improvement, as was the decrease in the proportion of samples valuing hypothetical health states, which may substantially overestimate disease burden relative to valuation of experienced health states in childhood [38]. There were also areas of concern. First, there was a lack of studies and samples set in low- and middle-income countries (LMICs). Only 8.2% of samples were from LMICs, consistent with the paucity of paediatric CUAs in LMICs despite their substantially higher paediatric disease burdens [68, 91]. Among LMIC samples, 28.4% used VAS measures despite concerns over their validity for QALY construction [39], a higher proportion than among all samples (20.9%). Second, among the samples using MAUIs, 48.5% applied tariffs derived from populations foreign to the primary study setting, although this proportion fell from 51.7% in 1990–2008 to 46.9% in 2009–June 2017. Among LMIC samples that used MAUIs, the proportion was 75.5%. Clearly, there is a pressing need for more national tariff derivation studies and for more accurate reporting of tariff information in publications.

There are several limitations to the study. First, some eligible studies may have been missed during the systematic review process, even though the search strategies were extensively piloted to maximise sensitivity. Second, several studies did not report, or did not clearly specify, important covariates such as age and comorbidities, which may affect utility values. Third, the broad criteria for selecting valuation methods led to the inclusion of modified forms of methods (e.g. for TTO, SG and HUI) and preference-based disease-specific instruments (e.g. PAHOM and ABC-UI), which complicated between-method comparisons. Fourth, the choice of 2009 as a transition point for all studies and samples was not based on any hypothesised causal or associative mechanism governing the design and volume of primary utility research. Finally, cross-tabulations between methodological factors were limited to two-way chi-square tests of association; further research should consider multivariable regression.

Conclusion

This systematic review reveals significant growth in both the volume and diversity of childhood utility studies over the past three decades. HTA agencies should note the weak adherence of primary studies to guidance on reference-case valuation methods. There is a need for studies that derive tariffs in settings relevant to primary utility collection. Geographic coverage of utility assessment is heavily skewed towards high-income countries, and further research in LMICs is urgently needed.