Handling missing data in a composite outcome with partially observed components: simulation study based on clustered paediatric routine data

Composite scores are useful in providing insights and trends about complex and multidimensional quality of care processes. However, missing data in subcomponents may hinder the overall reliability of a composite measure. In this study, strategies for handling missing data in Paediatric Admission Quality of Care (PAQC) score, an ordinal composite outcome, were explored through a simulation study. Specifically, the implications of the conventional method employed in addressing missing PAQC score subcomponents, consisting of scoring missing PAQC score components with a zero, and a multiple imputation (MI)-based strategy, were assessed. The latent normal joint modelling MI approach was used for the latter. Across simulation scenarios, MI of missing PAQC score elements at item level produced minimally biased estimates compared to the conventional method. Moreover, regression coefficients were more prone to bias compared to standards errors. Magnitude of bias was dependent on the proportion of missingness and the missing data generating mechanism. Therefore, incomplete composite outcome subcomponents should be handled carefully to alleviate potential for biased estimates and misleading inferences. Further research on other strategies of imputing at the component and composite outcome level and imputing compatibly with the substantive model in this setting, is needed.


Introduction
Composite measures combine information from multiple measures into a single summary score [11,29,37,20,8,6]. In health care settings, composite measures are used as scorecards to measure and benchmark performance and quality of care in neonates [30] and cardiovascular care among adults, [8,6,12,14]. In 2010, Profit et al. [30] presented a conceptual framework on composite indicator development in paediatrics care. More recently, Opondo et al. [25] developed and validated the Paediatric Admission Quality of Care (PAQC) score; a 7-point composite score aimed at benchmarking processes of care among children admitted with common childhood illnesses in low-income settings. In the validation study, PAQC score was shown to be a good proxy for outcome of care [26].
Besides gain in statistical efficiency, composite scores reduce the amount of data processed thus providing global insights and trends about complex and multidimensional quality of care processes [29,37,14]. In addition, the issue of multiple testing is avoided [31,15]. Although composite outcomes complement single measures, weak theoretical and statistical assumptions may undermine the overall reliability [6,30]. For instance, use of inappropriate methods to deal with partially observed subcomponents may impede the validity and reliability of the composite measure in subsequent analyses and inferences [29,20,6,14,10].
In the literature, multiple imputation (MI), proposed by Rubin [36], offers a good, often best practice, solution in dealing with partially observed outcomes and covariates [22,39,7,13]. In particular, handling missing data in single outcomes (with no subcomponents) is straight forward because the imputation model is usually equivalent to the analyst's model [18]. On the other hand, dealing with missing data in composite outcome context has not received the same level of attention with no consensus on whether to impute at the composite score level or at the missing components level [20,38].
For instance, in a previous analysis of routine paediatric data with PAQC score as the outcome of interest [16], unobserved quality of care indicators (subcomponents) across three domains of care (i.e. assessment, diagnosis and treatment domain) were treated in the conventional approach described in the subsequent section. In another study, MI at item level was used to handle missing PAQC score subcomponents in the treatment domain [17]. Although the proportion of missingness was small, a comparison of the two study results showed notable differences in parameter estimates.
Through a range of simulation conditions, that is, three missing data rates under two missing data mechanisms, we sought to assess the implications of the missing data method (MI versus the conventional method) employed in addressing missing PAQC score subcomponents. Specifically, the amount of bias in regression coefficients and corresponding standard errors attributable to missing PAQC score subcomponents across the simulation scenarios was obtained and compared between MI and the conventional method.

Relevant data
In previous studies mentioned above, routine pneumonia data collected in a clusterrandomized trial in 12 Kenyan hospitals between March and November 2016 [2] were analysed. In total, 2127 pneumonia patients aged 2-59 months were admitted by 378 clinicians across the 12 hospitals. The composite outcome of interest (PAQC score) was adjusted from its original form to encompass the new pneumonia treatment guidelines recommended by the World Health Organization in the year 2013 [27].
In these studies, PAQC score was constructed using 12 pneumonia care indicators using the conventional approach presented in Table 1. The 12 indicators comprised of 9 signs and symptoms in the assessment domain, 1 indicator in the diagnosis and classification domain and 2 indicators in the treatment domain. Specifically, 6 binary indicators were created with 1 representing adherence to recommended childhood pneumonia guidelines and 0 representing inappropriate care. The binary indicators were then summed across domains of care to obtain an ordinal PAQC score ranging between 0 and 6. Variation on the 7point scale was due to missing data and/or inappropriate care across the three domains of care. That is, missing data in patient's weight, oral amoxicillin dose, and/or frequency of drug administration among oral amoxicillin recipients. In this study, inappropriate care refers to undocumented primary and secondary signs and symptoms in the assessment domain, incorrect severity classification, undocumented oral amoxicillin prescription or prescription of the drug in the wrong dose or frequency [25]. The predictor variables of interest included an interaction between the intervention arm and follow up time (in months), hospital-level factors (i.e. malaria prevalence status and paediatric admission workload), clinician level factors (i.e. gender and cadre). At patient level, gender, age and the number of comorbid illnesses were considered [16].

Simulation study
The aim of this study was to simulate data mimicking the observed pneumonia trial data set. However, simulating a standard data set based on model parameters while preserving the correlation structure was a challenge due to the complex multilevel structure of the trial data set. This was in addition to mixed variable types in covariates and outcome subcomponents in the observed trial data set. To circumvent this challenge, missing data were generated in a complete subset of pneumonia trial data.
Specifically, the simulation study targeted subcomponents in the treatment domain namely patient's weight, oral amoxicillin dose prescribed and frequency of oral amoxicillin administration. The three pneumonia care indicators are required in the calculation of correctness of the prescribed oral amoxicillin dose. To create a subset of pneumonia trial data, complete in the treatment domain subcomponents of interest, we excluded 65/2127 (3.1%) case records with missing oral amoxicillin prescription. Of the remaining 2062 (96.9%) pneumonia case records, 1036 (50.2%) were prescribed oral amoxicillin while 1026 (49.8%) pneumonia cases were not. Amongst patients prescribed oral amoxicillin, we further excluded 61/1036 (5.9%) cases for whom weight (n = 30), amoxicillin dose (n = 4) or frequency of amoxicillin administration (n = 27) were missing. Therefore, the base data set used in the simulation study consisted of 2001(94.1%) pneumonia patients nested within 372 admitting clinicians in 12 hospitals. Although the data set was complete in the outcome subcomponents of interest, one patient and two clinician level covariates still had missing data. Specifically, patients' gender was missing in < 1% of the case records while clinicians' cadre and gender were missing in 22.3% (83/372) and 25.1% (82/372) cases, respectively.

Standard parameter estimates
The base data set was used to estimate standard parameter estimates as follows. First, pneumonia PACQ score was constructed for each patient using the procedure described above. Thereafter, missing covariates were imputed 10 times using the latent normal approach within multilevel joint model imputation framework [7,33]. A two-level imputation model for the ith, (i = 1, . . . , 2127) patient nested within the jth,(j = 1, . . . , 378) clinician in hospital l, (l = 1, . . . , 12) was defined by where Y (1) ijl and Y (2) jl are vectors of partially observed level 1 covariates (patient's sex) and level 2 covariates (clinician's sex and cadre), respectively. Level 1 predictors (X (1) ijl ) included fully observed covariates (i.e. an interaction term between follow-up time and intervention arm, hospital workload and malaria prevalence status, patient's age and number of comorbid illnesses) and PAQC score. The corresponding level 2 predictor variables denoted by X (2) jl included an interaction between follow-up time and intervention arm, hospital workload and malaria prevalence status. A clinician random intercept (b jl ) was included to account for clustering at clinicians' level and to ensure compatibility with substantive models of interests. MI was conducted using jomo [32] package in R version 3.5.4. Imputed data set were then analysed using proportional odds random intercepts model implemented in R's Ordinal package [9]. The model corresponded to (2) where α k , k = 1, . . . ,6 are PAQC score specific intercepts, β are estimated regression coefficients and b jl are clinician's random intercepts. Final parameter estimates were pooled according to Rubin's rules [35]. The pooled estimates henceforth referred to as standard estimates and denoted by (β * MI )were used as reference estimates against which results from different simulation scenarios were benchmarked.

Simulation scheme
As earlier mentioned, missing data were induced in the base data set targeting three outcome subcomponents in the treatment domain, that is, patient's weight, oral amoxicillin dose prescribed and frequency of oral amoxicillin administration. Missingness was generated assuming missing completely at random (MCAR) and missing at random (MAR) mechanisms respectively. Binary missing data indicators were generated by sampling random numbers from a random binomial distribution with success probability rates of 3%, 10% and 40%. A 3% missing data rate was selected to evaluate the impact of low proportion of missingness while 10% and 40% were chosen to assess the extent of bias in moderate to high rates of missingness.
Under MCAR mechanism, missing values in the target treatment domain subcomponents were induced independent of other variables in the base data set, that is, covariates and outcome subcomponents in the assessment and clinical diagnosis domains. For the MAR condition, probabilities of missing data were conditionally dependent on variables associated with probability of missingness in the three variables of interest (based on the observed trial data set) (Supplementary Table A1). In both MAR and MCAR, missing data in the target variables were induced independently of each other, such that either one, two or all three variables were missing for any given patient. Each scenario was simulated 1000 times. Random number generators (seeds) were chosen and maintained for different scenarios to ensure reproducibility of results.
Thereafter, two approaches were used to handle missing data in each simulated data set. In the first approach, missing covariates were handled using MI (Equation (1)) while induced missing values in patient's weight, amoxicillin dose and frequency of administration (outcome subcomponents within treatment domain) were handled using the conventional approach. Undocumented signs and symptoms in the assessment domain were also handled using the conventional approach where they were score with value 0 at PAQC score construction stage described in Table 1. In the first approach, PAQC score was constructed prior to MI of missing covariates and hence included in the imputation model as a one of the predictor variables.
In the second approach, MI was used to handle partially observed covariates and missing treatment domain subcomponents. Outcome subcomponents in the assessment and diagnosis domains were included in the imputation model as predictor variables. In this approach, PAQC score was constructed after MI of incomplete subcomponents in the treatment domain. Variation on the 7-point scale was attributed to inappropriate pneumonia care which encompassed; lack of documentation of all primary sign and symptom, lack of documentation of all secondary signs and symptoms (assessment domain), misdiagnosis or misclassification of disease severity, failure to prescribe oral amoxicillin drug or prescription of the drug in the wrong dose or wrong frequency of amoxicillin administration [25].

Notation
A proportional odds random intercepts model (Equation (2)) was fitted to each imputed data set to obtain imputation-specific parameter estimates. Imputation-specific estimates were pooled using Rubin's rules [35] to produce a single estimate (β i,MI ) for the ith simulation. This procedure was repeated in all the scenarios.
Bias in regression coefficients was calculated as the differences between estimates (log odds) averaged over 1000 simulated data sets (β i,MI )/N)and estimated (log odd) (β * MI ) from the base data set. That is, To assess accuracy, model-based standard errors calculated as the average of the estimated within simulation standard errors were used. That is, The model-based standard errors were compared with empirical standard errors calculated as the standard deviation of the estimates of interest [5] across the 1000 data sets, that is, where N denotes the number of simulations,β i,MI is the coefficient estimated in the ith simulation andβ i,MI is estimator's average over 1000 simulations. The mean square error (MSE) which incorporates both measures of bias and variability [13,5,19] was calculated for the regression coefficients as Bias and accuracy of the corresponding standard errors were assessed in a similar manner. Coverage probability of the 95% confidence intervals were not applicable in this simulation study because missing data were simulated on the same subset of the pneumonia trial data set. To assess variability due to finite number of simulations [23], Monte-Carlo standard errors for estimated bias in regression parameters were calculated using Computation time was also used to assess performance of the two strategies employed in handling missing PAQC score subcomponents. Simulations were carried out using a server with the following specification: 40 GB memory, Intel Xeon E5-4650 (2.70 GHz) processor (12 cores/24 threads), Gnu/Linux Ubuntu 14.04 OS, and R (version 3.4.4) programming language. Figures 1 and 2, respectively, present bias estimated in regression coefficients and standard errors under the conventional and MI approaches across 6-simulation scenarios (i.e. 3 missing data rates of 3%, 10% and 40% and 2 missing data mechanisms namely MAR and MCAR).

Results
Results for specific scenarios with regard to estimated bias, empirical standard errors, model-based standard errors and MSE for regression coefficients and corresponding standard errors are presented in Tables 2 and 3, and supplementary Tables A2-A7. Monte-Carlo standard errors and confidence interval around bias for regression parameters across simulation scenarios are presented in Supplementary Tables A8-A9. Across 6-simulation scenarios, the regression coefficients either underestimated (negative bias) or overestimated (positive bias) the standard estimates ( Figure 1). Moreover, the magnitude of bias varied across variables and tended to increase with an increase in the proportion of missingness. However, the bias was much smaller when MI was used  to handle incomplete treatment subcomponents compared to the conventional approach (Figures 1).
On the other hand, the standard errors tended to overestimate the standard errors of the base data set resulting to positive bias across simulation scenarios ( Figure 2). It was further observed that for individual variables, the standard errors were less prone to bias compared to regression coefficients. These observations were made within and across simulation scenarios. Moreover, simulation results exhibited larger bias when missingness in treatment domain subcomponents were generated under MAR mechanism compared MCAR mechanism.
Across simulation scenarios, the estimated empirical standard errors were close to the estimated model-based standard errors ( Table 2, Table 3 and Supplementary Tables  A2-A7). In addition, the magnitude of both measures of accuracy tended to increase with an increase in the proportion of missing data in PAQC score components. The results further showed that MSEs were slightly larger under the conventional approach compared to MI approach and were somewhat larger under MAR mechanism compared with MCAR mechanism ( Table 2, Table 3 and Supplementary Tables A2-A7).
Across simulation scenarios that is, missing data mechanisms and rates of missingness, Monte-Carlo standard errors of estimated bias in regression parameters ranged between 0.001 and 0.04 (Supplementary Tables A8-A9). The corresponding 95% confidence intervals around bias in the parameters of interest were narrow across simulation settings.  Finally, the simulation process was on average more time-intensive under MI strategy compared to the conventional approach. Furthermore, the computational time increased with an increase in the proportion of missing data irrespective of the mechanism used to generate missing data.

Discussion and conclusions
In this study, bias in regression coefficients and corresponding standard errors attributable to missing PAQC score subcomponents were examined through a range of simulation scenarios. The study results demonstrated superiority of MI as a strategy for dealing with partially observed PAQC score domain subcomponents over the conventional method. Nevertheless, MI approach led to some level of bias in the regression coefficients. These observations could be due to lack of compatibility between the imputation model and the analysis model considering that PAQC score was not included in the imputation model. To be specific, incomplete subcomponents in the treatment domain were include in the imputation model as target variables while subcomponents in the assessment and diagnosis domains were included as predictors variables. In this case, the composite outcome was constructed after MI step. Therefore, further research is needed to compare the performance of the proposed MI method with that of MI including the composite outcome, possibly adapting substantive model compatible MI approaches [3] to this setting, in order to guarantee that the relation between subcomponent and composite outcome is preserved [34].
Simulation results further showed that regression coefficients were more prone to bias compared to standard errors across simulation scenarios. A possible explanation for these results is that case records with missing PAQC score subcomponents in the treatment domain were not discarded in both conventional approach and MI approach and hence no major impact on the estimation of standard errors.
Previously, MI has been used to address missing data at component level in composite scores assessing quality of patient's care [28,4]. Elsewhere, Simon et al., [38] proposed MI at index level particularly for smaller samples. In the case of PAQC score, there are no possibilities of missing PAQC score at aggregate level (the only possibilities are values between 0 and 6). Therefore, MI can only be implemented at subcomponents level.
In this study, MI was used to handle missing treatment domain subcomponents while undocumented pneumonia signs and symptoms in the assessment domain were regarded as inappropriate care and hence scored 0 in the binary indicators at PAQC score construction stage. This was in consideration of the trial's inclusion criterion which required recruitment of patients with syndromic pneumonia. That is, patients with at least one of the two primary pneumonia signs and symptoms (the presence of cough or difficult breathing) in addition to at least one secondary sign and symptom necessary for pneumonia severity classification [27,1]. As such, imputation of undocumented signs and symptoms was not expected to have any meaningful impact on simulation study based on pneumonia trial data. However, it should be noted that MI can be extended to handle missing PAQC score subcomponents in other domains of routine care in studies without such restrictive inclusion criteria. Moreover, analyses and MI procedure proposed in this study can be extended to other MI techniques [21,24] in order to examine performance in terms of computational cost, bias and measures of accuracy as appropriate.
In conclusion, MI produce minimally biased estimates in comparison with conventional method. However, the regression coefficients are more prone to bias compared to standards errors more so when the underlying mechanism is MAR. Besides, bias tended to increase with an increase in proportion of missing variable in the outcome subcomponents. Therefore, missing data in subcomponents composite measures should be addressed carefully to alleviate potential for biased estimates and misleading inferences.