Identification and prediction of novel classes of long-term disease trajectories for patients with juvenile dermatomyositis using growth mixture models

Abstract Objectives Uncertainty around clinical heterogeneity and outcomes for patients with JDM represents a major burden of disease and a challenge for clinical management. We sought to identify novel classes of patients having similar temporal patterns in disease activity and relate them to baseline clinical features. Methods Data were obtained for n = 519 patients, including baseline demographic and clinical features, baseline and follow-up records of physician’s global assessment of disease (PGA), and skin disease activity (modified DAS). Growth mixture models (GMMs) were fitted to identify classes of patients with similar trajectories of these variables. Baseline predictors of class membership were identified using Lasso regression. Results GMM analysis of PGA identified two classes of patients. Patients in class 1 (89%) tended to improve, while patients in class 2 (11%) had more persistent disease. Lasso regression identified abnormal respiration, lipodystrophy and time since diagnosis as baseline predictors of class 2 membership, with estimated odds ratios, controlling for the other two variables, of 1.91 for presence of abnormal respiration, 1.92 for lipodystrophy and 1.32 for time since diagnosis. GMM analysis of modified DAS identified three classes of patients. Patients in classes 1 (16%) and 2 (12%) had higher levels of modified DAS at diagnosis that improved or remained high, respectively. Patients in class 3 (72%) began with lower DAS levels that improved more quickly. Higher proportions of patients in PGA class 2 were in DAS class 2 (19%, compared with 16 and 10%). Conclusion GMM analysis identified novel JDM phenotypes based on longitudinal PGA and modified DAS.


Introduction
Uncertainty around medium-to long-term outcomes is a major burden of disease for patients and families, as highlighted by a recent qualitative study of children and young people with JDM [1]. In addition to characteristic features of proximal muscle weakness and pathognomonic skin rash, many patients develop a heterogeneous range of additional features, including calcinosis, lipodystrophy, treatment-resistant rash and internal organ involvement. Patients can have heterogeneous responses to treatment, which also poses a challenge for clinical management. At present there are few reliable predictive biomarkers for response to treatment.
Historically, disease courses in JDM have been described as monocyclic, polycyclic or chronic typically based on clinical assessment within the first 2 years after diagnosis [2][3][4][5][6]. Longer-term cross-sectional follow-up studies have indicated that some patients have ongoing disease activity and damage [7][8][9][10][11][12][13][14], underscoring the need to collect and understand data on long-term outcomes in JDM beyond the first 2 years. Longitudinal analyses enable insight into outcomes that change over time, and can be used to characterize a disease course in a data-driven manner. An example is a recent study that applied longitudinal methods to understand disease evolution using data from 95 patients at The Hospital for Sick Children, Toronto, Canada [15].
Growth mixture models (GMMs) are a longitudinal method for identifying latent or 'hidden' subgroups of patients with similar mean trajectories of disease activity over time when the number of groups and group membership are unknown [16]. In contrast to other methods for latent class trajectory analysis, GMMs allow individuals within each class to have differences in their trajectories. Although these alternative approaches are easier to implement, GMMs have a more substantive interpretation, as shown by the Canadian study results, which identified three different GMM 'typical' trajectories in total DAS, with baseline as DAS their predictor [15].
The aims of this study were to use GMM analysis to identify classes of patients with JDM with different trajectories of disease activity and to identify predictors of class membership at baseline, using data from a large, multicentre cohort study.

Patients and clinical data
All patients were recruited to the UK-wide JDM Cohort and Biomarker Study (JDCBS) [17], with ethical approval (UK Northern and Yorkshire Medical Research and Ethics Committee MREC/1/3/22). All patients provided written informed consent and age-appropriate assent. Patients are recruited at 16 centres in the UK (Supplementary Table S1, available at Rheumatology online), with de-identified patient data held within the JDCBS database. Centres are anonymized in this analysis to protect patient confidentiality. The JDCBS holds focus groups with young people with JDM to involve patients in study design and disseminates research at JDM regular family days.
For all patients (n ¼ 519 at time of study), demographic data and clinical features at first-recorded visit were collected, including diagnosis, abnormal respiration, arthritis, cutaneous ulceration, lipodystrophy and calcinosis. Abnormal respiration was defined as 'shortness of breath', 'accessory muscle use', 'tachypnea', 'requires oxygen' or 'ventilated' [18]. All other clinical features were defined by the treating clinician at the firstrecorded visit. For patients in whom autoantibodies were measured (n ¼ 379), these data were generally generated using immunoprecipitation and ELISA as previously described [19][20][21], and by line blot for a limited number of cases. Median duration of follow-up was 5.1 (2.6-8.6) years. Clinic visits were recorded approximately twice per year (Supplementary Table S1, available at Rheumatology online).
At all recorded visits (n ¼ 5306), several measures were collected including the physician's global assessment (PGA) of global disease activity on a visual analogue scale and the cutaneous components of the modified DAS for assessing skin disease (comprising Gottron's papules, heliotrope rash, vasculitis and erythema) [22]. The PGA in our analysis is a more subjective score that reflects cutaneous, musculoskeletal and other manifestations of JDM that influence the physician's overall assessment of global disease activity, and may in some cases be influenced by features such as calcinosis that can be attributed to ongoing activity by some physicians, but disease damage by others. Higher levels of disease activity are indicated by higher scores in the PGA (range 0-10) and modified DAS (range 0-5) and lower scores of the Childhood Myositis Assessment Scale (CMAS) (range 0-52). For the modified DAS, when any of Gottron's papules, heliotrope rash, vasculitis or erythema were missing they were treated as absent to avoid the need to impute this variable, similar to a previous analysis [18], although we recognize this approach may underestimate the true modified DAS. Since 11 patients never had a PGA recorded, the total number of patients when PGA was modelled as an outcome was 508.
Data are described using median and first and third quartiles for numeric variables (unless specified otherwise), and counts and percentages for categorical variables.

Data exploration and identification of missing data patterns
The PGA and modified DAS longitudinal trajectories were described using spaghetti and lasagne plots. Follow-up ranged from 0.2 to 23.4 years, with very few Claire T. Deakin et al. observations being available after 10 years. For this reason follow-up was truncated at 10 years since diagnosis, in order to avoid including sparse observations that may be highly influential. This led to the inclusion of complete follow-up for 83.6% of patients (n ¼ 4837 recorded visits) and a similar median length of follow-up of 4.9 (2.1-8.1) years in the truncated dataset.
Since for 13% of the available visits, PGA values were missing, patterns in missing data were explored. Data visualization did not indicate there were any patterns in the frequency of missing PGA over time or a relationship with duration of follow-up. Formal analyses of the pattern of missingness consisted of modelling the probability of missing a PGA observation during follow-up, accounting for clustering (as the same patient may have missing values at different visits) using generalized estimating equations (implemented with an exchangeable correlation matrix and robust standard errors). Potential predictors of missingness were: sex, age at diagnosis, time since diagnosis, baseline PGA and Centre, coded Centre A [n ¼ 241 (46.4%)] vs the rest [n ¼ 278 (53.6%), coded as 'Other']. Baseline PGA and Centre were found to be significantly associated with missingness. A random audit of all clinic forms received in a single month suggested clinical experience may influence whether PGA was missing; however, since allocation of patients to consultants or registrars is random, it was thought not to introduce bias.

GMM analysis
GMMs are models for repeated observations of an outcome that allow individual trajectories to cluster within classes. These classes are unobserved (latent) but are inferred from the data. GMMs can be fitted to continuous outcomes like the PGA and modified DAS, assuming Gaussian or non-Gaussian distributions, and to categorical variables using ordered logistic regression models. We fitted GMMs to either the PGA or modified DAS as outcomes with time since diagnosis (years) having both fixed and random effects, while also allowing for class-specific effects of time (Supplementary Methods, available at Rheumatology online). Since exploratory analysis identified Centre as a predictor of missing PGA over time, a binary variable for Centre (Centre A or Other; Supplementary Table S3, available at Rheumatology online) was included in the GMM analysis of PGA in order to deal with the missing data patterns under the missing at random assumption. Although baseline PGA was also a predictor of missing PGA over time, it did not need to be included as it was already part of the outcome model, and therefore its incompleteness was dealt with by the maximum likelihood estimation, again under the missing at random assumption. Single class models were fitted initially to define the optimal transformations of outcome variables and best function of time, with models selected according to the lowest Bayesian information criterion value (Supplementary Table S4, available at Rheumatology online). A cubic term for time and square-root transformations of PGA and modified DAS gave the lowest Bayesian information criterion values.
Models with two to six classes were subsequently fitted. The optimal number of classes was determined by selecting the model with the lowest Bayesian information criterion and highest entropy (Supplementary Table  S4, available at Rheumatology online), as recommended in [23]. Entropy is an indication of how well individuals have been allocated to each class on a scale of 0-1, with 1 meaning perfect classification, and is calculated using the mean posterior probability for each class. Posterior classification is also reported for each class for each GMM with two to six classes (Supplementary Table S4, available at Rheumatology online). The mean of posterior probabilities for each class is reported for each GMM with two to six classes (Supplementary Table S5, available at Rheumatology online).

Sensitivity analyses
Sensitivity analyses were subsequently performed to identify whether there were differences in the results for males or females and to investigate whether results were influenced by late-entry individuals or by both lateentry individuals and individuals contributing fewer visits or by non-incident individuals (Supplementary Table S6, available at Rheumatology online). Two definitions of non-incident individuals were used: individuals recruited to the JDCBS later than a month since diagnosis (n ¼ 184 excluded; the remaining cohort becoming the 'inception cohort'), and individuals recruited to the JDCBS at least 5 years after diagnosis (n ¼ 29). Individuals contributing few visits were defined as those contributing fewer than three visits (n ¼ 79). These thresholds were selected according to the distributions of time from diagnosis to recruitment (years) and the numbers of visits.

Lasso regression to predict class membership
Baseline predictors of membership of the classes predicted by GMM analysis were identified using Lasso regression, a regression method that involves automatic selection of predictors and regularization of estimated coefficients for improved prediction accuracy [24]. Lasso regression was performed using all clinical features recorded at the first visit as predictors: diagnosis, PGA, CMAS, modified DAS, age at diagnosis, time from diagnosis to first visit, and the presence of any of arthritis, abnormal respiration, calcinosis, lipodystrophy and ulceration. Myositis-specific autoantibodies (MSAs) were not included as a predictor due to missing data. Since 0-34% of data were missing at first-recorded visit for some clinical features, multiple imputation was first performed using baseline clinical features and demographic features to generate 10 imputed datasets.
For Lasso regression, the following steps were performed separately for each of these 10 imputed datasets (Supplementary Methods, available at Rheumatology online). In an initial step, 10-fold cross-validation was run to select the optimal value for the tuning parameter lambda, a parameter that contributes to the shrinkage penalty applied to the regression coefficients. Within each crossvalidation, the area under the curve was used to select the value of lambda that minimized the cross-validation error. These lambda values were then used to fit the regression models and estimate the coefficients for the predictors retained across each imputed dataset.
To account for uncertainty in class assignment during the GMM stage of the analysis, observations were weighted by the probability of class membership for each individual patient, as predicted by the GMM. Where significant predictors were identified by the Lasso regression, the model was re-run with interaction terms between those predictors. Finally, the estimates for predictors with non-zero coefficients were pooled from each imputed dataset. Standard errors are not reported as these are unreliable for penalized regression [25]. Lasso regression analysis was repeated using deviance and misclassification error as the loss functions to select the value of lambda that minimized crossvalidation error. The same predictors were identified using these loss functions, and similar estimates of effect size were obtained.

Software
All analyses were performed using R version 3.4.4 [26]. GMM analyses were performed using the lcmm() function in the 'lcmm' package [27]. Lasso regression analyses were performed using the 'glmnet' package [28]. Code for GMM and Lasso regression analyses are given in Supplementary Methods, available at Rheumatology online. Multiple imputation was performed using the 'mice' package [29]. Generalized estimating equations were fitted using the geeglm() function in the 'geepack' package [30]. Lasagne plots were generated using the 'longCatEDA' package [31].

Demographic, clinical and serological features of whole JDCBS cohort
For the whole JDCBS cohort (n ¼ 519), most patients were female, white and had a diagnosis of definite JDM (Table 1). Almost half of the cohort were recruited at Centre A, with remaining patients recruited at 15 other centres (Supplementary Table S2, available at Rheumatology online). Median age at diagnosis was 7.7 years and median first-recorded PGA and modified DAS were 3.0 and 3, respectively. The most prevalent MSAs were anti-TIF1c (16.9%) and anti-NXP2 (15.6%).
Median (range) follow-up was 5.1 (0.2-23.4) years before follow-up time was truncated at 10 years, and 4.9 (2.6-8.1) years after truncation. Median time from diagnosis to first-recorded visit was 0.2 (0.1-1.1) years. Visual representation of patients' longitudinal trajectories for up to the first 10 years post-diagnosis for both PGA and modified DAS for skin disease indicated that disease activity on average reduced to lower levels over time, although patterns were heterogeneous between patients (Fig. 1).

Two-class GMM for PGA
To separate this heterogeneity into groups with more homogeneous patterns in disease trajectory, GMM analysis was applied. In the model fitted for PGA (n ¼ 508), the two-class solution was preferred ( Fig. 2A,  Supplementary Fig. S1A and B, available at Rheumatology online). Most patients were predicted to belong in class 1 (n ¼ 450, 89%). This group had slightly lower levels of global disease activity at diagnosis, which steadily decreased over time ( Fig. 2A and B). By contrast, a small proportion of patients were predicted to belong in class 2 (n ¼ 58, 11%), with their trajectories starting with slightly higher PGA levels and displaying more fluctuations over time ( Fig. 2B and C).
Similar proportions of males and females and the major ethnic groups were represented in the two classes ( Table 2). Higher proportions of patients in class 2 were diagnosed at recruitment with JDM overlap with scleroderma or other idiopathic inflammatory myopathy. Ages at diagnosis and symptom onset were similar across classes. Although baseline PGA was higher in patients in class 2, baseline modified DAS scores were similar. Although in general MSA proportions were similar across classes, a higher proportion of patients in class 2 had anti-NXP2 autoantibodies and a higher proportion of patients in class 1 had no-detectable autoantibody.
Three-class GMM for skin modified DAS A GMM was also fitted for skin modified DAS (n ¼ 519), with a three-class solution being preferred (Fig. 3A, Supplementary Fig. S1C and D, available at Rheumatology online). Patients who were predicted as belonging to class 1 (n ¼ 81 16%) had high levels of skin disease activity at diagnosis, which on average reduced slowly over time (Fig. 3B). Patients in class 2 (n ¼ 63, 12%), on the other hand, also had high levels of skin disease activity at diagnosis, which tended to remain high over time (Fig. 3C). Most patients were predicted to belong in class 3 (n ¼ 375, 72%), with lower levels of skin disease activity at diagnosis that reduced more quickly over time (Fig. 3D). Class 2 had slightly higher proportions of patients who were female, white and who were diagnosed at recruitment with JDM overlap with scleroderma ( Table 2). Ages at diagnosis and symptom onset, and baseline scores for PGA, CMAS and modified DAS were similar across all three classes. Similar to the GMM fitted for PGA, proportions of MSA were similar across classes, but class 1 had a higher proportion of patients with anti-TIF1c autoantibodies.
Cross-tabulation of the two-class GMM for global disease activity with the three-class GMM for skin disease activity indicated a higher proportion (19%) of patients in class 2 of the GMM fitted to PGA were in class 2 of the GMM fitted to modified DAS than the in the other classes (respectively 16% and 10%, bottom of Table 2). Thus the closer association between these two classes indicates a joint ongoing disease activity for both PGA and modified DAS.

Sensitivity analyses
Sensitivity analyses for the results obtained on both outcomes indicated that coefficients, the overall shape of predicted trajectories and class proportions did not change substantially in models fitted with females only, males only, restriction to the inception cohort and exclusion of late-entry individuals, and exclusion of both lateentry individuals and individuals contributing few visits (Supplementary Table S5, available at Rheumatology online).

Predictors of PGA classes
Predictive analyses of class membership were performed using the classification obtained from the GMM for PGA, as a measure of disease activity that captures multiple features of this multi-system disease. Indeed, the GMM for global disease activity had a higher entropy than the GMM for skin disease activity, and hence better classification of patients. Baseline predictors of membership in class 2 of the GMM for PGA were identified using Lasso regression, a method that involves automatic selection of predictors and regularization of coefficients, as described above. Baseline modified DAS, CMAS, abnormal respiration, lipodystrophy and time from diagnosis to first-recorded visit were identified as significant predictors of class 2 membership, with mutually adjusted estimated odds ratios of 0.96, 0.99, 1.91, 1.92 and 1.32, respectively. The modification of the association with abnormal respiration due to lipodystrophy corresponding to these interactions is visualized in Fig. 4. None of these baseline variables were retained by the Lasso when this analysis was performed to identify possible predictors of class membership for the GMM for modified DAS.

Discussion
This study represents the largest GMM analysis in JDM (n ¼ 519) with this length of follow-up, and the first of a nationwide multicentre study. Our model of global disease activity identified two different trajectories of disease activity over time. While most patients (89%) had lower levels of disease activity that improved over time, a subset of patients (11%) had higher levels of baseline disease activity and tended to have more ongoing severe disease. Baseline variables that predicted whether patients belonged to this latter class identified by GMM were abnormal respiration, lipodystrophy and time since diagnosis. As a secondary analysis, trajectories of skin disease activity were modelled using GMM. While most patients (72%) had lower levels of skin disease that improved over time, smaller groups had high levels of skin disease that improved over time (16%), and high levels of skin disease that remained high over time (12%). A higher proportion of patients in class 2 of the GMM fitted to PGA, which had more ongoing severe disease, were in class 2 of the GMM fitted to modified DAS, which was similarly characterized by ongoing skin disease activity. Sensitivity analyses performed on subsets of the cohort participants did not indicate any systematic bias induced by the inclusions of possibly heterogeneous individuals. Separate analyses by sex also did not indicate any strong dissimilarities in the trajectories or class probabilities.
GMM analysis may represent a useful approach for defining sub-phenotypes of JDM, to complement current approaches for defining sub-phenotypes by MSAs, by focusing on disease outcomes. Interestingly, the major MSAs were distributed similarly across the GMMdefined classes, which could suggest the typologies identified by these models are different from the clinical features that characterized MSA-defined sub-phenotypes. Sub-phenotypes defined by GMM are driven by data on outcomes. Furthermore, the potential to weight individuals by the probability of class membership in subsequent analyses allows some accommodation of uncertainty in class allocation. A strength of GMM and other latent class methods, in contrast to alternative clustering approaches, is that they are model-based, with objective criteria for selecting models. Our findings complement the work reported by Lim et al. [15] that identified three classes based on modelling the total DAS across over 40 months. In that analysis (n ¼ 95) there were two larger classes, one with lower levels of disease activity at baseline that reduced slightly (42%), one with higher levels of disease activity  The figure shows the predicted probabilities for each combination of modified DAS, lipodystrophy and abnormal respiration. Median baseline values for CMAS and time since diagnosis were used to calculate predicted probabilities. The table on the right reports the estimated ORs in the final model. Standard errors are not reported as these are unreliable for penalized regression [25]. The predicted probabilities for lipodystrophy only obscure those for abnormal respiration only, due to similar estimated ORs for these two predictors. CMAS: Childhood Myositis Assessment Scale; OR: odds ratio.
Identification and prediction of novel classes of long-term disease trajectories for patients with JDM https://academic.oup.com/rheumatology that reduced to the lowest levels of disease activity (55%), and a much smaller third class with the highest levels of baseline disease activity that remain high over time (3%). Interestingly, our work on an independent cohort is consistent with this previous study, with the larger class resulting from the GMM analysis of PGA (class 1) paralleling the two larger classes in the Canadian analysis combined, as the classes that show improvement over time. It may be that this difference in number of classes between the studies reflects greater numbers of patients included, a longer duration of follow-up and use of a different (but related) outcome measure. In their analysis, the total DAS is a standardized score based on three specific cutaneous and three specific musculoskeletal findings [32], whereas the PGA in our analysis is a more subjective score that reflects cutaneous, musculoskeletal and other manifestations of JDM that influence the physician's overall assessment of global disease activity, and may in some cases be influenced by features such as calcinosis that can be attributed to ongoing activity by some physicians, but disease damage by others. This study has limitations. While n ¼ 519 is a large sample size for a rare disease, our findings need to be validated independently before they can be generalized to the general population of patients with JDM. Nonetheless, in this regard the similarities with the independent Canadian study are encouraging. A further limitation arises from missing data. Although we sought to account for missing data patterns, we may not have achieved this perfectly. While it would have been useful to identify groups with similar muscle disease activities over time, there were too many missing CMAS values to do so. Although 96% of our autoantibody data were generated using the same method, inconsistent testing methods are a challenge for combining data [33,34]. Future work could investigate how medication exposure both influences and is influenced by disease trajectories. While underlying differences in patient-mix were adjusted for in our GMM for PGA by including an indicator for Centre A (which contributed to 46% of patients), future work might investigate whether any of these differences might affect disease trajectories.
In summary, we have shown that GMM analysis can identify sub-phenotypes of JDM based on longitudinal disease outcomes, including global disease activity. Future work may identify biological differences in these classes, which could form the basis for a stratified approach to treatment in JDM.

Supplementary data
Supplementary data are available at Rheumatology online.