Precision Phenotyping of Dilated Cardiomyopathy Using Multidimensional Data

BACKGROUND Dilated cardiomyopathy (DCM) is a final common manifestation of heterogeneous etiologies. Adverse outcomes highlight the need for disease stratification beyond ejection fraction.
OBJECTIVES The purpose of this study was to identify novel, reproducible subphenotypes of DCM using multiparametric data for improved patient stratification.
METHODS We studied longitudinal, observational UK derivation (n = 426; median age 54 years; 67% men) and Dutch validation (n = 239; median age 56 years; 64% men) cohorts of patients with DCM (enrolled 2009-2016) who underwent clinical, genetic, cardiovascular magnetic resonance, and proteomic assessments. Machine learning with profile regression identified novel disease subtypes. Penalized multinomial logistic regression was used for validation. Nested Cox models compared the novel groupings with conventional risk measures. The primary composite outcome was cardiovascular death, heart failure, or arrhythmia events (median follow-up 4 years).
[…] high-sensitivity troponin I. This suggests that IL4RA is a novel prognostic marker for dilated cardiomyopathy. IL4RA = interleukin 4 receptor alpha; NT-proBNP = N-terminal pro-B-type natriuretic peptide.
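The primary composite outcome is a time-to-first-event endpoint: a patient counts as having an event at the earliest of cardiovascular death, a heart failure event, or an arrhythmic event, and is otherwise censored at last follow-up. The sketch below shows one common way to derive such an endpoint; the `Patient` record, its field names, and the use of days as the time unit are hypothetical illustrations, not the study's data dictionary.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Patient:
    """Hypothetical follow-up record; all times are days from enrollment."""
    cv_death: Optional[float]          # None if no cardiovascular death
    hf_event: Optional[float]          # None if no major heart failure event
    arrhythmia_event: Optional[float]  # None if no major arrhythmic event
    last_follow_up: float              # censoring time if no event occurred


def composite_outcome(p: Patient) -> Tuple[float, int]:
    """Return (time, event) for the composite endpoint.

    event = 1 at the earliest component event; otherwise the patient is
    censored (event = 0) at last follow-up.
    """
    times = [t for t in (p.cv_death, p.hf_event, p.arrhythmia_event) if t is not None]
    if times:
        return min(times), 1
    return p.last_follow_up, 0


# Made-up example: an arrhythmic event at day 250 precedes a heart failure event at day 400.
print(composite_outcome(Patient(None, 400.0, 250.0, 1460.0)))  # (250.0, 1)
print(composite_outcome(Patient(None, None, None, 1460.0)))    # (1460.0, 0)
```

Collapsing the components this way means that later events in the same patient do not contribute to the composite analysis; only the first one does.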

Despite considerable improvements in disease classification and characterization [1-3], DCM is associated with an average 5-year mortality of about 20% [4,5]. A key unmet need in DCM is to define the underlying disease with greater precision, either to target existing therapies more effectively or to identify distinct pathophysiological mechanisms that may be amenable to novel therapies targeted to the patient subgroups most likely to benefit.
Parallel to this issue, for clinicians there is an unmet need to understand how to utilize the growing volume and complexity of clinical data to guide patient care. These include the increasing availability of genetic profiling, advanced imaging, and proteomic data. Novel approaches to data science such as machine learning may help but have been hindered by studies with poor reproducibility and no validation [6-9]. In this study, we applied machine-learning approaches to multiparametric phenotyping to define and validate novel prognostically relevant DCM disease subtypes that could facilitate stratified therapy.
This approach harnesses a breadth of clinical, imaging, proteomic, and genetic data and makes it clinically accessible and relevant to improve disease characterization and patient stratification. We hypothesized that machine learning approaches would identify unique groupings, or clusters, of patients with DCM that share characteristic patterns of risk factors, cardiac manifestations, and outcomes. We further validated our findings in a separate, distinct patient population to explore their portability. All patients provided written informed consent. The study was approved by the regional ethics committee.
[…] in the ggRandomForest package [16]. Cox proportional hazards modeling was used to evaluate a novel prognostic biomarker.
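The Methods state that nested Cox models were used to compare the novel groupings with conventional risk measures, but this section does not name the test statistic. A likelihood ratio test on the maximized partial log-likelihoods of the reduced and full models is one standard way to make such a comparison, and the sketch below assumes that approach. Adding a 3-level subtype factor to a Cox model adds 2 coefficients (2 indicator variables against a reference subtype), so the test would have 2 degrees of freedom. The log-likelihood values in the example are made up for illustration.

```python
import math
from typing import Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with a
    positive integer number of degrees of freedom (closed-form series)."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_j x^((2j-1)/2) / (2j-1)!!
    acc, term = 0.0, math.sqrt(x)  # j = 1 term: x^(1/2) / 1!!
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        acc += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * acc


def likelihood_ratio_test(loglik_reduced: float, loglik_full: float,
                          extra_params: int) -> Tuple[float, float]:
    """LR test for nested models: statistic 2 * (ll_full - ll_reduced),
    referred to a chi-square with extra_params degrees of freedom."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, extra_params)


# Made-up partial log-likelihoods: conventional risk model vs. model + 3-level subtype.
stat, p = likelihood_ratio_test(-105.0, -100.0, extra_params=2)
print(stat, p)  # statistic 10.0, p = exp(-5), about 0.0067
```

In R, calling `anova()` on two nested `coxph` fits from the survival package reports this likelihood ratio comparison directly.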
All statistical analyses were conducted in the R environment (version 3.3.1). An overview of the analysis pipeline is provided in Figure 1.
Machine learning approaches were applied to multiparametric data (clinical, imaging, genetic, and biomarker) from a prospectively recruited UK derivation cohort of patients with dilated cardiomyopathy (DCM) and identified 3 novel, reproducible subtypes of disease: mild nonfibrotic, profibrotic metabolic, and biventricular impairment. Multinomial logistic regression was used to build a model that assigned patients in the independent Dutch validation cohort to the corresponding subtypes.
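Once a multinomial logistic model has been fitted on the derivation cohort, assigning a new patient is a softmax over one linear predictor per subtype, followed by choosing the most probable subtype. The sketch below assumes a reference-class parametrization (the reference subtype's linear predictor is fixed at 0). The predictor names and coefficients are placeholders, not the published 5-parameter model. Penalization (the specific penalty is not stated in this section) affects how the coefficients are estimated, not how a fitted model assigns patients.

```python
import math
from typing import Dict, Tuple

Coefficients = Dict[str, Dict[str, float]]


def subtype_probabilities(features: Dict[str, float],
                          coefs: Coefficients,
                          reference: str) -> Dict[str, float]:
    """Softmax probabilities for each subtype.

    coefs maps each non-reference subtype to {"intercept": b0, predictor: b, ...};
    predictors missing from a subtype's entry get a coefficient of 0.
    """
    scores = {reference: 0.0}
    for label, beta in coefs.items():
        eta = beta.get("intercept", 0.0)
        for name, value in features.items():
            eta += beta.get(name, 0.0) * value
        scores[label] = eta
    top = max(scores.values())  # subtract the max for numerical stability
    expd = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(expd.values())
    return {k: v / total for k, v in expd.items()}


def assign_subtype(features: Dict[str, float], coefs: Coefficients,
                   reference: str) -> Tuple[str, Dict[str, float]]:
    """Assign the subtype with the highest predicted probability."""
    probs = subtype_probabilities(features, coefs, reference)
    return max(probs, key=probs.get), probs


# Hypothetical coefficients and predictor names; NOT the published model.
coefs = {
    "profibrotic_metabolic": {"intercept": -0.5, "predictor_a": 1.2},
    "biventricular_impairment": {"intercept": -1.0, "predictor_b": 0.8},
}
label, probs = assign_subtype({"predictor_a": 2.0, "predictor_b": 0.1},
                              coefs, reference="mild_nonfibrotic")
print(label)  # profibrotic_metabolic
```

Returning the full probability vector alongside the label makes it possible to flag patients whose assignment is uncertain, for example when the top two probabilities are close.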

RESULTS
Composite survival differed between the novel subtypes in both the derivation and validation cohorts. CMR = cardiovascular magnetic resonance.
[…] (Table 2). Key differences between groups are also highlighted in the Supplemental Appendix (Supplemental Table 2, Supplemental Figure 1). The groups differed in several respects, including fibrosis, metabolic state, and arrhythmia […] (Table 6).
A random survival forest algorithm was used to investigate the importance of these protein […] (Table 2).

DISCUSSION
DCM is a phenotypically homogeneous condition with highly heterogeneous etiologies, which are not currently used to guide management. In this multicenter international study using multiparametric phenomapping, a machine learning approach was used to identify mechanistically distinct DCM subgroups from familiar clinical and biological data (Central Illustration). Three distinct subgroups of DCM were identified: 1) a mild, nonfibrotic subtype; 2) a novel profibrotic metabolic subtype; and 3) a biventricular impairment subtype. These subtypes were informative for patient stratification and prognosis beyond traditional markers and were reproducible in an independent validation cohort. These complex multiparametric data could be captured in a 5-parameter validation model that could be readily applied in clinical practice. These findings may facilitate more targeted approaches to an increasingly diverse repertoire of heart failure therapies and provide an opportunity to identify and better protect DCM patient subgroups at increased risk of mortality and morbidity. The identified subgroups could not be determined with current methods of disease classification. That these novel subgroups were informative for prognosis beyond conventional risk models […]
The novel disease subtypes in the validation cohort also vary by adverse event risk. Composite survival consists of major arrhythmic events, major heart failure events, or cardiovascular mortality. P values were computed by the log-rank test.
The mild, nonfibrotic subtype 1 was characterized by milder disease that was more often asymptomatic; absence of myocardial fibrosis, right ventricular involvement, or atrial enlargement; and less biomarker derangement.
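The figure legend notes that P values for composite survival were computed by the log-rank test. A minimal two-group version is sketched below to show how the test compares observed with expected events at each event time. The study compared 3 subtypes, which uses the k-group generalization (a quadratic form in the observed-minus-expected counts with k - 1 degrees of freedom), so this is a simplification rather than the analysis performed; in R, `survival::survdiff` computes the k-group test directly. The toy data in the example are made up.

```python
import math
from typing import List, Sequence, Tuple


def logrank_two_groups(times_a: Sequence[float], events_a: Sequence[int],
                       times_b: Sequence[float], events_b: Sequence[int]) -> Tuple[float, float]:
    """Two-sample log-rank test.

    times_*: follow-up time per patient; events_*: 1 if the composite event
    occurred at that time, 0 if censored. Returns (chi-square statistic, p).
    """
    data: List[Tuple[float, int, int]] = (
        [(t, e, 0) for t, e in zip(times_a, events_a)]
        + [(t, e, 1) for t, e in zip(times_b, events_b)]
    )
    o_minus_e = 0.0  # observed minus expected events in group A
    variance = 0.0
    for t in sorted({t for t, e, _ in data if e == 1}):
        # Risk set: patients whose event or censoring time is >= t.
        n_a = sum(1 for tt, _, g in data if tt >= t and g == 0)
        n = sum(1 for tt, _, _ in data if tt >= t)
        n_b = n - n_a
        d = sum(1 for tt, e, _ in data if tt == t and e == 1)
        d_a = sum(1 for tt, e, g in data if tt == t and e == 1 and g == 0)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of d_a given the risk set.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; the test is undefined")
    stat = o_minus_e ** 2 / variance
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square with 1 df
    return stat, p_value


# Made-up example: group A has early events, group B is censored late.
stat, p = logrank_two_groups([1, 2, 3, 4, 5], [1] * 5, [10, 11, 12, 13, 14], [0] * 5)
print(round(stat, 1), p < 0.01)
```

Censored patients stay in the risk set up to and including their censoring time, which is how the test uses partial follow-up rather than discarding it.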
The differential characteristics of myocardial fibrosis and prognosis between the subtypes are striking and highlight the potential of these groupings for clinical impact.
It is well established that midwall myocardial fibrosis is a strong prognostic indicator in DCM [4,18], and it would be reasonable to expect that the pro- […] In this study, the presence of genetic DCM did not distinguish between the groups and did not appear to affect prognosis. Titin truncating variants, which are not known to confer additional prognostic risk [11,19,20], were the most common genetic abnormality.
Patients were predominantly of European descent.
Further work on more diverse cohorts should investigate the effect of ethnicity on phenogroup stratification. We elected not to include medications in the baseline model because they reflect treatment decisions that are subject to factors other than the disease itself, such as provider bias, patient tolerance, patient preference, and renal function [22,23]. Moreover, many patients were studied within days or weeks of diagnosis, at a time when their treatment was rapidly changing. In addition, this study describes medium-term outcomes. Longer-term follow-up is planned to determine the ongoing prognostic implications of these novel subtypes.

CONCLUSIONS
Machine learning approaches using complex multiomics data in DCM robustly and reproducibly improved disease characterization and patient stratification. Reproducible subtypes of DCM were identified and were associated with distinct characteristics and clinical outcomes, which may reflect different underlying pathologies. In the drive toward personalized medicine, the subtypes identified in this study may facilitate more targeted approaches to an increasingly diverse repertoire of heart failure therapies.