Clustering of clinical and echocardiographic phenotypes of COVID-19 patients

We sought to divide COVID-19 patients into distinct phenotypical subgroups using echocardiography and clinical markers to elucidate the pathogenesis of the disease and its heterogeneous cardiac involvement. A total of 506 consecutive patients hospitalized with COVID-19 infection underwent complete evaluation, including echocardiography, at admission. A k-prototypes algorithm applied to patients' clinical and imaging data at admission partitioned the patients into four phenotypical clusters: clusters 0 and 1 were younger and healthier, clusters 2 and 3 were older with worse cardiac indexes, and clusters 1 and 3 had a stronger inflammatory response. The clusters manifested very distinct survival patterns (C-index for the Cox proportional hazards model 0.77), with survival best for cluster 0, intermediate for clusters 1–2 and worst for cluster 3. Interestingly, cluster 1 showed a harsher disease course than cluster 2 but with similar survival. Clusters obtained with echocardiography were more predictive of mortality than clusters obtained without echocardiography. Additionally, several echocardiography variables (E′ lat, E′ sept, E/e average) showed high discriminative power among the clusters. The results suggested that older infected males have a higher chance of deteriorating than older infected females. In conclusion, COVID-19 manifests differently for distinct clusters of patients. These clusters reflect different disease manifestations and prognoses. Although including echocardiography improved the predictive power, its marginal contribution over clustering using clinical parameters only does not justify the burden of echocardiography data collection.


We used the Yeo-Johnson transform and the Iterative Imputer (see computational methods), as implemented in scikit-learn 31.
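As an illustration of the transform named above, the following is a minimal pure-Python sketch of the Yeo-Johnson formula for a single value and a fixed power parameter. The function name `yeo_johnson`, the parameter `lam` and the example values are hypothetical. The scikit-learn implementation additionally estimates the power parameter from the data for each feature, which this sketch does not attempt.

```python
import math


def yeo_johnson(x: float, lam: float) -> float:
    """Yeo-Johnson power transform of one value for a fixed lambda (assumed given)."""
    if x >= 0:
        if abs(lam) < 1e-12:
            # lambda == 0: logarithmic branch for non-negative values
            return math.log1p(x)
        return ((x + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < 1e-12:
        # lambda == 2: logarithmic branch for negative values
        return -math.log1p(-x)
    return -(((-x + 1.0) ** (2.0 - lam)) - 1.0) / (2.0 - lam)


# Hypothetical measurements; lam=0.5 is an arbitrary illustrative choice.
values = [-2.0, 0.0, 3.0]
transformed = [yeo_johnson(v, lam=0.5) for v in values]
print(transformed)
```

With `lam = 1` the transform reduces to the identity, which gives a quick sanity check of the two main branches.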
Imputation of categorical variables: we imputed categorical variables with a relatively naïve approach, using the most frequent value. We also tested iterative imputation, a more advanced method, as implemented in scikit-learn 2. The clustering results obtained with the two imputation methods were very similar (Rand index 0.97 ± 0.01, adjusted Rand index 0.92 ± 0.03), and we therefore decided to use the most frequent value for imputation.
To choose the number of clusters, we ran the K-Prototypes algorithm for different numbers of clusters and computed the silhouette score 5 for each solution. We used the K-Prototypes 6 and silhouette score 2 implementations. We clustered the patients based on the continuous variables only, since they differed more significantly among subgroups (Table S8), and used the Euclidean distance. The results are shown in Figure S1.
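The Rand index and adjusted Rand index used to compare the two imputation methods can both be obtained by counting pairs of patients that are grouped together or apart. The following is a minimal sketch of these pair-counting definitions; the function name `rand_indices` and the label vectors are made up for illustration, and this is not the implementation used in the study.

```python
from collections import Counter
from math import comb


def rand_indices(labels_a, labels_b):
    """Return (Rand index, adjusted Rand index) for two labelings of the same items."""
    n = len(labels_a)
    pairs = comb(n, 2)
    # Pairs placed in the same cluster by both labelings, by A only, and by B only.
    same_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    same_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    same_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    # Agreements: together in both, or apart in both.
    agreements = same_both + (pairs - same_a - same_b + same_both)
    rand = agreements / pairs
    # Chance correction (hypergeometric model of random labelings).
    expected = same_a * same_b / pairs
    max_index = (same_a + same_b) / 2
    if max_index == expected:
        adjusted = 1.0
    else:
        adjusted = (same_both - expected) / (max_index - expected)
    return rand, adjusted


# Hypothetical cluster labels from two imputation strategies.
labels_most_frequent = [0, 0, 1, 1, 2, 2]
labels_iterative = [0, 0, 1, 2, 2, 2]
print(rand_indices(labels_most_frequent, labels_iterative))
```

Both indices equal 1 for identical partitions, regardless of how the cluster labels are numbered. The adjusted index is close to 0 for unrelated partitions and can be negative.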
As Figure S1 shows, solutions with six or more clusters give silhouette scores close to or below 0, suggesting insignificant clusters. For four clusters the silhouette score was 0.052, and for five it was 0.030.
For two to five clusters, when using the "elbow method" 7, the four-cluster solution had a higher score than the three- and five-cluster solutions (Figure S2). A solution with two clusters had an even higher score, but we rejected that option since two clusters did not provide sufficient resolution and biomedical insights. Hence, we chose four as the final number of clusters. As additional support for preferring four over three clusters, we focused on clusters 1 and 2 in the four-cluster solution. The subgroups of patients in these two clusters had very similar survival curves (see Figure 3). We tested how distinct these two clusters are in terms of their other clinical parameters, and concluded that they were very distinct (Supplement 5). A solution with fewer than four clusters does not show this distinction.
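The silhouette score used above to compare solutions with different numbers of clusters can be sketched in a few lines. The example below is a minimal pure-Python version with Euclidean distance, as in the text; the function name `mean_silhouette` and the toy points are hypothetical, and the study itself relied on a library implementation.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient with Euclidean distance (needs at least 2 clusters)."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)

    def mean_distance(i, members):
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Convention: a point alone in its cluster scores 0.
            scores.append(0.0)
            continue
        a = mean_distance(i, own)  # cohesion: mean distance within own cluster
        b = min(  # separation: mean distance to the nearest other cluster
            mean_distance(i, members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = mean_silhouette(points, [0, 0, 1, 1])  # labels follow the separation
poor = mean_silhouette(points, [0, 1, 0, 1])  # labels cut across it
print(good, poor)
```

Scores near 1 indicate compact, well-separated clusters, while scores near or below 0 indicate overlapping or arbitrary ones, which is the reading applied to the solutions with six or more clusters.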

Supplementary 3: Testing the Parameters of Consensus Clustering and K-Prototypes
To choose the sampling rate in consensus clustering and the relative weight assigned to categorical variables in K-Prototypes, we ran the two algorithms for multiple combinations of parameters. For each combination, K-Prototypes was run 50 times with the consensus procedure. We used the consensus clustering implementation 8. Fig. S3 shows the average silhouette score for different values of the sampling rate and the number of clusters. We can see that results for a sampling rate of 0.6 were inferior, and that for three to five clusters the solutions with sampling rates of 1, 0.85 and 0.75 had similar scores. For six or more clusters, the score for a sampling rate of 1 was slightly higher. (With a sampling rate of 1 there is no sub-sampling, and minor differences between solutions are due to random initialization and cluster assignments in case of ties.)
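The relative weight assigned to categorical variables enters K-Prototypes through its mixed dissimilarity: the squared Euclidean distance on the numerical variables plus the weight multiplied by the number of mismatching categorical variables. The following minimal sketch illustrates this; the names `mixed_dissimilarity` and `gamma` and the patient values are invented for illustration.

```python
def mixed_dissimilarity(num_a, cat_a, num_b, cat_b, gamma):
    """K-Prototypes-style dissimilarity between two records with mixed variables."""
    # Numerical part: squared Euclidean distance.
    numeric = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    # Categorical part: simple matching (count of differing categories).
    mismatches = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric + gamma * mismatches


# Hypothetical normalized numerical values and categorical values.
patient_num, patient_cat = [1.0, 2.0], ["male", "hypertension"]
prototype_num, prototype_cat = [0.0, 0.0], ["male", "no hypertension"]

for gamma in (0.0, 0.5, 3.0):
    d = mixed_dissimilarity(patient_num, patient_cat, prototype_num, prototype_cat, gamma)
    print(gamma, d)
```

With a weight of 0 the categorical variables are ignored entirely, and with a very large weight they dominate the numerical ones, which is why the weight has to be checked rather than taken for granted.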
We wanted to make sure that the choice of the categorical weight guarantees that both the numerical and the categorical variables influence the clustering results. The default value used in the K-Prototypes algorithm is half the average standard deviation of the numerical values. As we worked with normalized data, the default was 0.5. Tables S1-S12 show the changes in the composition of the clusters compared with the clustering solution obtained with the default weight. We can see that for weights below 3 the clusters do not change much.
Tables S1-S6. The composition of the clusters for different values of the categorical weight compared to the default of 0.5. All runs used a sampling rate of 1. The rows refer to the same four clusters obtained with the default weight, and the columns show clusters obtained for different values of the weight, ordered for convenience so that the diagonal entries are maximal. Entries are the size of the intersection between the row and column clusters. For example, in Table S1 a single patient moved from cluster 0 to cluster 1, and in Table S6 ten patients moved from cluster 1 to cluster 3. The Rand index quantifies the similarity between the two solutions, with 1 indicating a perfect match and 0 being the least possible.
Table S13 shows statistics for the variables that were considered as input for the clustering. Table S14 shows parameters that were excluded from that process, as they reflect data not available at the time of admission and initial tests.
Table S13. Statistics per cluster of all variables that were included in the input for the clustering algorithm. P-values were computed using ANOVA for continuous variables and Chi2 for categorical variables, and FDR corrected for multiple testing. All variables refer to the first measurement taken at or after admission, unless otherwise noted. In yellow: continuous variables, mean±SD in each cluster. In green: categorical variables, percentage (fraction of patients) for each cluster. *Echocardiography variables.
Fig. S4 shows the variables with the top ASMD scores when comparing only clusters 1 and 2. They include variables that are directly related to COVID-19, such as CRP and chest X-ray findings, alongside age and age-related variables such as past diseases (hypertension, dementia, IHD or CHF) and arrival from a nursing home. Table S15 lists the 20 variables with the lowest p-values for the null hypothesis that they do not vary between the two clusters, with results similar to those for ASMD.
Figure S4. Variables with the highest ASMD scores when comparing clusters 1 and 2. In green: echocardiography variables.
We tested the change in echocardiography measurements over time, using data from a second echocardiography that 48 of the patients underwent. We chose nine variables of interest and performed a Kruskal-Wallis test on the differences between the second and the first echocardiography results in each cluster. We chose Kruskal-Wallis since the sample sizes were small. The p-values were corrected for multiple testing with FDR. The results are summarized in Table S16. While some trends can be observed between clusters, the p-values are high due to the small sample size.
Table S16. The values are the mean change between the second and the first echocardiography measurements, for patients who underwent two or more measurements during their hospitalization. Cluster 0 was excluded as it had only one patient with a second echocardiography. #: number of patients in each cluster. In green: right ventricle parameters. In yellow: left ventricle parameters.
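The text does not spell out how the ASMD used to rank variables in Fig. S4 was computed. Assuming a common definition, the absolute difference in group means divided by the square root of the average of the two group variances, a minimal sketch for one continuous variable could look as follows. The function name `asmd` and the values are hypothetical.

```python
import math
from statistics import mean, variance


def asmd(group_a, group_b):
    """Absolute standardized mean difference between two groups (assumed definition)."""
    diff = abs(mean(group_a) - mean(group_b))
    # Pooled SD as the root of the average of the two sample variances.
    pooled_sd = math.sqrt((variance(group_a) + variance(group_b)) / 2)
    if pooled_sd == 0:
        # Degenerate case: no spread in either group.
        return 0.0 if diff == 0 else math.inf
    return diff / pooled_sd


# Invented values of one variable in two clusters.
cluster_1 = [1.0, 2.0, 3.0]
cluster_2 = [3.0, 4.0, 5.0]
print(asmd(cluster_1, cluster_2))
```

Because the measure is unit-free, variables on different scales, such as laboratory values and echocardiography indexes, can be ranked on one axis.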

Supplementary 7: Evaluating Sex Differences
The clusters suggested that older females may have a better chance of experiencing only a mild disease. Therefore, we performed an analysis of the outcomes for patients aged 80 and above, comparing males and females. Males received respiratory support at significantly higher rates. They also had higher rates in all tested outcomes, but the differences were not significant (Table S17).
Table S17: Outcome analysis for males and females aged 80 and above.

Supplementary 8: Treatments
The data were gathered during the first months of the pandemic (March to September 2020), before any effective targeted antiviral treatments for COVID-19 were introduced into clinical practice. Thirty-seven patients were treated with systemic corticosteroids and 24 patients were on Clexane. The medications do not add new insights about the clusters.
Clexane was administered to 2% of patients in cluster 0 and 6% in each of the other clusters. Systemic corticosteroids were administered to 1% of patients in cluster 0, 9% in cluster 1, 4% in cluster 2 and 15% in cluster 3, which aligns with the higher rates of inflammation in clusters 1 and 3.
Table S17. Outcomes analysis for males and females aged 80 and above. P-values were computed using the t-test for continuous variables and Chi2 for categorical variables, and FDR corrected for multiple testing. Outcomes that were present in five patients or fewer were not considered. In yellow: continuous variables, mean±SD in each group. In green: categorical variables, percentage (number of patients) for each group.
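The FDR correction of p-values mentioned in the table captions is not specified further in the text. Assuming the Benjamini-Hochberg procedure, which is a common choice, the adjusted p-values can be sketched as follows. The function name `benjamini_hochberg` and the p-values are illustrative only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Invented raw p-values for four outcomes.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Each adjusted value is at least as large as the raw p-value, so a variable that is not significant before the correction cannot become significant after it.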