Identifying longitudinal clusters of multimorbidity in an urban setting: A population-based cross-sectional study

Background Globally, there is increasing research on clusters of multimorbidity, but few studies have investigated multimorbidity in urban contexts characterised by young, multi-ethnic, deprived populations. This study identified clusters of associative multimorbidity in an urban setting. Methods This is a population-based retrospective cross-sectional study using electronic health records of all adults aged 18 years and over, registered between April 2005 and May 2020 in general practices in one inner London borough. Multiple correspondence analysis and cluster analysis were used to identify groups of multimorbidity from 32 long-term conditions (LTCs). Results The population included 41 general practices with 826,936 patients registered between 2005 and 2020, with mean age 40 (SD 15·6) years. The prevalence of multimorbidity was 21% (n = 174,881), with the median number of conditions being three and increasing with age. Analysis identified five consistent LTC clusters: 1) anxiety and depression (ratio of within to between sums of squares [WSS/BSS] <0·01 to <0·01); 2) heart failure, atrial fibrillation, chronic kidney disease (CKD), chronic heart disease (CHD), stroke/transient ischaemic attack (TIA), peripheral arterial disease (PAD), dementia and osteoporosis (WSS/BSS 0·09 to 0·12); 3) osteoarthritis, cancer, chronic pain, hypertension and diabetes (WSS/BSS 0·05 to 0·06); 4) chronic liver disease and viral hepatitis (WSS/BSS 0·02 to 0·03); 5) substance dependency, alcohol dependency and HIV (WSS/BSS 0·37 to 0·55). Interpretation Mental health problems, pain, and at-risk behaviours leading to cardiovascular diseases are the important clusters identified in this young, urban population. Funding Impact on Urban Health, United Kingdom.


Introduction
Multimorbidity, the co-occurrence of two or more diseases, is a growing global health challenge [1]. Multimorbidity increases in prevalence with age, with up to 95% of people aged 65 years and older showing clinical features [2]. Multimorbidity is strongly associated with social and material deprivation, and is a major driver of health care utilisation and mortality in deprived populations [3]. Recent research has focused on identifying the types of long term conditions (LTC) that cluster together in multimorbid patients [4]. This approach aims to identify which LTCs co-occur together more frequently to uncover new mechanisms of disease, offering clinicians the ability to develop multi-disease clinical strategies, avoiding conflicting treatment regimens and potential adverse drug effects and drug-drug interactions associated with polypharmacy [4][5][6].
Despite these advances, the concept and definition of multimorbidity remain elusive. The original concept of multimorbidity focused on the multiplicity of diseases but with little agreement on the set of diseases that were eligible for inclusion [7]. Advances in molecular medicine are revealing that our present understanding of nosology may be flawed because one molecular defect can result in several diseases and one disease may be associated with diverse molecular pathology [8]. The concept of multimorbidity has been extended to include not just diseases but 'health conditions' [1] including risk factors like hypertension, symptoms such as chronic pain, and measures of mental wellbeing including anxiety and depression [9]. Multimorbidity can be viewed as representing the intersection of multiple dimensions of poor health. The concept of intersectionality in health is commonly applied to the multiple overlapping social determinants that impact on the health of deprived and excluded populations [10], but the notion of intersectionality may be readily applicable in multimorbidity research. From a public health perspective, this approach to multimorbidity will contribute to understanding the social and environmental determinants of health and disease burden.
Most studies have identified multimorbidity as a manifestation of ageing, with age-associated frailty and multimorbidity being closely related concepts [11]. Urban populations, which account for 55% of the world population [12], are typically youthful. In Inner London, only 6·8% of the resident population is aged 65 years and over compared with 17·7% for England as a whole [13]. Urban environments are generally characterised by deprivation across multiple domains, a high proportion from ethnic minority and migrant populations, and by reduced life expectancy and life satisfaction compared with national comparators [13]. This research aimed to evaluate how multiple morbidity is expressed in an urban environment.
An initial approach to studying patterns of LTC combinations is to determine which conditions commonly co-occur [9][14][15][16]. However, this approach tends to emphasise the most prevalent conditions, like hypertension, which are members of most high-frequency disease combinations [15]. It is more informative to view disease patterns in terms of relative frequencies through the evaluation of 'associative multimorbidity' [5]. Two systematic reviews [4,5] revealed highly heterogeneous clusters of multimorbidity, resulting not only from the differing demographic characteristics of the sample populations but also from the analytical clustering techniques employed. The aim of this study is to identify LTCs which tend to co-occur in an inner-city primary care setting. Our objective was to find groups of conditions that are as correlated as possible among themselves and as little correlated as possible with other groups in the data, using multiple correspondence analysis (MCA), a statistical technique to analyse clustering of multimorbidity [17][18][19].

Study design, setting and participants
The study was set in an inner-city borough in south London with a deprived, multi-ethnic, youthful population. In the UK, about 98% of the population is registered with a general practice. The population sample consisted of all patients registered at general practices in the borough (n = 41), except for patients (3·2%) who had opted out of anonymised data sharing for research. Anonymised coded data on all eligible patients aged 18 years and over between 1 April 2005 and 1 May 2020 were extracted from electronic health records (EHRs) held in primary care. The study is a retrospective cross-sectional analysis of three consecutive time periods between 2005 and 2020. For the purposes of clustering we did not consider the order of the conditions or the follow-up time, only which conditions occurred together during each individual's period of registration. The proposal for the analysis of fully anonymised data was approved by the Lambeth Clinical Commissioning Group. Separate ethical committee approval was not required (Health Research Authority, personal correspondence) since all data were fully anonymised for the purposes of research access, and all patient identifiable data had been removed.

Data variables and measurement
Multimorbidity in this study is defined as the co-occurrence of two or more out of 32 LTCs, adapted from previous studies (Table S1) [16]. Nineteen LTCs, and the risk factors smoking, hypertension and obesity, were defined by the Quality and Outcomes Framework (QOF) [20]. The data codes and recording of these conditions are standardised nationally and recording rates are incentivised and therefore high. The remaining conditions were selected based on their importance within the urban, multi-ethnic community of our population sample. For these conditions we searched extensively for all possible Read (the clinical coding system used in UK general practice to record patient findings and procedures in health-care IT systems) and SNOMED codes that could represent each of these conditions, using the CPRD GOLD Codes List [21] as a starting point. Some conditions were defined by prescribing characteristics (such as 'chronic pain', defined by prescriptions of opioids); some were unlikely to result in medication (such as learning disability or morbid obesity). Patients were considered to have an LTC or multimorbidity if there was a record at any point in their adult life, and were included in analyses.
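The definition above (two or more recorded LTCs at any point in adult life) can be illustrated with a minimal Python sketch. The record layout (a dictionary of condition name to an "ever recorded" flag) and the function names are assumptions made for illustration; they are not taken from the study's extraction code.

```python
def count_ltcs(record):
    """Number of long-term conditions ever recorded for one patient.

    `record` is a hypothetical mapping of condition name -> True/False.
    """
    return sum(1 for present in record.values() if present)


def is_multimorbid(record, threshold=2):
    """Multimorbidity as defined in the text: two or more LTCs."""
    return count_ltcs(record) >= threshold


# Illustrative patient only; not real data.
example_patient = {"anxiety": True, "depression": True, "hypertension": False}
example_flag = is_multimorbid(example_patient)
```

Applied to every patient, the mean of this flag would give the prevalence of multimorbidity in the registered population.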
Demographic data consisted of gender, age in years at last known follow-up, and self-assigned ethnicity. Social deprivation data derived from participant postal code of residence were based on the Index of Multiple Deprivation (IMD) 2019 [22] classification at lower super output area, divided into quintiles based on the national distribution. The IMD is based on seven domains of deprivation including income, employment, education, health, crime, housing, and quality of living environment. Clinical data at last known follow-up included: the number of medications prescribed based on British National Formulary (BNF) sub-heading, risk of hospital admission in the next 12 months (based on the QAdmissions score > 20 [23]), and six risk factors based on whether a person was ever exposed: hypertension; moderate obesity (BMI 30·0–39·9 kg/m²), high cholesterol (total

All identified studies used a cross-sectional design, with heterogeneity in the techniques used. Only the latter review included a study within the United Kingdom and this study used a small sample of hospitalized patients aged over 85 years. Recently, Zhu et al. (2020) clustered multimorbid adults in the UK whose diagnoses were defined in 2012. These clusters were based on individuals and not diseases. Relationships were found between psychoactive substance and alcohol misuse in those aged 18–64; coronary heart disease, depression and pain (aged 65–84); and coronary heart disease, heart failure and atrial fibrillation (aged 85+).

Added value of this study
To our knowledge, this study is the first in the UK to examine the non-random associations between diseases in a young, population-based cohort containing a high percentage of Black, Asian, and other ethnic minority groups. In addition, we compared clusters between different cohorts over time and found a high degree of similarity. We address the call of Whitty and Watt (2020) for a more generalized approach in the mapping of clusters.

Implications of all the available evidence
The links seen in previous studies between cardiometabolic diseases and chronic pain; cardiovascular diseases and dementia for older populations are supported in this study. In addition, mental health problems, risk factors and risky behaviours are the main concerns identified in a younger, multi-ethnic population.

Statistical analysis
Sociodemographic, risk factor and clinical data were summarized for the multimorbid population using means and standard deviations, median and inter-quartile range (IQR), or counts and percentages as appropriate. Missing data were kept as missing.
MCA was carried out on the dataset where each condition was coded as present or absent. All individuals with multimorbidity were used as rows and binary LTCs as variables to determine principal dimensions. Age, gender, and number of conditions were used as supplementary variables (i.e. not used to determine principal dimensions, but with their coordinates plotted along with the LTCs). The number of dimensions considered for retention was based on the elbow of the scree plot.
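As an illustration of the MCA step, the following Python sketch computes column (condition) coordinates from a binary patient-by-condition matrix by singular value decomposition of the indicator matrix. It is a textbook formulation under stated assumptions, not the R routine used in the study: the function name and the toy matrix are hypothetical, supplementary variables are omitted, and the explained inertia is the raw (uncorrected) percentage.

```python
import numpy as np


def mca_column_coordinates(X, n_components=2):
    """MCA of a binary matrix (rows = patients, columns = conditions).

    Returns principal coordinates of the 2p categories (the first p rows are
    'present', the last p rows 'absent'), the share of inertia per dimension,
    and the column masses.
    """
    X = np.asarray(X, dtype=float)
    # Indicator matrix: one 'present' and one 'absent' column per condition.
    Z = np.hstack([X, 1.0 - X])
    if np.any(Z.sum(axis=0) == 0):
        # Assumption: every condition is both present and absent at least once.
        raise ValueError("each condition must be present and absent at least once")
    P = Z / Z.sum()                    # correspondence matrix
    r = P.sum(axis=1)                  # row masses
    c = P.sum(axis=0)                  # column masses
    # Standardised residuals from the independence model.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    _, sing, Vt = np.linalg.svd(S, full_matrices=False)
    inertia = sing ** 2
    explained = inertia / inertia.sum()
    # Column principal coordinates: D_c^{-1/2} V Sigma.
    coords = (Vt.T * sing) / np.sqrt(c)[:, None]
    return coords[:, :n_components], explained, c


# Toy data only: 6 patients, 3 conditions.
toy = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1],
                [1, 1, 1], [0, 0, 1], [1, 0, 0]])
toy_coords, toy_explained, toy_masses = mca_column_coordinates(toy)
```

Plotting `toy_explained` against the dimension index would give the scree plot whose elbow guides how many dimensions to retain.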
The presence of each condition was mapped on a biplot, with the positions of the points on the map indicating positive association between conditions when they are close together or negative association when they are in opposite ends of the plot. To characterise each dimension, statistical parameters were calculated including the contribution of each LTC to the dimension and the representativeness of the dimension to the LTC using squared cosines (Table S2). These parameters were used to aid visual interpretation of the data, and to determine which conditions are redundant (do not contribute to patterns in the data) or highly relevant.
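The two diagnostics mentioned above have standard definitions in correspondence analysis, which the sketch below follows: the contribution of a category to a dimension is its mass times its squared coordinate, divided by the dimension's inertia, and the squared cosine is its squared coordinate on that dimension divided by its squared distance from the origin. The function name and inputs are assumptions for illustration. The squared cosines sum to one across dimensions only when coordinates on all dimensions are supplied.

```python
import numpy as np


def contributions_and_cos2(masses, coords):
    """Contribution of each category to each dimension, and squared cosines.

    masses : array of length J (category masses, summing to 1)
    coords : J x K array of principal coordinates on all K dimensions
    Assumes no category sits exactly at the origin (non-zero row norms).
    """
    masses = np.asarray(masses, dtype=float)
    coords = np.asarray(coords, dtype=float)
    weighted = masses[:, None] * coords ** 2
    inertia_per_dim = weighted.sum(axis=0)        # eigenvalue of each dimension
    contributions = weighted / inertia_per_dim    # each column sums to 1
    cos2 = coords ** 2 / (coords ** 2).sum(axis=1, keepdims=True)  # each row sums to 1
    return contributions, cos2
```

A category with a high contribution helps define a dimension, while a high squared cosine indicates that the dimension represents that category well.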
Variable coordinates derived from MCA were used to perform hierarchical clustering, using Ward's minimum variance method [24], to determine groups of co-morbid conditions. Overall, the number of clusters was determined based on the sum of squared errors, with the ratio of within sums of squares to between sums of squares presented to evaluate the distances of the clusters to each other. People were then assigned to clusters based on the proportion of their conditions that belong to a cluster (i.e. if more than 50% of a person's conditions belong to a particular cluster, then that person is deemed to belong to that cluster). Sociodemographic, risk factor and clinical data were summarised for each cluster.
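The clustering step can be sketched in Python as a naive agglomerative procedure with Ward's minimum variance criterion, followed by a within-to-between sums of squares ratio. This is an illustrative sketch only: the function names are hypothetical, the loop is written for clarity rather than speed, and the ratio shown (each cluster's within sum of squares divided by the overall between sum of squares) is one possible reading of the WSS/BSS statistic, which the text does not define in full.

```python
import numpy as np


def ward_clusters(points, n_clusters):
    """Agglomerative clustering that merges, at each step, the two clusters
    whose union gives the smallest increase in within-cluster sum of squares."""
    points = np.asarray(points, dtype=float)
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = points[clusters[a]].mean(axis=0)
                cb = points[clusters[b]].mean(axis=0)
                na, nb = len(clusters[a]), len(clusters[b])
                # Ward merge cost: increase in total within sum of squares.
                cost = na * nb / (na + nb) * np.sum((ca - cb) ** 2)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    labels = np.empty(len(points), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


def wss_bss_ratio(points, labels):
    """Per-cluster WSS divided by the overall BSS (assumed definition)."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    grand = points.mean(axis=0)
    wss, bss = {}, 0.0
    for k in np.unique(labels):
        members = points[labels == k]
        centroid = members.mean(axis=0)
        wss[int(k)] = float(np.sum((members - centroid) ** 2))
        bss += len(members) * float(np.sum((centroid - grand) ** 2))
    return {k: w / bss for k, w in wss.items()}
```

Under this reading, a small ratio indicates a compact cluster relative to the separation between clusters.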

Sensitivity analyses
The results of MCA and clustering were compared with those using Exploratory Factor Analysis (EFA) on tetrachoric correlations using the principal axis factoring method. Due to the low prevalence of some of the LTCs, a frequency adjustment was applied that increased cells with a zero count to one-half. Factor dimensions were considered if each factor loading had at least two scores larger than 0·3 and the Heywood phenomenon was not observed. A second analysis was also performed to cluster all 32 conditions, not just the ones deemed well represented by MCA.
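The zero-cell adjustment can be illustrated with a short Python sketch. Note the swapped-in technique: rather than the maximum likelihood tetrachoric estimate that statistical packages typically compute, the sketch uses the closed-form "cosine-pi" approximation based on the odds ratio, chosen only because it fits in a few lines. The function names and the 2×2 table layout (a = both conditions present, b and c = one present, d = neither) are assumptions for illustration.

```python
import math


def adjust_zero_cells(a, b, c, d, replacement=0.5):
    """Replace zero counts in a 2x2 table with one-half, as described in the text."""
    return tuple(replacement if x == 0 else float(x) for x in (a, b, c, d))


def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation.

    r ~ cos(pi / (1 + sqrt(odds ratio))), with odds ratio = ad / bc.
    This is an approximation, not the maximum likelihood estimate.
    """
    a, b, c, d = adjust_zero_cells(a, b, c, d)
    odds_ratio = (a * d) / (b * c)
    return math.cos(math.pi / (1.0 + math.sqrt(odds_ratio)))
```

Without the adjustment, a zero in cell b or c would make the odds ratio undefined, which is why rare condition pairs need the correction before correlations can be estimated.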
R version 4.0.2 and STATA were used for all analyses. This study is reported following STROBE guidelines for observational studies.

Role of the funding source
The funder had no role in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the paper for publication.

Participants
The 41 practices participating in the study provided care to a population of 826,936 unique individuals aged 18 years and over, between 2005 and 2020; mean age 40 years (SD 15·6), 52% female, 54% white ethnicity, and 64% resident in socially deprived areas (two most deprived quintiles). Forty-one percent of registered patients had at least one LTC and 21% (n = 174,881) had multimorbidity. Multimorbidity was more frequent among women (23%) than men (20%; χ² p < 0·01). The number of conditions increased progressively with age, with those aged 80+ having a median of 5 (IQR=3) conditions compared with a median of 2 (IQR=1) conditions in those aged 18–39 years.

Descriptive data
The characteristics of the multimorbid population are summarized in Table 1, stratified into three cohorts according to the year of last known follow-up (note that currently registered patients will belong in the most recent cohort). The most recent cohort has a higher prevalence of multimorbidity at 25% compared to 15–16% in the previous cohorts. All characteristics, apart from the proportion of substance use, change over time. The greatest changes were observed in the age and ethnicity structure: the 2016–2020 cohort has a younger age structure compared to those in 2005–2010, while Black and Asian ethnic groups feature prominently. Polypharmacy in the recent cohort decreases, while raised cholesterol and moderate obesity increase.

Main results
MCA retained 9·3% variance in the first dimension, and 4·8–5·0% in the second dimension (Fig. 3). The elbow of the scree plots (Fig. S1) suggested the retention of two dimensions; examination of dimensions three to 32 revealed only a small amount of information retained (4·2% to 1·6% of explained variance), therefore only the first two dimensions were considered further. Fig. 3 presents the positions of LTCs in the first and second dimensions and the relationships between these conditions to supplementary variables and to each other. Similar patterns can be seen across time cohorts. The first dimension differentiates LTCs across age and number of conditions: anxiety, depression, substance dependency, viral hepatitis, HIV, and alcohol dependency are associated with a younger age and fewer comorbidities (those with 2 LTCs), while stroke/transient ischaemic attack (TIA), chronic kidney disease (CKD), chronic heart disease (CHD), osteoporosis, PAD, atrial fibrillation, and heart failure are associated with the oldest age group. Conditions with a score close to one in the first dimension (i.e. COPD, diabetes, hypertension, osteoarthritis) are associated with high multimorbidity (4 LTCs), the 60–79 age group, and Black and Asian ethnic groups. Gender and IMD score are closer to the centre of the plot, indicating a lower ability to discriminate between conditions.
Hierarchical cluster analysis was carried out on variables that are well represented by the first two dimensions of MCA (Table S1), and the individual clusters are identified in Fig. 3. The number of clusters was determined based on the loss of inertia and thus differs slightly across cohorts. However, the conditions which consistently occur together in the same cluster across cohorts are: A) anxiety and depression (the "mental health" cluster; the ratio of within sums of squares (WSS) to between sums of squares (BSS) ranged from <0·01 in 2005–10 to <0·01 in 2016–20); B) heart failure, atrial fibrillation, CKD, CHD, stroke/TIA, PAD, dementia, and osteoporosis (the "cardiovascular" cluster; WSS/BSS = 0·09–0·12); C) osteoarthritis, cancer, chronic pain, hypertension, and diabetes (the "pain" cluster; WSS/BSS = 0·05–0·06); D) chronic liver disease and viral hepatitis (the "liver disease" cluster; WSS/BSS = 0·02–0·03); E) substance and alcohol dependency and HIV (the "dependence" cluster; WSS/BSS = 0·37–0·55). Some differences between cohorts include the appearance of epilepsy in cluster E (2005–10 cohort), and the appearance of severe mental health in cluster E (2016–20 cohort). A large difference is the combining of the mental health cluster A with chronic pain and the high prevalence cluster C in the 2016–20 cohort. Table 2 presents frequencies for each condition at different levels of multimorbidity, across all patients. Cluster A shows a low multimorbidity burden: for example, of those diagnosed with anxiety (n = 80,284), 41% have only one other condition and 32% have 3 or more additional conditions. In contrast, conditions in Cluster B have a high multimorbidity burden: 86% of patients diagnosed with heart failure will also have 3 or more conditions. Conditions which were not considered for clustering are those that present early in childhood (e.g. asthma, sickle cell anaemia, learning disability).
Patients were grouped according to the clusters found based on their disease prevalence. As the analysis focused on the clusters of diseases rather than individuals, some people will have conditions that span multiple clusters. For example, for persons assigned to cluster B, on average, 35% of their conditions could also belong to cluster C (Table S3). Reflecting the plots from MCA, those in clusters A, D and E tend to be younger (18–59), while those in cluster B tend to be older (80+; Table 3). Those in Clusters B and C have high cholesterol and obesity. The majority in Clusters D and E are males and current or past smokers (76%), while Black ethnicity features in Clusters C and D (32% and 26% in these clusters are of Black ethnicity, compared with 12–19% in other clusters). Clusters B and C have the highest number of medications. Deprivation does not seem to distinguish between clusters.
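The grouping rule from the statistical analysis (a person belongs to a cluster if more than 50% of their conditions fall in it) and the overlap between clusters can be sketched in Python as follows. The function names and the abbreviated condition-to-cluster mapping are illustrative assumptions. In particular, the sketch takes all of a person's recorded conditions as the denominator, including any that were not clustered, which the text does not specify.

```python
from collections import Counter

# Abbreviated, illustrative mapping based on clusters A to C in the text.
CLUSTER_OF = {
    "anxiety": "A", "depression": "A",
    "heart failure": "B", "atrial fibrillation": "B",
    "hypertension": "C", "diabetes": "C", "chronic pain": "C",
}


def cluster_shares(conditions, cluster_of):
    """Share of a person's conditions that fall in each cluster."""
    if not conditions:
        return {}
    counts = Counter(cluster_of[c] for c in conditions if c in cluster_of)
    return {cluster: n / len(conditions) for cluster, n in counts.items()}


def assign_cluster(conditions, cluster_of, threshold=0.5):
    """Cluster holding more than `threshold` of the conditions, else None."""
    for cluster, share in cluster_shares(conditions, cluster_of).items():
        if share > threshold:
            return cluster
    return None
```

For instance, a person with heart failure, atrial fibrillation and hypertension would be assigned to cluster B while still having one third of their conditions in cluster C, which is the kind of overlap summarised in Table S3.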
The main cluster analyses were carried out on 20 of the 32 conditions that were well represented in MCA. We compared our results using EFA and a cluster analysis using all 32 conditions and found similar results. Fig. S2 shows that the conditions closest together on the tree are the 'dependency and liver disease', 'ageing' (heart conditions and dementia), anxiety and depression, hypertension and diabetes, chronic pain, cancer and osteoarthritis. The conditions identified in the main cluster analyses also load onto the same factor in EFA (Tables S4–S6). The exception is the loading of chronic pain onto a separate factor with rheumatoid arthritis; however, in the MCA plots these two conditions are located close to each other in the first dimension.

Main findings
Using MCA to extract key patterns and discard noise from the data, this research found 20 conditions grouped together around five key clusters that remain consistent across time. The first dimension distinguishes conditions based on age and morbidity burden. In adults aged 18–39 years we identified the inter-connectedness of the highly prevalent diagnoses of anxiety and depression. A second cluster, associated with older age and polypharmacy, identified heart failure, PAD, osteoporosis, atrial fibrillation, CHD, CKD, stroke/TIA, and dementia as the most common co-occurring conditions. A third cluster, also occurring at older ages and particularly in Black ethnic groups, connects highly prevalent conditions such as chronic pain, hypertension, diabetes mellitus and osteoarthritis. These conditions occur frequently in dyads and triads (Fig. 2) and are associated with high (4 LTC) multimorbidity. The second dimension identified the conditions HIV, viral hepatitis, liver disease, and substance and alcohol dependency, which predominantly occur in young males who are also likely to smoke. Social deprivation was not different across clusters, for two possible reasons. Firstly, the variability between deprivation groups in this population is narrow, with just 1% in the least deprived national quintile. The small prevalence in this group makes it difficult to discriminate across conditions. Secondly, ethnicity is a stronger determinant, and this is often confounded by deprivation.
Across the whole population, the prevalence of multimorbidity increased over time, at 25% for the 2016–20 cohort compared to 15–16% for previous cohorts. The age and ethnicity structure of the multimorbid population changes to reflect a younger group with a higher proportion of Black and Asian ethnicities. The most prominent conditions in this population are not morbidities (physical diseases) but mental health conditions and chronic pain, as well as risk factors (hypertension, obesity, and alcohol).

Comparison with other studies
The finding of 21% prevalence of multimorbidity based on 32 conditions is comparable to studies reporting a similar number of conditions using an adult 18+ population [2,9,16]. Estimates of multimorbidity pattern prevalence differ in the literature because of variations in methods, data sources and structures, populations and LTCs studied. Although this makes it challenging to compare study results there are some similarities between the present and previous studies. For instance, the most common groups described in previous studies of multimorbidity patterns were cardiovascular and mental health. Clusters of high prevalence LTCs include cancer, hypertension, asthma, and depression [25], while clusters associated with high numbers of LTCs include hypertension, CHD, and diabetes. Clusters using older cohorts found clusters that are similar to our clusters B and C [19,26]. Other studies which have established different comorbidity profiles by an index condition, have also determined a high multimorbidity burden in those with cardiovascular diseases, and a lower burden in people with mental health disorders [16,17]. Substance misuse and mental illness clusters have been found to be more prevalent in younger, deprived communities [26], while cardiovascular multimorbidity has previously been shown to disproportionately affect black ethnic groups and women, and increases the likelihood of a diagnosis of chronic pain [27,28].

Strengths and limitations
Due to the long time period (15 years) of data collection, this study performed MCA separately on 5-year cohorts, to see if different patterns emerged. Previous studies have limited analysis to currently registered patients, whereas this study includes ex-registered patients. As a result, we identified long-term stability in multimorbidity patterns. Only one cluster grouping changed substantially, with the emergence of a combined mental health cluster A with that of chronic pain and the high prevalence cluster C in the 2016–2020 cohort, which may reflect the increasing co-occurrence between mental and physical health issues [6,26]. The number of determined clusters was based on the minimization of inertia, hence small changes in the groupings can be seen. However, the closeness of relationships, such as between anxiety and depression, between cardiovascular conditions, and the dependency/HIV and infectious disease conditions can be seen across all three cohorts, and this clinical consistency supported the analysis of clustering within the entire dataset.
This study benefits from having a large, relatively complete dataset. Selection of individual long term conditions and the definition of multimorbidity was identified using national guidelines as well as from discussions with local stakeholders, enabling identification of not only the 'old age' and 'high morbidity/high prevalence' clusters seen in other settings, but also clusters that are specific to this younger, multi-ethnic population, providing local relevance for service planning. However, the data only contains information on persons during their period of registration within a general practice, and we do not know what happens after they move away from the catchment area. The analysis does not consider the order of conditions, nor resolved conditions. Some conditions can be identified as resolved based on a standardised ruleset from the QOF (e.g. depression), but for non-QOF conditions (e.g. anxiety) 'resolve' codes are available but were not applied consistently. This means the relationship between anxiety and depression may change had coding of resolved conditions been more consistent. Relating to this, it is difficult to disentangle true population changes over time from increased data recording over time. Changes in LTC prevalence may be attributable to improved data recording, or true population changes.

What this study adds
This study not only identified clusters from the most prevalent conditions, but also clusters of conditions with low prevalence, which are missed when just examining the most common disease combinations in the form of dyads and triads, as the latter can be obscured by large data and may just be due to chance rather than identifying actual relationships [5]. Groups of conditions that remain consistent across time were identified, which is useful for future confirmatory research as well as to improve multimorbidity management. Analysis of a younger population identified a cluster of low prevalence conditions including dependency/HIV and infectious disease. The conditions in cluster D are known to co-occur, as Hepatitis B and C are common causes of chronic liver disease [29]. Similarly, people with alcohol and substance use disorders may be more at risk of HIV infection [30,31].
This study highlighted gender differences in the rates of common mental disorders (anxiety and depression). These disorders, in which women predominate, affect approximately 1 in 5 people in the south London population and constitute a serious public health problem. Not only are these conditions diagnosed at a younger age, but as women tend to survive longer than men they will also live with multiple long-term conditions for longer [32]. Gender disparities seen in the dependency/HIV clusters are also prominent in our sample, with higher rates in men. The World Health Organization reports that in developed countries, the lifetime prevalence rate for substance and alcohol dependence is more than twice as high in men as in women, with 1 in 5 men vs 1 in 12 women developing dependence during their lives [32].
for disease; to develop treatments; and to reconfigure services to better meet patients' needs [6]. With an ageing global population and a rise in disabling outcomes [33], it is necessary to continuously report on population health in detail, and to identify relationships between diseases to help decision makers identify ways of disease control and to better equip health services to deal with increasing burden of disease. Evidence suggests that multimorbidity prevalence is higher in urban versus rural areas [34]. Urbanisation is expected to increase, with many cities currently experiencing demographic changes with increasing migration and population densities. Thus, urban populations face challenges arising from increasing multimorbidity prevalence, severity, and complexity of conditions. This requires a tailored approach to care that considers these challenges, along with interventions designed to prevent and reduce avoidable disease burden [1]. For example, clinical management of HIV must consider possible diagnoses of co-morbid alcohol and substance use disorders or the possible prevention of these disorders [31]. Primary or secondary care that focusses on prevention of cardiovascular diseases, hypertension and diabetes is likely to delay the progression of severe multimorbidity in an ageing population.

Further work
Future work using this dataset will focus on the trajectories of diseases, to examine the onset of multimorbidity and their clustering. This analysis will take follow-up time into account, changing the study design to a longitudinal cohort study. We will use the clusters identified in this study to examine differences in patient consultation rates, and link to secondary care data to enable access to accurate hospital admissions and other important outcomes of multimorbidity.

Conclusion
This study has identified the co-morbidity between substance/ alcohol dependency and HIV; liver disease and viral hepatitis; anxiety and depression; cardiometabolic diseases and chronic pain; heart conditions and dementia. These key relationships characterise the young urban population of south London. When considering interventions or medications for one condition, clinicians should account for the increased risk of the patient belonging to one cluster acquiring other LTCs within the same cluster.

Data sharing
The data are not publicly available to share, but the research group can provide descriptive data in table form. Requests should be made to Mark Ashworth (mark.ashworth@kcl.ac.uk).

Declaration of Interests
The authors declare no conflict of interest.