Comparisons Between Hypothesis- and Data-Driven Approaches for Multimorbidity Frailty Index: A Machine Learning Approach

Background: Using big data and the theory of cumulative deficits to develop the multimorbidity frailty index (mFI) has become a widely accepted approach in public health and health care services. However, constructing the mFI using the most critical determinants and stratifying different risk groups with dose-response relationships remain major challenges in clinical practice.
Objective: This study aimed to develop the mFI by using machine learning methods that select variables based on the optimal fitness of the model. In addition, we aimed to establish 4 entities of risk using a machine learning approach that would achieve the best distinction between groups and demonstrate the dose-response relationship.
Methods: In this study, we used Taiwan’s National Health Insurance Research Database to develop a machine learning multimorbidity frailty index (ML-mFI) using the theory of cumulative diseases/deficits of an individual older person. Compared to the conventional mFI, in which the selection of diseases/deficits is based on expert opinion, we adopted the random forest method to select the most influential diseases/deficits that predict adverse outcomes for older people. To ensure that the survival curves showed a dose-response relationship without overlap during the follow-up, we developed the distance index and coverage index, which can be used at any time point to classify the ML-mFI of all subjects into the categories of fit, mild frailty, moderate frailty, and severe frailty. Survival analysis was conducted to evaluate the ability of the ML-mFI to predict adverse outcomes, such as unplanned hospitalizations, intensive care unit (ICU) admissions, and mortality.
Results: The final ML-mFI model contained 38 diseases/deficits. The ML-mFI and the conventional mFI had similar distribution patterns by age and sex; however, among people aged 65 to 69 years, the mean mFI and ML-mFI were 0.037 (SD 0.048) and 0.0070 (SD 0.0254), respectively. The difference may result from discrepancies in the diseases/deficits selected in the mFI and the ML-mFI. A total of 86,133 subjects aged 65 to 100 years were included in this study and were categorized into 4 groups according to the ML-mFI. Both the Kaplan-Meier survival curves and Cox models showed that the ML-mFI significantly predicted all outcomes of interest, including all-cause mortality, unplanned hospitalizations, and all-cause ICU admissions at 1, 5, and 8 years of follow-up (P<.01). In particular, a dose-response relationship was revealed between the 4 ML-mFI groups and adverse outcomes.
Conclusions: The ML-mFI consists of 38 diseases/deficits that can successfully stratify risk groups associated with all-cause mortality, unplanned hospitalizations, and all-cause ICU admissions in older people, which indicates that precise, patient-centered medical care can be a reality in an aging society.


Introduction
Population aging is a global phenomenon that poses various challenges to societies [1]. The health characteristics of older people and their health care service utilization differ greatly from those of younger adults [2], and frailty plays a pivotal role in the health of older people [3][4][5]. Frailty has been widely accepted as a geriatric syndrome that substantially increases the complexity of diseases and the burden of care [3][4][5]. In addition, frailty is recognized as an intermediate state between healthy and unhealthy states, and the potential reversibility of its nature highlights the importance of considering frailty when aiming to maintain the health of older people [6]. Moreover, frailty involves the coexistence of multiple comorbid conditions, such as polypharmacy, depression, cognitive impairment, falls, and malnutrition [7]. Therefore, the early identification of frailty and appropriate intervention remain the core of health care services for older people.
Despite the clinical significance of frailty, conceptual and operational definitions of frailty are inconsistent across studies [8]. Currently, the two most widely accepted approaches are the phenotypic approach for physical frailty and the frailty index (FI) based on the theory of cumulative deficits [9]. Although the definitions of frailty provided by the two approaches overlap to some extent, the major discrepancy lies in the prefrail group: physically prefrail subjects show a wide range of values on the frailty index. Nevertheless, both definitions remain the most widely accepted [10]. The theory of cumulative deficits proposes that aging may be characterized by the accumulation of deficits in various domains of health (eg, multimorbidity, functional assessment, and psychosocial perspectives) [9]. Given a sufficient number of variables, each individual component is assigned the same weight in the frailty index. Researchers have applied the theory of cumulative deficits to various data sets and validated the ability of the FI to predict adverse clinical outcomes [4,9]. Internationally, documented health care services data sets have been widely used to develop the FI for the prediction of health outcomes, and studies from different countries have all reported favorable results [5,11,12]. In the United Kingdom, researchers developed the electronic FI (eFI) using electronic medical records, which significantly predicted the mortality of older people [13,14]. Using similar principles, we developed the multimorbidity FI (mFI) using Taiwan's National Health Insurance data set and significantly predicted mortality, hospitalizations, and admissions to critical care units [4]. However, when a data set has a large study sample and many variables, it remains challenging to select appropriate variables to construct an FI and to optimally categorize the FI into risk classes.
Both eFI and mFI adopted expert recommendations in the selection of variables, and the eFI and mFI were then categorized into quartiles for group comparisons, which is a widely accepted approach. Nonetheless, selecting variables based on expert recommendations may result in a failure to recognize previously unidentified associations. In addition, the quartile approach for risk group categorization may successfully be used to construct the prediction model, but the intergroup comparisons in survival analysis may overlap and fail to establish a clear distinction.
Therefore, this study aimed to develop the mFI by using machine learning methods that select variables based on the best model fit. Furthermore, we aimed to establish 4 entities of risk using a machine learning approach that ensures both a dose-response relationship and the best distinction between groups.

Study Design and Participants
This is a retrospective cohort study using data from Taiwan's National Health Insurance Research Database (NHIRD). Details about the NHIRD have been published [15]. Briefly, the NHIRD is a nationwide database composed of outpatient and inpatient claims, and it covers more than 99% of Taiwan's population. The data are checked for quality and maintained by the Data Science Centre of the Ministry of Health and Welfare of Taiwan. We used a subset of the NHIRD, which contains claims data for one million randomly selected beneficiaries from the Registry of Beneficiaries of the NHIRD.

Construction of the Machine Learning-Based Multimorbidity Frailty Index
The mFI was constructed following standard procedures [16], and this method has been validated in the Taiwanese population [4,5]. Disease diagnoses (International Classification of Diseases, Ninth Revision, Clinical Modification [ICD-9-CM]) from outpatient and inpatient claims of the NHIRD between January 1 and December 31, 2005, were used to identify accumulated deficits to construct mFI. We adopted an algorithm widely used in studies using NHIRD as the data source to validate the diagnostic codes of the specified deficits within NHIRD; that is, only those who had at least 3 outpatient claim records or 1 inpatient claim record for that specified diagnosis code were considered to have the specified deficit. For example, an older adult must have at least 3 outpatient claim records or 1 inpatient claim record of diabetes mellitus [ICD-9-CM: 250] to be defined as having a deficit based on our definition.
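The claim-count rule described above can be illustrated with a minimal Python sketch. The record layout (a list of `(setting, icd9_code)` tuples), the function name `has_deficit`, and the prefix matching on dotted ICD-9-CM codes are assumptions made for illustration only; they are not taken from the NHIRD or from the authors' code.

```python
from collections import Counter

# Thresholds stated in the text: at least 3 outpatient claims or 1 inpatient claim.
MIN_OUTPATIENT_CLAIMS = 3
MIN_INPATIENT_CLAIMS = 1


def has_deficit(claims, code_prefix):
    """Return True if one year's claims satisfy the rule for one diagnosis.

    `claims` is an iterable of (setting, icd9_code) tuples, where setting is
    "outpatient" or "inpatient" (a hypothetical record layout).
    """
    counts = Counter(
        setting for setting, code in claims if code.startswith(code_prefix)
    )
    return (
        counts["outpatient"] >= MIN_OUTPATIENT_CLAIMS
        or counts["inpatient"] >= MIN_INPATIENT_CLAIMS
    )


# Made-up one-year claim history for a single person.
claims_2005 = [
    ("outpatient", "250.00"),
    ("outpatient", "250.00"),
    ("outpatient", "401.9"),
    ("inpatient", "428.0"),
]

print(has_deficit(claims_2005, "250"))  # False: only 2 outpatient diabetes claims
print(has_deficit(claims_2005, "428"))  # True: 1 inpatient claim is sufficient
```

Requiring repeated outpatient claims is a common way to reduce false positives from rule-out or miscoded diagnoses, while a single inpatient claim is accepted as sufficient evidence.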
We adopted the random forest method, which improves classification accuracy by growing an ensemble of trees and letting them vote for the most popular class [17]. Variable importance in the random forest, measured as the mean decrease in accuracy, was used to determine the specific conditions of the machine learning-based multimorbidity frailty index (ML-mFI). Model accuracy reached its highest level (0.602) when 38 conditions were included, so the ML-mFI was constructed from these 38 conditions (Figure 1 and Multimedia Appendix 1). The ML-mFI was calculated as the proportion of the 38 selected conditions that a person had in a year.
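As a minimal sketch of the final calculation step, the index for one person can be computed as a simple proportion. The condition identifiers below are placeholders (the actual 38 conditions are listed in Multimedia Appendix 1), and the function name `ml_mfi` is hypothetical.

```python
N_SELECTED_CONDITIONS = 38  # size of the final ML-mFI, as reported in the text


def ml_mfi(person_conditions, selected_conditions):
    """Proportion of the selected conditions that a person has in a year."""
    selected = set(selected_conditions)
    return len(selected & set(person_conditions)) / len(selected)


# Placeholder identifiers standing in for the 38 selected conditions.
selected = [f"condition_{i:02d}" for i in range(N_SELECTED_CONDITIONS)]

# A hypothetical person with 3 selected conditions and 1 unselected condition.
person = ["condition_00", "condition_07", "condition_21", "not_selected"]

score = ml_mfi(person, selected)
print(round(score, 4))  # 3 of 38 -> 0.0789
```

Conditions outside the selected set do not contribute to the score, which is why the same person can have different values on the mFI and the ML-mFI.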

Determination of Frailty Status by ML-mFI
All subjects were further categorized into 4 entities (fit, mild frailty, moderate frailty, and severe frailty) based on their risk status; this categorization was used by a previous study [4]. The fundamental rules for risk stratification included the following: (1) the individual risk groups were significantly different from each other, and (2) the health risk of these groups showed a dose-response relationship (ie, those in the severe frailty group had a higher risk than those in the moderate frailty group, who had a higher risk than those in the mild frailty group, and so on, at any follow-up time point after the first year). To achieve this purpose, we developed two indices, the distance index and the coverage index, which ensured the distinction and dose-response relationship of all survival curves.
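Rule (2) above can be expressed as a simple ordering check on estimated survival probabilities. The sketch below is illustrative only: the group labels, the function name `shows_dose_response`, and the survival values are made up, and the caller is assumed to supply estimates at the follow-up time points of interest (after the first year).

```python
GROUP_ORDER = ["fit", "mild", "moderate", "severe"]  # lowest to highest risk


def shows_dose_response(survival_by_group):
    """Check that survival is strictly ordered fit > mild > moderate > severe
    at every supplied time point, ie, the curves never touch or cross."""
    n_points = len(survival_by_group[GROUP_ORDER[0]])
    for t in range(n_points):
        values = [survival_by_group[group][t] for group in GROUP_ORDER]
        for higher, lower in zip(values, values[1:]):
            if lower >= higher:
                return False
    return True


# Made-up survival probabilities at three follow-up time points.
ordered = {
    "fit": [0.98, 0.90, 0.82],
    "mild": [0.95, 0.80, 0.68],
    "moderate": [0.90, 0.65, 0.50],
    "severe": [0.80, 0.45, 0.30],
}
crossing = dict(ordered, severe=[0.80, 0.66, 0.30])  # severe exceeds moderate once

print(shows_dose_response(ordered))   # True
print(shows_dose_response(crossing))  # False
```

This check covers only the ordering of point estimates; rule (1), that groups differ significantly from each other, additionally requires a statistical comparison such as the log-rank test.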
The distance index measured the distances between the survival curves of the groups and the stability of those distances. Its formal definition at any time point is given in Multimedia Appendix 2; the larger the distance index, the wider and more stable the distances between the curves. Conversely, the coverage index aimed to evaluate the length of the confidence interval for each survival curve. The total length of the confidence intervals indicated the overall estimated error in the grouping method. The coverage index at any individual time point is defined in Multimedia Appendix 3 in terms of L_total, which measured the difference in the estimated survival probability between the fit group and the severe frailty group, and L_error, which measured the total estimated errors within the 4 groups. The smaller the coverage index, the smaller the estimated error within groups. With the application of both the distance index and the coverage index, the levels of frailty were categorized into 4 groups by values of ML-mFI: fit was indicated by 0≤ML-mFI<0.026; mild frailty was 0.026≤ML-mFI<0.105; moderate frailty was 0.105≤ML-mFI<0.157; and severe frailty was 0.157≤ML-mFI. In the survival analysis, this grouping strategy categorized all subjects into 4 groups that remained significantly distinct throughout the follow-up period. In other words, the survival curves did not overlap, and the dose-response relationship between groups was clearly shown.
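The reported cut points translate directly into a lookup. The following Python sketch applies them to an ML-mFI value; the function name `frailty_category` is hypothetical, and the mapping from deficit counts assumes the index is the number of deficits divided by 38.

```python
# Lower bounds of each category as reported in the text, highest first.
CUT_POINTS = [
    (0.157, "severe frailty"),
    (0.105, "moderate frailty"),
    (0.026, "mild frailty"),
]


def frailty_category(ml_mfi):
    """Map an ML-mFI value to one of the 4 reported categories."""
    if ml_mfi < 0:
        raise ValueError("ML-mFI cannot be negative")
    for lower_bound, label in CUT_POINTS:
        if ml_mfi >= lower_bound:
            return label
    return "fit"


# Categories implied for 0 to 7 deficits out of 38.
for n_deficits in range(8):
    value = n_deficits / 38
    print(n_deficits, round(value, 3), frailty_category(value))
```

Under this assumption, 0 deficits corresponds to fit, 1 to 3 deficits to mild frailty, 4 or 5 deficits to moderate frailty, and 6 or more deficits to severe frailty, because 1/38≈0.0263, 4/38≈0.1053, and 6/38≈0.1579 are the first values to reach each lower bound.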

Outcomes of Interest
The outcomes of interest in this study include all-cause mortality, unplanned hospitalizations, and intensive care unit (ICU) admissions. The date of mortality was identified as the date of disenrollment from the NHIRD, which has been validated in a previous study [4]. Unplanned hospitalizations were any unexpected hospitalizations after an emergency department visit. ICU admissions were any hospital admissions with the use of ICU services. All study subjects were continuously followed from January 1, 2006, to the occurrence of each outcome or the end of 2013, whichever came first. For the outcomes of unplanned hospitalizations and ICU admissions, subjects were censored at death if it occurred first. Preplanned analyses were conducted to evaluate the effectiveness of ML-mFI in predicting outcomes at 1, 5, and 8 years.
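The follow-up and censoring rules for the nonfatal outcomes can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name `time_to_event`, its arguments, and the choice to count an event occurring on the day of death as observed are all hypothetical.

```python
from datetime import date

FOLLOW_UP_START = date(2006, 1, 1)   # start of follow-up, from the text
FOLLOW_UP_END = date(2013, 12, 31)   # administrative end of follow-up


def time_to_event(event_date=None, death_date=None):
    """Return (days of follow-up, event observed) for a nonfatal outcome.

    Follow-up stops at the first of: the outcome, death (censoring), or the
    end of 2013 (censoring).
    """
    stop = FOLLOW_UP_END
    observed = False
    if death_date is not None and death_date < stop:
        stop = death_date
    if event_date is not None and event_date <= stop:
        stop = event_date
        observed = True
    return (stop - FOLLOW_UP_START).days, observed


# Hospitalized on 2008-03-01: the event is observed.
print(time_to_event(event_date=date(2008, 3, 1)))
# Died on 2010-06-30 without the outcome: censored at death.
print(time_to_event(death_date=date(2010, 6, 30)))
# Neither outcome nor death: censored at the end of 2013.
print(time_to_event())
```

The resulting (time, indicator) pairs are the standard input for Kaplan-Meier and Cox analyses.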

Statistical Analysis
Numerical variables were expressed as the mean (SD), and categorical variables were expressed as a number or percentage. The random forest method not only determined the number of disease items comprising the ML-mFI but also identified the candidate conditions of the ML-mFI based on prediction accuracy and variable importance. The distance index and coverage index with min-max and max-min criteria were used to determine cut points and categorize the frailty groups by ML-mFI automatically. The Kaplan-Meier survival curve with the log-rank test was used to examine the association between categories of ML-mFI (fit, mild frailty, moderate frailty, and severe frailty) and 8-year mortality and hospitalizations. Cox proportional hazard models were used to estimate the hazard ratios (HRs) and 95% CIs for mortality and hospitalizations at 1, 5, and 8 years after the ML-mFI and mFI were estimated (based on a previous study [4]), with each index treated as the independent variable. We further included age and gender as covariates in all adjusted models. Sex-specific analyses were also conducted.
All analyses were performed using R version 3.4.4 (R Foundation for Statistical Computing). A two-sided P value of <.05 was considered statistically significant. Cox models were fitted with the coxph function in the survival package; neither the proportional hazards assumption nor the assumption of a linear relationship between the log hazard and each covariate was violated. The randomForest and importance functions in the randomForest package were used to build the model predicting outcome occurrence and to estimate the variable importance used to compose the ML-mFI, respectively.
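For readers unfamiliar with the Kaplan-Meier method named above, the product-limit estimator can be written in a few lines. The sketch below is a plain Python illustration with made-up data; it is not the authors' code, which used R, and the function name `kaplan_meier` is hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    `times` are follow-up times and `events` are 1 (event) or 0 (censored).
    Returns [(time, survival probability)] at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


# Made-up follow-up times (years) with event indicators.
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]

for t, s in kaplan_meier(times, events):
    print(t, round(s, 4))
```

Censored subjects contribute to the number at risk until their follow-up ends but never lower the survival estimate themselves.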

Construction of ML-mFI
The final ML-mFI with the highest model accuracy (0.6022061) contained 38 conditions (Multimedia Appendix 1). Details of the convergences and divergences in the composing conditions of the ML-mFI and the mFI are shown in Multimedia Appendix 4. Table 1 compares the ML-mFI and the traditional mFI by age and sex. The mFI and ML-mFI showed two similar distribution patterns. First, ML-mFI increased with age but reached a plateau at age 80 years and older. Second, both indices were higher in males, which is compatible with the shorter life expectancy of men in Taiwan. However, the mFI was calculated based on 32 selected conditions a person may have in a year, while the ML-mFI was calculated based on 38 selected conditions a person may have in a year; thus, the actual values of the mFI and ML-mFI were very different. Among people aged 65 to 69 years, the mean mFI and ML-mFI were 0.037 (SD 0.048) and 0.0070 (SD 0.0254), respectively. The difference may result from discrepancies in the conditions selected on the mFI and the ML-mFI. For example, some conditions were selected only on the ML-mFI but not on the mFI (eg, ICD-9-CM: 250 [diabetes mellitus] and, vice versa, ICD-9-CM: 374 [entropion]). These discrepancies are shown in Multimedia Appendix 1.

Survival Analysis
Overall, 86,133 subjects aged 65 to 100 years were included in this study. With a mean follow-up of 6.57 (SD 2.37) years, 30,136 deaths (34.99%) occurred among the study cohort during the study period. Figure 2 summarizes the results of the Kaplan-Meier survival curves for the 4 levels of ML-mFI on all-cause mortality, unplanned hospitalization, and ICU admission, and shows that ML-mFI significantly predicted all these outcomes of interest. Table 2 shows the hazard ratios of all-cause mortality, unplanned hospitalizations, and ICU admissions for the ML-mFI and the mFI at the 1-, 5-, and 8-year follow-up periods. For all three outcomes of interest, the ML-mFI yielded higher hazard ratios than did the mFI. For example, being categorized as severely frail by the mFI or the ML-mFI was associated with 4.97-fold (adjusted HR 4.97, 95% CI 4.49-5.50) and 11.40-fold (adjusted HR 11.40, 95% CI 10.32-12.59) increases in 1-year all-cause mortality, respectively. Similar patterns were observed for 5-year and 8-year all-cause mortality. For unplanned hospitalizations, being categorized as severely frail by the mFI or the ML-mFI was associated with 4.28-fold (adjusted HR 4.28, 95% CI 3.94-4.64) and 6.20-fold (adjusted HR 6.20, 95% CI 5.66-6.80) increases in 1-year unplanned hospitalizations, respectively. Similar patterns were observed for 5-year and 8-year unplanned hospitalizations.
For ICU admissions, those who were categorized as severely frail by the mFI or the ML-mFI were associated with 5.35-fold (adjusted HR 5.35, 95% CI 4.84-5.91) and 9.41-fold (adjusted HR 9.41, 95% CI 8.49-10.44) increases in 1-year ICU admissions, respectively. Similar patterns were observed for 5-year and 8-year all-cause ICU admissions.
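The reported hazard ratios and confidence intervals can be cross-checked for internal consistency. The sketch below assumes Wald-type intervals that are symmetric on the log scale, which is the usual default for Cox models but is not stated in the text; under that assumption the point estimate equals the geometric mean of the interval bounds, and the standard error of the log HR can be recovered from the interval width. The function names are hypothetical.

```python
import math

# (outcome and index, adjusted HR, lower 95% bound, upper 95% bound) for the
# severe frailty group at 1 year, as reported in the text.
REPORTED = [
    ("mortality, mFI", 4.97, 4.49, 5.50),
    ("mortality, ML-mFI", 11.40, 10.32, 12.59),
    ("unplanned hospitalization, mFI", 4.28, 3.94, 4.64),
    ("unplanned hospitalization, ML-mFI", 6.20, 5.66, 6.80),
    ("ICU admission, mFI", 5.35, 4.84, 5.91),
    ("ICU admission, ML-mFI", 9.41, 8.49, 10.44),
]


def implied_hr(lower, upper):
    """Point estimate implied by a CI that is symmetric on the log scale."""
    return math.sqrt(lower * upper)


def log_hr_se(lower, upper, z=1.96):
    """Standard error of log(HR) recovered from a 95% Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


for label, hr, lower, upper in REPORTED:
    print(
        f"{label}: reported {hr:.2f}, implied {implied_hr(lower, upper):.2f}, "
        f"SE(log HR) {log_hr_se(lower, upper):.3f}"
    )
```

For all six estimates, the implied value agrees with the reported adjusted HR to within rounding, so the adjusted HRs and their intervals are mutually consistent.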
Sex-specific analysis showed that both indices were higher in men than in women for various outcomes and follow-up periods (Tables 3 and 4 for males and females, respectively). For example, men in the severe frailty group (as defined by the ML-mFI) were associated with a 12.64-fold increased risk of 1-year mortality, while women in the severe frailty group (as defined by the mFI) were associated with a 10.37-fold increased risk of 1-year mortality.
Table footnotes. a: For all outcomes, the comparator is subjects in fit categories (n=64,650). All data were adjusted for age and gender. b: ML-mFI, machine learning multimorbidity frailty index. c: mFI, multimorbidity frailty index.

Discussion
In this study, we successfully used a machine learning approach to define the ML-mFI. Specifically, we selected disease/deficit items by the random forest method and ranked the importance of each individual item accordingly. The selection of these disease/deficit items to construct the ML-mFI was driven solely by data, while the conventional mFI included disease/deficit items based on expert recommendations. Moreover, the combined use of the distance index and coverage index successfully distinguished 4 groups with dose-response risks of adverse outcomes. In epidemiological studies, researchers often encounter similar challenges in selecting appropriate variables for analysis and optimally categorizing continuous variables into categorical variables for further comparisons. Traditionally, researchers need to search for literature support or adopt a generic approach to develop an optimal statistical model for data interpretation [18][19][20]. The hypothesis-driven approach to a research question is of great importance in scientific development; however, previously unknown or unidentified factors may be overlooked in the analysis, which may lower the statistical power in the interpretation of the phenomenon. Compared with our previous work, in which we used a hypothesis-driven approach to construct the mFI [4], the machine learning model selected substantially different disease/deficit items for ML-mFI construction. The traditional approach selected the diseases/deficits of older adults based on predefined selection criteria, whereas the machine learning approach identified more disease/deficit items, including chronic diseases, infectious diseases, and even some cancers, although these items did not comprise the majority of disease/deficit items.
The FI developed by Rockwood et al [18] hypothesized that cumulative deficits in various health domains may represent the process of biological aging, and this FI has been widely validated to predict adverse health events and mortality in different countries [5,11,12]. In theory, an FI may consist of as many variables as possible, so there are no issues regarding variable selection. However, to meet the needs of the busy clinical environment, the mFI is derived from the concept of the FI; a selection of age-related chronic conditions were the key variables used to construct the prediction model. Existing studies have shown that these previously developed mFIs can significantly predict the mortality of older adults [13,14]. However, to maximize the effectiveness of the prediction model, using a data-driven approach to construct the ML-mFI may provide better prediction accuracy. Moreover, in survival analysis, a dose-response relationship is usually expected when continuous measurements are grouped into distinct risk groups in association with outcomes. However, the distinction between individual groups derived from continuous measurements is not always statistically significant, even when the whole model reaches statistical significance. For example, in one Taiwanese study, the developed FI was found to predict the adverse outcomes of older adults, which was in line with most related studies [5]. However, in that study, categorizing risk groups by FI tertiles resulted in overlapping survival curves for the intermediate- and high-risk groups, so the stratification of risk groups was not achieved. The combined use of the distance index and coverage index developed in this study addresses this overlap of survival curves.
Although the mFI we developed adequately predicted adverse outcomes for older adults, the ML-mFI showed relatively higher hazard ratios than did the mFI for all health outcomes. Overall, the data-driven ML-mFI may identify different at-risk populations than the hypothesis-driven mFI. The data-driven approach may disclose the phenomenon of the whole data set [21][22][23], but the hypothesis-driven approach may provide a better explanation for the observations [24,25]. The data-driven approach may not be superior to the hypothesis-driven approach, since the study purpose and research questions may vary greatly. Although a data-driven approach may usually establish a prediction model with better accuracy, it is difficult to implement intervention programs for the observed phenomenon. Applying the theory of cumulative deficits, a large number of variables may be used to construct the prediction model, but it becomes challenging to further utilize the prediction model with a large number of variables. Therefore, researchers have attempted to reduce the number of variables while maintaining optimal prediction accuracy. Our previous study used factor analysis to reduce the 125 selected variables into 35 factors to improve the clinical application [5]. However, the machine learning approach in this study may play a similar role in reducing the selected number of variables and optimizing prediction accuracy. The main strength of this study was to demonstrate the methodological advance of processing a large data set to select appropriate variables to construct a prediction model and to ensure the distinction of different risk groups with dose-response relationships. This methodological advance may facilitate public health or social sciences research, or interdisciplinary research that uses a large data set with a wide array of data characteristics. 
In particular, the distance index and coverage index would be of great importance for future research to categorize the results of continuous variables into distinct entities with different health risks. Avoiding the overlap of the survival curves of different risk groups by using the distance index and coverage index is important to strengthen the observed phenomenon and the risk group classification.
Therefore, this ML-mFI demonstrated an automatic approach to predict adverse outcomes in older people, and it can be applied to different populations in different countries. Using the same approach, different diseases can be selected to construct a new ML-mFI in another population to predict adverse outcomes in that population. For example, we further stratified our study population into 3 subcohorts (those aged 65 to 75 years, 76 to 85 years, and 85 years and older) and constructed a separate ML-mFI for each age group using the same automatic machine learning approach and model selection criteria. We found that the total number of deficits and the composing deficits of the ML-mFI, as well as the cut-off points of the different frailty statuses, differed considerably between age groups. For example, the total deficit numbers on the ML-mFI were 59, 47, and 39 for those who were aged 65 to 75 years, 76 to 85 years, and 85 years and older, respectively. In addition, the composing deficits were different, as displayed in Multimedia Appendix 5. Although the composing deficits of the ML-mFI differ between age groups, all of these ML-mFIs successfully predicted all-cause mortality (C index>0.6). These findings are encouraging because they indicate that the same machine learning approach can be used to construct a population-specific ML-mFI. Individual diseases may have different clinical impacts in different countries due to differences in diagnosis, treatment, and quality of care. Therefore, the approach of this study can be applied in different countries and populations to construct their own ML-mFI to meet their needs. The ML-mFI could also have practical implications in public health or in health care administration. For example, in the management of large long-term care facilities, the administration needs to optimize the admission waiting list through the estimation of the mortality of all residents. 
On the other hand, in public health settings, the government is able to accurately estimate the health risk of residents in a certain geographic area and to provide optimal health care or palliative care services. Traditionally, these decisions were made based on existing medical knowledge, but a data-driven approach may better predict outcomes and optimize the government's public health policy. In clinical practice, the ML-mFI may enable physicians and families to quantify health risks for optimal care planning. Hence, using available electronic medical records, the ML-mFI can be automatically generated and integrated as part of the medical record to facilitate certain forms of decision-making in care planning.
This study has several limitations. First, like all data-driven studies, this study could not provide or validate a well-established hypothetical framework due to the nature of machine learning. Second, it remains difficult to develop intervention programs based on the diagnostic entities identified by machine learning. Third, another data set is needed to examine whether overfitting exists in the machine learning model. Finally, as in most previous frailty index studies, although we adjusted for age and sex as covariates in the Cox model, we were unable to account for residual confounders not routinely captured in a claims database, such as disease severity or lifestyle factors (eg, physical activity and diet).
In conclusion, the ML-mFI significantly predicted adverse health outcomes for older adults, and the combination of the distance index and coverage index defined risk groups with clear distinctions and dose-response relationships. The methodological advance of this study also has implications for future studies with similar data and research questions. The data-driven approach may provide better prediction accuracy than the hypothesis-driven approach, but the superiority of the data-driven approach requires further study for confirmation.