Predictive Modelling for Hospital Readmission Risk in the Philippines

Predictive models have been developed over the years to identify patients at risk of readmission. The goal of this study is to identify the risk factors associated with readmission within one year in four patient cohorts, acute myocardial infarction (AMI), heart failure (HF), chronic obstructive pulmonary disease (COPD), and pneumonia (PN), at a reputable Philippine hospital. Four predictive models were built and evaluated using performance metrics. The study found Logistic Regression to be the best-performing model in most of the cohorts. Between 6 and 8 variables were significantly associated with the readmission of high-risk patients.


Introduction
Hospital readmission occurs when a patient is hospitalized again, for the same or a different diagnosis, within a given time frame after being discharged. A readmission may be planned or unplanned, may take place in the same (index) hospital or a different hospital, and is typically counted within 7, 15, 30, 60, or 365 days [8]. In most cases, unplanned readmission is associated with lower quality of care received during the index admission. Many factors are associated with readmission. Some are intrinsic, attributable to reduced patient reserve due to disease progression and severity at each admission. Some may be related to clinical planning and care coordination while patients are still in the hospital. Others are related to post-discharge care and other social factors [1]. The conditions with the highest readmission rates are acute myocardial infarction, congestive heart failure, pneumonia, chronic obstructive pulmonary disease, total hip arthroplasty, and total knee arthroplasty [2].
Hospital readmission not only degrades the quality of healthcare but also affects patients financially through increased medical expenses. Some developed countries, such as England and Denmark, consider readmission measures applicable to almost all conditions. In the US, the readmission penalty program focuses on diagnoses that are more expensive and preventable [1,3]. Furthermore, in developing countries such as South Africa and Nigeria, hospital readmission problems are much more pronounced due to limited resources and a lack of funding to address them [7]. One strategy for reducing unplanned hospital readmission rates is to apply classifiers and predictive models, including Logistic Regression and machine learning algorithms. Several reviews to date have examined readmission risk prediction models developed to identify high-risk patients, emphasizing that not every predictive model works equally well for each hospital. Hence, the applicable model depends on the setting and the population studied, and the overall performance of the reviewed models was still inconsistent [4,5]. With around 80% of readmissions occurring at the same hospital, it may be reasonable to assume that models developed on same-hospital readmission data are also able to predict all-hospital readmission [6]. Also, few studies have been published regarding hospital readmissions in developing countries, including the Philippines [9,10]. Furthermore, there is a lack of digitized data available in Philippine hospitals, and no national program penalizes hospitals for higher readmission rates. This study aims to design and build a predictive model to identify the predictors of readmission and the patients at risk of readmission within a one-year timeframe in a same-hospital setting.

Data Collection and Definition
We used administrative and clinical data obtained from Northern Mindanao Medical Center (NMMC), Cagayan de Oro City, Philippines, with approval from its Research Ethics Board. This retrospective cohort study analyzes patients hospitalized with a primary diagnosis of one of the identified conditions and readmitted within 1 year. The initial dataset contains 2,234 patients who were readmitted between June 2017 and December 2018. The inclusion criteria were patients who were 18 years of age or older and had an index inpatient admission with a primary diagnosis of one of four conditions: AMI, HF, PN, or COPD. These conditions were extracted using International Classification of Diseases, Tenth Revision (ICD-10) codes to confirm diagnosis and discharge. For AMI, the codes used were I20.0, I21.9; for HF, I50.0, I50.9; for COPD, J44.9, J44.8; and for PN, J18.92, J18.93, J18.99, J69.0. Although a patient may have multiple admissions and readmissions, this study limits the sample to the first occurrence so that each patient is analyzed only once. Records with missing values for specific attributes, such as length of stay, were excluded. Patients who were transferred to another acute care hospital and patients who died during readmission were also excluded. The final sample consisted of 200 AMI patients, 127 HF patients, 75 COPD patients, and 261 PN patients.
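The selection rules above can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the record field names (patient_id, age, primary_dx, admit_date, length_of_stay, transferred, died) and the sample records are hypothetical, and the sketch assumes that exclusions are applied before the first qualifying admission per patient is kept, an ordering the text does not specify.

```python
from datetime import date

# Diagnosis codes per cohort, copied from the text above.
COHORT_CODES = {
    "AMI": {"I20.0", "I21.9"},
    "HF": {"I50.0", "I50.9"},
    "COPD": {"J44.9", "J44.8"},
    "PN": {"J18.92", "J18.93", "J18.99", "J69.0"},
}
CODE_TO_COHORT = {
    code: cohort for cohort, codes in COHORT_CODES.items() for code in codes
}


def select_cohort(records):
    """Return one eligible record per patient, tagged with its cohort.

    Assumption: exclusions are applied first, then the earliest remaining
    admission of each patient is kept.
    """
    selected = {}
    for rec in sorted(records, key=lambda r: r["admit_date"]):
        cohort = CODE_TO_COHORT.get(rec["primary_dx"])
        if cohort is None or rec["age"] < 18:
            continue  # not one of the four conditions, or a minor
        if rec["length_of_stay"] is None:
            continue  # missing required attribute
        if rec["transferred"] or rec["died"]:
            continue  # transferred to another hospital, or died
        # setdefault keeps only the first (earliest) record per patient.
        selected.setdefault(rec["patient_id"], {**rec, "cohort": cohort})
    return list(selected.values())


def _rec(pid, age, dx, day, los, transferred=False, died=False):
    """Build a hypothetical admission record for the example below."""
    return {
        "patient_id": pid, "age": age, "primary_dx": dx,
        "admit_date": day, "length_of_stay": los,
        "transferred": transferred, "died": died,
    }


records = [
    _rec("P1", 64, "I21.9", date(2017, 9, 1), 3),
    _rec("P1", 64, "I21.9", date(2017, 7, 1), 5),   # earlier admission, kept
    _rec("P2", 17, "J18.92", date(2017, 8, 1), 4),  # under 18
    _rec("P3", 70, "J44.9", date(2018, 1, 5), None),  # missing length of stay
    _rec("P4", 55, "I50.0", date(2018, 2, 9), 6),
    _rec("P5", 40, "K35.8", date(2018, 3, 2), 2),   # diagnosis not in scope
]
selected = select_cohort(records)
```

With these sample records, only P1 (its July 2017 admission) and P4 remain, assigned to the AMI and HF cohorts respectively.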

Data Preparation
Appropriate preprocessing is essential because the acquired data are inconsistent and incomplete. First, constant features, that is, features that provide no information in the dataset, were removed to avoid errors in the models. Second, we applied multiple imputation with 5 imputations using predictive mean matching (PMM) to account for missing data. Finally, we combined filter and wrapper feature selection methods to identify the most relevant features for predicting hospital readmission. All analyses were conducted using R statistical software.
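To make the first two preprocessing steps concrete, the following is a minimal Python sketch rather than the R code used in the study. It drops constant columns and then runs a deliberately simplified form of predictive mean matching: a one-predictor least-squares fit, with each missing value replaced by the observed value of a randomly chosen donor among the k closest predictions. Full PMM implementations also draw the regression parameters at random for each imputation, which is omitted here. The function names, the column names (age, length_of_stay, site), and k=3 are all assumptions made for illustration.

```python
import random


def drop_constant_features(rows):
    """Remove columns whose observed (non-missing) values never vary."""
    keep = [
        col for col in rows[0]
        if len({r[col] for r in rows if r[col] is not None}) > 1
    ]
    return [{col: r[col] for col in keep} for r in rows]


def pmm_impute(rows, target, predictor, k=3, seed=0):
    """One simplified PMM pass over `target`, using a single numeric predictor.

    Assumes the predictor is fully observed and not constant.
    """
    rng = random.Random(seed)
    observed = [r for r in rows if r[target] is not None]
    xs = [r[predictor] for r in observed]
    ys = [r[target] for r in observed]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x

    def predict(x):
        return intercept + slope * x

    completed = []
    for r in rows:
        if r[target] is None:
            # Donors: observed rows whose predicted mean is closest to ours.
            donors = sorted(
                observed,
                key=lambda d: abs(predict(d[predictor]) - predict(r[predictor])),
            )[:k]
            r = {**r, target: rng.choice(donors)[target]}
        completed.append(dict(r))
    return completed


# Hypothetical toy data; "site" is constant and carries no information.
rows = [
    {"age": 60, "length_of_stay": 4, "site": "NMMC"},
    {"age": 72, "length_of_stay": 7, "site": "NMMC"},
    {"age": 55, "length_of_stay": None, "site": "NMMC"},
    {"age": 80, "length_of_stay": 9, "site": "NMMC"},
    {"age": 67, "length_of_stay": 6, "site": "NMMC"},
]
cleaned = drop_constant_features(rows)
# Five imputed datasets, mirroring the 5 imputations mentioned in the text.
imputed_sets = [
    pmm_impute(cleaned, "length_of_stay", "age", k=3, seed=m) for m in range(5)
]
```

Because imputed values are always copied from real donors, every filled-in length of stay is a value that was actually observed, which is the main appeal of PMM over imputing the raw regression prediction.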

Constructing Predictive Models
Before constructing a predictive model, each cohort dataset was split into training (70%) and testing (30%) sets, one of the common ratios used in medical research [11]. Logistic Regression, Support Vector Machine, Random Forest, and Neural Network classifiers were trained on the training set [12]. To further enhance the models' performance, tuning was conducted to find the best parameters. Each fitted model was then used to predict the testing set. The effectiveness of each model was evaluated using the area under the curve (AUC), accuracy, sensitivity, specificity, and precision with 10-fold cross-validation.
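The split and the evaluation metrics named above can be written out directly. The Python sketch below is illustrative only and does not reproduce the study's R workflow: the function names, the random seed, the 0.5 decision threshold, and the toy labels and scores are assumptions. AUC is computed as the fraction of (readmitted, not readmitted) pairs in which the readmitted patient receives the higher score, with ties counted as one half.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and split it into training and testing sets."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def _ratio(numerator, denominator):
    """Safe division that returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and precision for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "precision": _ratio(tp, tp + fp),    # positive predictive value
    }


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return _ratio(wins, len(pos) * len(neg))


# A cohort of 200 patients (the AMI cohort size) gives a 140/60 split.
train, test = train_test_split(list(range(200)))

# Hypothetical labels (1 = readmitted) and predicted risk scores.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
metrics = classification_metrics(y_true, y_pred)
area = auc(y_true, scores)
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason for reporting them together.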

Predictors of 1-year Readmission
Predictors were determined using both forward and backward selection methods. Initially, 90 variables were collected for each cohort, and these were then reduced to 54 variables.
Table 3 summarizes the performance of the best models for each disease cohort. As shown, in terms of AUC and other performance metrics, Logistic Regression is the best-performing model for predicting readmission in HF and COPD. SVM shows better performance in predicting AMI and provides high specificity and precision in most of the cohorts. Neural Network is the best-performing model for Pneumonia, and it performs better with fewer hidden layers. Random Forest was the least impressive model in all disease cohorts.

Conclusion and Future Works
In this study, we developed and compared four predictive models. All models were evaluated on several performance metrics. We found that Logistic Regression generally performs better when predicting readmissions. Predictors significantly associated with 1-year readmission were identified using a selection technique. Predictive models of this kind can be a valuable tool for providing insights to design disease-specific interventions and decrease the readmission of high-risk patients in Philippine hospitals. In future studies, several critical features in medical records, such as health history, lifestyle, and social factors, need to be collected. Other statistical methods or machine learning algorithms that could provide better results should also be explored.