A New Tool Preliminary Assessment on Temporal-Comorbidity Adjusted Risk of Emergency Readmission (T-CARER)

Patients’ comorbidities, operations and complications can be associated with reduced long-term survival probability and increased healthcare utilisation. The aim of this research was to produce an adjusted case-mix model of comorbidity risk and develop a user-friendly toolkit to encourage public adaptation and incremental development. It has been shown in healthcare research that demographics, temporal dimensions, length-of-stay and time between admissions can noticeably improve the statistical measures related to comorbidities. The proposed model incorporates temporal aspects, medical procedures, demographics, and admission details, as well as diagnoses. The research resulted in the development of the Temporal-Comorbidity Adjusted Risk of Emergency Readmission (T-CARER) model using routinely collected hospital data.


Introduction
There is increasing evidence that quantifying high-risk diagnoses, operations and procedures, and monitoring their changes over time, can greatly improve the quality of readmission models, given adequate adjustment. There have been two streams of work on risk scoring comorbidities to estimate future resource utilisation, emergency admission and mortality.
The first stream of research looks at the odds ratios of major diagnosis groups and is therefore highly reliant on whole-population statistics. These models stem from crudely summing the derived weights for comorbidities, which are based on the most recent admission of each patient and disregard temporal patterns. A popular example is the Charlson Comorbidity Index (CCI) [1], which relies on twenty-two comorbidity groups. One of the recent translations of the CCI is the National Health Service (NHS) England version of the CCI (NHS-CCI), which is continuously being updated [2][3][4][5][6].
The second stream of models uses a diagnosis classification approach based on similarities, type, likelihood or duration of care. However, these models are usually very complex and specialised to highly particular settings and populations. They also draw on a period of past care records, but largely ignore temporal patterns. One prominent method is the Elixhauser Comorbidity Index (ECI) [7,8], which relies on thirty comorbidity groups and a 1-year lookback period. Unlike the CCI, the ECI uses Diagnosis-Related Groups (DRG), which were first developed by Fetter, et al. [9] and Mistichelli [10] and are based on ICD (International Statistical Classification of Diseases) diagnoses, procedures, age, sex, discharge status, complications and comorbidities. A recent adaptation of the ECI is the AHRQ-ECI, which is actively maintained by the US Public Health Service [7]. Another well-established method is the Johns Hopkins Adjusted Clinical Groups (ACGs) [11], which is a commercial tool. The model uses a minimum of 6 months and a maximum of 1 year of prior care records, and it encapsulates 32 diagnosis groups, known as Aggregated Diagnosis Groups (ADGs), and their aggregations, called Expanded Diagnosis Clusters (EDCs). Moreover, these indices were initially developed to adjust for particular risks, such as mortality risk and care utilisation, but they are commonly used in a variety of risk adjustment problems in critical care health services research.
In the machine learning pipeline that was developed in the prior stage of our research [12], the comorbidity index was a highly significant factor with high potential for further improvement. Presently, comorbidity risk indices have four major areas of weakness: robustness, temporal adjustment, population stratification, and the inclusion of factors associated with comorbidities and complications.
In this research, we aim to improve on these four major areas. Firstly, to make the risk score relevant to different environments, an approach must be used that models the complex correlations between variables and states.
Secondly, to better distinguish short- and long-term conditions, the temporal dimension (i.e. prior admissions, length-of-stay, and delta-time between admissions) may be included in the form of a life-table or a polynomial weight function. Thirdly, population stratification is a major factor in the prevalence of medical conditions, and therefore must be adjusted for. Fourthly, major factors correlated with diagnoses, including secondary diagnoses, operations, procedures and complications, may be included directly or indirectly (as latent factors) to improve the risk estimates.

Data
In this study, a bespoke extract of the HES (Hospital Episode Statistics) inpatient data was used, which contains records from April 1995 to April 2010. Two main samples were randomly selected from this database, which include 20% of the total unique patients from the 1999-2004 and 2004-2009 periods. Then, each main sample was divided into two equal halves, to be used for training and testing. The specification of the data and the selected samples is presented in Table 1.
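The patient-level sampling and the train/test split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the function name sample_and_split, the fixed random seed and the synthetic patient identifiers are all hypothetical. The point it shows is that sampling is done over unique patients rather than over admissions, so that all records of one patient fall into the same half.

```python
import random


def sample_and_split(patient_ids, fraction=0.2, seed=0):
    """Randomly pick a fraction of the unique patients, then split them in two halves.

    patient_ids may contain repeats (one entry per admission); sampling is
    performed on the unique identifiers so a patient is never split across halves.
    """
    unique_ids = sorted(set(patient_ids))
    rng = random.Random(seed)  # hypothetical fixed seed, for reproducibility only
    n_sample = round(len(unique_ids) * fraction)
    sampled = rng.sample(unique_ids, n_sample)
    half = n_sample // 2
    return sampled[:half], sampled[half:]


# Synthetic example: 5000 admissions belonging to 1000 unique patients.
admission_patient_ids = [i % 1000 for i in range(5000)]
train_ids, test_ids = sample_and_split(admission_patient_ids)
```

Here train_ids and test_ids are disjoint lists of patient identifiers; the admissions of each patient would then be selected by identifier.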
Each time-frame was divided into one year of trigger-event, one year of prediction period, and three years of prior-history. The population includes all living patients older than one year who have an admission within the trigger year. The prediction target variables are 30- and 365-day emergency hospital admissions to inpatient care.
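The two prediction targets can be illustrated with a short sketch. It is not the study's implementation: the function name readmission_labels is hypothetical, and two choices are assumptions made only for the example, namely that days are counted from the discharge date of the trigger admission and that an admission on the discharge day itself is not counted.

```python
from datetime import date


def readmission_labels(trigger_discharge, later_emergency_admissions):
    """Return (within_30_days, within_365_days) flags for one trigger event."""
    gaps = [(adm - trigger_discharge).days for adm in later_emergency_admissions]
    gaps = [g for g in gaps if g >= 1]  # assumption: same-day admissions are excluded
    within_30 = any(g <= 30 for g in gaps)
    within_365 = any(g <= 365 for g in gaps)
    return within_30, within_365


# Emergency admission 19 days after discharge: both targets are positive.
early = readmission_labels(date(2003, 6, 1), [date(2003, 6, 20), date(2004, 1, 5)])
# Emergency admission 92 days after discharge: only the 365-day target is positive.
late = readmission_labels(date(2003, 6, 1), [date(2003, 9, 1)])
```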

Features
After the data extraction step, several stages of data pre-processing and feature selection were carried out using the framework introduced by Mesgarpour, et al. [13]. Firstly, a set of data pre-processing steps was applied, then the feature selection steps were carried out. Before the feature selection steps, features were also aggregated and split into temporal events, to capture the events through time.

Pre-Processing
The pre-processing stage implements data selection, removal of invalid records and imputation of observations. Feature re-categorisation was also applied at this stage, to reduce sparsity and to better capture non-linear relationships.
In the re-categorisation step, a clinical grouper, known as the Clinical Classifications Software (CCS), was used to categorise the diagnoses, to better capture the patterns and cross-correlations of comorbidities. The CCS categorises the ICD-10 (10th revision of the ICD) diagnoses and operations into a number of categories that are clinically meaningful [14,15]. Furthermore, operations and procedures were categorised using the major categories of the OPCS-4, but an alternative coding categorisation, such as ICD-10-PCS, may be used. The OPCS-4 is an alphanumeric nomenclature (similar to ICD-10-PCS) that is used by NHS England and has an implicit categorisation of operations based on clinical categories rather than cost or risk.

Life-Table and Aggregation
Healthcare administrative data are severely unbalanced with regard to the amount of longitudinal (panel) data per patient and its distribution over the years. Statistical methods are not equipped to handle these types of imbalance directly. Therefore, the life-table approach from survival analysis was used to keep track of temporal events [16].
Based on previous studies and the initial statistical analyses, four levels of temporal features were generated: 0-30, 30-90, 90-365 and 365-730 days. These four levels capture part of the temporal aspect of comorbidities, in addition to the delta-time between admissions (gapDays) and the length-of-stay (epidur) features, which include temporal metadata. Furthermore, in the modelling stage, we applied several techniques to capture the complex temporal patterns of patients' comorbidities. The temporal features were summarised in each temporal level using several aggregation functions, including prevalence, count and average. This stage increased the number of features more than fifty-fold.
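The aggregation of one feature over the four temporal levels can be sketched as below. This is an illustrative example rather than the toolkit's code: the function name aggregate_by_level, the input layout (pairs of days before the trigger event and a value such as length-of-stay), the half-open treatment of the level boundaries and the use of 0.0 as the average of an empty level are all assumptions.

```python
# The four temporal levels named in the text, in days before the trigger event.
LEVELS = [(0, 30), (30, 90), (90, 365), (365, 730)]


def aggregate_by_level(events):
    """Summarise one patient's events for one feature within each temporal level.

    events: list of (days_before_trigger, value) pairs.
    Returns {level_name: {"prevalence": 0 or 1, "count": int, "average": float}}.
    """
    summary = {}
    for lo, hi in LEVELS:
        # Assumption: levels are half-open intervals [lo, hi).
        values = [v for d, v in events if lo <= d < hi]
        count = len(values)
        summary[f"{lo}-{hi}"] = {
            "prevalence": int(count > 0),
            "count": count,
            "average": sum(values) / count if count else 0.0,
        }
    return summary


# Hypothetical history: (days before trigger, length-of-stay in days).
history = [(12, 3.0), (25, 5.0), (200, 1.0), (800, 9.0)]
features = aggregate_by_level(history)
```

In this example the event 800 days before the trigger falls outside all four levels and is ignored, and each base feature yields one set of aggregates per level.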

Feature Selection
After feature generation, a feature pool was produced from the developed features, and the feature selection step was then carried out. Firstly, the features were filtered based on their linear cross-correlation, as well as their frequency and sparseness (the percentage of distinct values, and the ratio of the most common value to the second most common).
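The frequency and sparseness filter can be illustrated with a small sketch. The two statistics are the ones named in the text; the function name is_sparse_feature, the threshold values and the rule that both conditions must hold are assumptions chosen for illustration and are not taken from the study.

```python
from collections import Counter


def is_sparse_feature(values, freq_cut=19.0, unique_cut=10.0):
    """Flag a feature as too sparse to keep.

    freq_ratio: count of the most common value divided by the count of the
    second most common value.
    percent_unique: number of distinct values as a percentage of all observations.
    Both thresholds are hypothetical defaults.
    """
    top_two = Counter(values).most_common(2)
    if len(top_two) < 2:
        return True  # empty or constant feature carries no information
    freq_ratio = top_two[0][1] / top_two[1][1]
    percent_unique = 100.0 * len(set(values)) / len(values)
    return freq_ratio > freq_cut and percent_unique < unique_cut


rare_flag = [0] * 98 + [1] * 2   # dominated by a single value
balanced_flag = [0, 1] * 50      # evenly split between two values
```

With these inputs, rare_flag has a frequency ratio of 49 and 2% distinct values, so it is flagged, whereas balanced_flag has a ratio of 1 and is kept.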
Thereafter, the continuous features were transformed using two feature transformation methods: scale-to-mean and Yeo-Johnson [17]. Both methods can be used to transform the data to improve normality. Although feature transformations do not guarantee better convergence or stable variance for every dataset, they were applied to avoid feeding skewed features into the models. A downside of transformations is that they make model interpretation harder and can negatively affect the relationship between correlated features in the model. Therefore, the highly correlated features were removed after the transformations.
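For reference, the Yeo-Johnson transformation of a single value can be written as the short function below. The sketch takes the power parameter as given; in practice it is usually estimated from the data (for example by maximum likelihood), and that step is omitted here. The function name yeo_johnson and the argument name lam are illustrative.

```python
import math


def yeo_johnson(y, lam):
    """Yeo-Johnson power transformation of one value y with parameter lam."""
    if y >= 0:
        if abs(lam) < 1e-12:
            return math.log1p(y)                  # limit as lam -> 0
        return ((y + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < 1e-12:
        return -math.log1p(-y)                    # limit as lam -> 2
    return -(((1.0 - y) ** (2.0 - lam)) - 1.0) / (2.0 - lam)


# A right-skewed, non-negative feature is compressed by a parameter below 1.
skewed = [0.0, 1.0, 3.0, 15.0, 99.0]
transformed = [yeo_johnson(v, 0.0) for v in skewed]
```

Unlike the Box-Cox transformation, this form is defined for zero and negative values, and a parameter of 1 leaves the data unchanged.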

Discussion
We compared the performance of the T-CARER against commonly used comorbidity index models using different samples and population cohorts across a ten-year period. Our analyses of the T-CARER and the NHS-CCI for different diagnosis categories demonstrated that our model performed best in the majority of comorbidity groups. Overall, the T-CARER models show better results than those reported in previous surveys of CCIs and ECIs.

Concluding Remarks
Further research will seek an approach to scoring comorbidities that includes diverse categories of diagnoses, operations and complexities. The T-CARER performs consistently across tests and validations, and it outperformed the Charlson and Elixhauser indices, which are widely used for the prediction of comorbidity risks.