External Validation of a Population-Based Prediction Model for High Healthcare Resource Use in Adults

Predicting high healthcare resource users is important for informing prevention strategies and healthcare decision-making. We aimed to cross-provincially validate the High Resource User Population Risk Tool (HRUPoRT), a predictive model that uses population survey data to estimate the 5-year risk of becoming a high healthcare resource user. The model, originally derived and validated in Ontario, Canada, was applied to an external validation cohort. HRUPoRT model predictors included chronic conditions, socio-demographics, and health behavioural risk factors. The cohort consisted of 10,504 adults (≥18 years old) from the Canadian Community Health Survey in Manitoba, Canada (cycles 2007/08 and 2009/10). A person-centred costing algorithm was applied to linked health administrative databases to determine respondents’ healthcare utilization over 5 years. Model fit was assessed using the c-statistic for discrimination and calibration plots. In the external validation cohort, HRUPoRT demonstrated strong discrimination (c-statistic = 0.83) and was well calibrated across the range of risk. HRUPoRT performed well in an external validation cohort, demonstrating transportability of the model to other jurisdictions. HRUPoRT’s use of population survey data enables a health equity focus to assist with decision-making on prevention of high healthcare resource use.


Introduction
Canada has among the highest healthcare spending in the world, with total costs for 2018 estimated at CAD 253.5 billion or CAD 6839 per Canadian [1]. Across jurisdictions, healthcare spending is concentrated among a small proportion of the population [2]. From 2009 to 2011, 5% of healthcare users in Ontario, Canada accounted for 65% of healthcare costs [3]. Predictive tools that identify people likely to become high-cost users can inform cost reduction strategies. Existing healthcare utilization models that are to be applied in new health settings, outside the setting in which the model was originally developed, should demonstrate external validity [4,5]. Geographic validation (a type of external validation that examines how the model performs in individuals from a different health setting from where the model was originally developed) is especially suited for establishing the generalisability of a predictive model to health systems in other jurisdictions [6].
The High Resource User Population Risk Tool (HRUPoRT) is a population-based model that predicts which adults will become the top 5% of healthcare users over the following 5 years, based on self-reported health, sociodemographic, and health behavioural information that is routinely collected in population surveys [7]. HRUPoRT was developed in a cohort of 58,617 people from Ontario, Canada who responded to the 2007/08 Canadian Community Health Survey (CCHS) and was temporally validated in a cohort of 29,721 Ontarians in the 2009/10 CCHS [7]. The top 5% of healthcare users were identified by ranking individuals according to gradients of cost within the CCHS cohort using a person-centered costing methodology [7]. HRUPoRT is unique in that it was designed for use at the population and health system level, and the algorithm can be readily applied in other jurisdictions given that the model inputs are widely accessible in population survey data. The objective of this study was to establish HRUPoRT's generalizability to other jurisdictions by geographically validating the Ontario-derived HRUPoRT algorithm in another Canadian province, using self-reported risk factors from the Canadian Community Health Survey and high resource user status ascertained from administrative data from Manitoba's health system. This type of geographic (cross-provincial) validation is important for demonstrating whether the HRUPoRT model can accurately predict high resource users in other jurisdictions, supporting wide-scale generalizability of the tool.

Context and Setting
A prospective cohort study conducted in the province of Manitoba, Canada was used to externally validate the High Resource User Population Risk Tool (HRUPoRT) [7]. Manitoba has a single-payer health insurance system that provides universal health coverage to its population of about 1.28 million residents, as of 2016 [8]. The study used linked population health surveys and health administrative data held and accessed at the Manitoba Centre for Health Policy. The study was approved by the research ethics boards at the University of Manitoba (#HS20593/H2017:093) and University of Toronto (#31967).

Data Sources
The study cohort was created by linking respondents from the combined 2007/2008 and 2009/2010 Canadian Community Health Surveys (CCHS) to provincial health administrative data. The CCHS is a cross-sectional survey administered by Statistics Canada that collects self-reported health related data and uses a probability sample and weighting system that is representative of 98% of the Canadian population aged 12 years and older living in private dwellings. Excluded from the CCHS sampling frame are individuals living in First Nation communities, full-time members of the Canadian Forces, individuals living in long-term care institutions, and residents of certain remote regions. The detailed survey methodology of the CCHS is described elsewhere [9]. Cycles of the CCHS were combined using the pooled approach [10].
Data on health services utilized for the 5 years after the CCHS interview date were obtained from the health administrative databases that are linked through a unique anonymized personal health identification number (PHIN). The administrative databases included Medical Services Data, Discharge Abstract Database, Case Mix Grouper data, National Rehabilitation Reporting System, Long Term Care Utilization database, and Drug Program Information Network. A description of these databases is available in Appendix A.

Participants
The external validation cohort was created using the same eligibility criteria as were applied in the original development study. The cohort consisted of CCHS respondents aged 18 years and older who had a valid health card and agreed to have their survey responses linked to the provincial health administrative data. For individuals who appeared in multiple CCHS cycles, only data collected from their first CCHS interview were used. After exclusions, the cohort consisted of 10,504 respondents from the pooled 2007-2010 CCHS.

High Resource User Outcome
Healthcare utilization costs for each of the 5 years following the CCHS interview date were computed by applying a person-centered costing approach to the linked health administrative databases [11]. The costing algorithm estimated individual healthcare costs for utilization of physician services, inpatient hospitalizations, rehabilitation, long-term care, same day surgery, and pharmaceuticals. Individuals were ranked in each year according to the total annual per-person healthcare expenditures, with high resource users defined as the top 5% of users in any given year.
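The ranking step described above can be illustrated with a minimal sketch. It assumes total annual costs have already been computed per person and year, ignores survey weights, and breaks ties at the threshold arbitrarily; the function name `flag_top_users` and the data layout are hypothetical and are not taken from the study's SAS code.

```python
from collections import defaultdict

def flag_top_users(costs, top_share=0.05):
    """Flag people who fall in the top `top_share` of spenders in any year.

    costs: dict mapping (person_id, year) -> total annual cost.
    Returns a dict person_id -> bool.
    Assumption: the number of top users per year is floor(top_share * n),
    with at least one person flagged; the study's exact rule is not stated.
    """
    by_year = defaultdict(list)
    for (person, year), cost in costs.items():
        by_year[year].append((cost, person))

    ever_high = {person: False for (person, _year) in costs}
    for rows in by_year.values():
        # Rank within the year from highest to lowest annual cost.
        rows.sort(key=lambda row: row[0], reverse=True)
        n_top = max(1, int(top_share * len(rows) + 1e-9))
        for _cost, person in rows[:n_top]:
            ever_high[person] = True
    return ever_high
```

Because the flag is set if a person reaches the top 5% in any single year, the share of the cohort flagged over 5 years can exceed 5%.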
Due to data availability, costs did not include health services for complex continuing care, home care, emergency department visits, inpatient mental health, and assisted medical devices, all of which were included in the ascertainment of high resource user status in the original HRUPoRT development cohort [7]. In addition, due to differences in provincial drug plans (income-based drug coverage in Manitoba versus age-based drug coverage in Ontario for people aged 65 and older), a wider population coverage of drug costs was included in the Manitoban cohort compared to the original development cohort.
All predictors and categorizations were defined using the same CCHS questions as applied in the original development cohort [7]. An exception to this was alcohol consumption, in which due to provincial differences in CCHS questions related to alcohol use, categories were defined based on how often alcohol was consumed in the past 12 months as opposed to the number of drinks consumed per week in the past 12 months. Specifically, alcohol consumption was defined as heavy drinker (drinks 1-6 times a week or everyday and binges once or more than once a week), moderate drinker (drinks once a week and binges 1-3 times a month; or, drinks 2-3 times a week and binges less than 3 times a month; or, drinks 4-6 times a week or everyday and binges less than 3 times a month or never), light drinker (drinks 1-3 times a week and never binges), non-drinker (drinks less than weekly or no alcohol consumption in the last 12 months). The revised definition for alcohol categories was informed by alcohol consumption definitions used in previous literature [12].

External Validation of the Model
All predictor variables were mean-centered to account for differences between populations in the distribution of risk factors. Missing values for predictors were maintained as separate categories. For each individual in the cohort, the predicted 5 year probability of high resource user status was computed using model coefficients that were derived from the original development cohort, but which were updated from the previously published model to use mean-centered risk factor variables (see Appendix A Table A1). The probabilities were computed using the formula: probability = exp(logit) / (1 + exp(logit)), in which the logit is the sum of the regression coefficients multiplied by their respective predictor variable values.
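As an illustration of the probability formula, the following minimal sketch computes a predicted risk from mean-centered predictors. The predictor names, means, and coefficients used in any call are hypothetical placeholders, not the published HRUPoRT coefficients, and the separate intercept term is an assumption about how the model is parameterized.

```python
import math

def predicted_risk(x, means, coefs, intercept=0.0):
    """Return the predicted probability for one respondent.

    x, means, coefs: dicts keyed by predictor name (hypothetical names).
    intercept: assumed model intercept; not described in the text.
    """
    # Logit: sum of coefficients times mean-centered predictor values.
    logit = intercept + sum(coefs[k] * (x[k] - means[k]) for k in coefs)
    # Algebraically equal to exp(logit) / (1 + exp(logit)).
    return 1.0 / (1.0 + math.exp(-logit))
```

A respondent whose predictor values all equal the cohort means has a logit equal to the intercept, which is one practical consequence of mean-centering.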
The predicted probabilities were used to evaluate the model's performance with respect to discrimination and calibration. Discrimination refers to the ability of the model to differentiate between those who will and those who will not become a high resource user. Discrimination was measured using the c-statistic, which is identical to the area under the receiver operating characteristic curve [13]. Calibration was assessed by grouping the observations into deciles of risk and comparing the agreement between predicted risk and observed high resource user outcomes. Calibration was visually assessed by observing the calibration plot across deciles of high resource user risk [13]. The overall performance of the model was assessed using the likelihood ratio (R²), which measures the variation explained by a model, and the Brier score, which measures the accuracy of predictions by calculating the squared difference between outcome and predictions [13].
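The performance measures described above can be sketched in a few lines. This is an unweighted illustration only: the study's estimates were survey-weighted and produced in SAS, and the function names here are hypothetical.

```python
def c_statistic(y, p):
    """Share of (case, non-case) pairs in which the case has the higher
    predicted risk; ties count as half. Equals the area under the ROC curve."""
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    controls = [pi for yi, pi in zip(y, p) if yi == 0]
    score = 0.0
    for a in cases:
        for b in controls:
            if a > b:
                score += 1.0
            elif a == b:
                score += 0.5
    return score / (len(cases) * len(controls))

def brier_score(y, p):
    """Mean squared difference between outcome (0/1) and predicted risk."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

def calibration_table(y, p, groups=10):
    """Sort by predicted risk, split into equal-sized groups, and return
    (group number, mean predicted risk, observed proportion) per group."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    table = []
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue
        predicted = sum(p[i] for i in idx) / len(idx)
        observed = sum(y[i] for i in idx) / len(idx)
        table.append((g + 1, predicted, observed))
    return table
```

The pairwise c-statistic is quadratic in cohort size, which is acceptable for a sketch; rank-based formulas give the same value more efficiently.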

General Statistical Analyses
All estimates were weighted using survey sampling weights provided by Statistics Canada to account for the complex survey design of the CCHS and to be representative of the provincial population. Confidence intervals were estimated using bootstrap weights applied using the balanced repeated replication approach for standard error estimation. All statistical analyses were conducted using SAS, version 9.4 (SAS Institute, Cary, NC, USA).
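The replicate-weight approach can be illustrated for a simple weighted proportion. This sketch assumes the standard balanced repeated replication variance (the mean squared deviation of the replicate estimates from the full-sample estimate) and a normal-approximation 95% interval; both choices, and the function names, are assumptions rather than details reported by the study.

```python
import math

def weighted_proportion(y, w):
    """Weighted share of respondents with outcome y == 1."""
    return sum(wi * yi for yi, wi in zip(y, w)) / sum(w)

def replicate_ci(y, full_weights, replicate_weights, z=1.96):
    """Return (estimate, lower, upper) using replicate weights.

    replicate_weights: list of weight vectors, one per bootstrap replicate.
    """
    theta = weighted_proportion(y, full_weights)
    reps = [weighted_proportion(y, w) for w in replicate_weights]
    # Variance: average squared deviation of replicate estimates from theta.
    variance = sum((r - theta) ** 2 for r in reps) / len(reps)
    se = math.sqrt(variance)
    return theta, theta - z * se, theta + z * se
```

The same pattern extends to other statistics, such as the c-statistic, by recomputing the statistic once per replicate weight vector.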

Results
Baseline characteristics of the external validation cohort are summarized in Table 1. At the end of the 5 year follow-up period, 10.9% (n = 1145) of respondents became a high resource user. In the original development cohort, 6.0% of respondents became a high resource user [7]. The difference in magnitude may be due to differences in the provincial distribution of immigrant status and non-white ethnicity, which are protective against high resource user risk. The cohort had a similar age, sex, and income distribution compared to individuals in the original development cohort [7]. In addition, the presence of a chronic condition and the distribution of health and behavioural characteristics (BMI, smoking, physical activity, alcohol consumption) were similar in the current and original development cohorts. Major cohort differences were that the Manitoba cohort had fewer immigrants (11.0% vs. 23.7%) and individuals of non-white ethnicity (9.9% vs. 21.7%), compared to the original Ontario development cohort.
In the external validation cohort, the HRUPoRT model demonstrated strong discriminative ability (c-statistic = 0.8264). This was similar to the discrimination observed in the original development (c-statistic = 0.8213) and validation (c-statistic = 0.8171) cohorts [7]. The R² statistic was 0.1408 and the Brier score was 0.0835, indicating appropriate overall model performance. Figure 1 shows the calibration plot for observed and predicted high resource user cases across decile risk groups. The model was well calibrated across the spectrum of risk, with the exception of an underprediction of 28% in decile two and 40% in decile four, and an overprediction of 33% in decile one. It is possible that other risk factors not captured by the model are more predictive of high resource user risk for people in these lower decile risk groups. Otherwise, differences between observed and predicted cases were in the range of 7% to 14%.

Discussion
This study externally validated a previously developed population-based model for predicting the five-year risk of high healthcare resource utilization [7]. We demonstrated that the HRUPoRT had good discrimination and was well calibrated throughout the range of risk in an external population that was distinguished by geography from the original derivation and validation cohorts. The HRUPoRT can be applied to identify priority populations and inform population-level prevention of high resource users and reduction in cost to the healthcare system in other jurisdictions. HRUPoRT's ability to predict high users could also facilitate a process by which health ministries could identify clusters of residents at high risk in the community and target interventions to prevent high healthcare use.

HRUPoRT is uniquely designed for use on routinely collected population surveys, which contain social determinants of health information, enabling a focus on health equity in decision-making. Socioeconomic factors, including income, food security, and life satisfaction, are known to be associated with high resource use [14][15][16]. These data are not readily collected in electronic medical records or health administrative data, which are common data sources for other existing healthcare utilization models [17,18]. Population health surveys are widely available across countries and regions, and importantly, tools that can make use of this data can help identify clusters of health disparities in communities.
A limitation of this study was the difference in the measurement of HRU status, as compared with the development study. Specifically, due to data availability, fewer healthcare services were included in calculating healthcare costs for individuals in the validation cohort, including the absence of costs for the emergency department and home care. Although this study used fewer types of health services to calculate healthcare costs, the characteristics of HRUs identified in our cohort were similar to those reported in the original development study. Furthermore, due to the sampling frame of the CCHS, the model's performance among populations living in institutional facilities and Indigenous people living in First Nation communities is unknown. Finally, the HRUPoRT does not include clinical variables related to illness level, which is a key determinant of health service utilization [19]. This was due to our aim to build a model that could run solely on routinely collected population health survey data to enable wide-scale use. It is possible that the integration of clinical variables, and other risk factors that may be more predictive of HRU risk for people in the lower risk deciles, could improve the predictive performance of the model. Future research focused on improving predictive performance could explore updating the model using linkages of population health surveys and administrative data, noting that including more variables that are not easily accessible to planners could reduce usability. Nonetheless, it is noteworthy that the HRUPoRT accurately predicts risk in high-risk decile groups, which is a key purpose of the model.
Containing healthcare spending has been identified by governments in multiple health systems as a top priority. The HRUPoRT considers the upstream determinants of high-cost users, which is useful for informing the design of different HRU prevention strategies at the community level. The HRUPoRT is intended to be used by health system planners and decision-makers as an aid in population-based planning. The tool's relevance has been demonstrated by its integration into routine practice at a major public health unit in Ontario to understand the determinants of high resource use, which is a core focus of public health [20]. The HRUPoRT can show how risk is distributed in communities and help identify which population groups would benefit from targeted interventions. Population surveys that are suitable for HRUPoRT application in other jurisdictions include the National Health Interview Survey (NHIS) in the United States, the National Health Survey in England, and the New Zealand Health Survey, among others. Demonstrating the performance of prediction models in various settings is an important step to ensure validity when they are implemented in other regions. Future research can focus on the application of the HRUPoRT for modelling HRU prevention strategies in different jurisdictions.

Conclusions
This study has demonstrated the geographical validation and resulting predictive performance of the HRUPoRT, offering evidence to support the transportability of the model in other jurisdictions. The HRUPoRT's use of population health survey data enables a focus on health equity to assist with decision-making on high healthcare resource use prevention.

Conflicts of Interest:
The authors declare no conflict of interest.

Data Sharing:
The dataset for this study is held securely in coded form at the Manitoba Centre for Health Policy (MCHP). While data sharing agreements prohibit MCHP from making the dataset publicly available, access may be granted to those who meet pre-specified criteria for confidential access. The full dataset creation plan and underlying analytic code are available from the authors upon request, understanding that the computer programs may rely upon coding templates or macros that are unique to MCHP and therefore either inaccessible or may require modification.

Appendix A. Description of Health Administrative Data Sources
Medical Services Data-Manitoba database consisting of claims for physician visits in offices, hospitals and outpatient departments; fee-for-service components for tests such as lab and x-ray procedures performed in offices and hospitals; payments for on-call agreements (e.g., anaesthetists) that are not attributed to individual patients; as well as information about physician specialties, and shadow billings.
Discharge Abstract Database (DAD)-compiled by the Canadian Institute for Health Information and contains administrative, clinical (diagnoses and procedures/interventions), and demographic information for all admissions to acute care hospitals, rehab, chronic, and day surgery institutions across Canada.
Case Mix Grouper Data-supports the case mix group (CMG) component of the hospital abstract data, containing variables related to CMG codes, diagnosis codes, discharge date and times, and intervention codes.
National Rehabilitation Reporting System (NRS)-compiled by the Canadian Institute for Health Information and contains client data collected from participating adult inpatient rehabilitation facilities and programs across Canada.
Long Term Care Utilization-the LTC program, originally known as the Personal Care Home program, consists of records of chronic and rehabilitative services provided by long term care institutions in Manitoba. Long term care may be provided in hospitals, in chronic care beds, or in personal care homes. These records include information on admissions, separations, assessments, levels of care, and rate changes.
Drug Program Information Network (DPIN)-contains prescription drug claims from DPIN, an electronic, on-line, point-of-sale prescription drug database that connects Manitoba Health and all pharmacies in Manitoba. The DPIN system generates complete drug profiles for each client including all transactions at the point of distribution. Manitoba offers an income-based drug plan that covers drug costs after a deductible is reached based on the total adjusted family income, as well as for social assistance recipients.