The use of individual and multilevel data in the development of a risk prediction model to predict patients’ likelihood of completing colorectal cancer screening

Promotion of colorectal cancer (CRC) screening can be expensive and unnecessary for many patients. The use of predictive analytics promises to help health systems target the right services to the right patients at the right time while improving population health. Multilevel data at the interpersonal, organizational, community, and policy levels, is rarely considered in clinical decision making but may be used to improve CRC screening risk prediction. We compared the effectiveness of a CRC screening risk prediction model that uses multilevel data with a more conventional model that uses only individual patient data. We used a retrospective cohort to ascertain the one-year occurrence of CRC screening. The cohort was determined from a Health Maintenance Organization, in Oregon. Eligible patients were 50–75 years old, health plan members for at least one year before their birthday in 2018 and were due for screening. We created a risk model using logistic regression first with data available in the electronic health record (EHR), and then added multilevel data. In a cohort of 59,249 patients, 36.1% completed CRC screening. The individual level model included 14 demographic, clinical and encounter based characteristics, had a bootstrap-corrected C-statistic of 0.722 and sufficient calibration. The multilevel model added 9 variables from clinical setting and community characteristics, and the bootstrap-corrected C-statistic remained the same with continued sufficient calibration. The predictive power of the CRC screening model did not improve after adding multilevel data. Our findings suggest that multilevel data added no improvement to the prediction of the likelihood of CRC screening.


Background
Use of predictive analytics in healthcare has been steadily rising as health systems leaders are recognizing their value in targeting and optimizing care for patients at high risk of negative health outcomes (Miller, 2019).Predictive analytics is a way to maximize the utility of healthcare expenses through precision delivery of care in our resourceconstrained healthcare climate (Bresnick, 2018).Understanding which patients most benefit from outreach can help care delivery systems prioritize spending on patients who need it most.
Risk prediction models can be used to identify populations at risk for disease or adverse events (Jeffery et al., 2019;Benuzillo et al., 2019).However, models have not always been built using a broad patient population or multi-level data (Jeffery et al., 2019;Moons et al., 2019;Parikh et al., 2019;Kent et al., 2018).Multilevel data can reveal influences that individual level data cannot.While individual characteristics may predict health behaviors such as cancer screening, emerging evidence is increasingly pointing to the outsized role of external factors such as interpersonal (friends, family, providers), organizational data (clinic setting and provider characteristics), community characteristics and other relevant publicly available data (census data) in influencing behaviors.(Zapka et al., 2012) The use of predictive analytics in combination with multi-level data could be one way to recognize group membership and individual characteristics simultaneously.Advancements in the application of predictive analytics in addressing health needs may include increasing access to more diverse sources of data.
To our knowledge, predictive analytics have not been applied to predict a patient's likelihood of screening for colorectal cancer (CRC).CRC screening can save lives because precancerous polyps or early stage cancers are detected and removed (Safayeva and Bayramov, 2019).Yet, only 69% of age eligible adults in the general population were screened for CRC in 2020.(Richardson et al., 2022) This means that a staggering 21.7 million people are not up to date on their CRC screening (CDC, 2018).Patient, provider, and system level characteristics have been found to contribute to an individual's failure to screen for CRC (Muthukrishnan et al., 2019;Yu et al., 2018;Weiss et al., 2013), suggesting that a multi-level risk prediction model might identify patients who are unlikely to screen for CRC.
This study sought to develop and validate a risk prediction model to predict patients' likelihood of screening for CRC.First, we used individual-level data available from the electronic health record (EHR) to predict CRC screening and compared the performance of this model to a model where we added multilevel data obtained from US Census, American Community Survey, Department of Health, and locally available data.

Setting and patient population
The study was conducted at Kaiser Permanente Northwest (KPNW), an integrated health and financing system that provides health insurance coverage to about 606,000 members and dental insurance coverage to approximately 280,000 members in northwest Oregon and southwest Washington, in the United States.The study met the KPNW guidelines for protection of human subjects concerning safety and privacy (KPNW IRB #00000405, 1/6/21).The outcome of interest was CRC screening completion in 2019, based on HEDIS criteria, including fecal testing, FIT DNA, or colonoscopy in the year.(Centers for Disease Control and Prevention) All study procedures were reviewed and approved by the KPNW institutional review board; informed consent was waived as a data only study.
Eligible patients were identified through the EHR at KPNW (Fig. 1).Patients were determined eligible if on their birthday in 2018 if they were due for screening and aged 50-75 (n = 192,447).Eligible patients had to have at least 1 year of membership prior to their birthday (n = 169,871).Predictors were included from clinical encounters closest to but prior to their birthday in 2018.Outcomes were assessed for up to one year following their birthday.Patients were excluded if they were not recommended for screening due to comorbid conditions like a history of CRC, had a prior colectomy or if they were in end-of-life care (n = 163,809 remaining eligible patients) (Centers for Disease Control and Prevention, 2020).Patients were also excluded if they were current for screening by way of FIT in the past year, colonoscopy in the past 10 years, FIT DNA in the past 3 years, or flex sigmoidoscopy or virtual colonoscopy in 5 years (n = 104,617 (64%) excluded) (Centers for Disease Control and Prevention, 2023).Of the remaining 59,249 eligible patients, 36% were found to have been screened, which is the primary outcome for analysis.

Model development
The design of this study was guided by the Social Ecological Model (SEM) multilevel approach to colorectal cancer promotion, which suggests multiple bands of influence (individual, interpersonal, organizational, community, policy) (Zapka and Lemon, 2004).The selection of potential predictors was guided by literature on predictors of CRC screening, potential measures were identified by level of influence in the SEM framework.Available data included 21 individual level, 5 interpersonal level, 5 organizational level, 29 community level and 1 policy level variables.Individual level predictors included individual level data     (Collins et al., 2015).
Characteristics of the population of patients due for screening were assessed for variability in demographics, such as race, ethnicity, or insurance type.Some predictors were excluded prior to use due to unavailability (i.e., immigration status, income), missingness (i.e., education level, provider race/ethnicity match), or lack of distribution (i.e., having a primary care provider, or health literacy score).The Prediction Risk of Bias Assessment Tool (PROBAST) was used to reduce the risk of bias (ROB) by adhering to the 20 PROBAST principles (Wolff et al., 2019).Participants, predictors, outcomes, and analysis bias was assessed through questions to determine population characteristics disproportionate to the overall population, predictors with missingness in populations, and outcomes ratios based on the original population.Then observed screening rates for different groups identified in PRO-BAST were evaluated for differences (in % screened) from the overall eligible population.The sources of data were assessed for the ROB and the applicability to the question.The final predictors were assessed for missingness for all eligible patients in the model, and screening completion determined prior to analysis.

Model performance Assessment
Characteristics predicting the likelihood of screening were evaluated using a multivariate logistic regression model in SAS® System Software.The model assumes an absence of screening for patients who died or discontinued insurance coverage in the year of follow-up.For the first model, only available individual level data from the EHR and administrative datasets was used.The multilevel model incorporated the external multilevel data, and then tested the applicability of the final model to a subpopulation of patients who have been historically underserved by the medical system including African American or Black, Asian, Native American, Native Hawaiian, Pacific Islander (non-White).
For each model, a full model of patients with complete data for that model was fit.Then, guided by Harrell's methods, a step-down method was used to manually remove the least predictive characteristics one covariate at a time to make the model parsimonious such that the final model retained at least 95% of the variation explained in the full model.(Heinze et al., 2018;Harrell et al., 1998).
For the final model, the observed and predicted risk of screening was calculated and plotted in quintiles using risk predictiveness curves that show the distribution of observed and predicted risks of completing screening.(Pepe et al., 2008;Lumley et al., 2002) Discrimination, or the ability to determine who will screen versus not, was measured by a bootstrap corrected C-statistic.(Moons et al., 2019) The explained variation was measured with an R 2 statistic, an R 2 of 0% means the model failed to explain variation, an R 2 of 100% means the model explained all of the variation.(Royston, 2006).The calibration was measured by the Integrated Calibration Index (ICI), which assesses the difference between the model's calibration and perfect calibration.(Austin and Steyerberg, 2019) Validation of the model was internal using bootstrapping.All analysis was initially conducted in STATA 17©, and then replicated in SAS (bootstrapping, C and R 2 ) and R (ICI) by a KPNW analyst.
The first model was developed in a full dataset of patients who are due for CRC screening using individual level data from the EHR.The multilevel model used the same individual level dataset and incorporated multilevel data from databases at KPNW, and publicly available data.The statistical improvements in the model were then assessed when multilevel data was added.The model was validated internally using the preferred bootstrapping approach, which is a way to determine concordance and predict the fit of a model to a series of hypothetical datasets when other validation techniques are not available.(Steyerberg and Harrell, 2016) Bootstrapping is superior to separate training and testing of the data.(Austin and Steyerberg, 2017).
Data was first accessed for availability at each level of the SEM framework.Predictors were omitted from analysis due to missingness, and imputation was performed when possible.BMI had 9.0% of the values missing, imputation was performed by age group.Imputation was conducted in STATA, which uses the predictive mean matching imputation method.Language is not an imputable variable, so patients missing language (3.6%) were grouped with the most common category,   English.Biological sex had 15 missing values (0.02%), these patients with missing data were removed from the analysis, as the models fit a population with no missing data.

Individual level model (EHR data)
The full model included 98.97% of the population (n = 59,234) as 15 patients were missing sex and removed from the analysis.The screening outcome was completed for 36.1% of the population (n = 21,337), 90.7% were screened by FIT, and 14% were screened by colonoscopy (not mutually exclusive, patients may have been screened by more than one modality, data not shown).Less than 1% of patients were screened by Flexible Sigmoidoscopy, FIT DNA, or virtual colonoscopy.In the population, 6,041 (10.0%) discontinued KPNW insurance, and 369 (0.6%) died during the year of follow-up.
The variables in the full model included 21 individual characteristics (Table 1).Two variables, prostate screening, and mammogram screening were combined with sex.This leaves 19 remaining variables in the initial full model.The performance of the full model was adequate with a naive C-statistic of 0.724 and an R 2 of 0.112.The step-down process was then used to simplify the model, based on Akaike's information criteria (AIC) and the change in R 2 (Steyerberg, 2019).First, the AIC was determined for each variable, variables were ranked from lowest to highest, and the least contributory variables were removed one by one in AIC order until the model R 2 dropped to no lower than 0.1105 (99%).Insurance group, language group, interpreter, BMI, inpatient visits, were all removed from the model while retaining 99% of the predictive value.
The 14 retained characteristics include prior CRC screening and preference, age group, prior preventive screening and sex, enrollment in KP.org (patient portal), dental membership and visits, number of outpatient visits, number of missed appointments, prior influenza vaccination, Charlson comorbidity score, race, Hispanic ethnicity, membership for 5 years or more, membership for less than 2 years, and substance abuse (Table 2.).
The performance measures used for evaluation were bootstrapped C and R 2 and calibration.The model was also validated internally using bootstrapping (500 bootstraps) as described, which showed adequate performance with a bootstrap corrected C-statistic of 0.722 (Table 3).The calibration was also determined by plotting the observed and predicted risks of the reduced model by quintiles of predicted risk (Fig. 2).Calibration was also determined by calculating the ICI, which shows  excellent calibration (0.013).If the observed and predicted values agreed perfectly, the ICI would be 0.0.Consequently, the model is only miscalibrated by an average probability of 0.01.The calibration of the observed and predicted risk appears to be sufficient, with close calibration between observed and predicted risk at all levels.

Multilevel model
The multilevel model incorporates data outside of the individual level based on the SEM Framework.The full population model was fit for the multilevel data in the same way it was fit for the individual level model described above.The starting population was identical to the individual model.Patients were also removed from the full model due to missing data (n = 1,194).Missing data included patients who were missing sex (n = 15), interpersonal level data (i.e., address changes n = 186), provider level data (n = 415), clinic level data (n = 159), and miscellaneous information at the county level (i.e., facilities was missing for n = 834 patients).The remaining sample size in the full model is 98.0% of the full population (n = 58,040).The screening outcome was completed for 36.3% of the population in the model (n = 21,068).
The 57 variables included in the full model include the 19 individual characteristics described above, 5 interpersonal characteristics, 3 organizational characteristics, 29 community characteristics, and 1 policy level characteristic.Collinearity was assessed among the variables in the multilevel model.At the organizational level, the patient to provider ratio variable was redundant with clinic size and clinic location (city) was redundant with primary care clinic; the program removed patient to provider ratio and clinic location from the model.
The performance of the full model was adequate with a naive Cstatistic of 0.725 and an R 2 of 0.113.The model used 110 degrees of freedom.The step-down process was then used to simplify the model, based on AIC and the change in R 2 28 , using the same process as in the initial model.
The multilevel model retained 23 variables after stepdown (Table 4).All variables from the reduced, individual level model were included, as well as inpatient visits and BMI.Variables retained from the interpersonal level were address changes and provider gender match.From the organizational level, wait time, clinic size, and primary clinic were retained.From the community level, facilities (number of healthcare facilities > 15/100 K residents) and median family income were retained.From the policy level, the only variable (STATE) was not retained.
The final reduced model R 2 was 0.1119 and had 70 degrees of freedom, the reduced C-statistic was 0.7242, and the ICI is 0.0130 (Table 3).Bootstrapping and calibration were used to evaluate the performance of the model.The model was validated internally using bootstrapping (500 bootstraps), which showed adequate performance with a bootstrap corrected C-statistic of 0.7218.The calibration was also determined by plotting the observed and predicted risk of the reduced model (Fig. 3).

Model comparison
The overall screening population (eligible for this analysis) was approximately 9.8% (59,249/617,073) of the overall age eligible population of KPNW (Appendix Table A1).Compared to the overall population of KPNW, a greater proportion of the population identified as due for screening, female, Native American or American Indian, White, and enrolled in Medicare.There are fewer patients in the screening population who are Asian, Black, Hawaiian or Pacific Islander, patients that list "Unknown" race, Hispanic ethnicity, and have Medicaid insurance.The population that completed screening is 52.7% female; 6.1% Asian; 2.8% Black; less than1% Hawaiian or Pacific Islander, Native America or American Indian, or identify as "other" race;7.3%have "unknown" race; 81.8% White; 5.9% Hispanic; and 27.9% are on Medicare; and 3.4% are on Medicaid (Table 1.).
The individual model intercept is 0.298 (95% CI 0.271, 0.326), and the multilevel model intercept is 0.235 (95% CI 0.125, 0.443).The clinical utility of the models could be used in mapping predicted probabilities of screening.To apply clinical utility, we have presented points for the variables in each model (Appendix A2.).
A comparison analysis was conducted to ensure that the model was not only applicable to the prevalent Northwest demographic population (insured and White patients).Subpopulations considered for a comparison analysis were the Medicaid and non-White groups.The Medicaid patient subgroup (n = 2613) was determined to be too small to handle the multilevel analysis.While we would have ideally been able to apply the model to individual subgroups of patients historically underserved by the medical system, no single group had enough cases for individual analysis.Therefore, the non-White group (n = 13,655) was the subpopulation for comparative analysis.To assess the risk of algorithmic bias in the models, they were fit in a non-White subpopulation.
In the individual model for the subgroup analysis, the model was fit for 99.94% of the subpopulation (n = 12,676), 8 patients were missing sex and eliminated from the model.The screening outcome was completed for 31.0% of the 50-75 aged population (n = 3,933).The individual model in the subpopulation was slightly improved from the full population with an R 2 of 0.1364, C-statistic of 0.7490, and ICI of 0.0183 (Appendix Table A2).The multilevel model was fit for 96.05% of the population (n = 12,184) in the subpopulation.Patients were missing data (n = 500) as described above in the full multilevel model, including missing provider, clinic, and community level data.The screening outcome was completed for 31.5% of the population (n = 3,838).The subpopulation multilevel model was slightly improved from the full population with an R 2 of 0.1369, and the C-statistic of 0.7484, and the ICI is 0.0206.

Discussion
Predictive analytics and multilevel data are increasingly used in population health management and offer clinical decision support at the point of care.The use of predictive analytics can allow systems to personalize care based on an individual's risk of certain events.(Parikh et al., 2016) Our findings show that an individual reduced model using 14 variables commonly available in EHR databases was able to adequately predict CRC screening to be used in clinical care to optimize the delivery of CRC promotion efforts.The multilevel reduced model, which added components at the individual, interpersonal, organizational, and community levels (and used 23 variables), only modestly improved model performance.The bootstrap-corrected C-statistics are identical; the R 2 statistic is only 1% higher, and the ICI is 3% better on a relative scale for the multi-level model.
No other models that test the likelihood of obtaining screening for CRC were identified in the peer-reviewed literature.However, past literature has found that risk prediction is useful in tailoring screening programs to inform patients of the risk of developing cancers (Saya et al., 2020).Some models have used a few available multilevel variables, but no risk prediction models have focused on the performance improvement after the addition of multilevel data.The lack of information about adding multilevel data could be attributed to limitations in the availability of these data, and the lack of contributions to the model, as we found.
There is potential clinical use for either the individual or multilevel model.There is a pragmatic use of the reduced individual model that is the most simplistic way to identify patients' likelihood of screening.In this sense, the individual model could be used to target patients for interventions aimed at increasing indicated screening.In addition, the parsimony of the individual model reduces the effort and potential resources needed to acquire and maintain multilevel data as well as the generalizability outside KPNW or other closed systems of care (e.g., VA).Our model holds promise to serve as a tool to efficiently identify patients' likelihood of screening, so that screening promotion efforts could be directed to those most likely to benefit.Both the individual model and multilevel models are sufficient in discrimination (C > 0.6) (Steyerberg et al., 2010).
The literature has addressed the benefits of looking at multiple levels of data and the groups to which people belong.The literature shows that the different levels are linked or interconnected and that levels can be synergistic (Rousseau, 1985;Weiner et al., 2012;Taplin et al., 2012) Nevertheless, the multilevel data did not have the expected impact on our model.This may be because the multilevel variables' influence on screening may be captured by already included individual level data that acts as a proxy for multilevel data.Or the KPNW patient population may be too homogenous to capture community and policy level differences.Multilevel data that may be beneficial to models is not always available in EHR and administrative databases.For example, income, education, marital status and physician recommendation are known predictors of screening yet were unavailable in the data (Beydoun and Beydoun, 2008;Guessous et al., 2010;Zimmerman et al., 2006;Laiyemo et al., 2014;Seeff et al., 2004).Further, multilevel data may be less reliable than individual level data.Applicability was assessed for a non-White population.Both models showed improved performance after application to the non-White patients, reassuring concerns of applicability when applied diverse populations.Further, the non-White population, was chosen partly because they have greater variation in screening rates.The screening outcome was present in 36.1% of the full population and only 31.2% of the non-White subpopulation.Rates of CRC (adenoma prevalence) have been found to be higher in Black patients but determined to be predominately due to lower rates of screening.(Rutter et al., 2021) The non-White population (n = 13,655) provided an adequate subgroup for the application of the large model.Both the individual and multilevel models performed better statistically in the non-White population than in the full population, as the R 2 and C-statistics in both models were both higher.The health system should use the non-White version specific to that population, as the model calibration is better in the sub-population model.
There are several limitations in this study.Ideally, a model would be created using data from multiple health systems with diverse geographical and patient populations, for a greater understanding of applicability to subpopulations and generalizability.While internal validity was determined through bootstrapping, transferability will have to be tested through external validation.The model would need to be redeveloped in the population where it is to be used, as there may be different dependencies in individual and multi-level data due to factors like culture.This model was developed in an integrated delivery and financial system (KPNW) in a single geographic region (Pacific Northwest).The geographical (community level) and organizational variation limit the impact of multilevel data, where it could be more valuable in a wider and more diverse population.
There are a variety of ways this project could expand to future research.A series of projects could externally validate the model.Externally validating the model in community clinics, other integrated systems, other geographical areas, or in a combination of environments could determine a more broad-based utility and generalizability of the model.The multilevel data could improve the model in other environments.Other multilevel data that was not included due to lack of distribution, unavailability, or missingness could improve a model in another setting.This model could also be used to prioritize interventions to increase CRC screening.

Table 3
Performance Characteristics of Individual and Multilevel Models.
2 statistic, represents the model's predictive ability, the agreement between an individual's predicted and observed risk of the event; ICI statistic represents the model's weighted difference between observed and predicted probabilities A.F.Petrik et al.available in the EHR(demographic, encounter, enrollment, diagnosis,  and procedure data).Interpersonal level predictors were also from the EHR (marital status, physician recommendation, provider characteristics).Organizational level predictors included administrative data (clinic characteristics).Community level predictors included Census and County level data (i.e., education by census block, median household income, neighborhood deprivation index, percent Hispanic ethnicity, rurality), and policy level predictors included State characteristics (CRC screening rates, Medicaid expansion, incentivized metrics).For EHR data, all predictors were identified in clinical records closest to but prior to the patient's birthday in 2018.Predictors were defined and assessed in the same way for all participants.Predictor assessments were made without knowledge of the outcome data.Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) reporting guidelines were used in development of this manuscript.

Table A1
Screening Population Comparison.

Table A2
Performance statistics for all prediction models.

Table A3 Model
Points per Predictive Characteristic.