Examining the Long-term Spillover Effects of a Pay-for-Performance Program in a Healthcare System That Lacks Referral Arrangements

Background: Several studies have examined the intended effects of pay-for-performance (P4P) programs, yet little is known about the unintended spillover effects of such programs on intermediate clinical outcomes. This study examines the long-term spillover effects of a P4P program for diabetes care. Methods: This study uses a nationwide population-based natural experimental design with a 3-year follow-up period under Taiwan’s universal coverage healthcare system. The intervention group consisted of 7688 patients who enrolled in the P4P program for diabetes care in 2017 and continuously participated in the program for three years. The comparison group was selected by propensity score matching (PSM) from patients seen by the same group of physicians. Each patient had four records: one pertaining to the year before the index date of the P4P program and three pertaining to follow-ups over the next three years. Generalized estimating equations (GEEs) with difference-in-differences (DID) estimations were used to account for the correlation between repeated observations for the same patients and between patients within the same matched pairs. Results: Patients enrolled in the P4P program showed improvements in incentivized intermediate clinical outcomes that persisted over three years, including proper control of glycated hemoglobin (HbA1c) and low-density lipoprotein cholesterol (LDL-C). We found a slight positive spillover effect of the P4P program on the control of non-incentivized triglyceride (TG). However, we found no such effect on the control of non-incentivized high-density lipoprotein cholesterol (HDL-C). Conclusion: The P4P program has achieved its primary goal of improving the incentivized intermediate clinical outcomes. The commonality in production among a set of activities is crucial for generating the spillover effects of an incentive program.


Background
A P4P program is expected to enhance the quality of care by offering incentives to providers. Financial incentives reward certain services based on pre-determined benchmarks. These programs generally provide incentives to healthcare providers to improve the process of care (eg, checking the glycated hemoglobin [HbA1c] levels of patients with diabetes) and intermediate outcomes (eg, controlling the HbA1c levels of patients with diabetes). The measures are based on the recommendations of clinical practice guidelines. Good adherence to clinical guidelines can improve health outcomes for patients. Therefore, considering the incentives offered under P4P programs, healthcare providers might devote more effort to specific conditions or indicators that are rewarded and pursue better care outcomes for patients.[1] However, P4P programs may have unintended consequences,[1,2,4] such as spillover effects that lead to the neglect of conditions or activities that are not directly rewarded.[4] Holmstrom and Milgrom proposed the "multitasking problem" to illustrate the possibility of spillover effects.[6] In a P4P scheme, healthcare providers might pay more attention to indicators tied to financial incentives and neglect those that are not incentivized. Therefore, P4P programs may improve the performance of incentivized items at the expense of non-incentivized items, resulting in a decline in the quality of care for patients. This phenomenon is called a "negative spillover effect." The problem of multitasking might be alleviated if the activities undertaken in a process are similar or co-occur, because certain conditions or indicators rely on similar inputs.[7,8] This means that, under a P4P program, non-incentivized activities may be indirectly rewarded when they share commonalities with incentivized activities. This phenomenon is called a "positive spillover effect" (or halo effect). Therefore, commonalities in production among a set of activities are crucial in determining whether an incentive program generates positive or negative spillover effects.
Although the intended effects of P4P programs have been well documented,[1][2][3][4][5] little is known about their spillover effects, and the results tend to be inconclusive. Some studies have found that P4P programs have a positive spillover effect on non-incentivized conditions or indicators,[9][10][11][12] whereas others have found negative spillover effects, because of the possible neglect of non-incentivized aspects of care by healthcare providers over the long term.[13,14] Some studies have found no spillover effect of P4P programs at all.[8,15] These discrepancies in the findings might be attributed to variations in the types of P4P programs implemented, the quality of the methodology used, and differences in the research setting.
Most empirical studies on the spillover effects of P4P programs have been performed in the United States[8,11,15] or the United Kingdom.[9,10,13,14] The spillover effects of P4P programs in the United States have tended to be small, which might be attributable to small incentives that are diluted across multiple payers, each of which often covers only a fraction of the targeted healthcare providers.[2,3] The P4P programs in the United Kingdom have been implemented nationally. Most of these programs are designed for primary care settings and reward general practitioners based on the quality of service delivery. However, healthcare systems in many Asian countries such as Japan, Korea, and Taiwan lack general practitioners (as gatekeepers) or formal referral mechanisms and rely mainly on specialist-based care.

Methods
This study used a natural experimental design with population-based longitudinal data to examine the long-term spillover effects of a P4P program. The analysis was based on claims data for healthcare utilization from 2016 to 2020 in Taiwan.
Pay-for-Performance Program for Diabetes Care
Since the end of 2001, Taiwan's single payer, the National Health Insurance Administration (NHIA), has implemented nationwide P4P programs for several chronic conditions. Physicians who specialize in metabolic disorders or endocrinology, or who have completed training for the diabetes shared care program, can voluntarily participate in the P4P program for diabetes care. Participating physicians can recruit patients for the program. The P4P program pays participating physicians three fees: the P4P management fee for the initial enrollment visit, an extra fee for a comprehensive follow-up visit, and an annual evaluation fee. The required and recommended services, which are clearly defined in the P4P program for diabetes care, include a medical history examination, physical examination (eg, ophthalmoscopic or foot examination), laboratory evaluation (eg, HbA1c or cholesterol checks), a management plan, and diabetes self-management education.[16]
Outcome-based quality indicators have been gradually incorporated into the P4P program for diabetes care. Between 2001 and 2006, the financial incentives of the diabetes P4P program focused only on participation or process-based care (eg, performing HbA1c or cholesterol checks). In late 2006, the NHIA implemented a new strategy to better reward health outcomes by paying an additional bonus for improvement in intermediate clinical outcome measures. A composite score was developed for each participating physician based on measures including the complete follow-up visit rate and control of HbA1c and low-density lipoprotein cholesterol (LDL-C) levels. Under this scheme, physicians receive an extra outcome-based reward (NT$ 1000 or US$ 33 per patient) if their composite scores rank among the top 25% of participating physicians. This plan is called the "pay-for-excellence" incentive. In 2009, the NHIA introduced a "pay-for-improvement" incentive to encourage physicians to focus on improving the composite scores of patients. Physicians receive an extra reward (NT$ 500 or US$ 17 per patient) if the composite score of a patient improves or is maintained over two years.[16]
Selection bias cannot be ruled out because the patients enrolled in the program were purposively selected by their physicians and voluntarily enrolled in the P4P program.[17,18] To minimize the potential impact of selection bias, we adopted a two-step sample selection: identifying patients visiting the same group of physicians and then using the propensity score matching (PSM) approach to select patients. In the first step, we identified the physicians most frequently visited for diabetes care by the 9119 patients enrolled in the P4P program. All patients with diabetes who visited this shortlisted set of physicians were identified. Patients who had never been enrolled in the P4P program were considered suitable for the comparison group. A total of 2649 physicians were consulted by the 9119 patients (pre-matched) in the intervention group, and 21 651 patients (pre-matched) were identified as candidates for the comparison group.
Subsequently, in the second step, we used a PSM approach to select appropriate patients to form the comparison group.[19] For each patient, we created a propensity score that estimated the probability of enrollment in the P4P program based on their characteristics, using a generalized estimating equations (GEEs) model that considered the effects of patient clustering among particular physicians.[20] The characteristics of patients considered in the matching process included their age, sex, Charlson comorbidity index (CCI) score,[21] diabetes complication severity index (DCSI) score,[22] likelihood of hospitalization in the previous year, and baseline HbA1c values. The characteristics of healthcare providers included in the model were the accreditation level[23] and location of the hospital or community clinic that was most frequently visited by a given patient.
Based on the propensity scores, we used the caliper matching method with 1:2 matching between the intervention and comparison groups. Finally, 7688 patients (post-matched) were retained from the P4P program, and the PSM process yielded 15 376 patients (post-matched) in the comparison group. In addition, we calculated absolute standardized differences in baseline characteristics between the intervention and comparison groups.[26] The observation period ranged from one year before the index date in 2017 to three years of subsequent follow-up. Because the subjects in the comparison group, unlike those enrolled in the P4P program, did not have an index date, they were assigned the index date of their matched counterparts in the intervention group as a pseudo-index date. In all, 23 064 patients and 92 256 patient-years were included in the analysis. The unit of analysis was the patient-year.
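The 1:2 caliper matching step can be illustrated with a minimal sketch. The text does not report the caliper width or the matching algorithm, so the greedy nearest-neighbor strategy, the caliper of 0.05, the function name `caliper_match`, and the example scores below are all assumptions made for illustration only.

```python
from typing import Dict, List, Sequence


def caliper_match(
    treated: Sequence[float],
    controls: Sequence[float],
    ratio: int = 2,          # 1:2 matching, as described in the text
    caliper: float = 0.05,   # hypothetical width; not reported in the text
) -> Dict[int, List[int]]:
    """Greedy nearest-neighbor matching on propensity scores, without replacement.

    Returns a mapping from treated index to the indices of its matched controls.
    Treated units with fewer than `ratio` controls inside the caliper are dropped.
    """
    available = set(range(len(controls)))
    matches: Dict[int, List[int]] = {}
    for t_idx, t_score in enumerate(treated):
        # Controls still unused and within the caliper, closest first.
        candidates = sorted(
            (c for c in available if abs(controls[c] - t_score) <= caliper),
            key=lambda c: (abs(controls[c] - t_score), c),
        )
        if len(candidates) >= ratio:
            chosen = candidates[:ratio]
            matches[t_idx] = chosen
            available.difference_update(chosen)
    return matches


# Hypothetical propensity scores for three enrolled and six non-enrolled patients.
treated_scores = [0.30, 0.55, 0.90]
control_scores = [0.29, 0.32, 0.52, 0.56, 0.59, 0.10]
pairs = caliper_match(treated_scores, control_scores)
```

In this toy input the third enrolled patient has no comparison patient within the caliper and is dropped, which mirrors how matching can reduce an intervention group (here, from 9119 pre-matched to 7688 post-matched patients, with 2 × 7688 = 15 376 matched comparison patients).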

Measurement of Variables
In terms of dependent variables, four intermediate clinical outcome indicators were included in this study: two incentivized and two non-incentivized indicators under the P4P scheme. In the analysis, the two incentivized indicators measured whether the patient had proper control of their HbA1c (HbA1c <7%) and LDL-C levels (LDL-C <100 mg/dL) during the study period. High-density lipoprotein cholesterol (HDL-C) and triglyceride (TG) levels are also crucial for the management of dyslipidemia in patients with diabetes.[27] Therefore, we included two non-incentivized indicators to measure whether the patient had proper control of their HDL-C (HDL-C >40 mg/dL) and TG levels (TG <200 mg/dL).
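The four binary outcome indicators can be coded directly from the thresholds stated above. The function name, argument names, and the example laboratory values are hypothetical.

```python
from typing import Dict


def control_indicators(
    hba1c_pct: float, ldl_c: float, hdl_c: float, tg: float
) -> Dict[str, bool]:
    """Return proper-control flags; lipid values are in mg/dL, HbA1c in %."""
    return {
        "hba1c": hba1c_pct < 7.0,   # incentivized
        "ldl_c": ldl_c < 100.0,     # incentivized
        "hdl_c": hdl_c > 40.0,      # non-incentivized
        "tg": tg < 200.0,           # non-incentivized
    }


# Hypothetical patient-year record.
example = control_indicators(hba1c_pct=6.8, ldl_c=112.0, hdl_c=45.0, tg=180.0)
```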
The main independent variables were patient enrollment in the P4P program, time dummy variables for the three years after the index date, and the three interaction terms between enrollment and each time dummy. The following covariates were controlled in the regression models: patient characteristics (sex, age, CCI score, DCSI score, and likelihood of hospitalization in the previous year) and healthcare provider characteristics (accreditation level and location).

Statistical Analyses
To examine the spillover effects of the P4P program, we fitted GEEs with difference-in-differences (DID) estimation to the longitudinal data. This approach accounts for the correlation between repeated observations for the same patients and between patients within the same matched pairs, as well as for unobserved time-invariant patient characteristics.[20] The likelihood of a patient having proper control on each of the four intermediate clinical outcome indicators was modeled using a binomial distribution with a logit link function. In the model specification, Y_it is the likelihood of good control on an intermediate clinical outcome for patient i during period t. The P4P_it dummy indicates whether the patient was enrolled in the P4P program. Year1_it, Year2_it, and Year3_it are dummy variables that indicate observations in the first, second, and third years after the index date of the P4P program, respectively. The DID estimates of the effects of the P4P program are captured by the coefficients β5, β6, and β7 on the three interaction terms P4P_it × Year1_it, P4P_it × Year2_it, and P4P_it × Year3_it, respectively. X_it and Z_i represent sets of time-variant and time-invariant covariates (eg, participant's sex), respectively. All statistical analyses were performed using SAS version 9.4 (SAS Institute) and Stata version 15.1 (StataCorp).
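The DID specification described above can be written out explicitly. The following is a reconstruction from the variable definitions in the text, not the authors' printed equation: only the coefficients β5, β6, and β7 are fixed by the text, and the numbering of the remaining coefficients (β0 to β4, β8, β9) is an assumption.

```latex
\begin{equation}
\begin{aligned}
\operatorname{logit}\bigl\{\Pr(Y_{it}=1)\bigr\}
  ={}& \beta_0 + \beta_1\,\mathrm{P4P}_{it}
     + \beta_2\,\mathrm{Year1}_{it}
     + \beta_3\,\mathrm{Year2}_{it}
     + \beta_4\,\mathrm{Year3}_{it} \\
   & + \beta_5\,(\mathrm{P4P}_{it}\times\mathrm{Year1}_{it})
     + \beta_6\,(\mathrm{P4P}_{it}\times\mathrm{Year2}_{it})
     + \beta_7\,(\mathrm{P4P}_{it}\times\mathrm{Year3}_{it}) \\
   & + \beta_8\,X_{it} + \beta_9\,Z_{i}
\end{aligned}
\end{equation}
```

Under this form, exp(β5), exp(β6), and exp(β7) are the odds ratios of the interaction terms for the first, second, and third years after the index date.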

Descriptive Analysis
Table 1 lists the baseline characteristics of the pre- and post-matched participants in the intervention and comparison groups. In the pre-matched sample, patients diagnosed with diabetes in the intervention group were younger, had higher DCSI and CCI scores, and had poorer control of their baseline HbA1c levels. Furthermore, they had a higher likelihood of hospitalization in the previous year and tended to receive care at a medical center/regional hospital. The PSM process resulted in a more balanced distribution of the characteristics of patients and their care providers in the intervention and comparison groups. All absolute standardized differences were less than 10%, indicating acceptable matching results.
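The balance check against the 10% threshold can be sketched as follows. The text does not give the formula used, so the commonly used pooled-variance definitions of the absolute standardized difference are assumed here, and the input values are hypothetical rather than taken from Table 1.

```python
import math


def std_diff_continuous(mean_t: float, sd_t: float, mean_c: float, sd_c: float) -> float:
    """Absolute standardized difference for a continuous covariate."""
    pooled_sd = math.sqrt((sd_t ** 2 + sd_c ** 2) / 2)
    return abs(mean_t - mean_c) / pooled_sd


def std_diff_binary(p_t: float, p_c: float) -> float:
    """Absolute standardized difference for a binary covariate (proportions)."""
    pooled_var = (p_t * (1 - p_t) + p_c * (1 - p_c)) / 2
    return abs(p_t - p_c) / math.sqrt(pooled_var)


# Hypothetical post-matching summaries (not the study's values).
d_age = std_diff_continuous(mean_t=60.0, sd_t=12.0, mean_c=61.0, sd_c=12.0)
d_sex = std_diff_binary(p_t=0.45, p_c=0.47)
balanced = d_age < 0.10 and d_sex < 0.10
```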
The Figure presents the changes in the outcome measures over the study period. For the incentivized indicators, the rate of proper control of HbA1c levels was lower in the intervention group than in the comparison group in the baseline year (43.59% vs. 45.05%). After the first year of the P4P program, the rate of proper control of HbA1c increased in the intervention group (49.08%) and was higher than that in the comparison group (46.85%). For proper control of LDL-C, the rates in the intervention and comparison groups increased after the intervention (ranging from 56.18% to 70.51% in the intervention group; 57.30% to 66.77% in the comparison group). For non-incentivized indicators, we found that the rates of proper control of HDL-C steadily increased in both the intervention and comparison groups over four years. Finally, we found a noticeable increase in the rate of proper control of TG levels in the intervention group after enrollment, compared with a slight increase in the comparison group.
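The descriptive HbA1c rates above are enough to work through a crude DID on the odds scale. This is an unadjusted back-of-the-envelope sketch, not the covariate-adjusted GEE estimate used in the study; the function names are hypothetical.

```python
def odds(p: float) -> float:
    """Convert a proportion to odds."""
    return p / (1.0 - p)


def did_odds_ratio(treat_pre: float, treat_post: float,
                   ctrl_pre: float, ctrl_post: float) -> float:
    """Ratio of pre-to-post odds ratios: intervention group over comparison group."""
    treat_change = odds(treat_post) / odds(treat_pre)
    ctrl_change = odds(ctrl_post) / odds(ctrl_pre)
    return treat_change / ctrl_change


# Proper HbA1c control: baseline year vs first year after the index date.
crude_hba1c_year1 = did_odds_ratio(
    treat_pre=0.4359, treat_post=0.4908,
    ctrl_pre=0.4505, ctrl_post=0.4685,
)
```

A value above 1 means the odds of proper control rose faster in the intervention group than in the comparison group, which is the quantity the interaction terms of the regression model estimate after adjustment.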
Results From Regression Models
Table 2 lists the results obtained from the DID estimates in the GEEs model. For the incentivized indicator HbA1c, patients enrolled in the P4P program were more likely than their counterparts to have proper control of their HbA1c levels during the post-P4P period. The odds ratios (ORs) of the interaction terms were greater than 1 and statistically significant from the first to third years after the intervention (OR was 1.166 in the first year [95% confidence interval (CI): ...

Sensitivity Analysis
Sensitivity analyses were performed to validate the robustness of the findings. First, we used a continuous scale of the intermediate clinical outcome variables instead of a binary one. The analyses yielded results similar to those described earlier (Supplementary file 1, Table S1). Second, we performed analyses stratified by sex and found partly similar results (Table S2). Third, we performed analyses stratified by the status of multiple chronic conditions (<2 chronic conditions or ≥2 chronic conditions, excluding diabetes). In this study, the number of chronic conditions was measured using nine common chronic conditions that often require ongoing medication management, including hypertension, heart disease, cerebrovascular disease, pulmonary disease, chronic renal disease, arthritis/degenerative joint diseases, depression/anxiety-associated diseases, and cancer. Most of the results were similar to the main findings, except for the TG outcomes among patients with ≥2 chronic conditions (Table S3). Finally, we used random-intercept models with DID estimation to account for unobserved time-invariant covariates and for the correlation between repeated observations for the same patient. Instead of odds ratios, we calculated the average marginal effects for all estimates and used bootstrapping with 100 replications to obtain standard errors (Table S4). A similar approach has been used previously in the econometric literature. These sensitivity analyses yielded results similar to our main results.
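The bootstrap step can be sketched in a simplified form. The study bootstraps average marginal effects from random-intercept models; the sketch below instead resamples a plain proportion to show the mechanics with 100 replications. The function name, the seed, and the toy data are assumptions, and in a panel setting one would presumably resample whole patients rather than individual patient-years.

```python
import random
import statistics
from typing import Callable, List, Sequence


def bootstrap_se(
    values: Sequence[float],
    estimator: Callable[[List[float]], float],
    n_reps: int = 100,   # 100 replications, as in the sensitivity analysis
    seed: int = 0,       # fixed seed so the sketch is reproducible
) -> float:
    """Standard deviation of the estimator across bootstrap resamples."""
    rng = random.Random(seed)
    n = len(values)
    estimates = []
    for _ in range(n_reps):
        # Draw n observations with replacement and re-estimate.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    return statistics.stdev(estimates)


# Toy data: 55 of 100 hypothetical patients with proper control.
outcomes = [1.0] * 55 + [0.0] * 45
se_rate = bootstrap_se(outcomes, statistics.mean)
```

For a proportion of 0.55 with 100 observations, the analytic standard error is about 0.05, so the bootstrap value should land in that neighborhood.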

Effect of P4P Programs on Incentivized Outcomes
The effect of a P4P program on intermediate clinical outcomes tended to be inconclusive in a previous systematic review.[5] For example, under the Quality and Outcomes Framework (QOF) in the United Kingdom, Vamos et al[28] ...
For patients with diabetes, a high TG or low HDL-C level is associated with a higher risk of vascular complications.[30] Therefore, controlling patients' TG and HDL-C levels is essential for diabetes management, which in turn affects the overall health outcome. The findings from this study suggest that the P4P program did not affect the control of non-incentivized HDL-C levels in the intervention group even three years after enrollment. However, we found that the P4P program had a positive spillover effect on TG control (another non-incentivized parameter) in the first and second years after enrollment. These inconsistent spillover findings seem intriguing; however, they appear reasonable when we consider "commonality in production." In the P4P program, the multitasking problem can be mitigated when the procedures used by physicians to address the incentivized and non-incentivized indicators are similar.[7,8] We surmise that the improvement in TG control (a non-incentivized outcome) occurred because it was addressed by physicians in a manner similar to that of HbA1c control (an incentivized outcome). The recommendations of the American Diabetes Association emphasize that lifestyle interventions and glycemic control are both beneficial for improving TG control.[27] For example, metformin is a first-line pharmacological agent used for glycemic control in type 2 diabetes, and evidence has shown that metformin can significantly decrease TG levels in patients with diabetes.[31] In other words, for patients enrolled in the P4P program for diabetes care, physicians might regularly monitor their blood glucose levels and improve their glycemic control using medications such as metformin, which also improves the TG level. However, HDL-C control did not benefit from the use of similar treatments. The effects of metformin on HDL-C levels are less evident.[32,33] Lifestyle interventions such as weight loss, exercise, and diet control are better ways to manage HDL-C levels.[27] Therefore, our findings support the idea that commonalities among a set of activities are crucial in determining whether financial incentive programs generate positive or no spillover effects.[8,12]
Limitations
This study had some limitations. First, the participants were not randomly assigned to the intervention or comparison groups, which may have caused selection bias. In this study, we used a two-step sample selection strategy to minimize selection bias: selecting patients from the same group of physicians and using PSM. Although the PSM strategy may have increased the similarity of participants in the intervention and comparison groups at baseline, it does not ensure that the two groups followed similar trends before the intervention. Owing to the lack of available intermediate clinical outcome data, we performed a placebo DID analysis using the participants' diabetes-related hospitalizations as proxies to examine the parallel trend assumption before the intervention. Between 2007 and 2016 (before the P4P intervention in 2017), the placebo DID demonstrated small and insignificant changes in hospitalization for diabetes-related conditions, which implied similar trends in the two groups before the intervention (Table S5).[34] Finally, the participants in this study were representative of patients with at least three diabetes-related physician visits and laboratory test results, which might not represent all patients with diabetes in Taiwan.

Conclusion
This is one of the first studies to evaluate the long-term spillover effects of a P4P program on intermediate clinical outcomes under a single-payer healthcare system. This study provides evidence that the P4P program has significantly improved incentivized outcome measures. In addition, the P4P program has a positive spillover effect on a non-incentivized outcome when the related measure is treated in a manner similar to that of the incentivized indicators. However, this study found no spillover effect on a non-incentivized outcome when the related measure is not treated in a manner similar to that of the incentivized indicators. We speculate that the problem of multitasking can be mitigated if a set of indicators is addressed using similar treatment measures. Therefore, we suggest that the incentive design of P4P programs should consider "commonality in procedure" to facilitate positive spillover effects and avoid negative spillover effects. Future investigations are necessary to examine the spillover effects of a financial incentive program with both "targeted vs. non-targeted conditions" and "incentivized vs. non-incentivized indicators" under various healthcare settings and with a longer follow-up.

For the non-incentivized indicator TG, the result was similar to that of the aforementioned incentivized indicators. Patients in the P4P program were more likely than their counterparts to have proper control of their TG levels in the first and second years after the P4P program (OR of 1.114 [95% CI: 1.044-1.189] and 1.122 [95% CI: 1.047-1.202], respectively). However, there was no significant effect in the third year after the P4P program (OR of 1.051 [95% CI: 0.980-1.128]). Conversely, we found no spillover effects of the P4P program on the non-incentivized HDL-C control in the three years after the P4P program (OR of 1.036 in the first year [95% CI: 0.985-1.089]; 0.986 in the second year [95% CI: 0.936-1.040]; and 0.974 in the third year [95% CI: 0.922-1.029]) (Table 2).
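The significance statements above follow from whether each 95% CI excludes the null odds ratio of 1. The sketch below applies that check to the reported TG estimates and back-calculates the standard error of the log odds ratio, assuming Wald-type intervals that are symmetric on the log scale (an assumption; the text does not state how the intervals were constructed). The function names are hypothetical.

```python
import math


def excludes_null(ci_low: float, ci_high: float, null: float = 1.0) -> bool:
    """True when the confidence interval does not contain the null value."""
    return ci_low > null or ci_high < null


def log_or_se(ci_low: float, ci_high: float, z: float = 1.96) -> float:
    """Standard error of log(OR) implied by a Wald-type 95% CI (assumed form)."""
    return (math.log(ci_high) - math.log(ci_low)) / (2 * z)


# Reported TG estimates: (OR, lower bound, upper bound).
tg = {
    "year1": (1.114, 1.044, 1.189),
    "year2": (1.122, 1.047, 1.202),
    "year3": (1.051, 0.980, 1.128),
}
tg_significant = {year: excludes_null(lo, hi) for year, (_, lo, hi) in tg.items()}
tg_se_year1 = log_or_se(1.044, 1.189)
```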
In short, both process-based (eg, HbA1c check) and intermediate clinical outcome-based indicators (eg, ... we included only patients with at least three diabetes-related physician visits and laboratory test results every year from 2016 to 2020. Patients with diabetes who were enrolled in the nationwide P4P program in 2017 were included in the intervention group. The index date for each patient was defined as the date of enrollment in the P4P program between January 1, 2017 and December 31, 2017. The study included only participants who remained in the program throughout the observation period from 2017 to 2020.

Table 1. Characteristics of Patients With Diabetes in the Pre- and Post-matched Samples at Baseline
Abbreviations: DCSI, Diabetes Complication Severity Index; CCI, Charlson Comorbidity Index; SD, standard deviation; HbA1c, glycated hemoglobin.

Table 2. Adjusted Estimations of the Effects of the Pay-for-Performance Program on the Incentivized and Non-incentivized Intermediate Clinical Outcomes
... and Alshamsan et al[29] performed an interrupted time series analysis and found that the trend in the proper control of HbA1c in patients with diabetes after the introduction of the QOF worsened relative to the pre-QOF trend. In contrast, we found that the P4P program for diabetes had a positive long-term impact on incentivized intermediate outcomes, including proper control of HbA1c and LDL-C levels in patients. Several explanations are possible. First, the universal QOF scheme is designed for primary care settings to reward general practitioners for achieving quality indicators.
... spill over to other non-incentivized aspects of care, resulting in a decline in the overall quality of care among patients with diabetes. Under the P4P program for diabetes care in Taiwan, checks of patients' TG and HDL-C levels (a process of care) are incentivized, but control of patients' TG and HDL-C levels (intermediate clinical outcomes) is not incentivized.