Risk-adjusted cesarean section rates for the assessment of physician performance in Taiwan: a population-based study

Background Over the past decade, about one-third of all births nationwide in Taiwan were delivered by cesarean section (CS). Previous studies in the US and Europe have documented the need for risk adjustment for fairer comparisons among providers. In this study, we set out to determine the impact that adjustment for patient-specific risk factors has on CS rates among different physicians in Taiwan.
Methods The sample for this study comprised 172,511 live births which occurred in either hospitals or obstetrics/gynecology clinics between 1 January and 31 December 2003, and for which birth certificate data could be linked with National Health Insurance (NHI) claims data. Physicians were divided into four equivalent groups based upon the quartile distribution of their crude (actual) CS rates. Stepwise logistic regressions were conducted to develop a predictive model and to determine the expected (risk-adjusted) CS rate and 95% confidence interval (CI) for each physician. The actual rates were then compared with the expected CS rates to determine the proportion of physicians whose actual rates were below, within, or above the predicted CI in each quartile.
Results The proportion of physicians whose CS rates were above the predicted CI increased from the lower to the higher quartiles. However, more than half of the physicians whose actual rates were higher than the predicted CI were not in the highest quartile. Conversely, some physicians in the highest quartile (40 of 258) were actually providing obstetric care that was appropriate to the risk. When a stricter standard was applied to the assessment of physician performance by excluding physicians in quartile 4 from the model used to predict CS rates, as many as 60% of physicians were found to have higher CS rates than the predicted CI, and no physician in either quartile 3 or quartile 4 had a CS rate below the predicted CI.
Conclusion Overall, our study found that the comparison of unadjusted CS rates might not provide a valid reflection of the quality of obstetric care delivered by physicians, and may ultimately lead to biased judgments by purchasers. Our study has also shown that when we changed the standard of quality assessment, the evaluation results also changed.


Background
The rise in cesarean section (CS) delivery rates in Western nations and many developing countries has become an issue of major public health concern. Over the past decade, about one-third of all births nationwide in Taiwan were delivered by CS, with the proportion having fluctuated between 32.39% and 34.47% [1]; indeed, cesarean deliveries have now become the most common of all major surgical procedures currently being undertaken under the country's National Health Insurance (NHI) program. Although considerable controversy remains as to whether elective CS can provide any benefits, either to the newborn child or to the parturient [2, 3], the medical costs associated with the high CS rate in Taiwan have nevertheless become a tremendous financial burden on the island's finite healthcare resources.
In order to ensure the optimum level of quality care for pregnant women and their infants, the Bureau of NHI (BNHI) has placed the CS rates of all providers under close scrutiny, adopting actual quarterly average rates by region and/or hospital accreditation levels, as a means of evaluating hospital/physician obstetric practice patterns [4]. However, numerous studies have consistently provided evidence to show that differences in case mixes among patient populations of different healthcare providers in a wide range of settings may lead to bias in the judgment of whether a provider truly has a high CS rate. Those studies suggest that prior to any comparison being undertaken between different providers, all CS rates should be adjusted for risks [5][6][7][8][9]. A pioneering study in Taiwan by Hsu et al. [10] pointed out the unfairness of using crude CS rates as a means of comparing hospital performance within the Taipei Municipal Hospital System, and also the inappropriateness of drawing conclusions based on crude rates on the link between hospital practice patterns and quality of care.
Prior research into risk-adjusted CS rates tended to focus on profiles of hospital CS rates; thus, to the best of our knowledge, no study has yet been carried out on case mix adjustment based upon profiles of individual physicians' CS rates. This clearly hinders the efforts by policymakers to develop effective interventions aimed at lowering the CS rates at the physician level. This study therefore took advantage of a unique dataset which merges birth certificate data with the NHI claims data, to determine the impact that adjustment for patient-specific risk factors has on CS rates among different physicians in Taiwan. The findings of this study will not only facilitate more-accurate comparisons of CS rates among physicians, but can also help policymakers target their interventions at specific physicians.

Data collection
All live births (n = 176,399) occurring in either hospitals or obstetrics/gynecology clinics between 1 January and 31 December 2003, and for whom birth certificate data could be linked with the NHI claims data, were selected as the sample for this study. The mother's date of birth, along with her unique personal identification number, provided the link between the birth certificate data and the NHI claims data.
The birth certificate dataset contains various parental demographics (including age, the highest education level achieved, marital status, and county of residence), infant gestational age (in weeks), birth weight (in grams) and gender, as well as details on multiple pregnancies and the mother's gravidity. The NHI claims data contain information on all deliveries occurring in NHI-contracted hospitals and clinics (over 92% of all healthcare institutions), including the method of delivery, the characteristics of the hospital/clinic and attending physicians, as well as one principal diagnosis code, and up to four secondary diagnosis codes for each hospital admission, from the International Classification of Disease, Ninth Revision, Clinical Modification (ICD-9-CM).
Those cases with missing data (n = 924), and those cases for which delivery was carried out by physicians who had fewer than 20 deliveries in 2003 (n = 2964), were excluded from the sample. The reason for the exclusion of the latter group of physicians was to ensure that all physicians included within this study had adequate obstetrics practical experience. We were ultimately left with a study sample of 172,511 deliveries for analysis in this study, comprising 60,079 births by CS and 112,432 vaginal deliveries.
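The exclusion rule described above can be sketched in a few lines of Python. This is only an illustration: the record layout, the field names (`physician_id`, `cesarean`, `maternal_age`), and the order in which the two exclusions are applied are assumptions, not details given in the text. The final assertion simply checks that the reported counts are mutually consistent.

```python
from collections import Counter

MIN_DELIVERIES = 20  # volume threshold stated in the text


def apply_exclusions(records, required_fields):
    """Drop records with missing data, then drop low-volume physicians.

    Assumption: volume is counted after the missing-data exclusion;
    the text does not say which exclusion was applied first.
    """
    complete = [
        r for r in records
        if all(r.get(field) is not None for field in required_fields)
    ]
    volume = Counter(r["physician_id"] for r in complete)
    return [r for r in complete if volume[r["physician_id"]] >= MIN_DELIVERIES]


# Hypothetical toy data: physician A has 25 complete records plus one
# record with a missing value; physician B has only 5 deliveries.
toy_records = (
    [{"physician_id": "A", "cesarean": i % 3 == 0, "maternal_age": 30}
     for i in range(25)]
    + [{"physician_id": "B", "cesarean": False, "maternal_age": 28}
       for _ in range(5)]
    + [{"physician_id": "A", "cesarean": True, "maternal_age": None}]
)
kept = apply_exclusions(toy_records, ["cesarean", "maternal_age"])

# Counts reported in the text: linked births minus the two exclusions
# equal the study sample, which splits into CS and vaginal deliveries.
assert 176_399 - 924 - 2_964 == 172_511 == 60_079 + 112_432
```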
The availability of unique physician identifiers within the claims data for each medical claim submitted enabled us to identify the same physician carrying out one or more deliveries in 2003; data for that year indicated that 1031 physicians delivered the babies of the 172,511 sampled patients. We divided the physicians into four equivalent groups based upon the quartile distribution of their crude (actual) CS rates during the period of study. Those physicians whose CS rates fell into the top quartile had the highest crude CS rates, while those falling into the bottom quartile had the lowest crude CS rates.
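A minimal sketch of the quartile grouping follows. It assumes physicians are ranked by crude CS rate and cut into four near-equal groups; how ties at the quartile boundaries were handled in the study is not stated, so the tie-breaking rule here (by physician identifier) is hypothetical, and the resulting group sizes need not match those reported exactly.

```python
def crude_cs_rates(records):
    """Return {physician_id: crude CS rate} from delivery-level records."""
    totals, cesareans = {}, {}
    for r in records:
        pid = r["physician_id"]
        totals[pid] = totals.get(pid, 0) + 1
        cesareans[pid] = cesareans.get(pid, 0) + (1 if r["cesarean"] else 0)
    return {pid: cesareans[pid] / totals[pid] for pid in totals}


def assign_quartiles(rates):
    """Rank physicians by crude rate and label them 1 (lowest) to 4 (highest)."""
    # Ties are broken by physician id (an assumption for illustration).
    ranked = sorted(rates, key=lambda pid: (rates[pid], pid))
    n = len(ranked)
    return {pid: (i * 4) // n + 1 for i, pid in enumerate(ranked)}


# Hypothetical example with eight physicians.
example_rates = {f"p{i}": i / 10 for i in range(8)}
example_quartiles = assign_quartiles(example_rates)
```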
The institutional review board (IRB) at Taipei Medical University, Taipei, Taiwan granted ethical approval for the study.

Variable definitions
'Case mix' represents those factors demonstrated in the literature to increase the risk of CS delivery. In this study, we categorized these factors into obstetric, pregnancy, and other risk factors.

Obstetric risk factors
Numerous studies have documented a breech presentation, dystocia, and fetal distress as obstetric factors which are indications for a CS [5][11][12][13][14]; however, while these are common indications for a CS, dystocia and fetal distress diagnoses are often very subjective, and, indeed, are often not risk factors in themselves. The inclusion of these factors in the regression, along with preexisting risk factors, may well have masked many important differences by 'adjusting away' subjective practice differences among physicians. Furthermore, since dystocia may be related to fetal macrosomia, and fetal distress may be related to various other conditions (including diabetes, hypertension, and collagen vascular disease), this may once again have introduced potential redundancies into the regression.
There is also a lack of standard clinical criteria for defining dystocia and fetal distress [8,10], with obstetricians in different healthcare institutions possibly applying the terms to quite-different conditions; thus, the variability in the proportions of patients diagnosed with dystocia and fetal distress may be partially attributable to differences in defining these conditions, rather than differences in the physician/patient mix. We therefore selected malpresentation (ICD-9-CM codes 652, 761.7, 763.0, or 763.1), a prolapsed cord (663.0), and antepartum hemorrhage, abruptio placentae, and placenta previa (641, 762.0, or 762.1) to better define and represent the obstetric risk factors likely to result in a CS.

Pregnancy risk factors
A number of independent factors, all of which seem clinically relevant and are known to physicians prior to delivery, were included in this study. A woman was considered to have a pregnancy risk factor if she had one or more of the following conditions: a previous CS history (654.

Other risk factors
Certain variables from the database were also recoded to create clinically meaningful categories. These variables included maternal age (in years) at the time of the infant's birth, infant gender, and parity. Maternal ages were grouped into < 20, 20-34, and ≥ 35 years. Since birth weight and gestational age are highly correlated, only gestational age was included, to avoid redundancy; it was used to capture the effect of preterm (< 37 weeks) or postdate (≥ 42 weeks) delivery. Insufficient or excessive fetal growth (656.5 or 656.6) was also included as a pregnancy risk factor to further distinguish the risk of intrauterine growth retardation or macrosomia for a given gestational age. Parity was recoded as whether or not a mother was parous.
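The categories described above could be produced by helpers such as the following. The cut-points come from the text; the function names, category labels, and the choice of a previous-births count as the input for parity are illustrative assumptions.

```python
def recode_maternal_age(age_years):
    """Group maternal age into the three bands given in the text."""
    if age_years < 20:
        return "<20"
    if age_years < 35:
        return "20-34"
    return ">=35"


def recode_gestation(weeks):
    """Flag preterm (< 37 weeks) and postdate (>= 42 weeks) deliveries."""
    if weeks < 37:
        return "preterm"
    if weeks >= 42:
        return "postdate"
    return "term"


def is_parous(previous_births):
    """Binary parity indicator (hypothetical input: number of previous births)."""
    return previous_births > 0
```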

Statistical analysis
All of the statistical analyses within this study were performed using the SAS statistical package (SAS System for Windows, version 8.2, Cary, NC). The χ² test was first performed as a means of assessing whether there were any significant variations in the distribution of patient population risks across the physician quartiles, and second, to evaluate whether there were any significant differences, again by physician quartiles, in the cesarean delivery rates among women for each of the risk factors.
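As an illustration of the kind of test described (the analyses themselves were run in SAS), the Pearson χ² statistic for a quartile-by-risk-factor table can be computed as below. The counts in the example are invented for demonstration and are not taken from Table 2.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical counts: rows are physician quartiles 1 to 4,
# columns are [women with the risk factor, women without it].
toy_table = [
    [300, 9700],
    [380, 9620],
    [450, 9550],
    [520, 9480],
]
statistic, df = chi_square(toy_table)

# Critical value of the chi-square distribution at alpha = 0.05, df = 3.
CRITICAL_VALUE_DF3 = 7.815
differs_across_quartiles = statistic > CRITICAL_VALUE_DF3
```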
A univariate analysis was first carried out as the primary method of calculating the odds of a cesarean delivery, in order to determine whether there was any association with the potential risk factors. Stepwise logistic regressions were then conducted on parturients delivered by physicians in all four quartiles (referred to as model 1) to first develop a predictive model, and second, as a means of minimizing the number of predictive risk factors within the formula. A p value of < 0.05 was required for entry of a risk factor into the model. Given the sufficiently large volume of sample patients adopted for the current study, it was extremely unlikely that any important variables would have been overlooked.
Having computed the predicted risk of CS for each woman based upon the predictive model, these predicted risks were then aggregated by physician to determine the expected (risk-adjusted) CS rate and 95% confidence interval (CI) for each physician. The actual rates were then compared with the expected CS rates for each physician so as to determine how many of the physicians were below, within, and above the predicted CI in each physician quartile.
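The aggregation step can be sketched as follows. The text does not give the formula used for the 95% CI, so this example assumes a normal approximation in which each delivery is an independent Bernoulli trial with its model-predicted risk; the function names are illustrative.

```python
import math


def expected_rate_with_ci(predicted_risks, z=1.96):
    """Expected CS rate for one physician and an approximate 95% CI.

    Assumption: deliveries are independent Bernoulli trials, so the
    variance of the expected rate is sum(p * (1 - p)) / n**2.
    """
    n = len(predicted_risks)
    expected = sum(predicted_risks) / n
    variance = sum(p * (1 - p) for p in predicted_risks) / n ** 2
    half_width = z * math.sqrt(variance)
    return expected, max(0.0, expected - half_width), min(1.0, expected + half_width)


def classify(actual_rate, lower, upper):
    """Place a physician's actual rate below, within, or above the predicted CI."""
    if actual_rate < lower:
        return "below"
    if actual_rate > upper:
        return "above"
    return "within"


# Hypothetical physician: 100 deliveries, each with a predicted risk of 0.2.
expected, lower, upper = expected_rate_with_ci([0.2] * 100)
status = classify(0.35, lower, upper)
```

Under this approximation, a physician with few deliveries has a wide interval, so a fairly high actual rate may still fall within the predicted CI.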
The main problem with risk adjustment, however, is the way in which an appropriate 'gold standard' is applied. Deriving a logistic equation and then applying it back to the same population from which it was derived is a well-accepted technique; however, there is a certain circular logic to this method, since the gold standard is partially defined by practices that are themselves outliers, as is the case in model 1.
Since the average CS rates for physicians in quartile 4 were as high as 52%, we therefore treated physicians in quartile 4 as outliers and repeated the logistic regression using only those parturients delivered by physicians in quartiles 1, 2, and 3. The results of this analysis are referred to as model 2. Based upon the regression results from model 2, we then recalculated the predicted CS rate and the 95% CI for each physician in the four quartiles, and again determined how many of the physicians were below, within, and above the predicted CI in each physician quartile. Our intuition was that there would be even-greater variation in predicted cesarean section rates under model 2 than under model 1.

Results
The total number of deliveries and mean CS rates are summarized in Table 1 by physician quartiles. Of all the deliveries carried out in Taiwan in 2003, the overall CS rate was 36%, with the mean CS rates being 21% for physician quartile 1; 32% for quartile 2; 39% for quartile 3; and 52% for quartile 4.
In total, 258 physicians delivered 40,973 babies in the first quartile, 257 physicians delivered 54,464 babies in the second quartile, 258 physicians delivered 45,456 babies in the third quartile, and 258 physicians delivered 31,618 babies in the fourth quartile. Table 2 presents the CS rate and frequency for each risk factor by physician quartile. As expected, the presence of risk factors considerably increased from the lower quartile to the higher quartiles. As results of the χ² test showed, there were statistically significant variations in the distribution of patient population risks across physician quartiles in almost all of the risk factors, with the exceptions of a prolapsed cord, genital herpes, diabetes mellitus, arterial hypertension, polyhydramnios/oligohydramnios, and congenital/acquired abnormalities of the cervix or vagina.
Similarly, the χ² test in Table 2 also indicated that, with the notable exception of those cases involving a prolapsed cord, the likelihood of a CS being performed in the presence of each risk factor increased significantly from the lower quartile to the higher quartiles.
Results of the univariate analysis in Table 3 suggest that those with a significantly greater likelihood of undergoing a CS included parturients who were multiparous, had a multiple gestation, were preterm, were aged over 35 years, were undergoing premature labor, or had malpresentation, a prolapsed cord, antepartum hemorrhage, a previous history of CS, insufficient or excessive fetal growth, genital herpes, diabetes mellitus, anemia, cardiac disease, arterial hypertension, eclampsia/preeclampsia, polyhydramnios/oligohydramnios, an infection of the amniotic cavity, a congenital/acquired abnormality of the cervix or vagina, or cervical incompetence.
All of the risk factors included within the univariate analysis were retained in the stepwise regression analysis for adjusting the CS rates. The results for model 1 in Table 3 indicate that after controlling for all the risk factors within the model, the adjusted odds ratios for most of the obstetric- and pregnancy-related complications were of considerably greater magnitude compared to the unadjusted odds ratios, while those of premature labor and maternal age were reduced. However, with the exceptions of premature rupture of the membranes and parity, the direction of the relationship between the CS rate and all other variables remained unchanged. Nulliparity and premature rupture of the membranes were no longer protective against CS, but were instead associated with an increased risk.
The results of the stepwise logistic regression, using parturients delivered by physicians in quartiles 1, 2, and 3, are presented in model 2 of Table 3. Within the model, with the exception of cervical incompetence, the direction of the relationship between the CS rate and all other risk factors remained unchanged, with the adjusted odds ratios for most of the obstetric- and pregnancy-related complications being of similar magnitudes when compared to the results of model 1. Cervical incompetence was excluded from the prediction of the CS rate in model 2.
Both the likelihood ratio and the Hosmer-Lemeshow test were significant in models 1 and 2, at p < 0.0001, indicating that while the risk factors considered in both models were significant in terms of predicting the probability of a CS, the fit was nevertheless imperfect. However, the number of concordant pairs was very high, with the c-statistics for models 1 and 2 being 0.870 and 0.872, respectively, thereby indicating that the predictive accuracy of the logistic model was good in both models.
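For readers unfamiliar with the c-statistic, the sketch below shows how it is derived from concordant pairs: over every pairing of a CS delivery with a vaginal delivery, it is the share in which the CS delivery received the higher predicted risk, with tied pairs counted as one half (a common convention). The data are hypothetical, and this quadratic-time version is meant for illustration only; a sample of the size used here would call for a rank-based calculation.

```python
def c_statistic(predicted, outcomes):
    """Concordance (c) statistic from predicted risks and 0/1 outcomes."""
    positives = [p for p, y in zip(predicted, outcomes) if y == 1]
    negatives = [p for p, y in zip(predicted, outcomes) if y == 0]
    concordant = 0
    tied = 0
    for p in positives:
        for q in negatives:
            if p > q:
                concordant += 1
            elif p == q:
                tied += 1
    return (concordant + 0.5 * tied) / (len(positives) * len(negatives))


# Hypothetical example: two CS deliveries (outcome 1), three vaginal (outcome 0).
toy_predicted = [0.9, 0.4, 0.3, 0.2, 0.6]
toy_outcomes = [1, 1, 0, 0, 0]
toy_c = c_statistic(toy_predicted, toy_outcomes)  # 5 of 6 pairs are concordant
```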
Each of the models derived from the stepwise logistic regression analysis in Table 3 was used to generate a predicted CS rate for each physician based upon the patient mix for that physician. The actual CS rates, the predicted rates generated by models 1 and 2, and the percentages of physicians whose actual rates were below, within, or above the predicted 95% CI are presented in Table 4 by physician quartile.
The overall results from model 1 show that of 1031 physicians, 448 physicians (43.45%) had CS rates which were above the predicted CI; as expected, the proportion of physicians whose CS rates were above the predicted 95% CI increased for the higher quartiles. In the lowest quartile, no physician's actual rate was above the predicted CI, while in the highest quartile, the actual rates of as many as 218 physicians (84.85%) were above the predicted CI.
Of the 448 physicians whose actual rates were greater than the predicted CI, 230 physicians (51.34%) were not in the highest quartile of actual rates; on the other hand, of the 446 physicians whose actual rates were below the predicted CI, 212 physicians (47.53%) were not in the lowest quartile.
When a stricter standard was applied to the assessment of physician performance, the results from model 2 in Table 4 indicated that compared to the results of model 1, there were increases in the numbers of physicians whose CS rates were above the predicted CI in each quartile. Overall, as many as 60% of the physicians were found to have higher CS rates than the predicted CI.
In addition, within model 2, CS rates of 24 physicians in the lowest quartile were now higher than the predicted CI. Moreover, the actual rates of all the physicians in the highest quartile were above the predicted CI, and indeed, the CS rates of no physicians in either quartile 3 or quartile 4 were below the predicted CI.

Discussion
CS rates have been used for a decade or so as a means of comparing obstetric practices across different providers, with many studies throughout this period having documented the need for risk adjustment for fairer comparisons among providers [5][6][7][8][9][10]. In 2000, the American College of Obstetricians and Gynecologists (ACOG) recommended that all CS rates should be risk-adjusted (case-mix adjusted) prior to any comparisons being made [15]; however, while this may be desirable, in many areas or countries, such adjustment is usually not feasible, largely due to the lack of adequate data sources.
The CS rate in Taiwan has hovered between 32% and 34% since implementation of the NHI program in 1995.
Although lower than the respective CS rates of 36% and 40% found in Brazil and Chile, the CS rate in Taiwan is nevertheless much higher than the 15% limit recommended by the World Health Organization [16].
Inappropriate CS procedures increase maternal and neonatal morbidity and healthcare costs; however, the BNHI has continually adopted crude CS rates as its measure of quality in obstetrics management, and this can clearly lead to unfair assessments of those providers who are serving high-risk populations, since these providers will obviously have higher CS rates, and will therefore appear to be providing a lower quality of care.
By examining all 172,511 births in Taiwan in 2003, the present study took advantage of a unique population-based dataset which merges birth certificate data with the NHI claims data of this population to show how adjustments of patient risk factors can significantly change the overall assessment of a physician's obstetrics performance.
Our data specifically demonstrate that more than half of all physicians with inappropriately high rates were not in the highest quartile, and that these would be overlooked if the assessment was only targeted at those physicians in the highest quartile of unadjusted rates. Conversely, we also found that some physicians (40 of 258 physicians) in the highest quartile were actually providing obstetric care that was appropriate to the risk. However, our data also demonstrate that when stricter standards were applied to the evaluation of physician performance, a discernibly higher proportion of physicians were seen to be producing inappropriately high CS rates. In particular, we suggest that all of the physicians in quartile 4 should be placed under close scrutiny.
The CS prediction model adopted for this study was the logistic regression method, which is very similar to that used in analyses undertaken in the US and Europe [5][6][7][8][9].
In general terms, the factors associated with CS are largely similar between various regions or countries; however, due to differences in the data sources or specific patient populations under examination, the odds ratios associated with each of the variables may vary. In contrast to many prior studies [6][7][8], our results show that prior to adjustment, multiparous women are at higher risk of cesarean delivery, mainly attributable to their prior history of a CS. Our data show that of the 87,777 births to parous women, 31,747 (36.17%) were delivered by CS, and 17,594 (55.42%) of these were repeat CSs. The data also reveal that of the 84,734 nulliparous women, 28,332 (33.44%) underwent a CS.
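The counts quoted above are enough to reproduce the unadjusted comparison between parous and nulliparous women. The short calculation below derives the crude odds ratio and the share of repeat CSs directly from those counts; it is a consistency check on the figures in the text rather than a reproduction of Table 3, and the function name is illustrative.

```python
def odds_ratio(events_a, total_a, events_b, total_b):
    """Crude odds ratio of an event in group A relative to group B."""
    odds_a = events_a / (total_a - events_a)
    odds_b = events_b / (total_b - events_b)
    return odds_a / odds_b


# Counts reported in the text.
parous_cs, parous_total = 31_747, 87_777
nulliparous_cs, nulliparous_total = 28_332, 84_734
repeat_cs = 17_594

crude_or = odds_ratio(parous_cs, parous_total, nulliparous_cs, nulliparous_total)
repeat_share = repeat_cs / parous_cs          # share of parous CSs that were repeats
parous_rate = parous_cs / parous_total        # crude CS rate, parous women
nulliparous_rate = nulliparous_cs / nulliparous_total  # crude CS rate, nulliparous women

print(f"crude OR (parous vs nulliparous): {crude_or:.2f}")
print(f"repeat CS share among parous CSs: {repeat_share:.2%}")
```

A crude odds ratio above 1 is consistent with the statement that, before adjustment, parous women appear to be at higher risk, with repeat procedures accounting for more than half of their CSs.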
There are limitations to this study which should be noted. First, the possibility may exist for differential misclassification in obstetric/pregnancy risks for CS and vaginal delivery, since women who deliver vaginally tend to underreport complications much more than do those who undergo a CS. Second, the analysis in this study relied upon linked administrative data; however, some of the prior studies evaluating the use of a CS indicated that the use of administrative data may yield similar or higher discrimination than models based upon medical records [17,18]. It has been noted that across different medical institutions, medical record coders may have inconsistent coding practices, while clinicians may have very different views on diagnosis. Third, we do not have access to information on whether an obstetrician collaborated with midwives or family physicians who themselves do not have training/privileges to perform a CS. Such collaboration may have inflated an obstetrician's apparent CS rate and elevated him/her into the uppermost percentile. Finally, our study was based on a dataset of linked birth certificate and NHI claims data for a single country, which may not be representative of other regions or countries. Thus, we recommend caution when interpreting these results, or attempting to apply them to other regions or countries.

Conclusion
Our study arrived at the same general conclusions reported in the literature in both the US and Europe: that is, comparisons of unadjusted CS rates might not validly reflect the quality of obstetric care being provided, and may ultimately lead to biased judgments by health care purchasers. It is worth noting that the same issues which apply to obstetrics care in the US and Europe also apply to non-Western cultures, such as Taiwan.
While it is clearly important to be able to identify the risk factors associated with CS, in order to optimize the utilization of both resources and personnel, risk adjustment using average performance as a benchmark is not necessarily a true measure of quality [19]. Our study results also show that when we changed the standard of quality assessment, the evaluation results changed.
This study represents the first attempt within a developing country to use a population-based dataset as a means of calculating risk-adjusted CS rates in order to identify those physicians who are above or below the average practice in a high CS rate setting, which prevails in Taiwan. We believe that this represents a cornerstone with regard to the design of effective strategies for the monitoring of the 'epidemic' CS rates currently existing in Taiwan.