Cost-effectiveness analysis of colorectal cancer screening in Shanghai, China: A modelling study

Highlights
• The current Shanghai CRC screening program is cost-effective.
• Changing to a validated FIT would make the program more efficient.
• The results were sensitive to an increase in the cost of the validated FIT.
• The results were sensitive to more participation in screening and colonoscopy.


Introduction
Colorectal cancer (CRC) is one of the most common cancers worldwide. (Sung et al., 2021) In recent years, CRC incidence in China has increased substantially from historically low levels. (Pan et al., 2017) This rapid rise in incidence and the accompanying increase in disease burden are set to become a major public health challenge.
Screening for CRC has been implemented in many (largely high-income) countries and has contributed to reducing CRC incidence and mortality. (Schreuders et al., 2015) Shanghai is one of the first cities in China to implement a CRC screening program. Age-standardized CRC incidence in urban Shanghai increased steadily between 1973 and 2015, rising from 13.58 to 28.36 per 100,000 person-years for males and from 11.92 to 22.33 for females. (Bao et al., 2016;Li et al., 2002) Initiated in 2013, the program targets individuals aged 50-74 years and offers triennial screening with a two-sample qualitative faecal immunochemical test (FIT) and a questionnaire-based risk assessment (RA).
Abbreviations: CRC, colorectal cancer; FIT, faecal immunochemical test; ICER, incremental cost-effectiveness ratio; MISCAN-Colon, The Microsimulation Screening Analysis model for CRC; LYs, the number of life years; LYG, the number of life years gained; RA, risk assessment; ng Hb/mL, ng haemoglobin per mL buffer; µg Hb/g, µg haemoglobin per g faeces.
While the RA plays an important role in CRC screening in China (Meng et al., 2009), it has led to a high false-positive rate in the Shanghai screening program. Moreover, although the program states that it utilizes a qualitative FIT with a positivity threshold of 100 ng haemoglobin per millilitre buffer (ng Hb/mL) (Gong et al., 2018), equivalent to 20 µg haemoglobin per g faeces (µg Hb/g) (Fraser et al., 2012), the actual cut-off has been shown to be between 1 and 5 µg Hb/g faeces in a laboratory experiment. A lower cut-off increases test sensitivity; however, it also lowers specificity and increases the false-positive rate, which results in more unnecessary referrals for colonoscopy. This has been shown to impact compliance with diagnostic colonoscopy (Gong et al., 2018), thereby limiting the effect of screening.
The Shanghai qualitative FIT has been shown to have a low specificity (35%) and a high false-positive rate (65%) in the laboratory experiment. Low compliance with follow-up colonoscopy has been identified as a significant challenge for CRC screening in China (Shanghai 28%, Pudong 22%). (Gong et al., 2018) Changing to a validated FIT, with a higher specificity, may help the screening program to overcome these issues.
These issues cast doubt on the effectiveness and cost-effectiveness of the current screening program. Therefore, this microsimulation modelling study aims to assess the effectiveness and cost-effectiveness of the Shanghai FIT and RA compared to a validated FIT.

Shanghai screening protocol
Detailed information on the CRC screening program in Shanghai has been provided elsewhere (Gong et al., 2018) and in the Supplementary Methods. In brief, the program was initiated in 2013, with individuals aged 50-74 years offered triennial screening with the Shanghai FIT (100 ng Hb/mL cut-off) and an RA. The RA involved a face-to-face interview consisting of nine questions covering anorectal symptoms, related diseases (such as polyps and appendicitis), CRC family history, personal cancer history, etc. (Gong et al., 2018) Positive individuals (either a positive Shanghai FIT result or a positive RA result) were recommended to have a diagnostic colonoscopy.

MISCAN-Colon
The Microsimulation Screening Analysis model for CRC (MISCAN-Colon) is a well-established microsimulation model developed at the Department of Public Health, Erasmus University Medical Center. (Loeve et al., 1999) The model has been extensively described previously and is described in Supplementary Methods. (Loeve et al., 1998;van Hees et al., 2015) For this study, we followed the approach developed by Gini et al. (Gini et al., 2021) and adjusted the MISCAN-Colon model to the situation in Shanghai. Data about age distribution and life expectancy of the Chinese population were direct inputs to the model. In order to match CRC incidence before the introduction of screening (Supplementary materials Figure S8), the onset of adenomas was adjusted. To match the stage distribution, transition probabilities for clinical diagnosis were adjusted. Stage distribution (Gong et al., 2015), localization of cancers in the colorectum (Shanghai, 2016) and five-year relative survival (Gong et al., 2015) after clinical diagnosis of a cancer are based on Chinese literature.

Screening strategies
We simulated a cohort of 100 million individuals born between 1939 and 1963 until death (restricted at age 100). The simulated individuals, aged between 50 and 74 years in 2013, were free of diagnosed CRC up to then and had a life expectancy as observed in China in 2010. (Guo, 2010) Individuals were screened triennially between ages 50-74. Screen-positive individuals were invited for a diagnostic colonoscopy.
Surveillance was based on findings at diagnostic colonoscopy according to the European Society of Gastrointestinal Endoscopy (ESGE) Guidelines. (Hassan et al., 2013) We chose to simulate surveillance consistent with these guidelines because in China there is conflicting advice about the post-diagnostic colonoscopy pathway (including when to return to screening and the surveillance pathway). (Gong et al., 2017;Zhonghua et al., 2014;Chinese Society of Clinical Oncology, 2019;Endoscopology, 2015) Furthermore, the Asia Pacific Consensus Group did not provide precise guidelines on surveillance intervals, other than to suggest that such intervals should be tailored to the risk level.
We assumed age-specific participation in screening and diagnostic colonoscopy based on the participation rates provided by Pudong CDC (Table 2). We assumed 80% adherence to surveillance colonoscopy. We assumed no difference in adherence to follow-up colonoscopy based on which of the two tests (FIT and/or RA) was positive. As there is no data available on adherence with rescreening in Shanghai, we assumed 90% of those who had previously participated would participate again, while 15% of those who had not participated in the previous round would now attend.
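The rescreening assumption can be illustrated with a minimal sketch. It assumes, as a simplification, that attendance in a round depends only on attendance in the previous round. The function name and the 30% first-round participation are hypothetical placeholders, not the age-specific rates in Table 2, and this is not the MISCAN-Colon implementation.

```python
def participation_by_round(first_round, n_rounds, p_repeat=0.90, p_new=0.15):
    """Expected share of invitees attending each screening round.

    p_repeat: probability that a previous-round participant attends again
              (90%, as assumed in the text).
    p_new:    probability that a previous-round non-participant attends
              (15%, as assumed in the text).
    first_round: hypothetical first-round participation, not taken from Table 2.
    """
    rates = [first_round]
    for _ in range(n_rounds - 1):
        prev = rates[-1]
        # Returning participants plus newly attending former non-participants.
        rates.append(p_repeat * prev + p_new * (1.0 - prev))
    return rates


# Triennial invitations between ages 50 and 74 give at most nine rounds.
rates = participation_by_round(first_round=0.30, n_rounds=9)
for round_no, rate in enumerate(rates, start=1):
    print(f"round {round_no}: {rate:.3f}")
```

Under these two probabilities the recursion drifts toward 0.15 / (1 - 0.90 + 0.15) = 0.60 whatever the starting value, which shows how strongly the rescreening assumption shapes participation in later rounds.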
Under these conditions, we simulated three screening strategies: screening with the Shanghai FIT only, screening with Shanghai FIT + RA and screening using a validated two-sample FIT with a cut-off of 20 µg Hb/g faeces (validated FIT). In addition, we simulated a strategy without screening.

Test characteristics
We estimated the test characteristics of the Shanghai FIT and Shanghai FIT + RA so that the model-predicted positivity and detection rates for advanced neoplasia were similar to the observed rates provided by the Pudong CDC for the first three years of screening in Pudong (2013-2015) (Table 1; Table 2; Supplementary Methods page 19 and Table S1).
To estimate the validated FIT characteristics, we modelled the Dutch trial according to the approach of Goede and colleagues. (Goede et al., 2013) Here, the test characteristics of the validated FIT were fitted to the positivity and detection rates of advanced neoplasia observed in the first screening round of two Dutch randomised trials, which utilised the OC-Sensor micro (Eiken Chemical, Tokyo, Japan). (Hol et al., 2010;Hol et al., 2009;van Rossum et al., 2008;van Roon et al., 2011) (Table 2) The obtained sensitivity and specificity were then transferred to the model for China. The characteristics differ from those previously presented because the natural history of the MISCAN-Colon model has been updated since that publication. (Rutter et al., 2016)
The test characteristics for all FITs were adjusted to take into account the effect of individuals without adenomas who always test positive and adenomas that do not bleed (systematic false-positive and false-negative results) (Table 2). The lack of specificity of colonoscopy reflects the detection of benign hyperplastic polyps, which are not cancer precursors. (Schroy et al., 2013)

Costs of screening, surveillance and CRC care
Costs were included from the healthcare sector perspective. The costs of the Shanghai FIT and the RA were provided by Pudong CDC. The Pudong CDC also provided the costs for the validated FIT based on the current reimbursement funding arrangement. These costs include the test kits, their distribution, return, and analysis, and marketing expenses. Costs for colonoscopy, polypectomy and complications from colonoscopy were obtained from research based in China (Table 2). (Wang et al., 2012;Li et al., 2016) Costs for cancer care were based on costs of cancer treatment in the Chinese setting.

Outcomes
For all strategies, the model estimated CRC incidence, CRC mortality, benefits (the number of life years (LYs) and the number of life years gained (LYG)), burden (number of screening tests, and diagnostic and surveillance colonoscopies required), harms (number of colonoscopy complications and false-positive tests), as well as total costs (Renminbi (¥)). A false-positive test is defined as a positive screening test followed by a colonoscopy with no clinical findings. Costs and LYs were discounted using a standard annual rate of 3%. Undiscounted results and results discounted at 5% are presented in Supplementary Results Tables and Figures.
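As an illustration of the discounting step, the sketch below computes the present value of a yearly stream of costs or life years. The function name, the three-year stream, and the convention that year 0 is left undiscounted are assumptions made for the example and are not taken from the model documentation.

```python
def discounted_sum(values_by_year, rate=0.03):
    """Present value of a yearly stream of costs or life years.

    Assumption: the first entry (year 0) is not discounted.
    """
    return sum(value / (1.0 + rate) ** year
               for year, value in enumerate(values_by_year))


life_years = [1.0, 1.0, 1.0]  # hypothetical stream: one life year in each of three years

print(round(discounted_sum(life_years, rate=0.0), 4))   # undiscounted
print(round(discounted_sum(life_years, rate=0.03), 4))  # base case, 3%
print(round(discounted_sum(life_years, rate=0.05), 4))  # alternative rate, 5%
```

Because screening costs fall early and most life years are gained late, higher discount rates reduce the benefits more than the costs, which is why undiscounted and 5% results are reported alongside the base case.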

Cost-effectiveness analysis
To determine the cost-effectiveness of all strategies, strategies were rank-ordered according to their costs. Strategies that cost more than (a combination of) other strategies while gaining fewer LYs were considered inefficient. For the remaining strategies, cost-effectiveness was expressed by the incremental cost-effectiveness ratio (ICER), defined as the incremental cost per LYG compared to the next less effective strategy. The willingness-to-pay (WTP) threshold was set at three times the Chinese gross domestic product per capita in 2018 (¥193,931 Chinese Renminbi Yuan, equal to US$29,313) per LYG. The strategy with the highest ICER below the WTP threshold was considered the most efficient strategy.
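The ranking procedure can be sketched as follows. This is a minimal illustration, not the authors' code: the class and function names are hypothetical. The inputs are the base-case incremental costs and LYG reported in the Results, under the assumption that the lower and upper ends of the reported cost range (¥49,797 and ¥70,436) belong to the Shanghai FIT and the validated FIT, respectively.

```python
from typing import List, NamedTuple, Optional, Tuple


class Strategy(NamedTuple):
    name: str
    cost: float  # discounted cost vs. no screening, ¥ per 1,000 individuals
    lyg: float   # discounted life years gained vs. no screening, per 1,000


def efficient_frontier(strategies: List[Strategy]) -> List[Tuple[Strategy, Optional[float]]]:
    """Return the efficient strategies with their ICERs (None for the cheapest)."""
    ranked = sorted(strategies, key=lambda s: (s.cost, -s.lyg))
    frontier: List[Strategy] = []
    for s in ranked:
        if frontier and s.lyg <= frontier[-1].lyg:
            continue  # costs at least as much but gains no more: dominated
        while len(frontier) >= 2:
            prev, last = frontier[-2], frontier[-1]
            icer_last = (last.cost - prev.cost) / (last.lyg - prev.lyg)
            icer_s = (s.cost - last.cost) / (s.lyg - last.lyg)
            if icer_s < icer_last:
                frontier.pop()  # 'last' is extendedly dominated
            else:
                break
        frontier.append(s)
    result: List[Tuple[Strategy, Optional[float]]] = [(frontier[0], None)]
    for prev, cur in zip(frontier, frontier[1:]):
        result.append((cur, (cur.cost - prev.cost) / (cur.lyg - prev.lyg)))
    return result


def most_efficient(strategies: List[Strategy], wtp: float):
    """Strategy with the highest ICER below the willingness-to-pay threshold."""
    eligible = [(s, icer) for s, icer in efficient_frontier(strategies)
                if icer is not None and icer < wtp]
    return max(eligible, key=lambda pair: pair[1]) if eligible else None


WTP = 193_931  # ¥ per LYG
strategies = [
    Strategy("No screening", 0.0, 0.0),
    Strategy("Shanghai FIT", 49_797.0, 6.19),       # cost mapping is an assumption
    Strategy("Shanghai FIT + RA", 66_134.0, 6.62),
    Strategy("Validated FIT", 70_436.0, 6.97),      # cost mapping is an assumption
]

frontier = efficient_frontier(strategies)
best = most_efficient(strategies, WTP)
for strategy, icer in frontier:
    print(strategy.name, "reference" if icer is None else f"ICER ¥{icer:,.0f} per LYG")
print("Most efficient under WTP:", best[0].name)
```

With rounded inputs the ICERs come out close to, but not exactly equal to, the published values, which were presumably computed from unrounded model output.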

Sensitivity analysis
We conducted six sensitivity analyses to assess the robustness of our results. First, due to the uncertainty about the cost of the validated FIT, we explored the impact of varying its cost by assuming a 50% reduction and a two-fold increase. Second, we adjusted the treatment costs for stages II-IV because of the limited availability of cost data and the possibility that the assumed costs may not reflect the actual costs of treating CRC. Treatment costs were adjusted to reflect the proportional increase in lifetime health care costs for the different stages of CRC. (Lang et al., 2009) The costs for stage I remained unchanged from the base case scenario. Third, as there is currently no available information on quality of life in the Chinese setting, quality-of-life adjustments were excluded from the main analysis. We therefore conducted a sensitivity analysis utilising international quality of life measurements (Supplementary Methods Table S2). (Ness et al., 1999) Fourth, we assessed the impact of an alternative surveillance pathway, derived from Chinese literature (Supplementary Methods Figure S7). (Gong et al., 2017;Zhonghua et al., 2014) Fifth, in order to determine the effect of improved participation in screening and diagnostic follow-up on outcomes, participation was increased to 60% and 80%, respectively. Finally, due to uncertainty in the performance of the Shanghai tests, we calibrated the FIT and risk questionnaire characteristics based on published data on CRC screening in Guangzhou, China. (Lin et al., 2019)

Probabilistic sensitivity analysis
In the probabilistic sensitivity analysis, we assessed the uncertainty of the costs to evaluate future economic improvements and change in health care costs. For every strategy, we performed 1,000 simulations each containing different costs drawn from a gamma distribution (Table 2).
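A minimal sketch of such cost sampling is given below. It assumes that the gamma mean equals the base-case cost and that the halved-to-doubled range described for the probabilistic sensitivity analysis can be treated as a symmetric ±1.96 SD interval; this moment-matching shortcut is an assumption of the sketch, not necessarily how the shape k and scale θ were derived in the study. The function names are hypothetical, and the ¥25 base value is the estimated validated FIT cost mentioned in the Discussion.

```python
import random


def gamma_params(base, z=1.96):
    """Shape k and scale theta with mean `base`.

    Assumption: the 95% range [base / 2, 2 * base] is approximated as
    mean +/- z standard deviations (moment matching, not an exact quantile fit).
    """
    sd = (2.0 * base - 0.5 * base) / (2.0 * z)
    k = (base / sd) ** 2
    theta = sd ** 2 / base
    return k, theta


def draw_costs(base, n=1000, seed=0):
    """Draw n cost values from the gamma distribution implied by gamma_params."""
    k, theta = gamma_params(base)
    rng = random.Random(seed)  # fixed seed for reproducibility
    return [rng.gammavariate(k, theta) for _ in range(n)]


base_cost = 25.0  # ¥, estimated cost of the validated FIT
k, theta = gamma_params(base_cost)
draws = draw_costs(base_cost)
print(f"k = {k:.3f}, theta = {theta:.3f}, sample mean = {sum(draws) / len(draws):.2f}")
```

Each of the 1,000 draws would then replace the corresponding base-case cost in one re-evaluation of the strategies, so that the share of draws in which a strategy is optimal can be counted.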

Ethics compliance
This study met the institution's guidelines for protection of human subjects concerning their safety and privacy.

Base case
After adjustment to the Shanghai population, the MISCAN-Colon model predicted that, without screening, lifetime CRC incidence and mortality would be 45 and 10 per 1,000 individuals, respectively (Table 3). Introducing screening reduced both CRC incidence and mortality in all three screening strategies. Screening with the Shanghai FIT or the validated FIT reduced CRC incidence from 45 to 43 cases per 1,000 individuals (4.4% decrease), and screening with the Shanghai FIT + RA reduced it to 42 cases (6.7% decrease). All screening strategies reduced CRC mortality by 10.0% (to 9 deaths per 1,000 individuals).
The number of screening episodes used in the three different strategies was comparable (ranging from 2,142 to 2,150). As would be expected, colonoscopy demand increased with the introduction of screening. The validated FIT had the lowest colonoscopy demand (122 colonoscopies per 1,000 individuals), followed by the Shanghai FIT (171 colonoscopies) and the Shanghai FIT + RA (197 colonoscopies). The number of false-positive tests was substantially lower for the validated FIT (14 tests) compared to the Shanghai FIT (56 tests) and the Shanghai FIT + RA (70 tests).
The validated FIT screening strategy had the largest LYG (6.97 per 1,000 individuals), followed by Shanghai FIT + RA (6.62) and Shanghai FIT (6.19) (Fig. 1, Table 3). Without screening the cost of diagnosing and treating CRC was ¥1,080,042 per 1,000 individuals. Screening increased costs by ¥49,797 to ¥70,436 (4.7 to 6.6%). The current screening program (Shanghai FIT + RA) cost ¥146,1768, an increase of ¥66,134 (6.2%). At the WTP threshold (¥193,931/LYG), the most efficient screening strategy was the validated FIT with an ICER of ¥26,461 per LYG. The strategy using the Shanghai FIT had a lower ICER (¥8,045 per LYG (Table 3)), but was less effective than the validated FIT. The strategy using the Shanghai FIT + RA was less costly and less effective than the validated FIT, but its incremental cost per LYG relative to the Shanghai FIT exceeded the ICER of the validated FIT; it was therefore extendedly dominated.

Table 1 Positivity and detection rates obtained by estimation and provided by Pudong Centre for Disease Control for the first three years of screening (2013-2015)
Abbreviations: CRC, colorectal cancer; FIT, faecal immunochemical test; RA, risk assessment; CI, confidence interval.
a The observed positivity rate is determined as the total number of positive tests divided by the total number of participants using the specific screen test. In case of the Shanghai FIT + RA, this screen test was considered positive when the Shanghai FIT and/or the risk assessment were positive.
b The estimated positivity and detection rates are obtained by the Nelder-Mead Simplex method (Nelder and Mead, 1965) as explained in the methods section.
c The observed detection rates were corrected for lack of adherence with colonoscopy to allow unbiased comparison with estimated detection rates. This is established by multiplying the observed positivity rate with the positive predictive value.

Sensitivity analysis
Our results were robust to changes in the discount rate, producing similar results without discounting (Supplementary Results Table S1) and under a discount rate of 5% (Supplementary Results Table S2). In addition, they were robust to the reduction in the cost of the validated FIT (Supplementary Results Table S3e). Under all these assumptions, the validated FIT remained the most efficient strategy. However, when the cost of the validated FIT was increased by 200% (Supplementary Table S3e), the ICER of the validated FIT exceeded the WTP threshold. In addition, when participation in screening and diagnostic follow-up was increased, the ICER of the Shanghai FIT + RA was the highest below the WTP threshold (Supplementary Table S3g). Therefore, in these two scenarios, screening with the Shanghai FIT + RA was the most cost-effective strategy.

Probabilistic sensitivity analysis
The probabilistic sensitivity analysis suggests that at the WTP threshold of ¥193,931, the optimal screening test (the validated FIT) is the cost-effective strategy in more than 75% of the 1,000 considered cost sets (Figure S4). The existing program was not cost-effective in any of the 1,000 values for the costs.

Table 2 footnotes:
b Specificity and sensitivity based on the positivity rates and detection rates of advanced neoplasia observed in the first screening round in Pudong, Shanghai. The data were provided by Pudong Centre for Disease Control. Sensitivity for adenomas smaller than 5 mm was assumed to be 0% for all tests.
c Different sensitivities are defined in the model as it simulates the development of colorectal cancer through the adenoma-carcinoma sequence. As each simulated person ages, one or more adenomas may develop and these adenomas can progress in size, increasing from small (<5 mm) to medium (6-9 mm) to large (greater than 10 mm). Some adenomas can develop into preclinical cancer, which may progress through cancer stages I to IV.
d Complications are conditional on polypectomy, and we assume that polypectomy is only performed if colonoscopy is positive. A complication is considered as an unplanned hospital admission within 30 days of a colonoscopy.
e Rate of perforation is based on data from Shanghai, China, 2014 (Shi et al., 2014).
f Costs are from a health system perspective and do not include patient time costs. All costs are presented in Chinese Renminbi Yuan (¥) and are indexed to 2019 prices (Tool, 2019).
g Costs for the two-sample quantitative FIT were provided by Pudong Centre for Disease Control and are based on the current reimbursement funding arrangement.
h Cost for a two-sample FIT used in the Shanghai screening program taken from Gong and colleagues (Gong et al., 2018).
i Cost of the risk assessment provided by Pudong Centre for Disease Control.
j These costs are provided to encourage those with a positive screening test to attend diagnostic colonoscopy, as well as to support other activities related to colonoscopy. Costs provided by Pudong Centre for Disease Control.
k Costs for colonoscopy are based on sources from China (Wang et al., 2012) and include the cost of bowel preparation (Huang et al., 2017).
l Costs for polypectomy are based on sources from China (Wang et al., 2012) and include costs of biochemical and pathological testing (Huang et al., 2017). This cost is in addition to the cost for colonoscopy.
m Costs for perforation during colonoscopy are based on sources from China (Wang et al., 2012).
n Costs of cancer treatment are taken from the Chinese setting. These costs were for the hospitalization stage and the first year after CRC has been diagnosed. They included medicine, surgical, examination, and treatment fees. They did not include surveillance (CT scan, blood test, endoscopy, etc.) after initial treatment of CRC (Shanghai, 2016).
o Co-payment made by patients when seeing a doctor and undergoing a colonoscopy (Shanghai, 2016).
p Ranges of 95% confidence intervals for the costs in the probabilistic sensitivity analysis were obtained by halving and doubling the base case values. Using these ranges, the shape parameter k and the scale parameter θ are calculated as input for the Gamma-distributions.

Discussion
This study investigated the effectiveness and cost-effectiveness of three CRC screening strategies (Shanghai FIT, Shanghai FIT + RA, and a validated FIT), using the MISCAN-Colon model. Our results suggest that all strategies were almost equally effective at reducing CRC incidence and mortality and resulted in similar LYG. The strategy utilising the validated FIT had substantially lower harms; however, it cost more than the alternatives. At the WTP threshold of ¥193,931 Renminbi (equivalent to US$29,313) per LYG, triennial screening between ages 50-74 with the validated FIT at a cut-off of 20 µg Hb/g faeces was the most cost-effective strategy. Although our results demonstrate that the current screening program is not the most cost-effective, it is encouraging that even with low levels of participation screening reduces CRC incidence and mortality. In order to improve the cost-effectiveness and reduce the burden of screening, the program could shift to a validated FIT.
The effectiveness of screening is impacted by the characteristics of the screening test(s). Although in theory the Shanghai FIT and the validated FIT have the same cut-off value, the two tests have different characteristics. The validated FIT has higher specificity and acceptable sensitivity, resulting in fewer false positives and lower colonoscopy demand. The estimated complications and CRC mortality were comparable with the other screening strategies, and CRC incidence was the same as with Shanghai FIT, and only slightly higher than with Shanghai FIT + RA.
Better specificity is an important explanation for the better cost-effectiveness of the validated FIT. The Shanghai FIT has low estimated specificity, both without and with RA (87.4% and 79.3%, respectively). This is consistent with the reported low specificity of qualitative FITs (75.1%) in China. The low specificity, and therefore high false-positive rate, of the Shanghai FIT may be a consequence of the unstandardized cut-off of the test. This, coupled with visual interpretation of the results (rather than a numerical result), which introduces inter-observer variation, influenced the number of referrals for colonoscopy. This supports our suggestion to move to the validated FIT.
Although the cost-effectiveness of CRC screening in high incidence countries is widely accepted, less is known about the cost-effectiveness in lower incidence countries like China. Previous research has shown a wide range of screening strategies to be cost-effective, with some being cost saving, in Western countries. (Ran et al., 2019) Modelling studies using a Markov model found CRC screening to be cost-effective in the Chinese population (Cai et al., 2016;Wang et al., 2020;Liang et al., 2019;Huang et al., 2014), which is consistent with our findings. The study demonstrating the cost-effectiveness of a stool-based test combined with an RA over a stool-based test alone is hard to compare to this study, because it investigated different screening strategies (especially for surveillance) and its assumptions about the population and test characteristics were different. (Huang et al., 2014) To the best of our knowledge, there have been no cost-effectiveness analyses investigating the use of a quantitative FIT in China until now.

Table 3 notes:
Note: Screening in Pudong occurs every three years between ages 50 to 74. Bold highlights the most efficient screening strategy under the willingness-to-pay threshold.
Abbreviations: CRC, colorectal cancer; FIT, faecal immunochemical test; RA, risk assessment; ICER, incremental cost-effectiveness ratio.
a Shanghai FIT + RA and the validated two-sample FIT were both considered to be one single test episode in the simulation.
b Due to rounding and the low probability of an adverse event (complication) during a colonoscopy (0.012%), the number of complications per 1,000 individuals simulated did not change between different screening strategies.
c Life years gained compared to a situation without screening.
Studies outside of China found that in population-based screening, the likely total cost of a qualitative FIT was greater than that of an automated high-volume quantitative FIT, since quantitative FITs provide the tools to control colonoscopy referrals and for most health systems the cost of the screening test is smaller than that of colonoscopy. (Allison et al., 2014) Our results were sensitive to improved participation in screening and diagnostic follow-up and to an increase in the cost of the validated FIT. With increased participation in screening and diagnostic colonoscopy, the Shanghai FIT + RA became the most cost-effective strategy. However, under this strategy the colonoscopy demand became very high (635 per 1,000 simulated individuals), almost three times as high as with the validated FIT. Considering that colonoscopy capacity is limited in many regions in China, it is very likely that the Chinese health system does not have the capacity to satisfy such high-intensity colonoscopy demand. Colonoscopy demand is a bottleneck in CRC screening. When the demand exceeds the available capacity, the implementation of the screening program will be obstructed and the effectiveness of screening will be reduced. (Arrospide et al., 2018) Therefore, any proposal for CRC screening that increased colonoscopy demand would not be feasible with the currently available resources in China. In a complementary article (Cenin et al., 2022), annual testing using the validated one-sample FIT, with a cut-off of 10 µg Hb/g from ages 45-80 years, was found to be the most cost-effective, while still requiring fewer colonoscopies than the existing triennial Shanghai screening strategies. If Shanghai were to move to a validated FIT, the large purchasing power would likely reduce the cost from our estimated ¥25. Our sensitivity analysis showed the validated FIT to be highly cost-effective under a 50% reduction in this cost.
The CRC screening participation rates in China are considerably lower than in other countries. (Yang et al., 2020) Compared with participants who had at least a positive Shanghai FIT, participants with only a positive RA were the least likely to undergo colonoscopy. (Gong et al., 2018;Cheng et al., 2018;Wu et al., 2019) This suggests low public confidence in the RA, probably due to its high false-positive rate. (Gong et al., 2018) Although previous studies showed that RA improved sensitivity and helped to identify individuals with non-bleeding lesions (Meng et al., 2009), more research is needed to develop and validate optimal RA tools. In Japan, a risk score including sex, age, CRC family history, BMI, and smoking history was shown to be more effective and cost-effective than colonoscopy and FIT. (Sekiguchi et al., 2020) In Hong Kong, a similar risk scoring system which included self-reported diabetes has been effectively used to prioritize high-risk individuals for colonoscopy and polypectomy when there is limited colonoscopy capacity. (Wong et al., 2014)
This study has four limitations. First, the impact of the existing Shanghai screening program may have been overestimated. We used data from the first three years of screening (2013-2015) under the assumption that, given the triennial screening interval, these data would represent the prevalent screening round. As prevalent screening rounds generally have a higher yield than subsequent rounds, the calibrated test characteristics are likely to be overestimated. However, even with these overestimates our results demonstrate that the Shanghai FIT is inferior to the validated FIT. In addition, as some Chinese surveillance guidelines recommend that individuals return to screening one year after a negative diagnostic colonoscopy, there may be some contamination in the data. However, as so few individuals participated in diagnostic colonoscopy (6.37% in Pudong), we believe that this is unlikely to affect our results. Second, the Shanghai FIT + RA was modelled in MISCAN-Colon as one single screening test episode. This approach ensures that potential correlation between both tests is appropriately taken into account. However, it is also likely that the outcomes of such questionnaires are correlated over screening rounds, which was not accounted for. Third, as there are conflicting surveillance guidelines for CRC screening in China, we modelled surveillance according to the ESGE Guidelines. (Hassan et al., 2013) When we modelled Chinese surveillance guidelines our results did not change. Finally, although the model is adjusted to China, the underlying model structure is based on the European population. We assumed the same progression of adenoma to cancer in the model for China as in the Dutch model. As there may be differences between these populations, long-term impacts of screening may not be generalizable.
Based on this modelling study, CRC screening in Shanghai is highly cost-effective in reducing the CRC burden, and shifting to a validated FIT could improve the cost-effectiveness. In addition, with the validated FIT the quantitative value of faecal haemoglobin concentration is known, and it is therefore possible to adjust the positivity threshold to match the desired positivity rate and number of colonoscopy referrals in different social contexts. (Symonds et al., 2018)
In conclusion, our findings show that the Shanghai CRC screening program is cost-effective. This study supports the continuation of the program and highlights that switching to the validated FIT could increase its efficiency.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.