Association Between Smoking and Molecular Subtypes of Colorectal Cancer

Abstract
Background: Smoking is associated with colorectal cancer (CRC) risk. Previous studies suggested that this association may be restricted to certain molecular subtypes of CRC, but a large-scale, comprehensive analysis is lacking.
Methods: A total of 9789 CRC cases and 11 231 controls of European ancestry from 11 observational studies were included. We harmonized smoking variables across studies and derived sex- and study-specific quartiles of pack-years of smoking for analysis. Four somatic colorectal tumor markers (BRAF mutation, KRAS mutation, CpG island methylator phenotype [CIMP], and microsatellite instability [MSI] status) were assessed individually and in combination. Multinomial logistic regression was used to assess the association between smoking and risk of CRC subtypes defined by molecular characteristics, adjusting for age, sex, and study. All statistical tests were 2-sided, and P values were evaluated against Bonferroni-corrected thresholds.
Results: Heavier smoking was associated with higher risk of CRC overall and of CRC stratified by individual markers (P trend < .001). The associations differed statistically significantly between all molecular subtypes, with the most statistically significant differences for CIMP and BRAF. Compared with never-smokers, smokers in the fourth quartile of pack-years had a 90% higher risk of CIMP-positive CRC (odds ratio = 1.90, 95% confidence interval = 1.60 to 2.26) but only a 35% higher risk of CIMP-negative CRC (odds ratio = 1.35, 95% confidence interval = 1.22 to 1.49; P difference = 2.1 × 10⁻⁶). The association was also stronger for tumors that were CIMP positive, MSI high, or KRAS wild type in combination (P difference < .001).
Conclusion: Smoking was associated with differential risk of CRC subtypes defined by molecular characteristics. Heavier smokers had a particularly higher risk of CRC subtypes that were both CIMP positive and MSI high, suggesting that smoking may be involved in the development of colorectal tumors via the serrated pathway.

Colorectal cancer (CRC) is one of the most common and fatal cancers (1). In the United States, there were an estimated 145 600 new cases and 51 020 deaths in 2019 (2). CRC is also a disease with considerable genetic and molecular heterogeneity (3). Molecular classification of CRC using clinically informative genetic and epigenetic features has potential prognostic (4) and treatment implications (5). Mutations in the KRAS gene, which have been shown to promote the growth of colorectal adenomas, occur in 30%-40% of sporadic CRC (6). Microsatellite instability (MSI), characterized by frequent alterations in tandemly repeated DNA sequences, has been reported to occur in 10%-15% of CRC and to be associated with a favorable prognosis (7,8). Many MSI-high CRCs also exhibit the CpG island methylator phenotype (CIMP) or the BRAF c.1799T>A (p.V600E) mutation (9).
Cigarette smoking has been established as a risk factor for CRC (10,11). A meta-analysis showed that current smokers had a 17% higher risk of developing CRC and a 40% higher risk of CRC mortality than never-smokers (11). Recent evidence suggests that the association between smoking (including current smoking status, cumulative pack-years, and duration of smoking or of cessation) and CRC risk may differ by molecular characteristics. Several studies have found that smoking is more strongly associated with MSI-high, CIMP-positive, or BRAF-mutated colorectal tumors than with MSI-low or microsatellite stable, CIMP-negative, or BRAF-wild-type CRC (12-19). In addition, heavier smoking was found to be associated with an increased risk of KRAS-wild-type CRC but not of KRAS-mutated tumors (20,21). However, 2 studies found no statistically significant difference in the association between smoking and CRC risk by KRAS mutation status (15,22). Recent meta-analyses showed a statistically significant positive correlation between ever-smoking and BRAF mutation, MSI-high status, and CIMP positivity in CRC (23,24).
Most previous studies, however, assessed CRC molecular subtypes only by individual marker status. In this study, we aimed to comprehensively assess the association between smoking and CRC risk both by individual markers (MSI status, CIMP status, KRAS and BRAF mutations) and by combinations of all 4 markers, using pooled individual-level data from a large consortium.

Study Participants
This study included 9789 patients diagnosed with CRC and 11 231 controls with available tumor marker and smoking data from 11 observational studies within the Genetics and Epidemiology of Colorectal Cancer Consortium and the Colon Cancer Family Registry. The participating studies have been described previously (25,26) and are summarized in Table 1. All participants provided written informed consent, and each study was approved by the relevant research ethics committee or institutional review board. CRC cases were confirmed by medical record, pathology report, or death certificate according to each study's protocol. Controls were individuals without a history of CRC at the time of case selection and were selected according to study-specific matching criteria. Participants of non-European ancestry were excluded from the analysis because of small sample sizes.

Assessment and Harmonization of Tumor Marker Data
Details of data collection and harmonization of tumor marker data have been published previously (25-27). Briefly, testing for MSI, BRAF mutations, KRAS mutations, and CIMP status had been conducted by each study according to its own protocol. To harmonize markers across all studies, we created 2 categories for each marker for downstream analyses. Where studies classified MSI status as MSI high (MSI-H), MSI low (MSI-L), or microsatellite stable (MSS), we collapsed MSI-L and MSS into a single MSI-L/MSS category. Where studies classified CIMP status as CIMP high, CIMP low, or CIMP negative, we collapsed CIMP-low and CIMP-high into a single CIMP-positive (CIMP+) category. For BRAF and KRAS, any mutation identified by a study was classified as mutated.
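The two-category collapsing rules described above can be expressed as simple mapping functions. This is a minimal sketch, assuming each study reports MSI and CIMP status as text labels and mutation status as a yes/no flag; the label spellings and the function names (`harmonize_msi`, `harmonize_cimp`, `harmonize_mutation`) are hypothetical and do not come from the consortium's data dictionary.

```python
# Hypothetical sketch of the two-category marker harmonization described above.
# Input label spellings are assumptions, not the consortium's actual codes.

def harmonize_msi(label: str) -> str:
    """Collapse MSI-L and MSS into MSI-L/MSS; keep MSI-H separate."""
    label = label.strip().upper()
    if label == "MSI-H":
        return "MSI-H"
    if label in {"MSI-L", "MSS", "MSI-L/MSS"}:
        return "MSI-L/MSS"
    raise ValueError(f"unrecognized MSI label: {label!r}")


def harmonize_cimp(label: str) -> str:
    """Collapse CIMP-high and CIMP-low into CIMP+; keep CIMP-negative as CIMP-."""
    label = label.strip().upper()
    if label in {"CIMP-HIGH", "CIMP-LOW", "CIMP+"}:
        return "CIMP+"
    if label in {"CIMP-NEGATIVE", "CIMP-"}:
        return "CIMP-"
    raise ValueError(f"unrecognized CIMP label: {label!r}")


def harmonize_mutation(any_mutation_reported: bool) -> str:
    """Any BRAF or KRAS mutation identified by a study counts as mutated."""
    return "mut" if any_mutation_reported else "wt"


# Example: a tumor reported as CIMP-low, MSS, BRAF-mutated, and KRAS-wild-type.
print(harmonize_cimp("CIMP-low"), harmonize_msi("MSS"),
      harmonize_mutation(True), harmonize_mutation(False))
# prints: CIMP+ MSI-L/MSS mut wt
```

Raising an error on an unrecognized label, rather than silently assigning a default category, is one way to surface study-specific codes that such a mapping does not yet cover.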
Additionally, we combined markers to create subtype classifications: types 1-5 were defined according to the Jass classification (28), and types 6-16 were numbered consecutively by MSI, CIMP, BRAF, and KRAS status (summarized in Figure 1). Only cases with all 4 markers assessed were included in the combined molecular classifications and the corresponding analyses.
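Because each of the 4 harmonized markers has 2 categories, there are 2⁴ = 16 possible combined subtypes. The sketch below enumerates them and keeps only cases with all markers assessed. It keys each subtype by its marker tuple rather than by the type numbers 1-16, because the mapping from marker combinations to type numbers is given in Figure 1 and is not reproduced here; the data layout (one dict per case, `None` for an unassessed marker) is an assumption for illustration.

```python
from itertools import product

# Harmonized two-category markers, in a fixed order for combined subtypes.
MARKERS = {
    "MSI": ("MSI-H", "MSI-L/MSS"),
    "CIMP": ("CIMP+", "CIMP-"),
    "BRAF": ("mut", "wt"),
    "KRAS": ("mut", "wt"),
}


def all_combined_subtypes():
    """Every combination of the 4 binary markers: 2**4 = 16 tuples."""
    return list(product(*MARKERS.values()))


def combined_subtype(case):
    """Return the marker tuple for a case, or None if any marker is missing."""
    values = tuple(case.get(name) for name in MARKERS)
    return None if None in values else values


def complete_cases(cases):
    """Keep only cases with all 4 markers assessed."""
    return [c for c in cases if combined_subtype(c) is not None]


cases = [
    {"MSI": "MSI-H", "CIMP": "CIMP+", "BRAF": "wt", "KRAS": "wt"},
    {"MSI": "MSI-L/MSS", "CIMP": None, "BRAF": "wt", "KRAS": "mut"},  # CIMP not assessed
]
print(len(all_combined_subtypes()))  # 16
print(len(complete_cases(cases)))    # 1
```

In this example, the first case has the marker profile that the Results describe as type 14 (MSI-H, CIMP+, BRAF-wt, and KRAS-wt), while the second is dropped from the combined analysis because its CIMP status is missing.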

Smoking and Other Exposure Data
Data collection and harmonization of epidemiologic data have been described elsewhere (29,30). Briefly, demographic and environmental risk factors were self-reported at in-person interviews or via structured self-administered questionnaires. Data were collected at study entry, blood draw, or 1 to 2 years prior to sample ascertainment. A multistep, iterative data harmonization procedure was applied, and multiple quality-control checks were performed, reconciling each study's unique protocols. Variables were combined into a single dataset with common definitions, standardized coding, and standardized permissible values.
Smoking status at baseline was categorized as never, former, or current smoking in each study. In addition, sex- and study-specific quartiles of smoking pack-years were created among ever-smokers. Never-smokers were used as the reference group in all analyses.
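Sex- and study-specific quartiles can be derived by computing cut points separately within each sex-by-study stratum of ever-smokers and assigning never-smokers to the reference category. The sketch below assumes the conventional definition of pack-years (cigarettes per day divided by 20, multiplied by years smoked), which the text does not spell out, and uses Python's `statistics.quantiles` with the inclusive method. The field names and the quantile method are illustrative assumptions; the participating studies may have handled ties and cut points differently.

```python
from bisect import bisect_left
from collections import defaultdict
from statistics import quantiles


def pack_years(cigarettes_per_day, years_smoked):
    """Conventional pack-years: packs (20 cigarettes) per day x years smoked."""
    return cigarettes_per_day / 20 * years_smoked


def assign_pack_year_quartiles(people):
    """Set 'py_quartile' to 0 for never-smokers (reference) and 1-4 for
    ever-smokers, using cut points computed within each (sex, study) stratum.
    Each stratum needs at least 2 ever-smokers for statistics.quantiles."""
    strata = defaultdict(list)
    for p in people:
        if p["ever_smoker"]:
            strata[(p["sex"], p["study"])].append(p["pack_years"])
    # Three cut points (25th, 50th, 75th percentiles) per stratum.
    cuts = {key: quantiles(vals, n=4, method="inclusive") for key, vals in strata.items()}
    for p in people:
        if not p["ever_smoker"]:
            p["py_quartile"] = 0
        else:
            # Values equal to a cut point fall into the lower quartile.
            stratum_cuts = cuts[(p["sex"], p["study"])]
            p["py_quartile"] = bisect_left(stratum_cuts, p["pack_years"]) + 1
    return people


# Hypothetical stratum: 8 female ever-smokers in study "A" plus 1 never-smoker.
people = [{"sex": "F", "study": "A", "ever_smoker": True, "pack_years": float(x)}
          for x in range(1, 9)]
people.append({"sex": "F", "study": "A", "ever_smoker": False, "pack_years": 0.0})
assign_pack_year_quartiles(people)
print([p["py_quartile"] for p in people])  # [1, 1, 2, 2, 3, 3, 4, 4, 0]
```

Computing cut points within each stratum means that, for example, the fourth quartile in one study may correspond to a different absolute number of pack-years than in another study.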
Demographic variables included age and sex. Age was defined as age at diagnosis for cases and age at selection for controls. Other lifestyle covariates included body mass index (BMI; weight [kg]/height [m]²), regular use of nonsteroidal anti-inflammatory drugs (NSAIDs), history of colorectal screening, alcohol intake, and physical activity.

Statistical Analyses
All statistical tests were 2-sided. The distributions of individual tumor markers were summarized among CRC cases, and the Pearson correlation test was used to assess correlations among markers. We used multinomial logistic regression models to estimate odds ratios (ORs) and 95% confidence intervals (CIs) for the association of smoking with the risk of CRC subtypes. To account for multiple testing in the case-control analysis, we used a Bonferroni-corrected P value threshold of .05/16 = 3.1 × 10⁻³ (4 markers × 2 statuses × 2 smoking comparisons) for categorical smoking status and .05/8 = .006 (4 markers × 2 statuses × 1 linear trend comparison) for smoking pack-years. Among cases only, we used logistic regression models to assess differences in the associations between smoking and mutated or positive subtypes (BRAF mutated [mut], KRAS-mut, MSI-H, or CIMP+) compared with wild-type or negative subtypes (BRAF wild type [wt], KRAS-wt, MSI-L/MSS, or CIMP negative [CIMP-], respectively) (Bonferroni-corrected P difference threshold: .05/8 = 6.3 × 10⁻³ for categorical smoking status and .05/4 = .013 for smoking pack-years). Age, sex, and study were included as covariates in the models. On the basis of a priori knowledge of CRC risk factors associated with smoking, we additionally adjusted simultaneously for BMI, NSAID use, screening history, alcohol intake, and physical activity in sensitivity analyses.
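To illustrate how the case-control and case-only contrasts relate, the sketch below computes crude (unadjusted) odds ratios with Woolf (log-scale) 95% CIs from hypothetical counts, together with the Bonferroni thresholds stated above. It does not fit the adjusted multinomial models used in this study. It only shows that, without covariates, the case-only OR comparing CIMP+ with CIMP- cases equals the ratio of the two subtype-specific case-control ORs against a shared control group. All counts are invented for illustration.

```python
import math


def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls, z=1.96):
    """Crude odds ratio with a Woolf (log-scale) 95% confidence interval."""
    or_ = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    se_log = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                       + 1 / exposed_controls + 1 / unexposed_controls)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# Hypothetical counts as (current smokers, never-smokers); NOT study data.
controls = (1000, 4000)
cimp_pos = (300, 600)
cimp_neg = (900, 3000)

# Case-control: each subtype is compared with the same control group.
or_pos, _, _ = odds_ratio(*cimp_pos, *controls)   # 2.0
or_neg, _, _ = odds_ratio(*cimp_neg, *controls)   # 1.2

# Case-only: CIMP+ cases are contrasted directly with CIMP- cases.
or_case_only, lo, hi = odds_ratio(*cimp_pos, *cimp_neg)

print(round(or_case_only, 3), round(or_pos / or_neg, 3))  # 1.667 1.667
print(bonferroni_threshold(0.05, 16), bonferroni_threshold(0.05, 8))
```

The equality between the case-only OR and the ratio of case-control ORs holds exactly for crude estimates like these; once age, sex, and study are adjusted for in separate models, the two quantities need not be identical.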
In the analysis of combined marker status, CRC subtypes with at least 50 cases were assessed for their association with smoking. As for individual markers, we used multinomial logistic regression models adjusting for age, sex, and study. The subtype with MSI-L/MSS, CIMP-, BRAF-wt, and KRAS-wt status was used as the reference group in the case-only analysis (Bonferroni-corrected P difference threshold: .05/10 = 5.0 × 10⁻³).
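The selection rule for the combined-marker analysis and its Bonferroni threshold can be sketched as follows. The subtype labels and counts are invented for illustration; only the 50-case minimum and the .05/10 threshold come from the text. Dividing alpha by the number of eligible subtypes reproduces .05/10 when 10 subtypes qualify, but the text does not state how the denominator was counted, so that detail is an assumption.

```python
from collections import Counter

MIN_CASES = 50  # subtypes with fewer cases were not analyzed


def eligible_subtypes(subtype_of_each_case, min_cases=MIN_CASES):
    """Count cases per combined subtype and keep those with >= min_cases."""
    counts = Counter(subtype_of_each_case)
    return {subtype: n for subtype, n in counts.items() if n >= min_cases}


# Hypothetical case list: 10 subtypes with 60 cases each and 1 rarer subtype.
labels = [f"subtype {c}" for c in "ABCDEFGHIJ" for _ in range(60)] + ["subtype K"] * 49
kept = eligible_subtypes(labels)
threshold = 0.05 / len(kept)  # .05/10 = 5.0e-3 when 10 subtypes qualify
print(len(kept), "subtype K" in kept)  # 10 False
```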
Exploratory analyses of the smoking-CRC association stratified by sex, tumor location, and study design were also conducted for both individual and combined markers. All analyses were performed using R version 3.5.1.

Association Between Smoking and Individual Marker Subtypes
In the case-control analysis, smoking was associated with a higher risk of CRC overall and of all individual marker subtypes except KRAS-mut tumors (P < 3.1 × 10⁻³; Table 2). Associations were stronger for current smokers. For instance, compared with never-smoking, current smoking was associated with an approximately 2-fold risk of MSI-H CRC (OR = 2.01, 95% CI = 1.68 to 2.40), whereas former smoking was associated with a smaller increase in risk (OR = 1.27, 95% CI = 1.11 to 1.46).
In case-only analyses, after Bonferroni correction, the association between smoking status and CRC risk was statistically significantly stronger for BRAF-mut, KRAS-wt, MSI-H, and CIMP+ CRC subtypes among current smokers but not among former smokers (P difference < 6.3 × 10⁻³; Table 2). We further assessed the dose-response relationship between smoking and CRC subtypes. Compared with never-smoking, higher pack-years of smoking were associated with a higher risk of CRC for all subtypes in the case-control analysis (P < 1.6 × 10⁻³; P trend < .001; Table 3). In the case-only analysis, the association between pack-years and molecular subtypes was statistically significantly stronger for BRAF-mut, CIMP+, and MSI-H subtypes than for wild-type or negative CRC cases after Bonferroni correction (P difference < 6.3 × 10⁻³; Table 3). The largest differences in case-control risk estimates were seen for BRAF-mut and CIMP+ CRC. Participants in the highest quartile of smoking pack-years had nearly a 2-fold risk of BRAF-mut (OR = 1.92, 95% CI = 1.58 to 2.33) and CIMP+ CRC (OR = 1.90, 95% CI = 1.60 to 2.26) compared with never-smokers. In comparison, among the heaviest smokers the risk of BRAF-wt and CIMP- CRC was increased by only 36.9% (P difference = 2.7 × 10⁻⁶) and 34.8% (P difference = 2.4 × 10⁻⁶), respectively.
Figure 1. Associations between former and current smoking and risk of CRC subtypes defined by combined marker status. Two-sided Wald tests were used to calculate the P values from the case-control analysis (N controls = 11 231) and the case-only analysis (P difference). A Bonferroni-corrected P value threshold of 5.0 × 10⁻³ was used for both case-control and case-only analyses. Error bars represent the 95% confidence intervals (CIs). CIMP = CpG island methylator phenotype; CRC = colorectal cancer.
After Bonferroni correction, there was no statistically significant difference in the associations of smoking pack-years with KRAS-mut vs KRAS-wt CRC (P difference = 9.6 × 10⁻³). Sensitivity analyses that additionally adjusted for BMI, family history of CRC, CRC screening history, NSAID use, or alcohol intake did not meaningfully change our conclusions (data not shown).

Association Between Smoking and Combined Marker Subtypes
The overall distribution of CRC cases by smoking status and combined marker subtype is summarized in Supplementary Figure 1 (available online). Of the 16 possible combined CRC subtypes, 10 had 50 or more cases and were included in the analysis. Among them, former smoking was statistically significantly associated with a higher risk of 4 CRC subtypes after Bonferroni correction (types 4, 3, 14, and 1; Figure 1, A) compared with never-smoking. Current smoking was associated with a higher risk of CRC for 6 subtypes, but only 3 remained statistically significant after Bonferroni correction (types 5, 14, and 1; Figure 1, B). The strongest association for both former and current smoking was observed for type 14 (MSI-H, CIMP+, BRAF-wt, and KRAS-wt): former and current smoking were associated with 87% and 264% higher risk of type 14 CRC, respectively, compared with never-smoking. Using type 4 (all markers wild type/negative) as the reference in case-only analyses, we observed no statistically significant differences between CRC subtypes among former smokers. However, after Bonferroni correction, cases who currently smoked were statistically significantly more likely than never-smokers to have type 5 (MSI-H only; P difference = 4.8 × 10⁻³) or type 14 CRC (P difference = 7.3 × 10⁻⁸) (Figure 1, B).
Higher smoking pack-years were also statistically significantly associated with a higher risk of 4 CRC subtypes after Bonferroni correction (types 4, 3, 14, and 1; Figure 2). Similarly, the association between smoking pack-years and CRC risk was strongest for type 14 (OR per quartile = 1.37, 95% CI = 1.22 to 1.53). Compared with type 4 in case-only analyses, higher smoking pack-years were associated only with a higher risk of type 14 (P difference = 1.8 × 10⁻⁵) and type 1 CRC (MSI-H, CIMP+, BRAF-mut, and KRAS-wt; P difference = 2.5 × 10⁻⁴).

Exploratory Stratified Analysis
When stratified by tumor location, BRAF-mut, CIMP+, and MSI-H tumors were more frequent in the proximal colon than in the distal colon or rectum (Supplementary Table 2, available online). In the proximal colon, current smoking and higher pack-years were associated with a higher risk of BRAF-mut, KRAS-wt, CIMP+, and MSI-H tumors (Supplementary Tables 3 and 4, available online) and a higher risk of type 14 (Supplementary Figures 2 and 3, available online). Although sample sizes were limited, similar trends in the smoking-CRC association were observed for distal colon and rectal cancer. Current smoking was associated with a higher risk of type 8 (CIMP+ only) distal colon cancer, but this association did not remain statistically significant after Bonferroni correction (Supplementary Figure 2, available online). The interaction between smoking and tumor location was not statistically significant.
When stratified by sex, similar trends were observed, although the risk estimates varied slightly between sexes. For instance, current smoking was associated with a 90% and a 66% increase in risk of CIMP+ tumors among females and males, respectively (Supplementary Tables 5 and 6, available online). In addition, the dose-response association for BRAF-mut and CIMP+ CRC was stronger among females. In the combined marker analysis, current smoking was most strongly associated with type 14 CRC in both sexes (Supplementary Figures 4 and 5, available online). The interaction between smoking and sex was not statistically significant. Further stratification by sex within proximal colon tumors did not suggest statistically significant differences (data not shown). Similar trends were also observed when stratified by study design (Supplementary Tables 7 and 8 and Supplementary Figures 6 and 7, available online).

Discussion
In this large study, we found that smoking was associated with a higher risk of all molecular CRC subtypes and that the association was statistically significantly stronger for BRAF-mut, MSI-H, or CIMP+ CRC. We also found that smoking had a statistically significantly stronger association with CRC subtypes that were both MSI-H and CIMP+. Our results are consistent with previous evidence that smoking is associated with higher risk of CRC subgroups classified by individual marker status, including similar findings from 1 of the participating studies (19). In a previous study, current smoking was associated with an almost 2-fold higher risk of MSI-H, CIMP+, or BRAF-mut CRC compared with never-smoking (12). A study in 2 prospective cohorts found that a longer cessation period was associated with MSI-H and CIMP+ CRC, but not with MSS or CIMP- CRC, compared with current smoking (13). In addition, a longer duration of smoking was found to be associated with an increased risk of MSI-H CRC (14). Several cohort and case-control studies also found that higher smoking pack-years were associated with higher risks of MSI-H, CIMP+, or BRAF-mut CRC than of wild-type or negative CRC subtypes (12-14,17). In a population-based case-control study, current cigarette smoking and higher pack-years were statistically significantly associated with a higher risk of MSI-H than of MSS colon tumors (16). Similar to our results for KRAS mutation status, several observational studies found that smoking status and pack-years were associated with higher risk of KRAS-wt but not KRAS-mut tumors, although the differences were not statistically significant (6,20,21). In contrast, a case-cohort study (648 cases) in the Netherlands observed a non-statistically significant increase in KRAS-wt CRC risk among former smokers but not among current smokers (20).
Individual markers are not independent of each other: CIMP+ CRC tumors tend to have a high frequency of MSI and BRAF mutation (9,31-34). However, few studies have assessed combined subtypes of CRC. A prospective cohort study found that smoking 20 or more cigarettes per day was associated with higher risks of MSI-L/MSS and CIMP+ CRC, regardless of BRAF mutation status, but found no statistically significant association with CIMP- tumors (17). Another analysis in 2 prospective cohorts found that smoking 40 or more pack-years was associated with a higher risk of CIMP+ and MSI-H CRC compared with never-smoking (13). Consistent with these findings, we found that higher smoking pack-years were statistically significantly associated with a higher risk of CIMP+ and MSI-H CRC, regardless of BRAF mutation status.
Tobacco smoking is a well-established cause of CRC (35). Meta-analyses of epidemiological studies have consistently found a statistically significant association, and a dose-response relationship, between smoking and CRC risk (10,11,36). However, knowledge of the mechanisms underlying the association of smoking with CRC molecular subtypes is limited. Tobacco smoke contains a variety of toxic chemicals (37), many of which can induce DNA damage (38). Tobacco exposure has also been associated with CIMP in other cancer types, including lung (39,40), bladder (41), and head and neck cancer (42). Therefore, it is biologically plausible that smoking promotes colorectal tumor growth and progression through epigenetic alterations. In addition, smoking-derived carcinogens are metabolized by phase I and phase II enzymes, such as those of the cytochrome P450 (CYP) family, which can generate reactive intermediates that damage DNA and cause mutations in genes such as KRAS and BRAF (43).
Furthermore, CIMP+ and MSI-H tumors are more likely to arise through the serrated polyp pathway than through the traditional adenoma-carcinoma pathway. An estimated 10%-20% of CRCs arise via the serrated polyp-carcinoma pathway (44), and DNA methylation is key to the development of this type of cancer (45). The CIMP+ phenotype is frequently observed in precursor serrated lesions and colorectal polyps, with reported frequencies of 40% to 80% (46,47). The MSI-H phenotype has also been observed in 20%-36% of serrated adenomas (48,49). Consistent with previous evidence, our exploratory analysis showed that BRAF-mut, CIMP+, and MSI-H tumors were preferentially located in the proximal colon. However, we observed similar associations, with small variations, when stratified by location, suggesting that our findings cannot be explained by tumor location. Serrated adenocarcinoma has also been reported to have less favorable survival than traditional adenocarcinoma (50), which could be partially because of the interaction between smoking and the enrichment of BRAF mutations and CIMP positivity. Therefore, a better understanding of the risk factors for these molecular characteristics may provide insights into the trajectory of serrated carcinogenesis and into its preventive and therapeutic implications.
Several features of our study provided the opportunity to systematically evaluate associations between smoking and molecular subgroups of CRC. First, this is the largest study to investigate these associations, with sufficient statistical power for the primary analyses. In addition, we combined CRC subtypes by all 4 tumor markers, providing a more comprehensive characterization of tumors. With a sufficient sample size, we were the first to extend the combined subtype analysis beyond the 5 previously defined subtypes (28), and we found a statistically significant association between smoking and new subtypes, suggesting a stronger impact of smoking on the serrated polyp-carcinoma pathway.
Furthermore, smoking variables and other CRC risk factors were assessed and harmonized among all participating studies, which allowed us to further adjust for potential confounders in sensitivity analysis.
There are also limitations. We did not investigate all 16 possible combinations of CRC subtypes because only 10 had at least 50 cases, so no conclusions can be drawn for the rarer subtypes. Both case-control and cohort studies were included, and smoking status may have been misclassified, especially in case-control studies because of recall bias. However, we observed similar trends in smoking-CRC associations when stratified by study design. We also found almost identical estimates using random-effects meta-analysis of study-specific estimates of smoking-CRC associations (data not shown). In exploratory stratified analyses, we found potential variation in the smoking-CRC association by sex or tumor location; these exploratory results warrant further investigation. Although we adjusted for several potential confounders in sensitivity analyses, we cannot rule out unmeasured confounding. Lastly, our study population was of European ancestry only; therefore, our conclusions may not be generalizable to other racial and ethnic groups.
In conclusion, we found that heavier smoking was associated with a higher risk of all subtypes of CRC, particularly those that may arise through the serrated polyp pathway. These findings may improve understanding of the tumorigenesis of serrated adenomas and provide insights into targeted CRC prevention and treatment.
Figure 2. Associations between smoking pack-years and risk of CRC subtypes defined by combined marker status. Two-sided Wald tests were used to calculate the P values from the case-control analysis (N controls = 10 199) and the case-only analysis (P difference). A Bonferroni-corrected P value threshold of .005 was used for both case-control and case-only analyses. Error bars represent the 95% confidence intervals (CIs).

Notes
Role of Funders: The funders had no role in the design of the study; the collection, analysis, and interpretation of the data; the writing of the manuscript; and the decision to submit the manuscript for publication.