Recent studies have demonstrated that the polygenic background, defined as a PRS based on disease-associated SNPs, considerably modifies the risk of several cancers in the general population, including CRC, both in terms of age at onset and cumulative lifetime risk [12, 23, 27, 35–37]. In line with this, the risk alleles of these SNPs have also been found to accumulate in unexplained familial and early-onset CRC cases [25, 38]. Whereas a low polygenic burden reduces CRC risk to one quarter on average, individuals with a high PRS (> 80%) double, and those with a very high PRS (99%) almost quadruple, their risk, thereby reaching a CRC risk almost comparable in magnitude to that of carriers of hereditary CRC PV with a low PRS [31]. In a previous study, Jia et al. found that CRC risk is significantly associated with the PRS: compared with individuals in the lowest PRS quintile, those in the highest quintile had a more than threefold risk during a 5.8-year follow-up period. Hazard ratios estimated with the middle quintile as the reference ranged from 0.56 to 1.71, with a threefold risk for individuals in the top 1% of the PRS and a 70% reduced CRC risk for those in the bottom 1% [37].
To extend these studies of how genetic susceptibility influences CRC prevalence, we used the substantially larger and therefore more robust dataset of the most recent UKBB cohort, incorporated family history (FH) as an additional factor for risk stratification, and included a single-gene analysis. We considered both the genetic component driven by rare high-penetrance PV associated with hereditary CRC and the common low-penetrance variants captured by the PRS.
Firstly, our results confirm that the polygenic background strongly modulates CRC risk in the general population. Compared with individuals with an average polygenic burden, those with a low (< 20%) or high (> 80%) PRS have estimated odds ratios for CRC of 0.5 and 2.1, respectively. The additional time-to-event analysis revealed corresponding cumulative risks of 6% and 22% by age 75. Hence, when the PRS is included in the risk calculation, around 20% of healthy individuals in the general population with no FH of CRC have a doubled CRC risk, similar to that of individuals with a first-degree relative affected by CRC [39]. These at-risk individuals, who would otherwise remain unrecognised, might need surveillance starting 10–15 years earlier than usually recommended [40]. Conversely, the approximately 20% of individuals with a low PRS and no FH might need less intensive surveillance than the general population owing to their considerably lower risk, and even those with a low PRS and a positive FH might not need more intensive surveillance than the general population.
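The percentile grouping described above (low < 20%, average 20–80%, high > 80%) and the comparison of each group against the average group can be illustrated with a minimal sketch. The function names are hypothetical and the counts in the usage example are invented toy numbers, not UKBB data. The study's ORs were derived from a model, so this crude 2×2 calculation only illustrates the arithmetic of an odds ratio relative to the middle PRS group.

```python
from bisect import bisect_left


def percentile_rank(scores, value):
    """Percentile (0-100) of `value` in `scores`: share of scores strictly below it."""
    ordered = sorted(scores)
    return 100.0 * bisect_left(ordered, value) / len(ordered)


def prs_group(pct):
    """Assign the PRS category used in the text: low (<20%), average (20-80%), high (>80%)."""
    if pct < 20:
        return "low"
    if pct > 80:
        return "high"
    return "average"


def odds_ratio(cases, controls, ref_cases, ref_controls):
    """Crude (unadjusted) odds ratio of one group versus a reference group."""
    return (cases / controls) / (ref_cases / ref_controls)


if __name__ == "__main__":
    # Toy (cases, controls) counts per PRS group; invented for illustration only.
    counts = {"low": (10, 990), "average": (60, 2940), "high": (40, 960)}
    ref_cases, ref_controls = counts["average"]
    for group in ("low", "high"):
        cases, controls = counts[group]
        print(group, round(odds_ratio(cases, controls, ref_cases, ref_controls), 2))
    # prints "low 0.49" and "high 2.04"
```

An adjusted analysis would replace the crude ratio with a regression model including covariates; the grouping step stays the same.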
It is well known that among patients with hereditary CRC syndromes, the age of onset and cumulative CRC incidence are very heterogeneous, even among PV carriers from the same family. The estimated gene-specific, individual CRC lifetime risks of LS patients with MLH1 or MSH2 PV can be lower than 10% in some patients but as high as 90–100% in a considerable fraction. In the past, the analysis of modifying effects of common CRC-associated variants in LS and other high-risk groups has been restricted to selected cohorts and small subsets of SNPs [41, 42]. A recent study using UKBB data demonstrated that the polygenic background also substantially influences CRC risk in LS, although, owing to small sample sizes, the ORs for CRC could only be predicted rather than estimated directly [31]. In the present work, ORs could be calculated directly from the model because more than three times as many UKBB individuals were included, with six times as many CRC cases and five times as many PV carriers.
Secondly, we showed that the PRS considerably modifies CRC risk not only in the general population but also in carriers of an MMR gene PV identified in the general population. We demonstrated for the first time that this also holds for APC PV. Depending on the PRS, the cumulative CRC lifetime incidence in PV carriers ranged from 40% to 74%; thus, the PRS can explain part of the interindividual variation in CRC risk among PV carriers.
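Cumulative incidences such as those above are commonly obtained as one minus the Kaplan–Meier survival estimate, with age as the time scale. The sketch below assumes that approach; the text does not state which estimator was used, and this simplified version ignores competing risks (e.g. death) and delayed entry at recruitment age (left truncation), both of which a real analysis of a cohort recruited in mid-life would need to handle. The function name and the ages in the example are hypothetical.

```python
def km_cumulative_incidence(ages, events, by_age):
    """Cumulative incidence 1 - S(by_age) from a simple Kaplan-Meier estimator.

    ages   -- age at CRC diagnosis (event) or at censoring, one per person
    events -- 1 if CRC was diagnosed at that age, 0 if censored
    by_age -- age at which the cumulative incidence is read off (e.g. 75)
    Assumption: no left truncation and no competing risks.
    """
    data = sorted(zip(ages, events))
    n = len(data)
    survival = 1.0
    i = 0
    while i < n and data[i][0] <= by_age:
        t = data[i][0]
        at_risk = n - i  # everyone whose event/censoring age is >= t
        deaths = 0
        j = i
        while j < n and data[j][0] == t:  # pool all records tied at age t
            deaths += data[j][1]
            j += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
        i = j
    return 1.0 - survival


if __name__ == "__main__":
    # Five invented individuals: diagnoses at 50, 60 and 70; censored at 60 and 80.
    ages = [50, 60, 60, 70, 80]
    events = [1, 0, 1, 1, 0]
    print(round(km_cumulative_incidence(ages, events, by_age=75), 3))  # → 0.7
```

Stratifying the input by PRS group (and by carrier status or FH) and calling the function once per stratum yields the kind of group-specific cumulative risks reported in the text.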
However, the single-gene analysis revealed heterogeneous effects across genes; the modifying role of the polygenic background should therefore be framed within the absolute risk attributable to individual genes. As expected, the PRS effect appears particularly relevant for less penetrant CRC risk genes such as PMS2, for which the OR ranges from 0.94 to 5.43 depending on the PRS (Supplementary Table S6). This is in line with findings for moderate-penetrance breast cancer risk genes such as CHEK2, PALB2 and ATM [43–45] and suggests that including the PRS in risk stratification may be particularly relevant for avoiding excessive surveillance in PV carriers of these genes.
In addition, our results provide evidence that including FH can further and independently improve risk stratification in both carriers and non-carriers. When both PRS and FH were included in the risk assessment, the cumulative CRC lifetime incidence ranged from 8% to 26% in non-carriers and from 30% to 98% in PV carriers, thus outperforming the consideration of a single risk factor. This suggests that familial clustering points to additional risk factors besides those captured by common low-risk SNPs (PRS) and rare PV [46, 47]. These might include common and rare structural genetic alterations such as copy number variants, rare non-coding variants, other intermediate- and low-impact risk variants not routinely included in PRS models, and non-genetic contributors such as environmental and lifestyle factors.
Only a few PRS studies have considered FH. In line with our results, Jenkins et al. found no correlation between SNP-based and FH-based risks, and improved risk stratification when both PRS and FH were considered [46]. In the analyses by Jia et al., the AUC derived from the PRS (0.609) was substantially higher than that derived from FH (0.523). Adding the PRS and FH of cancer in first-degree relatives improved the model's discriminatory performance (AUC 0.613) [17, 48]. Our AUC calculations point in the same direction, with a higher AUC (0.704) when all three risk factors (PRS, FH, and carrier status) are considered.
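The AUC values compared above summarise how well a risk score ranks cases above controls: the AUC equals the probability that a randomly chosen case has a higher score than a randomly chosen control, with ties counted as one half (the Mann–Whitney formulation). A minimal sketch of this definition follows; the function name and scores are hypothetical, it is not the software used in the cited studies, and it does not reproduce any reported AUC.

```python
from itertools import product


def auc(case_scores, control_scores):
    """AUC as P(case score > control score), counting ties as 0.5 (Mann-Whitney)."""
    wins = 0.0
    for case, control in product(case_scores, control_scores):
        if case > control:
            wins += 1.0
        elif case == control:
            wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    # Invented risk scores for 3 cases and 4 controls.
    cases = [0.9, 0.7, 0.4]
    controls = [0.8, 0.3, 0.2, 0.1]
    print(round(auc(cases, controls), 3))  # 10 of 12 pairs ranked correctly → 0.833
```

An AUC of 0.5 means the score ranks cases no better than chance, which is why the FH-only value of 0.523 reported by Jia et al. indicates weak discrimination. The pairwise loop is quadratic and meant only to make the definition explicit; for large cohorts a rank-based computation is preferable.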
Interestingly, and in apparent contrast to our results and those of others, a study of 826 European-descent carriers of PV in the DNA MMR genes MLH1, MSH2, MSH6, PMS2, and EPCAM (i.e. LS carriers) from the Colon Cancer Family Registry (CCFR) found no evidence of an association between the PRS and CRC risk, irrespective of sex or mutated gene, although an almost identical set of SNPs was used for PRS calculation [49]. One reason that might partly explain the differing risk estimates between studies of individuals from population-based repositories such as the UKBB and studies using curated clinical registries that include patients or families with suspected hereditary disease (e.g. the CCFR) is that cohorts recruited in different ways may differ in their risk composition (recruitment bias). For example, familial clustering of CRC might reflect several genetic and non-genetic risk factors, as outlined above, that are not captured by the PRS and may mask its effect.
In particular, the composition of cases and controls differs between the Jenkins et al. study on the one hand and the Fahed et al. study and the present study on the other. In the Jenkins et al. study, both cases (i.e., PV carriers with CRC) and controls (healthy PV carriers) were apparently derived from the same LS families, whereas the UKBB controls are PV carriers not apparently related to the PV cases. This is also reflected in the different proportions of cases among PV carriers (7.5% in the present study versus 61% in the Jenkins et al. study). Because the controls in the Jenkins et al. study are relatives of the cases, they are likely to share, to some extent, the polygenic background and other risk factors of their affected relatives, which may explain the absence of an observable PRS effect. The comparison between population-based and registry-based predictions indicates that study design and recruitment strategy may strongly influence results and conclusions. Consequently, the application of PRS in clinical practice should take the familial background and ascertainment of the patient into account.
Our analyses provide evidence that the PRS acts as a relevant risk modifier for CRC both in the general population and in population-based PV carriers in genes causing hereditary CRC. Our findings and those of others qualify the PRS as an important component of risk stratification and of the resulting risk-adapted surveillance strategies, in terms of both starting age and frequency. Given the risk distribution across PRS groups, the PRS can identify a considerable proportion of the general population whose CRC risk is high or low enough to warrant more or less intensive surveillance. Importantly, non-carriers with a high PRS form a much larger target group than PV carriers and thus might yield an even greater preventive effect from a healthcare perspective. A small group of non-carriers with a positive FH and a high PRS even has CRC risks of almost the same order of magnitude as LS carriers without additional risk factors, and thus may need similarly intensive surveillance.
These findings suggest that both the general population and at-risk individuals carrying PV could benefit from the inclusion of PRS in healthcare prevention policies, as risk-stratified surveillance improves early disease detection and prevention. A recent study demonstrated that individuals with a higher genetic risk benefited more substantially from preventive measures than those with a lower risk: CRC screening was associated with a significantly reduced CRC incidence and more than 30% reduced mortality among individuals with a high PRS [50, 51]. Preliminary calculations indicate that polygenic-risk-stratified CRC screening could become cost-effective under certain conditions, including an AUC above 0.65 [52], a threshold that was reached in our analyses.
Given the strikingly different penetrance of individual hereditary CRC genes, very recent guidelines have begun to recommend gene-specific surveillance intensity in LS and polyposis [53, 54]. Given the strong modifying effect of the PRS, including additional risk factors is likely to result in a more appropriate, clinically relevant risk stratification. Our results suggest that a combined risk assessment including FH and PRS will likely improve the precision of risk estimates and the tailoring of preventive measures, not only in the general population but also in patients with hereditary disease.
Our study has some limitations. Firstly, there is evidence of a "healthy volunteer" selection bias in the UKBB population (UKBB participants tend to be healthier than the general population), and thus the results, in terms of effect sizes, might not be fully generalizable [55]. Secondly, we cannot exclude that a few carriers of APC PV who were classified as controls have an unrecognised polyposis, or did not develop CRC because of intensive surveillance and/or prophylactic surgery; the calculated CRC risk for APC PV might therefore be slightly underestimated. As in other similar studies, the presence of colorectal polyps could not be considered owing to a lack of appropriate data. Thirdly, our risk assessment was based solely on genetic variants and FH and did not include other risk factors. Previous UKBB studies showed that modifiable lifestyle risk factors play a pivotal role in cancer prevalence, and a shared lifestyle within families could influence the FH of the disease [48, 56]. This might explain why the associations of FH and genetic risk with CRC are partly independent. Finally, although we analysed the whole UKBB cohort, we could not test the generalizability of the risk stratification across different populations because of limited sample sizes. The PRS could be biased towards European populations, as it was constructed from European reference GWAS; it might therefore be a poorer predictor in non-European or admixed individuals, as previously discussed in different studies [57].