Synonymous mutations reveal genome-wide levels of positive selection in healthy tissues

Abstract

Genetic alterations under positive selection in healthy tissues have implications for cancer risk. However, total levels of positive selection across the genome remain unknown. Passenger mutations are influenced by all driver mutations, regardless of type or location in the genome. Therefore, the total number of passengers can be used to estimate the total number of drivers—including unidentified drivers outside of cancer genes that are traditionally missed. Here we analyze the variant allele frequency spectrum of synonymous mutations from healthy blood and esophagus to quantify levels of missing positive selection. In blood, we find that only 30% of passengers can be explained by single-nucleotide variants in driver genes, suggesting high levels of positive selection for mutations elsewhere in the genome. In contrast, more than half of all passengers in the esophagus can be explained by just the two driver genes NOTCH1 and TP53, suggesting little positive selection elsewhere.
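As a rough illustration of what a variant allele frequency (VAF) spectrum is, the following minimal sketch bins a list of VAFs logarithmically and reports a density per individual per unit of log-VAF. The function name `vaf_density`, the bin width, the VAF limits and the example values are all assumptions made for illustration; they are not taken from the study, and this is not the authors' analysis code.

```python
import math

def vaf_density(vafs, n_individuals, bins_per_decade=4, vaf_min=1e-4, vaf_max=0.5):
    """Return (bin centre, count, density) tuples for log-spaced VAF bins.

    Density is variants per individual per unit of natural-log VAF.
    All defaults are illustrative assumptions, not values from the paper.
    """
    width = math.log(10) / bins_per_decade          # bin width in ln(VAF)
    log_min = math.log(vaf_min)
    n_bins = math.ceil((math.log(vaf_max) - log_min) / width)
    counts = [0] * n_bins
    for v in vafs:
        if vaf_min <= v < vaf_max:                  # ignore out-of-range VAFs
            idx = int((math.log(v) - log_min) / width)
            counts[min(idx, n_bins - 1)] += 1
    spectrum = []
    for i, c in enumerate(counts):
        centre = math.exp(log_min + (i + 0.5) * width)
        spectrum.append((centre, c, c / (n_individuals * width)))
    return spectrum

# Made-up VAFs pooled from two hypothetical individuals; 0.9 falls outside the range.
example_vafs = [0.003, 0.004, 0.02, 0.9]
spectrum = vaf_density(example_vafs, n_individuals=2)
for centre, count, density in spectrum:
    if count:
        print(f"VAF ~ {centre:.4f}: {count} variant(s), density {density:.3f}")
```

Logarithmic bins are used here because clone sizes typically span several orders of magnitude; a different binning could be substituted without changing the idea.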

Fig. 1: A model of genetic hitchhiking.
Fig. 2: Synonymous variants in healthy blood.
Fig. 3: Age dependence of the synonymous VAF spectrum in healthy blood.
Fig. 4: Synonymous variants in healthy esophagus.
Fig. 5: Age dependence of the synonymous VAF spectrum in healthy esophagus.
Fig. 6: Fraction of positive selection explained by nonsynonymous variants in the top 50 driver genes in each tissue.
Fig. 7: Targeted alternative driver discovery in individuals without nonsynonymous drivers.

Data availability

The principal dataset from Bolton et al. can be downloaded using the link https://raw.githubusercontent.com/papaemmelab/bolton_NG_CH/master/M_long.txt. The dataset from Razavi et al. can be downloaded from the European Genome-Phenome Archive (EGA) under accession no. EGAS00001003755. All synonymous variants analyzed in this manuscript are listed in Supplementary Tables 1–3. The sequencing data for healthy esophagus were originally reported by Martincorena et al.; they may be found in the EGA under accession codes EGAD00001004158 and EGAD00001004159 and can be downloaded directly from https://www.science.org/doi/suppl/10.1126/science.aau3879/suppl_file/aau3879_tables2.xlsx.

Code availability

All code used in this study will be available on the Blundell laboratory GitHub page: https://github.com/blundelllab/Genetic-hitchhiking.

References

  1. Genovese, G. et al. Clonal hematopoiesis and blood-cancer risk inferred from blood DNA sequence. N. Engl. J. Med. 371, 2477–2487 (2014).

  2. Jaiswal, S. et al. Age-related clonal hematopoiesis associated with adverse outcomes. N. Engl. J. Med. 371, 2488–2498 (2014).

  3. Lee-Six, H. et al. Population dynamics of normal human blood inferred from somatic mutations. Nature 561, 473–478 (2018).

  4. Martincorena, I. et al. Tumor evolution. High burden and pervasive positive selection of somatic mutations in normal human skin. Science 348, 880–886 (2015).

  5. Martincorena, I. et al. Somatic mutant clones colonize the human esophagus with age. Science 362, 911–917 (2018).

  6. Martincorena, I. et al. Universal patterns of selection in cancer and somatic tissues. Cell 171, 1029–1041.e21 (2017).

  7. Young, A. L., Challen, G. A., Birmann, B. M. & Druley, T. E. Clonal haematopoiesis harbouring AML-associated mutations is ubiquitous in healthy adults. Nat. Commun. 7, 12484 (2016).

  8. Young, A. L., Tong, R. S., Birmann, B. M. & Druley, T. E. Clonal haematopoiesis and risk of acute myeloid leukemia. Haematologica https://doi.org/10.3324/haematol.2018.215269 (2019).

  9. Blokzijl, F. et al. Tissue-specific mutation accumulation in human adult stem cells during life. Nature 538, 260–264 (2016).

  10. Loh, P.-R. et al. Insights into clonal haematopoiesis from 8,342 mosaic chromosomal alterations. Nature 559, 350–355 (2018).

  11. Loh, P.-R., Genovese, G. & McCarroll, S. A. Monogenic and polygenic inheritance become instruments for clonal selection. Nature https://doi.org/10.1038/s41586-020-2430-6 (2020).

  12. Moore, L. et al. The mutational landscape of normal human endometrial epithelium. Nature 580, 640–646 (2020).

  13. Abelson, S. et al. Prediction of acute myeloid leukaemia risk in healthy individuals. Nature 559, 400–404 (2018).

  14. Desai, P. et al. Somatic mutations precede acute myeloid leukemia years before diagnosis. Nat. Med. 24, 1015–1023 (2018).

  15. Bolton, K. L. et al. Cancer therapy shapes the fitness landscape of clonal hematopoiesis. Nat. Genet. https://doi.org/10.1038/s41588-020-00710-0 (2020).

  16. Razavi, P. et al. High-intensity sequencing reveals the sources of plasma circulating cell-free DNA variants. Nat. Med. 25, 1928–1937 (2019).

  17. Lawrence, M. S. et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505, 495–501 (2014).

  18. Watson, C. J. et al. The evolutionary dynamics and fitness landscape of clonal hematopoiesis. Science 367, 1449–1454 (2020).

  19. Williams, M. J. et al. Measuring the distribution of fitness effects in somatic evolution by combining clonal dynamics with dN/dS ratios. eLife 9, e48714 (2020).

  20. Hess, J. M. et al. Passenger hotspot mutations in cancer. Preprint at bioRxiv https://doi.org/10.1101/675801 (2019).

  21. Luria, S. E. & Delbrück, M. Mutations of bacteria from virus sensitivity to virus resistance. Genetics 28, 491–511 (1943).

  22. Desai, M. M. & Fisher, D. S. Beneficial mutation–selection balance and the effect of linkage on positive selection. Genetics 176, 1759–1798 (2007).

  23. Williams, M. J., Werner, B., Barnes, C. P., Graham, T. A. & Sottoriva, A. Identification of neutral tumor evolution across cancer types. Nat. Genet. 48, 238–244 (2016).

  24. Loeb, L. A. et al. Extensive subclonal mutational diversity in human colorectal cancer and its significance. Proc. Natl Acad. Sci. USA 116, 26863–26872 (2019).

  25. Blundell, J. R. et al. The dynamics of adaptive genetic diversity during the early stages of clonal evolution. Nat. Ecol. Evol. 3, 293–301 (2019).

  26. Fusco, D., Gralka, M., Kayser, J., Anderson, A. & Hallatschek, O. Excess of mutational jackpot events in expanding populations revealed by spatial Luria–Delbrück experiments. Nat. Commun. 7, 12760 (2016).

  27. Schreck, C. F. et al. Impact of crowding on the diversity of expanding populations. Preprint at bioRxiv https://doi.org/10.1101/743534 (2019).

  28. Lohmueller, K. E. et al. Natural selection affects multiple aspects of genetic variation at putatively neutral sites across the human genome. PLoS Genet. 7, e1002326 (2011).

  29. Simons, B. D. Deep sequencing as a probe of normal stem cell fate and preneoplasia in human epidermis. Proc. Natl Acad. Sci. USA 113, 128–133 (2016).

  30. Chapman, M. S. et al. Lineage tracing of human embryonic development and foetal haematopoiesis through somatic mutations. Preprint at bioRxiv https://doi.org/10.1101/2020.05.29.088765 (2020).

  31. Gao, T. et al. Interplay between chromosomal alterations and gene mutations shapes the evolutionary trajectory of clonal hematopoiesis. Nat. Commun. 12, 338 (2021).

  32. Danielsson, M. et al. Longitudinal changes in the frequency of mosaic chromosome Y loss in peripheral blood cells of aging men varies profoundly between individuals. Eur. J. Hum. Genet. 28, 349–357 (2020).

  33. Thompson, D. J. et al. Genetic predisposition to mosaic Y chromosome loss in blood. Nature 575, 652–657 (2019).

  34. Miyamoto, T., Weissman, I. L. & Akashi, K. AML1/ETO-expressing nonleukemic stem cells in acute myelogenous leukemia with 8;21 chromosomal translocation. Proc. Natl Acad. Sci. USA 97, 7521–7526 (2000).

  35. Corces-Zimmerman, M. R. & Majeti, R. Pre-leukemic evolution of hematopoietic stem cells: the importance of early mutations in leukemogenesis. Leukemia 28, 2276–2282 (2014).

  36. Aguet, F. et al. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).

  37. Khurana, E. et al. Role of non-coding sequence variants in cancer. Nat. Rev. Genet. 17, 93–108 (2016).

  38. Rheinbay, E. et al. Analyses of non-coding somatic drivers in 2,658 cancer whole genomes. Nature 578, 102–111 (2020).

  39. Kumar, S. et al. Passenger mutations in more than 2,500 cancer genomes: overall molecular functional impact and consequences. Cell https://doi.org/10.1016/j.cell.2020.01.032 (2020).

  40. Li, S. et al. Distinct evolution and dynamics of epigenetic and genetic heterogeneity in acute myeloid leukemia. Nat. Med. 22, 792–799 (2016).

  41. Gebhard, C. et al. Profiling of aberrant DNA methylation in acute myeloid leukemia reveals subclasses of CG-rich regions with epigenetic or genetic association. Leukemia 33, 26–36 (2019).

  42. Gerstung, M. et al. The evolutionary history of 2,658 cancers. Nature 578, 122–128 (2020).

  43. Liu, X. et al. Genetic alterations in esophageal tissues from squamous dysplasia to carcinoma. Gastroenterology 153, 166–177 (2017).

  44. Colom, B. et al. Spatial competition shapes the dynamic mutational landscape of normal esophageal epithelium. Nat. Genet. https://doi.org/10.1038/s41588-020-0624-3 (2020).

  45. Supek, F., Miñana, B., Valcárcel, J., Gabaldón, T. & Lehner, B. Synonymous mutations frequently act as driver mutations in human cancers. Cell 156, 1324–1335 (2014).

  46. Sharma, Y. et al. A pan-cancer analysis of synonymous mutations. Nat. Commun. 10, 2569 (2019).

  47. Supek, F., Skunca, N., Repar, J., Vlahovicek, K. & Smuc, T. Translational selection is ubiquitous in prokaryotes. PLoS Genet. 6, e1001004 (2010).

  48. Drummond, D. A. & Wilke, C. O. Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution. Cell 134, 341–352 (2008).

Acknowledgements

We thank K. Bolton, A. Zehir and E. Papaemmanuil for sharing unpublished data. We also thank D. Solit, P. Razavi, D. Brown and J. Reis-Filho for sharing data and I. Martincorena for sharing data and for discussions. G.Y.P.P., C.J.W. and J.R.B. are funded by the CRUK Cambridge Centre and CRUK Early Detection Programme. J.R.B. is supported by a UKRI Future Leaders Fellowship. D.S.F. and J.R.B. are supported by the Stand Up to Cancer Foundation and the National Science Foundation via grant no. PHY-1545840.

Author information

Contributions

J.R.B. conceived the project. G.Y.P.P. developed the theory with input from J.R.B. and D.S.F. Data analysis methods, plotting and numerical simulations were all developed by G.Y.P.P. with input from J.R.B. and C.J.W. The manuscript was written by G.Y.P.P. and J.R.B., with input from C.J.W. All authors provided comments and edits on the manuscript.

Corresponding authors

Correspondence to Gladys Y. P. Poon or Jamie R. Blundell.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Genetics thanks Ruben van Boxtel, Benjamin Werner and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Model performance in recovering driver mutation rates in simulations.

(a) Our method recovers driver mutation rates accurately across a range of mutation rates (5 × 10⁵ simulation runs were performed). At higher driver mutation rates, it is mainly limited by clonal interference, which causes clones to reach sizes lower than those predicted by our theory. Best-fit values are presented with their 95% confidence intervals. (b) The simulation (run no. = 15000) corresponding to driver mutation rate μb = 3 × 10⁻⁶ (τ = 1 year). The neutral mutation frequency spectrum above Ψ = 3 × 10⁻³ was fitted with our passenger prediction to infer the underlying driver mutation rates driving the expansions. Simulated data are presented as mean values ± sampling error. (c) The likelihood plot shows the fit for the driver mutation rate and fitness obtained by examining the ‘nonsynonymous’ variant allele (that is, driver mutation) frequency spectrum only. It is overlaid with the maximum likelihood value (white cross) and the best-fit value found by the Nelder–Mead optimization algorithm (green cross). (d) The likelihood plot shows the best-fit value as well as 95% confidence intervals for the total driver mutation rate inferred from the ‘synonymous’ variant (neutral mutation) allele frequency spectrum, based on the fitness inferred from the ‘nonsynonymous’ variant allele frequency spectrum.
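The idea of reading a best-fit rate and a 95% confidence interval off a likelihood curve, as in panels (c) and (d), can be sketched with a toy example. The sketch below assumes a simple Poisson model for variant counts, evaluates the log-likelihood on a grid of candidate rates, and takes as the interval all rates whose log-likelihood lies within 1.92 of the maximum (half the 95% quantile of a chi-square distribution with one degree of freedom). The model, the data, the grid and the interval rule are all illustrative assumptions; the caption does not say how the authors' intervals were computed, and their likelihood is based on the VAF spectrum rather than on this toy model.

```python
import math

def poisson_log_likelihood(rate, counts, exposure):
    """Log-likelihood of independent Poisson counts with mean rate * exposure."""
    mean = rate * exposure
    return sum(k * math.log(mean) - mean - math.lgamma(k + 1) for k in counts)

def grid_mle_with_ci(counts, exposure, rates, drop=1.92):
    """Grid maximum-likelihood rate and likelihood-ratio interval (assumed rule)."""
    lls = [poisson_log_likelihood(r, counts, exposure) for r in rates]
    best = max(range(len(rates)), key=lls.__getitem__)
    inside = [r for r, ll in zip(rates, lls) if ll >= lls[best] - drop]
    return rates[best], min(inside), max(inside)

# Made-up data: variant counts in 8 hypothetical samples, each with the same exposure.
counts = [2, 0, 1, 3, 1, 2, 0, 1]
exposure = 1.0e5
rates = [10 ** (-6 + 0.01 * i) for i in range(201)]   # log-spaced grid, 1e-6 to 1e-4

mle, lower, upper = grid_mle_with_ci(counts, exposure, rates)
print(f"best-fit rate {mle:.2e}, approx. 95% interval {lower:.2e} to {upper:.2e}")
```

For this toy model the maximum is also available in closed form (mean count divided by exposure), which gives a quick check on the grid result; a simplex optimizer such as Nelder–Mead becomes useful when several parameters are fitted jointly and no closed form exists.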

Extended Data Fig. 2 Developmental mutation rate averages 2–4 SNVs across the entire genome per cell doubling.

(a) SNV VAFs in HSPC single-cell colonies in an 8-week foetus³⁰, where coverage is 22.6x per colony. SNVs found between 35% and 65% (within the dashed lines) are likely clonal in the colony. (b) SNV VAFs in HSPC single-cell colonies in an 18-week foetus³⁰, where coverage is 12.2x per colony. SNVs found between 30% and 70% (within the dashed lines) are likely clonal in the colony. (c) The best fit to the reverse cumulative for the number of mutations per cell doubling per haploid genome is 1.86 (95% CI = 1.6–2.1) for the Lee-Six et al. data (green line and datapoints), 1.0 (95% CI = 1.0–1.1) for the Chapman et al. 8-week foetus (purple line and datapoints) and 1.0 (95% CI = 0.9–1.1) for the Chapman et al. 18-week foetus (orange line and datapoints).
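The VAF windows in panels (a) and (b) can be illustrated with a short sketch. It filters a list of colony VAFs to a window around 50% and, as a rough guide to why a lower-coverage colony might need a wider window, computes the binomial standard deviation of the VAF of a heterozygous clonal variant at a given depth. The function names and the example VAFs are hypothetical, and the binomial argument is an assumption offered for intuition; the caption does not state how the windows were chosen.

```python
import math

def binomial_vaf_sd(depth, true_vaf=0.5):
    """Standard deviation of an observed VAF under binomial read sampling."""
    return math.sqrt(true_vaf * (1.0 - true_vaf) / depth)

def likely_clonal(vafs, lower, upper):
    """Keep VAFs inside the [lower, upper] window."""
    return [v for v in vafs if lower <= v <= upper]

# Made-up VAFs for one hypothetical colony (not data from the study).
colony_vafs = [0.08, 0.33, 0.41, 0.52, 0.66, 0.97]
clonal = likely_clonal(colony_vafs, 0.35, 0.65)
print("likely clonal VAFs:", clonal)

# Sampling spread at the two coverages quoted in the caption.
for depth in (22.6, 12.2):
    print(f"depth {depth}x: binomial s.d. of VAF = {binomial_vaf_sd(depth):.3f}")
```

Under this simple model the spread is wider at 12.2x than at 22.6x, which is at least consistent with the wider 30–70% window used for the 18-week sample.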

Extended Data Fig. 3 Inferring the unobserved driver mutation rate using nonsynonymous VAF spectra in Bolton et al.

The best-fit nonsynonymous VAF spectrum based on the distribution of ages in the cohort (n = 4160) includes a nonsynonymous developmental contribution estimated by considering the sizes of the genomic regions (light purple line, Supplementary note 3B) and possible nonsynonymous passengers (orange dashed lines). (a) Best-fit haploid driver rate of the most commonly mutated gene (DNMT3A) is 2.9 × 10⁻⁶ per year based on the DFE defined by equation 18 (Supplementary note 3C). (b) Best-fit haploid driver rate of the top 5 genes (DNMT3A, TET2, PPM1D, SF3B1, ATM) is 4.1 × 10⁻⁶ per year. (c) Best-fit haploid driver rate of the top 10 genes (DNMT3A, TET2, PPM1D, SF3B1, ATM, ASXL1, JAK2, TP53, SRSF2, CHEK2) is 4.8 × 10⁻⁶ per year. Data are presented as mean values ± sampling error.

Extended Data Fig. 4 Mutation rates of missing drivers assuming different fitness effects.

(a) The higher the fitness effects of the unobserved drivers, the lower the mutation rate needed to explain the discrepancy in the synonymous VAF density. Inset: pie chart showing the fractions of positive selection explained and unexplained by observed drivers (all nonsynonymous SNVs on the panel¹⁵) and the developmental contribution to the observed synonymous VAF spectra. (b) The observed synonymous VAF spectra (data points, variant number = 344) compared to the density predicted by observed drivers and developmental mutations (dashed orange line) and the density predicted by also including unobserved drivers with different fitness effects (solid orange lines). Data are presented as mean values ± sampling error.

Extended Data Fig. 5 Contribution from different parts of the DFE to the predicted passenger spectrum.

(a) The age distribution of the 4160 individuals in Bolton et al.¹⁵ (b) The predicted passenger spectrum in healthy blood according to the inferred distribution of fitness effects in healthy blood (Supplementary note 3C, ‘p = 3’) and best-fit total driver mutation rate from the synonymous VAF spectrum in blood (Supplementary note 3E) for the age distribution of the 4160 individuals. (c) The relative contribution to the passenger spectrum of driver mutations with different fitness effects changes as the individual ages. The total (grey line) represents the passenger VAF spectrum contributed by all driver mutations whose fitness s > 3.5%, below which the contribution to the passenger spectrum is very small.

Extended Data Fig. 6 Nonsynonymous VAF spectra in Martincorena et al.

(a) The nonsynonymous VAF spectra of the top 10 genes (ranked by nonsynonymous SNV occurrence) were analyzed based on N τ = 7800 (Supplementary note 4C) to estimate their respective fitness and mutation rates. The analysis treats the distribution of fitness effects as delta functions each with a single-valued mutation rate and fitness, taking into account developmental contribution and possible passengers among nonsynonymous SNVs. (b) The nonsynonymous VAF spectra of genes beyond the top 10 (ranked by nonsynonymous SNV occurrence) were analyzed based on the chosen DFE (Supplementary note 3C). Similarly, developmental contribution and possible passengers among nonsynonymous SNVs were taken into account. Data are presented as mean values ± sampling error.

Supplementary information

Supplementary Information

Supplementary Notes 1–4 and Figs. 1–13.

Reporting Summary

Peer Review Information

Supplementary Tables

Table 1. Synonymous SNVs from Bolton et al. that were analyzed.

Table 2. Synonymous SNVs from Razavi et al. included.

Table 3. Synonymous SNVs from two studies by Young et al. included.

About this article

Cite this article

Poon, G.Y.P., Watson, C.J., Fisher, D.S. et al. Synonymous mutations reveal genome-wide levels of positive selection in healthy tissues. Nat Genet 53, 1597–1605 (2021). https://doi.org/10.1038/s41588-021-00957-1
