Resurrecting the alternative splicing landscape of archaic hominins using machine learning

Abstract

Alternative splicing contributes to adaptation and divergence in many species. However, it has not been possible to directly compare splicing between modern and archaic hominins. Here, we unmask the recent evolution of this previously unobservable regulatory mechanism by applying SpliceAI, a machine-learning algorithm that identifies splice-altering variants (SAVs), to high-coverage genomes from three Neanderthals and a Denisovan. We discover 5,950 putative archaic SAVs, of which 2,186 are archaic-specific and 3,607 also occur in modern humans via introgression (244) or shared ancestry (3,520). Archaic-specific SAVs are enriched in genes that contribute to traits potentially relevant to hominin phenotypic divergence, such as those involving the epidermis, respiration and spinal rigidity. Compared to shared SAVs, archaic-specific SAVs occur at sites under weaker selection and are more common in genes with tissue-specific expression. Further underscoring the importance of negative selection on SAVs, variants specific to Neanderthal lineages with low effective population sizes are enriched for SAVs compared to Denisovan-specific and shared variants. Finally, we find that nearly all introgressed SAVs in humans were shared across the three Neanderthals, suggesting that older SAVs were more tolerated in human genomes. Our results reveal the splicing landscape of archaic hominins and identify potential contributions of splicing to phenotypic differences among hominins.

Fig. 1: The identification, distribution and origin of archaic SAVs.
Fig. 2: Genes with archaic-specific SAVs are enriched for roles in many phenotypes.
Fig. 3: Most SAVs result in isoforms that trigger NMD or yield altered transcripts and proteins.
Fig. 4: Lineage-specific archaic variants are enriched for SAVs compared to shared archaic variants.
Fig. 5: Introgressed SAVs present in modern humans were shared across archaic individuals and are associated with increased tissue specificity.
Fig. 6: Example archaic SAVs leading to NMD in loci with evidence of recent adaptive evolution.

Data availability

The SpliceAI-annotated archaic variant dataset is available on Dryad (ref. 94). Source data are provided with this paper.

Code availability

The archived version of the code used to conduct analyses and generate figures has been deposited in Zenodo (ref. 95). A non-archived version is available on GitHub (https://github.com/brandcm/Archaic_Splicing).

References

  1. Meyer, M. et al. A high-coverage genome sequence from an archaic Denisovan individual. Science 338, 222 (2012).

  2. Prüfer, K. et al. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505, 43–49 (2014).

  3. Prüfer, K. et al. A high-coverage Neandertal genome from Vindija Cave in Croatia. Science 358, 655–658 (2017).

  4. Mafessoni, F. et al. A high-coverage Neandertal genome from Chagyrskaya Cave. Proc. Natl Acad. Sci. USA 117, 15132 (2020).

  5. Brand, C. M., Colbran, L. L. & Capra, J. A. Predicting archaic hominin phenotypes from genomic data. Annu. Rev. Genomics Hum. Genet. 23, 591–612 (2022).

  6. Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019).

  7. Castellano, S. et al. Patterns of coding variation in the complete exomes of three Neandertals. Proc. Natl Acad. Sci. USA 111, 6666 (2014).

  8. Colbran, L. L. et al. Inferred divergent gene regulation in archaic hominins reveals potential phenotypic differences. Nat. Ecol. Evol. 3, 1598–1606 (2019).

  9. Gokhman, D. et al. Reconstructing Denisovan anatomy using DNA methylation maps. Cell 179, 180–192 (2019).

  10. McArthur, E. et al. Reconstructing the 3D genome organization of Neanderthals reveals that chromatin folding shaped phenotypic and sequence divergence. Preprint at bioRxiv https://doi.org/10.1101/2022.02.07.479462 (2022).

  11. Lopez, A. J. Alternative splicing of pre-mRNA: developmental consequences and mechanisms of regulation. Annu. Rev. Genet. 32, 279–305 (1998).

  12. Graveley, B. R. Alternative splicing: increasing diversity in the proteomic world. Trends Genet. 17, 100–107 (2001).

  13. Black, D. L. Mechanisms of alternative pre-messenger RNA splicing. Annu. Rev. Biochem. 72, 291–336 (2003).

  14. Baralle, F. E. & Giudice, J. Alternative splicing as a regulator of development and tissue identity. Nat. Rev. Mol. Cell Biol. 18, 437–451 (2017).

  15. Cáceres, J. F. & Kornblihtt, A. R. Alternative splicing: multiple control mechanisms and involvement in human disease. Trends Genet. 18, 186–193 (2002).

  16. Faustino, N. A. & Cooper, T. A. Pre-mRNA splicing and human disease. Genes Dev. 17, 419–437 (2003).

  17. Nissim-Rafinia, M. & Kerem, B. Splicing regulation as a potential genetic modifier. Trends Genet. 18, 123–127 (2002).

  18. Krawczak, M., Reiss, J. & Cooper, D. N. The mutational spectrum of single base-pair substitutions in mRNA splice junctions of human genes: causes and consequences. Hum. Genet. 90, 41–54 (1992).

  19. Wang, G.-S. & Cooper, T. A. Splicing in disease: disruption of the splicing code and the decoding machinery. Nat. Rev. Genet. 8, 749–761 (2007).

  20. Li, Y. I. et al. RNA splicing is a primary link between genetic variation and disease. Science 352, 600–604 (2016).

  21. Scotti, M. M. & Swanson, M. S. RNA mis-splicing in disease. Nat. Rev. Genet. 17, 19–32 (2016).

  22. Li, X. et al. The impact of rare variation on gene expression across tissues. Nature 550, 239–243 (2017).

  23. Verta, J.-P. & Jacobs, A. The role of alternative splicing in adaptation and evolution. Trends Ecol. Evol. 37, 299–308 (2022).

  24. Singh, P. & Ahi, E. P. The importance of alternative splicing in adaptive evolution. Mol. Ecol. 31, 1928–1938 (2022).

  25. Wright, C. J., Smith, C. W. J. & Jiggins, C. D. Alternative splicing as a source of phenotypic diversity. Nat. Rev. Genet. 23, 697–710 (2022).

  26. Blekhman, R., Marioni, J. C., Zumbo, P., Stephens, M. & Gilad, Y. Sex-specific and lineage-specific alternative splicing in primates. Genome Res. 20, 180–189 (2010).

  27. Barbosa-Morais, N. L. et al. The evolutionary landscape of alternative splicing in vertebrate species. Science 338, 1587–1593 (2012).

  28. Merkin, J., Russell, C., Chen, P. & Burge, C. B. Evolutionary dynamics of gene and isoform regulation in mammalian tissues. Science 338, 1593–1599 (2012).

  29. Sibley, C. R., Blazquez, L. & Ule, J. Lessons from non-canonical splicing. Nat. Rev. Genet. 17, 407–421 (2016).

  30. Jenkinson, G. et al. LeafCutterMD: an algorithm for outlier splicing detection in rare diseases. Bioinformatics 36, 4609–4615 (2020).

  31. Zhang, Y., Liu, X., MacLeod, J. & Liu, J. Discerning novel splice junctions derived from RNA-seq alignment: a deep learning approach. BMC Genomics 19, 971 (2018).

  32. Mertes, C. et al. Detection of aberrant splicing events in RNA-seq data using FRASER. Nat. Commun. 12, 529 (2021).

  33. Cheng, J. et al. MMSplice: modular modeling improves the predictions of genetic variant effects on splicing. Genome Biol. 20, 48 (2019).

  34. Jagadeesh, K. A. et al. S-CAP extends pathogenicity prediction to genetic variants that affect RNA splicing. Nat. Genet. 51, 755–763 (2019).

  35. Jaganathan, K. et al. Predicting splicing from primary sequence with deep learning. Cell 176, 535–548 (2019).

  36. Danis, D. et al. Interpretable prioritization of splice variants in diagnostic next-generation sequencing. Am. J. Hum. Genet. 108, 1564–1577 (2021).

  37. Zeng, T. & Li, Y. I. Predicting RNA splicing from DNA sequence using pangolin. Genome Biol. 23, 103 (2022).

  38. Collins, L. & Penny, D. Complex spliceosomal organization ancestral to extant eukaryotes. Mol. Biol. Evol. 22, 1053–1066 (2005).

  39. Tweedie, S. et al. Genenames.org: the HGNC and VGNC resources in 2021. Nucleic Acids Res. 49, D939–D946 (2021).

  40. Lowy-Gallego, E. et al. Variant calling on the GRCh38 assembly with the data from phase three of the 1000 Genomes Project. Wellcome Open Res. 4, 50 (2019).

  41. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).

  42. McLaren, W. et al. The Ensembl variant effect predictor. Genome Biol. 17, 122 (2016).

  43. Aqeilan, R. I. et al. The WWOX tumor suppressor is essential for postnatal survival and normal bone metabolism. J. Biol. Chem. 283, 21629–21639 (2008).

  44. Shiina, T., Hosomichi, K., Inoko, H. & Kulski, J. K. The HLA Genomic Loci Map: expression, interaction, diversity and disease. J. Hum. Genet. 54, 15–39 (2009).

  45. Rodenas-Cuadrado, P., Ho, J. & Vernes, S. C. Shining a light on CNTNAP2: complex functions to complex disorders. Eur. J. Hum. Genet. 22, 171–178 (2014).

  46. Rogers, A. R., Harris, N. S. & Achenbach, A. A. Neanderthal-Denisovan ancestors interbred with a distantly related hominin. Sci. Adv. 6, eaay5483 (2020).

  47. Vernot, B. et al. Excavating Neandertal and Denisovan DNA from the genomes of Melanesian individuals. Science 352, 235–239 (2016).

  48. Browning, S. R., Browning, B. L., Zhou, Y., Tucci, S. & Akey, J. M. Analysis of human sequence data reveals two pulses of archaic Denisovan admixture. Cell 173, 53–61 (2018).

  49. Buniello, A. et al. The NHGRI-EBI GWAS catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47, D1005–D1012 (2019).

  50. Köhler, S. et al. The Human Phenotype Ontology in 2021. Nucleic Acids Res. 49, D1207–D1217 (2021).

  51. Pollard, K. S., Hubisz, M. J., Rosenbloom, K. R. & Siepel, A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 20, 110–121 (2010).

  52. Kriventseva, E. V. et al. Increase of functional diversity by alternative splicing. Trends Genet. 19, 124–128 (2003).

  53. Rong, S. et al. Large scale functional screen identifies genetic variants with splicing effects in modern and archaic humans. Preprint at bioRxiv https://doi.org/10.1101/2022.11.20.515225 (2022).

  54. Petr, M., Pääbo, S., Kelso, J. & Vernot, B. Limits of long-term selection against Neandertal introgression. Proc. Natl Acad. Sci. USA 116, 1639 (2019).

  55. Telis, N., Aguilar, R. & Harris, K. Selection against archaic hominin genetic variation in regulatory regions. Nat. Ecol. Evol. 4, 1558–1566 (2020).

  56. McArthur, E., Rinker, D. C. & Capra, J. A. Quantifying the contribution of Neanderthal introgression to the heritability of complex traits. Nat. Commun. 12, 4481 (2021).

  57. Aqil, A., Speidel, L., Pavlidis, P. & Gokcumen, O. Balancing selection on genomic deletion polymorphisms in humans. eLife https://doi.org/10.7554/eLife.79111 (2023).

  58. Dannemann, M., Andrés, A. M. & Kelso, J. Introgression of Neandertal- and Denisovan-like haplotypes contributes to adaptive variation in human toll-like receptors. Am. J. Hum. Genet. 98, 22–33 (2016).

  59. McCoy, R. C., Wakefield, J. & Akey, J. M. Impacts of Neanderthal-introgressed sequences on the landscape of human gene expression. Cell 168, 916–927 (2017).

  60. Saudemont, B. et al. The fitness cost of mis-splicing is the main determinant of alternative splicing patterns. Genome Biol. 18, 208 (2017).

  61. Mendez, F. L., Watkins, J. C. & Hammer, M. F. Global genetic variation at OAS1 provides evidence of archaic admixture in Melanesian populations. Mol. Biol. Evol. 29, 1513–1520 (2012).

  62. Sams, A. J. et al. Adaptively introgressed Neandertal haplotype at the OAS locus functionally impacts innate immune responses in humans. Genome Biol. 17, 246 (2016).

  63. Rinker, D. C. et al. Neanderthal introgression reintroduced functional ancestral alleles lost in Eurasian populations. Nat. Ecol. Evol. 4, 1332–1341 (2020).

  64. Huerta-Sánchez, E. et al. Altitude adaptation in Tibetans caused by introgression of Denisovan-like DNA. Nature 512, 194–197 (2014).

  65. Jeong, C. et al. Detecting past and ongoing natural selection among ethnically Tibetan women at high altitude in Nepal. PLoS Genet. 14, e1007650 (2018).

  66. Peng, Y. et al. Down-regulation of EPAS1 transcription and genetic adaptation of Tibetans to high-altitude hypoxia. Mol. Biol. Evol. 34, 818–830 (2017).

  67. Andrés, A. M. et al. Balancing selection maintains a form of ERAP2 that undergoes nonsense-mediated decay and affects antigen presentation. PLoS Genet. 6, e1001157 (2010).

  68. Trujillo, C. A. et al. Reintroduction of the archaic variant of NOVA1 in cortical organoids alters neurodevelopment. Science 371, eaax2537 (2021).

  69. Karlebach, G. et al. The impact of biological sex on alternative splicing. Preprint at bioRxiv https://doi.org/10.1101/490904 (2020).

  70. Rogers, T. F., Palmer, D. H. & Wright, A. E. Sex-specific selection drives the evolution of alternative splicing in birds. Mol. Biol. Evol. 38, 519–530 (2021).

  71. Ge, Y. & Porse, B. T. The functional consequences of intron retention: alternative splicing coupled to NMD as a regulator of gene expression. BioEssays 36, 236–243 (2014).

  72. Smith, J. E. & Baker, K. E. Nonsense-mediated RNA decay—a switch and dial for regulating gene expression. BioEssays 37, 612–623 (2015).

  73. Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993 (2011).

  74. Chollet, F. et al. Keras. GitHub https://github.com/fchollet/keras (2015).

  75. Abadi, M. et al. TensorFlow: large-scale machine learning on heterogeneous systems. Preprint at arXiv https://doi.org/10.48550/arXiv.1603.04467 (2016).

  76. Harrow, J. et al. GENCODE: the reference human genome annotation for the ENCODE Project. Genome Res. 22, 1760–1774 (2012).

  77. Hinrichs, A. S. et al. The UCSC Genome Browser Database: update 2006. Nucleic Acids Res. 34, D590–D598 (2006).

  78. Plagnol, V. & Wall, J. D. Possible ancestral structure in human populations. PLoS Genet. 2, e105 (2006).

  79. Vernot, B. & Akey, J. M. Resurrecting surviving Neandertal lineages from modern human genomes. Science 343, 1017–1021 (2014).

  80. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).

  81. Chen, E. Y. et al. Enrichr: interactive and collaborative HTML5 Gene List enrichment analysis tool. BMC Bioinf. 14, 128 (2013).

  82. Kuleshov, M. V. et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 44, W90–W97 (2016).

  83. Xie, Z. et al. Gene set knowledge discovery with Enrichr. Curr. Protoc. 1, e90 (2021).

  84. Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).

  85. Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

  86. Vallat, R. Pingouin: statistics in Python. J. Open Source Softw. 3, 1026 (2018).

  87. Inkscape Project version 1.1.2 (Inkscape, 2020).

  88. Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer-Verlag, 2016).

  89. R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2020).

  90. Krassowski, M. ComplexUpset. GitHub https://github.com/krassowski/complex-upset (2020).

  91. Larsson, J. eulerr: Area-proportional Euler and Venn diagrams with ellipses manual. R package version 6.1.1 (2021).

  92. Wickham, H. Reshaping data with the reshape package. J. Stat. Softw. 21, 1–20 (2007).

  93. Wickham, H. et al. Welcome to the tidyverse. J. Open Source Softw. 4, 1686 (2019).

  94. Brand, C. M. et al. Splice altering variant predictions in four archaic hominin genomes. Dryad https://doi.org/10.7272/Q6H993F9 (2023).

  95. Brand, C. M. et al. Code from: Resurrecting the alternative splicing landscape of archaic hominins using machine learning. Zenodo https://doi.org/10.5281/zenodo.7844032 (2023).

Acknowledgements

We thank M. L. Benton for kindly sharing data on tissue specificity and Z. Gao for helpful discussion on recurrent mutations. E. McArthur and D. Rinker provided comments that improved this manuscript. We also thank members of the Capra Lab for feedback on figures. This research greatly benefited from access to the Wynton high-performance compute cluster at the University of California, San Francisco. L.L.C. was funded by National Institutes of Health grant no. T32HG009495 to the University of Pennsylvania. J.A.C. and C.M.B. were funded by National Institutes of Health grant no. R35GM127087.

Author information

Contributions

The work was conceived by C.M.B., L.L.C. and J.A.C. Formal analysis was undertaken by C.M.B. and L.L.C. The manuscript was drafted, reviewed and edited by all authors.

Corresponding author

Correspondence to John A. Capra.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Ecology & Evolution thanks Maxime Rotival and Peter Robinson for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Shared phenotype enrichment.

(A) Phenotype associations enriched among genes with archaic-specific shared SAVs based on annotations from the 2019 GWAS Catalog. Phenotypes are ordered by increasing enrichment within manually curated systems. Circle size indicates enrichment magnitude. Enrichment and p-values were calculated from a one-sided permutation test based on an empirical null distribution generated from 10,000 shuffles of maximum ∆ across the entire dataset (Methods). Dotted and dashed lines represent false-discovery rate (FDR) corrected p-value thresholds at FDR = 0.05 and 0.1, respectively. At least one example phenotype with a p-value ≤ the stricter FDR threshold (0.05) is annotated per system. (B) Phenotypes enriched among genes with archaic-specific shared SAVs based on annotations from the Human Phenotype Ontology (HPO). Data were generated and visualized as in A. See Supplementary Data 2 for all phenotype enrichment results.
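The one-sided permutation test described in this caption can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code (which is archived on Zenodo): the input layout, the use of a single max-∆ cut-off to call SAVs, counting genes rather than variants per phenotype, and the enrichment statistic (observed count divided by the mean null count) are all assumptions, and `phenotype_enrichment` is a hypothetical name.

```python
import random
from collections import defaultdict


def phenotype_enrichment(variants, gene_to_terms, threshold, n_perm=10_000, seed=0):
    """One-sided permutation test for phenotype enrichment among SAV-carrying genes.

    variants: list of (gene, max_delta) pairs, one per variant (hypothetical format).
    gene_to_terms: dict mapping gene -> set of phenotype terms (e.g. GWAS Catalog or HPO).
    threshold: max-delta cut-off used to call a variant a SAV (assumed parameter).
    Returns {term: (enrichment, p_value)} for every term hit by at least one SAV gene.
    """
    rng = random.Random(seed)
    genes = [gene for gene, _ in variants]
    deltas = [delta for _, delta in variants]

    def count_terms(delta_values):
        # Count, per phenotype term, the genes carrying at least one SAV.
        sav_genes = {g for g, d in zip(genes, delta_values) if d >= threshold}
        counts = defaultdict(int)
        for g in sav_genes:
            for term in gene_to_terms.get(g, ()):
                counts[term] += 1
        return counts

    observed = count_terms(deltas)
    null_total = defaultdict(float)
    null_at_least = defaultdict(int)
    shuffled = list(deltas)
    for _ in range(n_perm):
        # Null model: shuffle max delta across the entire dataset, keeping genes fixed.
        rng.shuffle(shuffled)
        null = count_terms(shuffled)
        for term, obs in observed.items():
            null_total[term] += null[term]
            if null[term] >= obs:
                null_at_least[term] += 1

    results = {}
    for term, obs in observed.items():
        mean_null = null_total[term] / n_perm
        enrichment = obs / mean_null if mean_null > 0 else float("inf")
        # Add-one correction keeps the empirical p-value strictly above zero.
        p_value = (null_at_least[term] + 1) / (n_perm + 1)
        results[term] = (enrichment, p_value)
    return results
```

With `n_perm=10_000`, the smallest attainable p-value under this estimator is 1/10,001, which bounds how many phenotypes can pass a strict FDR threshold.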

Extended Data Fig. 2 Altai phenotype enrichment.

(A) Phenotype associations enriched among genes with archaic-specific Altai SAVs based on annotations from the 2019 GWAS Catalog. Phenotypes are ordered by increasing enrichment within manually curated systems. Circle size indicates enrichment magnitude. Enrichment and p-values were calculated from a one-sided permutation test based on an empirical null distribution generated from 10,000 shuffles of maximum ∆ across the entire dataset (Methods). Dotted and dashed lines represent false-discovery rate (FDR) corrected p-value thresholds at FDR = 0.05 and 0.1, respectively. At least one example phenotype with a p-value ≤ the stricter FDR threshold (0.05) is annotated per system. (B) Phenotypes enriched among genes with archaic-specific Altai SAVs based on annotations from the Human Phenotype Ontology (HPO). Data were generated and visualized as in A. See Supplementary Data 2 for all phenotype enrichment results.
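The FDR-corrected p-value thresholds drawn as dotted and dashed lines can be illustrated with the Benjamini-Hochberg step-up procedure. The caption does not name the correction method, so Benjamini-Hochberg is an assumption here, and `bh_threshold` is a hypothetical helper: it returns the largest p-value that passes at a given FDR, so every phenotype with a p-value at or below it is called significant.

```python
def bh_threshold(p_values, fdr):
    """Benjamini-Hochberg cut-off: the largest p-value declared significant at `fdr`.

    Returns None when no test passes.
    """
    m = len(p_values)
    threshold = None
    for rank, p in enumerate(sorted(p_values), start=1):
        # Step-up rule: keep the largest rank k with p_(k) <= (k / m) * fdr.
        if p <= rank / m * fdr:
            threshold = p
    return threshold
```

Because any p-value that passes at FDR = 0.05 also passes at FDR = 0.1, the 0.1 cut-off is never stricter than the 0.05 cut-off.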

Extended Data Fig. 3 Chagyrskaya phenotype enrichment.

(A) Phenotype associations enriched among genes with archaic-specific Chagyrskaya SAVs based on annotations from the 2019 GWAS Catalog. Phenotypes are ordered by increasing enrichment within manually curated systems. Circle size indicates enrichment magnitude. Enrichment and p-values were calculated from a one-sided permutation test based on an empirical null distribution generated from 10,000 shuffles of maximum ∆ across the entire dataset (Methods). Dotted and dashed lines represent false-discovery rate (FDR) corrected p-value thresholds at FDR = 0.05 and 0.1, respectively. At least one example phenotype with a p-value ≤ the stricter FDR threshold (0.05) is annotated per system. (B) Phenotypes enriched among genes with archaic-specific Chagyrskaya SAVs based on annotations from the Human Phenotype Ontology (HPO). Data were generated and visualized as in A. See Supplementary Data 2 for all phenotype enrichment results.

Extended Data Fig. 4 Denisovan phenotype enrichment.

(A) Phenotype associations enriched among genes with archaic-specific Denisovan SAVs based on annotations from the 2019 GWAS Catalog. Phenotypes are ordered by increasing enrichment within manually curated systems. Circle size indicates enrichment magnitude. Enrichment and p-values were calculated from a one-sided permutation test based on an empirical null distribution generated from 10,000 shuffles of maximum ∆ across the entire dataset (Methods). Dotted and dashed lines represent false-discovery rate (FDR) corrected p-value thresholds at FDR = 0.05 and 0.1, respectively. At least one example phenotype with a p-value ≤ the stricter FDR threshold (0.05) is annotated per system. (B) Phenotypes enriched among genes with archaic-specific Denisovan SAVs based on annotations from the Human Phenotype Ontology (HPO). Data were generated and visualized as in A. See Supplementary Data 2 for all phenotype enrichment results.

Extended Data Fig. 5 Neanderthal phenotype enrichment.

(A) Phenotype associations enriched among genes with archaic-specific Neanderthal SAVs based on annotations from the 2019 GWAS Catalog. Phenotypes are ordered by increasing enrichment within manually curated systems. Circle size indicates enrichment magnitude. Enrichment and p-values were calculated from a one-sided permutation test based on an empirical null distribution generated from 10,000 shuffles of maximum ∆ across the entire dataset (Methods). Dotted and dashed lines represent false-discovery rate (FDR) corrected p-value thresholds at FDR = 0.05 and 0.1, respectively. At least one example phenotype with a p-value ≤ the stricter FDR threshold (0.05) is annotated per system. CVD = cardiovascular disease; Lp-PLA2 = lipoprotein-associated phospholipase A2. (B) Phenotypes enriched among genes with archaic-specific Neanderthal SAVs based on annotations from the Human Phenotype Ontology (HPO). Data were generated and visualized as in A. See Supplementary Data 2 for all phenotype enrichment results.

Extended Data Fig. 6 Vindija phenotype enrichment.

(A) Phenotype associations enriched among genes with archaic-specific Vindija SAVs based on annotations from the 2019 GWAS Catalog. Phenotypes are ordered by increasing enrichment within manually curated systems. Circle size indicates enrichment magnitude. Enrichment and p-values were calculated from a one-sided permutation test based on an empirical null distribution generated from 10,000 shuffles of maximum ∆ across the entire dataset (Methods). Dotted and dashed lines represent false-discovery rate (FDR) corrected p-value thresholds at FDR = 0.05 and 0.1, respectively. At least one example phenotype with a p-value ≤ the stricter FDR threshold (0.05) is annotated per system. (B) Phenotypes enriched among genes with archaic-specific Vindija SAVs based on annotations from the Human Phenotype Ontology (HPO). Data were generated and visualized as in A. See Supplementary Data 2 for all phenotype enrichment results.
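The lineage groups used in Extended Data Figs. 1–6 (shared, Altai, Chagyrskaya, Vindija, Neanderthal and Denisovan) can be thought of as a function of which archaic individuals carry a variant. The exact grouping rules are defined in the Methods, which are not reproduced here; the sketch below uses one plausible set of hypothetical rules (shared = carried by the Denisovan and at least one Neanderthal; Neanderthal = carried by two or more Neanderthals but not the Denisovan; an individual's name = carried by that individual only) purely to illustrate the bookkeeping.

```python
NEANDERTHALS = frozenset({"Altai", "Chagyrskaya", "Vindija"})
ARCHAICS = NEANDERTHALS | {"Denisovan"}


def lineage_category(carriers):
    """Assign an archaic-specific variant to a lineage group (hypothetical rules).

    carriers: iterable of archaic individuals carrying the allele.
    """
    carriers = set(carriers)
    unknown = carriers - ARCHAICS
    if unknown:
        raise ValueError(f"unknown individuals: {sorted(unknown)}")
    if not carriers:
        raise ValueError("variant is not carried by any archaic individual")
    neanderthal_carriers = carriers & NEANDERTHALS
    if "Denisovan" in carriers:
        # Present in both lineages -> shared; Denisovan alone -> Denisovan-specific.
        return "Shared" if neanderthal_carriers else "Denisovan"
    if len(neanderthal_carriers) == 1:
        # Exactly one Neanderthal: individual-specific (e.g. "Vindija").
        return next(iter(neanderthal_carriers))
    # Two or three Neanderthals and no Denisovan.
    return "Neanderthal"
```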

Extended Data Fig. 7 Modelling SAV effects on the canonical transcript.

We used the SpliceAI output to construct a novel transcript for each SAV by modifying the canonical transcript of its gene. We modelled only one effect per SAV (acceptor gain, acceptor loss, donor gain or donor loss), choosing the effect with the largest ∆; we therefore did not model multiple effects for a single SAV (for example, both an acceptor gain and an acceptor loss). Here, we illustrate all possible consequences of a SAV for each of the four classes. The variant position is marked with a red ‘X’ and the position of the effect with a red vertical line (sequence deletion) or box (sequence addition). Each scenario shows a two-exon gene (boxes) with a single intron (horizontal line).
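Selecting a single effect per SAV amounts to taking the largest of the four SpliceAI delta scores. The sketch below assumes SpliceAI's VCF annotation layout, `ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL`, where DS are delta scores and DP are the positions of each effect relative to the variant; the function names, the tie-breaking order and the example values (including the gene symbols) are hypothetical.

```python
EFFECTS = ("acceptor_gain", "acceptor_loss", "donor_gain", "donor_loss")


def strongest_effect(annotation):
    """Pick the single largest-delta effect from one SpliceAI annotation.

    annotation: 'ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL'
    Returns (symbol, effect, delta_score, delta_position).
    """
    fields = annotation.split("|")
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}: {annotation!r}")
    symbol = fields[1]
    scores = [float(x) for x in fields[2:6]]
    positions = [int(x) for x in fields[6:10]]
    # max() keeps the first index on ties, so ties resolve in EFFECTS order.
    best = max(range(len(EFFECTS)), key=lambda i: scores[i])
    return symbol, EFFECTS[best], scores[best], positions[best]


def strongest_effect_in_info(info_value):
    """Handle an INFO value holding several comma-separated annotations
    (for example one per overlapping gene) by keeping the overall largest delta."""
    candidates = [strongest_effect(a) for a in info_value.split(",")]
    return max(candidates, key=lambda c: c[2])


# Made-up annotation: donor gain (0.85) is the strongest effect, 1 bp from the variant.
print(strongest_effect("T|GENE1|0.02|0.10|0.85|0.30|5|-12|1|40"))
```

The returned delta position is what places the modelled effect relative to the variant when the canonical transcript is edited.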

Extended Data Fig. 8 ∆ max shows a variable relationship with 1000 Genomes (1KG) allele frequency.

(A) 1KG allele frequency and ∆ max for all ancient variants as defined by Browning et al. (ref. 48). Allele frequencies are from 1KG. Dashed lines mark both ∆ thresholds. (B) 1KG allele frequency and ∆ max for all introgressed variants as defined by Browning et al. (ref. 48). Allele frequencies are from 1KG; if the introgressed allele was the reference allele, we subtracted the 1KG allele frequency from 1. (C) 1KG allele frequency and ∆ max for all ancient variants as defined by Vernot et al. (ref. 47). Allele frequencies are from 1KG. (D) 1KG allele frequency and ∆ max for all introgressed variants as defined by Vernot et al. (ref. 47). Allele frequencies are the mean of the AFR, AMR, EAS, EUR and SAS frequencies in the Vernot et al. (ref. 47) metadata.
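The two allele-frequency conventions in panels B and D reduce to simple arithmetic: 1KG reports the frequency of the ALT allele, so an introgressed allele that matches the reference has frequency 1 − AF, and panel D averages the five superpopulation frequencies. A minimal sketch, assuming an unweighted mean and a hypothetical dictionary layout for the metadata:

```python
SUPERPOPULATIONS = ("AFR", "AMR", "EAS", "EUR", "SAS")


def introgressed_allele_frequency(alt_af, introgressed_is_ref):
    """Frequency of the introgressed allele from a 1KG ALT allele frequency.

    1KG reports the ALT allele frequency, so when the introgressed allele
    is the reference allele its frequency is 1 - alt_af.
    """
    if not 0.0 <= alt_af <= 1.0:
        raise ValueError(f"allele frequency out of range: {alt_af}")
    return 1.0 - alt_af if introgressed_is_ref else alt_af


def mean_superpopulation_frequency(freqs):
    """Unweighted mean of the five 1KG superpopulation frequencies.

    freqs: dict such as {"AFR": 0.01, "AMR": 0.05, ...} (hypothetical layout).
    """
    missing = [p for p in SUPERPOPULATIONS if p not in freqs]
    if missing:
        raise KeyError(f"missing superpopulations: {missing}")
    return sum(freqs[p] for p in SUPERPOPULATIONS) / len(SUPERPOPULATIONS)
```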

Extended Data Fig. 9 Vernot et al. 2016 introgressed phenotype enrichment.

(A) Phenotype associations enriched among genes with introgressed SAVs from Vernot et al. (ref. 47) based on annotations from the 2019 GWAS Catalog. Phenotypes are ordered by increasing enrichment within manually curated systems. Circle size indicates enrichment magnitude. Enrichment and p-values were calculated from a one-sided permutation test based on an empirical null distribution generated from 10,000 shuffles of maximum ∆ across the entire dataset (Methods). Dotted and dashed lines represent false-discovery rate (FDR) corrected p-value thresholds at FDR = 0.05 and 0.1, respectively. At least one example phenotype with a p-value ≤ the stricter FDR threshold (0.05) is annotated per system. (B) Phenotypes enriched among genes with introgressed SAVs from Vernot et al. (ref. 47) based on annotations from the Human Phenotype Ontology (HPO). Data were generated and visualized as in A. See Supplementary Data 2 for all phenotype enrichment results.

Extended Data Fig. 10 Browning et al. 2018 introgressed phenotype enrichment.

(A) Phenotype associations enriched among genes with introgressed SAVs from Browning et al. (ref. 48) based on annotations from the 2019 GWAS Catalog. Phenotypes are ordered by increasing enrichment within manually curated systems. Circle size indicates enrichment magnitude. Enrichment and p-values were calculated from a one-sided permutation test based on an empirical null distribution generated from 10,000 shuffles of maximum ∆ across the entire dataset (Methods). Dotted and dashed lines represent false-discovery rate (FDR) corrected p-value thresholds at FDR = 0.05 and 0.1, respectively. At least one example phenotype with a p-value ≤ the stricter FDR threshold (0.05) is annotated per system. CEA = carcinoembryonic antigen, FGF = fibroblast growth factor. (B) Phenotypes enriched among genes with introgressed SAVs from Browning et al. (ref. 48) based on annotations from the Human Phenotype Ontology (HPO). Data were generated and visualized as in A. See Supplementary Data 2 for all phenotype enrichment results.

Supplementary information

Supplementary Information

Supplementary Text, Figs. 1–27 and Tables 1–15.

Reporting Summary

Peer Review File

Supplementary Data

Supplementary Data 1–3. Ensembl VEP predictions for archaic-specific spliceosome variants (1), phenotype enrichment results (2) and predicted effects of splice variants on the resulting transcripts and proteins (3).

Source data

Source Data Fig. 1

Statistical source data.

Source Data Fig. 2

Statistical source data.

Source Data Fig. 3

Statistical source data.

Source Data Fig. 4

Statistical source data.

Source Data Fig. 5

Statistical source data.

Source Data Fig. 6

Statistical source data.

Source Data Extended Data Fig. 1

Statistical source data.

Source Data Extended Data Fig. 2

Statistical source data.

Source Data Extended Data Fig. 3

Statistical source data.

Source Data Extended Data Fig. 4

Statistical source data.

Source Data Extended Data Fig. 5

Statistical source data.

Source Data Extended Data Fig. 6

Statistical source data.

Source Data Extended Data Fig. 8

Statistical source data.

Source Data Extended Data Fig. 9

Statistical source data.

Source Data Extended Data Fig. 10

Statistical source data.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Brand, C.M., Colbran, L.L. & Capra, J.A. Resurrecting the alternative splicing landscape of archaic hominins using machine learning. Nat Ecol Evol 7, 939–953 (2023). https://doi.org/10.1038/s41559-023-02053-5

  • DOI: https://doi.org/10.1038/s41559-023-02053-5
