Control-independent mosaic single nucleotide variant detection with DeepMosaic

Abstract

Mosaic variants (MVs) reflect mutagenic processes during embryonic development and environmental exposure, accumulate with aging and underlie diseases such as cancer and autism. The detection of noncancer MVs has been computationally challenging due to the sparse representation of nonclonally expanded MVs. Here we present DeepMosaic, combining an image-based visualization module for single nucleotide MVs and a convolutional neural network-based classification module for control-independent MV detection. DeepMosaic was trained on 180,000 simulated or experimentally assessed MVs, and was benchmarked on 619,740 simulated MVs and 530 independent biologically tested MVs from 16 genomes and 181 exomes. DeepMosaic achieved higher accuracy compared with existing methods on biological data, with a sensitivity of 0.78, specificity of 0.83 and positive predictive value of 0.96 on noncancer whole-genome sequencing data, as well as doubling the validation rate over previous best-practice methods on noncancer whole-exome sequencing data (0.43 versus 0.18). DeepMosaic represents an accurate MV classifier for noncancer samples that can be implemented as an alternative or complement to existing methods.

Fig. 1: Image representation, model training strategies and framework of DeepMosaic.
Fig. 2: DeepMosaic performance on simulated benchmark variants.
Fig. 3: DeepMosaic performance validated on biological data.

Data availability

WGS data used to generate the training set are available at the SRA (accession nos. SRP028833 and SRP100797, BioData1). The gold-standard WGS data and validated capstone project data are available at the National Institute of Mental Health Data Archive (NIMH Data Archive IDs 792 and 919: https://nda.nih.gov/study.html?id=792, BioData2, and https://nda.nih.gov/study.html?id=919, BioData3) and the Brain Somatic Mosaicism Consortium Data Portal; independent benchmark brain genotyping data are also part of SRA accession no. PRJNA736951 (BioData3). Simulated data generated from NA24385 (HG002) are available at https://humanpangenome.org/hg002/. The independent sperm and blood deep WGS data are available at the SRA (accession nos. PRJNA588332 and PRJNA660493, BioData4). Independent WES data from brain, blood and saliva samples are available in the NIMH Data Archive under study number 1484 (https://nda.nih.gov/study.html?id=1484, BioData5). TCGA-MC3 data are available on the GDC portal (https://portal.gdc.cancer.gov/; sample IDs are provided with the variants in Supplementary Table 3). Annotations were downloaded from the UCSC Genome Browser (https://genome.ucsc.edu/) and ANNOVAR (https://annovar.openbioinformatics.org/en/latest/).

Code availability

DeepMosaic is currently implemented in Python; the source code, documentation and demos are available at https://github.com/Virginiaxu/DeepMosaic. Code for running the different MV callers is documented in the Methods section.

References

  1. Dou, Y., Gold, H. D., Luquette, L. J. & Park, P. J. Detecting somatic mutations in normal cells. Trends Genet. 34, 545–557 (2018).

  2. Biesecker, L. G. & Spinner, N. B. A genomic view of mosaicism and human disease. Nat. Rev. Genet. 14, 307–320 (2013).

  3. Lee, J. H. et al. Human glioblastoma arises from subventricular zone cells with low-level driver mutations. Nature 560, 243–247 (2018).

  4. Yang, X. et al. MosaicBase: a knowledgebase of postzygotic mosaic variants in noncancer disease-related and healthy human individuals. Genom. Proteom. Bioinform. 18, 140–149 (2020).

  5. Poduri, A., Evrony, G. D., Cai, X. & Walsh, C. A. Somatic mutation, genomic variation, and neurological disease. Science 341, 1237758 (2013).

  6. Freed, D., Stevens, E. L. & Pevsner, J. Somatic mosaicism in the human genome. Genes 5, 1064–1094 (2014).

  7. Yang, X. et al. Developmental and temporal characteristics of clonal sperm mosaicism. Cell 184, 4772–4783 e4715 (2021).

  8. Breuss, M. W., Yang, X. & Gleeson, J. G. Sperm mosaicism: implications for genomic diversity and disease. Trends Genet. 37, 890–902 (2021).

  9. Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013).

  10. Kim, S. et al. Strelka2: fast and accurate calling of germline and somatic variants. Nat. Methods 15, 591–594 (2018).

  11. Poplin, R. et al. A universal SNP and small-indel variant caller using deep neural networks. Nat. Biotechnol. 36, 983–987 (2018).

  12. Huang, A. Y. et al. MosaicHunter: accurate detection of postzygotic single-nucleotide mosaicism through next-generation sequencing of unpaired, trio, and paired samples. Nucleic Acids Res. 45, e76 (2017).

  13. Dou, Y. et al. Accurate detection of mosaic variants in sequencing data without matched controls. Nat. Biotechnol. 38, 314–319 (2020).

  14. Dou, Y. et al. Postzygotic single-nucleotide mosaicisms contribute to the etiology of autism spectrum disorder and autistic traits and the origin of mutations. Hum. Mutat. 38, 1002–1013 (2017).

  15. McNulty, S. N. et al. Diagnostic utility of next-generation sequencing for disorders of somatic mosaicism: a five-year cumulative cohort. Am. J. Hum. Genet. 105, 734–746 (2019).

  16. Wang, Y. et al. Comprehensive identification of somatic nucleotide variants in human brain tissue. Genome Biol. 22, 92 (2021).

  17. Huang, A. Y. et al. Postzygotic single-nucleotide mosaicisms in whole-genome sequences of clinically unremarkable individuals. Cell Res. 24, 1311–1327 (2014).

  18. Huang, A. Y. et al. Distinctive types of postzygotic single-nucleotide mosaicisms in healthy individuals revealed by genome-wide profiling of multiple organs. PLoS Genet. 14, e1007395 (2018).

  19. Breuss, M. W. et al. Somatic mosaicism reveals clonal distributions of neocortical development. Nature 604, 689–696 (2022).

  20. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).

  21. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J. & Wojna, Z. Rethinking the inception architecture for computer vision. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (eds. Bajcsy, R., Li, F.F., & Tuytelaars, T.) 2818–2826 (IEEE, 2016).

  22. He, K., Zhang, X., Ren, S. & Sun, J. Deep Residual Learning for Image Recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (eds. Bajcsy, R., Li, F.F., & Tuytelaars, T.) 770–778 (IEEE, 2016).

  23. Iandola, F. et al. Densenet: implementing efficient convnet descriptor pyramids. Preprint at https://arxiv.org/abs/1404.1869 (2014).

  24. Tan, M. & Le, Q. V. Efficientnet: rethinking model scaling for convolutional neural networks. PMLR 97, 6105–6114 (2019).

  25. Springenberg, J. T., Dosovitskiy, A., Brox, T. & Riedmiller, M. Striving for simplicity: the all convolutional net. Preprint at https://arxiv.org/abs/1412.6806 (2014).

  26. Ewing, A. D. et al. Combining tumor genome simulation with crowdsourcing to benchmark somatic single-nucleotide-variant detection. Nat. Methods 12, 623–630 (2015).

  27. Krusche, P. et al. Best practices for benchmarking germline small-variant calls in human genomes. Nat. Biotechnol. 37, 555–560 (2019).

  28. Breuss, M. W. et al. Autism risk in offspring can be assessed through quantification of male sperm mosaicism. Nat. Med. 26, 143–150 (2020).

  29. Pelorosso, C. et al. Somatic double-hit in MTOR and RPS6 in hemimegalencephaly with intractable epilepsy. Hum. Mol. Genet. 28, 3755–3765 (2019).

  30. Fan, Y. et al. MuSE: accounting for tumor heterogeneity using a sample-specific error model improves sensitivity and specificity in mutation calling from sequencing data. Genome Biol. 17, 178 (2016).

  31. Larson, D. E. et al. SomaticSniper: identification of somatic point mutations in whole genome sequencing data. Bioinformatics 28, 311–317 (2012).

  32. Koboldt, D. C. et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22, 568–576 (2012).

  33. Radenbaugh, A. J. et al. RADIA: RNA and DNA integrated analysis for somatic mutation detection. PLoS ONE 9, e111516 (2014).

  34. Ellrott, K. et al. Scalable open science approach for mutation calling of tumor exomes using multiple genomic pipelines. Cell Syst. 6, 271–281 e277 (2018).

  35. Sahraeian, S. M. E. et al. Deep convolutional neural networks for accurate somatic mutation detection. Nat. Commun. 10, 1041 (2019).

  36. Zink, F. et al. Clonal hematopoiesis, with and without candidate driver mutations, is common in the elderly. Blood 130, 742–752 (2017).

  37. Lawson, A. R. J. et al. Extensive heterogeneity in somatic mutation and selection in the human bladder. Science 370, 75–82 (2020).

  38. Xia, Y., Liu, Y., Deng, M. & Xi, R. Pysim-sv: a package for simulating structural variation data with GC-biases. BMC Bioinf. 18, 53 (2017).

  39. Koressaar, T. & Remm, M. Enhancements and modifications of primer design program Primer3. Bioinformatics 23, 1289–1291 (2007).

  40. Hansen, R. S. et al. Sequencing newly replicated DNA reveals widespread plasticity in human replication timing. Proc. Natl Acad. Sci. USA 107, 139–144 (2010).

  41. Chung, C. et al. Comprehensive multiomic profiling of somatic mutations in malformations of cortical development. Nat. Genet. (in the press).

Acknowledgements

We thank Y. Dou for helping to set up the MosaicForecast pipeline. We thank M. K. Gilson for the help with computational resources. We thank P. J. Park, G. W. Cottrell, J. V. Moran, M. Gymrek, P. J. Reed, A. Y. Huang, S.-J. Cheng and Y. Chen for their valuable comments, help and suggestions. This work was supported by the National Institute of Mental Health (NIMH) (grant nos. U01MH108898 and R01MH124890 to J.G.G.), Rady Children’s Institute for Genomic Medicine and the Howard Hughes Medical Institute. We thank San Diego Supercomputer Center (grant no. TG-IBN190021 to X.Y. and J.G.G.) for computational help. This publication includes data generated at the UC San Diego IGM Genomics Center using an Illumina NovaSeq 6000 platform that was purchased with funding from a National Institutes of Health SIG grant (no. S10OD026929 X.Y. and J.G.G.).

Author information

Contributions

X.Y., X.X. and J.G.G. conceived this project with input from M.W.B. and D.A. X.Y. designed the study and managed the project. X.X. implemented the image representation and neural network classifier under supervision and instruction by X.Y. X.Y., C.L., X.X., J.S. and Y.C. generated and collected all the training and benchmark data with help from D.A., R.D.G., L.W. and L.B.A. X.X. performed the training and model selection under supervision by X.Y. The independent dataset was processed by M.W.B., D.A. and R.D.G. under supervision by J.L.S. and J.G.G. X.Y. and M.W.B. performed the validation experiments with help from L.L.B. and C.C. X.Y. and X.X. wrote the original and revised manuscript with input from all listed authors. X.Y. and J.G.G. revised and edited the manuscript. DeepMosaic is benchmarked on part of the BSMN Reference Tissue Project and common analysis pipeline for SNVs, contributed by Y.W. and T.B. under supervision by A.A., and on the BSMN capstone project, contributed by M.W.B., X.Y., D.A. and X.X. under supervision by J.G.G. All authors discussed the results and contributed to the final manuscript.

Corresponding authors

Correspondence to Xiaoxu Yang or Joseph G. Gleeson.

Ethics declarations

Competing interests

L.B.A. is a compensated consultant and has equity interest in io9, LLC. His spouse is an employee of Biotheranostics, Inc. L.B.A. is an inventor of a US Patent 10,776,718 and he also declares US provisional applications with serial numbers: 63/289,601; 63/269,033; 63/366,392 and 63/367,846. All other authors declare no competing interests.

Peer review

Peer review information

Nature Biotechnology thanks Anders Skanderup, Moritz Gerstung and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Training strategies and examples of training data for DeepMosaic.

(a) More than 200,000 training and validation variants were generated for DeepMosaic, including computational simulations (SimData1) and biologically validated variants from existing studies with manually curated technical artifacts (BioData1). We further included one gold-standard dataset for testing and model selection (BioData2); all selected positive or negative variants underwent amplicon sequencing in at least one tissue sample according to the publication. We further included independent simulated data (SimData2 and SimData3) and validated independent biological data (BioData3-WGS, BioData4-WGS and BioData5-WES) to benchmark DeepMosaic. (b) The overall strategies of model training and benchmarking for each tested model. (c) The probability density distribution of expected AFs for different variants from the training set. Red: reference homozygous variants and technical artifacts are labeled ‘Negative’ in the training set. Green: heterozygous variants are also labeled ‘Negative’ in the training set. Blue: true mosaic variants are labeled ‘Positive’ in the training set. (d) Two examples of false-positive variants with different sequencing artifacts. Left: multiple alternative alleles from sequencing bias or alignment artifacts; right: reads truncated because of sequencing or alignment artifacts. (e) All training images were down-sampled or up-sampled to read depths of 30×, 50×, 100×, 150×, 200×, 250×, 300×, 400× and 500×. Examples from the simulated data, with mutant allelic fractions (AFs) set to 1%, 2%, 3%, 4%, 5%, 10%, 15%, 20% and 25%, are shown.
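To give a feel for the depth and AF grid in panel (e), the following sketch tabulates how likely a mosaic site is to show at least a few alternate-allele reads at each listed depth and AF. It assumes a plain binomial sampling model, which is a simplification introduced here and not the simulation procedure used for DeepMosaic; the function names and the threshold of 3 supporting reads are hypothetical.

```python
import math

# Read depths and allelic fractions listed in panel (e) of the caption.
DEPTHS = [30, 50, 100, 150, 200, 250, 300, 400, 500]
AFS = [0.01, 0.02, 0.03, 0.04, 0.05, 0.10, 0.15, 0.20, 0.25]


def expected_alt_reads(depth: int, af: float) -> float:
    """Mean number of alternate-allele reads at a site."""
    return depth * af


def prob_at_least(k: int, depth: int, af: float) -> float:
    """P(at least k alt reads) under Binomial(depth, af). Assumed model."""
    below = sum(
        math.comb(depth, i) * af**i * (1 - af) ** (depth - i) for i in range(k)
    )
    return 1.0 - below


# One row per depth, one column per AF; threshold of 3 reads is illustrative.
for depth in DEPTHS:
    row = " ".join(f"{prob_at_least(3, depth, af):.2f}" for af in AFS)
    print(f"{depth:>4}x {row}")
```

Under this model, a 1% AF variant at 30× has an expected alternate read count of 0.3, so most such sites would show no supporting read at all, which is one reason low-AF calls depend so strongly on depth.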

Extended Data Fig. 2 Network model selection based on an independent gold-standard testing set.

(a) Comparison of network structures implementing a variety of classification algorithms. For different build versions of EfficientNet, only a general structure is shown. Inception v3 was used in DeepVariant, and ResNet was used in NeuSomatic. (b) All models were trained on 180,000 training variants from BioData1 and SimData1 until they reached a training accuracy > 0.9. Accuracy, Matthews correlation coefficient (MCC) and sensitivity are shown for different network structures trained on the same data for different numbers of epochs. EfficientNet-b4 trained for 6 epochs demonstrated the highest accuracy, MCC and TPR (true positive rate, sensitivity) on the gold-standard validation set (ref. 16; BioData2); thus it was used as the default core model for DeepMosaic. We additionally provide an option for experienced users to train their own models with self-labeled training data. (c) EfficientNet-b4 models were trained on 5 additional datasets, each for 15 epochs. The training datasets were generated with different compositions of biologically validated data and simulated data. Models trained only on simulated data showed overall higher sensitivity but much lower specificity on the gold-standard evaluation set (BioData2) due to the high fraction of false-positive calls. Models trained only on biological data showed similar overall performance compared with models trained on a mixture of biological and simulated data. All three training sets were generated with the same number of positive and negative data points as the biological data and with the same number of total variants. M2S2 Positive: training variants were labeled positive by both MuTect2 and Strelka2. n = 15; the boundaries of each violin plot are the range; for data in the inner boxplot, the center is the median, the upper bound is the upper hinge (75% quantile), the lower bound is the lower hinge (25% quantile), the lower whisker represents the lower hinge – 1.5*IQR and the upper whisker represents the upper hinge + 1.5*IQR.
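The three metrics used for model selection in panel (b) can all be derived from a confusion matrix. The sketch below shows the standard definitions; the function name and the counts are hypothetical and chosen only for illustration, not taken from the DeepMosaic evaluation.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, sensitivity (TPR) and MCC from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # MCC is conventionally set to 0 when any marginal total is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity, "mcc": mcc}


# Hypothetical counts, for illustration only (not from the paper).
metrics = classification_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when positive and negative variants are unevenly represented.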

Extended Data Fig. 3 The convolutional neural network of the DeepMosaic default model and gradient visualization with guided backpropagation for the DeepMosaic default model (EfficientNet-b4).

(a) Down-sampled and up-sampled image files coded from the original BAM files were used as input. 16 mobile convolutional layers were adapted from EfficientNet-b4, with optimized parameter size and structures. Numbers represent the dimensions of trained hyperparameters. (b) A mosaic, a homozygous, and a heterozygous variant with artifacts, as well as a technical artifact, are shown here for the gradient visualization with the guided backpropagation method (ref. 25) implemented for the DeepMosaic core model, EfficientNet-b4 trained at epoch 6. Left: image coding; right: gradient heatmap. The edges of bases, the sequence information, as well as other high-dimensional information, are highlighted by the model.

Extended Data Fig. 4 Performance of DeepMosaic default model (EfficientNet-b4) on data hidden from training.

(a) Receiver operating characteristic (ROC) curve for DeepMosaic. True positive rates (TPR) and false-positive rates (FPR) were evaluated from 20,265 variants (BioData1 and SimData1) hidden from model training and model selection. Colors show groups of intended read depth. (b) Precision-recall curves for DeepMosaic, evaluated from the 20,265 hidden variants; dots show the performance of the default parameters for DeepMosaic-CM. (c) ROC curve for DeepMosaic. TPR and FPR were evaluated from 20,265 variants (BioData1 and SimData1) hidden from model training and model selection. Colors show bins of different expected AFs. (d) Precision-recall curves for DeepMosaic, evaluated from the 20,265 hidden variants; dots show the performance of the default parameters for DeepMosaic-CM for different AF bins. Iso-F1 curves, connecting precision-recall pairs with identical F1 scores, are shown and labeled in (b) and (d).
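The iso-F1 curves in panels (b) and (d) follow directly from the definition of the F1 score. As a minimal sketch (function names are hypothetical), fixing F1 and solving F1 = 2PR/(P + R) for precision gives P = F1·R/(2R − F1), which is defined only for recall values above F1/2:

```python
from typing import Optional


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def iso_f1_precision(f1: float, recall: float) -> Optional[float]:
    """Precision lying on the iso-F1 curve at a given recall.

    Returns None where no precision in (0, 1] can reach the requested F1.
    """
    denom = 2 * recall - f1
    if denom <= 0:
        return None
    precision = f1 * recall / denom
    return precision if precision <= 1.0 else None


# Trace one iso-F1 curve (F1 = 0.8) across a grid of recall values.
for recall in [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]:
    print(recall, iso_f1_precision(0.8, recall))
```

For example, plugging in the positive predictive value (0.96) and sensitivity (0.78) quoted in the abstract gives `f1_score(0.96, 0.78)` of roughly 0.86; this is a derived figure, not one reported by the authors.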

Extended Data Fig. 5 Performance of DeepMosaic and other mosaic variant callers on SimData2.

Sensitivity of DeepMosaic and other mosaic callers on 439,200 independently simulated benchmark variants (SimData2) at simulated read depths and AFs. DeepMosaic performed equally well or better than other tested methods, especially at lower expected AFs. The true positive sites to calculate sensitivity do not include variants that fall into genomic repetitive regions.

Extended Data Fig. 6 Sensitivity and specificity of DeepMosaic and other mosaic variant callers on BioData4.

Sensitivity and specificity were calculated from the orthogonal validation experiment of 239 variants from BioData4. Mosaic variant detection was carried out with DeepMosaic, MosaicForecast, MosaicHunter, MuTect2, NeuSomatic, and Strelka2 on 16 WGS samples sequenced at 200×. Raw variant calls are provided in Supplementary Table 1, and a summary of performance is provided in Supplementary Table 3. SM: single mode, variant calling without control; PM: paired mode, variant calling by comparing the sequences between two samples.

Extended Data Fig. 7 Comparison of DeepMosaic and traditional mosaic variant calling strategies on a WGS biological dataset (BioData4).

(a) The mosaic variant calling strategy (M2S2MH) used in a previous publication (ref. 28) is shown alongside the DeepMosaic and MosaicForecast (ref. 13) strategies. (b) Schematics for amplicon validation. Primers were designed for different candidates and amplicons were collected for Illumina sequencing. Information from aligned reads was calculated and genotypes were determined. (c) Venn diagram of the experimentally validated results and the portions of variants from different study strategies. DeepMosaic demonstrated a 96.3% (158/164) validation rate. Of the 819 variants identified by DeepMosaic, 33.0% (271/819) were missed by the MuTect2-Strelka2-MosaicHunter pipeline, with a validation rate of 97.26% (71/73), and 21.0% (172/819) were missed by the MosaicForecast pipeline, with a validation rate of 97.06% (33/34). (d) Examples of validated variants called by DeepMosaic and MosaicForecast (i), only by DeepMosaic (ii), or by DeepMosaic and other traditional methods (iii).
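The validation rates in panel (c) are simple ratios of validated to experimentally tested variants. The short check below recomputes them from the counts quoted in the caption; the dictionary labels and function name are hypothetical.

```python
# (validated, tested) counts quoted in panel (c) of the caption.
validation_counts = {
    "DeepMosaic, all tested calls": (158, 164),
    "DeepMosaic calls missed by M2S2MH": (71, 73),
    "DeepMosaic calls missed by MosaicForecast": (33, 34),
}


def validation_rate(validated: int, tested: int) -> float:
    """Percentage of experimentally tested variants that validated."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return 100.0 * validated / tested


rates = {
    label: round(validation_rate(validated, tested), 2)
    for label, (validated, tested) in validation_counts.items()
}
for label, rate in rates.items():
    print(f"{label}: {rate}%")
```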

Extended Data Fig. 8 Comparison of features of variants called by DeepMosaic and other pipelines.

(a) Different overlapping groups of variants detected by the 3 pipelines were separated into 7 groups. (b) DeepMosaic-specific (G1) variants present similar base-substitution features compared with variants detected by the MuTect2-Strelka2-MosaicHunter combined pipeline as well as the MosaicForecast pipeline (G2-G7). (c) Allelic fractions of the variants detected in the original WGS sample showed that DeepMosaic-specific variants (G1, G2, and G4) had a significantly lower average AF than variants detectable by all 3 pipelines (G3, p < 2.2e-16 by a two-tailed Wilcoxon rank sum test with continuity correction) and a lower average AF than variants detectable only in other pipelines (G5, G6, and G7, p = 0.0027 by a two-tailed Wilcoxon rank sum test with continuity correction; n = 160 for G1; n = 99 for G2; n = 548 for G3; n = 12 for G4; n = 203 for G5; n = 143 for G6; n = 130 for G7; for data in the inner boxplot, the center is the median, the upper bound is the upper hinge (75% quantile), the lower bound is the lower hinge (25% quantile), the lower whisker represents the lower hinge – 1.5*IQR, the upper whisker represents the upper hinge + 1.5*IQR, and the boundary of the violin plot is the range). (d) Recovery rate of DeepMosaic, M2S2MH, and MosaicForecast at different depths from downsampling of BioData3. DeepMosaic showed a similar variant recovery rate compared with M2S2MH and MosaicForecast, even when considering the lower AF variants detected by DeepMosaic.
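The boxplot elements described in panel (c) can be computed as follows. This is a sketch that follows the caption's wording literally (whiskers at hinge ± 1.5*IQR); it assumes the hinges are taken as the 25% and 75% quantiles using the inclusive method of Python's `statistics.quantiles`, which may differ slightly from the plotting software the authors used. The function name and the AF values are hypothetical.

```python
import statistics


def boxplot_summary(values: list) -> dict:
    """Median, hinges, 1.5*IQR whisker positions and range of a sample."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "median": median,
        "lower_hinge": q1,
        "upper_hinge": q3,
        "lower_whisker": q1 - 1.5 * iqr,
        "upper_whisker": q3 + 1.5 * iqr,
        "range": (min(values), max(values)),  # violin plot boundary
    }


# Hypothetical allelic fractions (in percent), for illustration only.
summary = boxplot_summary([1, 2, 3, 4, 5])
print(summary)
```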

Extended Data Fig. 9 Enrichment of genomic features for variants called by DeepMosaic and conventional methods.

(a) Variants called from different pipelines shared similar variant types and contributions. The groups are defined the same as in Extended Data Fig. 8a. The relative contribution of different types of MVs is stable between different variant groups. (b) Enrichment analysis of variants in different genomic features. Unlike the variants shared with other callers, DeepMosaic-specific (G1) variants present depletion in high nucleosome occupancy regions. 10,000 permutations were carried out on randomly selected gnomAD variants; significant comparisons are shown in pink. Overall, DeepMosaic-specific variants (G1) do not show significantly different genomic features compared with permutation intervals.
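A permutation-based enrichment test of the kind mentioned in panel (b) can be sketched as below. The caption does not spell out the exact procedure, so everything here is an assumption: the background is a toy list standing in for gnomAD variants, each flagged by whether it overlaps a genomic feature, and the two-sided empirical p-value with add-one correction is one common choice among several.

```python
import random


def permutation_p_value(observed_hits, background, sample_size,
                        n_perm=10_000, seed=0):
    """Two-sided empirical p-value for the number of variants in a feature.

    background: list of booleans, True if a background variant overlaps
    the feature. Each permutation draws sample_size background variants.
    """
    rng = random.Random(seed)
    null_hits = [
        sum(rng.sample(background, sample_size)) for _ in range(n_perm)
    ]
    n_low = sum(hits <= observed_hits for hits in null_hits)
    n_high = sum(hits >= observed_hits for hits in null_hits)
    # Add-one correction keeps the estimate away from exactly zero.
    return min(1.0, 2 * (min(n_low, n_high) + 1) / (n_perm + 1))


# Toy background: 1,000 hypothetical variants, 30% overlapping the feature.
background = [True] * 300 + [False] * 700
p_depleted = permutation_p_value(5, background, sample_size=50, n_perm=2000)
p_typical = permutation_p_value(15, background, sample_size=50, n_perm=2000)
print(p_depleted, p_typical)
```

In this toy setup, 5 overlaps out of 50 sampled variants is far below the roughly 15 expected by chance and yields a small p-value (depletion), whereas 15 overlaps is indistinguishable from the background.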

Extended Data Fig. 10 Comparison of DeepMosaic and traditional mosaic variant calling strategies on a WES biological dataset (BioData5), and the computational resources required for WES (BioData6) and WGS (BioData4).

(a) The mosaic variant calling strategy established in the previous publication (GATK HaplotypeCaller with ‘ploidy’ 50 and heuristic filters) compared with the DeepMosaic strategy. (b) Venn diagram of the experimentally validated results and the portions of variants from different study strategies. DeepMosaic demonstrated a 43.1% (25/58) validation rate, significantly outperforming the 17.6% (44/250) validation rate established before (ref. 16). (c) DeepMosaic consumes on average 1403.8 (range 9.1–50168.9) seconds to run an exome and 22718.2 (range 6565.8–60800.0) seconds for a 300× genome, on a 12-core CPU node. (d) DeepMosaic consumes an average of 1.3 Gb (range 0.9 Gb–1.8 Gb) maximum memory for an exome and an average of 1.2 Gb (range 1.1 Gb–1.3 Gb) for a genome. Some exomes required more resources than others and formed a bimodal distribution, but the cause for this was not explored. Results were calculated from real data run at the San Diego Supercomputer Center. For data in (c) and (d), the upper and lower boundaries of the violin plot are the range.

Supplementary information

Supplementary Information

Extended Data Figs. 1–10, Tables 1–5 and Text.

Reporting Summary

Supplementary Tables

Supplementary Tables 1–5.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yang, X., Xu, X., Breuss, M.W. et al. Control-independent mosaic single nucleotide variant detection with DeepMosaic. Nat Biotechnol 41, 870–877 (2023). https://doi.org/10.1038/s41587-022-01559-w

  • DOI: https://doi.org/10.1038/s41587-022-01559-w
