Picky comprehensively detects high-resolution structural variants in nanopore long reads

Abstract

Acquired genomic structural variants (SVs) are major hallmarks of cancer genomes, but they are challenging to reconstruct from short-read sequencing data. Here we exploited the long reads of the nanopore platform using our customized pipeline, Picky (https://github.com/TheJacksonLaboratory/Picky), to reveal SVs of diverse architecture in a breast cancer model. We identified the full spectrum of SVs with superior specificity and sensitivity relative to short-read analyses, and uncovered repetitive DNA as the major source of variation. Examination of genome-wide breakpoints at nucleotide resolution revealed micro-insertions as a common structural feature associated with SVs. Breakpoint density across the genome was associated with the propensity for interchromosomal connectivity and was enriched in promoters and transcribed regions of the genome. Furthermore, through phased SVs, we observed an over-representation of reciprocal translocations arising from chromosomal double-crossovers. We demonstrate that Picky analysis is an effective tool for comprehensive detection of SVs in cancer genomes from long-read data.

Fig. 1: A customized pipeline for long-read SV analysis.
Fig. 2: The sensitivity of the Picky pipeline in SV detection.
Fig. 3: Long reads uncover repeat-rich SVs and the presence of micro-insertions within SV junctions.
Fig. 4: Analysis of the genomic distribution of breakpoints and their affected genes.

References

  1. Sudmant, P. H. et al. An integrated map of structural variation in 2,504 human genomes. Nature 526, 75–81 (2015).

  2. Feuk, L., Carson, A. R. & Scherer, S. W. Structural variation in the human genome. Nat. Rev. Genet. 7, 85–97 (2006).

  3. 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).

  4. Bochukova, E. G. et al. Large, rare chromosomal deletions associated with severe early-onset obesity. Nature 463, 666–670 (2010).

  5. Diskin, S. J. et al. Copy number variation at 1q21.1 associated with neuroblastoma. Nature 459, 987–991 (2009).

  6. Edwards, P. A. Fusion genes and chromosome translocations in the common epithelial cancers. J. Pathol. 220, 244–254 (2010).

  7. Menghi, F. et al. The tandem duplicator phenotype as a distinct genomic configuration in cancer. Proc. Natl. Acad. Sci. USA 113, E2373–E2382 (2016).

  8. Weischenfeldt, J., Symmons, O., Spitz, F. & Korbel, J. O. Phenotypic impact of genomic structural variation: insights from and for human disease. Nat. Rev. Genet. 14, 125–138 (2013).

  9. Stankiewicz, P. & Lupski, J. R. Structural variation in the human genome and its role in disease. Annu. Rev. Med. 61, 437–455 (2010).

  10. Chaisson, M. J. et al. Resolving the complexity of the human genome using single-molecule sequencing. Nature 517, 608–611 (2015).

  11. Goodwin, S., McPherson, J. D. & McCombie, W. R. Coming of age: ten years of next-generation sequencing technologies. Nat. Rev. Genet. 17, 333–351 (2016).

  12. Alkan, C., Coe, B. P. & Eichler, E. E. Genome structural variation discovery and genotyping. Nat. Rev. Genet. 12, 363–376 (2011).

  13. Mills, R. E. et al. Mapping copy number variation by population-scale genome sequencing. Nature 470, 59–65 (2011).

  14. Sović, I. et al. Fast and sensitive mapping of nanopore sequencing reads with GraphMap. Nat. Commun. 7, 11307 (2016).

  15. Spies, N. et al. Genome-wide reconstruction of complex structural variants using read clouds. Nat. Methods 14, 915–920 (2017).

  16. Cretu Stancu, M. et al. Mapping and phasing of structural variation in patient genomes using nanopore sequencing. Nat. Commun. 8, 1326 (2017).

  17. Sedlazeck, F. J. et al. Accurate detection of complex structural variations using single molecule sequencing. bioRxiv Preprint at https://www.biorxiv.org/content/early/2017/07/28/169557 (2017).

  18. Jain, M. et al. Improved data analysis for the MinION nanopore sequencer. Nat. Methods 12, 351–356 (2015).

  19. Deamer, D., Akeson, M. & Branton, D. Three decades of nanopore sequencing. Nat. Biotechnol. 34, 518–524 (2016).

  20. Jain, M., Olsen, H. E., Paten, B. & Akeson, M. The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome Biol. 17, 239 (2016).

  21. Jain, M. et al. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat. Biotechnol. 36, 338–345 (2018).

  22. Gazdar, A. F. et al. Characterization of paired tumor and non-tumor cell lines established from patients with breast cancer. Int. J. Cancer 78, 766–774 (1998).

  23. Li, H. Minimap2: fast pairwise alignment for long nucleotide sequences. arXiv Preprint at https://arxiv.org/abs/1708.01492 (2017).

  24. Kiełbasa, S. M., Wan, R., Sato, K., Horton, P. & Frith, M. C. Adaptive seeds tame genomic sequence comparison. Genome Res. 21, 487–493 (2011).

  25. Frith, M. C., Hamada, M. & Horton, P. Parameters for accurate genome alignment. BMC Bioinformatics 11, 80 (2010).

  26. Layer, R. M., Chiang, C., Quinlan, A. R. & Hall, I. M. LUMPY: a probabilistic framework for structural variant discovery. Genome Biol. 15, R84 (2014).

  27. Stephens, P. J. et al. Complex landscapes of somatic rearrangement in human breast cancer genomes. Nature 462, 1005–1010 (2009).

  28. Pendleton, M. et al. Assembly and diploid architecture of an individual human genome via single-molecule technologies. Nat. Methods 12, 780–786 (2015).

  29. Bignell, G. R. et al. Architectures of somatic genomic rearrangement in human cancer amplicons at sequence-level resolution. Genome Res. 17, 1296–1303 (2007).

  30. Campbell, P. J. et al. Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nat. Genet. 40, 722–729 (2008).

  31. Cahill, D., Connor, B. & Carney, J. P. Mechanisms of eukaryotic DNA double strand break repair. Front. Biosci. 11, 1958–1976 (2006).

  32. Howarth, K. D. et al. Array painting reveals a high frequency of balanced translocations in breast cancer cell lines that break in cancer-relevant genes. Oncogene 27, 3345–3359 (2008).

  33. Kalhor, R., Tjong, H., Jayathilaka, N., Alber, F. & Chen, L. Genome architectures revealed by tethered chromosome conformation capture and population-based modeling. Nat. Biotechnol. 30, 90–98 (2011).

  34. Branco, M. R. & Pombo, A. Intermingling of chromosome territories in interphase suggests role in translocations and transcription-dependent associations. PLoS Biol. 4, e138 (2006).

  35. Tjong, H. et al. Population-based 3D genome structure analysis reveals driving forces in spatial genome organization. Proc. Natl. Acad. Sci. USA 113, E1663–E1672 (2016).

  36. Chung, I. F. et al. DriverDBv2: a database for human cancer driver gene research. Nucleic Acids Res. 44, D975–D979 (2016).

Acknowledgements

The authors thank P. Shreckengast for collecting the HCC1187 cells; C. Robinett and A. Lau for their comments on the manuscript; and B. Hanson and M. Bolisetty for their help in setting up the initial nanopore runs. Research reported in this publication was partially supported by the National Cancer Institute of the National Institutes of Health under Award Number P30CA034196. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Author information

Contributions

L.G., C.-H.W., and C.-L.W. designed the experiment, analyzed the data, and wrote the manuscript. L.G. performed the experiments. C.-H.W. developed the Picky pipeline. W.-C.C. analyzed the TCGA data. H.T. performed the ICP analysis. F.M., C.Y.N., E.T.L., and C.-L.W. contributed to manuscript preparation.

Corresponding author

Correspondence to Chia-Lin Wei.

Ethics declarations

Competing interests

L.G., C.-H.W., and C.-L.W. have received a few batches of reagent from Oxford Nanopore. C.-L.W. has received travel and accommodation support from Oxford Nanopore as an invited speaker at the Oxford Nanopore user meeting.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Correlation between read length and percentage of reads with breakpoints.

Each blue dot represents a single 2D nanopore run (N = 13).
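The relationship in Supplementary Figure 1 can be summarized with a correlation coefficient. The following is a minimal sketch, assuming that each run is reduced to one (mean read length, percentage of reads with breakpoints) pair and that Pearson's r is an appropriate summary; the caption does not state which statistic was used, and the values below are toy numbers, not data from this study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy per-run summaries (one pair per run); illustrative only.
mean_read_len_kb = [4.1, 5.3, 6.0, 7.2, 8.5]
pct_reads_with_bp = [2.0, 2.6, 3.1, 3.5, 4.2]
print(round(pearson_r(mean_read_len_kb, pct_reads_with_bp), 3))
```

A rank-based coefficient (Spearman) would be a reasonable alternative if the relationship is monotonic but not linear.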

Supplementary Figure 2 Analysis for phased SVs using multi-breakpoint long reads.

(a) Total counts and (b) log-likelihood of adjacent SVs phased by multi-breakpoint long reads. Red counts indicate observed > 2× expected; blue counts indicate observed < 0.5× expected. N = 2,374.
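The red/blue thresholds in Supplementary Figure 2 compare observed and expected counts of adjacent SV-type pairs. Below is a minimal sketch under an assumed null model: the expected count of an ordered pair (A, B) is taken to be the product of the marginal counts divided by the number of pairs, as if the two positions were independent. This is a hypothetical model, not necessarily the one used for the figure, and the log2 observed/expected ratio stands in for the caption's log-likelihood.

```python
import math
from collections import Counter

def obs_exp_pairs(pairs):
    """For ordered (first_type, second_type) SV pairs, compare observed counts with
    counts expected if the first and second positions were independent (assumption)."""
    n = len(pairs)
    first = Counter(a for a, _ in pairs)
    second = Counter(b for _, b in pairs)
    observed = Counter(pairs)
    result = {}
    for a in first:
        for b in second:
            expected = first[a] * second[b] / n  # always > 0 here
            obs = observed.get((a, b), 0)
            ratio = obs / expected
            if ratio > 2:
                flag = "over"    # red in the caption's color scheme
            elif ratio < 0.5:
                flag = "under"   # blue in the caption's color scheme
            else:
                flag = ""
            log2_ratio = math.log2(ratio) if obs else float("-inf")
            result[(a, b)] = (obs, expected, log2_ratio, flag)
    return result
```

For example, `obs_exp_pairs([("DEL", "INV")] * 6 + [("INS", "DEL")] * 2 + [("DEL", "DEL")] * 2)` flags `("INS", "DEL")` as over-represented and the unobserved `("INS", "INV")` as under-represented.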

Supplementary Figure 3 Examples of validated breakpoints and their detailed junction sequences.

Nanopore read-to-genome alignments, junction sequences and affected genes are shown for each SV class. Micro-homologous sequences shared between junctions are highlighted by red boxes. (a) TDJ. (b) INS. (c) DEL. (d) INV. (e) TLC. The t(1;8) translocation identified here is consistent with the translocation previously identified by spectral karyotyping (SKY)32, and is resolved here at base resolution. (f) PCR fragments amplified across the breakpoints of each SV shown in (a)–(e) were analyzed by Bioanalyzer (Agilent Technologies). L: molecular size markers. Independent repeats = 2.

Supplementary Figure 4 The sensitivity and specificity of the Picky-called SVs.

(a) Summary of the SVs validated by the PCR strategy. (b) Numbers of SVs called by LUMPY from short-read data at different sequencing depths. *: deletion and DEL in INDEL. **: thresholds used in SV calling by LUMPY (see Online Methods). ***: not called by the standard LUMPY pipeline. (c) Numbers of high-confidence SVs previously described in HCC1187 that were detected by nanopore sequencing.

Supplementary Figure 5 The prevalence of SV heterozygosity in the HCC1187 genome.

(a) PCR products corresponding to different haplotypes at two validated SVs. Independent repeats = 2. (b) Reads supporting both the SV and the normal genotype at the same locus, visualized in the IGV browser. (c) Heterozygosity analysis of 50 randomly selected loci for each of the seven SV types.

Supplementary Figure 6 A comprehensive comparison of SV detection in long-read and short-read analyses.

LR, long read; SR, short read. (a) Numbers of SVs found in each dataset and their overlaps. (b) Distributions of SV span size.
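Counting the overlaps between long-read and short-read call sets requires a matching rule. The sketch below assumes a common convention that the caption does not specify: two calls match if they share chromosome and SV type and have at least 50% reciprocal overlap. The `SV` record, the `min_ro` threshold and the half-open coordinates are all illustrative assumptions.

```python
from typing import List, NamedTuple

class SV(NamedTuple):
    chrom: str
    start: int   # 0-based, half-open [start, end)
    end: int
    svtype: str

def reciprocal_overlap(a: SV, b: SV) -> float:
    """Smaller of the two fractions (overlap / length of each call); 0.0 if disjoint."""
    ov = min(a.end, b.end) - max(a.start, b.start)
    if ov <= 0:
        return 0.0
    return min(ov / (a.end - a.start), ov / (b.end - b.start))

def compare_calls(lr: List[SV], sr: List[SV], min_ro: float = 0.5) -> dict:
    """Count LR calls matched by any SR call, plus unmatched calls on each side."""
    def matches(x: SV, y: SV) -> bool:
        return (x.chrom == y.chrom and x.svtype == y.svtype
                and reciprocal_overlap(x, y) >= min_ro)
    # Quadratic scan: fine for a sketch; an interval tree would scale better.
    lr_matched = [x for x in lr if any(matches(x, y) for y in sr)]
    sr_matched = [y for y in sr if any(matches(x, y) for x in lr)]
    return {
        "shared_lr": len(lr_matched),
        "lr_only": len(lr) - len(lr_matched),
        "sr_only": len(sr) - len(sr_matched),
    }
```

Breakpoint-distance matching (for example, both ends within a fixed window) is an alternative that behaves better for translocations, which have no meaningful span.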

Supplementary Figure 7 Comparison of Picky, Sniffles, and NanoSV.

Overview of the components and features of Picky, Sniffles and NanoSV. "Yes" indicates that the pipeline can report the SV type; "N/A" indicates that it cannot.

Supplementary Figure 8 The SV span distributions and the SVs enriched in repeat regions.

(a) The span distribution of DEL, INS and INDEL. (b) Relative percentages of repeats across different span sizes in simple DEL. (c) Relative percentages of repeats across different span sizes in simple INS.

Supplementary Figure 9 Selected cases of micro-insertions from nanopore results confirmed by PacBio sequencing.

(a) A 36 bp insertion associated with a 329 bp deletion on chromosome 20. (b) A 75 bp insertion associated with a 3,262 bp deletion on chromosome X.

Supplementary Figure 10 Distribution of the SV breakpoints along the genomic features of transcription.

(a) Enrichment of breakpoints from each SV class. (b) Distributions of breakpoints from different types of TDCs.

Supplementary Figure 11 Control of the multidimensional scaling (MDS) analysis.

(a) Histogram of expression of the SV-genes (log2-transformed). (b) Histogram of expression of the control genes, which have expression profiles similar to, and are equal in number to, the SV-genes (log2-transformed). (c) MDS plot of SV-gene expression by sample-wise permutation. Red, non-TNBC tissues, n = 851. Green, TNBC tissues, n = 113. (d) MDS plot of control-gene expression. All data are from the breast carcinoma (BRCA) dataset of The Cancer Genome Atlas (TCGA). Red, non-TNBC tissues, n = 851. Green, TNBC tissues, n = 113.

Supplementary Figure 12 The logic and criteria used to define seven SV types by Picky.

A high-scoring segment pair (HSP) i between a read segment and a reference segment is denoted by Q_i and S_i, respectively. A linked alignment extension with three segments has three HSPs, Q_1:S_1, Q_2:S_2 and Q_3:S_3. Each segment span is denoted by (start, end] following the UCSC 0-start, half-open coordinate system. sDiff between reference segments S_i and S_{i+1} is S_{i+1}(start) - S_i(end); qDiff between read segments Q_i and Q_{i+1} is Q_{i+1}(start) - Q_i(end).
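The sDiff and qDiff definitions above translate directly into code. This sketch computes both gaps for each consecutive pair of HSPs, assuming the HSPs are ordered along the read and lie on the same chromosome and strand. With (start, end] spans, a diff of 0 means the two segments abut, and a positive diff counts the unaligned bases between them. The `rough_label` function and its `tol` threshold are hypothetical illustrations of how the two gaps separate deletion-like from insertion-like junctions; they are not Picky's actual criteria for its seven SV types.

```python
from typing import List, NamedTuple, Tuple

class HSP(NamedTuple):
    q_start: int  # read segment Q_i, span (start, end]
    q_end: int
    s_start: int  # reference segment S_i, span (start, end]
    s_end: int

def segment_gaps(hsps: List[HSP]) -> List[Tuple[int, int]]:
    """(sDiff, qDiff) for consecutive HSPs, ordered along the read:
    sDiff = S_{i+1}(start) - S_i(end); qDiff = Q_{i+1}(start) - Q_i(end)."""
    ordered = sorted(hsps, key=lambda h: h.q_start)
    return [
        (nxt.s_start - cur.s_end, nxt.q_start - cur.q_end)
        for cur, nxt in zip(ordered, ordered[1:])
    ]

def rough_label(s_diff: int, q_diff: int, tol: int = 20) -> str:
    """Hypothetical heuristic for same-chromosome, same-strand HSPs (NOT Picky's rules)."""
    if s_diff > tol and abs(q_diff) <= tol:
        return "DEL-like"    # reference bases skipped while the read is continuous
    if q_diff > tol and abs(s_diff) <= tol:
        return "INS-like"    # extra read bases while the reference is continuous
    if s_diff > tol and q_diff > tol:
        return "INDEL-like"  # unaligned bases on both sides of the junction
    return "other"
```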

Supplementary Figure 13 Homopolymer analysis of nanopore reads.

(a) The ratio of observed to expected occurrences of all 1,024 5-mers. The four under-called homopolymers are highlighted. (b) The annotated current trace for a segment harboring a basecalled deletion. The trace clearly shows two homopolymers (marked (A)20 and (T)18) rather than a deletion flanked by (A)5 and (T)5.
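The 1,024 in panel (a) is simply 4^5, the number of possible DNA 5-mers. A minimal sketch of an observed/expected 5-mer ratio follows, assuming the expectation comes from 5-mer frequencies in the reference sequence; the caption does not say how the expected counts were derived. Homopolymer under-calling shows up as ratios below 1 for AAAAA, CCCCC, GGGGG and TTTTT.

```python
from collections import Counter
from itertools import product

def kmer_counts(seqs, k=5):
    """Count k-mers across sequences, skipping windows with non-ACGT bases."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts

def obs_exp_ratios(read_seqs, ref_seqs, k=5):
    """Frequency of each k-mer in reads divided by its frequency in the reference
    (reference-based expectation is an assumption)."""
    obs = kmer_counts(read_seqs, k)
    exp = kmer_counts(ref_seqs, k)
    n_obs = sum(obs.values())
    n_exp = sum(exp.values())
    ratios = {}
    for kmer in ("".join(p) for p in product("ACGT", repeat=k)):  # 4**5 = 1,024 5-mers
        if exp[kmer] == 0 or n_obs == 0:
            continue  # no expectation for this k-mer, or no usable read k-mers
        ratios[kmer] = (obs[kmer] / n_obs) / (exp[kmer] / n_exp)
    return ratios
```

For instance, a read in which a 10-base A run was basecalled as only 5 bases yields a ratio well below 1 for AAAAA when compared with the true reference.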

Supplementary Figure 14

Overview of the process of assigning breakpoints to their corresponding genomic features on the basis of the gene model.
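Assigning a breakpoint to a genomic feature amounts to an interval lookup against the gene model. The sketch below is a simplified, hypothetical version: the `Gene` record, the 2 kb promoter window (`PROMOTER_BP`) and the priority order promoter > exon > intron > intergenic are all assumptions for illustration, not the procedure used in the study.

```python
from typing import List, NamedTuple, Tuple

class Gene(NamedTuple):
    chrom: str
    start: int                    # 0-based, half-open [start, end)
    end: int
    strand: str                   # "+" or "-"
    exons: List[Tuple[int, int]]  # half-open exon intervals

PROMOTER_BP = 2000  # hypothetical promoter window upstream of the TSS

def assign_feature(chrom: str, pos: int, genes: List[Gene]) -> str:
    """Label one breakpoint, preferring promoter > exon > intron > intergenic
    when several genes overlap it (assumed priority)."""
    labels = set()
    for g in genes:
        if g.chrom != chrom:
            continue
        if g.start <= pos < g.end:
            in_exon = any(s <= pos < e for s, e in g.exons)
            labels.add("exon" if in_exon else "intron")
        else:
            # Upstream of the TSS depends on strand: before start on "+", after end on "-".
            if g.strand == "+":
                upstream = g.start - PROMOTER_BP <= pos < g.start
            else:
                upstream = g.end <= pos < g.end + PROMOTER_BP
            if upstream:
                labels.add("promoter")
    for feature in ("promoter", "exon", "intron"):
        if feature in labels:
            return feature
    return "intergenic"
```

A production version would index genes by chromosome in an interval tree rather than scanning the full list for every breakpoint.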

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–14

Reporting Summary

Combined Supplementary Information

Supplementary Note 1

Supplementary Table 1

Summary of the 15 nanopore runs in this study

Supplementary Table 2

Summary of the mapping and SV-calling results

Supplementary Table 3

List of seven SV types detected in nanopore data

Supplementary Table 4

SVs selected for validation analysis

Supplementary Table 5

Details of nanopore sequencing kits, devices, and software

Supplementary Table 6

List of all primers used in this study

About this article

Cite this article

Gong, L., Wong, CH., Cheng, WC. et al. Picky comprehensively detects high-resolution structural variants in nanopore long reads. Nat Methods 15, 455–460 (2018). https://doi.org/10.1038/s41592-018-0002-6

  • DOI: https://doi.org/10.1038/s41592-018-0002-6
