DOI: 10.1145/3584371.3612987
research-article
Open Access

On de novo Bridging Paired-end RNA-seq Data

Published: 04 October 2023

ABSTRACT

High-throughput short-read RNA-seq protocols often produce paired-end reads, leaving the middle portion of each fragment unsequenced. We explore whether the full-length fragments can be computationally reconstructed from the two sequenced ends in the absence of a reference genome, a problem we refer to here as de novo bridging. Solving this problem yields longer, more informative RNA-seq reads and benefits downstream RNA-seq analyses such as transcript assembly, expression quantification, and differential splicing analysis. However, de novo bridging is challenging owing to alternative splicing, transcript noise, and sequencing errors. It remains unclear whether the data provide sufficient information for accurate bridging, let alone whether efficient algorithms exist to determine the true bridges. Methods have been proposed to bridge paired-end reads in the presence of a reference genome (called reference-based bridging), but those algorithms do not scale to de novo bridging, because the underlying compacted de Bruijn graph (cdBG) used in the latter task often contains millions of vertices and edges. We designed a new truncated Dijkstra's algorithm for this problem, and proposed a novel algorithm that reuses the shortest path tree to avoid running the truncated Dijkstra's algorithm from scratch for every vertex, yielding a further speedup. These techniques result in scalable algorithms that can bridge all paired-end reads in a cdBG with millions of vertices. Our experiments showed that paired-end RNA-seq reads can be accurately bridged to a large extent. The resulting tool is freely available at https://github.com/Shao-Group/rnabridge-denovo.
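The abstract does not spell out how the Dijkstra search is truncated, so the following is only a minimal sketch of one plausible reading: a shortest-path search that stops expanding once the accumulated distance exceeds a bound (for instance, a maximum plausible fragment length). The graph encoding, the function names `truncated_dijkstra` and `path_to`, and the `max_dist` parameter are assumptions made for illustration; they are not taken from the paper or from the rnabridge-denovo code, and the shortest-path-tree reuse mentioned in the abstract is not shown.

```python
import heapq


def truncated_dijkstra(graph, source, max_dist):
    """Distance-bounded Dijkstra (illustrative sketch, not the paper's code).

    graph: dict mapping a vertex to a list of (neighbor, weight) pairs,
           with non-negative weights (hypothetical encoding of a cdBG).
    Returns (dist, parent) restricted to vertices whose shortest-path
    distance from `source` is at most `max_dist`.
    """
    dist = {source: 0}
    parent = {source: None}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path to u was already found
        for v, w in graph.get(u, ()):
            nd = d + w
            if nd > max_dist:
                continue  # truncation: never explore beyond the distance bound
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, parent


def path_to(parent, target):
    """Rebuild the source-to-target path from the shortest-path tree, or None."""
    if target not in parent:
        return None
    path = []
    while target is not None:
        path.append(target)
        target = parent[target]
    return path[::-1]


# Toy graph standing in for a tiny cdBG; weights play the role of lengths.
toy_graph = {
    "a": [("b", 2), ("c", 5)],
    "b": [("c", 1), ("d", 4)],
    "c": [("d", 10)],
    "d": [],
}
dist, parent = truncated_dijkstra(toy_graph, "a", max_dist=6)
bridge = path_to(parent, "d")
```

In this toy setting, lowering `max_dist` below the length of the only short route to `d` makes `d` unreachable, which is the point of truncation: the search touches only the neighborhood that could contain a valid bridge instead of the whole graph.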


Published in

BCB '23: Proceedings of the 14th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics
September 2023
626 pages
ISBN: 9798400701269
DOI: 10.1145/3584371

Copyright © 2023 Owner/Author(s)

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 254 of 885 submissions, 29%