Reconstruction and functional annotation of Ascosphaera apis full-length transcriptome utilizing PacBio long reads combined with Illumina short reads

https://doi.org/10.1016/j.jip.2020.107475

Highlights

  • A full-length transcriptome of Ascosphaera apis was constructed and annotated.

  • 1205 long non-coding RNAs were identified in Ascosphaera apis mycelium.

  • 253 members from 17 transcription factor families were identified in Ascosphaera apis.

Abstract

Ascosphaera apis is a widespread fungal pathogen of honeybee larvae that causes chalkbrood disease, leading to heavy losses for the beekeeping industry in China and many other countries. This work aimed to generate a full-length transcriptome of A. apis using PacBio single-molecule real-time (SMRT) sequencing. Here, more than 23.97 Gb of clean reads were generated from long-read sequencing of A. apis mycelia, including 464,043 circular consensus sequences (CCS) and 394,142 full-length non-chimeric (FLNC) reads. In total, we identified 174,095 high-confidence transcripts covering 5141 known genes with an average length of 2728 bp. We also discovered 2405 genic loci and 11,623 isoforms that have not yet been annotated in the current reference genome. Additionally, 16,049, 10,682, 4520 and 7253 of the discovered transcripts have annotations in the Non-redundant protein (Nr), Clusters of Eukaryotic Orthologous Groups (KOG), Gene Ontology (GO), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases, respectively. Moreover, 1205 long non-coding RNAs (lncRNAs) were identified, which have fewer exons, shorter exon and intron lengths, shorter transcript lengths, lower GC percent, lower expression levels, and fewer alternative splicing (AS) events compared with protein-coding transcripts. A total of 253 members from 17 transcription factor (TF) families were identified from our transcript datasets. Finally, the expression of A. apis isoforms was validated using a molecular approach. Overall, this is the first report of a full-length transcriptome of entomogenous fungi including A. apis. Our data offer a comprehensive set of reference transcripts and hence contribute to improving the genome annotation and transcriptomic study of A. apis.

Introduction

Chalkbrood is a widespread disease of the honeybee caused by Ascosphaera apis (Maassen ex Claussen) Olive and Spiltoir (Spiltoir, 1955, Spiltoir and Olive, 1955), an entomopathogenic fungus that exclusively infects western honeybee larvae. Recently, A. apis was reported to infect the larvae of eastern honeybee drones and workers (Chen et al., 2018). This brood disease weakens colony productivity and honey production by lowering the number of newly emerged bees and, under certain circumstances, may result in colony losses (Evison, 2015).

The transcriptome provides information on the number and variety of intracellular genes and uncovers physiological and biochemical processes at the molecular level (Dong et al., 2018). To date, an array of technologies has been developed and applied for transcriptome sequencing. Among these, short-read sequencing (i.e., Illumina and Ion Torrent) has become a useful tool for precisely analyzing RNA transcripts and gene expression levels (Ugrappa et al., 2008, Djebali et al., 2012). However, most second-generation sequencing (also known as next-generation sequencing (NGS)) platforms offer read lengths shorter than the typical length of a eukaryotic mRNA, including a methylated cap at the 5′ end and poly-A at the 3′ end. To overcome the limitation of short-read sequences, single-molecule real-time (SMRT) sequencing (Pacific Biosciences of California, Inc., CA, USA) was developed, which can produce kilobase-sized sequencing reads, thus eliminating the need for sequence assembly (Au et al., 2013, Sharon et al., 2013). For example, the average read length of PacBio SMRT sequencing is around 10 kb and the subread length can reach up to 35 kb (Sharon et al., 2013). The full-length transcriptome based on long reads can be used for the exploration and functional characterization of genes, the collection of large-scale long-read transcripts with complete coding sequences, and the identification of gene families (Koren et al., 2012, Treutlein et al., 2014). However, the technology has a high sequencing-error rate (~15%) when compared to Illumina sequencing (~1%), and it cannot currently be directly used to quantify gene expression (Luo et al., 2017, Zuo et al., 2018). Fortunately, the limitations of SMRT can be algorithmically improved and corrected by short and high-accuracy sequencing reads (Hack et al., 2014, Li et al., 2014a, Li et al., 2014b). Hence, hybrid data derived from SMRT and NGS can offer high-quality and more complete assemblies for genome and transcriptome studies (Huddleston et al., 2014, Xu et al., 2015).

The genome of A. apis was published in 2006 with a total size of 20.31 Mb (Qin et al., 2006). This version of the reference genome (AAP 1.0) is composed of 8092 contigs, which are further assembled into 1627 scaffolds (Qin et al., 2006); however, it is yet to be fully assembled into complete chromosomes. Transcriptome analysis is a powerful tool for uncovering the relationships between genotypes and phenotypes, leading to a better understanding of the underlying pathways and genetic mechanisms controlling cell growth, development, immune defense, and so forth (Qian et al., 2013, Chen et al., 2017b, Guo et al., 2019). Transcriptome construction and annotation, particularly for species without a reference genome or a complete genome, has greatly improved with the development and revolution of sequencing techniques and plays a critical role in gene discovery, genomic signature exploration, and genome annotation (Trapnell et al., 2010, Haas et al., 2013). Our group previously sequenced A. apis-infected honeybee larval guts using the Illumina HiSeq platform, and de novo assembled and annotated a transcriptome with the short reads from A. apis (Zhang et al., 2017). Based on this reference transcriptome, we further investigated the transcriptomic dynamics and pathogenesis of A. apis during the infection process of two different bee species, Apis mellifera ligustica and Apis cerana cerana (Chen et al., 2017a, Guo et al., 2017). However, it is still challenging to reliably assemble full-length transcripts from short reads, and such transcripts are essential to explore post-transcriptional processes, such as alternative splicing (AS) and alternative polyadenylation (APA) events.

To provide a high-quality full-length transcriptome of A. apis, in the present work, A. apis mycelia were prepared and subjected to third-generation sequencing (TGS) using the PacBio Sequel™ system (PacBio, Menlo Park, CA, USA). In parallel, Illumina paired short RNA reads generated separately from clean mycelia of A. apis were used to support the SMRT data. The full-length transcriptome of A. apis was constructed and annotated, followed by in-depth characterization of AS and APA. In addition, prediction and analysis of long non-coding RNAs (lncRNAs) and transcription factors (TFs) were further performed on the basis of PacBio SMRT long reads. Overall, to the best of our knowledge, this is the first documentation of a PacBio-based transcriptomic investigation of fungi including A. apis.


Preparation of A. apis mycelia samples

A. apis was previously isolated from a fresh chalkbrood mummy of A. m. ligustica larvae (Guo et al., 2018b) and kept at the Honeybee Protection Laboratory of the College of Animal Sciences (College of Bee Science) at Fujian Agriculture and Forestry University.

A. apis was cultured at 33 ± 0.5 °C on plates of Potato-Dextrose Agar (PDA) medium (20% peeled potatoes, 2% agar, 1% dextrose, and 0.5% yeast extract) according to the methods described by Jensen et al. (2013) and Li et al. (2018). One week

PacBio SMRT sequencing and error correction of long reads

The workflow of the current work is presented in Fig. 1. To obtain a representative full-length transcriptome for A. apis, the mycelia of A. apis were sequenced using the PacBio Sequel system, and a total of 13,302,489 subreads (about 23.97 Gb) were obtained from the long-read sequencing, with an average read length of 1802 bp and an N50 of 3077 bp. To provide more accurate sequence information, CCS were generated from subreads that passed at least once through the insert, and 464,043 CCS with a
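The subread statistics quoted above (read count, average length, total yield and N50) are related by simple arithmetic. The following is a minimal Python sketch of how such figures are conventionally computed, not the authors' pipeline: the function names `n50` and `total_gigabases` and the toy length list are hypothetical, and only the read count (13,302,489) and average length (1802 bp) are taken from the text.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that reads of length >= L
    together contain at least half of all sequenced bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no read lengths given")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


def total_gigabases(read_count: int, mean_length_bp: float) -> float:
    """Total yield in Gb (1 Gb = 1e9 bases) from read count and mean length."""
    return read_count * mean_length_bp / 1e9


# Toy lengths for illustration only (not data from the study).
toy_lengths = [2, 3, 4, 5, 6]
print(n50(toy_lengths))  # 5: the reads of length 6 and 5 already cover 11 of 20 bases

# Consistency check using the subread count and average length from the text.
print(round(total_gigabases(13_302_489, 1802), 2))  # 23.97
```

The second printed value reproduces the reported yield of about 23.97 Gb, which indicates that the subread count, average length and total yield in the text are mutually consistent. The N50 (3077 bp) exceeds the average length (1802 bp) because N50 weights each read by the bases it contributes, so longer reads dominate.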

Discussion

PacBio SMRT sequencing captures both the 5′ and 3′ ends of cDNA molecules more completely; thus, it is a superior strategy for the direct generation of a comprehensive transcriptome with precise AS isoforms and novel genes (Chuang et al., 2018, Filichkin et al., 2018). Yi et al. identified 33,300 full-length transcripts (transcript N50 of 5234 bp) of Misgurnus anguillicaudatus based on SMRT sequencing, and constructed a full-length transcriptome by performing functional

Conclusions

Taken together, this work presents, for the first time, a full-length transcriptome of A. apis based on PacBio SMRT long reads and Illumina short reads, providing a reference for future studies of the A. apis transcriptome and genome. Additionally, AS and APA of A. apis genes, lncRNAs, and TFs were comprehensively investigated, uncovering the complexity of the A. apis transcriptome and improving the annotation of the current reference genome.

Author contributions

D.C. and R.G. conceived and designed this study. D.C., Y.D., X.F., Z.Z., J.W., H.J., Y.F., H.C., D.Z., X.X., Q.L., C.X. and Y.Z. conducted laboratory work. R.G. and Y.D. performed bioinformatic analysis. D.C., R.G. and Y.D. supervised the work and contributed to preparation of the manuscript. All authors read and approved the final manuscript.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

We thank all editors and reviewers for their invaluable comments. We also thank Wandong Qi for her important technical assistance. Thanks to my dear wife Dr. Qian Cai and our lovely baby to be born.

Funding

This research was supported by the Earmarked Fund for China Agriculture Research System (No. CARS-44-KXJ7), the Science and Technology Planning Project of Fujian Province (No. 2018J05042), the Teaching and Scientific Research Fund of Education Department of Fujian Province (No. JAT170158), the Master Supervisor Team Fund of Fujian Agriculture and Forestry University, the Scientific and Technical Innovation Fund of Fujian Agriculture and Forestry University (No. CXZX2017342, No. CXZX2017343),

References (68)

  • K.F. Au. Characterization of the human ESC transcriptome by hybrid sequencing. Proc. Natl. Acad. Sci. USA (2013)

  • D. Bentley. Coupling mRNA processing with transcription in time and space. Nat. Rev. Genet. (2014)

  • Y.H. Chao. Analysis of transcripts and splice isoforms in red clover (Trifolium pratense L.) by single-molecule long-read sequencing. BMC Plant Biol. (2018)

  • Q. Chao. The developmental dynamics of the Populus stem transcriptome. Plant Biotechnol. J. (2019)

  • D.F. Chen. Transcriptomic analysis of Ascosphaera apis stressing larval gut of Apis mellifera ligustica (Hymenoptera: Apidae). Acta Entomol. Sin. (2017)

  • D.F. Chen. Morphological and molecular identification of chalkbrood disease pathogen Ascosphaera apis in Apis cerana cerana. J. Apic. Res. (2018)

  • S.Y. Chen. A transcriptome atlas of rabbit revealed by PacBio single-molecule long-read sequencing. Sci. Rep. (2017)

  • T.J. Chuang. Integrative transcriptome sequencing reveals extensive alternative trans-splicing and cis-backsplicing in human cells. Nucleic Acids Res. (2018)

  • A.S. Conesa. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics (2005)

  • S. Djebali. Landscape of transcription in human cells. Nature (2012)

  • J. Dong. SMRT sequencing of full-length transcriptome of flea beetle Agasicles hygrophila (Selman and Vogt). Sci. Rep. (2018)

  • R. Elkon. Alternative cleavage and polyadenylation: extent, regulation and function. Nat. Rev. Genet. (2013)

  • S.A. Filichkin. Abiotic stresses modulate landscape of poplar transcriptome via alternative splicing, differential intron retention, and isoform ratio switching. Front. Plant Sci. (2018)

  • B. Gaertner. A human ESC-based screen identifies a role for the translated lncRNA LINC00261 in pancreatic endocrine differentiation. eLife (2020)

  • A. Gardini. The many faces of long noncoding RNAs. FEBS J. (2018)

  • S.P. Gordon. Widespread polycistronic transcripts in fungi revealed by single-molecule mRNA sequencing. PLoS One (2015)

  • R. Guo. Transcriptome analysis of Ascosphaera apis stressing larval gut of Apis cerana cerana. Acta Microbiol. Sin. (2017)

  • R. Guo. First identification of long non-coding RNAs in fungal parasite Nosema ceranae. Apidologie (2018)

  • B.J. Haas. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. (2013)

  • T. Hack. Proovread: large-scale high-accuracy PacBio correction through iterative short read consensus. Bioinformatics (2014)

  • J. Huddleston. Reconstructing complex regions of genomes using long-read sequencing technology. Genome Res. (2014)

  • A.B. Jensen. Standard methods for fungal brood disease research. J. Apic. Res. (2013)

  • D. Jia. SMRT sequencing of full-length transcriptome of flea beetle Agasicles hygrophila (Selman and Vogt). Sci. Rep. (2018)

  • S. Koren. Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nat. Biotechnol. (2012)

    1

    These authors contributed equally to this work.
