Reconstruction and functional annotation of Ascosphaera apis full-length transcriptome utilizing PacBio long reads combined with Illumina short reads
Introduction
Chalkbrood is a widespread disease of the honeybee caused by Ascosphaera apis (Maassen ex Claussen) Olive and Spiltoir (Spiltoir, 1955, Spiltoir and Olive, 1955), an entomopathogenic fungus long thought to infect only western honeybee larvae. More recently, A. apis was also reported to infect the larvae of eastern honeybee drones and workers (Chen et al., 2018). This brood disease reduces colony productivity and honey production by lowering the number of newly emerged bees and, under certain circumstances, may result in colony losses (Evison, 2015).
The transcriptome provides information on the number and variety of genes expressed in a cell and reveals physiological and biochemical processes at the molecular level (Dong et al., 2018). To date, an array of technologies has been developed and applied for transcriptome sequencing. Among these, short-read sequencing (e.g., Illumina and Ion Torrent) has become a useful tool for precisely analyzing RNA transcripts and gene expression levels (Ugrappa et al., 2008, Djebali et al., 2012). However, most second-generation sequencing (also known as next-generation sequencing, NGS) platforms offer a read length shorter than the typical length of a eukaryotic mRNA, which includes a methylated cap at the 5′ end and a poly-A tail at the 3′ end. To overcome this limitation, single-molecule real-time (SMRT) sequencing (Pacific Biosciences of California, Inc., CA, USA) was developed; it produces kilobase-sized sequencing reads and can thus eliminate the need for sequence assembly (Au et al., 2013, Sharon et al., 2013). For example, the average read length of PacBio SMRT sequencing is around 10 kb and the subread length can reach up to 35 kb (Sharon et al., 2013). A full-length transcriptome based on long reads can be used for the exploration and functional characterization of genes, the collection of large-scale long-read transcripts with complete coding sequences, and the identification of gene families (Koren et al., 2012, Treutlein et al., 2014). However, the technology has a higher sequencing-error rate (~15%) than Illumina sequencing (~1%), and it cannot currently be used directly to quantify gene expression (Luo et al., 2017, Zuo et al., 2018). Fortunately, these limitations of SMRT sequencing can be mitigated algorithmically by correcting the long reads with short, high-accuracy sequencing reads (Hackl et al., 2014, Li et al., 2014a, Li et al., 2014b). Hence, hybrid data derived from SMRT and NGS can offer high-quality and more complete assemblies for genome and transcriptome studies (Huddleston et al., 2014, Xu et al., 2015).
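The benefit of correcting error-prone long reads with consensus information can be illustrated with a small calculation. The sketch below is not the algorithm of proovread or of any other tool cited here. It assumes, purely for illustration, that each base is observed n times independently with the same per-observation error rate p, and it counts a position as wrong whenever more than half of the observations are wrong, which is a conservative upper bound for a majority vote. The function name `majority_error` and the chosen values of n are hypothetical.

```python
from math import comb


def majority_error(p: float, n: int) -> float:
    """Probability that more than half of n independent observations are wrong.

    p is the per-observation error rate; n must be odd so that no ties occur.
    This is a toy model (an assumption), not a description of a real corrector.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    q = 1.0 - p
    # Sum the binomial tail: k wrong observations out of n, for k > n / 2.
    return sum(comb(n, k) * p**k * q ** (n - k) for k in range(n // 2 + 1, n + 1))


if __name__ == "__main__":
    # 0.15 and 0.01 are the approximate raw error rates quoted in the text.
    for rate in (0.15, 0.01):
        for depth in (1, 3, 5, 9):
            print(f"p={rate:.2f} n={depth}: {majority_error(rate, depth):.6f}")
```

Under these assumptions, three independent observations at a 15% error rate already lower the per-position error to about 6%, and a few high-accuracy short reads covering the same position lower it much further. Real correctors must also handle alignment ambiguity and non-independent errors, which this model ignores.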
The genome of A. apis was published in 2006 with a total size of 20.31 Mb (Qin et al., 2006). This version of the reference genome (AAP 1.0) is composed of 8092 contigs, which were further assembled into 1627 scaffolds (Qin et al., 2006); however, it is yet to be fully assembled into complete chromosomes. Transcriptome analysis is a powerful tool for uncovering the relationships between genotypes and phenotypes, leading to a better understanding of the underlying pathways and genetic mechanisms controlling cell growth, development, immune defense, and so forth (Qian et al., 2013, Chen et al., 2017b, Guo et al., 2019). Transcriptome construction and annotation, particularly for species without a reference genome or a complete genome, has greatly improved with advances in sequencing techniques and plays a critical role in gene discovery, genomic signature exploration, and genome annotation (Trapnell et al., 2010, Haas et al., 2013). Our group previously sequenced A. apis-infected honeybee larval guts using the Illumina HiSeq platform, and de novo assembled and annotated a transcriptome from the A. apis short reads (Zhang et al., 2017). Based on this reference transcriptome, we further investigated the transcriptomic dynamics and pathogenesis of A. apis during the infection of two different bee species, Apis mellifera ligustica and Apis cerana cerana (Chen et al., 2017a, Guo et al., 2017). However, it is still challenging to reliably assemble full-length transcripts from short reads, and such transcripts are essential for exploring post-transcriptional processes such as alternative splicing (AS) and alternative polyadenylation (APA).
To provide a high-quality full-length transcriptome of A. apis, in the present work A. apis mycelia were prepared and subjected to third-generation sequencing (TGS) using the PacBio Sequel™ system (PacBio, Menlo Park, CA, USA). In parallel, Illumina paired-end short RNA reads generated separately from clean mycelia of A. apis were used to support the SMRT data. The full-length transcriptome of A. apis was constructed and annotated, followed by in-depth characterization of AS and APA. In addition, long non-coding RNAs (lncRNAs) and transcription factors (TFs) were predicted and analyzed on the basis of the PacBio SMRT long reads. To the best of our knowledge, this is the first documentation of a PacBio-based transcriptomic investigation of fungi including A. apis.
Preparation of A. apis mycelia samples
A. apis was previously isolated from a fresh chalkbrood mummy of an A. m. ligustica larva (Guo et al., 2018b) and kept at the Honeybee Protection Laboratory of the College of Animal Sciences (College of Bee Science) at Fujian Agriculture and Forestry University.
A. apis was cultured at 33 ± 0.5 °C on plates of Potato-Dextrose Agar (PDA) medium (20% peeled potatoes, 2% agar, 1% dextrose, and 0.5% yeast extract) according to the methods of Jensen et al. (2013) and Li et al. (2018). One week
PacBio SMRT sequencing and error correction of long reads
The workflow of the present study is presented in Fig. 1. To obtain a representative full-length transcriptome for A. apis, the mycelia of A. apis were sequenced using the PacBio Sequel system. A total of 13,302,489 subreads (about 23.97 Gb) were obtained from the long-read sequencing, with an average read length of 1802 bp and an N50 of 3077 bp. To provide more accurate sequence information, circular consensus sequences (CCS) were generated from subreads that passed at least once through the insert, and 464,043 CCS with a
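Summary statistics such as the mean read length and N50 quoted for the subreads can be computed directly from a list of read lengths. The following is a minimal sketch, not the authors' pipeline; the function name `n50` and the toy lengths are illustrative assumptions. It uses the common definition of N50: the largest length L such that reads of length at least L contain at least half of all sequenced bases. The final lines check that the reported subread count and mean length are consistent with the reported yield of about 23.97 Gb.

```python
from statistics import mean


def n50(lengths):
    """Return the N50 of a collection of read lengths (in bp)."""
    total = sum(lengths)
    running = 0
    # Walk from the longest read down until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must contain at least one read")


# Hypothetical toy data, chosen only to make the arithmetic easy to follow.
toy_lengths = [500, 1000, 1500, 3000, 4000]
toy_mean = mean(toy_lengths)
toy_n50 = n50(toy_lengths)

# Consistency check on the figures quoted in the text:
# number of subreads times mean subread length should match the total yield.
reported_subreads = 13_302_489
reported_mean_bp = 1802
implied_total_gb = reported_subreads * reported_mean_bp / 1e9

if __name__ == "__main__":
    print(f"toy mean = {toy_mean} bp, toy N50 = {toy_n50} bp")
    print(f"implied total yield = {implied_total_gb:.2f} Gb")
```

An N50 well above the mean, as in the reported 3077 bp versus 1802 bp, indicates a length distribution in which many short subreads coexist with a smaller number of long ones that carry most of the bases.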
Discussion
PacBio SMRT sequencing captures both the 5′ and 3′ ends of cDNA molecules more completely; thus, it is a superior strategy for directly generating a comprehensive transcriptome with precise AS isoforms and novel genes (Chuang et al., 2018, Filichkin et al., 2018). Yi et al. identified 33,300 full-length transcripts (transcript N50 of 5234 bp) of Misgurnus anguillicaudatus based on SMRT sequencing, and constructed a full-length transcriptome by performing functional
Conclusions
Taken together, this work presents, for the first time, the full-length transcriptome of A. apis based on PacBio SMRT long reads and Illumina short reads, providing a reference for future studies of the A. apis transcriptome and genome. Additionally, AS and APA of A. apis genes, lncRNAs, and TFs were comprehensively investigated, uncovering the complexity of the A. apis transcriptome and improving the annotation of the current reference genome.
Author contributions
Conceptualization, D.C. and R.G. designed this study. D.C., Y.D., X.F., Z.Z., J.W., H.J., Y.F., H.C., D.Z., X.X., Q.L., C.X. and Y.Z. conducted laboratory work. R.G. and Y.D. performed bioinformatic analysis. D.C., R.G. and Y.D. supervised the work and contributed to preparation of the manuscript. All authors read and approved the final manuscript.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
We thank all editors and reviewers for their invaluable comments. We also thank Wandong Qi for her important technical assistance. Thanks to my dear wife Dr. Qian Cai and our lovely baby to be born.
Funding
This research was supported by the Earmarked Fund for China Agriculture Research System (No. CARS-44-KXJ7), the Science and Technology Planning Project of Fujian Province (No. 2018J05042), the Teaching and Scientific Research Fund of Education Department of Fujian Province (No. JAT170158), the Master Supervisor Team Fund of Fujian Agriculture and Forestry University, the Scientific and Technical Innovation Fund of Fujian Agriculture and Forestry University (No. CXZX2017342, No. CXZX2017343),
References (68)
Long non-coding RNAs and control of gene expression in the immune system
Trends Mol. Med. (2014)
Uncovering the immune responses of Apis mellifera ligustica larval gut to Ascosphaera apis infection utilizing transcriptome sequencing
Gene (2017)
Chalkbrood: epidemiological perspectives from the host-parasite relationship
Chalkbrood Disease Honey Bees (2015)
Identification of long non-coding RNAs in the chalkbrood disease pathogen Ascospheara apis
J. Invertebr. Pathol. (2018)
Transcriptomic investigation of immune responses of the Apis cerana cerana larval gut infected by Ascosphaera apis
J. Invertebr. Pathol. (2019)
Full-length transcriptome analysis of Litopenaeus vannamei reveals transcript variants involved in the innate immune system
Fish Shellfish Immunol. (2019)
De novo assembly of a reference transcriptome and development of SSR markers for Ascosphaera apis
Acta Entomol. Sin. (2017)
Isoform sequencing and state-of-art applications for unravelling complexity of plant transcriptomes
Genes (2018)
A survey of the sorghum transcriptome using single-molecule long reads
Nat. Commun. (2016)
Gene ontology: tool for the unification of biology. The Gene Ontology Consortium
Nat. Genet. (2000)
Characterization of the human ESC transcriptome by hybrid sequencing
Proc. Natl. Acad. Sci. USA
Coupling mRNA processing with transcription in time and space
Nat. Rev. Genet.
Analysis of transcripts and splice isoforms in red clover (Trifolium pratense L.) by single-molecule long-read sequencing
BMC Plant Biol.
The developmental dynamics of the Populus stem transcriptome
Plant Biotechnol. J.
Transcriptomic analysis of Ascosphaera apis stressing larval gut of Apis mellifera ligustica (Hyemenoptera: Apidae)
Acta Entomol. Sin.
Morphological and molecular identification of chalkbrood disease pathogen Ascosphaera apis in Apis cerana cerana
J. Apic. Res.
A transcriptome atlas of rabbit revealed by PacBio single-molecule long-read sequencing
Sci. Rep.
Integrative transcriptome sequencing reveals extensive alternative trans-splicing and cis-backsplicing in human cells
Nucleic Acids Res.
Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research
Bioinformatics
Landscape of transcription in human cells
Nature
Alternative cleavage and polyadenylation: extent, regulation and function
Nat. Rev. Genet.
Abiotic stresses modulate landscape of poplar transcriptome via alternative splicing, differential intron retention, and isoform ratio switching
Front. Plant Sci.
A human ESC-based screen identifies a role for the translated lncRNA LINC00261 in pancreatic endocrine differentiation
eLife
The many faces of long noncoding RNAs
FEBS J.
Widespread polycistronic transcripts in fungi revealed by single-molecule mRNA sequencing
PLoS One
Transcriptome analysis of Ascosphaera apis stressing larval gut of Apis cerana cerana
Acta Microbiol. Sin.
First identification of long non-coding RNAs in fungal parasite Nosema ceranae
Apidologie
De novo transcript sequence reconstruction from RNA-seq using the trinity platform for reference generation and analysis
Nat. Protoc.
Proovread: large-scale high-accuracy pacbio correction through iterative short read consensus
Bioinformatics
Reconstructing complex regions of genomes using long-read sequencing technology
Genome Res.
Standard methods for fungal brood disease research
J. Apic. Res.
SMRT sequencing of full-length transcriptome of flea beetle Agasicles hygrophila (Selman and Vogt)
Sci. Rep.
Hybrid error correction and de novo assembly of single-molecule sequencing reads
Nat. Biotechnol.
1 These authors contributed equally to this work.