Construction of a long-read reference transcriptome for P. huashanica
In order to obtain desirable transcript isoforms, high-quality RNA was extracted from four tissues of P. huashanica to generate four RNA samples. These RNA samples were pooled with the same amount of each sample. Two size-fractionated, full-length cDNA libraries (1–5 kb and 4.5–10 kb) were constructed and subsequently sequenced using the Pacific Bioscience Sequel platform with three cells (2 and 1 cell, respectively). A total of 24.15 Gb raw polymerase reads with an average of 8.05 Gb per cell and 20.77 Gb subreads with an average of 6.9 Gb per cell were collected from SMRT sequencing (Additional file : Table S1 and Figure S1). After quality control, 3,677,921,244 bp CCS (reads of insert, 1,688,359 reads) were obtained, including 1,140,528 in the 1–5 kb library and 547,831 in the 5–10 kb library (Table. 1). The ROI length distributions of each library are shown in the Additional File: Figure S2). Of these reads of insert, 507,104 (~ 30.04%) were classified as full-length and non-chimeric, and 902,746 (~ 53.47%) were classified as non-full-length reads based on whether 5’ and 3’ ends and poly A tails were detected. Only reads with two ends and a poly A tail were classified as full-length (Additional file: Figure S3). Only full-length non-chimeric reads and non-full-length reads were used in further analysis, while short reads with a length of < 300 bp and chimeric reads were discarded from the subsequent analysis.
Functional annotation of coding genes
Full-length non-chimeric reads were clustered into consensus groups. For each cluster, if there was sufficient FL and non-FL coverage, then Quiver was run to polish the consensus. Only high QV consensus sequences were used for downstream analysis. High quality consensus isoforms of each library were merged into the final results, and redundancy was removed. After clustering and polishing these three libraries, we obtained 112,596 isoforms from 177,882 isoforms (Table 2), and the length distribution of final consensus isoforms is shown in (Fig. 1).To improve PacBio transcript quality, RNA-Seq reads were used to trim all final consensus isoforms, and mapping identity, substitution, deletion, insertion and mapping coverage were obtained (Additional file: Table S2). Then, we used BLAST, BLAST2GO and InterProScan5 to perform functional annotation terms from the non-redundant (NR), NCBI nucleotide sequence (NT), Gene Ontology (GO), Cluster of Eukaryotic Orthologous Groups (KOG), Kyoto Encyclopaedia of Genes and Genomes (KEGG), SwissProt, and Interpro protein databases with transcripts (Additional file: Table S3). As a result, 103,875 (92.25%) unigenes were annotated with at least one functional database (Table 3). Also, with KOG, GO and KEGG annotation, functional distribution was statistically assessed and the results are shown in (Fig. 2). In addition, we obtained 36,283 annotated transcripts among NR, KEGG, Swissprot and Interpro (Fig. 3).
Table 1
Summary of reads of insert classification, the size fraction of F01 and G01 libraries were 1–5 kb, and the size fraction of H01 library was 4.5–10 kb
Sample | Library | reads of insert | five prime reads | three prime reads | poly-A reads | full-length non-chimeric reads | full-length non-chimeric read length(bp) |
Psathyrostachys_huashanica | F01 | 565467 | 301,093(53.25%) | 316,921(56.05%) | 292,646(51.75%) | 194,333(34.37%) | 451.5 |
Psathyrostachys_huashanica | G01 | 575061 | 345,129(60.02%) | 360,354(62.66%) | 333,960(58.07%) | 232,059(40.35%) | 457.5 |
Psathyrostachys_huashanica | H01 | 547831 | 267,902(48.9%) | 269,092(49.12%) | 139,934(25.54%) | 80,712(14.73%) | 2723 |
Table 2
Summary of the number of isoforms, high-quality sequences from sLibrary were clusted and corrected to create Final isoforms
Library | | sLibrary_isoforms | Final_isoforms | |
Psathyrostachys_huashanica_F01 | | 70137 | 49273 | |
Psathyrostachys_huashanica_H01 | | 21969 | 3158 | |
Psathyrostachys_huashanica_G01 | | 85776 | 60165 | |
TOTAL | | 177882 | 112596 | |
Table 3
Summary of functional annotation result, final isoforms were used to perform functional annotation with seven databases (Nr, Nt, Swissprot, KEGG, KOG, Interpro, GO)
Values | Total | Nr | Nt | Swissprot | KEGG | KOG | Interpro | GO | Intersection | Overall |
Number | 112,596 | 92,515 | 95,790 | 65,275 | 67,659 | 67,051 | 54,331 | 45,466 | 20,919 | 103,875 |
Percentage | 100% | 82.17% | 85.07% | 57.97% | 60.09% | 59.55% | 48.25% | 40.38% | 18.58% | 92.25% |
1. Polymerase reads length distribution |
Illumina sequencing and assessment of sequencing quality
To determine the appropriate time points for RNA-seq analysis in P. huashanica, time course virus titers were quantified using qRT-PCR at different time points after BYDV-GAV inoculation. As shown in (Fig. 4), BYDV-GAV titer increased dramatically at 3 dpi and then peaked at 10 dpi and then declined gradually until 14 dpi when the virus was barely detectable. Then, mock-infected and BYDV-GAV-infected leaves (3 dpi, 7 dpi and 14 dpi) were selected to conduct RNA-seq analysis.
Since the genome sequence of P. huashanica was not available, we used our PacBio sequencing dataset as a reference to perform RNA-SEq. High quality RNA was extracted from the four samples; each had three repetitions to generate 12 libraries. The libraries were sequenced on an Illumina HiSeq2000 platform. More than 16 GB raw reads and 115 million clean reads were generated from each library using Illumina paired-end sequencing (Additional file: Table S4). Thereafter, clean reads were aligned to the PacBio dataset and a total of 76,313 transcripts were detected, including 16,827 lncRNA transcripts and 47,196 mRNA transcripts which were used for FPKM analysis. The assessment of transcript coding quality was determined by three types of prediction software, and the results are shown in (Fig. 5).
Identification of differentially expressed genes (DEGs)
Differentially expressed genes (DEGs) were obtained by comparison of BYDV-GAV-inoculated samples with mock-inoculated samples at each time point. DEGs were identified with the criteria of |fold change| ≥ 2 and q-value ≤ 0.001 at particular time points (Fig. 6). In this way, 11,282 genes were identified to be DEGs at 3 dpi, of which 5991 genes were upregulated and 5291 were downregulated. Interestingly, there were 4689 genes downregulated, and there were more than 3864 upregulated genes at 7 dpi. At the later time points, the upregulated genes increased to 4729, which again exceeded the 4528 downregulated genes. In addition, we used volcano plots to visualize DEGs in a more intuitive way (Fig. 7). The results of the distribution of upregulated and downregulated genes were consistent with the accumulation of virus in P. huashanica. All statistically significant DEGs were further characterized by functional annotation and enrichment analysis.
Short time series expression miner (STEM) analysis
To explore the major expression trends of DEGs from the sequencing dataset over the four time points, we performed a clustering and visualization analysis using STEM software and found that all DEGs were assigned to 46 clusters that displayed similar expression patterns. We found that 15 clusters were statistically significant (P-value ≤ 0.05) (Fig. 8a). Among them, clusters 45, 46, 47, 48 and 49 exhibited upregulated expression patterns during viral infection, while the rest of the clusters showed downregulated expression patterns. Clusters 31, 33, 34, 35, 36, 43, 44 and 45 showed initial increased expression levels at 3 dpi but were downregulated at a later infection stage. Clusters 29, 30, 38, 39, 40, 41, 42, 47, 48 and 49 exhibited gradually increasing expression levels from 3 dpi to 7dpi, but showed downregulated expression levels in the following period. Those upregulated clusters were induced by the BYDV-GAV infection to resist viral replication and movement.
In addition, we also used the R ‘Mfuzz’ package to cluster the expression levels of all detective genes, and achieved 12 clusters (Fig. 8b), among which, clusters 9 (7.9%) and 10 (5.2%) exhibited upregulation during all stages of viral infection. On the contrary, clusters 6 (9.1%) and 11 (5.6%) showed downregulation during all stages of viral infection. Moreover, clusters 5 (5.3%), 7 (6.2%), 8 (8.0%) and 12 (11.8%) also played critical roles in the response to the viral invasion.
GO and KEGG enrichment analysis of DEGs
To investigate the biological function of viral-induced genes in P. huashanica, GO and KEGG pathway analysis were performed on DEGs at different time points. The GO enrichment results for DEGs between 3 dpi and mock-infected indicated that the cellular process and metabolic process were the most significantly represented groups in the biological process category, while cells and cell parts were the most common enrichment terms in the cellular component. Within the molecular functional category, catalytic activity and binding were the predominant enrichment groups (Fig. 9a). The above results were similar to what had been observed for other viral infection time points relative to mock (Fig. 9b and Fig. 9c). However, certain differences also occurred between these time points and mock. For example, the number of upregulated genes enriched in metabolic processes and catalytic activity were more downregulated at 3dpi, while upregulated enriched genes decreased at 7 dpi and increased at 14 dpi, which corresponded to the amount of virus.
According to the annotation from the KEGG database, the most enriched pathway was the metabolic pathway, followed by biosynthesis of secondary metabolites at all infection stages. Additionally, we found that DEGs related with plant-pathogen interaction, phenylpropanoid biosynthesis, starch and sucrose metabolism, the MAPK signalling pathway and plant hormone signal transduction were significantly enriched at 3 dpi (Fig. 10a). The pathways of plant-pathogen interaction, mRNA surveillance, phenylpropanoid biosynthesis, starch and sucrose metabolism and glycolysis/gluconeogenesis were predominant at 7 dpi (Fig. 10b). Among main pathway categories, RNA transport, plant-pathogen interaction, plant hormone signal transduction, phenylpropanoid biosynthesis, mRNA surveillance pathway and the MAPK signalling pathway were significantly represented at 14 dpi (Fig. 10c). In summary, most of the DEGs were putatively involved in resistance-related pathways. These genes appeared to participate in the interactions between P. huashanica and BYDV-GAV.
Identification of transcription factors and transcriptional regulators
We identified 605 TFs distributed in 46 families among the 15,499 DEGs. Then, these TFs were classified, and mainly include the following families: AP2-EREBP (apetala2-ethylene-responsive element binding proteins) (25 genes), WRKY (41 genes), bHLH (basic helix-loop-helix) (33 genes), MYB-related (40 genes), MYB (13 genes), NAC (NAM, ATAF1-2, and CUC2) (13 genes), C2H2 (13 genes), NF-YA (13 genes), GARP-G2-like (17 genes) and bZIP (basic region/leucine zipper motif) (22 genes) (Additional file: Table S6). Transcriptional regulators mainly focused on AUX/IAA (35 genes), SET (19 genes), TRAF (Tumour necrosis factor receptor-associated factor) (18 genes), SNF2 (17 genes), GNAT (Gcn5-related N-acetyltransferases) (9 genes), Jumonji (9 genes), HMG (high-mobility-group) (8 genes) and PHD (8 genes) (Additional file: Table S5). The results indicated TFs widely participate in fighting BYDV-GAV infection.
Validation of gene expression profiles using qRT-PCR
The expression trends of 20 representative genes were verified by qRT-PCR in a separate experiment. The genes selected that were related to the defence response were divided into three categories. The first class included protein kinases, such as cysteine-rich receptor-like protein kinase (CRPK), mitogen-activated protein kinase (MAPK), calcium-dependent protein kinase related genes (CDPK) and wall-associated receptor kinase (WARK). The second class included genes involved in defence-related proteins, such as pathogenesis-related protein (PR), heat shock protein 70 (HSP70) and WRKY transcription factor (WRKY). The last class selected included resistance genes, such as those coding for disease resistance proteins. The results of all 20 genes selected using RT-qPCR were consistent with the results obtained from RNA-Seq analysis (Fig. 11).