Dynamic transcriptome profiling of Bean Common Mosaic Virus (BCMV) infection in Common Bean (Phaseolus vulgaris L.)

Bean common mosaic virus (BCMV) is widespread, with Phaseolus species as the primary host plants. Numerous BCMV strains have been identified on the basis of a panel of bean varieties that distinguish the pathogenicity types with respect to the viral strains. The molecular responses in Phaseolus to BCMV infection have not yet been well characterized. We report the transcriptional responses of a widely susceptible variety of common bean (Phaseolus vulgaris L., cultivar ‘Stringless green refugee’) to two BCMV strains, in a time-course experiment. We also report the genome sequence of a previously unreported BCMV strain. The interaction with the known strain NL1-Iowa causes moderate symptoms and large transcriptional responses, and the newly identified strain (Strain 2 or S2) causes severe symptoms and moderate transcriptional responses. The transcriptional profiles of host plants infected with the two isolates are distinct, and involve numerous differences in splice forms in particular genes, and pathway specific expression patterns. We identified differential host transcriptome response after infection of two different strains of Bean common mosaic virus (BCMV) in common bean (Phaseolus vulgaris L.). Virus infection initiated a suite of changes in gene expression level and patterns in the host plants. Pathways related to defense, gene regulation, metabolic processes, photosynthesis were specifically altered after virus infection. Results presented in this study can increase the understanding of host-pathogen interactions and provide resources for further investigations of the biological mechanisms in BCMV infection and defense.


Background
Bean common mosaic virus (BCMV) is a seed borne, aphid-transmitted virus with worldwide distribution [1]. Common bean (Phaseolus vulgaris L.) is the main host of this virus, although it also infects other Phaseolus species [2,3]. This virus can cause significant yield losses (50-100 %) in the host crop plants [4][5][6]. As a member of the family Potyviridae, it is a single-stranded, positivesense RNA virus approximately 10 kb in length and encodes an open reading frame for a polyprotein containing 10 genes and a eleventh gene is created by ribosomal slippage in the P3 protein and is called PIPO [7][8][9][10]. Each of the 10 other proteins is cleaved out after polyprotein synthesis by one of three viral encoded proteases [7].
There are 8 pathogenicity groups in the BCMV complex, identified by the virus response on a standard set of differential bean lines [11][12][13]. The differential bean lines contain either a single resistance gene, or none, or combinations of stacked resistance (R) genes [14,15]. Seven resistance loci have been identified: a dominant I locus, and recessive loci, bc-u, bc-1, bc-1 2 , bc-2, bc-2 2 , and bc-3 [11,14,16]. The I locus is associated with a cluster of R gene homologs belonging to the TIR-NB-LRR (Toll/interleukin-1-nucleotide binding site-leucine rich repeat) plant immunity receptors [16]. However, the responsible gene within this locus is yet to be identified. Only one resistance locus, "bc-3", has been identified positively (as the gene eIF4E [13,17]). Although there are a number of different strains of BCMV, the sequences of these strains are closely related to one other and may recombine into new strains that are able to break resistance [13,18,19].
Host plants activate a number of signaling cascades to recognize different pathogens and to develop suitable defensive strategies [20][21][22][23][24]. Defense strategies involve differential regulation of various genes related to metabolism, signal transduction, protein modifications and other cellular functions [25] and modulation of alternate splicing patterns after the virus infection [26,27]. Defense responses in many cases are mediated by transcription factors that control signaling pathways and other host-pathogen interactions [28,29]. Thus, identification of various regulatory components involved in host-pathogen interactions can provide a roadmap to explore the resistance mechanisms, identify candidate genes, and development of suitable molecular markers for screening germplasm [30].
Despite the economic importance of BCMV, the molecular pathways associated with BCMV infection are not fully characterized. In order to identify host plant response upon BCMV infection, we characterized the response of common bean (Phaseolus vulgaris L.) to two BCMV strains on a widely susceptible common bean line, Stringless green refugee. Previous work has demonstrated that pathogenicity group I is present in Iowa and only shows symptoms on bean lines containing no known BCMV resistance genes [31]. However, during characterization of the NL1-Iowa BCMV strain, a second BCMV strain with different severity and symptoms was identified on a second susceptible cultivar, Dubbele witte. Due to the differences, this second strain was tentatively named BCMV-Strain 2 or S2. The two BCMV strains were characterized by comparing the genome sequences and the host plant responses after infection. The changes in gene expression and alternate splicing levels at two different stages of BCMV infection were evaluated using RNA-Seq data. Our results support two distinct BCMV strains, with distinct interactive profiles in the infected host plants. We describe the ways that systemic virus infection modulates various local or coregulated defense pathways for the BCMV strains in the evaluated host plant.

Plant growth and virus infection
P. vulgaris cv. Stringless green refugee was planted in the greenhouse under ambient conditions for summer (16 h day, 8 h night). Fourteen days after planting, the first true leaves were mechanically inoculated by first dusting 600-mesh carborundum and then spreading 50 mM potassium phosphate buffer, pH 7.0, containing infectious leaf sap of BCMV NL1-I (Genbank: KM023744, [31], BCMV-S2 (Genbank: KU896809), or buffer only provided the mock inoculation.

Strain determination of unknown virus
To further characterize the Unknown strain (BCMV-S2), a test of P. vulgaris cultivars obtained from the Germplasm resources information network (GRIN) was conducted to identify the strain. P. vulgaris cultivars Dubbele witte (

Sample harvesting, RNA isolation and sequencing
Leaf tissue was harvested from healthy and virus infected plants using three biological replicates and at several developmental stages. The first trifoliolate leaf of five plants for each replicate was collected at four days post-inoculation, then the second at six days and so on every two days until 14 days post-inoculation. Harvested tissues were immediately frozen in liquid nitrogen and were stored at −80°C. Two leaves for each time point were tested for the presence of virus by ELISA using the general potyvirus antibody from Agdia Inc., leaving three leaves of each replicate for RNA extraction. Similarly, uninoculated leaves were used as healthy controls.
Total RNA was extracted from the infected and control leaf tissues using Qiagen RNeasy® Plant mini kit (Qiagen, Germantown, MD) as per the manufacturer's instructions. The extracted RNA was purified using the Qiagen RNA purification kit and treated with DNaseI to remove any DNA contamination. Subsequently, RNA samples were tested for their quantity and quality using a NanoDrop spectrophotometer and Agilent 2100 Bioanalyzer. RNA was extracted from three treatment groups at two developmental stages and three biological replicates (Day 4 Healthy, Day 4 BCMV-S2, Day 4 NL1-I, Day 8 Healthy, Day 8 BCMV-S2, Day 8 NL1-I) for a total of 18 RNA sequencing libraries that were prepared using 1-3 μg of total RNA from healthy and treatment groups using TruSeq RNA Sample Preparation Kits (Illumina Inc., San Diego, CA) per the manufacturer's protocol. Each library was barcoded and multiplexed in pools of six samples per lane. A total of three lanes were used on the Illumina HiSeq 2000. The sequencing reaction produced single-end reads of 50 bps. The sequencing was conducted at the DNA sequencing core facility, Iowa State University, Ames, IA.

Mapping, transcript assembly, and alternate splicing analysis
The single-end raw read data obtained above was processed further, as follows. Reads were separated by sample using the barcode information. The quality of raw reads was evaluated using fastqc (http://www.bioinformatics.ba braham.ac.uk/projects/fastqc/). The sequencing adapters and low quality bases at the 5 prime and 3 prime ends were removed using quality scores 20 or below in Trimmomatic [32]. Resulting reads with average quality score of 20 or below were excluded from subsequent analyses. The cleaned reads were once again tested using fastqc. The common bean genome [33] and related annotation information were downloaded from the Phytozome (http://www.phytozome.net/) and high quality reads were aligned against it using Tophat software [34]. Default settings were used for aligning the RNA-Seq reads with maximum two mismatches (−v 2). The alignment statistics were obtained using RNA-SeQC bioinformatics tool [35]. The transcript specific analysis was performed using the Tuxedo pipeline [36,37], comprising Cufflinks2 (assembling transcripts), Cuffmerge2 (merging transcript assemblies from different samples), Cuffdiff2 (differential transcript profiling and splicing analysis). The transcripts were assembled from the mapped reads with cufflinks version 2.0.0 using the default parameters except parameter '-j' (minimum depth of coverage in the intronic region, value -0.3). The transcripts were classified based on their abundance, and lowly expressed (FPKM < 0.3) transcripts were eliminated to avoid misassembly issues due to low read count. The program Cuffmerge version 2.0.0 was used to merge the transcripts from different samples. The Cuffdiff2 tool was used to identify significant alternate splicing events (p value < 0.05 with FDR correction) and their fpkm values. Various splicing events were annotated using ASTALAVISTA bioinformatics software [38]. This tool can classify the extent of various alternate splicing events such as exon skipping, alternate donor, alternate acceptor, intron retention etc., and using transcript models obtained from cufflinks.

Differential expression analysis
The mapping information was used to count the number of raw reads using HTSeq package [39]. The HTSeq software provides raw read counts for uniquely aligned reads to a single gene model and discards any reads aligning to multiple locations in the genome. The R package DESeq2 [40] was used to perform differential expression (DE) analysis between different treatments. It uses the raw read count from previous alignments to fit a generalized linear model, and estimates the differences in expression level between genes. The equation parameters are modeled using a negative binomial distribution. To call a gene differentially expressed, the DESeq2 output was filtered using two criteria. First, an adjusted p-value for multiple testing corrections with value less than 0.05 was used for DE genes. Second, genes showing less than two fold change were not considered as DE. The DE genes were categorized as up-regulated and down-regulated gene sets to perform the gene ontology (GO) analysis. The GO and functional enrichment of various biological, cellular and molecular gene classes was performed using agriGO toolkit [41]. The enrichment analysis was performed using Fisher's exact test with a false discovery rate correction to obtain adjusted p-value for each class. A pvalue threshold of 0.05 was used to determine significant enrichment. The KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway analysis was performed using KOBAS 2.0 [42]. The transcript sequences specific to upregulated and down-regulated genes were extracted from the common bean genome transcripts and were blasted against the KEGG pathway. The pathway enrichment was determined using Fisher's exact test using a false discovery rate correction method [43]. A p-value of 0.05 or less was used as threshold to determine significantly enriched pathways. To identify the putative up-regulated or downregulated transcription factors (TFs) in the expression data, Hidden Markov Model (HMM) motif sequences of different TFs were downloaded from Legume Transcription Factor (TF) Database [44]. This database has a collection of 61 TF families from soybean, Lotus japonicus, and Medicago truncatula. The HMM motif sequences were blasted against the DE gene sequences. The putative TFs were detected using 90 % sequence homology and e-value ≤ 10 −10 .

De novo assembly of virus genomes
A virus BLAST database was built using a large collection of genomes and genes of different plant virus strains from NCBI (http://www.ncbi.nlm.nih.gov/). The unmapped raw reads from individual time points were extracted and combined separately for the NL1-I and the unknown potyvirus treatment in a single dataset. The resulting datasets were searched against the virus database to identify the virus-specific read sequences.
The matched read sequences were extracted to perform a de novo assembly of the virus genome using default settings in the Trinity de novo assembly platform [45]. Viral reads were normalized to compare reads to the gene size by the following equation = ((read count single gene)/(total virus read count *kb size of gene)) *100.

Results
Identification of BCMV strains utilizing a P. vulgaris bean screen Previously, the identification of NL1-I was conducted using a differential selection of bean cultivars. It was found that this strain successfully infected the differential bean cultivars Stringless green refugee and Dubbele witte [31]. In contrast, BCMV-S2 was tested against the same common bean panel as NL1-I [31], and was able to successfully infect the P. vulgaris cvs. Sutter pink, Dubbele witte, Stringless green refugee, Puregold wax and Imuna. The infection in the cvs Puregold wax and Imuna places this strain of BCMV in pathogenicity group II [14].

BCMV infection assay during common bean development
Because BCMV-NL1-I and BCMV-S2 caused different reactions in susceptible host plants, we wanted to characterize the effects of these two isolates on host gene expression by using RNA-Seq analysis. To determine the best time points for RNA-Seq analysis, a time series analysis was conducted to establish the virus detection time in non-inoculated, trifoliolate, systematic leaves. At four days post-inoculation (dpi), virus was not detected by ELISA in the first non-inoculated trifoliolate leaves. At six dpi although no symptoms were observed, NL1-I was detected by ELISA in the same grouping of trifoliolate leaves as day 4 in two of the three replicates, and BCMV-S2 was detected in all three replicates. Both viruses were detected in the non-inoculated leaves in all replicates at the subsequent sampling times of 8, 10, 12, and 14 dpi. Plants infected by either virus became symptomatic at between 11-12 dpi. Based this time course, we used the 4 dpi (the first time point before any positives were recorded) and the 8 dpi (the day that all plants tested positive) samples for RNA-Seq analyses.

Building virus genomes and comparative analysis
The RNA-Seq data collected for transcriptome profiling provided the opportunity to determine the sequence of the new pathogenicity group II isolate, BCMV-S2 and confirm the identity of BCMV-NL1-I. To determine the viral sequences, we aligned the RNA-Seq reads from 8 dpi with the common bean (P. vulgaris L. version1) genome [33]. The proportion of unaligned reads was comparatively higher in virus-inoculated plants (12.5 %) than the healthy plants (3.6 %) at 8 dpi, which was expected due to the presence of virus transcripts along with plant genomic sequences. We utilized the unmapped reads to assemble the corresponding virus genomes for each BCMV strain (Additional file 1: Figure S1). To identify reads of virus origin, we performed a sequence similarity search of unmapped reads against different virus sequences from NCBI using blastn implemented in Basic Local Alignment Search Tool (BLAST+) [46]. Reads showing hits to various virus sequences were identified and used to build separate de novo assemblies for the complete genome sequences of BCMV-NL1-I (10049 bases) and BCMV-S2 (10048 bases). A search of one of the recovered and assembled viral genomes matched the NL1 genome present in NCBI (Genbank Accession AY112735), so we refer to that virus and assembly as NL1-I. A comparison of the NL1-I and the BCMV-S2 strain nucleotide sequences showed a significant number of polymorphic sites (identity = 9823/ 10048) between the two virus genomes (Fig. 1). This analysis confirms that BCMV-S2 is different from the NL1-I strain. There are two nucleotide changes in the 5' UTR and one nucleotide change in the 3' UTR, and 225 nucleotide changes in the coding regions. This resulted in a total of 45 amino acid changes between the two strains, 27 of them were changes between amino acids which had the same properties and likely to be silent mutations and 18 amino acids were changes between amino acids that had different properties (Fig. 1). There was an average of 3957 total viral reads for NL1 at 4 as compared to an average of 522,837 total NL1 reads at 8 dpi. Similarly, BCMV-S2 had an average of 32,080 total viral reads at 4 dpi and 776,986 total viral reads at 8 dpi. After normalizing the reads to account for the various gene sizes, we determined that overall the genes had equal number of reads except for P1 which contained roughly twice as many reads as any other BCMV gene (Additional file 2: Figure S4).

Mapping statistics
To investigate the genetic mechanisms of common bean-BCMV interactions, we analyzed the RNA-Seq data to determine how alternate splicing and transcriptome profiles were altered in response to BCMV-NL1-I and BCMV-S2 at early (4 days) and late (8 days) stages of infection. The bioinformatics analysis pipeline is presented in Additional file 1: Figure S1. We obtained approximately 582 million raw sequencing reads with an average of approximately 185-200 million reads per lane, and 85-100 million reads per treatment (Table 1, Additional file 3:  Table S1). Average read count for each treatment ranged from approximately 28 to 33 million. Total read count was reduced after quality filtration and barcode removal, but mostly reads were of high quality and the percentage of excluded reads was low (<1 %) (Additional file 4: Table S2). Analysis of viral reads from each treatment show much higher numbers of virus reads at day 8 than day 4 and controlsfor example, 24-fold more NL1-I reads at day 8 than day 4, and 132-fold more S2 reads at day 8 than day 4 (Additional file 5: Table S3).
The quality-filtered reads were aligned against the available common bean reference genome (P. vulgaris v1.0). The overall mapping percentage ranged from 87.4 % to 96.4 %, with the percentage of uniquely aligned reads varying from 80.0 % to 88.1 % (Table 1). Uniquely mapped reads are particularly suitable for differential gene expression analysis [39]. Approximately 8.2 % to 10.3 % reads showed matches to multiple positions across the genome. The alignment statistics identified that most of the reads aligned to genic locations, with the highest mapping rate for exon features (89-91 %). The mapping rate to introns and intergenic regions was comparatively much lower at 4-5 % for introns and 4-6 % for intergenic regions.
The populations of viral sequences were not entirely uniform. When BCMVS2 reads are mapped against the BCMVS2 genome, there are 3 SNPs (all heterozygous, with minor allele frequency of~10 % to 20 %); and when NL1-I reads are mapped against the NL-I genome, there are 6 SNPs (all heterozygous, with minor allele frequency of 8 % to 45 %) (Additional files 6, 7, 8 and 9). This indicates that there was a small amount of variation in the viral populations within our samples (about 0.03 % -0.06 % of the genomes are variant).
When BCMVS2 reads are mapped against the NL1-I genome, there are 202 SNPs (15 heterozygous); and when Total number of reads represent the total count across three biological replicates, while average number of reads represent the mean value from three biological replicates. The genes were considered as expressed with minimum three reads aligned to a gene model

Modulation of alternate splicing events after virus infection
Alternate splicing (AS) has a well-characterized role in various plant growth processes and biotic and abiotic stress conditions [27,[47][48][49][50]. We used cufflinks [36] to construct the transcripts from alignment outputs and determined various splicing patterns upon virus inoculation, using the ASTALAVISTA tool (Fig. 2a) [38]. In total, 3,194 AS events are identified in the current annotation of the P. vulgaris genome (Additional file 10: Figure S2, Additional file 11 , and other events (approximately 10-11.6 %) (Additional file 12: Figure S3). Our results mostly agree with the AS trends observed in other species [27].
To determine the statistical significance of various splicing events, we used the cuffdiff program [36,37] to evaluate (at p-adj < 0.05) healthy and virus inoculated samples at 4 and 8 dpi. Fewer genes showed differential splicing at 4 dpi than at 8 dpi (Fig. 2b). A total of 30 and 37 genes showed differential splicing for NL1-I and BCMV-S2 at 4 dpi, which corresponds to 85 and 148 respective transcripts (Additional files 13 and 14). At 8 dpi, we found differential splicing for 172 and 342 genes for NL1-I and BCMV-S2 strains, which represent 617 and 1100 transcripts, respectively (Additional files 15 and 16). These results indicate that virus infection causes significant changes in the AS landscape of The transcripts were built using cufflinks2 and significant differential splicing events were detected using cuffdiff2 [36,37]. Low abundance transcripts (FPKM < 0.3) were eliminated to avoid misassembly issues due to low read count. Alternate Splicing Transcriptional Landscape Visualization Tool, ASTALAVISTA [38], was used to characterize different AS events common bean leaves. Evidently, the induced virus stress over time (Day4 vs. Day8) increases the number of significant AS events (Fig. 2b). However, severe infection (Day8) is also associated with a reduction in specific AS events such as intron retention in NL1-I (Fig. 2c) and exon skipping in BCMV-S2 (Fig. 2d) treatments. A similar decrease in some AS forms has been documented under particular stress conditions [50][51][52]. Apparently, host machinery differentially responds to different virus strains in terms of splicing regulation.

Regulatory changes associated with virus infection
We evaluated the coverage of our transcriptome data by considering a gene model as "expressed" if it contained or overlapped with at least three uniquely mapped reads. Approximately 80 % of the published gene models were expressed using this criterion (Additional file 17: Table  S4). In order to identify the differentially expressed genes at each time point and virus treatment, we used the raw read count from the mapping output and evaluated level of gene expression with the DESeq2 statistical package [40]. A gene was considered as differentially expressed (DE) if it had a fold change of ≥ 2 (control vs. treatment) and adjusted p-value for multiple testing corrections less than 0.05 [43]. The DE genes were annotated using the common bean genome annotation (Phytozome.jgi.doe. gov/pz/portal.html#!info?alias = Org_Pvulgaris).
Comparison of gene expression profiles of healthy and virus treated plants at 4 dpi identified only 1 and 2 DE genes in response to NL1-I and BCMV-S2 inoculation, respectively. A single gene, Phvul.004G073400, was down-regulated under both virus treatments (log 2 fold change was −5.5058, −4.9006 for NL1-I and BCMV-S2, respectively). This gene is the small subunit of ribulose bisphosphate carboxylase and is involved in photosynthesis. The down regulation of this photosynthetic gene in two independent virus treatments suggests that the photosynthetic pathway is highly responsive to virus infection. Another gene, Phvul.010G120600, was induced specifically in the BCMV-S2 treatment and is a NAM ("no apical meristem") protein, involved in transcriptional regulation. Interestingly, the Phvul.005G073400 was not down-regulated at 8 dpi in either virus strain (suggesting a time point specific response), but in the S2 leaves at 8 dpi, Phvul.010G120600 continued to be induced (1.4214 at 4 dpi compared to 5.8475 at 8 dpi).
At 8 dpi, we identified a total of 4676 and 2099 DE genes in response to NL1-I and BCMV-S2 treatments, respectively (Fig. 3, Additional files 18 and 19). The log 2 fold changes were comparatively higher in the positive direction in both virus treatments (Fig. 3a, b), indicating that many genes were strongly up-regulated after virus infections. Also, a large number of genes were up-regulated in the NL1-I (2871) and BCMV-S2 treatments (1509) (Fig. 3c). The number of down-regulated genes was 1805 and 590 for NL1-I and BCMV-S2 treatments, respectively (Fig. 3c). A number of genes showed overlap between the two virus treatments (Fig. 4). To better understand the pattern of shared and unique DE genes, we plotted the induced and repressed genes separately from each virus treatment (Fig. 4a, b). A total of 1249 induced (Fig. 4a) and 348 repressed (Fig. 4b) genes were common between both the virus treatments. A total of 1622 induced and 1457 repressed genes showed NL1-I strain specific expression, while 260 induced and 242 repressed genes have a unique expression profile for BCMV-S2 virus strain. Overall, the results indicate that the NL1-I strain had a more vigorous regulatory response than the BCMV-S2 strain.

Response of plant pathways to BCMV infection
To discern information about pathways affected by virus infection, significantly overrepresented (padj < 0.05) gene ontology (GO) terms associated with DE genes were analyzed at 8 dpi for each virus treatment. The analysis was performed with respect to all genes present in the common bean genome. Various overrepresented GO terms associated with biological, molecular and cellular processes were identified for both NL1-I (Fig. 5, Additional file 20) and BCMV-S2 treatments (Fig. 6, Additional file 21). We also performed a pathway analysis using KEGG (Kyoto Encyclopedia of Genes and Genomes) to understand the relation between gene expression pattern and associated pathways for each virus treatment. Different datasets were created for upregulated genes, downregulated, unique, and shared DE genes for each virus treatment. The complete KEGG pathway analysis is presented in Table 4. The KEGG analysis using all DE genes indicated that various pathways with functional classes related to processes involving ribosomes, ribosome biogenesis, photosynthesis, plant-pathogen interaction, and metabolism were perturbed by virus infection. These pathways were commonly enriched in both the virus treatments. However, uniquely expressed genes in the two treatments showed enrichment of different pathways. Photosynthesis antenna proteins, pentose phosphate pathway, protein processing in endoplasmic reticulum, N-glycan biosynthesis, stilbenoid, Fig. 3 Differentially expressed (DE) transcripts (padj < 0.05) upon NL1-I and BCMV-S2 treatments. The transcripts were called as significant using adjusted p values less than 0.05 with false discovery rate (FDR) and log 2 fold values more than 1 or less than −1. The extent and distribution of transcript expression differences were visualized using volcano plots for (a) NL1-I treatment, (b) BCMV-S2 treatments. The x-axis represent -log 10 of adjusted p-values (FDR; [43], and y-axis represent log 2 fold change values between control and treatment comparison. The numbers of significant DE genes were shown using bar plot (c), where orange color represent significantly downregulated genes and blue color represent significantly upregulated genes upon different virus treatments  The bold gene names are common between the two treatments. Gene annotations were retrieved for common bean genome in Phytozome. The unannotated genes were functionally classified using TAIR (indicated by a , the e-value of TAIR match is indicated in the brackets). "NA" represents no functional characterization of gene is available The gene names in bold are common between the two treatments. Gene annotations were retrieved for common bean genome in Phytozome. The unannotated genes were functionally classified using TAIR (indicated by a , the e-value of TAIR match is indicated in the brackets). "NA" represents no functional characterization of gene is available The analysis was performed using KOBAS v2.0 [42] by splitting the DE genes dataset into unique, shared, upregulated and downregulated genes for studied virus treatments. The pathways in bold letters are significant at adjusted p-value < 0.05 diarylheptanoid and gingerol biosynthesis, protein export, biosynthesis of secondary metabolites, and nitrogen metabolism pathways were significantly enriched (p < 0.05) using unique DE genes for the NL1-I treatment. In contrast, ribosome, ribosome biogenesis, and diterpenoid pathways were enriched (p < 0.05) using uniquely expressed genes under the BCMV-S2 treatment.
To discern the identity of various induced and repressed pathways upon virus infection, we used the upregulated and downregulated genes separately for KEGG analysis (Table 4). Pathways specific to photosynthesis antenna proteins, sesquiterpenoid and triterpenoid biosynthesis, and starch and sucrose metabolism were commonly repressed (p < 0.05) under both virus treatments. In contrast, certain downregulated gene specific pathways were unique to each virus treatment. For example, biosynthesis of secondary metabolites, metabolic pathways, carbon fixation in photosynthetic organisms, glyoxylate and dicarboxylate metabolism, and pentose phosphate pathway were unique to NL1-I treatment, while linoleic acid metabolism, alpha-Linolenic acid metabolism, valine, leucine and isoleucine biosynthesis, diterpenoid biosynthesis, and valine, leucine and isoleucine degradation were downregulated (p < 0.05) in the BCMV-S2 treatment. Similar analysis was conducted using the upregulated genes. The results indicated that pathways related to ribosome, ribosome biogenesis, plant pathogen interaction, and glutathione metabolism were shared between two virus treatments and were significantly upregulated (p < 0.05). Certain pathways such as protein processing in endoplasmic reticulum, phenylalanine metabolism, N-glycan biosynthesis, phenylpropanoid biosynthesis, and protein export were uniquely upregulated after the NL1-I treatment, while no unique pathway was observed for the BCMV-S2 treatment.
Clustering patterns illustrated specific modules related to overrepresented GO terms Similar expression patterns between different genes often illustrate functional correlation between them. To identify various expression modules related to specific pathways, the DE genes were clustered with k-means clustering using Genesis software [56] http://genome.tugraz.at). The normalized expression values from the DE genes in each virus treatment were combined to obtain twenty distinct clusters (Fig. 7, Additional file 22). Also, the GO enrichment analysis (as described earlier) was performed using genes within each module to correlate the gene expression patterns with specific functional categories. The number of co-expressed genes varies for different clusters. For instance, the largest clusters represent cellular processes (cluster 17) with 8.5 % of total DE genes (n = 440). This is followed by mitochondrion processes (cluster 11) having Significantly enriched GO slim categories after NL1-I virus treatment. Significant DE genes (padj < 0.05, log 2 fold −1,1) identified after NL1-I treatment were used to perform GO enrichment analysis. The number of genes within each category is represented on x-axis. The significant overrepresented GO classes (false discovery rate corrected p-value < 0.05) were identified using agriGO analysis tool [41] Phaseolus vulgaris version1 genome annotations 8.2 % of total genes (n = 423). Similarly, clusters 3 and 18 have the smallest sizes (89 and 109 genes, respectively) and represent enriched pathways in cellular, photosynthetic, protein modification, binding and catalytic activity. Furthermore, clear patterns are evident in up regulation (cluster1, cluster 13, cluster18) and down regulation (cluster4, cluster6, cluster7).
The GO enrichment analysis, using genes in individual clusters, showed that a number of clusters belong to one specific functional category. For example, specific functional classes such as membrane (cluster4, n = 164), catalytic activity (cluster7 and 8, n = 336 and 384 respectively), mitochondrion (cluster11, n = 423), reproductive (cluster 12, n = 297), iron binding (cluster 14, 225) etc., were represented by a single cluster. However, several clusters represented more than one functional classification. For instance, clusters 13 (n = 113), 15 (n = 356), 18 (n = 109), and 20 (n = 138) exhibit a broader level of functional activities related to protein modification, transferase activity, catalytic activity, and metabolism. Overall, this analysis suggests that many pathways respond to virus infection in a modular fashion by co-expressed gene components. Also, identification of co-expressed genes within the same cluster indicates possible interactions between different pathways responding to virus-induced stress.

Transcription factors involved in response to BCMV infection
The roles of different transcription factors (TFs) under various biotic and abiotic stress conditions have been extensively studied [28,55,57,58]. To identify the level and pattern of expression of different TFs, we extracted the HMM motifs of known legume transcription factor (TF) families available in the legume transcription factor database (http://legumetfdb.psc.riken.jp/) and identified the matched regions in DE gene sequences (e-value < 10 −10 and 90 % identity). This database includes sixty-one TF families from Glycine max, Lotus japonicus, and Medicago truncatula. The TFs were categorized using separate datasets for upregulated genes and downregulated genes for Fig. 6 Significantly enriched GO slim categories after BCMV-S2 virus treatment. Significant DE genes (padj < 0.05, log 2 fold −1,1) identified after BCMV-S2 treatment were used to perform GO enrichment analysis. The number of genes within each category is represented on x-axis. The significant overrepresented GO classes (false discovery rate corrected p-value < 0.05) were identified using agriGO analysis tool [41] using Phaseolus vulgaris version1 genome annotations  (Table 6). In contrast, the Myb_related and bHLH families have more members that were repressed (Table 6). Overall, the analysis indicated that diverse TF classes showed significant but varying responses under the two virus infections.

Discussion
This study describes the regulatory landscape of P. vulgaris and the associated expression changes at two different stages of BCMV infection. We further described the identification and genome sequence of a previously unknown BCMV strain used in this study. At the start of experiment, the identity of one BCMV strain was not known. We utilized the unmapped sequences to produce a de novo assembly of the genomes of two virus To clearly differentiate the transcriptional response to two BCMV strains and to further characterize the severity of symptoms of BCMV-S2, we conducted RNAseq on two time points during infection on a universally susceptible line, stringless green refugee. The analysis of alternate splicing (AS) and differential gene expression (DGE) patterns clearly indicate a strain-specific host response upon infection. A larger number of significant (padj < 0.05) AS events were observed for the BCMV-S2 (342) strain than the NL1-I (172) strain. However, the trend was opposite in terms of DGE, and fewer genes showed differential expression in BCMV-S2 (2099) than for the NL1-I (4676) strain. Overall, these results indicate that the viruses represent two different strains of BCMV which activate differential host transcriptional responses upon infection and that these differences may correlate both to the expanded host range of S2 and the increased severity of the symptoms in susceptible cultivars. Moreover, there is also a high level of overlap between DE genes (upregulated and downregulated) across virus treatmentswhich suggests that common regulatory pathways respond to the different virus stimuli. In contrast, identification of uniquely expressed genes implies strain-specific induction in each case. Plants respond to stress conditions in part by producing spliced isoforms [27,48,50] and by changing transcriptional expression levels [59][60][61]. Thus, we performed a detailed genome wide characterization of host transcriptional response at 4 and 8 days after BCMV inoculation of P. vulgaris L. These stages were selected to represent represented early onset of infection to full systemic spread of virus infection in common bean leaves as determined by ELISA. We determined that although we did not observe symptoms until day 12, we could identify viral reads as early as day 4. We identified a number of putative AS events, previously unannotated in the common bean genome. The distribution ratios of detected AS events were consistent across healthy and virus inoculated samples. However, the frequency of occurrence was different for various AS events. Intron retention (IR) and alternate acceptor events represent the most frequent types of AS events (31-35 %). This observation is consistent with previous studies in other monocot and dicot species [27,62]. An increase in systemic virus stress over time increases the number of significant AS events. These results were consistent with previous observation of increases in AS events after Panicum mosaic virus infection in Brachypodium distachyon [27]. However, we also identified few significant AS forms with relative reduction in number after virus infection. Similar observations were noticed in some other stress related studies [50][51][52] The decreased number of some AS events under severe virus infection suggests the possible role of mRNA degradation machinery to eliminate most of the unproductive transcripts [63,64]. Statistical analysis revealed a number of significant AS events with comparatively more spliced transcripts in the BCMV-S2 than the NL1-I virus treatment, which clearly shows distinct host responses upon infection using different virus strains. Although the data in this study is limited by the nature of sequencing reads and will require further experiments for validation, these results do provide preliminary evidence about modulation of AS landscape during virus infections. Future studies aimed at putative splicing motifs and underlying variants in AS regulators can provide further insights into molecular mechanisms underlying hostvirus interactions. For example, previous analysis of AS during virus infection identified multiple termination codons in the splicing regulators [27], which can potentially be targeted for decay by cellular degradation machinery to keep unproductive transcripts under check [63][64][65].
Similarly, host gene expression levels show drastic changes for a large number of genes after virus infection. Also, the expression changes were clearly different for NL1-I and BCMV-S2 strains, which further highlights the differential host responses upon infection with two strains of the same virus. In our study, the initial sampling stage (day 4) defines the start of host response as revealed by expression changes in only 1 and 2 genes for NL1-I and BCMV-S2 treatments, respectively. In particular, a noticeable repression in expression level of a ribulose bisphosphate carboxylase gene (Phvul.004G073400) was detected under both virus treatments. This observation suggests that photosynthetic pathways quickly respond under pathogen stress conditions. This result is consistent with previous data during infection of Arabidopsis thaliana with Turnip mosaic virus (TuMV) [66]. Analysis of transcript abundance patterns during systemic virus infection stage (day 8) The numbers of genes corresponding to each TF family were counted after blast results. Datasets for upregulated and downregulated genes for each virus treatment was used separately displayed more evident expression shifts in the host transcriptome machinery. Most of the DE genes exhibit induced expression level in NL1-I (61.4 %) and BCMV-S2 (71.9 %) treatments, while fewer genes were downregulated (38.6 % in NL1-I and 28.1 % in BCMV-S2). The majority of upregulated genes involved Leucine-Rich repeat receptorlike protein kinases, calcium-binding EF-hand family members, pathogenesis-related (PR) proteins, oxidative stress related genes, and WRKY transcription factors. Many of these gene classes have a known role during stress response and pathogen resistance [67][68][69][70][71]. Pathway analysis also identified specific gene classes related to kinase activity, receptor activity, cell death, protein modification, protein metabolism, ribonucleoprotein complex, ribosome, cellular metabolic activity, gene expression and translation were particularly induced during virus stress. Also, similar pathways were activated upon NL1-I and BCMV-S2 infections, which suggest the concerted reprogramming of major defense-related modules involved in diverse stress responses. Similar analysis using down-regulated genes identified that pathways associated with photosynthetic, metabolic process, carbohydrate metabolism, transferase activity, and catalytic activity were repressed under virus stress. It would be interesting to analyze other time points, perhaps six days, ten days or even twelve days to determine how the viral response progressed over time. This may provide a more accurate account of changes in transcript abundance due to differences in viral accumulation observed during the analysis of the reads from four and eight days with S2 having approximately eight times the reads compared to NL1 at day two. Corresponding changes in photosynthetic pathways and various metabolic activities after pathogen infection have been reported earlier [72][73][74]. These results indicate that cells extensively modulate the metabolic activity and energy production during virus stress conditions. Switching-on defense mechanism and respiratory processes is a costintensive process [74][75][76], which might occur at the expense of photosynthesis turn off [74]. Reduced photosynthetic gene activity might have been associated with chlorosis or decreased green tissue surface area after systemic virus spread. Overall, this tight metabolic regulation is critical for resistance response or for specific defense mechanisms during virus infection [70]. Specific models explaining the concerted regulation of metabolic activity and photosynthesis have been suggested [77,78]. These models propose a system level coordination between various cellular and metabolic processes to overcome pathogen infection. Our results from co-expressed gene modules representing various response pathways support previous models.
Various transcription factors (TFs) have well-characterized roles under different stress conditions [55,57,58]. Thus, we assessed the complexity of host-virus interactions by identifying TF families activated or repressed during virus stress. Approximately 4.8 % to 5.3 % differentially expressed genes were putative TFs with both up-regulated (2.67 % to 2.95 % for BCMV-S2 and NL1-I, respectively) and down-regulated (2.14 % to 2.31 % for BCMV-S2 and NL1-I, respectively) expression patterns under virus stress. Similar induction of six OsNAC transcription factors has been reported as a result of Rice stripe virus and Rice tungro spherical virus infection [79]. Nonetheless, several TF families identified in this study have known role in response to pathogen and other stresses [80][81][82][83]. This can provide further opportunities for exploring their underlying molecular mechanisms in the regulation of different aspects of host-virus interactions.