Deep Sequencing-Based Transcriptome Analysis of Chicken Spleen in Response to Avian Pathogenic Escherichia coli (APEC) Infection

Avian pathogenic Escherichia coli (APEC) leads to economic losses in poultry production and is also a threat to human health. The goal of this study was to characterize the chicken spleen transcriptome and to identify candidate genes for response and resistance to APEC infection using Solexa sequencing. We obtained 14422935, 14104324, and 14954692 Solexa read pairs for non-challenged (NC), challenged-mild pathology (MD), and challenged-severe pathology (SV), respectively. A total of 148197 contigs and 98461 unigenes were assembled, of which 134949 contigs and 91890 unigenes match the chicken genome. In total, 12272 annotated unigenes take part in biological processes (11664), cellular components (11927), and molecular functions (11963). Summing three specific contrasts, 13650 significantly differentially expressed unigenes were found in NC Vs. MD (6844), NC Vs. SV (7764), and MD Vs. SV (2320). Some unigenes (e.g. CD148, CD45 and LCK) were involved in crucial pathways, such as the T cell receptor (TCR) signaling pathway and microbial metabolism in diverse environments. This study facilitates understanding of the genetic architecture of the chicken spleen transcriptome, and has identified candidate genes for host response to APEC infection.


Introduction
Avian pathogenic Escherichia coli (APEC), a gram-negative, facultative anaerobic bacterium, causes intestinal and extraintestinal infections, septicemia, and mortality in broiler chickens [1]. The most common infectious bacterial disease in poultry, APEC-induced colibacillosis reduces growth and egg production, thereby causing significant economic losses, as well as potentially contaminating poultry products, which generatess a risk for human health [1,2]. The APEC-O1, O2, O78 serotypes of the O serogroup represent at least half of the total number of isolates [3,4], and are responsible for over 80% of human septicemia cases world wide [2]. Except for the control of environmental conditions, such as humidity and ventilation, prevention of APEC infection usually relies on antibiotic therapy or vaccine administration. However, vaccines are not fully effective against heterologous APEC strains and there is consumer pressure to reduce the use of antibiotics in food animal production. Elucidating the host resistance mechanisms against APEC infection is a foundational step in developing sustainable strategies to enhance resistance to APEC through development of more effective vaccines and through genetic selection of poultry populations for enhanced innate resistance to APEC. Until now, the major focus in study of the host:pathogen interaction with APEC has been on the bacteria itself. Some virulence factors or genes responsible for pathogenesis or invasion capacities have been discovered in various APEC strains. With two-dimensional gel electrophoresis, one differentially expressed protein of OmpA was isolated from serum and proposed to be involved in APEC resistance [5]. ExPEC adhesin I has been shown to play a significant role during APEC infection in chickens, as its deletion leads to reduced colonization ability and, moreover, complementation of the adhesin gene restored this ability [6]. By microarray investigation and mutational analysis for confirmation, some upregulated APEC genes have been identified in APEC cultured in APEC-treated chicken serum, and these genes are predicted to contribute to APEC virulence [7]. In addition, some other genes, such as APEC autotransporter adhesin A (aatA) and ibeA have also been reported to affect APEC infection [8,9].
Investigations on the host genomic response is also important, so as to reveal the molecular mechanisms of response to APEC infection. With the sequencing of chicken genome [10], the identification of causative genes and markers for APEC susceptibility or resistance at whole genomic or transcriptomic level is practicable and advantageous for genetic selection of poultry with enhanced resistance capabilities. Gene expression profiling by using an avian macrophage microarray revealed 981 differentially expressed chicken ESTs during phagocytosis of Escherichia coli (E. coli) [11]. A similar study identified 146 common elements modulated by both APEC and M. synoviae and exposure to APEC induced higher expression of cytokine genes and genes involved in oxidative burst than M. synoviae did [12]. Until now, very few studies at the whole transcriptome level have been reported in response to APEC infection in chicken.
Whole transcriptome shotgun sequencing or RNA-seq is an efficient and reliable technology for transcriptomic analysis so as to reveal genetic architecture, to identify sequence variation, and to quantify gene expression [13]. A variety of platforms exist for RNA-Seq, including Illumina Solexa, Roche 454, Life Technology SOLID, and others. Identification of host genetic factors resistance to APEC is of great significance for poultry breeding and production. With use of Illumina deep sequencing of APECchallenged birds, this study aims to investigate the genetic architecture of the spleen transcriptome, and to discover genes/ transcripts and genetic markers for resistance to APEC infection in the chicken.

Illumina Draft Reads
In this study, spleens of three males were used to prepare one pooled RNA sample for each group of NC, MD and SV. Three cDNA libraries were then constructed to perform Illumina deep sequencing. The schematic of Illumina deep sequencing and analysis are shown in Figure 1

Assemble and BLAST Analysis
After assembly analysis based on all Illumina reads, we identified 148,197 contigs with total residues of 195,622,566 bp. The average length of all contigs was 1,320 bp, with the smallest sequence of 300 bp and the largest one of 19,212 bp. The sequence length distribution of contigs is indicated in Figure 2 and Table S1. Analysis of nucleotide content within all contigs showed that the content of A, T, C, G were 26.45% (51,750,884 nucleotides), 26.52% (51,887,566), 23.83% (46,619,027), and 23.19% (45,365,089), respectively, giving rise to an overall GC content of 47.02% in the chicken whole transcriptome.
Further assembly analysis showed that all contigs contributed to 98,461 unigenes. BLAST analysis with the known chicken genome sequence indicated that 134,949 contigs and 91,890 unigenes match the chicken genome. The distributions of contigs and unigenes in chicken chromosomes are described in Table 2.

Differentially Expressed Genes
Comparison of gene expression showed that a total of 13,650 unigenes were differentially expressed between any two-way comparison of the groups of NC, MD and SV (fold changes $2 or #22; q value ,0.01), including 6,844 significantly expressed isogenes between NC and MD (NC Vs. MD), 7,764 between NC and SV (NC Vs. SV), and 2,320 between MD and SV (MD Vs.
A total of 38 differentially expressed genes are involved in the T cell receptor (TCR) signaling pathway, which has important functions in animal immunity (Table 4). Three crucial genes in this pathway, CD148, CD45, and LCK, exhibited significantly different expression among NC, MD and SV groups ( Table 5). The CD148 was significantly up-regulated in SV (P,0.001) and MD (P,0.05) compared with NC ( Figure 5A); in contrast, LCK was significantly lower in SV (P,0.001) and MD (P,0.001) compared with NC ( Figure 5C). The CD45 gene, was expressed at a significantly lower level in SV compared with NC (P,0.001) and with MD (P,0.001), but was not significantly different between NC and MD (P.0.05) ( Figure 5B).

Assembly, Blast and GO Analysis
The high-throughput sequence data obtained by Illumina deep sequencing contributes to the understanding of the genetic architecture of chicken transcriptome. In this study, we pooled RNA from multiple individuals to generate one sample, and subsequently performed Illumina deep sequencing. This pooling strategy was widely used in some similar studies [14,15]. As a result, we generated 148,197 contigs for 195.6 Mb residues of chicken spleen transcriptome based on 43,481,951 Illumina read pairs. Considering all contigs, the overall GC content of the transcriptome was calculated to be 47.02%, which is very close to that reported for genome-wide exons (i.e. 47.00% for GGA4q), but much higher than that of genome-wide introns (i.e. 40.00% for GGA4q) [10]. We obtained a total of 98,461 unigenes by further assembly analysis, which was more than all predicted genes (20,000-23,000) in chicken genome [10]. We compared our unigenes with NCBI unigene database using blastn, and found that 35  complete transcript by assemble analysis. Meanwhile, some unigenes of this study are longer than corresponding unigene in NCBI, i.e. 3,661 unigenes were longer than NCBI corresponding unigenes using a cutoff (less than 90% coverage for our unigenes and more than 90% coverage for NCBI unigenes). In addition, there are 7,276 unigenes which couldn't be mapped to chicken genome. These unmapped unigenes still belong to chicken transcriptome, and some if not all of them might be ncRNA which need to be studied in future. Alternative splicing is also very common in chicken genome, as it was previously estimated that 40260% of all genes and 74% of multiexon genes are alternatively spliced in the human genome [16]. GO annotation showed that some unigenes were involved in the three categories of biological process (11,664 unigenes), cellular component (11,927), and molecular function (11,963). The chicken whole genome was sequenced in 2004 [10], but no major update has been published since then. Recently, the genomes of two other avian species, i.e. turkey and zebra finch, were sequenced [17,18]. The known chicken genome is 1,063 Mb in total length, of which 933 Mb were localized in 29 autosomes (GGA1-28 and 32) and sex chromosomes (Z and W), and the remaining residues were unlocalized [10,19]. In the current study, blast analysis with the chicken genome showed that most contigs and unigenes mapped to GGA1-5 and the Z chromosome, corresponding to the fact that these macrochromosomes contain a major part of the chicken genome [19]. It is logical that no contig or unigene was found for the known GGA32, because only 1,028 bp of sequences are available in this chromosome (ftp://ftp. ncbi.nih.gov/genomes/Gallus_gallus/).
Even though the chicken transcriptome of various tissues has been investigated by cDNA microarray or gene chip [7,11,12], this technology fails to detect sequence variation and to recognize new genes or transcripts. With emergence of second generation sequencing, RNA-seq is a more powerful approach for transcriptome analysis [13]; however, such investigations in chickens and other birds are very limited. Recently, a total of 856,675 Roche 454 reads were obtained in crows and further expression analysis   indicated a general pattern of ineffective dosage compensation in that species [20,21]. The results of the current study as well as 148,197 Illumina reads are useful resource for further investigation on chicken transcriptome.

Differently Expressed Genes for APEC
Comparison of gene expression among the different treatment groups in the current experiment is helpful for identification of candidate genes underlying response and resistance to APEC infection in chicken. Many fewer differentially expressed unigenes were found in the MD Vs. SV contrast (2,320) compared with that of NC Vs. MD (6,844) and NC Vs. SV (7,764), revealing that there is greater difference between infected and non-infected states than between mild and severe infections. Among the differently expressed genes, there are 531, 115 and 134 unigenes that specifically express in NC, MD, and SV groups, respectively. Our previous study with cDNA microarray technology revealed 1,101 significantly expressed genes between SV and NC [22]. Because RNA-Seq can recognize new unigenes or unique isoforms present in chicken transcriptome, it should be more powerful in expression analysis. Further, KEGG prediction and GO analysis showed that these differently expressed genes were involved in a couple of major pathways. It is notable that 38 and 22 unigenes are contained in the two predicted pathways of TCR signaling pathway and microbial metabolism in diverse environments, respectively, suggesting that these unigenes are candidate genes for APEC infection in chicken.
The TCR signaling in response to antigen recognition can induce integrin to facilitate T-cell activation and thus the TCR signaling pathway has a central role in the adaptive immune response [23,24]. Within the TCR signaling pathway, CD148, CD45 and LCK are crucial genes that regulate the signal transduction throughout the entire network [25,26]. It is notable that CD45 and CD148 are specifically required for L. pneumophila phagocytosis and effector translocation [27], suggesting that they are involved in the interaction between host and bacterium. It has been proposed that CD45 is alone sufficient for TCR triggering [28] and, moreover, CD45 deficiency results in a severe combined immunodeficiency phenotype [26]. In the current study, both CD45 and LCK were significantly downregulated in SV compared to NC and MD (P,0.001), which supports the critical function of CD45 in the immune response to APEC in chicken. The expression of CD148 was upregulated in SV compared with NC (P,0.001) and MD (P,0.01) in this study. The true function of CD148 in the immune system remains unclear and there are contrary opinions regarding whether CD148 is dispensable for normal growth and development [26]. The identified unigenes in the TCR signaling pathway are candidate genes for APEC infection in chicken, and warrant additional functional confirmation by further investigation.
The cytokine interleukin-1 beta (IL1B) is known as a ''master'' cytokines and plays a great role on the process of anti-infectious protection. In the pathogenesis of APEC, we found that IL1B expression was significantly up-regulated in both SV and MD compared to NC, and moreover SV showed higher IL1B level than MD. It indicated that IL1B was a key cytokine responsible for the inflammation process caused by APEC infection. A similar result was also found in cow mastitis. Investigation on the global transcription of Mild priming primary mammary epithelial cells (MEC) of cow for 12 h with lipopolysaccharide (LPS) (100 ng/ml) before stimulated with heat inactivated E. coli bacteria showed that, the expression of IL1B was significantly down-regulated to inhibit inflammation [29]. Moreover, it was predicted that IL1B could directly regulate 44 differently expressed genes in the process of LPS priming-mediated modulation of the E. coli-elicited response [29].
Toll-like receptors are important factors for immune response. In this study, TLR4 was significantly up-regulated in MD (2.3 folds) and SV (2.9 folds) compared to NC, which was consensus to the study of TLR4 in human. After co-stimulation the T24 human bladder carcinoma cell with E. coli and lactobacilli, TLR4 were significantly increased in both mRNA and protein level, and inhibition of TLR4 blocked the lactobacilli potentiation of NF-kappaB [30]. TLR2 were also up regulated in MD (3.0 folds) and SV (3.3 folds) compared to NC. It was reported the TLR2 subfamily were involved in the avian response to C. perfringens challenge [31].
Compared to NC, the L-phenylalanine oxidase IL4I1 upregulated its expression in SV and MD at 20 and 39 folds respectively. Meanwhile, IL4I1 expression in SV was about 2 fold higher than MD. It indicated that IL4I1 facilitates the pathogens of APEC. In human, it was proved that IL4I1 improve tumor growth by inhibiting the CD8(+) antitumor T-cell response [32].

Ethics Statement
The APEC challenge experiment and sample collection were approved by the Iowa State University Institutional Animal Care and Use Committee (# 11-07-6460-G).

APEC Challenge Experiment and Sample Preparation
A total of 240 non-vaccinated commercial male broilers at 4 weeks age were challenged with 0.1 ml APEC O1 (10E8 colony forming units) by the intra-air sac route into the left thoracic airsac. Another 120 non-vaccinated males were non-challenged Table 5. Comparison of gene expression that differs significantly among NC, MD and SV for CD148, CD45 and LCK. The expression value is based on the obtained reads within the specific gene in the three groups of [non-challenged (NC), challenged-mild pathology (MD), and challenged-severe pathology (SV)]. 2 Refers to normalized fold changes. 3 The q-value was calculated according to Benjamini et al. (1995) [38]. doi:10.1371/journal.pone.0041645.t005 (NC) but treated with 0.1 ml phosphate buffered saline (PBS). All detailed information on the APEC O1 strain and challenge design and procedures was previously described [22]. Birds were euthanized and necropsy was performed at one day post challenge. Based upon pathological finds a summarized lesion score, ranging from 0 to 7, was determined for each bird. Birds with lesion scores of 0-2 were regarded as mild pathology (resistant phenotype), and those scoring 4-7 as severe pathology (susceptible phenotype). Subsequently, spleens from three groups [non-challenged NC, challenged-mild pathology (MD) and challenged-severe pathology (SV)] were subjected to Illumina deep sequencing to investigate the dynamic responses of chicken transcriptome. The recorded lesions (mean 6 standard deviation) for NC, MD, and SV groups were 0.0060.00, 0.5060.58, and 5.2561.26, respectively.

RNA Isolation, cDNA Library Construction and Illumina Deep Sequencing
For each group, three spleens were randomly chosen and shipped on RNAlater (Applied Biosystems) to Shanghai Majorbio Bio-pharm Biotechnology Co., Ltd. (Shanghai, China), where total RNA was isolated from each spleen by trizol (Invitrogen, CA, USA). Then, samples of three individuals were pooled within each group in equal amounts to generate one mixed sample per group by RNA pooling. These three mixed RNA samples were subsequently used in cDNA library construction and Illumina deep sequencing.
Three cDNA libraries were prepared using the TruseqTM RNA sample prep Kit (Illumina, San Diego, CA USA) following the manufacturer's instructions. First, magnetic beads containing poly-T molecules were used to purify mRNA from 10 mg of total RNA. Second, the three samples were chemically fragmented and reverse transcribed into cDNA. Third, end repair and A-base tailing was performed and then Illumina adapters were ligated to the cDNA fragments.
After a gel size fractionation step to extract fragments of 300 bp, 29 mL of the purified samples were amplified by 15-cycle PCR. Amplified products were validated and quantified using an Agilent 2100 bioanalyzer and the DNA 1000 Nano Chip Kit (Agilent, Technologies, Santa Clara, CA, USA). Libraries were loaded onto the channels of the flow cell at 8 pM concentration. Sequencing was performed on the Genome Analyzer IIx (Illumina, San Diego, CA, USA) by running 81+7+81 cycles using Illumina's cBot Paired End Cluster Plate Kit and 36 Cycle Sequencing Kit according to the manufacturer's instructions.

Bioinformatic Analysis
Reads trimming and assembly. For each of the sequencing reads, low quality bases (Sanger base quality ,20) of 39 ends were first trimmed using in-house perl scripts and then the sequencing adapters were trimmed using fastx_toolkit software (http:// hannonlab.cshl.edu/fastx_toolkit/). All Illumina reads of three samples (NC, MD and SV) were assembled by Trinity software using default parameters [33].

Transcriptome Annotation
The isogenes were compared with the protein nonredundant database using BlastX with E values less than 1.0610 25 (E values less than 1.0610 25 were considered as significant) [34]. Gene ontology (GO) terms were extracted from the best hits obtained from the BlastX against the nr database (E value #1.0610 26 ) using blast2go, and then sorted for the GO categories using inhouse perl scripts [35]. Metabolic pathway analysis was performed using the Kyoto Encyclopedia of Genes and Genomes (KEGG).

Expression Analysis
Reads from each samples were mapped to isogenes using bowtie software respectively using corresponding parameters for pair ends reads and single reads [36]. The expression of each gene was calculated using the numbers of reads mapping to its isogenes. For calculating gene expression correctly, reads which aligned to different isogenes of the same gene were only counted once. Reads which have best alignments to more than one gene were not counted. Both reads from a read pair were removed if one read aligned to one gene and the other aligned to another gene. The differentially expressed genes were analyzed using the R package DEGseq and the Benjamini q-value was calculated [37,38]. Gene Ontology and KEGG pathway enrichment analysis was performed by using the GOseq package, which took gene length bias into account, using a 0.05 cutoff for the false discovery rate [39].
The according to et al. (1995).