De Novo Transcriptomic Analysis of Peripheral Blood Lymphocytes from the Chinese Goose: Gene Discovery and Immune System Pathway Description

Background The Chinese goose is one of the most economically important poultry birds and is a natural reservoir for many avian viruses. However, the nature and regulation of the innate and adaptive immune systems of this waterfowl species are not completely understood due to limited information on the goose genome. Recently, transcriptome sequencing technology was applied in the genomic studies focused on novel gene discovery. Thus, this study described the transcriptome of the goose peripheral blood lymphocytes to identify immunity relevant genes. Principal Findings De novo transcriptome assembly of the goose peripheral blood lymphocytes was sequenced by Illumina-Solexa technology. In total, 211,198 unigenes were assembled from the 69.36 million cleaned reads. The average length, N50 size and the maximum length of the assembled unigenes were 687 bp, 1,298 bp and 18,992 bp, respectively. A total of 36,854 unigenes showed similarity by BLAST search against the NCBI non-redundant (Nr) protein database. For functional classification, 163,161 unigenes were comprised of three Gene Ontology (Go) categories and 67 subcategories. A total of 15,334 unigenes were annotated into 25 eukaryotic orthologous groups (KOGs) categories. Kyoto Encyclopedia of Genes and Genomes (KEGG) database annotated 39,585 unigenes into six biological functional groups and 308 pathways. Among the 2,757 unigenes that participated in the 15 immune system KEGG pathways, 125 of the most important immune relevant genes were summarized and analyzed by STRING analysis to identify gene interactions and relationships. Moreover, 10 genes were confirmed by PCR and analyzed. Of these 125 unigenes, 109 unigenes, approximately 87%, were not previously identified in the goose. Conclusion This de novo transcriptome analysis could provide important Chinese goose sequence information and highlights the value of new gene discovery, pathways investigation and immune system gene identification, and comparison with other avian species as useful tools to understand the goose immune system.


Introduction
Genomic evaluations are the source of important evidence for determining the immune system characteristics that differ between the goose, chicken, duck, and other birds. After genome sequencing of the chicken and duck, both became the first-class avian species, which allows immunologic comparison with other animals [1,2]. Subsequently, the avian species have been further elucidated in the evolutionary process [2], with the recent availability in fish and mammal genomes [3]. The goose genome requires study because it is one of the most important waterfowl species and is also a vital component in the fast-growing poultry economy of China, which has become the largest goose production country in the world [2].
Interest in the goose immune system comes not only from its importance as a food animal species, but also for its role as a natural reservoir of many avian viruses, such as influenza virus [4]. Therefore, it is essential to illustrate the nature and regulation of the innate and adaptive immune systems in the goose [5]. However, except chicken and other poultry such as goose, relatively little is known about their immune systems at the molecular level [6]. Thus, the discovery of important immune genes and functional studies can help elucidate immunological responses and the natural or inherited disease resistance ability. To date, only a few research studies have examined the goose species and their relevant genes; some of the genes, including, CD8α [6], CD4 [4], interleukin(IL)-17A [7], IL-6 [8], Toll like receptor (TLR) 5 [9], MHCI/II [5,10], interferon(IFN)-γ [11], IFN-α [12] and IL-2 [13] were cloned. Despite these studies, many genes related to the goose immune system remain unknown.
To date, the goose transcriptome profiling studies on the identifying the genes responsible for follicle development and reproductive biology in the laying and broodiness [2,14], and comprehensive analyzing the transcriptome of geese to understand the geese development and metabolism [15] has been reported. These research studies were quite different from identifying genes related to the goose immune system. To study the immune system, we are the first to present here the de novo transcriptome of Chinese goose (Anser cygnoides) peripheral blood lymphocytes (PBLs) using Illumina-Solexa sequencing technology, which is a powerful tool for transcriptome analysis [16]. All of the obtained transcriptome and unigenes were annotated largely from the duck and the chicken, because these two species are closely related to the goose. This study produced the first characterized transcriptome of this waterfowl and provided a genomic picture related to previously undiscovered immune genes. Through the functional annotation of the assembled sequences and identification of the sequenced unigenes, our study identified important novel immune genes related to antigen processing and presentation, toll like receptor signaling pathways, complement cascades, natural killer cell-mediated cytotoxicity response, and inflammatory response by chemokines and cytokine-mediated signaling pathways in the goose.

Ethics Statement
This study was conducted according to the management regulations of experimental animals in Beijing and was approved by the Animal Care Committee of China Agricultural University, Beijing, People's Republic of China.

Geese Rearing and Blood Sample Preparation
The five Chinese geese used in this study were raised in the animal isolator house, college of veterinary medicine, China agricultural university, Beijing. These birds were kept under same environmental conditions and provide ad libitum water and locally available commercial feed. The blood samples from geese were collected by sterilizing the wing and femoral vein with the surgical cotton containing 70% alcohol. From each bird, 5-10 ml blood was taken and properly mixed with equal volume of EDTA as an anticoagulant, containing tube (1:1) (TBDscience, China).

Peripheral Blood Lymphocytes Separation and RNA Extraction
For the separation of PBLs, we used a sterile pipette to take 5 ml blood, and added to an equal volume of PBS and mixed well. Then slowly added the blood/PBS mixture on the top of the 10 ml Ficoll-hypaque solution containing tube and centrifuged at 2000 rpm for 20 min. After centrifugation, the tube contained 4 layers (first layer: plasma layer, second layer: white lymphocyte layer, third layer: transparent Ficoll-hypaque layer and fourth layer: erythrocytes). At this point, we removed the upper layer that contains the plasma and most platelets, and the second layer containing lymphocytes was carefully aspirated with a sterile pipette into a new centrifuge tube, which contained 10 ml of washing solution. The tube was thoroughly mixed and centrifuged at 2000 rpm for 20 min. Then removed supernatant, re-suspend cells in washing buffer, and the washing step was repeated twice to obtain the lymphocytes. The total RNA was extracted from the collected Chinese goose lymphocytes using TRIzol (Invitrogen, USA) according to the manufacturer's protocol. Total RNA was treated with RNase-free DNase I (Promega, USA) for 30 min and then incubated at 37°C to remove residual DNA. The RNA purification was carried out using the RNeasy Mini kit (Qiagen, USA) following the manufacturer's instructions.

cDNA Library Construction
Total RNA was prepared to construct the cDNA library and Illumina-Solexa was carried out. In brief, mRNA was isolated and purified from 10 μg of total RNA using oligo(dT) magnetic beads, and short fragments (200-700 bp) were obtained. These short RNA fragments were used as templates for first-strand cDNA synthesis by random hexamer-primers, and then the second-strand cDNA as synthesized by adding buffer, dNTPs, RNAse H and DNA polymerase I. After purification and paired-end (PE) repair, 5' and 3' ends of the cDNA fragments were ligated with sequencing adapters and were amplified by polymerase chain reaction (PCR) to generate the templates. The cDNA templates were further enriched by PCR amplification to generate the cDNA library. The cDNA library was sequenced by an Illumina HiSeq 2000 sequencing platform and the raw reads were generated using the Solexa pipeline according to the manufacturer's instructions.

De Novo Transcriptome Assembly
The raw reads were cleaned by removing adapter sequences, non-coding RNA (such as rRNA, tRNA and miRNA) and low-quality sequences (reads with uncertain bases 'N'). To insure the quality control of raw read data, we used two steps; the first was the sliding window method to remove low quality segments (Threshold quality 20, window size 5 bp, and threshold length 35 bp), and the second was the removal of reads that contained N as a part of the sequence (Threshold length 35bp). De novo transcriptome assembly was performed by the Trinity program [17] (Version r2013/08/14), and the longest transcription sequences were taken and defined as unigenes. To measure RPKM (reads per kilobase of exon model per million mapped reads), the number of sequenced reads that aligned to a gene must be normalized to remove the biases in the aligned sequences [18]. The RPKM was calculated for all assembled unigenes in every sample by single-end mapping using software bowtie2 (version 2.1.0). The unique feature of this tool is that it does not rely on the existence of a reference genome and therefore it is mostly useful for quantification with de novo transcriptome assemblies [19]. All unigenes were arranged in descending order from the first unigene. When the assembled length covered half of the total length of all unigenes, the length of the current unigenes was considered to be N50. And when the assembled length covers 90% of the total length, the length of the current unigene was considered to be N90. The sequence database generated in this study is available at the National Center for Biotechnology Information (NCBI) database Short Read Archive under the accession number SRX399106.

Annotation and Classification of the De Novo Transcriptome
All unigenes were searched for homologous genes using BLAST and annotation against the NCBI Nr database (non-redundant, http://www.ncbi.nlm.nih.gov/), using an E-value cut-off of 10 −5 . Unigene sequences were also aligned by BLASTx to various protein databases in the following order: Swiss-Prot and TrEMBL (http://www.ebi.ac.uk/uniprot/), Gene Ontology(GO) (http://www.geneontology.org/), Conserved Domain Database (CDD) (http://www.ncbi.nlm. nih.gov/cdd/), Pfam database (http://pfam.janelia.org/), eukaryotic Orthologous Groups (KOGs) (ftp://ftp.ncbi.nih.gov/pub/COG/KOG/), and Kyoto Encyclopedia of Genes and Genomes (KEGG) (http://www.genome.jp/kegg/). The unigenes were sorted to recover proteins with the most similarity to the given unigenes with putative functional annotations. When the aligned results were different from database sequences, then most privileged results of Nr were selected, followed by the Swiss-port, TrEMBL, CDD, PFAM, GO, KOG and KEGG databases. GO terms at the 2 nd level were used to perform the GO annotation of the unigenes under the biological, molecular functions and cellular components. The unigene sequences were also aligned to the KOG database to predict and classify possible functions, and the pathway assignments were performed according to the KEGG pathway database [20].

Identification and Annotation of the Goose Immunity Relevant Genes
The identification of the most important immunity relevant genes was assembled mainly according to search in our BLAST annotation results to the NCBI databases. A set of keywords representative of immune genes was used to predict immune-related genes based on the annotation results. Similarly, to find the most genes belonging to functions of the immune system, the GO term and KEGG pathway information were also used to identify the most important genes. The immune genes were detected not only as described by [21], but also according to the GO categories "response to stimulus" and "immune system process", and KEGG pathways "immune system" and "immune diseases," which had a direct relationship with the immunity genes. Then the finding of the presence and absence of immune relevant genes of goose in comparison with the duck, chicken, turkey and zebra finch were individually search by gene name and identifying regions of the goose immune related genes that were conserved in other species using BLAST annotation results of the NCBI database.

STRING Analysis
To find out the functional relationship among immune relevant genes, 125 transcripts were imported by selecting Gallus gallus as a model organism to STRING 9.1 http://string-db.org/ (a database of known and predicted protein interactions), which responds by displaying a network of nodes (proteins) connected by colored edges representing functional relationships. Interactions of these genes were identified, based on the evidence indicated in the edge map. The STRING database assembles data from genomic context, high-throughput experiments, conserved co-expression and data mining to integrate data from these sources into groups with direct physical and indirect functional associations. Complete knowledge of all direct and indirect interactions between proteins represents an important milestone towards a comprehensive description of cellular mechanisms and functions [22].

PCR Reaction
PCR was performed to confirm the expression of the recognized immune-related genes. Ten genes, BAFF, C1qA, C1qB, C1qC, SOCS1, SOCS3, TLR3, IL1RL1, C8G and CD74 were utilized to confirm the sequencing data. Genes were selected based on their functions in innate, adaptive immune system and signaling pathways. Primer sequences were designed according to sequences from our transcriptome data. Information of individual primer sequences of the selected 10 genes are listed in S1 Table. For PCR, a TAKARA LA Taq and Primer STAR HS DNA Polymerase kits (Takara Biotechnology (Dalian) Co.Ltd) were used according to the manufacturers' instructions. All reactions were run in triplicate.

Illumina-Solexa Sequencing and De Novo Assembly
De novo transcriptome sequence data were obtained using Illumina-Solexa deep sequencing to understand the genetic structural design of the goose lymphocyte transcriptome. A cDNA library of goose peripheral blood lymphocytes was sequenced using Illumina-Solexa sequencing technology. Through Solexa RNA paired end sequencing, we generated 91.93 million raw reads with 9.19 Giga base pairs (Gbp) as listed in Table 1. After removing adaptor sequences, ambiguous nucleotides and low-quality sequences, 69.36 million clean reads with a length of 93.65 bp remained. The GC content and average length was 47% and 93.65 bp, respectively. Assembly of all of the clean reads resulted in 211,198 unigenes that ranged from 201 bp to 18,992 bp with the average length of 687 bp and a total length size of 69.5 Mb. Next, all unigenes were sorted in descending order to find the N50 length of 1298 bp and the N90 length of 260 bp. All assembled 211,198 unigene transcripts were more than 200 bp in the length, which indicated that the unigenes were worthy and assured most of the transcriptome sequences. The length distribution of the assembled unigenes in the sequenced cDNA library was shown in The number of clean reads to each annotated unigene was calculated and then normalized by RPKM and bowtie2 software. Out of the 211,198 unigenes, 143,901 (approximately 68.1%) had a RPKM value less than one, 59,616 (28.2%) had a RPKM value between 1 and 10, 4051 (1.9%) had a RPKM value between 10 and 100, and 3630 (1.7%) had a RPKM value of greater than 100 (Fig. 1B).

BLAST Analysis
The quality control reads data were randomly selected and aligned with the NCBI nucleotide database. For each query, we selected the best one among all of the aligned sequences with an E-value of 10 −10 and that covered more than 80% similarity. All unigene sequences were aligned against the public database using the BLAST comparison to known sequence databases and functional annotation with similarity >30% and an E-value cut of 10 −5 . The alignment results of the public databases (Nr, Swissport, TrEMBL, Pfam, CDD, KOG and KEGG) were shown in Table 2. The aligned sequences were matched with the Nr 17.44%, TrEMBL 17.70% and SWISSPORT 15.14% databases, respectively. The 36,854 (17.44%) unigene BLAST hits in the Nr database were studied as the Nr database had the maximum annotated unigenes. However, the alignments with Nr database were in the top five hits species (Fig. 1C), and most of the aligned sequences were matched to the Gallus gallus 17,238 (55.59%). The E-value distribution showed that 13,890 (37.7%) unigenes were significantly homologous with a significance of 1E-5 to 1E-50, and 7123 unigenes were in the range from 1E-50 to 1E-100 (Fig. 1D). Similarly, the identity distribution of unigenes revealed that 26,985 (73.2%) unigenes were highly matched to 80-100% identity and 5436 (14.7%) unigenes were between 60-80% matched (Fig. 1E). In addition, the score distribution showed that 900 (7.8%) unigenes had scores less than 500, 84 (0.7%) unigenes had scored more than 3,000, and 6882 unigenes (60.1%), which was the greatest number, had a score between 500 and 1000 (Fig. 1F).

Functional Annotation and Pathway Classification
The rapid assembly of genome sequences is a major challenge to researchers attempting to extract the maximum functional and evolutionary information from the genome data. The NCBI eukaryotic orthologous groups (KOGs) include sequences from 7 eukaryotic genomes [24]. To further evaluate the transcriptome library, the accuracy of our annotation sequences in the KOG functional classification was examined. All unigenes were aligned to the KOG database for prediction and classification into different functional categories. There were 15,334 unigenes annotated into 25 KOG categories (Fig. 2). The largest group was signal transduction mechanism at 5787 (2.74%) followed by the general function prediction only at 2792 (1.32%), posttranslational modification, protein turnover, chaperones at 2168 (1.02%), and cytoskeleton at 1876 (0.88%). Nuclear structure 75 (0.03%) and cell motility 45 (0.02%) were the KOG categories with the least represented unigenes. Gene Ontology (GO) is an international standardized gene functional classification system that uses strict definitions to comprehensively describe the properties of genes and their products in any organism [25]. A total of 163,161 unigenes were assigned in GO annotations and were divided into three categories and 67 functional categories. The biological process category contained 72,844 unigenes, followed by the 60,500 in the cellular component category and 29,817 in molecular function category (Fig. 3). In the biological process category, 'cellular process' at 14,596 (8.95%), 'metabolic process' at 12,660 (7.75%) and biological regulation at 7317 (4.48%) were the most highly represented. In addition, unigenes were categorized into other 27 important biological processes, including 'regulation biological process' at 7075 (4.33%) unigenes, 'response to stimulus' at 4686 (2.8%) unigenes, 'signaling' at 3570 (2.2%) unigenes and 'immune system process' at 512 (0.3%) unigenes, which were mainly involved in resistance or the defense system in the geese. However, 17 GO functional groups were assigned to the cellular component category with 'cell' and 'cell part' with the same number of unigenes at 13,309 (8.15%), followed by 'organelle' at 8771 (5.37%) and 'membrane' at 6,841 (4.19%) as the most represented. Likewise, 20 GO functional groups were assigned into the molecular function category with 'binding' at 13,457 (8.2%) and 'catalytic activity' at 9,144 (5.60%) as the most highly represented.
To evaluate the biological system of the geese, we aligned the annotated sequences to the corresponding KEGG pathways and analyzed the relationship between unigene and pathways to further understand the biological functions and gene interactions [20]. Out of 211,198 unigenes, 39,585 (18.74% of total unigenes) KEGG pathways and annotated unigenes were categorized into six biological functional groups (Table 3). A total of 39,585 unigenes had significant matches in the database and were assigned to 308 KEGG pathways. Some of unigenes were mapped to more than one pathway. Among unigenes mapped into pathways, the highest Other unigenes were assigned into pathways of organismal processes (24.77%), metabolism (14.54%), cellular processes (13.35%), environmental information processing (11.7%) and genetic information processing (7.16%). The lymphocytes have pivotal functions in cell-mediate immunity: the innate (by NK cells) and adaptive (by T-cells) immune systems, as well as the antibody derived humoral (by B cells) immune response, are the main functions of lymphocyte [25]. Our present analysis show that large numbers of the unigenes lye in the immune relevant pathways. Out of the six groups listed above, human diseases and organismal systems contained the most unigenes. Within organismal systems, it is noteworthy that the immune systems group, comprised 2757 (6.96%) out of 39,585 unigenes were involved in 17 KEGG pathways ( Table 3). The immune system pathways were further categorized into 15 subcategories as shown in Fig. 4. Among subcategories, the chemokine signaling pathways comprised of highest number of unigenes at 348 followed by leukocyte transendothelial migration at 244 unigenes, T-cell receptor signaling pathway at 204 unigenes, Fc gamma R-mediate phagocytosis at 202 unigenes, Toll-like receptor signaling pathway at 179 unigenes, complement and coagulation cascades at 164 unigenes, natural killer cell mediated cytotoxicity at 137 unigenes, Bcell receptor signaling pathway at 134 unigenes, Fc epsilon RI signaling pathway at 115 unigenes, hematopoietic cell lineage at 114 unigenes, RIG-I-like receptor signaling pathway at 103  Digestive system 1372 9 Endocrine system 1277 6 Environmental adaptation 137 4 Excretory system 409 5 Immune system 2757 17 Nervous system 1816 8 immunity relevant genes were identified from the goose transcriptome and are summarized in Table 4.

Identification of Immunity Relevant Genes
Lymphocytes play an important role in the immune response and each type playing different roles: B lymphocytes are more associated with humoral immunity, while T-cells are the main players in cell-mediated immunity [26]. In this study, we were interested in identifying immune-related genes in the transcriptome of the goose PBLs. According to the literature and our sequence analyses, 125 immune-related genes were identified (  (Table 4). Of the 125 unigenes, only 16 (approximately 13%) unigenes from the known goose species (anser anser 9, and anser cygnoides 7) were identified using an NCBI database blast search. We also searched by gene name and by identifying   regions of the goose immune related genes that were conserved in other species using BLAST. We found that the majority of the immune genes could be identified in duck, chicken, turkey and zebra finch, but some genes appeared to be unique to geese (Shows in Fig. 5 and Table 4). In conclusion, 109 unigenes (approximately 87%) were new genes and were not previously identified in the goose. In order to confirm these immune-related genes in goose and analysis their relationships with those of other species, we cloned some genes from innate, adaptive immune system and signaling pathways (S1 Fig.).
The complement system, which helps antibodies and phagocytic cells eliminate infectious microbes and cellular debris, is one of the key components of the innate immune system. C1q, the ligand-binding unit of the C1 complex of complement, is the first subcomponent of the classical pathway and is a major link between innate and adaptive immunity. Many studies have examined mammalian C1q, but least information is known about avian C1qs, especially C1q in the goose. Here, the A, B and C chains of the goose C1q have been cloned (S1 Fig.). The mature peptides of the C1qA, B, C chains are 243, 244 and 242 amino acids in size, respectively (Figs. [6][7][8]. Goose C1qA, B, and C have similar molecular structures comprised of a signal peptide, one (C1qA and C1qC) or two (C1qB) collagen-like domains and a C1q domain. This structural arrangement is also conserved in other species, such as reptiles and mammals. When the species are compared, the goose C1qA, B and C chains all have the highest identities to duck C1qs (93.38% for C1qA, 95.9% for C1qB and 92.15% for C1qC), followed by chicken and zebra finch (bird) C1qs. In the blood, C1qA, B and C form a heterotrimer that is stabilized by interchain disulfide bonds. The sites for the formation of the disulfide bonds are Cys26 in Transcriptomic Analysis of Goose Peripheral Blood Lymphocytes goose C1qA, Cys22 in goose C1qB and Cys32 in goose C1qC and all these sites are conserved from birds to humans.
Most of the sites for glycosylation, hydroxylysine hydroxylation and hydroxyproline hydroxylation found in the collagen-like regions of human C1qs [27] can also be found in the goose C1qs. The variant sites in the goose C1qA chain are His85, Arg100 and Ala146. The variant sites in the goose C1qB chain are Asp44, Asn47, Arg53, Arg83, Gln92, Met95, and Gln101. In the formation of complement component C1, the collagen-like regions of C1qs are supposed to be recognized by the modular proteases C1r and C1s. A motif shared between C1qA, B and C, Hyp-Gly-Lys-X-Gly-Pro/Tyr/Asn (where Hyp is hydroxyproline), mediates binding to C1r and C1s [28]. Similar regions are also observed in the goose C1q chains, such as Hyp79-Phe84 in C1qA, Hyp77-Pro82 in C1qB and Hyp84-Pro89 in C1qC. Among them, the C1r-C1s binding regions are most conserved in C1qB, followed by C1qC. Most of the variations are found in C1qAs, especially among the avian species.
As a versatile pattern recognition molecule, the heterotrimeric globular domain (gC1q) of C1q is thought to be capable of engaging a broad range of ligands, including aggregated IgG and IgM, C-reactive protein (CRP), human T cell lymphotropic virus-I (HTLV-I) gp21 peptide [29]. Val183, Arg184, Arg185 in human C1qA, Arg126, Arg139 and Arg154 in human C1qB and Arg184 in human C1qC are the key sites for IgG interaction. These sites are strictly conserved in the goose C1qB and in other C1qBs. These sites are not seen in the C1qAs and C1qCs of other species, even in a mammal (mouse). The Lys200 and Trp147 sites of human C1qA that interact with CRP are not found in other species, except for in the mouse. Tyr198 of the human C1qB is relatively well conserved among different species because the same or similar amino acids are found in other C1qBs. In contrast, His129, Pro131, Ala133 and Pro134 of human C1qC, which are important in binding to the HTLVI gp21 peptide, are not observed in other C1qCs. However, the calcium ion binding sites of C1qs are strictly conserved from geese to humans. These sites are Gln195 in goose C1qA, and Asp192, Tyr193 and Gln199 in goose C1qB. The different conservation of binding sites may reflect the presence of different ligands in different species.
Another important component of innate immunity is the Toll like receptor family. Here, we identified goose TLR3 using our EST library. The extracellular region of goose TLR3 has 22 LRR regions, 1 LRRNT region and 1 LRRCT region (S2 Fig.). The identity of goose TLR3 to other TLRs ranges from 95.88% to 59.45%. Similar to human TLR3, the N-glycosylation sites of goose TLR3 are related to the specific interaction surface structure [30] and are Asn25, Asn43, Asn97, Asn168, Asn219, Asn224, Asn247, Asn360, Asn469, Asn598 and Asn624. Some variant sites are also observed in goose TLR3, such as Asp30, Lys237 and Asp263, which may be specie specific. The conserved disulfide bonds are formed by Cys68-Cys95 and Cys611-Cys639 in goose TLR3. The functional sites, such as Asn219, are important for the response to ds-RNA. Asn168 is related to TLR3 expression levels, and His501 and Asn503 are required for RNA binding and the activation of NF-kappa-B. All of these functional sites are conserved in the goose TLR3 (S2 Fig.).
B-cell activating factor (BAFF) is critical for the stimulation and maturation of B-cells in the adaptive immune system and is also found in our goose cDNA library. The identities among avian species are particularly high and range from 91.99% to 99.65% (S3 Fig.). Similar to the other BAFFs, goose BAFF is mainly composed of a Tumor Necrosis Factor (TNF) domain, which is relatively well conserved among various different species. In the TNF domain, the trimer interface sites are hydrophobic residues such as Gln151, Phe197, Tyr199, Tyr249, Ala254, Tyr281 and Val285 and are conserved from geese to humans. The TNFR 50s-loop binding sites (Leu172, Ser174, Gly212, Lys219 and Ser228 in goose BAFF) are also found in all species without any modification. The conserved long DE loop, known as the "flap", is unique to BAFF in the TNF family and is located between sites 219 to 228 of goose BAFF. The furin cleavage sites of goose BAFF are "Arg-Gly-Arg-Arg". Compared with the mammalian BAFFs, the BAFFS are more conserved among the avian species. The Cys235 and Cys248 of goose BAFF are responsible for the formation of conserved intra-chain disulfide bond. The N-Glycosylation site (Asn245) in the TNF domain is conserved among different species, and the other site (Asn102) only found in mouse and human (S3 Fig.). These results indicate that the functional sites of BAFF have changed very little during evolution.
Among the signal transduction pathways, the JAK-STAT pathway is mainly expressed in white blood cells and involved in the regulation of the immune system. Suppressor of cytokine signaling (SOCS) proteins 1 and 3 are inhibitors of JAKs and implicated in inflammation, and they were cloned in this study. Similar to the SOCS proteins from other species, goose SOCS1 is also composed of an SH2 domain and a SOCS box. An extended SH2 subdomain (ESS) is important for JAK phosphotyrosine binding and is located at the beginning of the SH2 domain. The KIR domain is involved in signal and kinase inhibition and is located between Phe51 and Phe60 in goose SOCS1 (Fig. 9). The Elongin BC complex binding domain is known as a BCbox, has the motif (A/P/S/T) -L-x (3) -C-x (3)-(A/I/L/V), and is located between 169 and 179 sites in the goose SOCS1. Compared with the conserved KIR domain and B-C box, the Suppressor of cytokine signaling 1 sequence is 7 or 8 poly-serines and can only be found in mammalian species. The sites for JH1 binding and JAK signal transduction suppression, such as Phe51, Phe54, Asp59, Tyr60, Ile63 and Leu70, are conserved between geese and humans. Arg100, which is important to suppress LIF and IL-6 signal transduction, is highly conserved. According to the alignment, SOCS1 is relatively similar among different species and the identities are all over 60% (Fig. 9). However, SOCS3 is highly conserved; the similarity between goose SOCS3 and duck SOCS3 can reach up to 100% and the smallest similarity is 88.04% (Fig. 10). Similar to SOCS1, the SH2 domain, SOCS box, KIR domain and ESS domain are conserved in goose SOCS3. The sites important for EPO/LIF-induced signaling suppression are Leu22, Phe25, Glu30, Tyr31, Val34, Leu41, Gln45 and Arg71 in goose SOCS3. The Leu58, Leu93 and Arg94 sites of goose SOCS3 are important for the binding to Tyr429/Tyr431 phosphorylated EPOR. These functional amino acids are all conserved from birds to mammals.
The results listed above indicated that the genes involved in adaptive and innate immunity play central roles in goose immunity, while the toll-like receptor, chemokines, complement system and Jak-STAT signaling pathway act as the functional bridge between the innate and adaptive immune responses.

STRING Analysis
Based on the identified 125 immune-related genes, STRING 9.1 (http://string-db.org/) was used to analyze the interactions and relationships among these genes. From 125 goose genes, 105 were matched well to the known immune genes of Gallus gallus in the STRING database (Fig. 11). In particular, the genes, whose sequence information has been confirmed by our PCR reactions, play important roles in the immune interaction nets. Like SOCS3, it can interact with IL receptors of Cytokine-cytokine receptor interaction pathway, STATs of Jak-STAT signaling pathway, PTPN11 of Natural killer cell mediated cytotoxicity pathway and IRF1 of Transcription factors for immune response pathway. By the interaction of SOCS3, various immune pathways are connected as a whole one and SOCS1, SOCS3 and PTPN11 were very strongly linked with many signaling pathways, which indicates that all of these genes interacted with each other and comprise a classical network of different genes. C1qs are also the key knots for complement pathway by interact with C1s and C1r and then in the downstream, C3 was activated. Myd88 is the centre modular of Toll-like receptor signaling pathway. It can interact with various TLRs, including TLR3, leading to NF-kappa-B activation of immune defense. No connection is observed in BAFF/TNFFSF13B, it may due to the absence of BAFF receptors in this analysis. These results show us these immune genes of goose have similar functions and reaction modes as in other species (like chicken). These immune molecules do not work  independently and they can function in various pathways to link the immune system as a whole one for the immune defense.

Discussion
As a source of meat, eggs and feathers, the goose is one of the most important economical waterfowl around the world. As geese serve as one of the principal natural reservoirs for influenza A viruses, its study is of special interest in medicine and public health problem [31,32]. Due to lack of genomic information, it is important for us to understand the immune repertoires of geese. After the high-throughput RNA sequencing of the transcriptome, it became one of the most convenient method to obtain the overall gene information. Here, we present the study about the immune-related genes and pathways in goose transcriptome. The results describe the genetic architecture of the goose PBLs transcriptome and further explore their relevant immune relevant genes.
We generated 211,198 unigenes for 69.5 MB total length of the goose PBLs transcriptome. The overall GC content of the transcriptome was calculated to be 47%, which closely resembles the percent GC content that was reported in a previous study [14]. The size distribution indicated that the length of the 91,069 unigenes was more than 1500 bp, which is much higher than that reported by other researchers [2,15] for the goose transcriptome. We also noticed that the mean length of unigenes was longer than that reported in other research [2,15]. We compared our unigenes against the NCBI Nr protein database, allowing further functional annotation and classification using GO, KOG and KEGG. This functional annotation provides expected information on biological function and biosynthesis pathways for assembled unigenes. We also found that the Nr database had maximum annotated unigenes and most of the aligned sequences matched with the chicken (17,238, 55.59%) sequence, suggesting that more genetic similarities exists between the goose (a waterfowl bird) and chickens (a domestic bird). Among the Nr blast hits, only 128 genes were matched to the goose itself, illustrating the limited number of the goose related genes currently available in the NCBI database. We also found that 26,985 (73.2%) of the unigenes identified in this study have high similarity to the NCBI blast result (Fig. 1E) and are covered by 80-100% in the length, which supports the validity of our transcriptome data.
The GO annotation shows that a large number of the unigenes are in the biological processes (72,844) category. As expected, many genes which are involved in the defense system of geese, including response to stimulus (7,075 unigenes), signaling (3,570), and immune system process (512), were found in this study. We identified a large number of unigenes that are involved in the immune system (2,757) in the KEGG pathway's organismal systems category to further elucidate the biological functions of these genes in geese. In our goose PBLs transcriptome data, we found that a large number of the unigenes were in the immune relevant pathways and were involved in well-recognized immune pathways, such as chemokine signaling pathway 348, B-cell receptor, T-cell receptor 134, Toll-like receptor signaling pathway 204, leukocyte transendothelial migration 244, antigen processing and presentation 74 (Fig. 4). The availability of these data would provide an abundant resource for understanding the pathways of the goose immune system. Additionally, among the 2,757 unigenes that participated in 15 immune system KEGG pathways, we focused on the 125 most important novel genes of the immune system of goose that were identified. We observed that 109 unigenes of the 125 unigenes described here, are novel genes in the goose that were first identified in this study.
Our transcriptome sequence revealed the presence of immune relevant genes in goose. There were no evidence existed, in that majority of the detected immune related genes in goose before performing this PBLs transcriptome analysis. Here, we are the first to identify several important genes of the immune pathways and to compare them with genes in the duck, chicken, turkey and zebra finch. The genomic information has been made available in the NCBI database to facilitate the detection of novel genes in the goose (Fig. 5/ Table 4). Except for the duck, the other avian species are different in both habitat and avian family from the goose. Using the NCBI database, we found that 104 of the 125 unigenes in the goose PBLs transcriptome were shared with the duck (a waterfowl like the goose) and 21 genes (including TLR13, CC26, IL8, IFR3, IFR7) were not shared. In the NCBI database, 8 genes (including CCL14, CCL26, IFR3, C4-1) were not shared in the chicken, 43 genes (including TAPBP, HSP70, HSP90A, TLR3, TLR13, LY96, C8G, BAFF) were not found in the turkey and 30 genes (including TAPBP, TLR13, TLR15, CCL14, CCL19, IL9R, IL23R, C2, MBL) were not found in zebra finch. A comparison of our sequencing results with the information available in the NCBI databases for four other species provides more genomic and transcriptomic information and can contribute to the study of the avian transcriptome.
Our ongoing studies on these genes may extend the list of variations in the immune genes of goose with other species. Here, we describe the 10 most likely putative genes of the innate, adaptive immune system and signaling pathways identified in the goose PBLs transcriptome, which were confirmed by PCR and further by comparison of their sequences to that of other species.
In the complement system, C1qA, C1qB and C1qC were cloned and further analyzed by comparison. According to the comparative analysis, goose C1qA, C1qB and C1qC all show the highest identity to the corresponding genes in the duck. The goose is most closely evolutionarily related to the duck and that was further confirmed by an evolutionary tree (S4 Fig.). In the tree, the evolutionary distance between different species are arranged consistent with time of emergence of each species. As C1qA, C1qB and C1qC belong to the same C1q family, they have a common evolutionary origin in the evolutionary tree. The molecule architecture and functional sites in the collagen-like domains of these three molecules are conserved among birds, reptiles and mammals. However, the receptor binding sites in the C1q domains varied considerably in comparison to other species. We also cloned complement component 8, gamma polypeptide (C8G), which is a constituent of the membrane attack complex. The C8G alignment results show conserved characteristics among the different species (S5 Fig.). With the exception of some functional studies [33], little is known about the avian toll-like receptor pathway, especially TLR3. Here, the goose TLR3 is shown to have conserved domains and sites with the human TLR3, indicating its evolutionary conservation. Overall, 5 genes of the innate immune system, including C1qA, C1qB, C1qC, C8G and TLR3, have been identified in the goose.
Studies have shown that goose BAFF is a conserved molecule in the adaptive immune system and that it is able to promote bursal B cell survival and proliferation in the goose [34]. Here, we further confirm the BAFF sequence in our goose cDNA library. BAFF is conserved among the avian species, but diverges from the mammalian gene. CD74, the MHCII invariant chain, play critical roles in MHC class II antigen processing by stabilizing peptide-free MHCII heterodimers and was also cloned in this study [10]. The sequence of the goose CD74 clone here is similar to the one submitted to NCBI, which indicates its functional and conservational importance. IL1RL1/ST2, an IL-1 family cytokine that can activate NF-κB and MAP kinases and drive production of TH2-associated cytokines from T helper type 2 cells [35], play important roles in both the innate and adaptive immune systems. The IL1RL1 gene of goose was also identified in our study. IL1RL1 has 3 Ig-like C2 domains and one TIR domain. Of the 5 mammalian disulfide bonds, 4 are found to be conserved at the Cys32-Cys83, Cys109-Cys145, Cys128-Cys175 and Cys228-Cys295 sites in the goose IL1RL1 (S6 Fig.). According to the alignment, the IL1RL1s in avian species share more sequence similarity with those in reptiles than with those in mammals. For instance, the positions of the N-Glycosylation sites are not well conserved between the goose and human IL1RL1s.
JAK-STAT pathway plays a central role in lots of biological processes of both innate and adaptive immunity. SOCS family proteins are part of a classical negative feedback system that regulates cytokine signal transduction. SOCS3 and SOCS1 are negative regulators of cytokines that signal through the JAK/STAT pathway. Here, SOCS1 and SOCS3 genes have been cloned from the goose and shown to have conserved structural and functional sites. The evolutionary tree indicates that SOCS1 and SOCS3 share a common origin (S7 Fig.). Over long evolutionary distances, the avian SOCS1s diverged after the split between the reptile, avian and mammalian lineage. In the evolutionary tree, the SOCS3s of different species are clustered together, indicating that little change occurred over the course of evolution.
The 125 immune related genes were further analyzed by gene interaction networks (STRING analysis). The results show us a similar immune response network as in other avian species, and it also confirms the potential functions of these genes in the goose.

Conclusion
The data described here provide the first PBLs transcriptome profile of the goose immune system. Among 211,198 unigenes, 2,757 unigenes of immune system, 17 immune related pathways and their unigenes, 125 important immune genes have been found in the goose EST and compared between the goose, duck, chicken, turkey and zebra finch. The 10 most important immune genes of the goose have been cloned and analyzed. This information will give us an overall landscape of the goose immune system and assist us in understanding the goose immune system. We believe that the availability of this annotated transcriptome will facilitate the isolation and characterization of the functional genes involved in different immune system pathways, as well as validate the molecular genetic approach to disclose the immune system of goose.  Table. Genes and specific primers used for PCR. Ten genes were selected based on their functions in innate, adaptive immune system and signaling pathways. Primer sequences were designed according to sequences from our transcriptome data of goose PBLs. (PDF)