Sequence , Expression and Phylogenetic Analysis of Immune Response Genes Related to Mastitis in Buffaloes

Increased expression of several acute phase cytokines, such as IL1, IL8 and TNF-α, have been positively correlated with the most severe clinical symptoms often associated with coliform or endotoxin-induced mastitis. Very little information is available on buffalo transcriptome. No information is available on mastitis related genes in buffaloes thus, the main aim of the present study was to analyze the buffalo transcriptome for extracting sequence of mastitis related genes (IL-1B, IL6, IL8 and IL-12B), the expression of these genes across various tissues using RNA-Seq, analyse the functional pathways and confer the phylogenetic relationship of these Interleukin genes with other species. IL1B revealed high expression in lungs while IL8 in mammary gland and IL12B in kidney respectively. The phylogenetic analysis revealed a clear homology between buffalo and cattle ILs. The results were confirmed by constructing gene and species tree. In species tree, Bos taurus was nearest to Bubalus bubalis followed by Ovis aries thereby grouping the whole Bovidae family together. The gene tree constructed with the help of Maximum likelihood methods clearly clustered IL1B and IL8 in one clade. This may be attributed to structural and functional relationship of IL8 induced after exposure to a variety of inflammatory stimuli including bacteria, oxidative stress, LPS, TNF and IL1B. Phylogenetic analysis of vertebrate IL genes provided insights into their patterns and process of gene evolution.


INTRODUCTION
Buffaloes are the major part of subsidiary economy of rural people in India.However, buffaloes like cattle are prone to the intra-mammary infections, which have a lot of economic impact on the farmer.Bovine mastitis is an important and a persistent factor causing economic losses; Mammary gland inflammation affects both the quantity and quality of milk yield.During mastitis, the milk changes in composition which impair coagulation, cheese yield and composition; leading to poor quality cheese.The elevated SCC was associated with the production of a cheese with high moisture content (Munro et al., 1984;Grandison and Ford, 1986;Politis and Ng-Kwai-Hang, 1988a, b, c;Barbano et al., 1991) in cattle.The effects of mastitis on milk yield and milk quality have also been reported for buffalo milk (Singh and Ludri, 2001;Cerón-Muñoz et al., 2002;Pasquini et al., 2003;Tripaldi et al., 2003;Singh and Bansal, 2004).Somatic Cell Count (SCC) is a measure that is widely used to assess mammary gland health (Smith, 2002).Milk SCC includes all types of cells: polymorphonuclear leukocytes (PMN), macrophages and lymphocytes.An increase in SCC is due to an increase in PMN.The primary function of PMN is ingestion and destruction of invading microorganisms, as well as secretion of inflammatory regulators (Kelly et al., 2000).
Interleukins are a group of cytokines that were first seen to be expressed by white blood cells (leukocytes).The function of the immune system depends in a large part on interleukins.Previous studies also showed that certain cytokines may play a critical role in the pathophysiology of mastitis.For example, increased expression of several acute phase cytokines, such as IL1, IL8 and TNF-α, were positively correlated with the most severe clinical symptoms often associated with coliform or endotoxin-induced mastitis (Hoeben et al., 2000;Nakajima et al., 1997;Shuster et al., 1996;Sordillo and Peel, 1992).Mastitis leads to production of a variety of bioactive molecules.Milk-derived cells from infected mammary glands contained increased amounts of mRNA for Interleukin (IL) IL-1B, IL6, IL8 and IL-12B (Jernej et al., 2008).All the interleukins belong to the CXC chemokine family that is produced by a wide range of cells, including lymphocytes, monocytes/macrophages, neutrophils, fibroblasts and vascular endothelial and epithelial cells, in response to inflammatory stimuli, such as viral and bacterial infections.
Regulation of gene expression is fundamental to link genotypes with phenotypes which ultimately drive biological processes.Since RNAs transcribed from noncoding portions of genomes are playing fundamental roles, genome-wide approaches have provided valuable insights into various aspects of transcriptome which is essential for interpreting the functional elements of the genome and revealing the molecular constituents of cells and tissues and also for understanding various development and disease aspects.
Recently, the development of novel highthroughput DNA sequencing technology has provided a new method for both mapping and quantifying genes.In RNA-Seq, all RNAs of a sample (or, more often, polyA + RNAs) are randomly fragmented, reverse transcribed, ligated to adapters and then these fragments are sequenced.Gene expression levels are estimated from the number of sequence reads deriving from each gene (Wang et al., 2009).The greater accuracy and coverage of the expressed transcriptome makes this method suitable for addressing global features of transcriptome.
Very little information on buffalo transcriptome (only few sequenced genes and ESTs are available in the database) coupled with no information on mastitis linked genes in buffaloes led us to analyze the buffalo transcriptome for sequencing of mastitis related genes (IL-1B, IL6, IL8 and IL-12B), expression of these genes across tissues using RNA-Seq, the annotation along functional pathways and confer the phylogenetic relationship of these Interleukin genes with other species to trace out the evolutionary history.

MATERIALS AND METHODS
The nine tissues samples of Bubalus bubalis were selected for the study.The tissues were-liver, muscle, heart, hypothalamus, kidney, spleen, mammary gland, testis and lungs.Transcriptome analysis was carried out using Next Generation Sequencing.RNA was isolated following the standard protocols of RNeasy Kit (Qiagen).The mRNA comprised only 1-3% of total RNA samples and was not readily detectable even with the most sensitive methods.Quantification of RNA was done using Agilent 2100 Bioanalyzer (Agilent, Foster city, USA) which provided ng RNA/μl values.The clear 28S and 18S rRNA bands were indicative of intact RNA.The Bioanalyzer 2100 was used for all the nine isolated samples of RNA.The RNA samples were processed further if the RNA Integrity Number (RIN) was found to be greater than or equal to 8.5, for RNA-Seq library preparation.

RNA-Seq library preparation and data generation:
Standard Illumina kit was utilised for RNA library preparation.Paired end protocol for generation of data was used.In addition to sequence information, both reads contain long range positional information, allowing for highly precise alignment of reads.The paired-end sequencing assay utilized a combination of cBot (or the Cluster Station) and the paired-end module followed by paired-end sequencing on the Genome Analyzer IIx.The unique paired-end sequencing protocol allowed us the length of the insert (200-300 bp), generating highly quality, align-able sequence data.A typical paired-end run could achieve 2×76 bp reads and up to 40-60 million reads of the transcriptome data for each of the nine tissue samples utilised in the present study.The image analysis, base calling and quality score calibration were processed using the Illumina Pipeline Software v1.4.1 according to the manufacturer's instructions.Reads were exported in the FASTQ format and used for further analysis.

RNA-Seq data analysis:
The RNA-Seq data analysis was carried out using CLC Genomics Workbench (version 4.0.2;CLCBio).The quality filtered and trimmed reads were aligned to annotated Bos taurus genes downloaded from Ensembl Genome Browser (http://www.ensembl.org/info/data/ftp/index.html).For the genes downloaded, the default RNA-Seq settings were used to rank all potential matches, with mismatch cost of 2, deletion cost of 3 and insertion cost of 3. The highest scoring matches that shared ≥80% similarity with the reference sequence across ≥50% of their length were included in the alignment.This permissive alignment ensured that even reads derived from highly mutated orthologs between cattle and buffalo were not discarded.We utilised RNA-Seq tool of CLC Genomics Workbench and mapped the reads of nine tissues from buffaloes.
Expression profile and SNP detection: RNA-Seq data generated was mapped for each of the 9 tissues of buffalo using Illumina GAIIx with cattle genes as reference (Ensemble ver 69).CLCbio ver 4.0.2(www.clcbio.com) was used for the Digital Gene Expression analyses of the buffalo transcriptome.The statistics utilized was RPKM (Reads per Kilobase exon Model per million mapped reads) values.The RPKM values were normalized for total exon-length and the total number of matches in an experiment, making it possible to compare different tissues.The resultant mapped files were then merged and utilized for heterozygous SNP calling.For SNP detection, CLC Genomics Workbench was used which utilizes Neighborhood Quality Standard (NQS) algorithm (Altshuler et al., 2000).The central base quality score of ≥20 and average surrounding base quality score of ≥15 were set to assess the quality of reads at positions for SNP detection.We utilised the criteria of depth of coverage of ten and the minor allele frequency of 2 out of 10 reads (minimum allele frequency of 20%) for the identification of SNPs (Fig. 1).

Extracting gene sequencing and mapping for gene ontology terms:
The sequences of the Interleukin genes were extracted using the facility of CLCBio.The Six frame translation of the gene sequences were carried out using the CLCBio workbench utility and the amino acid sequences were extracted.The proteinprotein interaction and co-expression of the four interleukin genes involved in mastitis were carried out using STRING web server version 9.0 (Szklarczyk et al., 2011).
Phylogenetic analysis: BIOEDIT version 7.1.3was used for sequence alignment.All the sites including gap, insertions, deletions, matches and mismatches were kept.
The gene and species tree were build using two Bayesian and likelihood methods i.e.Randomized Accelerated Maximum Likelihood-VI (RAxML-VI) (Stamatakis, 2006) using Topali ver 2 (Milne et al., 2009) and Bayesian MCMC analysis using Beast ver 1.7.1 (Drummond and Rambaut, 2007) software.Default parameters were used for selecting the best fitting model for the utilised sequences using Topali V2.0.The best model was selected based on AIC (Akaike information criterion) and BIC (Bayesian information criterion).The HKY (Hasegawa et al., 1985) model of nucleotide substitution was used in both the software in order to obtain comparable results.The Tracer was used to analyze the log files obtained by the Bayesian analysis using Beast software (Rambaut and Drummond, 2007).The XML files used as input were prepared using Beauti software.The WAG substitution model of sequence evolution for amino acids was used with an uncorrelated lognormal model of rate change.The other priors were kept as defaults.The final maximum credible tree was visualized using FIGTREE (Rambaut, 2006(Rambaut, -2008)).The species tree was built by alignment of the nucleotide sequences of the four Interleukins of all the species by Split Network Analysis using SplitsTree (Huson and David, 2006) and using HKY algorithm of BEAST software.
The present study was carried out in National Bureau of Animal Genetic Resources, Karnal under research project on Identification of QTLs for somatic cell count which is associated with mastitis.

Generation and assembly of expressed short reads:
Illumina sequencing generated short sequence reads of expressed sequences.cDNA libraries were made from  RNA samples prepared from 9 tissues of buffaloes.The mRNAs were sequenced with one lane for each tissue using Illumina GAIIx and it generated 41.6, 16.5, 21.7, 23.8, 28.1, 65.1, 27.3, 39.2 and 84.5 million 2 X 76-bp paired-end reads in Heart, Hypothalamus, Kidney, Liver, Lungs, Mammary glands, Muscles, Spleen and Testis respectively (Table 1).The sequence reads were aligned against annotated Bos taurus genes from Ensembl browser (http://www.ensembl.org/info/data/ ftp/index.html)version 69 as a reference.Among the genes mapped 5433 genes mapped in heart, while 4843 genes were mapped in hypothalamus, 8078 in kidney, 6043 in liver, 8517 in lungs, 2437 in mammary glands, 4621 in muscles, 7828 in spleen and 8707 in testis.Total genes mapped in each of the tissue is summarised in Table 1.
The Digital Gene Expression was measured in the terms of RPKM values.The expression of the four Interleukin genes (IL-1B, IL6, IL8, IL-12B) for 9 different tissues of buffaloes using RNA-Seq is present in Table 2.The IL genes revealed their differential expression in various tissues.All ILs except IL6 revealed expression in all or some of the nine tissues studied.IL1B was found to be expressing in hypothalamus, lungs, mammary glands and muscles while IL8 expressed only in mammary gland; IL12B was found to express in kidney and muscles.IL1B revealed high expression in lungs while IL8 in mammary gland and IL12B in kidney, respectively.

SNP identification:
As the sequences from buffalo transcriptome were aligned to form a consensus sequence which was aligned with Bos taurus annotated sequences downloaded from Ensemble genome browser version 69, the variation within buffalo (heterozygous SNPs) and variations from cattle were identified for IL1B, IL8 and IL12B gene.In IL1B gene, 34 variants were observed from cattle sequences, with 3 non synonymous changes viz (Phe56Leu, Ile119Val and Asn150Ser).The changes within buffalo were also observed-6 variants were identified however all changes were synonymous.Similarly in IL8 gene, 38 variations were observed from cattle with 6 non synonymous SNP's leading to changed amino acid viz.(Ile18Pro, Ala95Thr, Ala97Thr, Met100Thr, Leu11 0Pro and Ala111Asp).Single non-synonymous variation was also observed in buffalo.In IL12B gene, 40 variations were recorded from cattle with no changes in amino acid, while only one synonymous variation was observed in buffalo.The SNVs observed in buffalo have been summarised in Table 3.
All the interleukin were mapped on GOIDs using the web server g: Profiler to confirm the functional aspects.The GOIDs were analysed using REViGO web server for removal of redundant GO terms.In IL1B, the GOs associated with biological components (Fig. 2) are responsible for inflammatory response, immune response, immune process system, biological regulation and biological process.The molecular function related to binding, cytokine activity, receptor binding, protein binding activity while the IL8 gene mapped with GOs responsible for immune response and response to stimulus (biological process) given in Fig. 3; binding, protein binding and receptor binding (molecular function).In case of IL12B, the GOs associated with biological components (Fig. 4) are responsible for cellular metabolism, gene expression, biological process, macromolecular metabolism and primary metabolism while the molecular function were related to nucleic acid binding, molecular function and protein domain binding.
The interaction among the resultant protein sequences of Interleukin genes were tested using the web-server STRING (Szklarczyk et al., 2011).The IL genes revealed a close relation among themselves as given in Fig. 5 wherein the three genes viz; IL1B, IL6 and IL8 show interaction among themselves at protein level.The thicker is the line; stronger is the relation between the genes.Thus in case of bovines, IL1B and IL6 had a close relation among themselves in terms of occurrence and signaling pathways.The gene IL12B however did not reveal any interaction with other interleukins.We utilized the database and interaction/ coexpression profile of Mus musculus and Homo sapiens to compare the results which revealed similar results (http://string-db.org/).
All the IL genes however revealed no coexpression in Bos taurus, but in Homo sapiens and Mus musculus, certain degree of co-expression was observed between 1L6 and IL1B (Fig. 6) and very low levels of coexpression was observed between IL6 and IL8.This can be attributed to lack of information for coexpression analysis in Bos taurus whereas significant level of co-expression observed in Mus musculus and Homo sapiens may be due to the availability of information related to co-expression analysis in these model species.
To place (Bubalus bubalis) with respect to other livestock and mammalian species, we carried out phylogenetic analysis using sequences of four Interleukin genes to infer homology, compare evolutionary relationship of ILs of Bubalus bubalis with other species using Maximum likelihood and Bayesian methods.The Maximum likelihood tree constructed for IL genes of buffalo are given in Fig. 7.The two clades utilising RAxML method (Fig. 7) contained the IL1B and IL8 gene which were more closely related to each other which in turn are in close association to IL12B followed by IL6.We utilized Bayesian Markov Chain Monte Carlo (MCMC) techniques that incorporate rate heterogeneity between branches and allow estimation of divergence times even when the clock is violated (Thorne et al., 1998;Sanderson, 2003;Yang and Yoder, 2003).These methods provided better insights into the phylogenetic analysis utilizing IL gene sequence.The species tree was constructed using HKY nucleotide substitution model in Beast (Fig. 8) and Splits Tree 4 analysis using Neighbor Net algorithm using IL genes of buffalo and 11 other species viz., Bos taurus, Equus caballus, Canis familaris, Felis catus, Homo sapiens, Gallus gallus, Mus musculus, Ovis aries, Pan troglodytes, Rattus norvegicus and Sus scrofa.An avian sequence was included as the avian and mammalian lineages diverged about 300 million years ago, yet avian immune responses are broadly similar to those in mammals (Brownlie and Allan, 2011).The species tree constructed using Beast and Splits Tree 4 analysis (Fig. 9) gave similar results as given by the gene tree.
The split network approach (Fig. 9) was chosen as the tool to evaluate phylogenetic relationships between four IL gene sequences because it can represent incompatibilities between and within data sets (Huson and David, 2006).The inferred IL phylogenetic network was quite well resolved and tree -like.The networks encompassing the sequences of Bubalus bubalis and Bos taurus showed parallel edges, rather than single branches, suggesting potentially ambiguous signals in the data set.This might reflect a high degree of similarity among members of family Bovidae.The Fit index for this network was found to be 98.01%suggesting other distance based phylogenetic methods would provide similar trees.
In case of Interleukin 8, Bubalus bubalis formed a completely distinct group from other members of Bovidae in the species tree.This may be attributed due to the variations in the amino acid sequences of cattle and buffalo.Total of 18 variations in amino acid sequences of cattle and buffalo were observed with 9 additions of amino acid in buffalo.This makes buffalo a completely distinct group compared to other members of family Bovidae as depicted in the species tree (Fig. 8  and 9).
Sequences of highly diverged species were not included as their inclusion decreased the number of aligned amino acid sites available for analysis.The species tree constructed using Splits Tree 4 analysis gave similar results as given by the gene tree.

DISCUSSION
The interleukins revealed different levels of expression in the various tissues studied.IL1B revealed expression in Hypothalamus, Lungs, Mammary gland and muscles.High degree of expression was observed in Lungs.Similar reports have been published by Herman et al. (2010) on expression of IL1B and IL-1 receptor genes in Hypothalamus of anoestrous ewes.The treatment of these animals reflected increase in immune/inflammatory responses of IL1B expression in hypothalamus.IL1B expression has been reported to be observed in bronchial and smooth muscles in humans (Human GeneAtlas) (www.genatlas.org/).Gene expression profiling of mammary gland development reveals putative roles of IL1B in death receptors and immune mediators in post lactational regression (Richard et al., 2003).Initial burst of cytokine and cytokine receptor expression was observed within 12hrs of forced weaning which included the acute phase mediators LIF receptors and pro-inflammatory mediators like IL1B.Similarly, IL8 was found to be highly expressed in mammary gland of Bubalus bubalis.This may be due to the expression of interleukins in mammary gland in case of infection.The expression of IL8 was seen in mammary gland endothelial and epithelial cells following experimental mastitis infection in Holstein cows with E. coli as reported by McClenahan et al. (2006).High levels of expression of IL12B were observed in kidney of Bubalus bubalis and moderate expression was observed in muscles.Similar results have been obtained in kidney of Mus musculus and muscles of Rattus norvegicus (GeneCards, Weizmann Institute of Science).
Since there were no reports for structural, evolutionary and phylogenetic relationship of interleukin gene in buffaloes, phylogenetic networks have an important role to play in the reconstruction of evolutionary history.Implicit models such as split networks are very useful for exploring and visualizing the different signals in a data set.The split network methods describe a comprehensive statistical framework for phylogenetic analysis with the representation of splits for a more accurate representation of the data.We carried out split network analysis for Interleukin genes of Bubalus bubalis, the sequence generated on buffaloes were compared to other members of the family Bovidae.The phylogenetic analysis revealed a clear homology between buffalo and cattle ILs.The results were confirmed by constructing gene and species tree.In species tree, Bos taurus is much closer to Bubalus bubalis followed by Ovis aries thereby grouping the whole Bovidae family together.The gene tree constructed with the help of Maximum likelihood methods clearly clusters IL1B and IL8 in one clade.IL8 has been shown to participate in leukocyte recruitment during inflammation, supporting its role as a chemokine (Chang et al., 2010).IL8 is released from many cell types, including T cells, monocytes and endothelial cells (Steinkasserer et al., 1992;Gracie et al., 2003;Park et al., 2000).Its expression is induced after exposure to a variety of inflammatory stimuli including bacteria, oxidative stress, LPS, TNF and IL1B (Shono et al., 1996;Strieter et al., 1989;Yoshimura et al., 1987;Lakshminarayanan et al., 1998).Thus, IL1B and IL8 can be clustered together using Maximum likelihood approach.Both IL6 and IL12B are pro-inflammatory cytokines and their structural relationship is well established (Lindsay and Vignali, 2011).The gene tree is in congruence with the species tree which also displays the same results and thereby clustering the Interleukins together.

CONCLUSION
The gene sequences of the interleukin genes have been deciphered and the level of their expression in various tissues was examined in buffaloes.The variation in gene sequences and their resultant effect on amino acid sequence change (synonymous and nonsynonymous) have been deciphered.The phylogeny of buffalo for these four immune response genes with respect to other livestock and mammalian species was carried out.The gene tree and species tree analysis revealed a clear homology between members of Bovidae which cluster together except for IL8 where Bubalus forms an altogether different clade from rest of the members of family Bovidae due to the difference in amino acid sequences.IL1B and IL8 clustered together in buffaloes and this has been attributed to the expression of IL8 is induced after expression of IL1B.The inferences drawn from the above study can add to the evolutionary history of the interleukin genes in buffalo.Since the genes are related to mastitis the variations identified can act as markers for various epidemiological studies as well as diversity analysis in buffaloes.Yang, Z. and A.D. Yoder, 2003

Fig. 1 :
Fig. 1: Screenshot of CLC Genomics Workbench software showing SNP detection with Bos taurus as reference SNPs for IL-1B, IL6, IL8 and IL12B were also identified.The Interleukin genes were mapped with the GOIDs.The gene names were mapped using webserver g:Profiler (http://biit.cs.ut.ee/gprofiler/) for classification into Biological process, Cellular component and Molecular function.REViGO web server (http://revigo.irb.hr/) was utilised for summarizing the long lists of Gene Ontology terms associated with these genes for removal of redundant GO terms.The remaining terms were visualized in semantic similarity-based scatterplots, interactive graphs and tag clouds.Six frame translation of the gene sequences were carried out using the CLCBio workbench utility and the amino acid sequences were extracted.The proteinprotein interaction and co-expression of the four interleukin genes involved in mastitis were carried out using STRING web server version 9.0(Szklarczyk et al., 2011).
Fig. 2: Scatterplot (A) and TreeMap (B) graphs of GOs mapped to biological process in IL1B Fig. 3: Scatterplot (A) and TreeMap (B) graphs of GOs mapped to biological function in IL8

Table 1 :
Summary of genes mapped and number of total reads utilised for nine tissues

Table 3 :
SNVs/InDels (Insertion-Deletion) and functional annotation of genes in buffaloes Functional annotation of genes - . Comparison of likelihood and Bayesian methods for estimating divergence times using multiple gene loci and calibration points, with application to a radiation of cute-looking mouse lemur species.Syst.Biol., 52: 705-716.Yoshimura, T., K. Matsushima, S. Tanaka, E.A. Robinson, E. Appella et al., 1987.Purification of a human monocyte-derived neutrophil chemotactic factor that has peptide sequence similarity to other host defense cytokines.Proc.Natl.Acad.Sci.USA., 84: 9233-9237.