Characterization of a newly developed chicken 44K Agilent microarray

Background The development of microarray technology has greatly enhanced our ability to evaluate gene expression. In theory, the expression of all genes in a given organism can be monitored simultaneously. Sequencing of the chicken genome has provided the crucial information for the design of a comprehensive chicken transcriptome microarray. A long oligonucleotide microarray has been manually curated and designed by our group and manufactured using Agilent inkjet technology. This provides a flexible and powerful platform with high sensitivity and specificity for gene expression studies. Results A chicken 60-mer oligonucleotide microarray consisting of 42,034 features, including the entire Marek's disease virus, two avian influenza viruses (H5N2 and H5N3), and 150 chicken microRNAs, has been designed and tested. In a validation study, total RNA isolated from four major chicken tissues, cecal tonsil (C), ileum (I), liver (L), and spleen (S), was used for comparative hybridizations. More than 95% of spots had a high signal-to-noise ratio (SNR > 10). There were 2886, 2660, 358, 3208, 3355, and 3710 genes differentially expressed between liver and spleen, spleen and cecal tonsil, cecal tonsil and ileum, liver and cecal tonsil, liver and ileum, and spleen and ileum, respectively (P < 10^-7). A number of tissue-selective genes were identified for cecal tonsil, ileum, liver, and spleen (95, 71, 535, and 108, respectively; P < 10^-7). The data also revealed that the antimicrobial peptides GAL1, GAL2, GAL6 and GAL7 were highly expressed in the spleen compared to the other tissues tested. Conclusion A chicken 60-mer oligonucleotide 44K microarray was designed and validated in a comprehensive survey of gene expression in diverse tissues. The results of these tissue expression analyses have demonstrated that this microarray has high specificity and sensitivity, and will be a useful tool for chicken functional genomics. Novel data on the expression of putative tissue-specific genes and antimicrobial peptides are highlighted as part of this comprehensive microarray validation study. The information for accessing and ordering this 44K chicken array can be found at


Background
The chicken, being the first farm animal with a completely sequenced genome, has become an important animal model in the fields of evolution, development, immunology, oncology, cell biology, virology, and genetics [1,2]. Candidate genes, QTL, and molecular markers have been widely utilized to reveal the genetic basis of economically important traits in chickens [3][4][5]. There are also many new genetic and bioinformatics resources available that are based upon chicken genome information, including genetic and physical maps [6], EST databases [7], and SNP maps [1,8]. Global gene expression profiling will provide a complementary tool improving our ability to study regulation of complex and economically important traits in chickens.
The development of high-throughput microarrays has accelerated the study of gene expression by interrogating thousands of genes simultaneously [9][10][11]. Microarray technologies provide an important tool to infer gene networks and to identify highly conserved genetic pathways in plants and animals. There have been many important studies contributing to gene expression profiling in agricultural animals including pigs [12,13], rabbits [14], and cattle [15,16]. Several chicken cDNA or oligonucleotide probe (oligo) arrays have also been developed and utilized in gene expression studies. These arrays include a 3,011 lymphocyte array [17], a 3,072 intestinal array [18], an 11K heart-specific array [19], a 14,718 macrophage-specific array [20], a 13K cDNA transcriptome array [21], a 5K immune-related array [21,22], a 20K long oligo chicken genome array [23], and a 33K Affymetrix chicken genome array [24].
Short and long oligo arrays have several advantages over cDNA arrays in terms of specificity, sensitivity, and reproducibility [25]. Both microarray technologies can provide comprehensive and reliable data for global expression analyses. However, oligos are more uniform in concentration and annealing temperature, and are more gene-specific, flexible, and economical. Long oligos can provide increased signal intensity compared to short ones [26,27]. Long oligo arrays generated by Agilent Technologies may be able to detect down to a single transcript per cell [25]. This 60-mer 44K chicken whole genome custom array, which was developed by our group and manufactured by Agilent Technologies, will provide a comprehensive and powerful functional genomics tool for the agricultural community.

Genes selected on the array
A total of 42,034 probes were designed based on the whole chicken genome sequence including autosomes, sex chromosomes, unlocalized chromosomes (i.e. E22C19W28, E26C13 and E50C23), and mitochondria (Figure 1), plus 1264 positive control features and 153 negative control features. Chicken chromosomes range from 0.15 Mb to 188.2 Mb [6]. To calculate the probe density (number of probes per Mb) on each chromosome, the number of probes targeted to each chromosome was divided by the length of the chromosome. The probe density ranged from 28 probes per Mb (Chr. 16) to 445 probes per Mb (Chr. 2), with a mean value of 76. This array also included probes designed from 150 chicken microRNAs, 43 Marek's disease virus genes, and 20 avian influenza virus genes (10 H5N2 and 10 H5N3 genes).
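The probe density calculation described above is a simple ratio, and it can be sketched as follows. The function name and the per-chromosome probe counts are hypothetical and chosen only for illustration; the two chromosome lengths are the extremes quoted in the text.

```python
def probe_density(n_probes, length_mb):
    """Number of probes per megabase for one chromosome."""
    if length_mb <= 0:
        raise ValueError("chromosome length must be positive")
    return n_probes / length_mb

# Hypothetical probe counts (not taken from the array design);
# lengths are the largest and smallest chromosome sizes quoted in the text.
chromosomes = {
    "example_large": (9410, 188.2),
    "example_small": (30, 0.15),
}

densities = {name: probe_density(n, mb) for name, (n, mb) in chromosomes.items()}
mean_density = sum(densities.values()) / len(densities)
```

Note that a mean computed this way weights every chromosome equally, regardless of its length.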

Array quality
The signal-to-noise ratio (SNR) for each element was calculated as the median intensity minus the median background, divided by the standard deviation of the background [28]. The percentage of high quality spots (SNR > 10) was calculated as the number of high quality spots divided by the total number of spots on the array. For all 24 arrays, the average percentage of high quality spots was determined to be 96.55 ± 4.89%.
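As an illustration of the SNR definition above, the following minimal sketch computes the per-spot SNR and the percentage of high quality spots. The function names and the example spot values are hypothetical; only the formula and the SNR > 10 threshold come from the text.

```python
def spot_snr(median_signal, median_background, background_sd):
    """SNR = (median intensity - median background) / SD of background."""
    return (median_signal - median_background) / background_sd

def high_quality_percentage(spots, threshold=10.0):
    """Percentage of spots whose SNR exceeds the threshold."""
    snrs = [spot_snr(*spot) for spot in spots]
    n_high = sum(1 for value in snrs if value > threshold)
    return 100.0 * n_high / len(snrs)

# Hypothetical spots: (median signal, median background, background SD).
spots = [
    (1200.0, 50.0, 10.0),
    (300.0, 60.0, 12.0),
    (90.0, 55.0, 11.0),
    (5000.0, 48.0, 9.0),
]
pct = high_quality_percentage(spots)
```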
To evaluate the array quality, two comparisons were carried out: (1) two biological replicates from the same tissue labelled with the same dye and (2) the same samples labelled with Cy5 and Cy3. The correlation coefficients of signal intensities between the two biological replicates and between the two different dyes compared among the same samples (dye swap) were calculated by JMP 5.5 (SAS Institute, Cary, NC) (Figure 2). The correlation coefficients between two biological replicates of cecal tonsil labelled with Cy5 or Cy3 were 0.99 and 1.00, respectively. The regression lines between two biological replicates of cecal tonsil labelled with Cy5 or Cy3 were y = 0.9779x + 0.0057 (R^2 = 0.99) and y = 0.9778x (R^2 = 0.99), respectively (Figures 2A, B). Dye swaps were utilized throughout this study in order to avoid dye bias. The correlation coefficient and regression line between cecal tonsils labelled with Cy5 and Cy3 were 1.00 and y = 0.99x (R^2 = 0.98), respectively (Figure 2C).
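The replicate and dye-swap comparisons above rest on a Pearson correlation coefficient and a least-squares regression line. The study used JMP for this; the sketch below is only a plain-Python equivalent of those two statistics. The function name is hypothetical, and the text does not state whether intensities were log-transformed first, so the sketch accepts values on whatever scale is supplied.

```python
import math

def pearson_and_fit(x, y):
    """Return (r, slope, intercept) for paired signal intensities."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)      # Pearson correlation coefficient
    slope = sxy / sxx                   # least-squares slope of y on x
    intercept = mean_y - slope * mean_x
    return r, slope, intercept

# Hypothetical paired intensities from two replicate hybridizations.
replicate_1 = [1.0, 2.0, 3.0, 4.0]
replicate_2 = [3.0, 5.0, 7.0, 9.0]
r, slope, intercept = pearson_and_fit(replicate_1, replicate_2)
r_squared = r ** 2
```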

Gene expression in different tissues
Before normalization, the signal intensities of each feature were filtered against the negative controls on the array. The ratio of the signal intensity of each gene to the average signal intensity of the negative control elements was calculated. An arbitrary ratio of 1.5 was used to determine whether a particular gene was expressed in a given tissue. It was found that 43.83% of all genes on the array were expressed in all four tissues. Looking at each tissue individually, 71.11%, 80.05%, 75.37%, and 80.22% of the genes on the array were expressed in cecal tonsil, ileum, liver, and spleen, respectively.
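The negative-control filter described above can be sketched as follows. The function name, gene labels, and intensities are hypothetical. Treating a ratio exactly equal to 1.5 as expressed is an assumption, since the text does not say whether the threshold is inclusive.

```python
def expressed_flags(signals, negative_controls, min_ratio=1.5):
    """Flag each gene as expressed if signal / mean(negative controls) >= min_ratio."""
    baseline = sum(negative_controls) / len(negative_controls)
    return {gene: (value / baseline) >= min_ratio for gene, value in signals.items()}

# Hypothetical intensities for one tissue on one array.
negative_controls = [40.0, 60.0]                     # mean = 50
signals = {"geneA": 100.0, "geneB": 60.0, "geneC": 75.0}

flags = expressed_flags(signals, negative_controls)
percent_expressed = 100.0 * sum(flags.values()) / len(flags)
```

A gene expressed in all four tissues would then be one whose flag is true in each of the four per-tissue results.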
Figure 2. The correlation of signal intensities between biological replicates and dye swaps. A. The correlation of signal intensities between two individual cecal tonsil (C) samples labeled with Cy5, Y = 0.98X + 0.0057, R^2 = 0.99. B. The correlation of signal intensities between two individual cecal tonsil (C) samples labeled with Cy3, Y = 0.98X, R^2 = 0.99. C. The correlation of signal intensities between the same cecal tonsil (C) sample labeled with Cy3 and Cy5, Y = 0.99X, R^2 = 0.98.
There were three pairs of tissue gene expression comparisons performed for each tissue as part of this study. These comparisons were used to obtain a list of genes that are specifically expressed in each tissue (Figure 3). In summary, there were 286, 489, 4102, and 3929 genes significantly expressed in cecal tonsil, ileum, liver, and spleen (P < 10^-3), respectively; 167, 201, 1627, and 1141 genes at a cut-off P value of 10^-5; and 156, 88, 737, and 378 genes at P < 10^-7.
Fold change is an indication of relative gene expression differences. Genes that are expressed at a higher level in one tissue compared to all other tissues are considered "tissue-selective" genes [29]. Here, genes that are significantly expressed with at least two-fold higher expression in one tissue when compared to each of the other three tissues are considered to be selectively expressed in that tissue. The selectively expressed genes at three different cut-off P values (10^-3, 10^-5, and 10^-7) as determined in this study are listed in Table 1. There were 120, 153, 857, and 541 genes selectively expressed in the cecal tonsil, ileum, liver and spleen at P < 10^-3, respectively. There were 103, 115, 736, and 291 selective genes, respectively, at a cut-off P value of 10^-5, and 95, 71, 535, and 108 selective genes at a cut-off P value of 10^-7. The selectively expressed genes at P < 10^-7 are listed in the additional data files 1, 2, 3, 4.
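The tissue-selective rule above (significant and at least two-fold higher than each of the other three tissues) can be expressed as a small predicate. This is a sketch under stated assumptions: the function name and example values are hypothetical, and fold changes are assumed to be supplied as log2 ratios of the tissue of interest over the other tissue.

```python
import math

def is_tissue_selective(comparisons, p_cutoff=1e-7, min_fold=2.0):
    """comparisons: three (p_value, log2_fold_change) pairs, one per other tissue."""
    if len(comparisons) != 3:
        raise ValueError("expected comparisons against the three other tissues")
    min_log2 = math.log2(min_fold)  # two-fold corresponds to log2 ratio of 1
    return all(p < p_cutoff and log2fc >= min_log2 for p, log2fc in comparisons)

# Hypothetical gene, liver versus cecal tonsil, ileum, and spleen.
liver_gene = [(1e-9, 3.2), (5e-10, 2.8), (2e-8, 1.4)]
# Hypothetical gene that fails the fold-change criterion in one comparison.
borderline_gene = [(1e-9, 3.2), (5e-10, 0.6), (2e-8, 1.4)]

selective = is_tissue_selective(liver_gene)
not_selective = is_tissue_selective(borderline_gene)
```

Relaxing `p_cutoff` to 1e-5 or 1e-3 corresponds to the other two cut-offs reported in Table 1.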

Gene ontology
Functional category enrichment evaluation based on the Gene Ontology (GO) was performed on the differentially expressed genes for each tissue comparison (Figures 4, 5, 6). There are three components to a GO annotation: cellular component (CC), molecular function (MF), and biological process (BP). Biological process is arguably the most relevant aspect of GO for this study; therefore, only functional clusters belonging to this component are presented. Genes induced in liver relative to each of the other three tissues showed GO BP enrichments associated with cellular biosynthesis, cellular lipid metabolism, coagulation, hemostasis, metabolism, nitrogen compound metabolism, and physical process (Figure 4A). GO BP enrichment analysis of repressed liver genes for each of the three comparisons identified the categories of actin cytoskeleton organization and biogenesis, cell differentiation, cell organization and biogenesis, and development. Many additional functional categories were significantly enriched among the comparatively repressed genes when only the comparisons between liver and spleen and between liver and cecal tonsil were considered. These were cellular physiological processes, primary metabolism, macromolecule biosynthesis, macromolecule metabolism, development, protein biosynthesis, protein metabolism, and regulation of cellular process (Figure 4B). GO BP enrichments of induced genes in the spleen included biopolymer metabolism; nucleobase, nucleoside, nucleotide and nucleic acid metabolism; physiological process; and primary metabolism (Figures 4B, 5). Induced genes in comparisons of cecal tonsil with liver and ileum showed functional enrichment primarily categorized as cell death (Figures 4B, 5). Comparisons of repressed genes in cecal tonsil with both spleen and liver showed enrichments associated with physiological process and response to stress (Figures 4A, 5). In the functional comparisons of induced genes from ileum with both spleen and liver, there were enrichments associated with development (Figures 4B, 6); however, the repressed genes in ileum when compared to spleen and liver showed functional enrichment of cellular biosynthesis, physiological process, and protein biosynthesis (Figures 4A, 5).
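The text does not state which statistical test was used for the GO enrichment evaluation. As an assumption, the sketch below uses a one-sided hypergeometric test, which is a common choice for this kind of analysis, together with the percentage plotted on the Y-axis of Figure 4 (genes in a GO term divided by all up or down regulated genes in the comparison). All function names and counts are hypothetical.

```python
from math import comb

def go_term_percentage(genes_in_term, regulated_genes):
    """Percentage of regulated genes annotated to one GO term."""
    return 100.0 * genes_in_term / regulated_genes

def hypergeom_enrichment_p(population, annotated, drawn, observed):
    """P(X >= observed) when `drawn` genes are sampled without replacement
    from `population` genes, `annotated` of which carry the GO term."""
    upper = min(annotated, drawn)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(observed, upper + 1)
    )
    return favourable / comb(population, drawn)

# Hypothetical counts: 10 genes on the array, 5 annotated to the term,
# 5 differentially expressed, all 5 of which carry the term.
p_value = hypergeom_enrichment_p(10, 5, 5, 5)
percentage = go_term_percentage(5, 5)
```

In practice the P values from many GO terms would also need a multiple-testing correction.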

Quantitative real time PCR
To validate the microarray results, quantitative real time PCR (qRT-PCR) assays were performed on the same RNA samples used for the microarrays. A total of 23 genes were selected for these verifications. These genes included induced and repressed genes that were significantly and non-significantly expressed (Table 2). The relative signal intensities of those genes selected for qRT-PCR ranged from low to high (10 to 65535). Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) was used as the normalization standard.
Figure 3. Venn diagram showing the number of specifically expressed genes in each tissue.
Figure 4. Significantly enriched Gene Ontology (GO) terms for Biological Process classification of the differentially expressed genes. A. Up regulated genes between liver and cecal tonsil, liver and ileum, and liver and spleen. B. Down regulated genes between liver and cecal tonsil, liver and ileum, and liver and spleen. "nucleobase, nucleoside, nucleotide and nucleic acid ..." stands for nucleobase, nucleoside, nucleotide and nucleic acid metabolism. The percentage shown on the Y-axis was calculated as the genes in each GO term divided by all up regulated or down regulated genes in each comparison.
The coefficient of variation between the replicate qRT-PCR reactions ranged from 0.1% to 2%. For genes with P < 5 × 10^-4 in the microarray results, 95.5% of those tested by qRT-PCR were also differentially expressed (P < 0.05); for genes with 5 × 10^-4 < P < 0.05 in the microarray, only 16.7% were significantly differentially expressed (P < 0.05) by qRT-PCR; and none of the genes with P > 0.05 were differentially expressed by qRT-PCR. In terms of the direction of regulation in each qRT-PCR comparison, the microarray results were always consistent with qRT-PCR for genes with P < 0.0001, consistent for only 75% of genes with 0.0001 < P < 0.05, and consistent for 78.57% of genes with P > 0.05 (Tables 3 and 4). The fold changes (log2 ratio) of each gene for the six comparisons are presented in Tables 3 and 4. Most of the fold changes in qRT-PCR were higher than those seen in the microarray comparisons.
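The text states that GAPDH was the normalization standard but does not give the quantification formula. The sketch below assumes the widely used comparative Ct approach with 100% amplification efficiency, which yields a log2 fold change directly comparable to the log2 ratios in Tables 3 and 4, and also shows the coefficient of variation between replicate reactions. Function names and Ct values are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """CV in percent between replicate reactions (sample standard deviation)."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

def log2_fold_change(ct_gene_a, ct_gapdh_a, ct_gene_b, ct_gapdh_b):
    """log2(tissue A / tissue B), assuming one Ct cycle equals one doubling."""
    delta_a = ct_gene_a - ct_gapdh_a    # GAPDH-normalized Ct in tissue A
    delta_b = ct_gene_b - ct_gapdh_b    # GAPDH-normalized Ct in tissue B
    return delta_b - delta_a            # lower Ct means higher expression

# Hypothetical Ct values for one gene in two tissues.
fold_change = log2_fold_change(22.0, 18.0, 25.0, 18.0)
cv = coefficient_of_variation([19.8, 20.0, 20.2])
```

A positive value means the gene is up regulated in tissue A relative to tissue B, matching the sign convention of the tables.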

Microarray performance
Three different types of microarrays have been widely utilized in genome research: cDNA (long strands of amplified cDNA sequences), short oligonucleotide (25-30 nt), and long oligonucleotide (50-80 nt). Several studies have compared the performance of different platforms [10,13,31-34]. Annotation and identity of the commercial oligonucleotides are reliable and the probe performance is excellent [32]. Commercial microarrays can provide higher precision than homemade microarrays [33]. This custom long-oligo array was generated by the Agilent SurePrint ink-jet technology, which also provides a flexible platform for revising and updating oligonucleotide probes in the array without additional cost [25,35]. Only a small amount of RNA is needed for labelling (50 ng to 5 μg of total RNA or 10-100 ng of poly(A)+ RNA) [35], compared to at least 20-30 μg of total RNA for a cDNA array. This is especially important for applications that generate limited amounts of RNA, such as laser-capture.
Figure: Significantly enriched Gene Ontology (GO) terms for Biological Process classification of the differentially up regulated genes between cecal tonsil and ileum, spleen and cecal tonsil, spleen and ileum.
The chicken, as a major food animal, plays a key role in nutrition and food safety for human health, and is a model organism in developmental biology and for disease research including virology, oncology, and immunology [36]. Several chicken whole genome microarrays exist, as noted in the introduction. The currently described 44K long oligonucleotide (60-mer) microarray has shown overall high array quality and specificity compared to cDNA and 25-mer oligo arrays [37]. In addition, the 4 × 44K platform places four independent arrays on one slide, which is more cost effective and can also reduce variation among the arrays within a slide. The design of this array was based on expressed sequences selected by walking over the chicken genome sequences in the UCSC genome browser. This manual approach allowed us to maximize genome coverage and minimize gene redundancy.
High background levels in an array platform can obscure the signal from low-expressed genes and impede accurate quantification. The magnitude of the SNR affects the sensitivity of the microarray, and a higher SNR indicates higher sensitivity and lower background. In general, SNR > 3 has been used as the lower-bound threshold for spot detection in current microarray studies [21], and SNR > 10 indicates high quality spots [28]. More than 95% of the spots on this array had SNR > 10, compared to 86.3 to 88.9% of spots with SNR > 3 for the chicken cDNA array [21], demonstrating the high sensitivity of the current array. The average SNR of the current microarray was 921.93, which was much higher than the SNR of most cDNA array platforms (35.1 to 38.3). This will promote sufficient signal generation for the detection of even low copy genes.
Quantitative real time PCR has become the gold standard for measuring gene expression and is generally used to validate microarray results [38]. At the criterion of P < 5 × 10^-4 in the microarray analysis, false positives could be effectively controlled (95.5% consistency between microarray and qRT-PCR). For the 4.5% of inconsistent cases, large variations were observed among the four biological replicates within each tissue using the more sensitive qRT-PCR method, which caused higher P values. On the other hand, the qRT-PCR results demonstrated that type II errors (false negatives) can be controlled, given a certain cut-off P value from the microarray analysis (100% true negatives, given P > 0.05). These results indicated that microarray analyses from the current array were statistically reliable and accurate.

Genes on the microarray
This whole genome 44K microarray consists of probes designed from all potential genes and was based on the February 2004 chicken (Gallus gallus) v1.0 draft assembly. The current array design includes all of the available (150) chicken microRNAs from miRBase 8.1 [39,40], all known Marek's disease virus transcripts, and the transcripts of two avian influenza viruses (H5N2 and H5N3). This array platform will provide a unique opportunity to study host-pathogen interaction using the same array simultaneously. This is important as we currently face the potential emergence of an avian influenza virus epidemic. A second version of this array, updated to the May 2006 chicken (Gallus gallus) v2.1 draft assembly, is now available.
A strict statistical criterion has been applied in the current analysis. Several thousand genes were differentially expressed in pairwise tissue comparisons even at P < 10^-7. Because more than 40 thousand genes were analyzed in this microarray experiment, it is important to control the proportion of false positives [41]. The false discovery rate (FDR) based on P values is the expected proportion of true null hypotheses rejected in relation to the total number of null hypotheses rejected [42]. FDR is a more convenient and natural scale than the P-value scale, and it gives the probability that a gene declared significant is a false positive [43]. In this study, the FDRs were less than 5% for a P value of 10^-7, which demonstrated the reliability of the results of the current microarray experiment. A similar FDR was observed in gene expression profiling between different tissues using a long oligo swine array [12].
Figure: Significantly enriched Gene Ontology (GO) terms for Biological Process classification of the differentially down regulated genes between cecal tonsil and ileum, spleen and cecal tonsil, spleen and ileum.
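The text does not say which FDR procedure was applied. As an illustration only, the sketch below implements the Benjamini-Hochberg step-up adjustment, one standard way to obtain FDR-adjusted P values from a list of raw P values. The function name and the example P values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw P values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = bh_adjust(raw)
significant = [q <= 0.05 for q in adjusted]
```

A gene is then declared significant at a chosen FDR level, for example 5%, when its adjusted value does not exceed that level.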
Gene expression profiles of different normal tissues provide information about the biological function of the tissue and are expected to be conserved during evolution. Liver, spleen, and ileum have been widely utilized in gene expression profiling studies in human [29,44-46] and swine [12,47]. Some gene ontology terms enriched in the comparison between spleen and ileum were common to human [44] and chicken, such as protein biosynthesis, energy pathways, and immune response. However, some enriched terms were distinct between human and chicken, including cytochrome C oxidase activity in human, and cell death, development, M phase, macromolecule metabolism, and physiological process in chicken. For the comparison of liver and spleen, energy pathways, main pathways of carbohydrate metabolism, and fatty acid oxidation were enriched in human [44], while generation of precursor metabolites and energy, cellular carbohydrate metabolism, cellular lipid metabolism, tricarboxylic acid cycle, and organic acid metabolism were enriched in chickens (Figures 4A, B).
The spleen is one of the major immune organs. Many immune-related genes were more highly expressed in spleen than in the other three tissues in chickens. Similar results have been observed using northern blot hybridization [49]; moreover, it was reported that immune response genes were selectively expressed in human spleen [44] and porcine small intestine [12]. The ileum is one of the more important tissues in bacterial pathogenesis studies in agricultural animals. Genes related to interaction between organisms and viral life cycle were specifically expressed in porcine ileum cDNA libraries [50]. In chickens, class II histocompatibility antigen, B-L beta chain and C7 were found to be ileum-selective (see additional file 2). No ileum-selective genes were available in human from the previous studies. The conserved gene expression profiles in tissue comparisons among species have provided a solid basis for comparative genomics studies. The tissue-selective genes could potentially be used as markers for the origin of pathogens, such as gut-related pathogens.
Perhaps one of the most important and interesting findings in the study was in relation to antimicrobial peptides (AMPs). AMPs are essential for the innate immune response in plants, flies, mammals, and chickens. There are two major families of AMPs: defensins and cathelicidins. Fourteen β-defensins, known as gallinacins (GAL), and cathelicidin have been described in chickens [51][52][53].
In the present study, GAL1, GAL2 and GAL6-7 showed strong comparative induction in spleen and weakest expression in the ileum. Macrophage receptor with collagenous structure (MARCO) mediates the binding, ingestion, and clearance of inhaled particles and bacteria by alveolar macrophages [54]. In normal mice, MARCO is expressed only on the marginal zone macrophages of the spleen and the macrophages of the medullary cord in lymph nodes [55]. The current study corroborates this, as we also found that MARCO was highly expressed in spleen compared to the other tissues.
To our knowledge, this is the first study to characterize tissue expression in chickens using a whole genome array. A total of four tissues were selected for this study. Two of these (liver and spleen) are complete organs, which play significant roles in many sophisticated biological functions of the animals. The liver is responsible for lipid, amino acid, and carbohydrate metabolism, while the spleen is an essential part of immune function in animals. The other two tissues (ileum and cecal tonsil) may have less complicated functions than liver and spleen. The GO analysis of global gene expression profiling among these four tissues supported the notion that more clusters of genes would be significantly enriched in the comparisons of organs (liver and spleen) against tissues such as ileum and cecal tonsil (Figures 4A, 4B, 5). The majority of functional enrichments associated with gene regulation in the liver comparisons were consistent with the roles of the liver [56]. In the spleen, many immune-related clusters (cell death, apoptosis, response to stimulus, etc.) were enriched. In summary, the results above demonstrated that this newly developed chicken 44K whole genome array is a powerful genomic tool to investigate different biological processes in chickens.

Conclusion
We have characterized a newly developed chicken 44K whole genome oligonucleotide microarray using four major tissues. This microarray in theory consists of probes designed from the whole chicken transcriptome as well as 150 microRNAs, the entire genome sequence of Marek's disease virus, and two avian influenza virus genomes. The gene expression comparisons among two organs and two tissues have been submitted to GEO, providing valuable comparative gene expression data to the scientific community. Novel findings related to defensin and cathelicidin expression in the spleen are highlighted. Additionally, the custom tracks for sequences and probes used in this array have been built for the Chicken Genome Browser Gateway in UCSC, providing an efficient tool to link genomic information from this powerful genome browser to our expression data. This array will be a complementary platform for the scientific community to study genetics, immunology, developmental biology, genomics, nutrition, and food safety in chickens.
Note: All data are shown as log2; a positive value means up regulated in tissue A vs. tissue B (tissue A - tissue B). # indicates that the P value for the comparison is less than 0.0001 in the microarray results. * indicates that the P value for the comparison is between 0.05 and 0.0001 in the microarray results. Microarray results shown in bold font on the diagonal are the results that were used to select genes for qRT-PCR for each comparison.

Tissue collection
Cecal tonsil (C), ileum (I), liver (L), and spleen (S) were collected from six two-week-old commercial broilers. A total of 24 samples were immersed in 10 volumes of RNAlater (Ambion, Austin, TX) and stored at -20°C until RNA isolation.

Microarray design
A loop design with dye swaps was used in the microarray study (Figure 7). In brief, the four tissue samples (cecal tonsil, ileum, liver, and spleen) from each chicken formed one loop. The order of the tissues was varied among loops so that there were four comparisons, with dye swaps, for each of the six pairs of tissues. Data from 12 measurements for each tissue were collected, for a total of 48 measurements from 24 arrays.
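One way to satisfy the counts given above (24 arrays, four comparisons for each of the six tissue pairs, 12 measurements per tissue) is sketched below. The exact tissue orders used in the study are shown only in Figure 7, so the three cyclic orders, the use of a reversed loop as the dye swap, and the (Cy3, Cy5) ordering of each pair are all assumptions made for illustration.

```python
from collections import Counter

# The three distinct cyclic orders of four tissues (ignoring rotation and direction).
CYCLIC_ORDERS = [("C", "I", "L", "S"), ("C", "I", "S", "L"), ("C", "L", "I", "S")]

def loop_arrays(order):
    """Arrays for one bird: each tissue is co-hybridized with its successor,
    written as an assumed (Cy3 sample, Cy5 sample) pair."""
    n = len(order)
    return [(order[i], order[(i + 1) % n]) for i in range(n)]

def build_design():
    """Six loops (one per bird): each cyclic order once forward, once reversed."""
    arrays = []
    for order in CYCLIC_ORDERS:
        arrays += loop_arrays(order)          # forward loop
        arrays += loop_arrays(order[::-1])    # reversed loop swaps the dyes
    return arrays

arrays = build_design()
pair_counts = Counter(frozenset(pair) for pair in arrays)     # ignores dye
oriented_counts = Counter(arrays)                             # respects dye
tissue_counts = Counter(tissue for pair in arrays for tissue in pair)
```

Under these assumptions every tissue pair is hybridized four times, twice in each dye orientation, and every tissue contributes 12 of the 48 measurements.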

RNA isolation
Tissues were homogenized using a Tissue Miser (Fisher Scientific, Houston, TX). Total RNA was isolated from each homogenized tissue using the Trizol extraction method as described by the manufacturer (Invitrogen, Carlsbad, CA). DNA was removed from the samples using the TURBO DNA-free™ Kit (Ambion, Austin, TX) according to the manufacturer's protocol. The RNA quantity and purity were determined with a NanoDrop ND-1000 spectrophotometer at 260/280 nm (NanoDrop Technologies, Wilmington, DE). The integrity of total RNA was assessed with an Agilent Bioanalyzer 2100 and RNA 6000 Nano LabChip Kit (Agilent Technologies, Palo Alto, CA), and the RNA Integrity Numbers (RINs) for the samples were obtained. Only RNA samples with RIN values of 6 or higher were used for the analysis.

cRNA preparation
A 500 ng aliquot of total RNA was reverse transcribed into cDNA using the Low RNA Input Fluorescent Linear Amplification Kit (Agilent Technologies). The synthesized cDNA was transcribed into cRNA and labelled with either cyanine 3 or cyanine 5-labelled nucleotide (Perkin Elmer, Wellesley, MA). Labelled cRNA was purified with RNeasy Mini columns (Qiagen, Valencia, CA). The quality of each cRNA sample was verified by total yield and specificity calculated from NanoDrop ND-1000 spectrophotometer measurements (NanoDrop Technologies).

Microarray hybridization
Labelled cRNAs with specificity greater than 8 were used for hybridization using the in situ hybridization kit plus (Agilent Technologies). Arrays were incubated at 65°C for 17 h in Agilent's microarray hybridization chambers. After hybridization, arrays were washed according to the Agilent protocol.

Image processing
Arrays were scanned at 5-μm resolution using a GenePix Personal 4100A (Molecular Devices Corporation, Sunnyvale, CA) and images were saved in TIFF format. Auto photomultiplier tube (PMT) settings were selected and adjusted to bring the ratio of the overall intensities between the two channels (Cy3 and Cy5) to between 0.95 and 1.05. The signal intensities of all spots on each image were quantified with GenePix Pro 6.0 software (Molecular Devices Corporation, Downingtown, PA), and data were saved as .txt files for further analysis.

Normalization and statistical analysis
The signal intensity of each probe was divided by that of the negative controls to filter out genes that were not expressed. The signal intensity of each gene was globally normalized using LOWESS within the R statistics package [57]. A mixed model that included the fixed effects of dye (Cy3 and Cy5) and tissue, and the random effects of slide and array, was used to analyze the normalized data in SAS (SAS Institute, Cary, NC). P values and fold changes for each comparison were calculated for each gene. Each tissue was included in three comparisons; the significantly expressed genes among these three comparisons were joined together to derive the selectively expressed tissue genes.
Figure 7. The diagram of the microarray experiment design. The arrow represents Cy3, the end of the arrow represents Cy5.
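The study performed global LOWESS normalization in R. The sketch below is not that implementation; it is a simplified plain-Python illustration of the idea, assuming normalization is applied to the log2 ratio M = log2(Cy5/Cy3) as a function of the mean log intensity A, using a tricube-weighted local linear fit without the robustness iterations that full LOWESS adds. Function names, the `frac` span, and the intensities are hypothetical.

```python
import math

def lowess_fit(x, y, frac=2.0 / 3.0):
    """Locally weighted linear fit evaluated at every x (no robustness iterations)."""
    n = len(x)
    k = max(2, math.ceil(frac * n))          # number of neighbours in each window
    fitted = []
    for x0 in x:
        distances = sorted(abs(xi - x0) for xi in x)
        h = distances[k - 1] or 1e-12        # bandwidth: distance to k-th nearest point
        w = [(1.0 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]  # tricube
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

def normalize_log_ratios(cy5, cy3, frac=2.0 / 3.0):
    """Subtract the intensity-dependent trend from M = log2(Cy5/Cy3)."""
    m = [math.log2(r / g) for r, g in zip(cy5, cy3)]
    a = [0.5 * math.log2(r * g) for r, g in zip(cy5, cy3)]
    trend = lowess_fit(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)]

# Hypothetical intensities with a constant 1.5-fold Cy5 dye bias.
cy3 = [100.0, 200.0, 400.0, 800.0, 1600.0, 3200.0]
cy5 = [1.5 * g for g in cy3]
normalized = normalize_log_ratios(cy5, cy3)
```

With a pure dye bias and no true differential expression, the normalized log ratios come out at approximately zero, which is the behaviour the normalization step is meant to achieve.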