Differential gene expression between hygienic and non-hygienic honeybee (Apis mellifera L.) hives

Hygienic behavior is a complex, genetically-based quantitative trait that serves as a key defense mechanism against parasites and diseases in Apis mellifera. Yet, the genomic basis and functional pathways involved in the initiation of this behavior are still unclear. Deciphering the genomic basis of hygienic behavior is a prerequisite to developing an extensive repertoire of genetic markers associated to the performance level of this quantitative trait. To fill this knowledge gap, we performed an RNA-seq on brain samples of 25 honeybees per hives from five hygienic and three non-hygienic hives. This analysis revealed that a limited number of functional genes are involved in honeybee hygienic behavior. The genes identified, and especially their location in the honeybee genome, are consistent with previous findings. Indeed, the genomic sequences of most differentially expressed genes were found on the majority of the QTL regions associated to the hygienic behavior described in previous studies. According to the Gene Ontology annotation, 15 genes are linked to the GO-terms DNA or nucleotide binding, indicating a possible role of these genes in transcription regulation. Furthermore, GO-category enrichment analysis revealed that electron carrier activity is over-represented, involving only genes belonging to the cytochrome P450. Cytochrome P450 enzymes’ overexpression can be explained by a disturbance in the regulation of expression induced by changes in transcription regulation or sensitivity to xenobiotics. Over-expressed cytochrome P450 enzymes could potentially degrade the odorant pheromones or chemicals that normally signal the presence of a diseased brood before activation of the removal process thereby inhibit hygienic behavior. These findings improve our understanding on the genetics basis of the hygienic behavior. Our results show that hygienic behavior relies on a limited set of genes linked to different regulation patterns (expression level and biological processes) associated with an over-expression of cytochrome P450 genes.


Background
The honeybee (Apis mellifera) is a valued resource for both mankind and the global environment. Honey is an important food product internationally, but pollination is by far the honeybee's most valuable contribution [1]. Bees contribute to almost 90 % of crop pollination around the world [2,3]. In Canada, beekeepers need to store their hives to protect them from difficult wintering conditions. This storage seems to increase the colony sensitivity to infections, which is translated into a greater mortality of bees during the winter [4,5].
However, the mechanisms involved in this decline of population linked to winter's mortality remain unclear. Some studies put forward the use of chemicals pesticides, including acaricides, which are detected inside the hives [6][7][8]. Sublethal exposures to chemicals like neonicotinoid pesticides lead to a disturbance of the behavior of foragers that failed to return to the hives [9]. On another point of view, a higher pathogen incidence can be responsible of the decline observed. That is why some research pointed to the impact of different pathogens [10][11][12][13][14]. Overall, it appears that some factors (pathogen outbreaks, pesti-cides…) or combinations of factors compromise the immunity of bees, and alter their behavior [6][7][8][9][10][11][12][13][14][15].
In honeybees, immunity operates on different levels [16]. Individual immunity encompasses behavioral (autogrooming), mechanical, physiological and immunological defenses [17][18][19]. Pairwise defenses include allo-grooming and a colony-wide behavioral mechanism called hygienic behavior [20], a type of nest-cleaning behavior. Nurse bees in response to diseased or dead brood exercise this collective mechanism. Hygienic behaviour is performed by younger bees (<27 days old) and mainly by middle age bees (15)(16)(17)5 days old) [21,22]. This cleaning is accomplished by two different actions. First, the nurses uncap the brood, which is operculated by wax (uncapping), and secondly, they remove the pupae from the brood cell (removal).
This hygienic behavior was first described in 1937 by Park, but its genetic basis was first suggested by , who proposed a two loci model to explain hygienic behavior inheritance [22][23][24][25][26][27][28]. Since then, this behavior has been recognized as an example of the influence of mendelian inherited genes on behavior. One locus (u) was thought to be involved in uncapping and the other (r) in removal. The homozygote for one of the loci should either uncap (uu) or remove (rr). Later, a three loci model was developed to better fit the original data [29]. Recently, four studies based on molecular techniques (RAPD and SNP) found respectively seven, six, nine and two QTLs associated with hygienic behavior [30][31][32][33]. These results suggest that the genetic basis of hygienic behavior is more complex than previously thought.
A diseased brood detection threshold is determined by how quickly a nurse can detect and initiate the diseased brood removal process. This detection seems to be influenced by the olfactory capabilities of nurse bees [34][35][36]. Furthermore, it seems that all worker bees show various levels of hygienic behavior and its effectiveness is linked to the speed of execution. Brain gene expression is closely related to behavioral status in honeybees [37]. Therefore, in order to ensure identification of a reliable signal correlating both gene expression and hygienic behavior, we examined brain tissue from nurse bees of colonies that were the most contrasted in terms of the phenotypic trait of interest. This strategy has been proven to be valuable for detecting candidate genes [37]. In our study, we analyzed the transcriptomic profiles of 13 managed honeybee colonies. The objective was to investigate and compare differential gene expression between hygienic and non-hygienic lines in order to identify genes involved in hygienic behavior. Ultimately, the goal was to provide functional genetic markers for SNP analysis in order to develop useful genomic tools for honeybee selection programs.

Results
In 2012, the 13 hives were evaluated for hygienic behavior using the freeze-killed brood assay [38]. Data from a previous evaluation in 2011, performed on the same colonies, was also available ( Table 1). Comparison of the two evaluations showed that hygienic behavior varied between years. To avoid any bias, we chose to classify the colonies as hygienic or not based on the 2012 evaluation. A wide range of hygienic behaviors was observed during our experiment (Table 1). In 2012, three colonies were classified as non-hygienic (removal of dead brood < 50 %), five exhibited intermediate behavior and five were highly hygienic (removal of dead brood > 90 %). Extreme behaviors were selected for the transcriptome analysis in order to increase the detection power of DEG.
A total of 293 296 626 reads were sequenced for the eight colonies most distinct in terms of hygienic behavior (  (Table 2). Further, a high number of the clean reads were assembled and mapped to the reference genome of A. mellifera (85.56 % ± 0.87) ( Table 2). Unmapped reads were not retained in the analysis.
From the 11 168 genes referenced in the genome of A. mellifera, 10 519 genes were found to be expressed in the hygienic pools and 10 374 genes in the non-hygienic pools. The top 10 expressed genes were the same for the two behavioral conditions, but their order differed ( Table 3). All are genes involved in royal jelly production. The other major royal jelly protein genes (Mrjp) were also detected in our data, but at lower levels. Ninety-six genes were found differentially expressed between hygienic and non-hygienic bees (Fig. 1). Twentyeight genes were over-expressed in hygienic bees and 17 of these had a log2 fold change higher than 1, meaning that expression of the gene was two times or higher in hygienic bees (Table 4). The three most DEGs (log2 fold change > 2) were CYP6AS1, Syn1 and LOC100577331. LOC727589 was appended to this list because it was not expressed in non-hygienic bees but lightly expressed in all hygienic colonies. Mir375 and Mir252 genes showed relevant patterns but, as they were not expressed in all hygienic colonies, were not statistically significant. These two genes were highly expressed in two colonies, 571 and 586 respectively. Sixty-eight genes were overexpressed in non-hygienic bees (Table 5), with a fold change higher than 1 for 20 of them. Six genes were highly differentially expressed: Hex70c, LOC410988, LOC552229, LOC100576440, LOC726319, LOC727570.
Among the 96 DEGs, 79 were located on all the 16 linkage groups of the honeybee genome, 2 on the mitochondrial chromosome (ND1 and ND4) and 15 on unplaced scaffolds (Fig. 2). Two of the three genes highly related to hygienic behavior were located on chromosome 10 (Syn1 and LOC100577331) and the third one on chromosome 13 (CYP6AS1). Concerning the nonhygienic genes, only Hex70c and LOC552229 were located on chromosomes 8 and 1 respectively. The other genes were located on four different unplaced scaffolds.
Genetic links with hygienic behavior have been studied previously in order to detect QTLs associated to this behavior [30][31][32][33]. By comparing our results to accessible previous data (Table 6), the genomic sequences of 22 gene candidates (i.e. exhibiting DGE) were located inside the confidence interval of 95 % of all the QTLs influencing hygienic behavior (Table 6).
For the 96 genes differentially expressed, 71 were associated with at least 1 GO-term and 86 with an interproscan result. The ontology covered three domains: cellular component, molecular function and biological process.
Twenty-four GO-terms classified in molecular function were found and were recovered by 63 genes (Fig. 3a and Table S1). For the cellular component domain, 36 genes were assigned to 17 GO-terms ( Fig. 3b and Table S2). Fifty-nine genes were assigned to a biological process (41 GO-terms) ( Fig. 3c and Table S3). Furthermore, 15 DGEs were assigned as potential transcription factors by direct blast to a drosophila gene belonging to the GOterm DNA binding or by the Blast2Go annotation with the GO-terms DNA binding or Nucleotide binding. For the KEGG pathways analysis, 10 genes were involved in 13 KEGG pathways. KEGG pathways and the enzyme involved are presented in Table 7.
Gene set enrichment was performed to compare the enrichment of the different GO-terms between the 96 genes differentially expressed and the whole genome. The results of this analysis indicate that GO-term electron carrier activity (GO-ID: GO:0009055) is over-represented in our DEGs. The genes involved in this GO-term were: CYP4AZ1,  CYP6AS8, CYP6AS1, CYP6AS11, LOC100576440, Cyp4g11 and LOC100577883. All of these genes were involved or considered as potentially involved in the cytochrome P450. Six of these genes were over-expressed in non-hygienic bees, and only gene CYP6AS1 was overexpressed in hygienic bees.

Discussion
To our knowledge, this study is the first to correlate honeybee hygienic behavior with differential gene expression.
Almost all previous studies that analyzed honeybee RNA focused on the parasitic / pathogenic reaction and its variability among different honeybee castes [30,33,[39][40][41]. In this study, we used RNA-seq to highlight, at the genomic scale, which genes are differentially expressed in hygienic versus non-hygienic colonies. Our data indicates that not all previously referenced genes present in the genome of honeybee are expressed in our bee brain samples. Only 10 519 genes (94.2 % of all genes referenced) were found expressed in hygienic  honeybees and 10 374 genes (92.9 % of all genes referenced) in non-hygienic bees. Genes may be missing from our samples due to the fact that we studied gene expression only for the RNA pools from bee brains, and not the entire tissue composites. It has previously been demonstrated that tissue specificity is highly important for RNA-seq design [42]. The distribution of gene expression was very similar between both behavioral states (hygienic and nonhygienic). This result can be explained by the fact that the most abundantly expressed genes were not differentially expressed (Table 3). Interestingly, the genes most expressed in our samples were all involved in processing royal jelly. This finding is consistent with a previous study that found nurses and foragers highly expressed the Mrjp gene family [40]. As in the article of Liu et al. 2011, we found a low level of expression of the Mrjp 6 in bee brains.
The differences between the two behavioral states rely on few genes (96 genes), which is 94 % less than differences between nurses and foragers (1621 genes with a fold change greater than 2) [40,41]. However, this finding is consistent with results between such less-differentiated castes as guard, undertaker or comb builder [41]. It seems that, as for other task specializations (comb building, guarding and undertaking), few genes influence the performance of hygienic behavior, and they are thus more tightly regulated than caste specialization. This could be explained by the fact that caste specialization (queen, nurse, forager) is strongly influenced by the environment (queens vs workers) or age-related differences [43][44][45]. The low numbers of genes differentially expressed (DEGs) between the two intra-caste behavioral states may reflect a difference in bee age. Consequently, as the difference in behavioral state is the only variable among our samples, these few DEGs relate to differing behavioral performance  that occurs according to a design independent of any age limit or caste. Furthermore, we hypothesize that expression of gene involved in the hygienic behavior is constitutive and not facultative, especially during the detection of diseased brood [20]. This ensures that the differential expression of these genes is related to the hygienic behavior and not to age or caste. This assumption is also supported by the fact that RNA analysis was performed on a pool of 25 nurses so it represented the global transcription of the caste and not an individual gene expression event.
All DEGs are statistically significant (FDR correction, q-value < 0.05) despite the fact that fold-changes are at low levels. Low level fold-change acts in favor of a subtle modification of brain gene expression, much as has been suggested for task specialization [41]. However, if we consider the higher level of fold-change, gene dispersion is concentrated on unplaced scaffolds (4 genes) and chromosomes 1, 8, 10 and 13 (respectively 1, 1, 2 and 1 genes). Interestingly, chromosomes 1, 10 and 13 are known to carry QTLs linked to honeybee hygienic behavior [30,31,33]. This result is also supported by the fact that genomic sequences of the 22 DEGs are located in the confidence interval of all the QTLs influencing hygienic behavior (Table 6) except for the QTL named hyg 2, located on chromosome 5, the QTL region localized on chromosome 7 by Spötter et al. and the QTL found in chromosome 1 by Tsuruda et al. 2012. These results show that the DEGs found in our study are consistent with the previous literature. However, it also seems that regulation of the gene expression linked to hygienic behavior is spread more widely throughout the genome than previously thought. The localization of these genes is, however, quite surprising, because it was thought that the genetic basis of hygienic behavior was localized on few loci. We unexpectedly found DEGs on all 16 honeybee chromosomes, as well as on mitochondrial chromosomes and unplaced scaffolds. These results suggest a wider regulation of the transcription. The high number of DGEs classified as potential transcription factors supports this supposition. Four of them (GMCOX12, LOC725238, Myo20 and Pka-C1) are even located on hygienic QTL positions (chromosomes 1, 1, 10 and 2 respectively).
Gene Ontology analysis shows that the two biological processes most represented by the DEG are multicellular organismal development and regulation of biological processes. This result indicates that most DEGs contribute to the development of larvae into adult bees and are also involved in gene expression regulation, protein modification or interaction with a protein or substrate molecule. These two biological processes are consistent with the theory that few genes have a wide influence (as transcription factors). At the level of molecular function, the three most represented GO-terms are protein binding, hydrolase activity and nucleotide binding. As previously   discussed, protein binding and nucleotide binding are functions that are involved in the regulation of molecular processes like transcription. It seems that hygienic and non-hygienic bees differ in their transcription pathways. Interestingly, hydrolase activity is particularly overexpressed in non-hygienic bees. Hydrolase activity is a process involved in the catalysis of various bonds, including the catalysis of peptides such as the one that can signal the presence of diseased brood. Gene Set Enrichment analysis of the DEG shows that only one GO-term is over-represented (FDR correction, q-value < 0.05): electron carrier activity (GO-ID: GO:0009055). Furthermore, all the genes associated with this GO-term are involved or can be considered as potentially involved in cytochrome P450 pathway. Among these genes, some are coding for different enzymes of the cytochrome P450. These enzymes are known to be involved in many processes, particularly detoxification of xenobiotics and hormonal degradation. Furthermore, they are suspected of playing a role in the degradation of odorants, pheromones or defensive chemicals [46]. The threshold for detection of diseased brood is one key factor in how quickly a nurse can detect and initiate the removal process. This detection capacity could be influenced by the nurse's olfactory capabilities [34][35][36]. We can therefore hypothesize that non-hygienic bees that over-express cytochrome P450 enzymes degrade the odorant pheromones or chemicals that normally signal the presence of diseased brood before activation of the removal process. The bees are then less efficient in detecting contaminated broods. The high level of cytochrome P450 enzymes in non-hygienic bees can be explained by two non-exclusive hypotheses. First, the non-hygienic bees may have a constitutively higher expression of these enzymes due to the differences observed in regulation patterns, as previously discussed. Or, the induction of cytochrome P450 gene expression may be due to a higher sensitivity to xenobiotics. It has been demonstrated that xenobiotics can enhance the expression of cytochrome P450 genes [47]. A higher sensitivity to xenobiotics could then induce a stronger response by over-expressing cytochrome P450 genes, which in turn might alter the performance of hygienic behavior.

Conclusions
This study is the first to characterize the transcriptomic basis of the differential performance of hygienic behavior by the honeybee (Apis mellifera L.). Our findings show that hygienic behavior relies on a limited set of genes, most collocated with the QTLs described in previous studies as playing major roles in honeybee hygienic behavior. The differences between behavioral states (hygienic and non-hygienic) can be explained by different regulation patterns (expression level and biological processes) associated with an over-expression of cytochrome P450 genes. These candidate genes provide relevant targets for SNPs analysis (cis-regulatory sites and coding sequence) to develop molecular tools for honeybee genetic programs, which would provide a rapid and efficient method for selecting honeybee colonies with a high level of hygienic behavior.

Ethics statement
The owner of the land on which the hives were located gave permission to conduct the study on the site. No additional permit was required, considering the fact that the owner gave his permission. The GPS coordinates were (46°40'31.06" N 71°54'57.98" W). The field study did not involve endangered or protected species.

Sample collection
This study was based on 13 managed honeybee colonies (Apis mellifera), selected from among the livestock of our bee research facility (Deschambault, Québec, Canada) in June 2012. Young queens had been introduced in these colonies in July 2011. These queens were hybrid Italian/Buckfast stock obtained through our selection program. Each colony was comprised of a 9frame Langstroth hive body. Selected colonies were all of equivalent strength (6-7 frames of bees/brood). All hives were placed in the same apiary to avoid influence of environmental conditions. The freeze-killed brood assay [38] was chosen to measure the hygienic behavior capability of each colony. This test consisted of freezing a patch of a pupated sealed brood with liquid nitrogen. Briefly, 100 mL of liquid nitrogen was poured on two circles (15 cm diameter) within the brood area of each hive (7 day old larvae). The liquid nitrogen was confined to a specific spot on the brood frame, covering an area of 60 cells. Hygienic behavior was evaluated by calculating the number of brood removed in a period of 24 h. The hygienic behavior of each hive was estimated as a percentage of the dead larvae removed by the worker bees. A colony that removed 90 % of the dead larvae or more was considered as hygienic and a colony that removed less than 60 % of the dead cells was considered as non-hygienic. Among the studied colonies, five colonies removed between (50-90) % of the cells; these were classified as intermediate, and were sequenced but not included in the differentially expressed genes (DEG) analysis. Forager bees can revert to brood task such as removing dead brood when there is a nectar flow. In our study, hygienic test was done in absence of a nectar flow (mid May). Honeybee samples were taken the day following the hygienic test and great care was taken to sample only bees that were on the brood frame where the hygienic test was performed.

RNA extraction
Total RNA was extracted from a pool of 25 honeybee brains using TRIzol® Reagent protocol from Invitrogen [48] with some modifications. The 25 honeybee brains sampled from each colony were dissected and treated as one pool, then added separately to 1 mL Trizol with 50 mg of acid washed glass beads and gently mixed for 5 min. Samples were then incubated at room temperature for 5 min. 200 μL of Chloroform was added, mixed vigorously and the mixture was incubated at room temperature for 12 min (with a vortex step at mid-incubation) followed by a centrifugation at 12 000 g for 15 min at 4°C. Supernatant was then washed with 250 μL each of Isopropanol and hypersaline solution (1.2 M sodium citrate; 0.8 M NaCl) with incubation for 10 min at room temperature followed by centrifugation at 12 000 g for 15 min at 25°C. The RNA pellet was subsequently washed twice with 1 mL 75 % ethanol and centrifuged at 12 000 g for 7 min at 24°C. The pellet was dried and 30 μL of nuclease-free water was added to each extraction. Purity and quality of the RNA was assessed by quantification with a Nanodrop 2000 spectrophotometer (Thermo Scientific). Tubes were then stored at −80°C.

Library construction
To construct a paired-end library for Illumina sequencing, we used the Illumina TruSeq TM RNA sample preparation kit according to the manufacturer's instructions.
First, sample quality was confirmed by an Experion RNA analysis following the Experion RNA StdSens analysis kit protocol (Bio-Rad). Then, 4 μg of the total RNA sample was used for poly-A mRNA selection using streptavidincoated magnetic beads. This protocol uses two rounds of enrichment for poly-A mRNA followed by mRNA fragmentation. The fragmented mRNA samples were subjected to cDNA synthesis using the Illumina Tru-Seq TM RNA sample preparation kit according to the manufacturer's protocol. Briefly, cDNA was synthesized using reverse transcriptase (Super-Script II) and random primers. The cDNA was further converted into double stranded DNA using the reagents supplied in the kit, and the resulting dsDNA was used for library preparation. The double-stranded cDNA obtained was subjected to library preparation using the Illumina TruSeq™ RNA sample preparation kit (Low-Throughput protocol) according to the manufacturer's protocol. The quality of the library was controlled on an Agilent technologies 2100 bioanalyser following the protocol provided with the Agilent DNA 7500 kit. In the final step before sequencing, all 13 individual libraries were normalized and pooled together using the adapter indices supplied by the manufacturer (Illumina indexes 3-8, 12-16, 18, 19). Pooled sequencing was then performed as 150 bp, paired-end reads in a single lane of an Illumina HiSeq2000 instrument at McGill University and the Génome Québec Innovation Centre.

Data processing
The raw reads were first assessed and trimmed for quality using the FASTX toolkit (http://hannonlab.cshl.edu/ fastx_toolkit/). Reads with a bad quality score (phred score < 20) were discarded. The remaining reads were synchronized, and single reads were also discarded. Reads were then assembled and mapped to the reference genome of the honeybee (ftp://ftp.ncbi.nlm.nih.gov/genomes/ Apis_mellifera/) using Tophat [49,50]. Reads were assembled for different K-mer values (Kmer = 27 to Kmer = 53) to ensure good quality assembly. The data obtained were analyzed for the numbers of reads mapped, and the best quality mapping was selected with SAMStat [51].

Identification of Differentially Expressed Genes (DEG)
Gene expression was calculated with Cufflinks [50] based on the honeybee genome as well as the annotation file downloaded from the NCBI database (ftp://ftp.ncbi.nlm. nih.gov/genomes/Apis_mellifera/GFF/). The abundance of each transcript mapped against the 11 169 genes annotated on the reference was estimated. The abundance of each transcript was then normalized by calculating the transcript abundance in Fragment Per Kilobase of exon per million fragments mapped (FPKM). Gene and transcript expression changes among samples were analyzed with Cuffdiff software. Differential expression was considered as statistically significant when the q-value (FDR correction) was lower than 0.05.

Gene Ontology (GO) annotation
All transcripts of Apis mellifera referenced in the NCBI ftp server (ftp://ftp.ncbi.nlm.nih.gov/genomes/Apis_mellifera/ RNA/) were used as a reference database. Genes were mapped to the references by a blast search (parameters: E-Value Hit Filter = 1e-06; Annotation cutoff = 55 and Goweigth = 5) to retrieve the GO terms associated. Then query sequences from the pool of GO terms were assigned to functional terms. Mapping from GO terms to enzyme codes permits the subsequent recovery of enzyme codes and KEGG pathway annotations. Once this database was constructed, we performed the same analysis on the differentially expressed genes (DEG). GO enrichment analysis of functional significance maps all DEGs to terms in GO database. Then, a comparison of GO-term abundance between the differentially expressed gene set and the genome background was performed to look for enriched GOterms. KEGG pathways enrichment analysis is based on the same procedure, so we compared the pathway representation in the DEG set and the genome background to identify significantly enriched metabolic pathways or signal transduction pathways. All of these analyses were performed with Blast2GO [52].