Dog Olfactory Gene Expression Profiling Using Samples Derived from Nasal Epithelium Brushing


Dogs have an exquisite sense of smell, an ability that humans have put to use in a wide range of important situations. It is also thought that some breeds have a better sense of smell than others. Dogs can detect many compounds at very low concentrations in air. To achieve such high levels of detection, the dog olfactory system is both complex and highly developed. The dog genome encodes a large number of olfactory receptor (OR) genes. However, it remains unclear to what extent all of these OR genes are expressed. To address this, a nasal brushing method was developed to recover dog nasal epithelium samples from which total RNA could be extracted to prepare high-quality cDNA libraries. After capture by hybridization with a large set of oligonucleotides, the level of expression of each transcript was measured by deep next-generation sequencing (NGS). The reproducibility of the sampling approach was checked by analyzing several samples from the same animals (up to six, three per naris). The quality of the capture was also checked by analyzing two DNA libraries, which have the advantage over RNA libraries of an equal representation of each gene. Finally, we compared this brushing method on live animals to a biopsy approach applied to two terminally ill dogs, euthanized with the consent of the owner. Comparison of the levels of expression of each transcript indicates that the ratio of expression between the most and the least expressed OR in each sample is > 10,000 (paralog variation) and that a number of OR genes are not expressed. The method developed here will allow us to address whether variations observed in any of the OR transcriptomes relate to dog life experiences and whether any differences observed between samples are dog-specific or breed-specific.


Introduction
Olfactory receptors were discovered 30 years ago by Buck and Axel, who identified a novel sub-family of GPCRs (G protein-coupled receptors) in rat olfactory epithelium (1). The importance of this discovery, which transformed the whole field of olfaction, was recognized by the award of the Nobel Prize in Physiology or Medicine in 2004. For all wild animals, olfaction is a vital function. It participates critically in foraging for food, selecting sexual partners, sensing danger and escaping from predators. For humans, even if these functions are fulfilled differently, olfaction remains an important function, and people suffering from anosmia or even hyposmia are very much at a disadvantage or in pain.
Since the discovery of several OR gene transcripts in rat olfactory epithelium, many animal olfactory gene repertoires have been identified through full genome DNA sequencing, showing that these genes represent the largest gene family, with several hundred members scattered across many chromosomes (2)(3)(4)(5)(6)(7).
However, the number of studies of gene transcription and expression in olfactory tissues that aim to determine the number of expressed genes is limited (8)(9)(10)(11)(12)(13). These studies showed that 90% of human and rat and 70% of dog OR genes are expressed, although at very different levels, with no or limited differences between males and females. They also showed that transcription is not limited to intact OR genes but that some pseudogenes are also expressed (14,15). As has long been widely recognized, olfaction is a very important and well-developed function in dogs. However, what makes some dogs so good at finding hidden objects, drugs or explosives, or even at detecting that their owner is developing a melanoma (16,17)? Moreover, many dog breeds, such as hunting dogs, have been selected with particular attention to this function. It is thus important to know whether any differences exist in OR expression between dog breeds and also between dogs and other mammals. With these questions in mind, as soon as the dog genome sequence was available (18), we determined the dog OR repertoire (15,19) and the genetic polymorphism of several OR genes within a cohort of 48 dogs representing six different breeds known for their different olfactory capabilities (20,21). The number of OR genes and the extent of their genetic diversity are important parameters in determining the olfactory capabilities of a mammal. However, variation in the expression levels of ORs, as well as of the proteins involved in transducing the odorant signal toward the brain, is a major aspect of understanding and explaining individual differences. Analysis of the dog olfactory epithelium poses difficult problems; these include ethical and painless tissue sampling and anatomical issues, given the large size of the olfactory epithelium, which can reach 200 cm² in an adult German Shepherd and is thus challenging to recover in its entirety (22).
Two ways of accessing the olfactory epithelium exist. The first is sampling following euthanasia of ill and incurable dogs, or from dogs that have died in an accident. Sampling nasal epithelium following euthanasia, in principle, gives access to the whole olfactory epithelium (OE). However, relying on euthanasia alone limits not just the number of samples but also the variety of dog breeds investigated and, even more importantly, the circumstances of each case. In other words, restricting nasal epithelium sampling to instances of euthanasia will seriously restrict research aimed at analyzing the full variation of olfactory transcriptomes in response to factors such as breed, training and the age of the animals. The second option is gentle scraping of the nasal epithelium with a nasopharyngeal swab, similar to those used in hospital otorhinolaryngology (ORL) services. In an effort to develop an ethical and minimally invasive general sampling method that can be used in many circumstances, we investigated a brush sampling approach and compared it with a biopsy approach.

Materials and Methods
Olfactory epithelium samples
All samples of nasal epithelium were collected at the Clinique Vétérinaire (Pole Santé Chanturgue, 63100 Clermont-Ferrand, France) by gentle brushing with endocervical DOC cytobrushes (Medispo.com), under general anesthesia (ketamine, Imalgene 1000®, and xylazine, Rompun®, aa, 0.1 mL/kg IV; isoflurane-O2) performed for surgical purposes. Two of these samples, one from a Bichon and one from a Golden Retriever, were used for total mRNA sequencing. We also collected several samples from four dogs: a Belgian Shepherd, a West Highland White Terrier, a Whippet and a Labrador Retriever. All animals were anesthetized for scheduled surgical interventions, and samples were collected with owner consent and ethics committee approval. After recovery of the nasal epithelium, brushes were immediately placed into a tube containing 1.5 ml of RNAlater solution and sent to the laboratory, where they were stored at -80 °C until nucleic acid extraction.
In addition, with owner consent, several biopsies were taken from two euthanized dogs, a Cane Corso and a Golden Retriever, that were in the terminal phase of lung cancer. These biopsies were taken in order to sample the olfactory epithelium at different locations. They were processed in the same way as the samples described above.

Nucleic acid isolation
Total RNA was extracted and purified with the Nucleospin RNA kit (Macherey Nagel). Following quantification with a Nanodrop spectrometer, the quality and purity of the RNA samples were assessed with a BioAnalyzer (Agilent

Sequence data analysis
Sequencing was performed on a HiSeq 4000 sequencer (Illumina, 2x125 nt) using the v4 chemistry/HBS HiSeq kit. One lane was used for each sample to produce up to 300 × 10^6 reads per sample. Image analysis and base calling were performed using Illumina Real-Time Analysis software version 2.7.3 with default parameters. Raw sequence data produced by the Genotoul platform and Integragen were sent to the laboratory on floppy disks for processing and analysis. Both ends of each read were first trimmed with Cutadapt (25) to remove remaining primer sequences and the low-quality bases often present at read extremities. The trimmed sequences were then aligned with STAR.v2 through Galaxy (Sigenae) (25). The resulting BAM files were analyzed with Samtools, Bedtools, Cufflinks and StringTie (26) on the Toulouse Genocluster and with the Geneious suite (27).

Statistical analysis of the data and heatmap construction based on the FPKM values were performed with the Manhattan distance and Ward's method in R, using in-house scripts (28).
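The in-house R commands are not reproduced here. As a rough illustration only, the following Python sketch groups samples from their expression profiles using Manhattan distances and a Ward-type Lance-Williams update. The function names, sample labels and toy values are hypothetical, and applying the Ward update to squared Manhattan distances is an assumption, not necessarily the variant used for the heatmaps.

```python
import math
from itertools import combinations


def manhattan(a, b):
    """Manhattan (city-block) distance between two expression profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))


def ward_clusters(profiles, n_clusters):
    """Agglomerate samples until n_clusters groups remain.

    profiles maps a sample label to a list of (e.g. log-transformed) FPKM
    values, one per gene, in the same gene order for every sample.
    """
    names = list(profiles)
    clusters = {i: [name] for i, name in enumerate(names)}
    # Pairwise distances, keyed by (smaller id, larger id).
    dist = {
        (i, j): manhattan(profiles[names[i]], profiles[names[j]])
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > n_clusters:
        # Merge the two closest clusters.
        (i, j), d_ij = min(dist.items(), key=lambda item: item[1])
        n_i, n_j = len(clusters[i]), len(clusters[j])
        merged = clusters.pop(i) + clusters.pop(j)
        updated = {}
        for k, members in clusters.items():
            n_k = len(members)
            d_ki = dist[(min(k, i), max(k, i))]
            d_kj = dist[(min(k, j), max(k, j))]
            # Lance-Williams update for Ward's criterion on squared distances
            # (assumed variant; clipped at zero for non-Euclidean input).
            d2 = ((n_i + n_k) * d_ki ** 2 + (n_j + n_k) * d_kj ** 2
                  - n_k * d_ij ** 2) / (n_i + n_j + n_k)
            updated[(k, next_id)] = math.sqrt(max(d2, 0.0))
        # Drop distances involving the two merged clusters, add the new ones.
        dist = {p: d for p, d in dist.items() if i not in p and j not in p}
        dist.update(updated)
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical toy profiles: two samples from each of two dogs.
toy_profiles = {
    "lab_1": [5.0, 1.0, 0.0],
    "lab_2": [5.1, 1.2, 0.0],
    "whippet_1": [0.5, 4.0, 3.0],
    "whippet_2": [0.4, 4.2, 3.1],
}
print(ward_clusters(toy_profiles, 2))
```

In practice the profiles would be long vectors, one value per OR gene, and the full merge order rather than a fixed number of groups would be used to draw the dendrogram beside the heatmap.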

OSN transcriptome analyses
Two samples of nasal epithelium tissue were obtained as described above. One sample was from a male Bichon and the second from a female Golden Retriever, both 8 years old. The two samples were processed as described in the Methods section. As we anticipated that the olfactory neurons might be contaminated by other cells, thus reducing the level of neuron-specific transcripts and OR transcripts, each sample was deeply sequenced, and up to 300 million reads were obtained, maximizing the chances of capturing transcripts from poorly expressed genes. This number of reads compares with the 60 million reads previously reported for murine neurons (10). As shown in Table 1, approximately 90% of the reads could be mapped at unique positions. This high percentage of mapping resulted from the good quality of the RNA, the libraries and the sequencing itself. We therefore believe that the two figures of 0.27 and 0.31, representing the percentage of mismatches between the sequence reads and the reference genome (CanFam3.1), are indicative of slight polymorphism differences. Analysis of the sequence data with the Geneious suite (27) allowed us to identify many genes and to calculate their respective FPKM values, i.e. the number of reads corresponding to each transcript normalized for transcript length and sequencing depth, a metric defining the abundance of all transcripts (Additional data files 1a, b) and OR transcripts (Additional data file 2). For these two samples, despite the very deep sequencing strategy, only 14% and 16% of the OR gene transcripts were detected with an FPKM > 0.1, corresponding to 112 Bichon and 104 Golden Retriever OR genes respectively, percentages to be compared with 90% for the murine and human OR repertoires and 70% for the canine repertoire (10,13).
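For readers less familiar with the metric, the sketch below shows the standard FPKM definition (fragments per kilobase of transcript per million mapped fragments) and the FPKM > 0.1 detection rule used above. The function names, gene labels and counts are hypothetical; the actual values in this study were computed with the Geneious suite, not with this code.

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


def detected_genes(fpkm_by_gene, threshold=0.1):
    """Genes whose FPKM is strictly above the detection threshold."""
    return sorted(g for g, value in fpkm_by_gene.items() if value > threshold)


# Hypothetical example: 1 kb transcripts in a library of 300 million fragments.
library_size = 300_000_000
counts = {"OR_x": 900, "OR_y": 15, "OR_z": 0}
values = {g: fpkm(c, 1000, library_size) for g, c in counts.items()}
print(values)
print(detected_genes(values))
```

With a library of this depth, a 1 kb transcript needs only about 30 fragments to reach the 0.1 FPKM threshold, which is why deep sequencing helps with poorly expressed genes.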
As summarized in Table 2, 88% and 90% of all the annotated genes (ENSEMBL.org) and 62% and 56% of the non-annotated genes are expressed at a detectable level. This high percentage of expressed genes is probably due to the composition of the samples, which comprise several cell types. The 10 most highly expressed identified genes, based on their respective FPKM values, are listed in Table 3. A comparison of the dog gene expression ranks with those of their murine orthologs (10) indicates strong differences. None of the highly expressed dog genes was found to be strongly expressed in the murine tissue. SCGB1A1, the most highly expressed gene in the Bichon sample, is not even detected in the murine sample; VMO1, at the second position in the two dog samples, ranks at position 1498 in the mouse sample, and similarly TAGLN2 ranks at position 6750. Likewise, the olfactory marker protein (OMP), a protein characteristic of olfactory tissues and the 3rd most expressed transcript in murine OSN, ranks at positions 2502 and 1796 in the dog samples (Table 4). The Gα subunit of the G(olf) protein encoded by GNAL, a key protein in the transduction pathway, ranks at position 9 in the mouse sample and at positions 10680 and 8767 in the dog samples. These large differences in gene ranking between the dog and murine samples strongly suggest that, in the dog samples, the olfactory sensory neurons (OSN) were heavily contaminated by adjacent cell types, such as the supporting cells, diluting the expression of the OSN genes and comparatively increasing that of non-OSN genes (29,30).

OR Targeted transcriptome analysis
Reliability and reproducibility of the approach
In this series of experiments, we analyzed 14 samples from only four dogs, i.e. two to six per dog, to assess the reproducibility of the sampling as well as any difference between the right and left nostrils. As shown in Additional data file 3, a strong correlation exists between the FPKM values obtained for the same dog, whether one compares the right and left nostril samples or different samples from the same nostril.
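The correlation measure used for Additional data file 3 is not specified in the text. One common way to compare two samples is a Pearson correlation on log-transformed FPKM values, sketched below under that assumption. The pseudocount, the function names and the toy values are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def log_fpkm_correlation(sample_a, sample_b, pseudocount=0.1):
    """Correlate log10(FPKM + pseudocount) over the genes shared by two samples."""
    genes = sorted(set(sample_a) & set(sample_b))
    log_a = [math.log10(sample_a[g] + pseudocount) for g in genes]
    log_b = [math.log10(sample_b[g] + pseudocount) for g in genes]
    return pearson(log_a, log_b)


# Hypothetical right and left nostril samples from one dog.
right = {"OR_a": 1200.0, "OR_b": 35.0, "OR_c": 0.4, "OR_d": 0.0}
left = {"OR_a": 950.0, "OR_b": 50.0, "OR_c": 0.2, "OR_d": 0.0}
print(round(log_fpkm_correlation(right, left), 3))
```

The log transform keeps a handful of very highly expressed OR genes from dominating the coefficient, which matters given the wide expression range reported here.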
In the heat map presented in Fig. 2, the four Labrador Retriever samples, the two West Highland White Terrier samples and four of the five Whippet samples cluster together by dog. A discrepancy was seen for the two Belgian Shepherd samples, which are not grouped, suggesting a sampling or sequencing problem. Alternatively, a particular physiological condition could have induced a different transcription profile in one of the two nostrils of this dog (31). Nevertheless, the coherent grouping of a large majority of the samples indicates good reproducibility of the sampling itself and good quality of the sequencing. Moreover, the grouping of the different samples from each dog indicates, within the limits of the experimental procedure, the absence of difference between the right and left nostrils.
One of the first observations from the data summarized in Additional data file 5 is the very wide range of expression, regardless of the sample. Values ranged from several thousand FPKM for the most expressed OR down to 0.1 for the least expressed. These data indicate that a variable, but large, proportion of the OR genes are not even detectable, having (if transcribed) an FPKM value below 0.1. If one concentrates on the OR genes having an FPKM value ≥ 1% of that of the most expressed OR gene of the sample, the results are even more surprising. About 30 genes are above this limit, as already observed for the Bichon and Golden Retriever samples, whose data were not normalized and for which we obtained 39 and 37 OR genes above this limit.
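The two summary figures discussed here, the number of OR genes reaching at least 1% of the top OR gene and the ratio between the most and least expressed detectable OR genes, can be computed as in the following sketch. The function names and the toy FPKM values are hypothetical and are not taken from the additional data files.

```python
def genes_above_fraction_of_max(or_fpkm, fraction=0.01):
    """OR genes whose FPKM is at least `fraction` of the most expressed OR gene."""
    top = max(or_fpkm.values())
    return sorted(g for g, value in or_fpkm.items() if value >= fraction * top)


def expression_range(or_fpkm, detection_floor=0.1):
    """Ratio between the most and least expressed detectable OR genes."""
    detected = [value for value in or_fpkm.values() if value >= detection_floor]
    return max(detected) / min(detected)


# Hypothetical sample: one dominant OR gene and a long tail.
sample = {"OR_a": 3000.0, "OR_b": 45.0, "OR_c": 31.0, "OR_d": 0.2, "OR_e": 0.0}
print(genes_above_fraction_of_max(sample))
print(expression_range(sample))
```

Because the 1% limit is relative to the top gene of each sample, it can be compared across samples sequenced to different depths, unlike a fixed FPKM cut-off.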
To address whether the normalization was appropriate, we compared the list of the 20 most expressed Bichon and Golden Retriever OR genes (Additional data files 1a and b) with that of the 20 most expressed OR genes of the West Highland White Terrier, Whippet and Labrador (Additional data file 5) and noticed that 9 of these 20 genes are present in all samples, indicating that the correction was appropriate.
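The comparison of top-20 lists amounts to intersecting the sets of the most expressed genes of each sample. A minimal sketch follows, with hypothetical function names and toy data (a top-2 list is used to keep the example short).

```python
def top_genes(fpkm_by_gene, n=20):
    """Set of the n genes with the highest FPKM in one sample."""
    ranked = sorted(fpkm_by_gene, key=fpkm_by_gene.get, reverse=True)
    return set(ranked[:n])


def shared_top_genes(samples, n=20):
    """Genes that appear in the top-n list of every sample."""
    tops = [top_genes(fpkm_by_gene, n) for fpkm_by_gene in samples.values()]
    return sorted(set.intersection(*tops))


# Hypothetical samples, compared on their two most expressed genes.
samples = {
    "bichon": {"OR_a": 9.0, "OR_b": 8.0, "OR_c": 1.0},
    "whippet": {"OR_a": 7.0, "OR_c": 6.0, "OR_b": 1.0},
}
print(shared_top_genes(samples, n=2))
```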
A further issue to resolve was whether the rather low proportion of ORs being expressed reflected the truth or was a consequence of a capture effect and/or of the variable number of sequencing reads per sample.
To address this, we plotted the DNA FPKM values of the Cane Corso and Golden Retriever samples in blue and the FPKM values of the corresponding RNA in red (Additional data files 6a and b). As shown in the graphs, no correlation existed between the DNA and RNA FPKM values of any gene, e.g. the Golden Retriever CfOR 12F06 or CfOR 0268 genes. Furthermore, a large number of genes were not transcribed at a detectable level, although the genes themselves were well captured by the same set of oligonucleotides. Thus, the fact that a large number of OR transcripts

Spatial segregation of OR expression
Spatial segregation of OR gene expression within the olfactory epithelium has already been documented, although it remains unclear to what extent a similar situation exists in dogs. We therefore considered whether the limited number of OR genes expressed in our samples was a consequence of this spatial segregation (19). To approach this question, four and six biopsy samples were taken from two dogs, a Cane Corso and a Golden Retriever respectively (Figure 4). The FPKM values of the different OR transcripts are given in Additional data file 7.
These data files show that up to 414 and 512 OR genes are not expressed in the Cane Corso and Golden Retriever samples respectively.
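Counting the OR genes that remain undetected across all biopsies of one animal can be sketched as follows. The 0.1 FPKM threshold follows the detection limit used earlier in the text; the function name, biopsy labels and values are hypothetical.

```python
def silent_genes(fpkm_by_sample, threshold=0.1):
    """Genes whose FPKM stays below the threshold in every sample."""
    all_genes = set().union(*(sample.keys() for sample in fpkm_by_sample.values()))
    return sorted(
        gene
        for gene in all_genes
        if all(sample.get(gene, 0.0) < threshold for sample in fpkm_by_sample.values())
    )


# Hypothetical biopsies taken at two locations of the same olfactory epithelium.
biopsies = {
    "site_1": {"OR_a": 12.0, "OR_b": 0.0, "OR_c": 0.05},
    "site_2": {"OR_a": 0.0, "OR_b": 0.02, "OR_c": 3.0},
}
print(silent_genes(biopsies))
```

A gene expressed in only one region of the epithelium, like the hypothetical OR_c, is not counted as silent once the biopsies are combined, which is the point of sampling several locations.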

Discussion
The dog genome contains a large number of OR genes (4), which explains in part the great variety of volatile compounds a dog can recognize and how, in a complex environment, it can recognize an odor on which it has been trained. To be effective, the OR genes have to be transcribed and expressed. About these two aspects we presently know very little. Several transcriptome studies have analyzed the spectrum of ORs expressed in humans, rats and mice (8-12), but until recently very few studies had been done in dogs (13). A major reason has been the difficulty of collecting appropriate samples.
As shown with the Bichon and Golden Retriever samples, gentle brushing of the nasal epithelium allows us to recover sufficient olfactory neurons to extract their total RNA content and to perform a transcriptomic analysis.
However, the relative quantification of transcripts specific to olfactory neurons, such as those of OMP or the Gα subunit of the G(olf) protein, indicates that these two samples were heavily contaminated by other cells, such as the supporting cells (29,30). The main consequence of this heavy contamination is a dilution of the OSN transcripts. Thus, the OMP mRNA transcript, the 3rd most expressed transcript in murine OSN (10) However, in order to compare the transcription profiles of the different paralogous OR transcripts within a sample, a correction factor should be applied to the crude FPKM values. Since each gene is present twice in any genome, the FPKM ratio of the two OR genes of a pair (Additional data file 4) is a consequence of the difference in capture efficiency for these two genes. As explained in the Results section (Additional data file 5), a correction was applied by adjusting each crude RNA FPKM value by a factor corresponding to the ratio of the FPKM value of this gene to that of the most expressed gene of the sample, calculated from the DNA libraries.
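One possible reading of this correction is sketched below: the DNA FPKM of each gene, relative to the best-captured gene of the DNA library, is taken as its capture efficiency, and the crude RNA FPKM is divided by that efficiency. Dividing by the ratio is an assumption made for illustration, since the exact arithmetic is not spelled out in the text; the function name and values are hypothetical.

```python
def capture_corrected_fpkm(rna_fpkm, dna_fpkm):
    """Scale crude RNA FPKM values by a per-gene capture efficiency.

    Efficiency is assumed to be DNA FPKM of the gene divided by the highest
    DNA FPKM in the library; genes without DNA signal are skipped because
    their capture efficiency cannot be estimated.
    """
    best_captured = max(dna_fpkm.values())
    corrected = {}
    for gene, crude in rna_fpkm.items():
        dna_value = dna_fpkm.get(gene, 0.0)
        if dna_value <= 0:
            continue
        efficiency = dna_value / best_captured
        corrected[gene] = crude / efficiency
    return corrected


# Hypothetical values: OR_b is captured half as well as OR_a.
rna = {"OR_a": 10.0, "OR_b": 10.0, "OR_c": 5.0}
dna = {"OR_a": 100.0, "OR_b": 50.0}
print(capture_corrected_fpkm(rna, dna))
```

Under this reading, two genes with identical crude RNA FPKM values end up with different corrected values whenever the oligonucleotide set captures one of them less efficiently.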
Previous studies have analyzed the transcriptomes of several mammalian olfactory tissues (8)(9)(10)(11)(12)(13). These studies have shown that nearly all OR genes are expressed at a detectable level. In contrast, in our study far fewer dog OR gene transcripts are detected. Interestingly, no more than 30 genes reach an FPKM value ≥ 1% of that of the most expressed OR in the sample (Additional data file 5). As shown in Fig. 3, we observed no correlation between the highest FPKM value in any sample and the number of expressed OR genes in the corresponding sample, and in all samples we observed a very large range of expression, as much as 10,000-fold.
The absence of correlation between the FPKM values of the genes (DNA) and of the RNA transcripts indicates that the low number of expressed genes is not a consequence of a failure of the hybridization capture but might be a characteristic of the canine olfactory RNA profile, at least in the samples analyzed (Additional data file 6). Given this, we considered whether the relatively low number of OR genes expressed could be due to the sampling itself, as a consequence of a strong regionalization of the expression of the different OR genes along the canine olfactory epithelium, as previously observed in the rat (19). To tackle this question, we prepared several samples representing different locations in the olfactory epithelium of a Cane Corso and a Golden Retriever, dogs compassionately euthanized for highly advanced cancer. As shown in Additional data file 7, although the RNA profiles of the different biopsies are not strictly identical in either of the two animals, very importantly, up to 512 and 414 OR genes (i.e. 56% and 46% of the whole set of OR genes) are not detectable, whatever the sample and its location in the OE, and up to 40% are silent when one combines the data of the 10 biopsy samples. At present we have no explanation for the much larger number of expressed OR genes found by Saraiva et al., who reported that only 14% of OR genes were not detected (13). This could be due to differences in the breeds sampled (Saraiva et al. used a mixed breed). Nevertheless, it is important to keep in mind that the absolute number of expressed OR genes observed is probably less important in characterizing the RNA profile of any species than the range over which the genes are expressed. It is unclear what such low expression of genes may mean in biological terms.
Whatever the absolute number of dog OR genes expressed, it is very important to consider the observation that the ratio of expression of the human and mouse OR gene RNAs is much lower (8,12) than that found by Saraiva (13) and in the study reported here.

Conclusion
This study aimed to evaluate a non-invasive sampling approach that could be ethically and practically used in research to study the olfactome of dogs across a wide range of variables, including breed diversity, age, behavioral conditioning and environmental situations.
Our approach was inspired by routine human otorhinolaryngology practice and by the need to establish a veterinarian-led procedure that is easy to implement and does not inflict pain or affect animal wellbeing. The data we present here support this as a valid and robust qualitative and quantitative approach to investigating expression profiles and their variation. The different samples obtained by nasal brushing and biopsy are highly similar to each other. Up to approximately half of dog OR genes appear to be silent or not detectable, although this needs to be established with sufficient sample sizes across a wide range of breeds. There is also a significant number of genes whose expression is either very low or exceptionally high. The absolute number of genes expressed may not necessarily represent an important biological parameter, given the low level of expression of some genes. A concerted international effort is now required to investigate and characterize the dog olfactome and determine its impact on canine health and welfare.

DATA AVAILABILITY
All data will be freely available upon publication. Any further request should be addressed to the corresponding author, galibert@univ-rennes1.fr.

CONFLICT OF INTEREST
The authors declare that they have no competing interests.

FUNDING
Funding was obtained from the CNRS (Centre National de la Recherche Scientifique) and the University of Rennes (UR1), France, through the annual recurrent budget. No grant was solicited or obtained for this work.

AUTHOR'S CONTRIBUTION
NA extracted the nucleic acids, analyzed the data, made the tables, figures and Additional data files and participated in the writing of the paper; GC provided the samples; A.S.G provided the biopsy samples; and FG conceived the study, analyzed the data, wrote the paper and obtained the funding. All authors read the manuscript and participated in its final editing.

ACKNOWLEDGEMENT
We are grateful to the Genotoul bioinformatics platform Toulouse Midi-Pyrenees (Bioinfo Genotoul) for providing computing resources and to Clemence Genthon and Olivier Bouchez from the Genomic Platform INRA Auzeville.

Tables
Table 4. Level of expression of a couple of key OSN transcripts. This table shows that the ratio of the level of expression between the most expressed OR and the α subunit of G(olf) (GNAL) differs according to the species. In the case

Tables (2a to 2D).