Aquatic adaptation and the evolution of smell and taste in whales

Introduction: While olfaction is one of the most important senses in most terrestrial mammals, it is absent in modern toothed whales (Odontoceti, Cetacea). Furthermore, behavioral evidence suggests that their gustation is very limited. In contrast, their aquatic sister group, the baleen whales (Mysticeti), retain small but functional olfactory organs, and nothing is known about their gustation. It is difficult to investigate mysticete chemosensory abilities because experiments in a controlled setting are impossible.

Results: Here, we use the functional regionalization of the olfactory bulb (OB) to identify the loss of specific olfactory functions in mysticetes. We provide the whole-genome sequence of a mysticete and show that mysticetes lack the dorsal domain of the OB, an area known to induce innate avoidance behavior against odors of predators and spoiled foods. Genomic and fossil data suggest that mysticetes lost the dorsal domain of the OB before the Odontoceti-Mysticeti split. Furthermore, we found that all modern cetaceans have lost functional taste receptors.

Conclusion: These results strongly indicate that profound changes in chemosensory capabilities occurred in the cetacean lineage during the period when ancestral whales migrated from land to water.

Electronic supplementary material: The online version of this article (doi:10.1186/s40851-014-0002-z) contains supplementary material, which is available to authorized users.


Antarctic minke whale (Balaenoptera bonaerensis) genome assembly

1.1 Genome sequencing
We used a whole-genome shotgun strategy and next-generation sequencing on the Illumina HiSeq 2000 sequencer to sequence the genome of the Antarctic minke whale, Balaenoptera bonaerensis. Whale DNA was extracted from the muscle tissue of an individual purchased at a fish market in Japan. A paired-end TruSeq sequencing library with an average insert size of 330 bp was constructed and sequenced (100 bp x 2, eight lanes in total). As a result, 1.43 G read pairs (285 Gbp) were obtained (Table S1).

Genome-size estimation and sequence-error correction based on the k-mer frequency spectrum
The KmerFreq_HA program in the SOAPec ver. 2.01 package [2] was used to construct the k-mer frequency spectrum (k=21) (Fig. S1). According to Fig. S1, the total number of k-mers (K_num) was 195,248,671,032 and the peak depth (K_depth) was 60. Assuming that the genome size G can be estimated as G = K_num/K_depth, the Antarctic minke whale genome size was estimated to be 3.25 Gbp. Note that this is likely an overestimate, because the clearly visible heterozygous peak (Fig. S1) inflates the K_num value.
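The genome-size arithmetic above can be reproduced with a few lines of Python. This is only a minimal sketch of the G = K_num/K_depth relation stated in the text; the function name `estimate_genome_size` is a hypothetical helper, not part of the SOAPec package.

```python
def estimate_genome_size(k_num: int, k_depth: int) -> float:
    """Return the genome size estimate G = K_num / K_depth, in base pairs."""
    if k_depth <= 0:
        raise ValueError("k_depth must be positive")
    return k_num / k_depth


# Values reported in the text for the k=21 spectrum (Fig. S1).
K_NUM = 195_248_671_032   # total number of 21-mers
K_DEPTH = 60              # depth at the main peak of the spectrum

genome_size_bp = estimate_genome_size(K_NUM, K_DEPTH)
print(f"Estimated genome size: {genome_size_bp / 1e9:.2f} Gbp")
```

Because K_num also counts k-mers under the heterozygous peak, the same function applied to a spectrum without that peak would return a smaller value, which is the overestimation the text warns about.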
Assuming that most low-frequency k-mers were generated by sequencing errors, the Corrector_HA program in the SOAPec package was used to correct these errors. Parameters were as follows: -k 21 -j 1 -l 6 -e 1 -w 1 -q 30 -r 36. As a result, 249 Gbp of sequence, as shown in Table S1, was retained for genome assembly.

Figure S1. k-mer frequency spectrum (k=21)

Genome assembly
As shown in Fig. S1, the Antarctic minke whale genome is highly heterozygous. Therefore, PLATANUS [3] was used for the assembly. Statistics of the assembled sequences are shown in Table S2. The total length of the contig assembly is comparable to the estimated genome size (see §1.3). However, the total length of the scaffold assembly is much lower. This can be explained by the PLATANUS assembler efficiently collapsing bubbles that originate from heterozygous alleles. Indeed, a common minke whale genome assembly was reported while we were writing this paper, and the total scaffold length of the common minke whale was 2.44 Gbp [4], consistent with our results.
The Balaenoptera bonaerensis genome assembly thus obtained was named KUjira_1.0.
The sequenced reads and assembled genome are available in the DDBJ/EBI/NCBI databases under the following BioProject ID: PRJDB1465. Accession no. of the sequenced short reads is DRR014695. Accession nos. of assembled scaffolds are DF397027-DF818470.

Evolution of OCAM gene in the cetacean lineage
The V domain of the OB is characterized by the expression of the OCAM gene. We found that both the minke whale and the dolphin have retained an intact OCAM gene in their genomes. To determine whether purifying selection acts on this gene in the anosmic odontocete lineage, we analyzed the nonsynonymous to synonymous substitution rate ratio ω (dn/ds) in cetacean branches using the maximum likelihood (ML) method. The OCAM genes of two artiodactyls, cow (GenBank accession no. XM_002684635.2) and sheep (XM_004002813.1), were used as outgroups. DNA sequences of the whale, dolphin, cow and sheep OCAM genes were aligned using the L-INS-i program in the MAFFT package [5,6] ver. 6.240 with manual adjustments, and the CODEML program in the PAML4.4 package [7] was used to analyze changes in selective pressure on each branch (Fig. S2). In this analysis, the transition/transversion rate ratio was not fixed and the F3×4 model was used for codon usage bias.
The ω ratio on the dolphin branch is as small as that on the other branches. To test whether this gene has evolved under purifying selection on this branch, we followed the method of Zhang et al. [8]. According to the method of Nei and Gojobori [9], the numbers of nonsynonymous (N) and synonymous (S) sites of the dolphin OCAM gene were 1615 and 470, respectively. As shown in Table S3, this gene has evolved under strict purifying selection (p<0.01) even on the 'anosmic' dolphin branch.

Figure S2. The nonsynonymous to synonymous rate ratios (ω) on each branch, calculated based on the free-ratio model. The estimated numbers of nonsynonymous/synonymous substitutions, calculated by the method of Nei and Gojobori [9], are also shown under each branch, based on the ancestral nucleotide sequences inferred by the Bayesian method [10].
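The purifying-selection test in this section can be illustrated with a short Python sketch. The text cites the method of Zhang et al. [8] without describing it, so this sketch assumes it is a one-sided Fisher's exact test on the 2×2 table of substituted versus unsubstituted nonsynonymous and synonymous sites. The site counts N = 1615 and S = 470 come from the text. The substitution counts `n_subs` and `s_subs` are purely illustrative placeholders, not the values in Table S3, and the function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def purifying_selection_p(n: int, s: int, n_sites: int, s_sites: int) -> float:
    """One-sided Fisher's exact test for a deficit of nonsynonymous changes.

    n, s             -- observed nonsynonymous / synonymous substitutions
    n_sites, s_sites -- numbers of nonsynonymous (N) / synonymous (S) sites
    Returns P(X <= n) when n + s substitutions fall at random on N + S sites.
    """
    draws = n + s
    k_min = max(0, draws - s_sites)  # fewest nonsynonymous hits possible
    numerator = sum(
        comb(n_sites, k) * comb(s_sites, draws - k) for k in range(k_min, n + 1)
    )
    # Exact rational arithmetic avoids overflow in the large binomials.
    return float(Fraction(numerator, comb(n_sites + s_sites, draws)))


N_SITES, S_SITES = 1615, 470   # from the text (Nei-Gojobori site counts)
n_subs, s_subs = 10, 30        # ILLUSTRATIVE counts only, not from Table S3

p_value = purifying_selection_p(n_subs, s_subs, N_SITES, S_SITES)
print(f"one-sided p = {p_value:.3g}")
```

Under neutrality about 77% of substitutions would be expected at nonsynonymous sites (1615/2085), so a strong deficit of nonsynonymous changes gives a small p-value in this test.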

Phylogenetic analyses and classification of TAAR, OR and V1R genes
In order to classify cow TAAR genes into TAAR1-9 and cetacean OR genes into class I/II, and to find whale-dolphin orthologous genes, phylogenetic trees of deduced amino acid sequences of intact TAAR, OR and V1R genes were inferred. Trees were constructed by the neighbor-joining (NJ) method [11], based on Poisson correction distance matrices. Sequences were aligned using the L-INS-i program, and gap sites were excluded from the analyses. Bootstrap values were obtained by 500 resamplings, and the calculations were carried out using MEGA5 [12]. TAAR1 genes were used to root the tree in Fig. S3, as suggested by Hashiguchi and Nishida [13]. Class I ORs and class II ORs were rooted reciprocally in Fig. S4, as suggested by Niimura and Nei [14]. The V1R tree (Fig. S5) is unrooted.

Seventeen intact cow TAAR genes were classified into TAAR1-9 as shown in Fig. S3. Intact whale and dolphin OR genes were classified into class I/II as shown in Fig. S4. Figure S4 shows that both minke whales and dolphins have maintained two class I OR genes, OR51E2 and OR51E1. These genes are also known as PSGR (prostate-specific G-protein coupled receptor) [15] and PSGR2 [16], respectively, and are known to be expressed specifically in prostate tissue. In addition to these two class I ORs, both whales and dolphins have maintained four intact class II ORs (OR6B1, OR6Q1, BotaORs802.8 and OR2AT4). Not only OR51E1 and OR51E2 but also these four class II OR orthologs might have functions unrelated to olfaction. Indeed, OR6Q1 is known to be expressed in epithelial cells on the surface of the tongue [17], suggesting that it has a non-olfactory function.

In contrast, Fig. S5 shows that minke whales and dolphins do not share any orthologous V1R genes.
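The distance measure underlying these NJ trees can be sketched as follows. The Poisson correction distance between two aligned amino acid sequences is d = -ln(1 - p), where p is the proportion of differing sites. The sketch below first drops every alignment column containing a gap, matching the statement that gap sites were excluded. The helper names and the toy alignment are hypothetical, and this is not MEGA5's implementation.

```python
import math

GAP = "-"


def drop_gap_columns(alignment: dict[str, str]) -> dict[str, str]:
    """Remove every column in which at least one sequence has a gap."""
    names = list(alignment)
    columns = zip(*(alignment[name] for name in names))
    kept = [col for col in columns if GAP not in col]
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


def poisson_distance(seq_a: str, seq_b: str) -> float:
    """Poisson correction distance d = -ln(1 - p) for gap-free aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    p = sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)
    if p >= 1.0:
        raise ValueError("distance is undefined when all sites differ")
    return -math.log(1.0 - p)


# Toy alignment for illustration only (not real receptor sequences).
toy = {"a": "MKT-LV", "b": "MKTALV", "c": "MRTALI"}
clean = drop_gap_columns(toy)
d_ab = poisson_distance(clean["a"], clean["b"])
d_ac = poisson_distance(clean["a"], clean["c"])
print(clean, d_ab, d_ac)
```

A matrix of such pairwise distances is the input that a neighbor-joining implementation would then cluster into a tree.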

Amplification of a genomic region between TAAR4 and TAAR9 genes in the minke whale genome
As shown in Fig. 2c, a large-scale deletion was confirmed in the genomic region between the TAAR4 and TAAR9 genes in the minke whale genome. However, unlike the OMACS (Fig. 2a) and NQO1 (Fig. 2b) genes, the minke whale does not share this genomic deletion with the dolphin, so the accuracy of the KUjira_1.0 assembly in this region cannot be validated by comparison with the assembly of a related species. Therefore, we amplified the genomic region between the TAAR4 and TAAR9 genes of the Antarctic minke whale genome using the pair of primers shown in Table S4. PCR amplification was performed using AccuPrime Taq DNA Polymerase High Fidelity (Invitrogen) with 30 cycles of denaturation (94°C for 20 sec), annealing (60°C for 20 sec) and extension (68°C for 15 min). The expected length of the minke whale PCR amplicon, calculated from the KUjira_1.0 assembly, was 54747 - 45891 = 8856 bp. The actual length of the amplicon matched the expected length (Fig. S6), indicating that our whale genome assembly is accurate. We sequenced both ends of this amplicon on an automated sequencer (ABI3130, Applied Biosystems) using the TAAR9f and TAAR4f primers as sequencing primers and confirmed that we had correctly amplified the target region.
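The expected amplicon length quoted above is a simple coordinate difference, shown here as a minimal Python sketch. The function name is hypothetical. The sketch follows the subtraction given in the text, which assumes the two scaffold coordinates form a half-open interval; with inclusive coordinates the length would be one base longer.

```python
def amplicon_length(start: int, end: int) -> int:
    """Length of the region between two scaffold coordinates (end - start)."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start


# Coordinates taken from the calculation in the text (KUjira_1.0 assembly).
expected_bp = amplicon_length(45_891, 54_747)
print(f"Expected amplicon length: {expected_bp} bp")
```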

Evolution of the taste receptor genes in cetaceans
We annotated nucleotide sequences of the TAS1R (umami and sweet taste receptor) and TAS2R (bitter taste receptor) gene families in the minke whale, dolphin, and cow genome assemblies (see Supplementary Data 1-3).
In all the assemblies, the TAS1R family was composed of three genes, TAS1R1, TAS1R2, and TAS1R3. In the cow genome assembly, none of the TAS1R exons contained local disrupting mutations. However, a contig containing the 4th, 5th, and 6th exons of TAS1R3 was inverted relative to a contig containing the 1st, 2nd, and 3rd exons in the chromosomal assembly. This may be due to misassembly, because an alternative genome assembly of the [...]

Multiple alignments of each TAS1R gene in the cow, minke whale and dolphin, as well as the human and mouse, revealed that minke whales and dolphins share the same pseudogenization mutations in the TAS1R1 and TAS1R3 genes, but do not share any pseudogenization mutations in the TAS1R2 gene. Unfortunately, the 4th exon of TAS1R2, where a frameshift mutation was confirmed in the whale KUjira_1.0 assembly, was not found in the dolphin Ttru_1.4 assembly. To determine whether mysticetes and odontocetes share the same frameshift mutation in the 4th exon of the TAS1R2 gene, we amplified and sequenced the whole 4th exon of TAS1R2 in an odontocete genome as well as three mysticete genomes (see Table S7) using the pair of primers shown in Table S5. The 4th exon of TAS1R2 was intact in all cetaceans examined except minke whales (Fig. S7).
Taken together, we conclude that the TAS1R1 and TAS1R3 genes became pseudogenes before the Mysticeti-Odontoceti split, but that the last common ancestor (LCA) of mysticetes and odontocetes possessed an intact TAS1R2 gene.

Concerning TAS2Rs, 20 intact genes and one truncated gene (TAS2R7) were found in the cow UMD_3.1 assembly, whereas only one intact gene and one truncated gene (TAS2R67 and TAS2R2TA, respectively) were found in the whale KUjira_1.0 assembly, and neither intact nor truncated genes were found in the dolphin Ttru_1.4 assembly. The truncated gene TAS2R2TA in the minke whale might be derived from the same locus as the pseudogene TAS2R2BP, because TAS2R2BP is also located at a scaffold end and TAS2R2TA and TAS2R2BP do not overlap each other in the seven-TM topology. There were 16, 12, and 11 pseudogenes in the cow, minke whale, and dolphin genome assemblies, respectively. A gene tree of all the TAS2R genes is shown in Fig. S8. The TAS2R gene repertoire in the dolphin Ttru_1.4 assembly is consistent with a previous study [19] except for TAS2R62BP, which was not annotated in that study.
To understand when cetacean TAS2Rs became pseudogenes, we constructed a multiple alignment of each TAS2R orthologous gene set using the E-INS-i program in the MAFFT package [5,6]. We found that mysticetes and odontocetes share the same pseudogenization mutations in all sets of orthologous TAS2R genes except for TAS2R16 and TAS2R67. Whales and dolphins do not share any pseudogenization mutations in the TAS2R16 gene, and the TAS2R67 gene, which we could not find in the dolphin genome, is intact in the minke whale genome. This indicates that the LCA of mysticetes and odontocetes possessed at least two intact TAS2Rs, TAS2R16 and TAS2R67.
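The idea of comparing pseudogenization mutations between lineages can be illustrated with a small Python sketch. The text does not describe how the alignments were inspected, so everything here is an assumption made for illustration: each species' coding sequence is aligned to an intact reference, frameshifts are gap runs whose length is not a multiple of three, premature stop codons are read in the reference frame, and mutations count as shared when they fall in the same alignment column. The sequences are toy examples, not TAS2R data.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def disrupting_mutations(reference: str, query: str) -> set[tuple[str, int, int]]:
    """Return (kind, alignment_column, length) for each disrupting mutation.

    Both sequences must come from the same alignment (equal length, '-' = gap).
    """
    if len(reference) != len(query):
        raise ValueError("aligned sequences must have equal length")
    found = set()
    # Frameshift indels: gap runs whose length is not a multiple of three.
    for kind, seq in (("deletion", query), ("insertion", reference)):
        for match in re.finditer(r"-+", seq):
            length = match.end() - match.start()
            if length % 3 != 0:
                found.add((kind, match.start(), length))
    # Premature stops, read in the reference frame; the terminal codon is skipped.
    ref_cols = [col for col, base in enumerate(reference) if base != "-"]
    for i in range(len(ref_cols) // 3 - 1):
        cols = ref_cols[3 * i: 3 * i + 3]
        if "".join(query[c] for c in cols).upper() in STOP_CODONS:
            found.add(("stop", cols[0], 3))
    return found


# Toy sequences for illustration only.
ref     = "ATGGCTTGGAAACCCTGA"
whale   = "ATGGCTTGAAAACC-TGA"   # premature stop plus a 1-bp deletion
dolphin = "ATGGCTTGAAAACCCTGA"   # the same premature stop only

whale_muts = disrupting_mutations(ref, whale)
dolphin_muts = disrupting_mutations(ref, dolphin)
shared = whale_muts & dolphin_muts
print("shared:", shared, "whale only:", whale_muts - dolphin_muts)
```

Under this reading, a shared mutation suggests that the gene was already disrupted in the common ancestor, whereas lineage-specific mutations only, as reported for TAS2R16, suggest independent pseudogenization.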
To validate this inference, we amplified and sequenced the TAS2R16 and TAS2R67 genes in several cetartiodactyls (see Table S7) using the sets of primers shown in Table S6. We confirmed that the TAS2R16 gene has been pseudogenized independently in mysticetes and odontocetes (Fig. S9a). TAS2R67 was not amplified in any odontocete (Table S7); this gene may have been deleted completely in all odontocetes. Not only minke whales but also sei whales possess an intact TAS2R67 gene. The hippopotamus has lost this gene (Fig. S9b).

PCR and sequencing notes: a) The primer pair 16F and 16R was designed based on the KUjira_1.0 assembly to amplify a 1,171 bp region including the whole amino acid coding region of the TAS2R16 gene. b) The primer pair 67F and 67R was designed based on the KUjira_1.0 assembly to amplify a 1,414 bp region including the whole amino acid coding region of the TAS2R67 gene, and the primer pair 67inF and 67inR was designed to amplify a partial sequence of the amino acid coding region of the TAS2R67 gene (833 bp, used to amplify hippopotamus TAS2R67).

Figure S8. An NJ tree of TAS2R genes in five mammals. This tree is based on a multiple alignment of deduced amino acid sequences of all TAS2Rs annotated in this study, including truncated genes and pseudogenes. Human and mouse TAS2R sequences annotated by Hayakawa et al. [20] were added. The TAS2R16 and TAS2R67 subtrees are highlighted. The evolutionary distance was calculated for each sequence pair (the pairwise-deletion option) in amino acid sequences based on a Poisson correction distance matrix. Bootstrap values were obtained by 1000 resamplings.

Table S7. Red branches indicate possession of pseudogenes.