Positive selection on secretory and structural components of salivary glands within the ecologically diverse bat family Phyllostomidae

The leaf-nosed bats (Phyllostomidae) are outliers among chiropterans with respect to the unusually high diversity of dietary strategies within the family. Salivary glands, owing to their functions and high ultrastructural variability among lineages, are proposed to have played an important role during the phyllostomid radiation. Salivary gland secretory products directly interact with food materials and pathogens which can provide selective interactions. To identify genes underlying salivary gland functional diversification, we sequenced submandibular gland transcriptomes from phyllostomid species representative of divergent dietary strategies. From the assembled transcriptomes, we identified and tested 3,266 single gene orthologs and identified 57 evolving under positive selection. All enriched gene ontology terms were related to defense against other organisms and the cell membrane/extracellular environment. Based on GO annotation, many of the defense-related loci under selection form secretory products. Salivary glands perform a complex array of tasks, and we identified positive selection on additional proteins bound to membranes and in the extracellular environment with multiple functions. Although illuminating exact function is difficult, results suggest that regulatory release of secretory products, not the products themselves, is disproportionately shaped by positive selection on coding sequence. Identified instances of selection on such cellular components may help explain the ultrastructural variability of salivary glands previously documented.

Morphometric and evolutionary models suggest the ecological and phenotypic diversity among lineages and species are associated with strong selection on dietary specialization features [3,[10][11][12].
Bats exhibit several characteristics rendering them interesting to examine, most uniquely their ability 3 to fly and echolocate. Furthermore, bats act as vectors for zoonotic diseases including SARS coronavirus, Ebola, Nipah virus, and rabies [13][14][15]. A lesser examined characteristic thought to be directly linked to their dietary multiplicity and adaptation is the anatomical diversity of their salivary glands [16,17]. In mammals, there are typically three major pairs of salivary glands: the submandibular, parotid, and sublingual glands, and all use intracellular processes that involve the synthesis, modification, packaging and secretion of proteins in membrane-bound granules. These glands are considered part of the digestive system as they secrete digestive enzymes, however their products perform additional functions including antimicrobial resistance and biochemical communication [18][19][20][21][22].
Tandler and Phillips [23] have shown that among 55 species of bats, secretory products were correlated with diet, especially in insectivorous species. Further, Phillips et al. [24] found evidence that salivary glands play a role in lipid metabolism in Myotis lucifugus. Because of this wide variation and their direct link to immunity, diet and reproduction, SMGs and their products are hypothesized to play an important role in the adaptive radiation of mammals [25].
Linking genetic variation and gene products to selection and adaptation is still a challenge in evolutionary biology. More recently, the implementation of next-generation sequencing has facilitated the search for signatures of adaptation through DNA and/or RNA comparative analyses. Indeed, sequencing transcriptomes offers an in-depth, data rich method to identify selective pressures that have influenced the evolution of tissue-specific genes. Of interest are genes that have undergone positive selection to adapt physiological, immunological and ecological processes to new environments [26,27].
Selection analyses among bat genes have generally been limited to few species for specific purposes [28,29]. Hawkins et al. [26] analyzed orthologs from 18 different bat genomes and transcriptomes and found most genes under selection were related to immunity and collagen production. Here, Phyllostomidae was examined specifically because their extensive and relatively rapid radiation to new feeding strategies. We sequenced the SMG transcriptomes of nine phyllostomid bats representing different subfamilies and different diets, and through analysis of orthologs characterized 4 how selection on coding sequence has shaped SMG evolution.

Results
To investigate putative proteins underlying the evolution of SMGs we performed RNASeq on 11 Chiroptera species: nine phyllostomids with varying dietary strategies, and two outgroups representing the bat ancestral state of insectivory (figure 1). Among the 11 samples, between 53,757 and 105,666 transcripts were assembled per species. Over 87% of the paired-reads mapped back to each assembly indicating most reads were used. Among the transcripts, we predicted between 17,820 and 54,578 ORFs, and among the ORFs, we found between 5,392 and 9,580 unique hits to the SwissProt database (Additional file 2: Table S1). The most similar isoform to the reference sequence was chosen to represent the annotation. SwissProt annotations were not used a priori for ortholog clustering. However, if after clustering there were multiple SwissProt annotations in a single gene ortholog group, the most common annotation was used to describe the ortholog group.
In all, OrthoMCL grouped 79,817 annotations into 10,267 orthologous groups that included between 2 and 45 members (Additional file 1: Figure S1A). Among the orthologs groups, 4,800 groups represented single gene families present in seven or more species and almost half of these families were found in all 11 species (Additional file 1: Figure S1B). Among these 4,800 groups, 4,667 (97%) had the same SwissProt annotation among orthologs, suggesting overall consistency between OrthoMCL clustering and Trinotate annotations. In the remaining 133 groups, there was one dominant gene annotation and the outliers were likely a result of improper classification by Trinotate, given BLAST hits are often closely related paralogs and not true orthologs, which seemed to the case in most instances. Therefore, the annotations were manually corrected.
To additionally test if the ortholog predictions were generally accurate and DNA sequences largely reflected species relationships, we generated a phylogenetic tree from the concatenated alignments of 500 randomly sampled orthologs. Although individual gene trees may not reflect a species trees, we expected that a consensus would emerge from large volume of DNA sequences if orthologs were accurately predicted and aligned. We found that the resulting concatenated tree accurately reflected previous species trees generated by [3,8,49], with high bootstrap support at each node (figure 1).

Adaptation among SMG proteins
Among the 4,800 single gene orthologs, 3,266 had alignments with enough shared sites to produce output from CODEML. After correcting for FDR, we found 57 genes where models of evolution allowing positive selection were significantly better fit to the data than neutrality in both the M2a and M8 tests (Table S2).
To contrast background salivary gene functions to those under selection, we summarized the GO terms of the 3,266 tested genes using the REVIGO server ( figure 2A). Overall most of these gene products are thought to be localized to the cytosol, but many were also positioned in the plasma membrane and extracellular region. The majority of terms in the "molecular function" category was protein binding, zinc ion binding and DNA binding. Background cellular processes were also apparent from the molecular function GO term analysis, i.e. transcription, transcription regulation and protein regulation. Similarly, common biological processes identified were related to transcription regulation, cell death, and protein synthesis and transport. By contrast, the terms of the genes under selection assigned to the three major categories (molecular function, cellular component and biological process) were largely different from those obtained in our global assessment of protein function (figure 2B). These proteins were largely receptors (molecular function), in the extracellular exosome or plasma membrane (cellular component) that are a part of the immune response (biological process). Interestingly, terms related to diet, like carbohydrate or lipid metabolism, were not associated with genes under positive selection.
We used PANTHER's over-representation test to determine whether any terms were significantly overrepresented among the genes under selection. There were no molecular function terms that were significantly more common among the genes under selection. However, there were significantly over- survey of the multiple GO terms assigned to each gene we found that 11 of the 57 genes were involved in the immune response (Additional file 3: Table S2). Moreover, eight of the 11 (73%) immunity-related loci had secretory annotations, and a binomial test was used to quantify enrichment for secretory products among immune genes under selection (p = 0.012). Of the remaining 46, 17 genes were membrane bound, three were secreted, and two had terms for both membranes and secretion.

Discussion
Salivary glands secrete products that have important biological role in diet/digestion, oral health, and communication. They exhibit a high degree of morphological variation even among individuals of the same species and their role in the adaptation of mammals to novel niches has been largely under investigated. Additionally, salivary proteins are among the first to directly encounter food and introduced pathogens, which lends biological importance in the context of adaptive evolution. Here, we isolated and sequenced the genes expressed from the SMGs of phyllostomid bats, the most ecologically diverse family of mammals, to shed light into how positive natural selection has driven adaptation within the group. Among the sequenced transcriptomes we were able to annotate between 5,392 and 9,580 expressed proteins, which were successfully clustered as 4,800 single gene orthologs. Out of the 3,266 single gene orthologs tested, only a small proportion, 1.7% of genes, were under selection. Consistently, Hawkins et al. [26] yielded the same frequency of genes under selection from the transcriptomes and genomes of 18 bat species using multiple tissue sources (e.g. spleen, trigeminal ganglia, inner ear, embryos, etc. transcriptomes). This suggests that overall the genes expressed in SMGs are not necessarily subjected to stronger selective pressures in coding regions than genes expressed in other tissues. Given the diverse function and morphology of salivary glands previously shown [23], current results could indicate a role for regulatory evolution in SMG diversification, which can be assessed with expanded sample sizes per lineage.
Salivary glands perform a complex array of tasks and can present the most varied cellular ultrastructure in distinct taxa [23]. The evolution of cellular structure and the resulting array of phenotypes could be attributed to gene regulation, alternative splicing, and gene sequence evolution.

7
The great morphological variation observed in the ultrastructure of phyllostomid bat salivary gland granules led Phillips et al. [25] to speculate that adaptation to novel niches drives salivary gland evolution. SMG acinar cells are responsive to a variety of extracellular signals which can affect gene expression, protein synthesis, and protein modifications, and this responsiveness can be driven by the density and distribution of cell surface receptors [16]. Consistently, we found that out of the 46 nonimmune genes, 25 were either a membrane bound, secreted, or both. Unfortunately, membrane receptors could be linked to many functions. For example, basigin (BASI) is a globally expressed receptor linked to spermatogenesis, tumor progression, neural network formation and odontogenesis, among many other processes [50]. Therefore, it was difficult to parse and identify the function of each receptor expressed in salivary glands. However, molecular changes among receptors would likely alter the regulation and release of secretory products with a variety of functions.
A notable aspect of the ontological enrichment for genes with known roles in immunological or secretory processes provides indication about the specific cellular settings in which they function. For example, salivary glands are known to function in immunity through the localized function of immune cells [51]. Eleven of the 57 proteins inferred to be affected by positive selection were linked to immunity and defense, and eight of these had a secretory component (Additional file 3: Table S2).
These 11 genes had common functions associated with the innate and humoral immune response and stimulating the activation of immune cells to viral infected sites. For example, among these genes, were BPIA1 and BPIA2, both of which are secreted and known to localize to upper airways and prevent biofilm formation by gram-negative bacteria [52,53]. A high exposure to pathogens unique to bats could explain this result, but positive selection among immune genes is not uncommon in animals [54][55][56]. Defense and immunity genes typically have a high evolutionary rate attributed to a continuous arms race between pathogens and host immune response [57][58][59], and our data provide insight about how positive selection has shaped bat oral tract defense against introduced pathogens.
Interestingly, Schneeberger et al [60] found that total white blood cell (WBC) counts in neotropical bats varied among bat species according to their dietary niches, where carnivorous bats had the highest WBC counts, insectivores had the lowest, and frugivorous had intermediate amounts. They 8 hypothesized carnivorous species are at the highest risk of obtaining infectious diseases, as parasite and pathogen transmission is expected to be facilitated from prey, especially when those are relatively related species. Nevertheless, frugivorous species may still face a notable risk of acquiring pathogens by ingesting contaminated plant matter [61][62][63], and plantivores will also consume insects as a non-major dietary component. We examined the percentage of immune genes expressed in each species to determine if the same pattern was also present (Additional file 2: Table S1), but found that carnivorous and sanguivorous species expressed the fewest immune genes at 3.5 and 3.1%, respectively, and the highest was only 4.45% in M. lucifugus. This suggests is that generally the same immune genes are expressed across lineages, but selection is shaping gene sequences in speciesspecific context.

Conclusions
Salivary glands are informative target organs to study the link between genetic variation and signatures of potential adaptations. Analysis of orthologs expressed in salivary transcriptomes enables detection of positive selection that helps explain the wide variation in cellular structures related to the regulation of salivary products released into the secretory ducts. Moreover, the identification of immunity-related genes under selection in the SMG is consistent with the necessity of hosts to continuously fend pathogens to which they are exposed in natural environments.

Methods
Nine species were chosen to maximize represention of the phylogenetic and dietary diversity of Phyllostomidae (figure 1). We also included two insectivorous bats, Myotis lucifugus from Vespertilionidae and Pteronotus parnellii from Mormoopidae, as outgroups. Tissues were extracted and frozen in liquid nitrogen within five minutes after euthanasia. Additional details from tissue loans provided by the Natural Science Research Laboratory (NSRL) of the Museum of Texas Tech University can be found in Additional file 1: Table S1.

RNA isolation, sequencing, and assembly
RNA was extracted from SMGs of each bat using Trizol (Invitrogen, Carlsbad, CA, USA) following manufacturer protocols. Oligo-dT magnetic beads were used to enrich for mRNA strands with poly-A tails and a strand-specific paired-end cDNA library was prepared using a ScriptSeq kit (Epicentre, Madison WI USA). Libraries were sequenced on Illumina platform (see Additional file 1: Table S1). For each species, paired-end reads were filtered for quality using Trimmomatic 0.36 [30] with the parameters LEADING:20 TRAILING:20 SLIDINGWINDOW:1:30 MINLEN:75 (MINLEN:50 was used for D. rotundus), and de novo transcript assemblies were constructed from clean read pairs with Trinity v2.7 [31] using default settings.

Gene discovery and annotation
Putative open reading frames and translated peptide sequences were identified using TransDecoder [32] and the resulting peptide sequences were processed through the Trinotate pipeline to identify functional properties and Gene Ontology (GO) annotations associated with biological processes, molecular functions, and cellular components. To summarize, peptide sequences were queried against the SwissProt database [33] using BLASTP [34] and the Pfam [35] database using hmmer 3.2.1 [36]. Peptides were also scanned for a signal peptide using signalP [37], and transmembrane domains using tmhmm 2.0 [38].

Orthology clustering
Orthology assignment is still a major challenge in bioinformatics and evolutionary biology. Here, we developed a process to filter out similar transcripts produced by Trinity. The first step was to assume similar sequences would have similar Trinotate annotations. We parsed the Trinotate output to identify the best sequence representative for each unique SwissProt annotation. To choose the best gene representative among multiple coding sequences (CDSs) with the same SwissProt annotation, we multiplied the length of the CDS to the percent identity of the SwissProt hit. This metric correlated with E value, but effectively acted as a bit score when E values were identical. The CDS with the highest metric was chosen to represent the annotation. If a CDS did not have a SwissProt annotation, it was removed. We then ran combined best sequences from all species, and ran this dataset through the OrthoMCL [39] pipeline to identify orthologous groups. Only single gene ortholog groups were used in downstream selection tests.
Poor ortholog assignment can influence sequence relationships in a phylogeny, and since the relationships among phyllostomids are robust, a reasonable test of ortholog assignment would be to reconstruct a phylogenetic tree and determine if the resulting tree reflects previously described relationships. Therefore, we reconstructed a phylogenetic tree from 500 randomly sampled single gene orthologs shared among all 11 individuals. Each orthologous group was translated, aligned using LINSi parameters in MAFFT [40] and reverse translated to construct a codon alignment. Resulting alignments were concatenated and we used RAxML [41] to find the best tree from the un-partitioned dataset using the ML and rapid bootstrapping algorithm, a GTRGAMMA model of nucleotide substitution, and with 100 starting trees.

Identifying genes under selection
Single gene orthologous groups that were found in seven or more individuals were tested for evidence of selection using the maximum likelihood approach described by Goldman and Yang [42]. Amino acid sequences were aligned using LINSi parameters in MAFFT [40], and the resulting alignment was reverse translated. Each gene tree was individually estimated using RAxML. We used CODEML in PAML [43] to estimate the role of selection on gene evolution by comparing the rate of nonsynonymous substitution per nonsynonymous site (d N ) to the rate of synonymous substitution per synonymous site (d S ). The d N /d S ratio can be used as a sensitive measure of selective pressure; however, in most cases the overall d N /d S is less than one and only a few amino acid sites are evolving quickly. Therefore, to determine whether a gene was evolving adaptively, we calculated the likelihood of models that allow d N /d S to vary among codon sites (M1a, M2a, M7, and M8). For all genes we used likelihood ratio tests (LRTs) to compare nested models that allow and disallow codon site d N /d S to be greater than one (M2a v M1a, M8 v M7), and to test for significant differences between nested models [44]. If both models that allow d N /d S rates to be greater than one (M2a and M8) were significantly better fits to the data (LRT, p < 0.05), we inferred these genes were evolving under positive selection.
We performed a false discovery rate correction (FDR) on the p-values resulting from the M2a v M1a LRT because the M2a tests were more conservative than M8 and resulted in fewer LTR tests with p < 0.05. FDR was estimated using the qvalue function in the "qvalue" R module [45,46].

Functional analysis of genes under selection
For all single gene orthologs tested for selection we counted the number of reoccurring GO terms. We then used the GO term proportion to summarize the general function of genes tested in the SMGs of phyllostomids using REVIGO [47], which nests redundant and similar terms from long GO term lists by semantic clustering. We repeated the same analysis using GO terms described from orthologs under positive selection. To test for GO term enrichment for the genes under selection, PANTHER [48] was applied to identify statistically overrepresented GO terms from the list of genes under positive selection using the genes tested as the reference list. We used the Fisher's exact test and applied the FDR correction and statistical significance was determined below an adjusted p-value < 0.05. We estimated whether a protein was membrane bound, a receptor, immune related, or secreted by searching for specific GO terms. If "secretion", "extracellular space" or "extracellular region" was present among GO terms, we annotated the gene as secreted. If "immune", "defense", or "antimicrobial" was present, the gene was annotated as an immune. If "membrane" was present we called the gene membrane bound and if "receptor" was present, we called the gene a receptor (Additional file 3: Table S2). Figure 1 Schematic of species sampled, their relationships, and general dietary strategies. 20 Figure 2 Tree map produced from the analysis of redundant GO terms in REVIGO for A) all GO terms linked to genes tested and B) GO terms linked to genes under selection.