Comparative Evolution of Sand Fly Salivary Protein Families and Implications for Biomarkers of Vector Exposure and Salivary Vaccine Candidates

Sand fly salivary proteins that produce a specific antibody response in humans and animal reservoirs have been shown to be promising biomarkers of sand fly exposure. Furthermore, immunity to sand fly salivary proteins were shown to protect rodents and non-human primates against Leishmania infection. We are missing critical information regarding the divergence amongst sand fly salivary proteins from different sand fly vectors, a knowledge that will support the search of broad or specific salivary biomarkers of vector exposure and those for vaccines components against leishmaniasis. Here, we compare the molecular evolution of the salivary protein families in New World and Old World sand flies from 14 different sand fly vectors. We found that the protein families unique to OW sand flies are more conserved than those unique to NW sand flies regarding both sequence polymorphisms and copy number variation. In addition, the protein families unique to OW sand flies do not display as many conserved cysteine residues as the one unique to the NW group (28.5% in OW vs. 62.5% in NW). Moreover, the expression of specific protein families is restricted to the salivary glands of unique sand fly taxon. For instance, the ParSP15 family is unique to the Larroussius subgenus whereas phospholipase A2 is only expressed in member of Larroussius and Adlerius subgenera. The SP2.5-like family is only expressed in members of the Phlebotomus and Paraphlebotomus subgenera. The sequences shared between OW and NW sand flies have diverged at similar rates (38.7 and 45.3% amino acid divergence, respectively), yet differences in gene copy number were evident across protein families and sand fly species. Overall, this comparative analysis sheds light on the different modes of sand fly salivary protein family divergence. Also, it informs which protein families are unique and conserved within taxon for the choice of taxon-specific biomarkers of vector exposure, as well as those families more conserved across taxa to be used as pan-specific vaccines for leishmaniasis.

Sand fly salivary proteins that produce a specific antibody response in humans and animal reservoirs have been shown to be promising biomarkers of sand fly exposure. Furthermore, immunity to sand fly salivary proteins were shown to protect rodents and non-human primates against Leishmania infection. We are missing critical information regarding the divergence amongst sand fly salivary proteins from different sand fly vectors, a knowledge that will support the search of broad or specific salivary biomarkers of vector exposure and those for vaccines components against leishmaniasis. Here, we compare the molecular evolution of the salivary protein families in New World and Old World sand flies from 14 different sand fly vectors. We found that the protein families unique to OW sand flies are more conserved than those unique to NW sand flies regarding both sequence polymorphisms and copy number variation. In addition, the protein families unique to OW sand flies do not display as many conserved cysteine residues as the one unique to the NW group (28.5% in OW vs. 62.5% in NW). Moreover, the expression of specific protein families is restricted to the salivary glands of unique sand fly taxon. For instance, the ParSP15 family is unique to the Larroussius subgenus whereas phospholipase A2 is only expressed in member of Larroussius and Adlerius subgenera. The SP2.5-like family is only expressed in members of the Phlebotomus and Paraphlebotomus subgenera. The sequences shared between OW and NW sand flies have diverged at similar rates (38.7 and 45.3% amino acid divergence, respectively), yet differences in gene copy number were evident across protein families and sand fly species. Overall, this comparative analysis sheds light on the different modes of sand fly salivary protein family divergence. Also, it informs which protein families are unique and conserved within taxon for the choice of taxon-specific biomarkers of vector exposure, as well as those families more conserved across taxa to be used as pan-specific vaccines for leishmaniasis.

INTRODUCTION
Vector borne diseases represent almost half of the neglected tropical infectious diseases. When attempting to get a blood meal, most vectors of disease deliver the pathogen in the host skin and together with the pathogen these arthropods deliver saliva (Coutinho-Abreu et al., 2015;de Castro et al., 2017). Blood sucking arthropods secrete a plethora of bioactive compounds in their saliva to counteract the mammalian host hemostatic system in order to get a successful blood meal. Salivary antihemostatic components such as anticoagulants (Chagas et al., 2014), vasodilators (Ribeiro et al., 1989;Lerner et al., 1991;Champagne and Ribeiro, 1994), and inhibitors of platelet aggregation (Calvo et al., 2007(Calvo et al., , 2011Assumpcao et al., 2013) have been described and some of these proteins characterized at the molecular level (Coutinho-Abreu et al., 2015). In addition, the biological activity of some arthropod salivary proteins was shown to promote the establishment of pathogens in the mammalian host (Coutinho-Abreu et al., 2015;de Castro et al., 2017). Furthermore, immunity to specific sand fly salivary proteins was shown to protect rodents and non-human primates against leishmaniasis (Kamhawi et al., 2000;Gomes et al., 2008;Collin et al., 2009;Oliveira et al., 2015).
It is well established that humans and animal reservoirs make antibodies to proteins in the saliva of insects including those present in sand flies. These findings have prompted research groups to explore the use of sand fly salivary proteins as markers of sand fly exposure for humans and animal reservoirs (Teixeira et al., 2010;Drahota et al., 2014;Marzouki et al., 2015;Sima et al., 2016;Kostalova et al., 2017). A recombinant sand fly salivary protein of 43 kDa (rSP03B), from the sand fly P. pernicious, belonging to the yellow family of proteins was shown to be a marker of sand fly exposure in dogs living in Southern and Central Italy and in Portugal (Drahota et al., 2014;Kostalova et al., 2017). Interestingly, the yellow related protein from P. orientalis (rPorSP24) was demonstrated to be a good marker of sand fly exposure in domestic animals including dogs from a L. donovani foci in Ethiopia (Sima et al., 2016). For humans living in visceral leishmaniasis disease endemic areas in Brazil, the combination of salivary proteins LJM11 and LJM17 (yellow related proteins) was demonstrated to be the best biomarker of Lutzomyia longipalpis exposure in humans (Teixeira et al., 2010), while Linb13, a protein of 30 kDa belonging to the antigen-5 family of proteins, was shown to be the best biomarker of Lutzomyia intermedia exposure in humans living in a cutaneous leishmaniasis endemic area (Carvalho et al., 2017). The salivary protein PpSP32 from P. papatasi was demonstrated to be the marker of P. papatasi exposure for humans living in cutaneous leishmaniasis endemic areas in Tunisia (Marzouki et al., 2015) and in Saudi Arabia (Mondragon-Shem et al., 2015).
The use of sand fly salivary proteins as markers of vector exposure represents therefore a practical application that can be implemented in epidemiological studies as well as for vector control programs. Therefore, having a well-defined catalog of sandfly salivary proteins as well as a better understanding of the evolutionary relationship of salivary proteins from different sand fly vectors will allow us to make more precise selection for these appealing biomarkers. In the current study, we focused on the evolutionary analysis of protein families unique to Old World (OW) sand flies comparing that with the protein families shared between OW and New World (NW) sand flies as well as those unique to NW sand flies. This comparative analysis also unveiled that the some salivary protein families unique to OW sand flies emerged and diversified in different manners than their counterparts unique to NW sand flies. These findings can inform which proteins are the best candidates to be used as a pan-specific or species-specific biomarkers of sand fly exposure or as well as a pan-specific vaccine against leishmaniasis.

Sequences
Nucleotide and amino acid sequences were retrieved from the NCBI databases from sand fly salivary gland transcriptomes (Valenzuela et al., 2001(Valenzuela et al., , 2004Anderson et al., 2006;Oliveira et al., 2006;Hostomská et al., 2009;Abdeladhim et al., 2012Abdeladhim et al., , 2016Rohousova et al., 2012;de Moura et al., 2013;Kato et al., 2013;Martín-Martín et al., 2013;Vlkova et al., 2014). Signal peptides were removed from the protein sequences whereas sequences encoding signal peptides and stop codons were removed from the nucleotide sequences for further analyses. Only sequences displaying more than 5% divergence at the amino acid level were assumed to be encoded by true paralog genes and included in the analyses, rather than being alleles of the same gene and otherwise discarded. Sand fly groups, species, and sequence accession numbers are provided in Supplementary Table 1.

Sequence Alignment
Multiple sequence alignments of putative peptides were carried out using Clustal Omega built in the MacVector software 15.8 (Olson, 1994) and in MEGA7 (Kumar et al., 2016). For the construction of phylogenetic trees, the gap penalties were not taken into account in the multiple sequence alignments.  Frontiers in Cellular and Infection Microbiology | www.frontiersin.org FIGURE 1 | Phylogenetic analysis of ParSP17 protein family. The evolutionary history was inferred by using the Maximum Likelihood method based on the Le Gascuel matrix-based model (Gomes et al., 2008). Sand fly species are indicated by the different symbols in the legend on the right. Tree branches were color-coded so as to represent specific taxon: Green color represents the Larroussius and Adlerius subgenera; Blue color points to proteins of the Phlebotomus and Paraphlebotomus subgenera; and Black color indicates the proteins belonging to New World sand flies. Although the PduK84 sequence is illustrated here, it was not included in further analyses because it is truncated.

DNA Polymorphism, Protein Divergence, and Phylogenetic Analysis
The evolutionary analyses were performed in the DnaSP 5.10 software (Librado and Rozas, 2009). The parameter ω refers to the rate of non-synonymous nucleotide polymorphisms (Ka) over the synonymous rate of nucleotide polymorphisms (Ks) (Nei, 1987). Slide window analyses of ω along the nucleotide sequences encoding such proteins were also obtained. The diversity of the protein family sequences refers to the p-distance [proportion (p) of amino acid sites at which the two sequences to be compared are different (Nei, 2000)] obtained in the MEGA7 software (Kumar et al., 2016). The evolutionary histories of salivary protein families were inferred by using the Maximum Likelihood method and conducted in MEGA7 (Kumar et al., 2016). The amino acid substitution model was selected based on the best fit provided by the Model Selection tool built in the MEGA 7 software. The bootstrap consensus trees inferred from 1,000 replicates (Felsenstein, 1985) were taken to represent the evolutionary history of the taxa analyzed (Felsenstein, 1985). Branches corresponding to partitions reproduced in <50% bootstrap replicates are collapsed. Initial tree(s) for the heuristic search were obtained by applying the Neighbor-Joining method to a matrix of pairwise distances estimated using a JTT model (Felsenstein, 1985).

Statistical Analysis
D'Agostino and Pearson normality test was performed to assess whether or not data follow a normal distribution, and the Mann-Whitney test was carried out in order to test the significance of the differences in protein divergence and omega (ω) values. Both tests were performed using the Prism 7 software (GraphPad).

Updated Sand Fly Salivary Protein Catalog
From 14 sand fly salivary gland transcriptomes we compiled the sand fly salivary proteins unique to either NW or OW sand flies as well as protein families common to all sand fly species and developed an updated catalog for all the analyzed sand fly salivary proteins (Table 1). With this extensive catalog of sand fly salivary proteins we identified 12 protein families shared between OW and NW sand flies, lufaxin, apyrase, yellow-related protein, silk-related protein or PpSP32, endonuclease, hyaluronidase, adenosine deaminase, small odorant binding-like proteins, D7 family of proteins, antigen-5 related protein, ParSP17 (39 kDa protein of unkown function), ParSP80 (16 kDa protein of unkown function) (Figure 1 and Supplementary Figures 4-15), seven protein families unique to OW sand flies, pyrophosphatase, phospholipase A2, ParSp15 (5 kDa protein of unknown function), ParSP25 (32 kDa protein of unknown function), SP16  Figures 1-3). Two protein families unique to OW sand flies that are only shared by two sand fly species (ParSP23-like and ParSP2.5like; Supplementary Figures 2,3). We also identified 12 protein families unique to NW sand flies: toxin, RGD, c-type lectin, 14 kDa protein family, ml domain, 5 ′ nucleotidase, 9 kDa protein family, salo, 11.5 kDa protein family, maxadilan, 71 kDa protein family, and 5 kDa protein family.

Salivary Proteins Shared Between Old World and New World Sand Flies
The phylogenetic trees of 9 out of 12 salivary proteinencoding gene families shared between OW and NW sand flies (Supplementary Figures 4-15) correlated well with the sand fly species phylogeny, constructed based on ITS-2 sequences (Aransay et al., 2000). The resulting phylogenetic tree indicates that these shared gene families of sand fly salivary proteins have been evolving under purifying (negative) selection or are selective neutral (ω < 1; Barton and Etheridge, 2004). Such salivary protein families encompassed lufaxin, apyrase, yellow-related protein, antigen-5, endonuclease, hyaluronidase, ParSP80, odorant binding proteins (OBPs), and adenosine deaminase. On the other hand, Silk-related protein or PpSP32, the D7 family of proteins (Supplementary Figures 6,12; Abdeladhim et al., 2016) and the ParSP17-like family of proteins (Figure 1) displayed phylogenies diverging from the sand fly species phylogeny, pointing that strong selective forces related to the function and/or immunogenicity of such proteins have driven their evolution since the split of NW and OW sand flies that likely took place about 100MYs with the continental separation of the American continent from the European and African ones.

Molecular Evolution Between OW and NW Sand Fly Salivary Proteins
The overall rates of molecular evolution of the salivary protein families unique to OW sand flies were similar to the families shared with NW sand flies (Figures 2A,B). The median protein divergences (% divergence) for the families unique to OW sand flies and shared with the NW ones were 41.4 and 35.5%, respectively (Figure 2A). Likewise, the median ratios of nonsynonymous over synonymous replacements (ω) for the families unique to OW sand flies and shared with the NW ones were 0.34 and 0.23, respectively ( Figure 2B). This is in sharp contrast to the pattern of divergence observed in the protein families unique to NW sand flies when compared with their counterparts shared with Old World sand flies (Figures 2C,D), as observed elsewhere (Abdeladhim et al., 2016). In the protein families unique to NW sand flies, the median protein divergence was 54.7% whereas 39.3% divergence was noticed for the protein families shared with OW sand flies ( Figure 2C, p < 0.0008). By the same token, the median ω value was significantly greater in the protein families unique to NW sand flies (median ω = 0.49) as compared to those shared with OW sand flies (median ω = 0.2, p < 0.0008; Figure 2D).

Molecular Evolution Within OW and NW Sand Flies
The molecular evolution of the protein families was also taken into account individually, in regard to the analyses of % divergence and the ratio of non-synonymous over synonymous replacements (ω; Figure 3). Among the protein families unique to OW sand flies, phospholipase A2, pyrophosphatase, and ParSP25-like displayed the lowest rates of diversification (divergence < 30%; ω < 0.35; Figures 3A,B) whereas ParSP2.5, ParSP23, and SP16 were among the most divergent families (divergence > 50%; ω > 0.44; Figures 3A,B). Within the OW sand fly protein families shared with NW sand flies, ParSP80, and hyaluronidase families showed consistently the lowest rates of sequence divergence and non-synonymous to synonymous codon replacement (divergence < 12%; ω < 0.12; Figures 3A,B). In contrast, the OBPs, silk-related, D7, and ParSP17 families exhibited the greatest rates of diversification (divergence > 39%; ω > 0.26; Figures 3A,B). Regarding the protein families unique to NW sand flies, the SALO, maxadilan, and ML domain families presented the highest rates of molecular evolution (divergence > 60%; ω > 0.62; Figures 3C,D; Abdeladhim et al., 2016) whereas the 5 kDa protein family presented itself as the least divergent protein family (divergence = 36%; ω = 0.26; Figures 3C,D; not previously assessed). Amongst the protein families of NW sand flies shared with OW species, the D7 family presented the highest levels of diversification (divergence = 43%; Figure 3C) whereas the antigen-5 protein family exhibited the lowest divergence at the amino acid level (divergence = 19%; Figure 3C). Regarding the ω values, the silk family displayed the highest ratio of non-synonymous over synonymous replacements (ω = 0.30; Figure 3D) whereas the lowest ratio was exhibited by the apyrase family. Regarding the OBP family, it is actually a superfamily, encompassing at least six protein families (Supplementary Figure 15). Hence, the high rates of divergence for the OBPs should be view with caution, as noted below.

Molecular Diversification Along Protein Lengths
As specific regions of a protein can be subjected to different selective constraints and in turn evolve at different rates, slide-window analyses of non-synonymous over synonymous replacements (ω) for the genes encoding the salivary protein unique to OW sand flies and shared with NW sand flies were carried out (Figures 4, 5). Contrasting to the genes encoding the protein families unique to NW sand flies, which display multiple codons under positive (diversifying) or relaxed purifying selection (ω ≥ 1; Abdeladhim et al., 2016), the gene families unique to Old World sand flies exhibited for the most part codons under purifying (negative) selection (ω < 1; Figure 4). In the latter group, only ParSP15 and phospholipase A2 bore a few codons under positive selection (ω ≥ 1; Figure 4), mostly in the 5 ′ portion of the genes. The gene families shared with NW sand flies also displayed low levels of sequence divergence (Figure 3), with exception of the ParSP17 family which bear multiple codons under positive selection (Figure 5). In such (ParSP17; Figure 5A), a greater number of codons under positive selection were noticed in the gene sequences for the sand flies belonging to the related Phlebotomus and Paraphlebotomus subgenera (P/P; Figure 5B) than for the sand flies belonging to related Larrossious, Adhelerius, and Euphlebotomus subgenera (L/A/E; Figure 5C).

Copy Number Variation of Salivary Protein-Encoding Genes Across Species
As important as sequence polymorphism (SNPs and INDELs) for the evolution of gene families and emergence of new molecular functions are gene duplication events (Innan and Kondrashov, 2010). Although less than two gene duplication events per species were noticed for two NW sand fly gene families (yellow-related and the small OBPs) shared with OW sand flies (  Figure 15). In the gene families unique to NW sand flies (SALO, RGD, mannose-binding lectin, c-type lectin, and spider toxin-like) up to eight gene duplication events were detected in six gene families (Figure 6; SALO, RGD, mannose-binding lectin, c-type lectin, and spider toxin-like; Abdeladhim et al., 2016). In sharp contrast to the high rate of copy number variation in gene families unique to NW sand flies, no gene duplication event was noticed for the sand fly salivary protein encoding gene families unique to OW sand flies (Figures 6-10 and Supplementary Figures 1-3).

Emergence of Specific Salivary Protein Encoding Genes
Besides the differences in the rates of gene diversification and gene duplication events among protein families, the molecular evolution of salivary protein encoding gene families in sand flies has also displayed different modes of gene emergence as well as specific signatures of protein structure and divergence.
A few salivary gland gene families appear to have arisen upon duplication from a gene expressed in another tissue that subsequently acquired a specific promoter driving expression into the salivary gland, a mechanism called sub-functionalization (neofunctionalization) (Hahn, 2009;Innan and Kondrashov, 2010). Amongst the salivary gene families unique to NW sand flies, the c-type lectin, the mannose-binding lectin, and the spider toxin-like gene families share paralogs expressed in other tissues of sand flies and other unrelated arthropods (Abdeladhim et al., 2016). Similar phenomena were noticed for salivary protein families unique to OW sand flies, such as phospholipase A2 (Tunaz et al., 2003) and pyrophosphatase (Silva et al., 2015), as well as in protein families shared with NW sand flies, like hyaluronidase (Allalouf et al., 1975), endonuclease (Broderick et al., 2014), adenosine deaminase (Dolezelova et al., 2005), and OBPs (Benoit et al., 2017). Hence, such gene families seem to have emerged by subfunctionalization.
The presence of an OBP-like domain in the D7 protein Ctermini (Hekmat-Scafe et al., 2000) points to the emergence of D7 genes by gene fusion (Kaessmann et al., 2009) between an ancient OBP and an unknown gene. In fact, conserved cysteine signatures ( Table 2), as bore by OBPs, can unveil the mechanism of gene birth even among fast evolving genes. Multiple sand fly salivary protein families bear cysteine signatures (≥4 cysteines; Table 2). In the salivary protein families unique to NW sand flies, conserved cysteine residue signatures (62.5%, five out of eight protein families; Table 2) are present at a much higher frequency than in the protein families shared between OW and NW sand flies (50%, 6 out of 12; Table 2). Amongst the protein families unique to the OW sand flies, such proportion is reduced to only 28.5% of the protein families (two out of seven families; Table 2).

Taxon-Specific Expression of Salivary Protein Encoding Genes
In order to adapt to the new ecological niches, sand flies have faced the hemostasis components and immune systems of the mammal and bird species indigenous to their habitat. In order to face such selective pressures, the establishment of new protein variants may have been required for the survival of the sand flies. It is important to mention, nonetheless, that even within OW sand flies, differences in the levels of polymorphisms were noticed. The rates of codons under positive selection varied within subgroups: for instance, higher rates were noticed within the P/P subgenera in ParSP17 (Figure 5). In addition, most of the gene duplication events have taken place in P. duboscqi for the P/P subgenera as well as in P. orientalis and P. perniciosus for the L/A/E subgenera (Figure 6). It is noteworthy that members of the P/P subgenera only express up to three of the salivary proteins unique to OW sand flies (SP2.5, SP16, and/or pyrophosphatase) whereas members of the L/A/E subgenera for the most part secrete salivary proteins of four to five different families (ParSP15, ParSP25, ParSP23, phospholipase A2, SP16, and/or pyrophosphatase; Table 1). Among the later, ParSP15 family FIGURE 7 | Multiple sequence alignment and molecular phylogenetic analysis of the sand fly ParSP15 salivary protein family. (A) Multiple sequence alignment of ParSP15 proteins. ParSP15 (P. ariasi), PorMSP104 (P. orientalis), PpeSP12 (P. perniciosus), PtSP71 (P. tobbi), and PkanSP19 (P. kandelakii). Black background shading represents identical amino acids. Gray background shading represents similar amino acids. (B) The evolutionary history of ParSP15 salivary protein family was inferred by using the Maximum Likelihood method based on the JTT matrix-based model (Jones et al., 1992). Sand fly species are indicated by the different symbols in the legend on the right. is unique to the Larroussius subgenus whereas phospholipase A2 is only expressed in member of Larroussius and Adlerius subgenera. Along the same lines, transcripts for hyaluronidase and endonuclease were missing in the P/P sialotranscriptomes whereas such protein are expressed in NW and L/A/E salivary glands (Table 1).

DISCUSSION
The comparative analysis of salivary protein gene families between OW and NW sand flies reveals that differences in the mode of evolutionary diversification amongst gene families that could have potential implications for the application of such proteins in the selection of markers of vector exposure and vaccines as: (1) At the sequence level, protein families unique to OW sand flies have evolved at a similar pace than those shared between OW and NW species, yet at a slower pace than the families unique to NW sand flies; (2) The evolutionary rates of both the protein/gene families unique to or shared between sand fly groups range from more divergent to more conserved families; (3) Gene families unique to NW sand flies have been diverging for the most part due to positive selection whereas FIGURE 8 | Multiple sequence alignment and molecular phylogenetic analysis of the sand fly ParSP25 salivary protein family. (A) Multiple sequence alignment of ParSP25 proteins. ParSP25 (P. ariasi), PorASP106 (P. orientalis), PpeSP08 (P. perniciosus), PtSP73 (P. tobbi), PkanSP20 (P. kandelakki), and PabSP12 (P. arabicus). Black background shading represents identical amino acids. Gray background shading represents similar amino acids. (B) The evolutionary history of ParSP25 salivary protein family was inferred by using the Maximum Likelihood method based on the JTT matrix-based model (Jones et al., 1992). Sand fly species are indicated by the different symbols in the legend on the right. The evolutionary history of Phospholipase A2 salivary protein family was inferred by using the Maximum Likelihood method based on the JTT matrix-based model (Jones et al., 1992). Sand fly species are indicated by the different symbols in the legend on the right.  (Whelan and Goldman, 2001). Sand fly species are indicated by the different symbols in the legend on the right. Tree branches were color-coded so as to represent specific taxon: Green color represents the Larroussius and Adlerius subgenera; Red color indicates the Euphlebotomus subgenus; Blue color points to proteins of the Phlebotomus and Paraphlebotomus subgenera. the families shared between NW and OW (except ParSP17) as well as those unique to OW sand flies are less divergent due to purifying selection; (4) Events of gene duplication are more often observed in the gene families unique to NW sand fly species, such as SALO, RGD, mannose-binding lectin, c-type lectin, and spider toxin-like, whereas the emergence of new gene copies in OW sand flies is more evident in the gene families shared with NW sand flies, such as antigen-5, apyrase, D7, Silk, yellow-related, and OBPs; (5) Different from NW sand flies (Abdeladhim et al., 2016), the emergence of salivary gland gene families in OW sand flies relies less often on ancient genes bearing sequences encoding conserved cysteine signatures; and (6) a few salivary protein gene families is only expressed in sand flies belonging to specific sub-genera within OW sand flies; for instance, ParSP15 is unique to the Larrossious sub-genus, and SP2.5 is only expressed in members of the Phlebotomus subgenus. Overall, these findings highlight the striking differences in the rates of molecular evolution of salivary protein encoding genes in NW and OW sand flies, which are underscored not only by sequence polymorphisms but also by copy number variation.
Also, sand fly salivary proteins are recognized by the humoral immune system of humans and animal reservoirs previously bitten by sand flies; therefore, some sand fly salivary proteins have been shown to work as biomarkers of vector exposure. For such proteins to be used as biomarker of vector exposure for the identification of taxon-specific sand fly bites, the candidates are supposed to be not only immunogenic and conserved but also expressed only within sand flies belonging to a specific taxon. For species-specific biomarkers the candidates should be divergent or with some level of homology that allows the selection of specific peptides. For specific salivary proteins to be used as vaccine components across species, we predict that the candidates need not only to be immunogenic but also conserved proteins to some extent.
Biomarkers of vector exposure have been identified among the salivary proteins of Lu. longipalpis (Teixeira et al., 2010), Lu. intermedia (Carvalho et al., 2017), P. papatasi (Marzouki et al., 2015), P. perniciosus (Drahota et al., 2014;Kostalova et al., 2017), and P. orientalis (Sima et al., 2016). For the most part, the best salivary protein candidates belong to conserved protein families. Although the less conserved Silk protein (Figure 3) was found to be the best biomarker of P. papatasi exposure (Marzouki et al., 2015), members of the apyrase, ParSP25, and antigen-5 conserved families have been shown as good markers of exposure in P. perniciosus (Drahota et al., 2014;Kostalova et al., 2017), P. orientalis (Sima et al., 2016), and Lu. intermedia (Carvalho et al., 2017), respectively. It is worth noting that proteins of the conserved yellow family (Figure 3) have been found to be the best biomarkers of vector exposure in dogs and humans amongst distantly related sand fly species, such as Lu. longipalpis, P. orientalis, and P. perniciosus (Teixeira et al., 2010).
Regarding sand fly salivary protein vaccines, SALO (Gomes et al., 2008) and lufaxin (Collin et al., 2009) from Lu. longipalpis protects hamsters and dogs, respectively, against Leishmania infantum infection. On the other hand, PdSP15 (PduM02 herein; OBP family) from P. duboscqi (and its P. papatasi ortholog) protects mice (Valenzuela et al., 2001) and non-human primates (Oliveira et al., 2015) against Leishmania major infection. As SALO is the most divergent among the unique salivary protein families in NW sand flies (Figure 3), it is more likely to work as a specific vaccine for L. longipalpis transmitted leishmaniasis. Regarding PdSP15 (PduM02), it belongs to the diverse OBP (super-) family, yet it only shares orthologs with P. papatasi and P. sergenti (Supplementary Figure 15). In fact, PdSP15 orthologs are relatively conserved (% divergence = 36.3%; ω = 0.54). Hence, a PdSP15-based vaccine is restricted to protect leishmania transmission from sand flies of the P/P subgenera. On the other hand, lufaxin is likely the best candidate for a pan-specific vaccine, as it is conserved and shared between OW and NW sand flies (Figure 3 and Supplementary Figure 9).
The data pointing out yellow proteins as the best biomarkers of vector exposure across sand fly species underscores the idea that conserved salivary proteins from the saliva of different sand fly species are recognized, processed, and presented in similar ways by the host immune systems. Thereby, it gives supports to the possibility that conserved yellow epitopes can be used as a pan-biomarker for sand fly exposure. For the same reasons, it is possible that conserved lufaxin epitopes can be used as panspecific vaccines against leishmaniasis transmitted by different sand fly species.
ParSP25 proteins were shown to work as biomarkers of vector exposure for P. perniciosus and P. orientalis (Drahota et al., 2014;Sima et al., 2016;Kostalova et al., 2017). Such a protein family was among the least divergent families unique to OW sand flies (Figures 2A,B) and was only expressed in member of the L/A/E subgenera (Table 1). Thereby, it is likely that ParSP25 epitopes can be used as a taxon-specific biomarker of vector exposure to recognize bites of sand flies from L/A/E subgenera and distinguish them from bites of sand flies belonging to P/P ones that display overlapping habitats.
The evolution of salivary gland gene families unique to NW and to OW sand flies has faced completely different selective pressures, leading to a much faster pace of sequence and copy number variation in NW sand flies and a more constrained divergence in OW species. These evolutionary differences are further highlighted by the fact that the emergence of new gene families unique to NW sand flies relied upon sequences encoding specific cysteine codons, which were less evident in the genes encoding the protein families unique to OW sand flies. Regarding the protein families shared between OW and NW sand flies, the rates of sequence divergence were similar between OW and NW sand flies, yet more gene duplication events were noticed in OW species. Evolutionary differences are noticed even within OW sand flies, as more gene duplication are observed in P/P than in L/A/E, and the expression of some gene families was restricted to either P/P or L/A/E sand fly taxon.
As the salivary protein families shared between OW and NW sand flies are inter-specifically conserved, such proteins are suitable for the development of pan-specific biomarker of vector exposure and pan-specific vaccines. Along the same lines, the more conserved protein families unique to OW sand flies are ideal candidates for the development of taxon-specific biomarkers of vector exposure, as even the expression of such proteins can be restricted to a specific taxon. On the other hand, the development of species-specific biomarkers of vector exposure would require the identification of species-specific epitopes amongst the most divergent proteins families unique to either NW or OW sand flies.

AUTHOR CONTRIBUTIONS
IC-A performed the evolutionary analysis and wrote the manuscript. JV wrote and edited the manuscript. Multiple sequence alignment of Antigen-5 proteins. PPTSP29 (P. papatasi), PduK107 and PduM48 (P. duboscqi), ParSP05 (P. ariasi), PorASP74 (P. orientalis), PpeSP07 (P. perniciosus), PkanSP14 (P. kandelakki), PabSP4 (P. arabicus), PagSP05 (P. argentipes), Luloag-5 (Lu. longipalpis), LayS79 (Lu. ayacuchensis), Linb-13 (N. intermedia), and LolAg5a (B. olmeca). Black background shading represents identical amino acids. Gray background shading represents similar amino acids. Asterisks indicate the conserved cysteine residues. (Bottom) The evolutionary history of Antigen-5 salivary protein family was inferred by using the Maximum Likelihood method based on the Whelan And Goldman model (Whelan and Goldman, 2001). Sand fly species are indicated by the different symbols in the legend on the right. Tree branches were color-coded so as to represent specific taxon: Green color represents the Larroussius and Adlerius subgenera; Red color indicates the Euphlebotomus subgenus; Blue color points to proteins of the Phlebotomus and Paraphlebotomus subgenera; and Black color indicates the proteins belonging to New World sand flies. The evolutionary history of Apyrase salivary protein family was inferred by using the Maximum Likelihood method based on the Le_Gascuel_2008 model (Gomes et al., 2008). Sand fly species are indicated by the different symbols in the legend on the right. Tree branches were color-coded so as to represent specific taxon: Green color represents the Larroussius and Adlerius subgenera; Red color indicates the Euphlebotomus subgenus; Blue color points to proteins of the Phlebotomus and Paraphlebotomus subgenera; and Black color indicates the proteins belonging to New World sand flies.
Supplementary Figure 6 | Multiple sequence alignment and molecular phylogenetic analysis of the sand fly D7 salivary protein family. (Top) Multiple sequence alignment of D7. PPTSP28a (P. papatasi), PduK69 and Pduk103 (P. duboscqi), PsSP7 (P. sergenti), PpeSP04 and PpeSP04B (P. perniciosus), ParSP12 and ParSP16 (P. ariasi), PorMSP28 and PorMSP38 and PorMSP43 (P. orientalis), PtSP42 and PtSP54 and PtSP57 (P. tobbi), PkanSP11 and PkanSP12 (P. kandelakki), PabSP20 and PabSP54 and PabSP59 and PabSP84 (P. arabicus), LJL13 (Lu. longipalpis), LolD7 (B. olmeca), LayS101 (Lu. ayacuchensis), and Linb-42 (N. intermedia). Black background shading represents identical amino acids. Gray background shading represents similar amino acids. Asterisks indicate the conserved cysteine residues. (Bottom) The evolutionary history of D7 salivary protein family was inferred by using the Maximum Likelihood method based on the Whelan And Goldman model (Whelan and Goldman, 2001). Sand fly species are indicated by the different symbols in the legend on the right. Tree branches were color-coded so as to represent specific taxon: Green color represents the Larroussius and Adlerius subgenera; Red color indicates the Euphlebotomus subgenus; Blue color points to proteins of the Phlebotomus and Paraphlebotomus subgenera; and Black color indicates the proteins belonging to New World sand flies. Multiple sequence alignment of Endonuclease. Black background shading represents identical amino acids. ParSP10 (P. ariasi), PpeSP32 (P. perniciosus), PorMSP101 (P. orientalis), PkanSP26 (P. kandelakki), PabSP49 (P. arabicus), PagSP11 (P. argentipes), LolEndo (B. olmeca), Linb-46 (N. intermedia), LJL138 (Lu. longipalpis), and LayS147 (Lu. ayacuchensis). Gray background shading represents similar amino acids. Asterisks indicate the conserved cysteine residues. (Bottom) The evolutionary history of Endonuclease salivary protein family was inferred by using the Maximum Likelihood method based on the Whelan And Goldman model (Whelan and Goldman, 2001). Branches corresponding to partitions reproduced in <50% bootstrap replicates are collapsed. Sand fly species are indicated by the different symbols in the legend on the right. Tree branches were color-coded so as to represent specific taxon: Green color represents the Larroussius and Adlerius subgenera; Red color indicates the Euphlebotomus subgenus; Blue color points to proteins of the Phlebotomus and Paraphlebotomus subgenera; and Black color indicates the proteins belonging to New World sand flies.
Supplementary Figure 8 | Multiple sequence alignment and molecular phylogenetic analysis of the sand fly Hyaluronidase salivary protein family. (Top) Multiple sequence alignment of Hyaluronidase. PperHyal (P. perniciosus), PorASP112 (P. orientalis), PtSP125 (P. tobbi), PkanSP21 (P. kandelakki), PabSP72 (P. arabicus), Linb-54 (N. intermedia), LolHyaz (B. olmeca), LJLHYAL (Lu. longipalpis). Black background shading represents identical amino acids. Gray background shading represents similar amino acids. (Bottom) The evolutionary history of Hyaluronidase salivary protein family was inferred by using the Maximum Likelihood method based on the General Reversible Chloroplast model (Adachi et al., 2000). Sand fly species are indicated by the different symbols in the legend on the right. Tree branches were color-coded so as to represent specific taxon: Green color represents the Larroussius and Adlerius subgenera; Red color indicates the Euphlebotomus subgenus; Blue color points to proteins of the Phlebotomus and Paraphlebotomus subgenera; and Black color indicates the proteins belonging to New World sand flies.
Supplementary Figure 13 | Multiple sequence alignment and molecular phylogenetic analysis of the sand fly Yellow salivary protein family. (Top) Multiple sequence alignment of Yellow. PPTSP42 and PPTSP44 (P. papatasi), PduK06 and PduM10 (P. duboscqi), PsSP22 and PsSP26 (P. sergenti), ParSP04 and ParSP04b (P. ariasi), PorASP2 and PorASP4 (P. orientalis), PtSP37 and PtSP38 (P. tobbi), PpeSP03B and PpeSP03 (P. perniciosus), PkanSP04 (P. kandelakki), PabSP26 (P. arabicus), PagSP04 (P. argentipes), LolYLWb and LolTLWc and LolYLWA (B. olmeca), Linb-21 (N. intermedia), LayS22 and LayS118 (Lu. ayacuchensis), and LJM17 and LJM17 and LJM111 (Lu. longipalpis). Black background shading represents identical amino acids. Gray background shading represents similar amino acids. Asterisks indicate the conserved cysteine residues. (Bottom) The evolutionary history of Yellow salivary protein family was inferred by using the Maximum Likelihood method based on the Whelan And Goldman model (Whelan and Goldman, 2001). Sand fly species are indicated by the different symbols in the legend on the right. Tree branches were color-coded so as to represent specific taxon: Green color represents the Larroussius and Adlerius subgenera; Red color indicates the Euphlebotomus subgenus; Blue color points to proteins of the Phlebotomus and Paraphlebotomus subgenera; and Black color indicates the proteins belonging to New World sand flies. Names in bold represent 44 kDa protein homologs.
Supplementary Figure 15 | Multiple sequence alignment and molecular phylogenetic analysis of the sand fly small Odorant Binding Protein (OBPs) salivary protein family. (Top) Multiple sequence alignment of OBPs. PPTSP12 and PPTSP14 and PPTSP14.2a and PPTSP15 (P. papatasi), PduM12 and PduM60 and PduM07 and PduM50 and PduM57 and PduM31 and PduM49 and PduM58 and PduM62 and PduM99 and PduM02 and PduM03 and PduM06 (P. duboscqi), PsSP14 and PsSP14 and PsSP55 and PsSP9 (P. sergenti), PtSP9 and PtSP17 and PtSP32 and PtSP31 and PtSP18 and PtSP23 (P. tobbi), PpeSP02 and PpeSP09 and PpeSP11 (P. perniciosus), PorASP28 and PorASP31 and PorASP37 and PorASP61 and PorASP61 and PorASP64 (P. orientalis), ParSP03 and ParSP01 and ParSP08 (P. ariasi), PkanSP05 and PkanSP06 and PkanSP07 (P. kandelakki), PabSP45 and PabSP2 (P. ariasi), PagSP93 and PagSP07 and PagSP01 and PagSP12 and PagSP02 and PagSP13 (P. argentipes), LolSOBPa and LolSOBPb and LolSOBPc (B. olmeca), Linb-7 and Linb-8 and Linb-28 (N. intermedia), LayS69 (Lu. ayacuchensis), and LuloOBP (Lu. longipalpis). Black background shading represents identical amino acids. Gray background shading represents similar amino acids. Asterisks indicate the conserved cysteine residues. (Bottom) The evolutionary history of OBPs salivary protein family was inferred by using the Maximum Likelihood method based on the Le_Gascuel_2008 model (Gomes et al., 2008). Sand fly species are indicated by the different symbols in the legend on the right. Tree branches were color-coded so as to represent specific taxon: Green color represents the Larroussius and Adlerius subgenera; Red color indicates the Euphlebotomus subgenus; Blue color points to proteins of the Phlebotomus and Paraphlebotomus subgenera; and Black color indicates the proteins belonging to New World sand flies.