Phylogenomic Analyses and Comparative Studies on Genomes of the Bifidobacteriales: Identification of Molecular Signatures Specific for the Order Bifidobacteriales and Its Different Subclades

The order Bifidobacteriales comprises a diverse variety of species found in the gastrointestinal tract of humans and other animals, some of which are opportunistic pathogens, whereas a number of others exhibit health-promoting effects. However, very few biochemical or molecular characteristics are currently known that are specific for the order Bifidobacteriales, or for specific clades within this order, and that distinguish them from other bacteria. This study reports the results of detailed comparative genomic and phylogenetic studies on 62 genome-sequenced species/strains from the order Bifidobacteriales. In a robust phylogenetic tree for the Bifidobacteriales constructed based on 614 core proteins, a number of well-resolved clades were observed, including a clade separating the Scardovia-related genera (Scardovia clade) from the genera Bifidobacterium and Gardnerella, as well as a number of previously reported clusters of Bifidobacterium spp. In parallel, our comparative analyses of protein sequences from the Bifidobacteriales genomes have identified numerous molecular markers that are specific for this group of bacteria. Of these markers, 32 conserved signature indels (CSIs) in widely distributed proteins and 10 conserved signature proteins (CSPs) are distinctive characteristics of all sequenced Bifidobacteriales species and provide novel and highly specific means for distinguishing these bacteria. In addition, multiple other molecular signatures are specific for the following clades of Bifidobacteriales: (i) 5 CSIs specific for a clade comprising the Scardovia-related genera; (ii) 3 CSIs and 2 CSPs specific for a clade consisting of the Bifidobacterium and Gardnerella spp.; and (iii) multiple other signatures demarcating a number of clusters of the B. asteroides- and B. longum-related species.
The described molecular markers provide novel and reliable means for distinguishing the Bifidobacteriales and a number of their clades in molecular terms and for the classification of these bacteria. The Bifidobacteriales-specific CSIs, found in functionally important proteins, are predicted to play important roles in modifying the cellular functions of the affected proteins. Hence, biochemical studies on the cellular functions of these CSIs could lead to the discovery of novel characteristics of either all Bifidobacteriales or specific groups of bacteria within this order. Some of the functions affected or modified by these genetic changes could also be important for the probiotic or pathogenic activities of the bifidobacteria.

Phylogenetic analyses based on 16S rRNA, as well as on sequences of a number of housekeeping genes/proteins, are the main approaches that have been used in the past to distinguish among the Bifidobacteriales species and genera (Miyake et al., 1998; Ventura and Zink, 2003; Ventura et al., 2004, 2006, 2007a; Biavati and Mattarelli, 2006; Yarza et al., 2008; Bottacini et al., 2010; Turroni et al., 2011; Mattarelli et al., 2014). In recent years, complete or draft genome sequences have become available for all currently recognized Bifidobacterium species and subspecies (Ventura et al., 2009b; Milani et al., 2014). Based on these sequences, a panel of multiplex PCR primers has been developed that enables rapid and specific identification of different Bifidobacterium species and subspecies (Ferrario et al., 2015). Two recent studies have also used genome sequences to examine the evolutionary relationships among Bifidobacterium species, employing large datasets comprising the core proteins of this genus (Sun et al., 2015). The robust phylogenetic trees obtained in these studies provide important insights into the evolutionary relationships among the Bifidobacterium species, and they strongly support the existence of 6-7 distinct clusters within this genus. These clusters are referred to as the B. asteroides, B. pseudolongum, B. longum, B. bifidum, B. adolescentis, B. pullorum, and B. boum groups (Sun et al., 2015). Similar clusters are also observed in phylogenetic trees based on the 16S and 23S rRNA genes, as well as in trees based on other gene/protein sequences. Comparative analyses of the Bifidobacteriales genomes are also providing useful insights into species-specific characteristics that are suggested to play important roles in the adaptation of particular species to either the human or the insect gut environment (Ventura et al., 2009b; Bottacini et al., 2010, 2012; Turroni et al., 2010).
Due to the health-promoting effects of bifidobacteria, it is of considerable interest to identify genetic and biochemical characteristics that are specific for the Bifidobacteriales or for particular groups/clusters within this order. Currently, very few such characteristics are known. One important class of genome sequence-based molecular markers that has proven very useful for evolutionary, taxonomic, and functional studies consists of conserved signature insertions or deletions (CSIs) that are uniquely present in the gene/protein homologs from a defined group of organisms (Gupta, 2005, 2012; Gupta, 2010, 2014). Conserved signature proteins (CSPs), which are genes/proteins uniquely found within a monophyletic group of organisms, provide another class of useful molecular markers for evolutionary and functional studies (Gao et al., 2006; Ventura et al., 2007a; Gao and Gupta, 2012; Gupta, 2016a,b). Both types of markers constitute highly reliable characteristics of specific groups of organisms, and they have been extensively utilized for the identification and demarcation of prokaryotic taxa of different ranks in molecular terms (Gao and Gupta, 2012; Gupta et al., 2013a,b, 2016).
In the present work, we report detailed phylogenetic and comparative analyses of protein sequences from the sequenced members of the order Bifidobacteriales, with the aim of identifying CSIs and CSPs that are specific for different groups within this order. These studies have led to the identification of 32 CSIs in widely distributed proteins and 10 CSPs that are uniquely found in all or most of the genome-sequenced Bifidobacteriales species, providing novel molecular markers that distinguish this order from all other bacteria. In addition, our work has identified multiple other CSIs and CSPs that distinguish a number of clades of Bifidobacteriales, including a clade consisting of the Bifidobacterium and Gardnerella species, another clade consisting of the Scardovia-related genera, and different clusters of B. asteroides- or B. longum-related species. These signatures provide novel means for the identification and demarcation of the members of the described clades in molecular terms, and for functional studies that could lead to the discovery of novel biochemical or other properties of these bacteria.

Phylogenetic Analysis
A phylogenetic tree for 62 genome-sequenced members of the order Bifidobacteriales was constructed based on the concatenated sequences of 614 proteins. The protein families used in this phylogeny were identified using the UCLUST algorithm (Edgar, 2010) as families present in at least 80% of the input genomes whose members shared at least 50% sequence identity over at least 50% of the sequence length. Each identified protein family was individually aligned using Clustal Omega (Sievers et al., 2011) and trimmed using Gblocks 0.91b (Castresana, 2000) with relaxed parameters (Talavera and Castresana, 2007). The concatenated dataset of the trimmed sequence alignments contained 197,777 aligned amino acid residues. Maximum-likelihood trees based on this alignment were constructed using FastTree 2 (Price et al., 2010), employing the Whelan and Goldman model of protein sequence evolution (Whelan et al., 2001), and RAxML 8 (Stamatakis, 2014), using the Le and Gascuel model of protein sequence evolution (Le and Gascuel, 2008). SH-like statistical support values (Guindon et al., 2010) for each branch node in the final phylogenetic tree were calculated using RAxML 8 (Stamatakis, 2014). This process was completed using an internally developed software pipeline.
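The family-selection and concatenation steps above can be illustrated with a minimal Python sketch. It assumes the UCLUST clustering (identity and length-coverage thresholds) has already been run and parsed into a hypothetical dictionary layout of family ID to {genome ID: sequence}; it does not reproduce the authors' internal pipeline, and padding absent genomes with gaps is an assumed convention for building the supermatrix.

```python
from typing import Dict, List

# Hypothetical layout: family ID -> {genome ID: protein (or aligned) sequence}.
Families = Dict[str, Dict[str, str]]


def select_core_families(families: Families, n_genomes: int,
                         min_presence: float = 0.8) -> List[str]:
    """Keep families represented in at least `min_presence` of all genomes."""
    return [fam_id for fam_id, members in families.items()
            if len(members) / n_genomes >= min_presence]


def concatenate_alignments(alignments: Families, genomes: List[str]) -> Dict[str, str]:
    """Join per-family trimmed alignments into one supermatrix row per genome.

    A genome missing from a family is padded with gaps of that family's
    alignment length, so every row ends up with the same total length.
    """
    rows: Dict[str, List[str]] = {g: [] for g in genomes}
    for aln in alignments.values():
        width = len(next(iter(aln.values())))
        for g in genomes:
            rows[g].append(aln.get(g, "-" * width))
    return {g: "".join(parts) for g, parts in rows.items()}
```

With five genomes, a family found in four of them (80%) is kept, while one found in only two is discarded; the resulting concatenated rows all share the same length, which is what a maximum-likelihood program expects as input.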
In parallel, a phylogenetic tree based on the 16S rRNA gene sequences of type strains covering all described species within the order Bifidobacteriales was also constructed. The 16S rRNA sequences were retrieved from the Ribosomal Database Project (Cole et al., 2014) and aligned using the SINA aligner (Pruesse et al., 2012) to form a multiple sequence alignment that was 1604 aligned nucleotides long after removal of common gaps. A maximum-likelihood phylogenetic tree based on this multiple sequence alignment was created using MEGA 6, employing the General Time-Reversible model of sequence evolution, with branch support based on 1000 bootstrap replicates (Tamura et al., 2013).
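Removal of common gaps from the 16S rRNA alignment can be sketched as follows, assuming "common gaps" means columns in which every sequence has a gap. Treating both "-" and "." as gap characters is an assumption based on SINA/ARB-style alignments, which may use either symbol; the function name is hypothetical.

```python
from typing import Dict

# Characters treated as gaps; SINA/ARB-style alignments may use both "-" and ".".
GAP_CHARS = set("-.")


def drop_common_gap_columns(aln: Dict[str, str]) -> Dict[str, str]:
    """Remove alignment columns in which every sequence has a gap."""
    seqs = list(aln.values())
    width = len(seqs[0])
    if any(len(s) != width for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    # Keep a column if at least one sequence has a nucleotide there.
    keep = [i for i in range(width) if not all(s[i] in GAP_CHARS for s in seqs)]
    return {name: "".join(s[i] for i in keep) for name, s in aln.items()}
```

Columns that are gapped in only some sequences are retained, because they still carry information about the sequences that do have nucleotides at those positions.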

Identification of Conserved Signature Indels
Conserved signature indels (CSIs) were identified by the procedures described in detail recently (Gupta, 2014). Briefly, BLASTp (Altschul et al., 1997) searches were performed with each protein in the genome of Bifidobacterium adolescentis ATCC 15703 (accession number AP009256.1) against all available sequences in the GenBank non-redundant database. Multiple sequence alignments were then created using ClustalX (Jeanmougin et al., 1998) for proteins that returned high-scoring matches from Bifidobacteriales and other prokaryotes.
The alignments were then visually inspected for the presence of insertions or deletions that were flanked on both sides by at least 5-6 conserved amino acid residues within the neighboring 30-40 amino acids. Detailed BLASTp searches were then carried out on short sequence segments (60-100 amino acids long) containing the indel and the flanking conserved regions to determine the specificity of the indels. SIG_CREATE and SIG_STYLE (available on Gleans.net) were then used to create signature files for CSIs that were specific for the Bifidobacteriales order or its subgroups, as described in earlier work (Gupta et al., 2013a; Gupta, 2014). Due to space limitations, sequence information for all Bifidobacterium species, particularly for different subspecies of B. longum, B. animalis, B. pseudolongum, and B. thermacidophilum, is not shown in the presented alignment files. However, unless otherwise noted, all of the described CSIs are specific for the indicated groups (i.e., similar CSIs were not present in the protein homologs from other bacteria in the top 500 BLAST hits). It should be noted that significant BLAST hits for a number of CSIs and CSPs described here are also observed for one or more of the following three Chlamydia trachomatis strains deposited by the Sanger Institute: SwabB1, H1 IMS, and H17 IMS. We suspect that these anomalous results are due to cross-contamination of the sequenced cultures of these Chlamydia trachomatis strains by a Gardnerella vaginalis strain. We have communicated our concern, with the supporting evidence, to the Sanger Institute.
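The screening criterion above (a gap block separating the ingroup from other sequences, flanked on both sides by conserved residues) can be expressed as a minimal Python sketch. It is an automated approximation of what the authors did by visual inspection, not their procedure: the `window` and `min_conserved` defaults are assumed values chosen to echo the "5-6 conserved residues within the neighboring 30-40 amino acids" rule, and "conserved" is simplified here to mean a column in which every sequence has the same residue.

```python
from typing import Dict, List, Optional, Set, Tuple

GAP = "-"


def _conserved(aln: Dict[str, str], col: int) -> bool:
    """A column is 'conserved' here if every sequence has the same residue."""
    residues = {seq[col] for seq in aln.values()}
    return len(residues) == 1 and GAP not in residues


def find_candidate_csis(aln: Dict[str, str], ingroup: Set[str],
                        window: int = 20, min_conserved: int = 5
                        ) -> List[Tuple[int, int, str]]:
    """Return (start_column, length, kind) for gap blocks that split ingroup from outgroup."""
    outgroup = [name for name in aln if name not in ingroup]
    if not ingroup or not outgroup:
        raise ValueError("need at least one ingroup and one outgroup sequence")
    width = len(next(iter(aln.values())))

    def state(col: int) -> Optional[str]:
        in_gap = [aln[n][col] == GAP for n in ingroup]
        out_gap = [aln[n][col] == GAP for n in outgroup]
        if not any(in_gap) and all(out_gap):
            return "ingroup_extra"      # residues present only in the ingroup
        if all(in_gap) and not any(out_gap):
            return "ingroup_missing"    # residues present only in the outgroup
        return None

    hits = []
    col = 0
    while col < width:
        kind = state(col)
        if kind is None:
            col += 1
            continue
        start = col
        # Extend the block while consecutive columns show the same pattern.
        while col < width and state(col) == kind:
            col += 1
        # Count conserved columns in the flanking windows on each side.
        left = sum(_conserved(aln, c) for c in range(max(0, start - window), start))
        right = sum(_conserved(aln, c) for c in range(col, min(width, col + window)))
        if left >= min_conserved and right >= min_conserved:
            hits.append((start, col - start, kind))
    return hits
```

The labels only say which side of the split carries the residues; whether a block is truly an insertion or a deletion still requires comparison with more distant outgroups. A candidate found this way would then go through the BLASTp specificity check on the surrounding 60-100 aa segment.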

Identification of Conserved Signature Proteins
BLASTp searches were carried out to examine the specificity of some previously described conserved signature proteins (CSPs) that had been indicated to be specific for the order Bifidobacteriales (Gao and Gupta, 2012). Additionally, limited work to identify CSPs for the B. asteroides group of species was carried out by conducting BLASTp searches with all proteins from the genome of Bifidobacterium asteroides (Bottacini et al., 2012) as query sequences. BLASTp searches were performed against all available sequences in the GenBank non-redundant sequence database, and the results of these searches were then manually inspected, as described in earlier work (Gao et al., 2006; Gao and Gupta, 2012), to identify proteins for which all significant hits were from the B. asteroides group of species.
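The CSP criterion (all significant hits come from the group of interest) can be written as a small Python filter. This is a hedged sketch, not the authors' manual inspection: it assumes BLASTp hits have already been reduced to hypothetical (taxon name, E-value) pairs, and the E-value cutoff and minimum-coverage fraction are illustrative parameters not given in the text.

```python
from typing import Iterable, Set, Tuple


def is_candidate_csp(hits: Iterable[Tuple[str, float]], target_taxa: Set[str],
                     evalue_cutoff: float = 1e-3,
                     min_target_fraction: float = 1.0) -> bool:
    """Flag a query protein whose significant BLASTp hits all come from target taxa.

    `hits` holds (taxon name, E-value) pairs, one per subject sequence.
    """
    significant = [taxon for taxon, evalue in hits if evalue <= evalue_cutoff]
    if not significant:
        return False
    # Any significant hit outside the target group disqualifies the protein.
    if any(taxon not in target_taxa for taxon in significant):
        return False
    # Require the protein to be found in enough of the target taxa.
    covered = len(set(significant)) / len(target_taxa)
    return covered >= min_target_fraction
```

Lowering `min_target_fraction` corresponds to accepting proteins found in "most" rather than all members of a group, which mirrors how some signatures in the text are described as largely specific.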

Homology Modeling of Elongation Factor Tu from Bifidobacterium longum
Homology models of the EF-Tu homolog from Bifidobacterium longum were built using the solved crystallographic structure of EF-Tu from Escherichia coli (PDB ID: 3U6K) as the template. Initially, 200 models were generated using MODELLER v9.14 (Sali and Blundell, 1993) and ranked using their Discrete Optimized Protein Energy (DOPE) scores (Shen and Sali, 2006). The model with the best DOPE score (i.e., the lowest, most negative value) was then submitted to the ModRefiner program for atomic-level energy minimization, to obtain a model with reliable stereochemical quality (Xu and Zhang, 2011).
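The model-selection step can be illustrated with a few lines of Python. This sketch assumes the DOPE scores have already been collected into a hypothetical list of (model name, score) pairs; it does not call MODELLER itself. Because DOPE is a statistical energy, the best model is the one with the lowest (most negative) score.

```python
from typing import List, Tuple


def rank_by_dope(models: List[Tuple[str, float]], top_n: int = 1) -> List[Tuple[str, float]]:
    """Return the `top_n` models with the lowest (best) DOPE scores.

    DOPE is a statistical energy, so more negative values indicate better models.
    """
    return sorted(models, key=lambda m: m[1])[:top_n]
```

For example, among models scoring -41000.0, -42500.5, and -40000.0, the one at -42500.5 would be passed on for refinement.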

Phylogenetic Analysis of the Species from the Order Bifidobacteriales
Phylogenomic analyses of members of the genus Bifidobacterium have previously been reported based on core protein sequences from 45 and 48 described species of this genus (Sun et al., 2015). However, these studies did not include the other members of the order Bifidobacteriales, such as Gardnerella, Scardovia, Alloscardovia, and Parascardovia, or several unnamed Bifidobacterium spp. (viz. strains A11, 7101, AGR2158, MSTE12, and 12.1.47BFAA) whose genomes have been sequenced. Additionally, the genome sequence of the recently described species B. aesculapii is now available (Toh et al., 2015). To comprehensively examine the evolutionary relationships among different members of the order Bifidobacteriales, a phylogenetic tree was constructed for all 62 genome-sequenced members of the family, which included 54 Bifidobacterium species/strains, 5 species from Scardovia and related genera (viz. Alloscardovia and Parascardovia), and three strains of Gardnerella vaginalis. The tree was constructed based on the concatenated sequences of 614 universally or nearly universally present core proteins for which sequence information could be obtained from the 62 sequenced genomes. A maximum-likelihood tree based on these sequences, which represents the most comprehensive phylogenetic analysis of the order Bifidobacteriales to date, is presented in Figure 1.
In the tree shown, members of the order Bifidobacteriales form two main clusters at the highest level. One of these clusters, referred to as the Scardovia cluster, groups together the genera Scardovia, Parascardovia, and Alloscardovia, whereas the second cluster comprises members of the genera Bifidobacterium and Gardnerella. Importantly, in this tree, as in an earlier phylogenetic tree based on concatenated sequences of the RpoB, RpoC, and GyrB proteins (Gao and Gupta, 2012), different strains of Gardnerella vaginalis branch in between the Bifidobacterium species, making the genus Bifidobacterium polyphyletic. Earlier phylogenetic studies on members of the genus Bifidobacterium have identified a number of different clusters, which are referred to as the B. asteroides, B. pseudolongum, B. longum, B. bifidum, B. adolescentis, B. pullorum, and B. boum groups (Ventura et al., 2006; Turroni et al., 2011; Lugli et al., 2014; Sun et al., 2015). The existence of these groups/clusters is also supported by the tree shown in Figure 1. Of these clusters, the B. asteroides cluster exhibited the deepest branching within the genus Bifidobacterium, as also observed in earlier work (Sun et al., 2015). The B. asteroides clade is generally demarcated as comprising the species B. asteroides, B. indicum, B. coryneforme, and B. actinocoloniiforme (marked as cluster III in Figure 1). However, as discussed later, a number of clusters, marked I, II, and IV, which are either part of the B. asteroides clade or related to it, are also distinguished in phylogenetic trees and by the CSIs identified in this work.
We have also created a phylogenetic tree based on 16S rRNA gene sequences of all named Bifidobacteriales species (Supplementary Figure S1). The overall branching pattern in the 16S rRNA tree is similar to that observed in the concatenated protein tree, with the genera Scardovia, Alloscardovia, and Parascardovia forming one of the deepest-branching clusters. The different clusters of Bifidobacterium spp. observed in the concatenated protein tree were also supported by the 16S rRNA tree, and G. vaginalis was again found to branch in between these clusters. The polyphyletic nature of the genus Bifidobacterium in 16S rRNA gene-based phylogenies has also been observed in earlier work (Yilmaz et al., 2014).

Identification of Molecular Markers That Are Specific for the Order Bifidobacteriales
The main focus of this work is the identification of molecular characteristics that are specific for the Bifidobacteriales species and that could be used for their identification as well as for functional studies. As noted earlier, conserved insertions and deletions (i.e., indels or CSIs) in genes/proteins, and conserved signature proteins that are uniquely found in a phylogenetically coherent group of organisms, provide very useful molecular markers for such purposes. The indels that provide useful molecular markers are of defined size and are flanked on both sides by conserved regions, which ensures that they are reliable characteristics (Gupta, 1998; Gupta and Griffiths, 2002; Ajawatanawong and Baldauf, 2013). Because these conserved indels in gene/protein sequences result from highly specific and rare genetic changes, when such an indel is uniquely found in a phylogenetically coherent group of species, the most parsimonious explanation is that the genetic change responsible for it occurred once in a common ancestor of the indicated group and was then passed on to its various descendants (Gupta, 1998, 2014; Rokas and Holland, 2000; Gao and Gupta, 2005). Based upon the presence or absence of a conserved indel in outgroup species, it is also possible to determine whether a given indel represents an insertion or a deletion (Gupta, 1998; Gao and Gupta, 2012).
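The outgroup-based reasoning for deciding whether an indel is an insertion or a deletion can be captured as a simplified, parsimony-style rule in Python. This is an illustrative sketch only: it assumes each outgroup sequence has already been scored as containing (True) or lacking (False) the indel segment, and it ignores tree topology and outgroup breadth, which a real analysis would weigh.

```python
from typing import Sequence


def indel_polarity(ingroup_has_segment: bool, outgroup_has_segment: Sequence[bool]) -> str:
    """Classify an indel by comparing the ingroup state with outgroup states.

    Each outgroup entry is True if that outgroup sequence contains the
    residues of the indel segment, False if it has a gap there.
    """
    if not outgroup_has_segment:
        raise ValueError("at least one outgroup sequence is required")
    if all(state == ingroup_has_segment for state in outgroup_has_segment):
        # Outgroups share the ingroup state, so that state is likely ancestral.
        return "ancestral"
    if all(state != ingroup_has_segment for state in outgroup_has_segment):
        # Ingroup differs from every outgroup: the change arose in the ingroup.
        return "insertion" if ingroup_has_segment else "deletion"
    # Outgroups disagree among themselves; polarity cannot be assigned.
    return "unresolved"
```

Under this rule, the 4 aa segment in the Bifidobacteriales EF-Tu homologs would be called an insertion, because the homologs from other bacteria lack it.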
Comparative analyses of protein sequence alignments from bifidobacteria species carried out in this work have led to the identification of 32 CSIs in a broad range of highly conserved proteins that are specifically found in different Bifidobacteriales taxa (see Table 1). One example of a CSI that is specific for all members of the order Bifidobacteriales is shown in Figure 2. In this case, a 4 amino acid (aa) insertion is present in a highly conserved region of the protein synthesis elongation factor EF-Tu; it is shared by all sequenced bifidobacteria species, but it is not found in any other bacteria within the top 500 BLAST hits. EF-Tu is a highly conserved protein that is universally present in all organisms (Harris et al., 2003), and the 4 aa CSI in this protein is a distinctive characteristic of the homologs from all sequenced Bifidobacteriales species. Sequence information for 31 other CSIs, which are also specifically shared by members of the order Bifidobacteriales and are present in proteins involved in a variety of other functions, is provided in Supplementary Figures S2-S32, and some of their characteristics are summarized in Table 1. Barring an isolated exception, all of the CSIs listed in Table 1 are specifically found in different members of the order Bifidobacteriales and are not present in the protein homologs from other bacteria. Owing to their specific presence in bifidobacteria species, the described CSIs provide novel molecular markers for distinguishing and demarcating members of the order Bifidobacteriales from all other bacteria. We have previously described 14 CSPs whose homologs were specifically found in the 13 different sequenced bifidobacteria species that were available at the time (Gao and Gupta, 2012).
Updated BLASTp searches on the sequences of these CSPs confirm that 10 of them, information for which is provided in Table 2, are still distinctive characteristics of members of the order Bifidobacteriales, and they provide additional molecular markers for the identification of, and functional studies on, bifidobacteria.

Molecular Signatures for Some of the Subclades of Bifidobacteriales
In the phylogenetic tree based on concatenated protein sequences, Bifidobacteriales species form a number of different clusters. At the deepest level, of the two main clusters that are observed, one consists of the genus Scardovia and related genera, whereas the other comprises species from the genera Bifidobacterium and Gardnerella. In our analyses, we have also identified a number of CSIs and CSPs that distinguish these two clades of the Bifidobacteriales. Figure 3 shows one example of a CSI, consisting of a 1 aa insertion in the DNA polymerase IV protein, that is specifically found in different Bifidobacterium species and Gardnerella but is not found in any of the sequenced Scardovia-related genera of the Bifidobacteriales. Two other CSIs, in the ribosomal RNA small subunit methyltransferase E protein and the GTP-binding protein YchF, are also specifically shared by members of the genera Bifidobacterium and Gardnerella. Sequence information for these CSIs is presented in Supplementary Figures S33, S34, and some of their characteristics are summarized in Table 3. Additionally, we have confirmed that the homologs of 5 of the 6 previously described CSPs (Gao and Gupta, 2012), information for which is summarized in Table 2, are present only in members of these two genera.

Frontiers in Microbiology | www.frontiersin.org

FIGURE 2 | Continued
GenBank identification numbers of the protein sequences are shown, and the topmost numbers indicate the position of this sequence in the species shown on the top line. Due to space constraints, sequence information for different subspecies is not shown. However, unless otherwise indicated, these CSIs are present in the sequenced subspecies of B. longum, B. animalis, B. pseudolongum, and B. thermacidophilum. Information for a large number of other CSIs that are also specific for the order Bifidobacteriales is presented in Table 1 and Supplementary Figures S2-S32.
We have also identified a number of CSIs that are commonly and specifically shared by members of the genus Scardovia and related genera for which sequence information is available. One example of a CSI specifically found in members of the genera Scardovia, Parascardovia, and Alloscardovia, consisting of a 1 aa insertion in the triosephosphate isomerase protein, is presented in Figure 4. Four other CSIs in four different proteins (viz. FHA domain protein, glycosyl transferase, PAC2 family protein, and phosphate ABC transporter substrate-binding protein) are also largely specific for these genera of Bifidobacteriales. Sequence information for these CSIs is provided in Supplementary Figures S36-S39, and their characteristics are summarized in Table 3. Interestingly, the CSIs in the glycosyl transferase and PAC2 family proteins are also shared by G. vaginalis.
A number of distinct clusters within the genus Bifidobacterium are consistently observed in different phylogenetic studies, including in the phylogenetic trees constructed in this work (Figure 1). Several CSIs identified in our work serve to distinguish some of these Bifidobacterium clusters. Three of the identified CSIs are specific for the B. longum group, and sequence information for one of them, consisting of a 1 aa insertion in phosphogluconate dehydrogenase, is shown in Figure 5. Sequence information for the other 2 CSIs, which are also specific for a subgroup of species from the B. longum clade, is presented in Supplementary Figures S40, S41, and their characteristics are summarized in Table 3. One additional CSI, consisting of a 1 aa insertion in the transketolase protein, is specifically shared by members of the B. longum, B. bifidum, and B. adolescentis clades. Members of these clusters group together in phylogenetic trees, and the shared presence of this CSI supports the view that the members of these taxa are more closely and specifically related to each other.
The members of the B. asteroides cluster form the deepest-branching group within the genus Bifidobacterium. A number of CSIs identified in this study are specific for groups of species that are either part of the B. asteroides clade or related to this clade. The B. asteroides clade is demarcated as consisting of the species B. asteroides, B. indicum, B. coryneforme, and B. actinocoloniiforme (marked cluster III in Figure 1) (Sun et al., 2015). Surprisingly, no CSI was identified in our work that was commonly shared by all of the species from this clade. However, our work identified four CSIs for a cluster (cluster II) comprising all of the other species of the B. asteroides clade except B. actinocoloniiforme, which is the deepest-branching member of this clade. One example of a CSI specific for members of the B. asteroides cluster II, consisting of a 1 aa insertion in the purine biosynthesis protein purH, is shown in Figure 6A. Sequence information for three other CSIs that are also specific for the B. asteroides group is presented in Supplementary Figures S43-S45. In our phylogenetic trees, as well as in different identified signatures, two Bifidobacterium spp. strains, A11 and 7101, isolated from honey bee guts (Anderson et al., 2013), also consistently group with B. asteroides. Two CSIs identified in our work are specifically shared by B. asteroides, Bifidobacterium sp. A11, and Bifidobacterium sp. 7101 (referred to as B. asteroides cluster I), providing additional evidence of the close relationship of these Bifidobacterium strains to B. asteroides. Sequence information for these CSIs is presented in Supplementary Figure S46.
Lastly, one additional CSI identified in this work, consisting of a 3 aa insertion in a conserved region of the protein 5'-methylthioadenosine nucleosidase, is commonly shared by all members of the B. asteroides clade as well as by B. crudilactis and B. psychraerophilum. The latter two species form a deeper-branching cluster that appears to be specifically related to the B. asteroides clade in the tree based on concatenated protein sequences (marked as B. asteroides cluster IV in Figure 1). The shared presence of this CSI in the B. asteroides clade and in B. crudilactis and B. psychraerophilum supports the inference that these species are specifically related to the B. asteroides clade. In addition to the described CSIs, BLAST searches on the protein sequences of B. asteroides have also identified 5 CSPs whose homologs are specifically present in the members of the B. asteroides group of species. Information for these CSPs is also presented in Table 2. Of these CSPs, three are specific for the commonly described B. asteroides clade (cluster III in Figure 1), whereas the remaining two are specific for clusters I and II.

FIGURE 3 | Partial sequence alignment of DNA polymerase IV showing a 1 aa insertion that is specific for the Bifidobacterium and Gardnerella species but not found in any other Bifidobacteriales. Information for other CSIs specific for this clade is presented in Table 3 and Supplementary Figures S33-S35.

DISCUSSION
Members of the order Bifidobacteriales are one of the main groups of bacteria in which several members exhibit health-promoting probiotic effects on humans (Biavati et al., 2000; Biavati and Mattarelli, 2006; Ventura et al., 2007b, 2009a; Cronin et al., 2011; Turroni et al., 2011). Other Bifidobacteriales species are implicated in the development of dental caries, as well as bacterial vaginosis and urinary tract infections (Bradshaw et al., 2006; Mantzourani et al., 2009; Ventura et al., 2009b; Kenyon and Osbak, 2014). However, very little is known at present concerning the genetic or biochemical characteristics of these bacteria that mediate their beneficial or pathogenic effects. In the present work, we have carried out detailed phylogenetic and comparative analyses of protein sequences from the genomes of Bifidobacteriales species to examine their evolutionary relationships in depth and to identify molecular markers that are unique to these bacteria at multiple phylogenetic levels. Based on a robust and comprehensive phylogenetic tree for the Bifidobacteriales species, constructed from 614 core proteins of the sequenced genomes, the following inferences regarding the evolutionary relationships among the Bifidobacteriales species could be made. (i) The sequenced Bifidobacteriales species appear to form two main clusters: a deeper clade consisting of the Scardovia-related genera (viz. Scardovia, Parascardovia, and Alloscardovia) and another cluster grouping together the genera Bifidobacterium and Gardnerella. (ii) Gardnerella vaginalis, rather than branching separately, consistently branches in between different Bifidobacterium species. (iii) Within the genus Bifidobacterium, a number of distinct clusters, referred to as the B. asteroides, B. pseudolongum, B. longum, B. bifidum, B. adolescentis, B. pullorum, and B. boum groups, are observed, as described in earlier work (Sun et al., 2015). Of these clusters, the B. asteroides group forms the deepest-branching lineage within the genus Bifidobacterium (Bottacini et al., 2012; Lugli et al., 2014; Sun et al., 2015).

FIGURE 4 | Example of a 1 aa conserved signature indel in the protein triosephosphate isomerase that is specific for the Scardovia clade comprising the genera Scardovia, Parascardovia, Metascardovia, and Alloscardovia. Information for other CSIs specific for this clade is presented in Table 3 and Supplementary Figures S36-S39.
The present work has also identified a large number of novel molecular signatures, in the form of CSIs and CSPs, that are specific characteristics of members of the order Bifidobacteriales at multiple phylogenetic levels. Of these signatures, 32 CSIs and 10 CSPs are specific for the entire order Bifidobacteriales. The identified Bifidobacteriales-specific CSIs are present in assorted, widely distributed proteins carrying out a wide variety of cellular functions, whereas all 10 Bifidobacteriales-specific CSPs are proteins of unknown function. Given the specificity of these CSIs and CSPs for the Bifidobacteriales, the genetic changes leading to these molecular characteristics most likely occurred in a common ancestor of the Bifidobacteriales (Gupta, 2005, 2012). Additionally, our analyses have identified many other molecular signatures (CSIs and CSPs) that independently support the existence of a number of clades of bifidobacteria that are consistently observed in phylogenetic trees. These signatures demarcate (i) a clade encompassing the genera Scardovia, Parascardovia, and Alloscardovia; (ii) a clade of Bifidobacterium and Gardnerella species, which share signatures to the exclusion of other bifidobacteria; and (iii) specific clusters of B. asteroides- or B. longum-related species.

FIGURE 6 | Conserved signature indels that are specific for the B. asteroides-related clades of the Bifidobacteriales. (A) Partial sequence alignment of the purine biosynthesis protein purH showing a 1 aa insertion that is specific for the B. asteroides cluster II species in the protein tree (Figure 1); (B) Excerpt from the sequence alignment of the protein 5'-methylthioadenosine nucleosidase showing a 3 aa insertion that is specific for the B. asteroides-related cluster IV in the protein tree.
The order Bifidobacteriales presently contains a single family, Bifidobacteriaceae. Based upon the results of phylogenomic studies and the identified molecular signatures, it appears that the members of this order could be divided into two family-level groups, one comprising the Scardovia-related genera (viz. Scardovia, Parascardovia, and Alloscardovia) and the other consisting of the genera Bifidobacterium and Gardnerella. However, genome sequence information for members of several newly described Scardovia-related genera (viz. Aeriscardovia, Neoscardovia, and Pseudoscardovia) is lacking at present (Simpson et al., 2004; García-Aljaro et al., 2012; Killer et al., 2013). In future studies, depending upon whether the species from these genera branch with the Scardovia clade and share the molecular signatures specific for this clade, the possibility of dividing the order Bifidobacteriales into two or more families could be considered.
The genus Bifidobacterium, which comprises 49 species and subspecies, contains most of the recognized taxa within the order Bifidobacteriales. Although earlier phylogenetic studies have consistently observed 6-7 distinct clusters of Bifidobacterium species (Ventura et al., 2006, 2007b; Turroni et al., 2011; Lugli et al., 2014; Sun et al., 2015), no attempt has been made to formally recognize any of these clusters, owing to the lack of other distinguishing characteristics. In our work, we have identified a number of molecular signatures that are either completely or largely specific for the members of two of these clusters (viz. the B. asteroides and B. longum groups). Of these clusters, the distinctness of the B. asteroides group (comprising the species B. asteroides, B. indicum, B. coryneforme, B. actinocoloniiforme, B. sp. A11, and B. sp. 7101), which forms the deepest-branching lineage within the genus Bifidobacterium, is supported by 2 CSIs and 4 CSPs that are uniquely shared by most members of this clade. Further, most of the species in the B. asteroides clade have been isolated from the gastrointestinal tract of honey bees and, unlike other bifidobacteria, they are also capable of respiratory metabolism (Killer et al., 2010, 2011; Bottacini et al., 2012; Lugli et al., 2014; Sun et al., 2015). Taken together, these characteristics indicate that the B. asteroides clade is a good candidate for recognition as a distinct genus-level taxon within the order Bifidobacteriales.
In addition to their utility for taxonomic and diagnostic studies (Ahmod et al., 2011; Gupta, 2014; Wong et al., 2014), the molecular markers for the order Bifidobacteriales and some of its clades also provide important new tools for genetic and biochemical studies. Earlier work on a number of CSIs in the Hsp60 and Hsp70 proteins has established that both large and small CSIs in conserved proteins are essential for the groups of organisms in which they are found (Singh and Gupta, 2009; Gupta, 2016b). Removal of these CSIs, or any significant change in them, was shown to be incompatible with the growth of the CSI-containing organisms. By analogy, the CSIs identified here are predicted to play essential roles in the organisms in which they are found. Structural studies on several CSIs have shown that the corresponding sequences are located in surface loops of the proteins (Singh and Gupta, 2009; Gupta and Khadka, 2016). Limited structural work that we have carried out on some of the Bifidobacteriales-specific CSIs also shows that these CSIs are located in surface loops. One example of the structural location of a Bifidobacteriales-specific CSI is illustrated in Figure 7. In this case, a homology model of the protein synthesis elongation factor Tu (EF-Tu) from B. longum was created to determine the location of the 4 aa Bifidobacteriales-specific CSI found in this protein. The structural comparison of EF-Tu from B. longum and E. coli shown in Figure 7 reveals that the CSI in the B. longum homolog is located in a surface loop within the GTPase domain of EF-Tu. Surface loops play an important role in mediating protein-protein and protein-ligand interactions, and the identified CSIs are therefore expected to be involved in mediating novel interactions that are specific and essential for the CSI-containing organisms (Akiva et al., 2008; Hashimoto and Panchenko, 2010).
Similar to the CSI in the EF-Tu protein, our work has identified numerous other Bifidobacteriales-specific CSIs in different essential proteins. Functional studies on the proteins harboring these CSIs provide an important means of discovering novel biochemical characteristics that are unique to either all Bifidobacteriales or specific clades of these bacteria, and they could also provide useful insights into the growth-promoting as well as the pathogenic effects of some of these bacteria.

AUTHOR CONTRIBUTIONS
GZ, BG, MA, and BK carried out comparative analyses of the Bifidobacteriales genomes to identify the signatures reported here. ZG and MA constructed the phylogenetic trees, and BK carried out homology modeling of the protein sequences. BG, MA, and RG were responsible for writing and editing the manuscript. All of the work was carried out under the direction of RG.