groEL Gene-Based Phylogenetic Analysis of Lactobacillus Species by High-Throughput Sequencing

Lactobacillus is a fairly diverse genus of bacteria with more than 260 species and subspecies. Many profiling methods have been developed to carry out phylogenetic analysis of this complex and diverse genus, but limitations remain since there is still a lack of comprehensive and accurate analytical method to profile this genus at species level. To overcome these limitations, a Lactobacillus-specific primer set was developed targeting a hypervariable region in the groEL gene—a single-copy gene that has undergone rapid mutation and evolution. The results showed that this methodology could accurately perform taxonomic identification of Lactobacillus down to the species level. Its detection limit was as low as 104 colony-forming units (cfu)/mL for Lactobacillus species. The assessment of detection specificity using the Lactobacillus groEL profiling method found that Lactobacillus, Pediococcus, Weissella, and Leuconostoc genus could be distinguished, but non-Lactobacillus Genus Complex could not be detected. The groEL gene sequencing and Miseq high-throughput approach were adopted to estimate the richness and diversity of Lactobacillus species in different ecosystems. The method was tested using kurut (fermented yak milk) samples and fecal samples of human, rat, and mouse. The results indicated that Lactobacillus mucosae was the predominant gut Lactobacillus species among Chinese, and L. johnsonii accounted for the majority of lactobacilli in rat and mouse gut. Meanwhile, L. delbrueckii subsp. bulgaricus had the highest relative abundance of Lactobacillus in kurut. Thus, this groEL gene profiling method is expected to promote the application of Lactobacillus for industrial production and therapeutic purpose.


Introduction
Lactobacillus genus is a Gram-positive, non-spore-forming, catalase-negative, rhabditiform, facultative anaerobic or microaerophilic group of bacteria [1]. According to the fermentation products, Lactobacillus species can be segmented into homofermentative and heterofermentative lactobacilli. Due to fastidious nutritional requirement, Lactobacillus lives in nutrient-rich environment, such as fermented or spoiled foods, animal feed, plant and soil surfaces, and bodies of invertebrate and Leuconostoc lactis, Pediococcus acidilactici, Pediococcus pentosaceus, and Weissella cibaria were grown in de Man-Rogosa-Sharpe (MRS) medium with 0.05% (wt/vol) L-cysteine hydrochloride in an anaerobic environment at 37 • C. All bacterial strains employed in the research were obtained from the Culture Collection of Food Microorganisms of Jiangnan University (Wuxi, China).
The genomes of all above bacterial strains were extracted and purified as described earlier [21].

Sample Collection and Genome Extraction
In this study, six human, six rat, and six mouse fecal samples were collected. The methods of collection, transportation, and storage of the stool samples were reported previously [15]. The bacterial genome DNA of these samples were extracted with FastDNA SPIN Kit for Feces (MP Biomedicals, Carlsbad, CA, USA) in accordance with producer's suggestions.
Traditional fermented yak milk samples were retrieved from the herdsmen's homes in the pasture region of Qinghai, China. The samples were collected, transported, and stored as reported previously [22]. The genomic DNA from these kurut samples was extracted using FastDNA Spin Kit for Soil (MP Biomedicals, Carlsbad, CA, USA) in accordance with producer's instruction.
All procedures followed in this research were authorized by the Ethical Committee of Jiangnan University (Wuxi, China).

Selection of Lactobacillus groEL Gene
The sequences of 73 Lactobacillus core genes based on a previous study [9] were retrieved from the National Center for Biotechnology Information (NCBI) genome database and European Molecular Biology Laboratory (EMBL) database. With MEGA7.0 software, the sequences of genome were aligned using ClustalW program, and phylogenetic trees were established using maximum likelihood (ML) method with a supporting bootstrap value of 1000. The resulting gene sequences were filtered based on the results in the literature [15,23]. Accordingly, groEL gene was chosen as the marker gene for sequencing analysis of the Lactobacillus strains.

Phylogenetic Analysis
Phylogenetic trees on the basis of groEL gene and the V3-V4 region of 16S rRNA gene of Lactobacillus strains (Table S4) were constructed using MEGA7.0 software [24,25]. The ClustalW program was used for sequence alignment [26], and maximum likelihood (ML) method was used to yield phylogenetic trees with bootstrap support value of 1000.

Lactobacillus groEL-Specific Primer Design
The available groEL gene sequences of 108 species of Lactobacillus (Table S2) were downloaded from the NCBI and EMBL databases, and sequences alignment was conducted using the ClustalW program [26]. The results of the analysis were then used to guide the designing of primers using the PRIMER [27] and OLIGO [28] software. The sequences of designed degenerate primers were as follows: forward primer Lac_groEL_F, 5 -GCYGGTGCWAACCCNGTTGG-3 and reverse primer Lac_groEL_R, 5 -AANGTNCCVCGVATCTTGTT-3 . The WebLogo pictures of primers were made using the website (http://weblogo.berkeley.edu/) [10].
Shanghai Shengong Biological Engineering Co., Ltd. (Shanghai, China) was responsible for the synthesis of the designed primers.

Evaluation of the Accuracy and Specificity of the Degenerate Primers
A mock Lactobacillus community was prepared using nine Lactobacillus strains (Table S3) that were cultured separately on MRS medium in an anaerobic atmosphere until they reached the late log phase. The DNA extraction methods of these strains were consistent with those reported in previous literature [15] and mixed in a ratio of 0.01 ng to 50 ng. The colony-forming units (cfu) of the Lactobacillus strains were calculated based on the groEL gene copy numbers by the means of a calculator provided by the URI Genomics & Sequencing Center (http://cels.uri.edu/gsc/cndna.html). Subsequently, PRIMER-BLAST was applied to perform specific testing by in silico PCR amplification, and NCBI non-redundant (nr) database was used as a reference database [29]. Three parallel experiments were carried out.

PCR Conditions, Quantifiable and Sequencing Measures
The extracted DNA of 35 Lactobacillus strains and 12 non-Lactobacillus strains described above were used for PCR amplification. The PCR conditions used in groEL profiling were as followed: 5 min at 95 • C, followed by 35 cycles of 30 s at 95 • C, 30 s at 58 • C, 60 s at 72 • C, and final elongation at 72 • C for 7 min. Furthermore, the DNA extracted from fecal and kurut samples were used for amplification of partial groEL gene using the primers Lac_groEL_F/Lac_groEL_R, as well as the V3-V4 region within 16S rRNA gene and primer pair 341F/806R was used on the basis of previously reported protocol [30]. Further purification and quantification of all PCR amplified products were conducted according to reported protocols [15]. The purified DNA was sequenced by 2 × 300-bp paired-end Illumina sequencing. The DNA amplicons library was made with TruSeq DNA LT Sample Preparation Kit (Illumina, San Diego, CA, USA) and sequenced by MiSeq Illumina platform with the MiSeq v3 Reagent Kit (600 cycles) according to manufacturer's instructions.

Cross-Alignment Analysis
All available universal target (UT) sequences of groEL gene of Lactobacillus were retrieved from the chaperonin database, which contains 533 strains correspond to 129 Lactobacillus species (Table S5). With MatGAT software, cross-alignment of all sequences was conducted.

Selection of the Lactobacillus groEL Gene
The sequences of 73 core genes based on previous studies were retrieved for the selection of marker gene for Lactobacillus genus. The results showed the following distinct types of core genes: first, core genes without highly variable region sequences and low resolution that could not be used to distinguish between Lactobacillus species; second, core genes with highly variable regions and high resolution that could distinguish between different Lactobacillus species but their ends did not contain relatively conserved sequences, which restrict the designing of primers; third, core genes with highly variable regions, high resolution to distinguish between Lactobacillus species, and relatively conserved sequences at their ends, e.g., the groEL gene, thus suitable for designing primers. Therefore, the groEL and 16S rRNA gene sequences of all Lactobacillus species that deposited in the EMBL and NCBI databases were downloaded and pairwise sequence similarity using BLAST were performed. The results revealed that the lowest and average pairwise similarity values of the groEL gene were 69.63% (L. iners and L. ingluviei) and 79.61%, respectively, whereas those of the 16S rRNA gene were 83.77% (L. thailandensis and L. iners) and 89.58% respectively. These indicate that the groEL gene evolves at a high rate that allows identification of Lactobacillus species more accurately when comparing to 16S rRNA. What's more, discrimination of the 129 species of Lactobacillus was achieved based on the cross-alignment of groEL sequences (Table S5). As a result, the groEL gene was selected as the core gene for identification of Lactobacillus species.

Phylogenetic Analysis of the Lactobacillus groEL Gene
The members of Lactobacillus genus were previously separated into 24 phylogenetic groups including Pediococcus genus based on the concatenated protein sequences of single-copy core genes. In view of this, a phylogenetic tree was constructed based on the groEL gene sequences containing 97 Lactobacillus strains and Pediococcus strains of 22 phylogenetic groups, using the Maximum Likelihood method with 1000 bootstrap values ( Figure 1). According to the phylogenetic tree, Lactobacillus amylovorus, Lactobacillus acidophilus, Lactobacillus cripatus, Lactobacillus gallinarum, Lactobacillus helveticus, Lactobacillus gasseri, Lactobacillus johnsonii, and Lactobacillus iners were gathered into a large cluster, whereas Lactobacillus delbrueckii subspecies were individually assembled into a cluster. Meanwhile, Lactobacillus ruminis was separated from the Lactobacillus salivarius group when the phylogenetic tree was built on the basis of groEL gene. On the other hand, the closely related Lactobacillus species and subspecies formed clusters, and distinct Lactobacillus species formed different clusters. The Pediococcus species were gathered into an individual branch on the phylogenetic tree. However, the phylogenetic dendrogram using the 16S rRNA gene indicated some distinct Lactobacillus species were grouped into the same clusters, such as Lactobacillus iners and Lactobacillus amylophilus, Lactobacillus brevis and Lactobacillus amylophilus ( Figure S1). These results suggest that groEL gene could better differentiate between the Lactobacillus species when compared to 16S rRNA gene.

Design of Novel Primer Sets for Identifying Lactobacillus
The NCBI database and the chaperonin sequence database contain enough groEL gene sequences of Lactobacillus strains to meet the requirements of sequence alignment and new species identification. The emergence of next-generation sequencing technology facilitates amplification and sequencing of partial gene sequences of bacterium. The full-length groEL gene is approximately 1600 bp long, which is not suitable for short-read sequencing platform. Therefore, partial Lactobacillus groEL gene was used for sequencing and designing a novel primer set (Lac_groEL_F and Lac_groEL_R) as per the primer design criteria using the MiSeq Illumina sequencing platform. Based on the results of multiple sequence alignment, the 340-824 bp region of the groEL gene was targeted for PCR amplification. After the procedure of PCR, the amplicon was about 480 bp which could be sequenced by 2 × 300-bp paired-end Illumina sequencing. Figure 2 shows the sequence conservation of the groEL gene as represented by WebLogo images.

Development of Biological Information Analysis Tool for Lactobacillus
All of the available complete sequences of the Lactobacillus groEL gene were downloaded from the NCBI, EMBL, and chaperonin sequence database and summarized in an annotation library (Lac_groEL_database). This database is constantly updated with newly obtained sequences of the Lactobacillus groEL gene.
Disassembly data were analyzed with the QIIME1.9.1 software. Briefly, disassembly sequences were first screened to select sequences with quality Q value higher than 30, and high-quality sequences were spliced with Flash software package. The splicing condition was overlap base number of >10 bp, and no mismatch base was found. The barcode and primer sequences were then removed to obtain the final effective sequence. Notably, sequence similarity of 97% could be clustered into one operational taxonomic unit (OTU). OTUs based on the gene composition of Lactobacillus groEL were comparable with those in Lac_groEL_database.

Performance Evaluation of the Degenerate Primers for Identifying Lactobacillus Species
In silico PCR was performed using PRIMER-BLAST for evaluating the specificity of the novel primer set. The result showed that only one amplicon was generated from the Lactobacillus genomes, indicating that the primer set had good specificity for Lactobacillus. Further, an in vitro experiment was carried out by using the primers Lac_groEL_F/Lac_groEL_R for amplifying DNA obtained from 35 Lactobacillus and 12 non-Lactobacillus species. The results showed that PCR product could be obtained using the template DNA extracted from Lactobacillus, Pediococcus, Weissella, and Leuconostoc species ( Figure S2), indicating that the newly designed primer set could distinguish Lactobacillus and genera historically associated with or grouped within the lactobacilli. Other bacteria cannot be amplified by corresponding gene segments in the research.
In addition, the ability to distinguish Lactobacillus accurately and sensitively by the new degenerate primers were evaluated. Briefly, the DNA of different Lactobacillus species were mixed in known quantities. After PCR amplification with new primers for this artificial Lactobacillus community, the amplicons were sequenced using MiSeq Illumina sequencing platform. The findings indicated that the predicted relative concentration of Lactobacillus had a good correlation with the relative concentration measured using the degenerate primer pair Lac_groEL_F/Lac_groEL_R, proving the accuracy of the new primers ( Figure 3). The lowest amount of Lactobacillus DNA, by the means of MiSeq Illumina sequencing platform, was 0.05 ng corresponding with a concentration of 10 4 cfu/mL. Hence, the detection limit of the newly constructed primers was considered 10 4 cfu/mL.

Resolution of Partial groEL Gene of Lactobacillus by Analyzing Samples from Varied Habitats
To evaluate the overall performance and resolution of the Lactobacillus new primers in different ecosystems, the Lactobacillus diversity of 24 samples, including six human fecal samples, six rat fecal samples, six mouse fecal samples, and six fermented yak milk samples was analyzed. Moreover, the V3-V4 region of 16S rRNA amplicons of these samples were sequenced. The 16S rRNA gene method assigned approximately 0.05-25% of the reads from the stool samples to the Lactobacillus genus, and the Lactobacillus species composition was not clear due to the limited resolution of the universal primer set 341F/806R. In contrast, using the groEL gene-based method, nearly all of the amplified sequences clustered as Lactobacillus and classified down to species level. Table 1 shows the microflora composition of four different habitats on the basis of 16S rRNA profiling method. Table 2 shows the species composition of the habitats on the basis of groEL gene profiling method. The values represent the average of the relative abundance of the microbes of six samples, and the value in parentheses is the standard error of the sample. The relative abundance values of genera and species less than 0.01 are omitted.
Further, using the groEL gene profiling, L. mucosae accounting for about 60% lactobacilli was found to be the most abundant Lactobacillus in people's intestinal tract, followed by L. gasseri and L. salivarius (Table 2). In contrast, partial 16S rRNA gene profiling revealed that the Lactobacillus genus accounted for less than 3% of the human intestinal flora, the average is 1% ( Table 1).
Analysis of genome DNA from six rat and six mouse fecal samples based on novel primer set Lac_groEL_F/Lac_groEL_R suggested the usefulness of this method for accurate cataloging of Lactobacillus species. In particular, L. johnsonii and L. intestinalis showed the highest relative abundance among the Lactobacillus population in the rat gut (Table 2), whereas L. johnsonii and L. reuteri were shown to be the main Lactobacillus in mouse gut (Table 2). Interestingly, 16S rRNA gene profiling method showed that the Lactobacillus genus accounted for more than 10% of the microbes in the rat and mouse intestines (Table 1).
In addition to using the method developed in this paper for studying the composition of lactobacilli in complex ecosystems, the formation of Lactobacillus species was also investigated in a relatively simple system. Tibetan people in Qinghai, China, prepare kurut using traditional methods. Six kurut samples were profiled using Lac_groEL_F/Lac_groEL_R, and the results showed that L. delbrueckii subsp. bulgaricus was the most abundant Lactobacillus species. Meanwhile, 16S rRNA gene profiling revealed that more than 82% of the microbes in kurut are Lactobacillus, followed by the Streptococcus genus at approximately 15% (Table 1).
Notably, OTU analysis of the results obtained from partial groEL and 16S rRNA gene profiling methods demonstrated 97% identity. Lactobacillus species with a relative abundance of less than 0.01 in each sample were excluded as "other Lactobacillus <1%" in the table, and OTU which could not be clustered as Lactobacillus was shown as "unassigned" when the results were based on groEL gene profiling method. The genera with a relative abundance of <1% in each sample were excluded as "other Genera <1%" in the table, and unassigned genera whose relative abundance were less than approximately 5% includes OTUs that were temporarily not recognized as any other genus when the results were based on 16S rRNA profiling method.

Discussion
To design a Lactobacillus-specific primer pair using MiSeq Illumina sequencing platform, several criteria need to be satisfied: first, the target gene must be present across all Lactobacillus species; second, the target gene should be sufficient to distinguish different species; third, the target gene should involve a highly variable region flanked by conserved regions on both sides; fourth, the region targeted for amplification within the target gene should be less than 500 bp long; and fifth, sufficient number of target gene sequences should be available [15,31]. The primer pair designed in this study met these criteria.
The detection limit of Lac_groEL_F/Lac_groEL_R was 10 4 cfu/mL estimated from the means of MiSeq Illumina sequencing platform. Notably, in one previous study, the detection limit of Lactobacillus ITS profiling was 1.85 × 10 3 cells/µL [10], but ITS marker has different operon copy number within a genome which may skew the diversity estimates of bacterial communities. Since groEL is a single-copy gene in Lactobacillus species, the estimates obtained in this study could be used as a reference. The assessment of detection specificity using the Lactobacillus groEL profiling method found that Lactobacillus, Pediococcus, Weissella, and Leuconostoc genus could be distinguished, but non-Lactobacillus Genus Complex could not be detected, indicating that the primers Lac_groEL_F/Lac_groEL_R are not specific only for Lactobacillus. It is not clear that Lactobacillus present in the microbial flora of the sample if the results merely based on the PCR amplification and gel imaging. Further confirmation would be needed by sequencing and alignment.
To evaluate the performance of groEL gene profiling method, Lactobacillus species diversity was assessed in different ecosystems. The results showed the composition and abundance of Lactobacillus species in people's intestinal tract varied between different people. L. mucosae had the highest relative abundance of Lactobacillus in the intestine of Chinese, inconsistent with the previous finding that L. acidophilus and L. helveticus were the two dominant species in the gut of adult humans [32]. This discrepancy between the results could be explained by the finding that gut microbial composition differs between individuals based on internal and external factors such as diet, ethnicity, age, and physiological condition [33][34][35]. Furthermore, L. delbrueckii subsp. bulgaricus showed the highest relative abundance among Lactobacillus species in fermented yak milk samples, consistent with previous results [21]. The evaluation of the species diversity of Lactobacillus under different ecological environments is expected to provide valuable information for promoting the application of Lactobacillus for industrial production and therapeutic purpose.
There have been many classification methods since Lactobacillus was introduced in 1901, and it is likely that no single gene can distinguish between all Lactobacillus species. Thus, similar to most methods, groEL gene profiling is not without drawbacks. Cross-alignment analysis showed that lactobacilli have extremely high phylogenetic diversity (Table S5). Notably, the groEL sequences downloaded from the chaperonin sequence database (cpnDB) includes two strains of Lactobacillus plantarum and Lactobacillus rumis which are misclassified as Lactobacillus salivarius. In fact, Lactobacillus plantarum cannot be identical with Lactobacillus salivarius since these species are so distant on the basis of groEL gene sequence. Table 3 showed that 9 species shared groEL gene sequence identity over 97% with one or more Lactobacillus species. Thus, caution should be exercised when identifying these Lactobacillus species using the groEL gene as a genetic marker. The combination of groEL gene profiling with profiling based on 16S rDNA, rpoB, clpC, or any other gene may be more effective in distinguishing between Lactobacillus species. Moreover, complete genomic information, including that on groEL gene sequences, of some Lactobacillus species is not available. These species would be labeled as "unassigned Lactobacillus" when annotated in database. Thus, the database needs to be updated regularly when new species are discovered to enable the identification of all Lactobacillus species. Furthermore, Pediococcus, Weissella, and Leuconostoc genus produced corresponding fragments when using the degenerate primers for PCR amplification, indicating that the primers may be effective when assessing genera that are closely related to Lactobacillus genus. Table 3. Lactobacillus species with groEL sequence identity of over 97% with another Lactobacillus species.

Species
groEL identity % with the Closet Species 2 Closest Species

Conclusions
In conclusion, a newly designed primer pair based on groEL gene could distinguish Lactobacillus species from other non-Lactobacillus Genus Complex bacterial species, with a detection limit of 10 4 cfu/mL. The results of characterization of the Lactobacillus population in different ecosystems, such as human, rat, and mouse gut and traditional fermented yak milk, demonstrated the efficacy of this new groEL gene-based profiling method, indicating its applicability for academic research and industrial purposes.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4425/10/7/530/s1, Figure S1: A phylogenetic tree using Maximum Likelihood method on the basis of v3v4 region of 16S rRNA gene sequences for lactobacilli, Figure S2: Specificity of the newly designed primer pair in amplifying the selected partial groEL gene. Table S1: Lactobacillus strains used for primer specificity test, Table S2: 108 species of Lactobacillus used for primers design, Table S3: List of strains used for creation of the artificial sample, Table S4: List of Lactobacillus strains used in phylogenetic trees based on the v3v4 region of 16S rRNA and groEL gene, Table S5