Revisiting the phylogeny of Wolbachia in Collembola

Abstract The endosymbiont Wolbachia has been detected in a few parthenogenetic collembolans sampled in Europe and America, including three species of Poduromorpha, two species of Entomobryomorpha, and two species of Neelipleona. Based on 16S rRNA and ftsZ gene sequences, most of the Wolbachia infecting parthenogenetic collembolans were characterized as members of supergroup E and showed concordant phylogeny with their hosts. However, the two neelipleonan symbionts form another unique group, indicating that Wolbachia has infected parthenogenetic collembolans multiple times. In this study, five parthenogenetic collembolan species were identified as hosts of Wolbachia, and four new Wolbachia strains were reported for four collembolan species sampled in China, respectively, including a neelipleonan strain from Megalothorax incertus (wMinc). Our results demonstrated that the Wolbachia multilocus sequence typing (MLST) system is superior to the 16S rRNA + ftsZ approach for phylogenetic analyses of collembolan Wolbachia. The MLST system assigned these Wolbachia of parthenogenetic collembolans to supergroup E as a unique clade, which included wMinc, supporting the monophyletic origin of Wolbachia in parthenogenetic collembolan species. Moreover, our data suggested supergroup E as one of the most divergent lineages in Wolbachia and revealed the discrepancy between the phylogenies of Wolbachia from parthenogenetic collembolans and their hosts, which may result from the high level of genetic divergence between collembolan Wolbachia, in association with the geographic differentiation of their hosts, or the possible horizontal transmission of Wolbachia between different collembolan species.


| INTRODUCTION
Wolbachia is an obligate endosymbiont that is widespread in arthropods and filarial nematodes. Prominent effects of Wolbachia infection in arthropods include reproductive manipulations, such as feminization of genetic males, parthenogenetic induction, killing of male progeny from infected females, and cytoplasmic incompatibility (Werren, Baldo, & Clark, 2008). Generally, these phenotypes result in an increased frequency of infected females in host populations and thus promote the maternal transmission of Wolbachia through generations (LePage & Bordenstein, 2013).
Wolbachia cannot be cultivated in vitro; thus, traditional classification methods are ineffective for characterizing Wolbachia species.
As a group of basal hexapods, Collembola (springtails) have been on earth for more than 400 million years. They are small but are abundant in most terrestrial ecosystems, living primarily in the soil and feeding on fungi or decaying plant material (Hopkin, 1997).
Based on a molecular typing system involving 16S rRNA and ftsZ genes, collembolan Wolbachia fall into three groups: (1) (Tanganelli et al., 2014). These findings suggest that there have been multiple occurrences of Wolbachia infection in parthenogenetic springtails. However, the neelid group was placed in different positions in trees based on 16S rRNA, ftsZ, or 16S rRNA + ftsZ; Thus, Tanganelli et al. (2014) defined the neelids as a "group" rather than a "supergroup" and proposed that further studies are necessary to confirm the accuracy of this designation. Currently, there is no MLST information on collembolan Wolbachia. With limited data, the phylogenetic position and transmission pattern of collembolan Wolbachia are hardly resolved.
In this study, we conducted the first diagnostic screening for Wolbachia in collembolan species sampled in China, covering the four orders of Collembola. As a result, we recovered four new Wolbachia strains from four parthenogenetic species, including the first case of Wolbachia infection in the family Onychiuridae. The phylogenetic positions of these symbionts, as well as the Wolbachia from a Danish population of F. candida (i.e., F. candida DK), were revealed using the Wolbachia MLST system and wsp gene. Our analyses supported all the newly identified strains belonging to supergroup E, including the one from neelipleonan species, and suggested that the phylogeny of collembolan Wolbachia is discordant with that of the hosts. In addition, by analyzing all the known Wolbachia of parthenogenetic collembolans, our study revealed the high genetic divergence within supergroup E.

| Collembolan species
Twelve lines of eleven collembolan species (

| Diagnostic screening for Wolbachia infection in Collembola
The presence of Wolbachia was tested via PCR amplification, specifically by targeting the Wolbachia 16S rRNA gene. Primary screening was performed on DNA samples extracted from multiple specimens (~100 individuals), with two sets of templates prepared for each collembolan line. For each confirmed host species of Wolbachia, the prevalence was calculated according to the screening results from 10 individuals. All extractions were conducted with the Wizard ® SV Genomic DNA Purification System (Promega). The primers employed for screenings are listed in Table S1.

| Amplification and sequencing of Wolbachia and collembolan genes
For each collembolan species carrying Wolbachia, we amplified the following genes from single specimens: seven Wolbachia genes, including the 16S rRNA gene, five MLST loci (gatB, coxA, hcpA, ftsZ, and fbpA), and the wsp gene; three host genes, including the mtCOI, 18S, and 28S rRNA genes. The primers employed for each gene are listed in Table S1. The PCR program was set to the following specifications: 3 min at 94°C, followed by 40 amplification cycles (30 s at 94°C, 30 s at the annealing temperature specific to the different primers, and 1 min per thousand base pairs at 72°C) and a 10 min final extension at 72°C.
Most of the PCR products were sequenced directly using the same primers used for amplification, except that some wsp genes were cloned for sequencing. Briefly, purified fragments were ligated into the pMD 19-T vector (Takara, Dalian) and transformed into competent Escherichia coli Top10 cells (TIANGEN). Putative positive clones were sequenced with M13 primers (Shanghai Sangon Biotech; Shanghai Sunny Biotechnology). DNA sequences were assembled with the embedded program Seqman from the DNASTAR package (Burland, 2000) and checked with BLAST at NCBI (http://blast.ncbi. nlm.nih.gov/Blast.cgi). Every gene sequence was verified from at least three individuals.

| Data assembly and phylogenetic analyses
The phylogenetic status of newly identified Wolbachia strains and their cophylogenetic patterns with hosts were examined using four For the dataset of Wolbachia 16S rRNA and ftsZ genes (Table S2), we chose 46 strains for which both gene sequences were available referring to Tanganelli et al. (2014). Sequences of other nine strains from supergroups A, B, F, I, K, and L were sampled from GenBank to optimize taxon sampling among Wolbachia supergroups and host taxonomy. Together with our new data on five Wolbachia strains, a total of 60 strains were included in phylogenetic analyses. The dataset of 57 taxa was also analyzed by excluding strains of supergroups I, K, and L, which were not covered in Tanganelli et al. (2014 (Table S3).
For the dataset of host genes, the corresponding sequences from one proturan (Baculentulus tianmushanensis) and two diplurans (Lepidocampa weberi and Octostigma sinensis) were retrieved from GenBank and used as outgroups (Table S4).
The nucleotide sequences of each gene were aligned separately on the GUIDANCE web server (Penn et al., 2010) using the algorithm MAFFT (Katoh & Standley, 2013). In the alignment of the wsp gene, columns with GUIDANCE scores lower than 0.93 were considered to be unreliable and were removed. For other genes, reliable regions of alignments were selected in GBLOCKS 0.91b (Castresana, 2000), with half of the gap positions being allowed. Finally, the alignments were checked for signs of intragenic recombination using the software package RDP3 (Martin et al., 2010), and recombinants were removed from the MLST and wsp datasets (Table S5). The alignments for each multiple-gene dataset were then concatenated, respectively, using the software BioEdit (Hall, 1999), with missing data assigned with gaps.

| Estimation of evolutionary divergence for collembolan Wolbachia
16S rRNA gene sequence is the only available data for all the seven Wolbachia strains reported from parthenogenetic collembolan (Czarnetzki & Tebbe, 2004;Tanganelli et al., 2014;Vandekerckhove et al., 1999). Pairwise genetic distances were estimated for the seven 16S rDNA sequences together with our newly recovered sequences from Wolbachia infecting parthenogenetic collembolans in MEGA6 (Tamura, Stecher, Peterson, Filipski, & Kumar, 2013), using the Kimura two-parameter model with gamma-shaped rate variation across sites.
The input alignment (1,237 nt in length) was generated with the method described above.
In order to further evaluate the divergence among Wolbachia of parthenogenetic collembolans, the mean pairwise distance within each Wolbachia supergroup was also calculated and compared, based on the alignments of 16S rRNA, ftsZ genes, and the MLST system used in phylogenetic analyses, respectively. All of these sequences were submitted to GenBank (Accession Nos. KT799584-KT799616), and the sequences of the MLST loci were accepted by the PubMLST database as well (Table S3).

| Phylogenetic analyses of collembolan Wolbachia
The alignment of the data assembly was 1,946 nt for the 60-taxon An independent clade was recovered for the three neelipleonan Wolbachia strains, including wMinc, which was apart from supergroup E, the clade for other symbionts of parthenogenetic collembolans (Fig.   S7A,B). wMinc itself could also form an independent clade, when six previously reported Wolbachia strains of parthenogenetic collembolans were excluded from analyses (Fig. S7C). Moreover, in the gene tree with 60 taxa, the "neelid group" was grouped with supergroups K and L (Fig. S7A). However, single-gene trees based on 16S rRNA and ftsZ showed different positions for collembolan Wolbachia strains with low support values (Fig. S7D,E).
ML and Bayesian analyses of the Wolbachia MLST system resulted in a new robust phylogeny: All of the collembolan Wolbachia strains identified in this study were grouped into a unique lineage as supergroup E, including wMinc, with high support values (ML: 91, Bayesian: 100; in percentage). All supergroups were distinguished from each other, and supergroup E was the sister-group to clade A + H (Figure 1).
This monophyletic clade for collembolan Wolbachia was consistently recovered in phylogenies of individual genes, although topologies of these trees showed some differences (Fig. S8).
The wsp phylogeny further confirmed that the clade recovered from parthenogenetic collembolan species was monophyletic, by placing wMinc in supergroup E (Fig. S9). In addition, none of the collembolan Wolbachia sequences were identified as an outcome of recombination between any other sequences in the MLST and wsp datasets (Table S5).

| Phylogenetic comparison between Wolbachia and Collembola
Although supergroup E was distinct and monophyletic, its members did not cluster with their hosts taxonomically. In the concatenated MLST phylogeny (Figure 2), wThou infecting T. houtanensis

| Genetic divergence of Wolbachia from parthenogenetic collembolans
Among all the known Wolbachia of parthenogenetic collembolans and Neelus murinus) are <2% from most of the other symbionts, except for comparisons with wMyos and wThou (Figure 3b). The genetic distance of 16S rRNA sequences >2% was considered necessary for establishing the new supergroup for Wolbachia (Augustinos et al., 2011); thus, the previously proposed "neelid group" was not supported.
In addition, compared with other Wolbachia supergroups, the average genetic distance of 16S rDNA within supergroup E is moderately divergent; however, its average genetic distances of protein-coding genes are much higher, no matter whether wMyos is included (Figure 3c), suggesting that supergroup E is the most divergent lineage in Wolbachia.

| DISCUSSION
In this study, all the Wolbachia recovered from parthenogenetic collembolans, including a strain from neelipleonan species, were assigned to supergroup E with the five MLST loci (Figure 1). On the contrast, the typing system based on 16S rRNA and ftsZ genes is insufficient in identifying supergroup E members for the following reasons. First, the 16S rRNA gene is highly conserved, with an average diversity of 3.61% between supergroups (Ros et al., 2009). The missing first 400 sites of the two available sequences from supergroup H in the 16S rRNA alignment further reduced the divergence between members of supergroups H and E, leading to a mixed clade of these groups in the phylogenetic tree of 16S rRNA (Fig. S7D). Second, the ftsZ sequences for the two reported strains of neelid group (Wolbachia of N. murinus and Meg. minimus) are only 480 bp in length and provide only 5/7 sites in the ftsZ alignment, while other E-clade members, as well as the related lineages such as supergroup H, K, and L, have few missing data.
With new data of collembolan Wolbachia that were recovered from Chinese springtails, we observed discordance between the phylogenies F I G U R E 2 Comparison between the phylogenies of the Wolbachia stains recovered in this study and their parthenogenetic collembolan hosts. The phylogeny of collembolan Wolbachia (left) was derived from the MLST tree (Figure 1). The phylogenetic relationships among host species (right) were examined with three outgroups (the proturan species Baculentulus tianmushanensis and two dipluran species, Lepidocampa weberi and Octostigma sinensis). The same tree topology was obtained from Bayesian and ML inferences based on the concatenated dataset of mtCOI, 18S rRNA, and 28S rRNA gene sequences. Collembolans are color-coded taxonomically at the level of orders F I G U R E 3 Genetic divergence between the Wolbachia infecting different parthenogenetic collembolan species. (a) Information on all the known parthenogenetic collembolans infected with Wolbachia. Host species (stocks) identified in this study are indicated in bold. Wolbachia of Mes. Macrochaeta was not included in our phylogenetic analyses (referring to Fig. S7), because its ftsZ gene sequence is not available. The GenBank accession number for its 16S rDNA sequence is AJ422184. (b) Pairwise distances for the 16S rRNA gene sequences of 12 Wolbachia strains infecting parthenogenetic collembolans are shown in the line graph. Wolbachia strains are represented with the name of their host species. The scale value of 0.02 is marked with a red line, because the value was considered a threshold to establish new supergroups for Wolbachia (Augustinos et al., 2011). Sequence pairs concerning about wMyos infecting Mes. yosii and wThou infecting T. houtanensis are indicated with big square bracket and vertical line, respectively. (c) The average evolutionary divergence over sequence pairs within each Wolbachia supergroups was calculated, using 16S rRNA, ftsZ, and MLST scheme, respectively. The values of distances are shown with three digits after the decimal point, and the maximums are indicated in bold. The "n/c" denotes that the value cannot be calculated due to only one related sequence available in the supergroup. The "-" indicates the absence of the related sequence in the supergroup of supergroup E members and their hosts (Figure 2). A main cause of the non-matching phylogenies should be the position of wMyos, which is the sister-group to all other strains in clade E in MLST tree ( Figure 1).
As wMyos is the most divergent strain in supergroup E (Figure 3), its position in trees may be affected by long-branch attraction, and its variation is probably related to the special habitat of its host (a sandy beach on an island). Generally, discordant phylogenies of hosts and Wolbachia are explained by horizontal transmission of Wolbachia (Ferri et al., 2011;Werren et al., 1995), and it might be the case in collembolan Wolbachia (Timmermans et al., 2004). However, in this study, no intragenic recom-  (Tanganelli et al., 2014).
Whether it is a secondary loss still needs further study covering more data on collembolan Wolbachia.
Most of the identified collembolan Wolbachia endosymbionts have been recovered from parthenogenetic species and belong to supergroup E, except for a B-type strain that infects O. cincta. Considering the extremely low prevalence reported in O. cincta (Timmermans et al., 2004), retracing the origin of the symbiont is a difficult task. In our unrooted MLST tree, supergroup E is sister to the A + H clade; however, the sister-group relationship between A and H is not well supported ( Figure 1). As supergroup H was considered the sister-group of supergroup E in most of the previous studies (Bordenstein et al., 2009;Lefoulon et al., 2016;Lo et al., 2007), the relationship between A, H, and E would be better illustrated when more MLST sequences are available, especially for supergroup H. The phylogenetic inferences based on 16S rRNA and ftsZ genes indicate that supergroups K and L might also be close to supergroup E (Fig. S7A). Based on rooted phylogenies, a recent phylogenomic study of Wolbachia supergroup relationships proposed that supergroup E, represented by the endosymbiont of F. candida, is the sister-group to all other studied supergroups within genus Wolbachia (Gerth, Gansauge, Weigert, & Bleidorn, 2014). However, the conclusion was drawn out when only one supergroup E strain was analyzed, insufficient data for the coherent lineage supergroup H were used (only one representative, with 19 genes acquired of 90 targeted), and no data for supergroups K and L were involved. In our opinion, the phylogenetic status of supergroup E would be better resolved with genomic data from more collembolan symbionts, as well as additional data from closely related clades, such as supergroups H, K, and L.
In conclusion, we enriched the data of collembolan Wolbachia by diagnostic screening in Chinese collembolans for the first time and revisited the supergroup status of Wolbachia from parthenogenetic collembolans. Using the Wolbachia MLST system, we support the hypothesis of monophyletic origin of symbionts for parthenogenetic collembolan species by designating all five Wolbachia strains recovered in this study to supergroup E, including the neelipleonan Wolbachia wMinc.
The lack of intragenic recombination in MLST and wsp genes between supergroup E and other supergroups further supports supergroup E as a unique clade. Particularly, high genetic divergence within supergroup E is identified, suggesting supergroup E as one of the most divergent lineages in Wolbachia. In addition, the inconsistency between the phylogenies of Wolbachia and parthenogenetic collembolans is newly discovered, which might be a sign of Wolbachia horizontal transmission between different hosts, but on the other hand, it might be caused by the highly differentiated collembolan Wolbachia strain, wMyos, for which host species was collected from a special habitat: a sandy beach on an island.