Diminutive, degraded but dissimilar: Wolbachia genomes from filarial nematodes do not conform to a single paradigm

Wolbachia are alpha-proteobacteria symbionts infecting a large range of arthropod species and two different families of nematodes. Interestingly, these endosymbionts are able to induce diverse phenotypes in their hosts: they are reproductive parasites within many arthropods, nutritional mutualists within some insects and obligate mutualists within their filarial nematode hosts. Defining Wolbachia ‘species’ is controversial and so they are commonly classified into 17 different phylogenetic lineages, termed supergroups, named A–F, H–Q and S. However, available genomic data remain limited and not representative of the full Wolbachia diversity; indeed, of the 24 complete genomes and 55 draft genomes of Wolbachia available to date, 84 % belong to supergroups A and B, exclusively composed of Wolbachia from arthropods. For the current study, we took advantage of a recently developed DNA-enrichment method to produce four complete genomes and two draft genomes of Wolbachia from filarial nematodes. Two complete genomes, wCtub and wDcau, are the smallest Wolbachia genomes sequenced to date (863 988 bp and 863 427 bp, respectively), as well as the first genomes representing supergroup J. These genomes confirm the validity of this supergroup, a controversial clade due to weaknesses of the multilocus sequence typing approach. We also produced the first draft Wolbachia genome from a supergroup F filarial nematode representative (wMhie), two genomes from supergroup D (wLsig and wLbra) and the complete genome of wDimm from supergroup C. Our new data confirm the paradigm of smaller Wolbachia genomes from filarial nematodes containing low levels of transposable elements and the absence of intact bacteriophage sequences, unlike many Wolbachia from arthropods, where both are more abundant. However, we observe differences among the Wolbachia genomes from filarial nematodes: no global co-evolutionary pattern, strong synteny between supergroup C and supergroup J Wolbachia, and more transposable elements observed in supergroup D Wolbachia compared to the other supergroups. Metabolic pathway analysis indicates several highly conserved pathways (haem and nucleotide biosynthesis, for example) as opposed to more variable pathways, such as vitamin B biosynthesis, which might be specific to certain host–symbiont associations. Overall, there appears to be no single Wolbachia –filarial nematode pattern of co-evolution or symbiotic relationship.


INTRODUCTION
The endosymbiotic alpha-proteobacterium Wolbachia represents a striking model for studies of symbioses. These bacteria have been detected in a large proportion of arthropods, where they are considered one of the most widespread symbionts [1,2], and in only two divergent families of parasitic nematodes (filarial nematodes in vertebrates and pratylenchid nematodes feeding on plants) [3,4]. The nature of the relationships with their hosts is particularly fascinating. In arthropods, some Wolbachia are reproductive parasites inducing different phenotypes such as cytoplasmic incompatibility (CI), male-killing, parthenogenesis or feminization of genetic males [5][6][7][8]. Others Wolbachia are nutritional mutualists, as in the case of bedbug symbionts [9], while in the case of the filarial nematodes, their Wolbachia are obligate mutualists [10]. Numerous Wolbachia genomes have been subjected to genomic analyses to determine the nature of the symbiosis [11][12][13][14][15]. Some candidate genes have been identified as being involved in CI [16,17], male-killing [18], feminization [19] and nutritional supplementation [20,21]. The CI phenotype is being exploited as a tool for human disease prevention of mosquito-borne diseases as it is able to suppress pathogens, notably RNA arboviruses (such as dengue, chikungunya, Zika and yellow fever, and potentially other human pathogens, such as Plasmodium) [22][23][24][25][26][27]. With respect to the filarial nematodes, much of the research effort has been focused on drug screening or various treatments targeting Wolbachia to kill the parasitic species responsible for human diseases of lymphatic filariasis (elephantiasis) or onchocerciasis (river blindness), which are a major cause of global morbidity [13,[28][29][30][31][32]. Anti-filarial Wolbachia screening projects using genomic information [28,[33][34][35][36][37] and/or mass screening of chemicals, drugs and biomolecules from varied pre-existing, diversified or focused molecular libraries that select inhibitors of Wolbachia development and reproduction are underway [13,30,[38][39][40][41][42]. Nonetheless, the mechanisms underpinning the obligate mutualism between Wolbachia and their filarial hosts remain largely unknown.
To date, four complete genomes of Wolbachia from filarial nematodes have been published: first, the symbiont of the human parasite Brugia malayi, wBm [33], followed by the symbiont of the bovine parasite Onchocerca ochengi, wOo [43], then the symbiont of the human parasite Onchocerca volvulus, wOv [44], and recently the symbiont of zoonotic parasite Brugia pahangi, wBp [45]. The hypothesis of potential provisioning of resources [such as haem, riboflavin, flavin adenine dinucleotide (FAD) or nucleotides] by Wolbachia to

Impact Statement
Wolbachia are endosymbiotic bacteria infecting a large range of arthropod species and two different families of nematodes, characterized by causing diverse phenotypes in their hosts, ranging from reproductive parasitism to mutualism. While available Wolbachia genomic data are increasing, they are not representative of the full Wolbachia diversity; indeed, 84 % of Wolbachia genomes available from the National Center for Biotechnology Information database to date belong to the two main studied clades (supergroups A and B, exclusively composed of Wolbachia from arthropods). The present study presents the assembly and analysis of four complete genomes and two draft genomes of Wolbachia from filarial nematodes. Our genomic comparisons confirm the paradigm that smaller Wolbachia genomes from filarial nematodes contain low levels of transposable elements and the absence of intact bacteriophage sequences, unlike many Wolbachia from arthropods. However, data show disparities among the Wolbachia genomes from filarial nematodes: no single pattern of co-evolution, stronger synteny between some clades (supergroups C and supergroup J) and more transposable elements in another clade (supergroup D). Metabolic pathway analysis indicates both highly conserved and more variable pathways, such as vitamin B biosynthesis, which might be specific to certain host-symbiont associations. Overall, there appears to be no single Wolbachiafilarial nematode pattern of symbiotic relationship.
their host nematodes has been suggested, beginning with the first comparative genomic analysis [33]. However, the analysis of the highly reduced wOo genome did not strongly support the hypothesis of provisioning of vitamins or cofactors by this strain (the riboflavin metabolism and FAD pathways are incomplete), and transcriptomic analysis suggested more of a role in energy production and modulation of the vertebrate immune response [43]. Alternatively, the relationship between filarial nematodes and Wolbachia may represent a 'genetic addiction' rather than genuine mutualism [46]. Wolbachia genomes from filarial nematodes as compared to Wolbachia genomes from arthropods have smaller sizes [between 957 990 bp for wOo to 1 080 084 bp for wBm versus 1 250 060 bp for wMel (from Drosophila melanogaster), 1 267 782 bp for wCle (from Cimex lectularius), 1 587 994 bp for wPip (from Culex pipiens) and 1 801 626 bp for wFol (from Folsomia candida)]. In addition, the presence of fewer transposable elements [such as insertion sequence (IS) elements and group II intron-associated genes], prophage-related genes and repeat-motif proteins (such as ankyrin domains) has been described [43,47].
The notion of Wolbachia species remains under debate within the scientific community [48][49][50][51]. It is commonly accepted to describe the various Wolbachia strains as belonging to different phylogenetic lineages as 'supergroups' (currently A-F, H-Q and S). In the past, two Wolbachia supergroups G and R had been demonstrated as invalid based on further phylogenetic analyses [52,53]. The first appearance of the 'supergroups' designation dates to 1998 [54], but the concept was popularized later by Lo et al. [55]. Most of the molecular characterizations of Wolbachia strains have been based on either single gene or multi-locus phylogenies [53,[55][56][57][58][59][60][61][62][63][64][65]. The supergroups A, B, E, H, I, K, M, N, O, P, Q and S are exclusively composed of symbionts of arthropods [55,57,59,63,[66][67][68][69][70]. In contrast, supergroups C, D and J are restricted to filarial nematodes [4,58,61,71], whereas supergroup L is found only in plant-parasitic nematodes [3,72]. Supergroup F is, so far, the only known clade comprising symbionts of filarial nematodes as well as arthropods [55,56,73]. Initially, the delimitation of these supergroups was defined arbitrarily by a threshold of 2.5 % divergence of the Wolbachia surface protein gene (wsp) [54]. However, after it was demonstrated that wsp could recombine between Wolbachia strains [74], a multilocus sequence typing (MLST) approach for Wolbachia was proposed [60]. These typing methods were developed based almost exclusively on analyses of supergroup A and B Wolbachia [60], as they constituted the majority of genomes sequenced at that time. Recently, there has been an effort to revisit the MLST paradigm [75] and attempts to classify Wolbachia based on genomics [50]. However, increased genomic information is needed to appraise the phylogenetic diversity of Wolbachia representatives from filarial nematodes. The presently available Wolbachia genomic information is not fully representative of Wolbachia diversity.
Recently, a method based on biotinylated probes was developed to capture large fragments of Wolbachia DNA for sequencing, using PacBio technology (large enriched fragment targeted sequencing -LEFT-SEQ) [76] adapted from previous capture methods using Illumina technology [77,78]. We used this enrichment method to produce draft or complete genomes of Wolbachia from a diversity of filarial nematodes species: Cruorifilaria tuberocauda, a parasite of the capybara, a cavy rodent; Dipetalonema caudispina, a parasite of spider monkeys; Litomosoides brasiliensis, a parasite of bats; Litomosoides sigmodontis, a parasite of cricetid rodents; Dirofilaria (Dirofilaria) immitis, a parasite of canines; and Madathamugadia hiepei, a parasite of geckos. These species had been previously characterized as positive for Wolbachia infections (two supergroup J, two supergroup D, one supergroup C and one supergroup F) [58]. In the present study, we took advantage of this newly explored diversity to draw a more comprehensive picture of symbiosis between Wolbachia and their filarial nematode hosts.
The DNA samples of Cruorifilaria tuberocauda, Dipetalonema caudispina, Litomosoides brasiliensis and Madathamugadia hiepei were provided by the National Museum of Natural History (MNHN), Paris, France. These DNA samples had been extracted from adult worms for a previous study [79].  [44,[80][81][82]. However, the lack of a consensus on the nomenclature of Wolbachia strains has already led to some confusion, as wDi can refer to Wolbachia from Dirofilaria (Dirofilaria) immitis [81] as well as Wolbachia from Diaphorina citri [83]. Therefore, to avoid confusion, we use the longer strain abbreviations in this paper.
The DNA of the Madathamugadia hiepei sample 81YU had been extracted previously and conserved at −20 °C for 8 years [79]. The DNA of the Dipetalonema caudispina sample 362YU2 had been extracted in January 2015 and then conserved at −20 °C for 4 years (unpublished data). A new fragment of a specimen from the same lot (362YU3) was also obtained for new DNA extraction. DNA was extracted from all samples using the DNeasy kit (Qiagen), following the manufacturer's recommendations, including overnight incubation at 56 °C with proteinase K.

Library preparations
According to the amount and quality of DNA of each sample, different library preparation protocols were utilized ( For Madathamugadia hiepei and Litomosoides brasiliensis, the DNA concentrations were either of low concentration or too fragmented to be used for the LEFT-SEQ protocol. For these, the enrichment method adapted for Illumina sequencing was a hybrid protocol between the method described by Geniez et al. [78] and that of Lefoulon et al. [76]. We followed the same procedure for the freshly extracted DNA of Dipetalonema caudispina (362YU2). DNA samples (50 ng for 81YU, 75 ng  Table 2.

De novo assembly pipeline
The bioinformatics pipeline was slightly different for the six samples, because of the variations in sequence protocols (as described above). However, the pre-processing of the reads was similar for both Illumina and PacBio data. The Illumina reads were filtered using the wrapper Trim Galore! (https:// www. bioinformatics. babraham. ac. uk/ projects/ trim_ galore/). For PacBio, circular consensus sequences (CCSs) were generated using the SMRT pipe RS_ReadsOfInsert protocol (PacBio) with a minimum of three full passes and minimum predicted accuracy greater than 90 %. The adapter and potential chimeric reads were removed using seqkt ( github. com/ lh3/ seqtk) as described by Lefoulon et al. [76] (analyses were performed with an in-house shell script). When sufficient PacBio reads were obtained, a de novo long-read assembly was done using Canu [84] according to Lefoulon et al. [76]. Otherwise, a first hybrid de novo assembly was done using Spades [85]. The contigs belonging to Wolbachia were detected by nucleotide similarity using blastn (similarity greater than 80 %, bitscore greater than 50) [86] and isolated. The remaining contigs were manually curated to eliminate potential non-Wolbachia sequence contaminations. To improve the complete assembly of the Wolbachia genomes, a selection of reads mapping to the first Wolbachia draft genome was produced. The Illumina reads were merged with pear [87] (in the case of end-paired reads) and mapped against this contig selection using Bowtie2 [88]. The PacBio reads were mapped against this contig selection using ngmlr (with the PacBio pre-set settings) [89]. A second hybrid de novo assembly was then performed with this new selection of reads using Unicycler [90]. A second selection of Wolbachia contigs using blastn was then performed and manually curated to eliminate potential contaminations. Assembly statistics were calculated using quast [91]. PCR primers were designed to confirm the sites of circularization of the single contigs when applicable (Supplementary file S2).
In addition, contigs representative of filarial nematode genomes were isolated from the first de novo assembly. Mapped reads were selected, as described above, and de novo assemblies of the host genomes were produced using Unicycler [90].
The completeness of the draft genomes was studied using busco v3, which analyses the gene content compared to a selection of near-universal single-copy orthologous genes. This analysis was based on 221 genes common among proteobacteria for the draft genomes of Wolbachia (proteobac-teria_odb9), while it was based on 982 genes common among nematodes for the host draft genomes (nematoda_odb9).

Comparative genomic analyses and annotation
We used different comparative analyses between the produced draft genomes and a set of eight available complete genomes and seven draft genomes of Wolbachia (Table S1) [92].
We calculated the average nucleotide identity (ANI) between the different Wolbachia genomes using ANI Calculator [93] and an in-silico genome-to-genome comparison was done to calculate a digital DNA-DNA hybridization (dDDH) using ggdc [94]. The calculation of dDDH allows analysis of species delineation as an alternative to the wet-lab DNA-DNA hybridization (DDH) used for current taxonomic techniques. ggdc uses a genome blast distance phylogeny approach to calculate the probability that an intergenomic distance yielded a DDH larger than 70 %, representing a novel species-delimitation threshold [94]. We used formula two to calculate the dDDH, because it is more robust using incomplete draft genomes [95].
The Wolbachia genomes were analysed using the RAST pipeline [96]. In order to compare the nature of these genomes using the RAST pipeline, we identified the percentage of coding sequences (CDSs) (calculated by a ratio between total base pairs of CDSs and total genome sequence base pairs), the presence of mobile elements, the group II intron-associated genes, the phage-like genes and the ankyrin-repeat protein genes. The presence of potential ISs was detected using ISsaga [97] (degraded sequences were not manually curated) and the presence of prophage regions was detected using phaster [98]. The Wolbachia genomes were annotated using Prokka [99]. We also examined the correlation between the size of the genome and the previously described genetic characteristics using the Spearman's rank correlation or Pearson rank correlation tests (if applicable after Shapiro-Wilk test) in the R environment [100]. KEGG orthology (KO) assignments were generated using KASS (KEGG Automatic Annotation Server) [101]. KASS assigned orthologous genes by a blast comparison against the KEGG genes database using the BBH (bi-directional best hit) method. The same assignment analysis was performed for the newly produced genomes and the set of 15 Wolbachia genomes from the National Center for Biotechnology Information (NCBI) database. The assigned KO were ordered in 160 different KEGG pathways (Table S2).
Several pathways that showed differences in the number of assigned genes between Wolbachia genomes were selected. A list of genes assigned to these pathways was compiled to study potential losses/acquisitions of these genes across the various Wolbachia (Table S3).

Phylogenomic analyses
Single-copy orthologue genes were identified from a selection of Wolbachia genomes using Orthofinder (version 2.2.6) [102]. Three phylogenomic studies were performed: the first included only 11 Wolbachia genomes from filarial nematodes; the second included only 25 complete genomes, and the third included 49 complete or draft genomes (Table  S1). The supermatrix of orthologue sequence alignments was generated by Orthofinder (implemented as functionality). The poorly aligned positions of the orthologous gene alignments produced were eliminated using Gblocks (version 0.91b) [103]. The phylogenetic analyses were performed with maximum-likelihood (ML) inference using iq-tree (version 1.5.5) [104]. The most appropriate model of evolution was evaluated by ModelFinder (implemented as functionality of iq-tree) [104]. The robustness of each node was evaluated by a bootstrap test (1000 replicates). The phylogenetic trees were edited in FigTree (https:// github. com/ rambaut/ figtree/) and Inkscape (https:// inkscape. org/). To study the evolution of the filarial hosts infected with Wolbachia, the same workflow was applied to the amino-acid files previously produced by the busco analysis (Augustus implemented as functionality) based on the set of 982 orthologue genes common among nematodes (nematoda_odb9) (version 3.0.2) (

Synteny and co-evolutionary analyses
The potential positions of the origin of replication (ORI) were identified based on the ORI position in the wMel and wBm genomes according to Ioannidis et al. [105] for the complete genomes generated for wDimm, wLsig, wDcau and wCtub, as well as available complete genomes of Wolbachia from supergroup C (wOo and wOv), supergroup D (wBp) and supergroup F (wCle), and wCfeJ. The genome sequences were reorganized to start at the potential ORI position to study the genome rearrangement. Then, a pairwise genome alignment of these genomes was produced and plotted using MUMmer v3 [106].
Two global-fit methods were used to study the cophylogenetic pattern between filarial nematodes and their Wolbachia symbionts: PACo application [107] and Parafit function [108] both in the R environment [100]. For these analyses, we independently produced two ML phylogenies: the phylogenetic tree of the 11 Wolbachia from filarial nematodes and the phylogenetic tree of the 11 filarial nematodes as described above. The global-fit method estimates the congruence between two phylogenetic trees changing the ML phylogenies into matrices of pairwise patristic distance, themselves transformed into matrices of principal coordinates (PCo). Then, PACo analysis transformed the symbiont PCo using leastsquares superimposition (Procrustes analysis) to minimize the differences with the filarial PCo. The global fit was plotted in an ordination graph.
The congruence of the phylogenies was calculated by the residual sum of squares value (m 2 XY ) of the Procrustean fit calculation. Subsequently, the square residual of each single association and its 95 % confidence interval were estimated for each host-symbiont association and plotted in a bar chart [107]. A low residual value represented a strong congruence between symbiont and filarial host. In addition, the global fit was estimated using Parafit function [108] in the R environment [100]. The Parafit analysis tests the null hypothesis (H0), that the evolution of the two groups has been independent, by random permutations (1 000 000 permutations) of hostsymbiont association [108]. This test is based on analysis of the matrix of patristic distances among the hosts and the symbionts as described above for PACo.

De novo assembly and completeness of draft genomes
We were able to produce complete circular assemblies for four of the genomes, wCtub, wDcau, wLsig, wDimm, as well as a 41-contig draft genome for wLbra and a 208-contig draft genome for wMhie ( Table 3). The circularization of wCtub, wDcau, wLsig and wDimm was confirmed by PCR amplification of the sites of circularization of the single contigs (Supplementary file S2). The two supergroup J genomes, wCtub and wDcau, are the smallest observed among all sequenced Wolbachia, comprising 863 988 and 863 427 bp, respectively ( Table 3). The genome of wDimm displayed a total size of 920 122 bp and the total length of wLsig was 1 045 802 bp. The draft genome of wLbra had a total length of 1 046 149 bp, very close to the size observed for wLsig. Although the assembly remained fragmented, the draft genome of wMhie had a total length of 1 025 329 bp (Table 3). Varying success in producing complete genome sequences was attributed to DNA quantity and quality ( Table 2); for example, the low quantity and quality of DNA obtained for Madathamugadia hiepei and Litomosoides brasiliensis limited sequencing. The de novo assembly was successful with production of a circularized genome in the case of wLsig based on 180 870 CCS PacBio reads, wDimm based on 155 017 CCS PacBio reads and wCtub based on 179 618 CCS PacBio reads. Of the sequenced CCS reads, 94.67 % mapped to the draft genome for wLsig, 90.57 % for wDimm, but only 35 % for wCtub. However, for wDcau, the analysis of the sequenced 246 679 CCS PacBio reads using Canu did not produce an accurate draft genome (<100 000 bp total length). In contrast, a hybrid de novo assembly, based on all the sequenced data, produced a draft genome containing one large Wolbachia contig, which was circularized using minimus2. Only 1.8 % of the CCS reads produced with LEFT-SEQ (4466) mapped to this wDcau draft and the low efficiency was likely due to the 4-year-old extracted DNA that was used.  (Table S4). In general, Wolbachia genomes from arthropods present higher levels of busco completeness [e.g. Wolbachia from Drosophila melanogaster, wMel, has 180 busco genes (81.4 %)]. The higher busco completeness in these genomes could be because these genomes are less degraded than those of filarial Wolbachia.
Along with the assembly of the Wolbachia genome, draft genomes of the nematode hosts were produced (

ANI and dDDH
The ANI calculation indicates that wCtub and wDcau are divergent from other Wolbachia. For both, the most similar genome is wOv with 83 and 84 % identity, respectively (Fig. 1). The draft genome wLbra shows a stronger similarity of 90 % with the representatives of supergroup D: wBm, wBp, wWb and wLsig. The draft genome wMhie is most similar to wCle from supergroup F with 95 % identity (Fig. 1). Typically, strains representative of the same supergroup share strong identity: 99 % for wOo and wOv (from supergroup C), 99 % for wBm and wBp (from supergroup D), 97 % for wBm or wBp and wWb (from supergroup D), 95 % for wPip and wstri (from supergroup B) and 97 % for wMel and wCau (from supergroup A). A dDDH [94] metric higher than 70 indicates that the two strains might belong to the same species (see Methods).
The present in silico genome-to-genome comparison shows only five cases that might be considered as similar strains of Wolbachia : wCau and wMel; wBm and wBp; wBm and wWb; wBp and wWb; and wOo and wOv (Fig. 1). These proximities have been suggested elsewhere [50]. Both ANI and dDDH analyses suggest that the four newly sequenced Wolbachia genomes are divergent from published Wolbachia genomes.

Phylogenomic analyses
A total of 367 single-copy orthologous genes were identified from among the 25 available complete Wolbachia genomes. The newly sequenced wCtub, wDcau, wLsig and wDimm genomes were included in the ML phylogenetic analyses based on these 367 orthologous genes (Fig. 2a). This phylogenetic analysis confirms that wLsig belongs to supergroup D  and wDimm belongs to supergroup C, as has been described elsewhere [4]. wCtub and wDcau were grouped in the same clade, a sister taxon of the supergroup C. wCtub and wDcau have been described as representatives of supergroup J [58]. The phylogenomic analysis presented here supports the hypothesis that Wolbachia supergroup J is a clade distinct from Wolbachia supergroup C, although the two clades are closely related. A total of 160 single-copy orthologous genes were identified among the 49 complete and draft Wolbachia genomes (Fig. 2b). The two phylogenomic analyses indicate the same topologies for the complete genomes wCtub, wDcau, wLsig and wDimm. In addition, the phylogenomic analysis based on 160 genes shows that wLbra is closely related to wLsig, as a representative of supergroup D, and wMhie is closely related to wCle, as a representative of supergroup F. The draft genome wMhie is the first representative of supergroup F infecting a filarial nematode and the phylogenomic analysis confirms the evolutionary history of wLbra and wMhie, as previously deduced from multi-locus phylogenies [58,73].
The validity of supergroup J had been previously discussed; some studies using multi-locus phylogenies suggest that Wolbachia from Dipetalonema gracile (historically, the only known representative of supergroup J) belongs to supergroup C [57,62,110]. Interestingly, the ftsZ gene used in these MLST studies could not be detected in the wDcau and wCtub genomes. In addition, some multi-locus studies had observed PCR amplification of this gene to be unsuccessful within supergroup J Wolbachia [58,111], while other studies included an ftsZ sequence from Wolbachia from Dipetalonema gracile [56]. To resolve this contradiction, we compared the 33 sequences of Wolbachia from Dipetalonema gracile available from the NCBI database and our complete wDcau genome, which should be closely related, using nblast. For 6 of these 33 sequences, from the ftsZ gene, it appears unlikely that they belong to Wolbachia from Dipetalonema gracile, only having between 72.37 and 89.91% identity with wDcau (while all other PCR sequences show 95.36-99.98 % identity) ( Table S5). Four of these six sequences are identical to genes of Wolbachia from Drosophila spp., one sequence is closely related to genes of Wolbachia from supergroup B [62] and one sequence is closely related to genes of Wolbachia from supergroup C (the ftsZ sequence) [56] (Table S5). Thus, our data suggest that the variable position of Wolbachia from Dipetalonema gracile in previous multi-locus phylogenies might be linked to contamination or errors of sequence submission.

Synteny conservation and co-evolutionary analysis
Strong conservation of synteny among supergroup C genomes and supergroup J genomes was observed (Fig. 3). It had been previously shown that the supergroup C genomes wOo and wDimm exhibit a low level of intra-genomic recombination [47]. Our results indicate a similar pattern of strong conservation of synteny among supergroup J genomes (wDcau and wCtub) and, more interestingly, between supergroup J genomes and wDimm in supergroup C (Fig. 3). This is in contrast to alignment of the complete genomic assemblies between the supergroup D genomes, which show more rearrangement. Of further interest is the observation that a different level of rearrangement can be observed between wBm and wLsig or wBp and wLsig, even when wBm and wBp show less rearrangement between them. While wBm and wBp are characterized by a strong identity as described above (Fig. 1), similar to that observed between wOo and wOv, they show more rearrangement (Fig. 3).
The global-fit analyses do not show a global co-evolution pattern between filariae and their Wolbachia symbionts (PACo m 2 XY =0.038 with P value=1; ParaFitGlobal=0.0048 with P value=0.057; both 1×10 6 permutations). The superimposition plot indicates at least five groups of associations and shows strong inequality (Fig. 4a). The filarial nematodes Dirofilaria (Dirofilaria) immitis and Onchocerca spp. with their symbionts (supergroup C) show lower squared residuals and consequently strong co-evolution. By contrast, Madathamugadia hiepei and its symbiont (supergroup F) show high squared residual and consequently a weak co-evolution (Fig. 4b). The global-fit analysis confirms two different groups of association for Wolbachia from supergroup D and their filarial nematode hosts: on one hand, Brugia and Wuchereria species and their symbionts, and on the other hand, Litomosoides species and their symbionts (Fig. 4a). The same trend is observed for Wolbachia from supergroup J; the filarial nematodes Dipetalonema caudispina and Cruorifilaria tuberocauda and their symbionts present a higher squared residual than the residual sum of squares value (m 2 XY ) suggesting a low congruence of the phylogenies (Fig. 4b). These results support the hypothesis of local patterns of co-evolution with multiple horizontal transmission events of Wolbachia among the filarial nematodes as part of the evolutionary history of this host-endosymbiont system, as previously described [58].

Comparative genomics
We observed a positive correlation between Wolbachia genome size and the percentage of CDSs. Indeed, wDcau and wCtub have the smallest genomes, and have a low percentage of CDSs (71.45 and 74.63 %, respectively) (Fig. 5a). Similarly, a positive correlation was seen between Wolbachia genome size and transposable elements such as ISs, group II intron-associated genes and mobile elements (Fig. 5b, c,  d). Interestingly, amongst the Wolbachia from filarial nematodes, supergroup C and supergroup J Wolbachia are all characterized by the absence or very low levels of transposable elements, unlike supergroup D Wolbachia and wMhie (supergroup F) (Fig. 6, Tables S6-S8). We also observed a positive correlation between Wolbachia genome size and the amount of insertion of phage DNA, as recently described (Fig. 5e, f) [112]. We studied phage DNA by two types of analyses: we used RAST annotation [96] to detect phage or phage-like genes and phaster [98] to detect prophage regions. None of the genomes of Wolbachia from filarial nematodes have significant prophage regions (Table S9) but supergroup D (wBm, wBp and wWb), as well as the supergroup F (wMhie) Wolbachia genomes, contain phage-like gene sequences inserted in their genomes. In the case of wBm, wBp and wWb, mainly phage major capsid protein and some uncharacterized phage proteins were detected, representing 14 (total 2592 bp), 8 (total 1197 bp) and 4 regions (total 957 bp), respectively (Table S10). The closely related wLsig and wLbra, also belonging to supergroup D, do not appear to have phage protein sequences. In the case of wMhie, one phage major capsid protein and eight other phage proteins were detected, representing 5187 bp (Table S10).
A positive correlation was also observed between Wolbachia genome size and ankyrin repeat proteins (Fig. 5g). It has been suggested that Wolbachia from filarial nematodes are characterized by a low level of ankyrin repeat genes, suspected to have evolved as result of their mutualistic lifestyle [113][114][115]. Most of the Wolbachia from filarial nematodes have 1 to 3 copies of ankyrin repeat genes, with the exception of wMhie (supergroup F) with 5 copies, and the supergroup D strains, wBm, wBp and wWb, containing 14, 16 and 11 copies, respectively (Table S11).

Metabolic pathways
Using KAAS, we assigned genes from the 24 studied Wolbachia genomes (including the 7 draft genomes of wNfla, wLug, wstri, wVulC, wWb, wLbra and wMhie) to 160 different KEGG pathways (Table S2). Among these 160 KEGG pathways, 15 were selected based on strong variability among the genomes or because they had previously been suggested as being involved in symbiosis mechanisms [33,43] (Fig. 7, Table 3).
In the context of a nutritional provisioning hypothesis, we observed variability among genomes in the vitamin B metabolism pathways (Fig. 7). The thiamine metabolism   (vitamin B1) pathway appears conserved, with the exception of the tenA gene, detected only in some Wolbachia genomes from arthropods: wMel, wCauA, wNfla (supergroup A); wPip, wLug, wstri, wVulC (supergroup B); wFol (supergroup E); wCle (supergroup F); wCfeT and wCfeJ (no supergroup designated). As previously reported by Darby et al. [43], the riboflavin metabolism (vitamin B2) is incomplete for both wOo (two genes not identified) and wOv (four genes not identified), but only ribE is missing for wDimm, another representative of supergroup C. Similarly, a single gene (ribA) is missing for wDcau and wCtub (supergroup J), as is the case for wMhie (supergroup F) and wLbra (supergroup D). The folate (vitamin B9) and pyridoxine (vitamin B6) metabolisms appear incomplete for representatives of supergroup D (B9, no folA/B/C/KP for all representatives; B6, absent for wBp, wBm, wLsig, but pdxH present for wLsig and pdxH/pdxJ present for wWb), but mainly conserved for other Wolbachia strains (except for wPpe, supergroup L, wFol, supergroup E, and wCfeT, in which only the folC gene is present in the folate pathway; and the absence of the pdxJ gene from wOv in the pyridoxine pathway) (Fig. 7). As described by other authors, we note that only some Wolbachia have a complete biotin metabolism pathway (vitamin B7): wCle (supergroup F) [20], wNfla (supergroup A) [116], wstri and wLug (supergroup B) [21], and wCfeT (no supergroup designated) [92]. In addition, we observed a complete biotin metabolism pathway in wVulC (supergroup B). Interestingly, it has previously been suggested that supplementation of biotin by Wolbachia increases the fitness of insect hosts in the case of wCle from supergroup F [20], as well as wstri and wLug from supergroup B [21]. These genes could not be detected in the newly produced supergroup F genome, wMhie, from a filarial nematode host.
A further set of pathways previously considered of symbiotic interest, the de novo biosynthesis of purines and pyrimidines, has been identified in Wolbachia genomes, but was absent in other proteobacteria such as Rickettsia [33,43]. The pyrimidine metabolism pathway was complete for most of the Wolbachia genomes analysed in the present study, with the exception of wPpe (supergroup L) (Fig. 7). The purine metabolism pathway was almost complete for the entire genome set as well, with the exception of the purB gene, which could not be identified in a large number of symbionts of filarial nematodes (wLsig and wLbra, supergroup D; all representatives of supergroups C and J; wMhie, supergroup F) and some symbionts of insects (wNfla, supergroup A; wLug and wstri, supergroup B; wCle, supergroup F; wCfeJ and wCfeT). The purB gene encoding the adenylosuccinate lyase protein is involved in the second step of the sub-pathway that synthesizes AMP from IMP.
Another important pathway, haem metabolism, suggested to be involved in symbiotic mechanisms by several genome Fig. 6. Graph of IS elements, mobile elements and group II intron-associated genes identified in Wolbachia genomes. The Wolbachia supergroups are indicated in brackets: A to L and 'na' for Wolbachia without supergroup assignment. The graph on the left represents, in red, the number of mobile elements and, in black, the number of group II intron-associated genes detected in the studied genomes using RAST. The graph on the right represents the number of ORFs detected related to IS elements using ISsaga. For each of the detected ORFs related to an IS, the family of the IS is specified by a colour as indicated in the key below the graph. The number associated with each bar is the total number detected. Genomes produced in this study are indicated in bold. Fig. 7. Summary of the metabolic pathways detected in different Wolbachia genomes using KASS. A coloured square indicates that the sequence of the gene(s) on the left of the graph was detected, whereas a lighter shade of the same colour indicates only some copies or some genes were detected. The absence of a square indicates that a given sequence was not detected. The Wolbachia supergroups (A-L) are indicated by different colours: orange for supergroup A, dark blue for B, light green for C, light blue for D, pink for E, purple for F, yellow for J, khaki green for L, and grey when the strain is not described as belonging to a supergroup. Genomes produced in this study are indicated in bold.
analyses, was complete in many of the current genomes. Only one gene, bfr (encoding bacterioferritin, a haem-storage protein), was not detected in wBm and wWb (supergroup D) or any representatives of supergroups C and J. The oxidative phosphorylation metabolism pathway also appears highly conserved, although the cytochrome bd ubiquinol oxidase genes (cydA and cydB) were detected only for Wolbachia belonging to supergroup A and wCfeJ (no supergroup assigned).
With regard to potential host interaction systems, the different secretion system pathways are very conserved (Fig. 7). However, the type II secretion system gene encoding the general secretion pathway protein D (gspD) was neither identified in Wolbachia belonging to supergroups C and J, nor in wBm, wBp and wWb (supergroup D). Similarly, the gene secE involved in the type Sec-SRP pathway was absent in wNfla (supergroup A), wstri, wLug (supergroup B), wCfeT and wCfeJ (no supergroups assigned).
A number of additional interesting variations among the studied Wolbachia genomes were noted, in particular for the cell cycle pathway, the homologous recombination (HR) pathway, the ATP binding cassette (ABC) transporter genes and glycerophospholipid metabolism (Fig. 7). Regarding the cell cycle pathway, representatives of supergroup J showed losses of most cell division proteins (only ftsQ was detected in wCtub), one gene of the two-component system (pleD), as well as the aspartyl protease family protein gene perP. The ftsW cell division protein gene was identified in only a few genomes: wMel, wCauA, wNfla (supergroup A); wFol (supergroup E); wCle, wMhie (supergroup F); wLbra (supergroup D); wCfeT and wCfeJ. Similarly, losses of numerous genes were detected in the HR metabolism pathway involved in repair of DNA damage. Wolbachia belonging to supergroup C, wLsig and wLbra (supergroup D), and wDcau (supergroup J) showed losses of numerous genes within this set (5-9 genes) (Fig. 7). We detected no recombination protein rec or Holliday junction DNA helicase ruv genes in the wDcau, wOo or wOv genomes. Another pronounced difference observed among the studied Wolbachia genomes was the presence of genes encoding ATP binding cassette (ABC) transporters. These membrane transporters appear largely depleted in the Wolbachia genome representatives of supergroups J and C, as well as in wLsig and wLbra (supergroup D) (Fig. 7). The haem exporter, phosphate transport system, lipoprotein-releasing system and zinc transport system appeared to be very conserved, unlike the biotin transport system, iron(III) transport system and phospholipid transport systems.

DISCUSSION
The LEFT-SEQ method was applied to four invertebrate DNA samples, enabling us to produce four complete Wolbachia genomes: wLsig, wDimm, wDcau and wCtub. For wLsig and wDimm, draft genomes had previously been sequenced and analysed [47,81,82] but not submitted to the NCBI database. The complete genomes of wDcau and wCtub are, so far, the smallest Wolbachia genomes. Using the enrichment method associated with Illumina sequencing [78], two draft genomes were sequenced, a 41-contig wLbra draft genome and a 208-contig wMhie draft genome. Our data confirmed that wLsig and wLbra belong to supergroup D, wDimm resides in supergroup C, wMhie belongs to supergroup F, and wDcau and wCtub form a well-supported clade, supergroup J. Thus, wDcau and wCtub are now the first representative genomes of supergroup J, while wMhie constitutes the first representative genome of supergroup F from a filarial nematode. The ANI and dDDH index indicate that wDcau, wCtub, wDimm, wLsig and wLbra are clearly divergent from other studied Wolbachia genomes (all ≤90 % ANI and <70 % dDDH) (Fig. 1). Regarding wMhie, the ANI is 95 % with wCle, suggesting a close genetic proximity, although the dDDH is equal to 58.12 % (modelbased confidence intervals: 76-82.5 %), below the threshold of 70 %. Our data suggest that these two Wolbachia are very similar, despite the fact that one infects a filarial nematode and the other infects bedbugs.
The analysis of supergroup J Wolbachia further highlights limitations of the current MLST system. In the past, the only representative identified from this supergroup was the symbiont of Dipetalonema gracile, a filarial nematode parasite of monkeys. This symbiont was first described as a deep branch within supergroup C [61], and subsequently as the divergent clade J [67,111]. The latter phylogenetic position of this Wolbachia has been questioned by some authors and is often retained as belonging to supergroup C [57,62,110]. More recently, using a concatenation of seven genes and newly studied Wolbachia, Lefoulon et al. [58] demonstrated the validity of supergroup J, as distinct from supergroup C; a phylogenetic position confirmed by the present study (Fig. 2). Our analyses show that the ftsZ gene is not present (or is highly degenerate) in the wDcau and wCtub genomes, while previous Wolbachia phylogenies have been based on this marker. Our analyses (Table S5) suggest that the variable position of Wolbachia from Dipetalonema gracile in some phylogenetic analyses is linked to the fact that some database sequences likely do not belong to this strain.
The two complete genomes, wDcau and wCtub, are divergent from supergroup C Wolbachia. In addition, they are highly divergent from each other, with an ANI of 81%, despite the fact they form their own clade (Fig. 1). These divergences have been suggested by earlier multi-locus phylogenies with Wolbachia from Cruorifilaria and Yatesia species forming one subgroup, and Wolbachia from Dipetalonema spp. forming another subgroup within supergroup J [58]. Our data suggest that the use of the ftsZ gene for MLST studies is not appropriate for Wolbachia that are highly divergent. The fact that the MLST system was designed on the basis of supergroups A and B Wolbachia [60], which have low genetic diversity [50], is a source of concern for its general use when studying divergent phylogenies. Moreover, the risk of erroneous data finding their way into databases (e.g. through contamination, misidentification), combined with the fact that sequences used to build concatenated matrices very often do not originate from the same specimen, weakens multi-locus phylogenies, unless potential confounding factors are taken into consideration.
The symbiosis between Wolbachia and filarial nematodes was often considered and analysed as a uniform pattern of association, but our results reveal strong disparities. Indeed, the genomes of supergroup J present a strong synteny pattern, as was previously described for representatives of supergroup C, unlike those of supergroup D [47] (Fig. 3). We even observe a strong synteny pattern between wDimm (supergroup C) and wCtub or wDcau (supergroup J). Interestingly, the smaller Wolbachia genomes present a low number of genomic rearrangements, associated with the absence or low numbers of transposable elements (either ISs, mobile elements or group II introns) (Fig. 6). Our data support the paradigm that a major difference between Wolbachia from filarial nematodes and those from arthropods is a reduced genome containing fewer (or even zero) transposable elements, prophage-related genes or repeat-motif proteins (as ankyrin domains) [43]. Furthermore, our results highlight the distinction between supergroups C and J Wolbachia versus the supergroup D and F Wolbachia. In addition to having larger genomes, more transposable elements were identified in these genomes: in supergroups D and F, wLbra, wWb and wMhie contain more mobile elements; and wLsig, wLbra and wMhie more ISs (Fig. 6). Traditionally, studies of genome reduction in the cases of symbiotic bacteria indicate an expansion of mobile genetic elements in the initial stages of bacterial adaptation to a host-dependent lifestyle and an absence of mobile genetic elements in long-term obligate symbiosis associations [117]. This suggests that the different associations of Wolbachiafilarial nematodes represent different stages of host-dependent adaptation. Initially, it had been suggested that Wolbachia symbionts co-evolved with their filarial nematodes [4]. Supergroup F Wolbachia were thought to be the only example of horizontal transfer among the filarial nematodes [56,73]. A recent revision of the co-phylogenetic patterns of Wolbachia in filariae based on multi-locus phylogenies suggests that only supergroup C Wolbachia exhibit strong co-speciation with their hosts [58]. Indeed, our global-fit analyses are not compatible with a global pattern of co-evolution, but rather support the hypothesis of two independent acquisitions of supergroup D and J (Fig. 4). These results highlight a differential evolution of Wolbachia symbiosis among the various filarial nematodes, likely having evolved from different acquisitions and subject to different selective pressures (Fig. 4, Supplementary file S3).
Another important aspect of Wolbachia diversity is the association between some Wolbachia and the WO bacteriophage [118][119][120][121]. Indeed, prophage regions have been identified in numerous Wolbachia genomes and the fact that these insertions have not been eliminated by selective pressure support the hypothesis that they could provide factors of importance to Wolbachia [119,122]. In the case of Wolbachia from arthropods, these insertions can constitute a large proportion of the Wolbachia genome. For example, it was recently shown that 25.4 % of the wFol genome comprises five phage WO regions [112]. Our analyses indicate that the largesized genomes, such as wstri or wVulC, have large regions of WO prophage (Fig. 5f, Table S9). No intact region or only vestiges of prophage regions had been observed in previously studied Wolbachia genomes infecting filarial nematodes [33,43]. Our data support this absence of prophages in the newly studied genomes in this report. However, we detected some genes annotated as phage-like in the cases of wBm, wBp and wWb (supergroup D) and wMhie (supergroup F), unlike other representatives of supergroup D (wLsig or wLbra), and the genomes belonging to supergroups C and J Wolbachia (Table S10). Interestingly, while the bedbug symbiont wCle (supergroup F) has fewer phage genes than other Wolbachia from arthropods, numerous phage elements have been found in Wolbachia sequences integrated into the nuclear genome of a strongyloidean nematode (Dictyocaulus viviparus) [110], which were allocated to supergroup F, suggesting significant variation in the role of phage WO within this clade. So far, wWb, wBm and wBp have the largest Wolbachia genomes of filarial nematodes, and while these phage-like insertions represent a negligible proportion of the entire genome, they nevertheless suggest that wWb, wBm and wBp were in contact with bacteriophages that successfully inserted DNA in their respective genomes. At the same time, our study shows that numerous genes involved in HR and the cell cycle pathway (Fig. 7) are absent in the Wolbachia from filarial nematodes other than wBm, wBp and wWb; thus, insertion of DNA might not be possible for the bacteriophages due to the nature of these genomes themselves.
Supergroup F is particularly interesting as it represents the only clade composed of Wolbachia symbionts of both arthropods and filarial nematodes, suggesting horizontal transfer between the two phyla [111]. Previous studies suggested it is more likely that the infection by supergroup F Wolbachia derived from multiple independent host switch events in the Filarioidea, because they infected species that are not closely related [56,58]. In addition, recent phylogenomic studies suggest that supergroup F is a derived clade in the evolutionary history of Wolbachia [3,20,123]. The wMhie genome belonging to the supergroup F is closely related to the bedbug symbiont wCle; however, the characteristics of the genome (small size, few transposase elements, few phage genes, absence of prophage region) are more similar to those observed in representatives of supergroup D.
Previous genomics studies of Wolbachia from filarial nematodes have hypothesized mechanisms that could underpin the obligate mutualism [33,43]. Our data indicate that both haem and nucleotide (pyrimidine and purine) metabolism are particularly conserved among all analysed Wolbachia genomes, even the smallest ones; thus, supporting suggestions of potential provisioning of these resources by Wolbachia (Fig. 7). The hypothesis of mutualism based on nutritional provisioning has been revised after the detection of the incomplete riboflavin (vitamin B2) pathway in wOo [43]. Notably, the genomes of supergroup D show an incomplete folate metabolism pathway (vitamin B9) (no folA/B/C/KP for all representatives), which is complete for the small genomes of both supergroup C and J. The riboflavin pathway (vitamin B2) appears incomplete in supergroup C with four genes missing for wOv, two for wOo and one missing for wDimm, while the pathway is almost complete for the two studied supergroup J representatives (only ribA missing). Another interesting vitamin B pathway is the biotin operon (vitamin B7). It was previously suggested that the evolution of this operon is not congruent with proposed Wolbachia evolutionary history [116]. Our data show the operon is present in wNfla (supergroup A), wstri, wLug, wVulC (supergroup B), wCle (supergroup F) and wCfeJ (not belonging to a described supergroup), and that it might have been acquired horizontally as a nutritional requirement. For wCle, wstri and wNfla, biotin supplementation by Wolbachia increases insect host fitness [20,21]. Interestingly, our study shows that incomplete metabolic pathways are not a function of Wolbachia genome size.
Due to their ubiquitous occurrence and diverse biological interactions with their hosts, be they in nematodes or arthropods, Wolbachia endosymbionts represent a striking model for studies of symbiosis. Analysis of their genomes has been used to attempt to understand the nature of the host-symbiont biochemical mechanisms, their evolutionary trajectories and their potential use for biomedical remediations. Our results pinpoint a differential evolutionary course for Wolbachia symbiosis among the various filarial nematode clades, suggesting evolutions from different acquisitions and subject to different selective pressures. The concept of a uniform model of symbiont-filarial host association among the Wolbachia clades ('supergroups') appears not to show consistent patterning. Overall, the pathway analysis presented in the current study suggests that no single metabolic process governs the entire spectrum of Wolbachia-filarial nematode associations. It is highly likely that such provisioning mechanisms might differ according to the particular host-symbiont association, although in cases where the Wolbachia host is itself a parasite (such as filarial nematodes), the potential metabolic interactions with mammalian and arthropod hosts of the filariae are highly complex. In the past, comparative arthropod and nematode Wolbachia evolutionary studies have largely been independently performed, often due to different objectives. However, understanding of the full range of diversity of Wolbachia genomic information will be required to comprehend their comprehensive symbiotic complexity. The analysis of the new Wolbachia genomes from filarial nematodes presented in the current study, as well as recent studies of Wolbachia genomes from arthropods more closely related to symbionts of filarial nematodes, such as the symbiont of fleas (wCfeJ) [92] or pseudoscorpions (wApol) [70], emphasize this viewpoint. Continued further genomic analyses will be instructive to highlight and help unravel these diverse symbiotic mechanisms.

Funding information
This study was supported by internal funding from New England Biolabs, except for the first Illumina library for wDcau, which was produced at the University of Liverpool using funds supplied by the MNHN.

Ethical statement
Most of the samples were collected as described in a previous study [79] and all procedures were conducted in compliance with the rules and regulations of the respective national ethical bodies. Regarding the newly studied material: the Dirofilaria (Dirofilaria) immitis specimen was provided by the NIAID/NIH Filariasis Research Reagent Resource Center (MTA University of Wisconsin-Oshkosh, USA; www. filariasiscenter. org), and the Litomosoides sigmodontis specimen was provided by the MNHN, Paris, France, where the experimental procedures were carried out in strict accordance with the European Union Directive 2010/63/UE and the relevant national legislation (ethical statement no.13845).