Wolbachia populations across organs of individual Culex pipiens: highly conserved intra-individual core pangenome with inter-individual polymorphisms

Abstract Wolbachia is a maternally inherited intracellular bacterium that infects a wide range of arthropods including mosquitoes. The endosymbiont is widely used in biocontrol strategies due to its capacity to modulate arthropod reproduction and limit pathogen transmission. Wolbachia infections in Culex spp. are generally assumed to be monoclonal but the potential presence of genetically distinct Wolbachia subpopulations within and between individual organs has not been investigated using whole genome sequencing. Here we reconstructed Wolbachia genomes from ovary and midgut metagenomes of single naturally infected Culex pipiens mosquitoes from Southern France to investigate patterns of intra- and inter-individual differences across mosquito organs. Our analyses revealed a remarkable degree of intra-individual conservancy among Wolbachia genomes from distinct organs of the same mosquito both at the level of gene presence–absence signal and single-nucleotide polymorphisms (SNPs). Yet, we identified several synonymous and non-synonymous substitutions between individuals, demonstrating the presence of some level of genomic heterogeneity among Wolbachia that infect the same C. pipiens field population. Overall, the absence of genetic heterogeneity within Wolbachia populations in a single individual confirms the presence of a dominant Wolbachia that is maintained under strong purifying forces of evolution.

values varied from 0.2 to 0.74 indicating between 20 to 74% of base variation at one SNV position in comparison to the consensus nucleotide within each metagenome (Table S7).
We then focused our analysis on SNVs occurring within gene calls in order to study the potential influence of genomic variation on protein-coding genes in our MAGs.A SNV located in the 1st or 2nd position of the codon will be more susceptible to result in a non-synonymous mutation (resulting in changes in the amino-acid sequence) whereas a SNV located in the 3rd position will be more likely to result in a synonymous mutation (no changes in the amino-acid sequence).We also removed coverage outliers.We reported 107, 74, 42, 152 and 35 SNVs occurring in gene calls for MAGs O11, M11, O03, O07 and O12, respectively (Figure S8, Table S7).Among those gene-call SNVs, we counted a mean number of 22, 18 and 42 SNVs at the 1 st , the 2 nd and the 3 rd base position, respectively (Table S7).
Focusing our analysis on the variable positions falling within wSCGs, after filtering out coverage outliers and low departure SNVs, we detected a mean of 3.2 SNVs in 1 gene for each MAG (Figure S8; Table S7).Interestingly, while single-copy gene clusters accounted for 73.86% of all gene clusters in the pangenome, only a limited number of SNVs were detected within the corresponding wSCGs (17.3% for MAG M11, 29.27% for O03, 3.11% for O07, 19.97% for O11, 39.17% for O12) (Figure S8).
In Wolbachia MAG M11, we reported 2 SNVs falling in one wSCG (gene caller id 301) that was not annotated to any function using COG20 and KEGG databases.In wSCG 301, 1 SNV falls at the 2nd position in codon and 1 SNV at the 3rd position in codon.On average, these SNVs have a coverage of 189X and a departure from consensus of 0.21 (Table S7, S13).
In Wolbachia MAG O11, we reported a total of 7 filtered SNVs in a wSCG (gene caller id 755) annotated to an « Ankyrin repeat » function by the COG20 database.In wSCG 755, respectively 2, 3 and 2 SNVs fall at the 1 st , 2 nd and 3 rd position in codon, respectively.On average, these SNVs have a coverage of 486X and a departure from consensus of 0.39 (Table S7, S13).
In Wolbachia MAG O03, we reported 2 filtered SNVs in 1 wSCG (gene caller ids 589) annotated to an « ATP-dependent 26S proteasome regulatory subunit » (COG20), falling in 1 st and 2 nd positions in codon.On average, these SNVs have a coverage of 246X and a departure from consensus of 0.22.(Table S7, S13).
In Wolbachia MAG O07, we reported 1 filtered SNV in one wSCG (gene caller id 230), with no assigned function.This unique SNV falls at the 2 nd position in codon, has a coverage of 146X and a departure from consensus of 0.22 (Table S7, S13).
In Wolbachia MAG O12, we reported a total of 4 filtered SNVs in 2 wSCGs (gene caller ids 423 and 640).In wSCG 423, with no annotated function, 1 SNV falls at the 1 st position in codon and 1 SNV at the 3 rd position in codon.In wSCG 640, annotated to an « ATP-dependent 26S proteasome regulatory subunit » (COG20), 1 SNV falls at the 1 st position in codon and 1 SNV at the 2 nd position in codon.On average, these SNVs have a coverage of 194X and a departure from consensus of 0.25 (Table S7, S13).

Visual inspection of intra-individual SNVs
We visualized the retained SNVs in the context of the gene where they appear (Figure S9-14).
A close examination of SNVs in gene 640 (annotated to an "ATP-dependent 26S proteasome regulatory subunit" by COG20 database, Figure S9) from split 99075 of Wolbachia MAG O12 showed a concomitant increase of SNV departure from consensus and coverage values for gene 640 (Figure S9A-C) despite having a single copy gene signature.A significant correlation was confirmed between departure from consensus and coverage values (r = 0,66; p = 2.10 -4 ; Figure S9D).In addition, blasting a selected sequence from wSCG 640 in MAG O12 (contig 99075, nucleotides 16,898 to 17,082, left-hand side in Figure S9A-C) against the full MAG O12 with a discontiguous megablast (to detect less similar sequences) revealed a hit with a perfect match together with one additional hit with lower identity (73.72% over 74% of the query).Similar visualizations were produced for all 6 genes considered (Figure S9-14).In all cases, the detection of SNVs was linked to a variation in coverage, which produced a « variable » region, though most of the raw SNVs were filtered out either due to a departure from consensus below the fixed threshold or to being coverage outliers.These data highlight non-specific read recruitment that are likely due to i) hidden repeated domains within the whole MAG ii) genes not reconstructed in fragmented genomes.COG20_FUNCTION, COG20_CATEGORY_ACC, COG20_CATEGORY), KOfam assignation information (KOfam_ACC, KOfam, KEGG_Module_ACC) and corresponding amino acid sequence (aa_sequence).the unique identifier of the SCV (entry_id), unique position identifier of the SCV (unique_pos_identifier), name of the split where SCV is occurring (split_name), sample where the SCV has been identified (sample_id), unique gene caller identifier where the SCV is occurring (corresponding_gene_call with "-1" if the position is not in a gene call), order of the codon in the gene call (codon_order_in_gene with "-1" if the position is not in a gene call, codon_number), gene length (gene_length), number of recruited reads mapping to this codon (coverage), number of mapped reads covering the SCV corresponding to the different codons (64 codon combinations), reference codon in the reference genome (reference), most mapped codon (consensus), the two most represented codons (competing_codons), ratio of codon in a given SCV that diverge from the reference codon (departure_from_reference), ratio of codon in a given SCV that diverge from the consensus codon (departure_from_consensus), ratio of the second most frequent codon to the consensus codon (n2n1ratio), the entropy value (entropy)

Supplementary
and the synonymity values (pN, pS, nN, nS for consensus, reference and popular consensus).
All these SCV information are also explained at the following link (represented by a green bar).This gene was annotated as « tRNA A37 threonylcarbamoyltransferase TsaD » by COG20.

Table 5 -
Gene cluster statistics.Gene clusters identified during the pangenomic analysis performed on all reconstructed Wolbachia MAGs (MAGs M11, O11, O03, O07 and O12) and the three selected Wolbachia reference genomes wPipPel, wPipMol, wPipJHB.The table reports the same information as in Supplementary Table4.

Table 6 -Single-copy Core Genes from Wolbachia. wSCGs
identified among the unique GCs computed during the pangenomic analysis performed on all reconstructed Wolbachia MAGs and selected Wolbachia reference genomes.Tablereportsfor each unique GC: gene cluster id (gene_cluster_id), wSCG info (1=wSCG), prophage WO departure_from_consensus), ratio of the second most frequent nucleotide to the consensus nucleotide (n2n1ratio) and the entropy value (entropy).All these SNV information are also explained at the following link https://merenlab.org/2015/07/20/analyzingvariability/.Moreover, the table reports information related to gene clusters where the SNV falls in as in TableS6(including the gene cluster id and the wSCG status of a gene) but also the Supplementary Table7-Raw SNV tables.Tablereportsfor each SNV position: unique identifier of the SNV (entry_id), unique position identifier of the SNV (unique_pos_identifier), nucleotide position of the SNV in the split and the contig (pos, pos_in_contig), name of the split where SNV is occurring (split_name), sample where the SNV has been identified (sample_id), unique gene caller identifier where the SNV is occurring (corresponding_gene_call with "-1" if the position is not in a gene call), coding or non-coding (consensus), the two most represented nucleotides at the position (competing_nts), ratio of nucleotides in a given position that diverge from the reference nucleotide (departure_from_reference), ratio of nucleotides in a given position that diverge from the consensus nucleotide (Supplementary Table 8 -Filtered SNP table (at the inter-sample level).Information is reported as in TableS7for the SNVs identified as Single Nucleotide Polymorphisms (SNP), i.e. within wSCGs, with departure from reference over 0.98, and not identified as coverage outliers.Supplementary Table 9 -Summary of SNP positions in the fiveWolbachia MAGs, grouped by gene cluster.Gene clusters were identified as in TableS5, grouping one gene from each MAG.The variable position is thus identified by the codon number in the gene and base position in codon.Functional information provided by COG20.Supplementary Table 10 -Filtered

SCV tables at the inter-sample level for each metagenome, corresponding to the SNPs reported in Table S9.
Table reports for each SCV:

Supplementary Table 11 -Filtered SAAV tables at the inter-sample level for each metagenome, corresponding to the SNPs reported in Table S9.
https://merenlab.org/2015/07/20/analyzing-variability/.Moreover, the table reports information related to gene clusters where the SCV occurs as in TableS6(including the gene cluster id and the wSCG status of a gene) but also the prophage WO regions assignment Table reports for each SAAV: the unique identifier of the SAAV (entry_id), unique position identifier of the SAAV (unique_pos_identifier), name of the split where SAAV is occurring (split_name), sample where the SAAV has been identified (sample_id), unique gene caller identifier where the SAAV All these SAAV information are also explained at the following link https://merenlab.org/2015/07/20/analyzing-variability/.Moreover, the table reports information related to gene clusters where the SAAV occur as in TableS6(including the gene cluster id and the wSCG status of a gene) but also the prophage WO regions assignment Black layers represent the gene's coverage in each metagenome.A SNP is identified in metagenomes O03 and O11, in 3 rd position in codon 291 [6]occurring (corresponding_gene_call with "-1" if the position is not in a gene call), order of the codon in the gene call (codon_order_in_gene with "-1" if the position is not in a gene call, codon_number), gene length (gene_length), number of recruited reads mapping to this amino acid (coverage), number of mapped reads covering the SAAV corresponding to the different amino acids (Ala, Arg, Asn, Asp, Cys, Gln, Glu, Gly, His, Ile, Leu, Lys, Met, Phe, Pro, STP, Ser, Thr, Trp, Tyr, Val), reference amino acid in the reference genome (reference), most mapped amino acid at the SAAV (consensus), the two most represented amino acids (competing_aas), ratio of amino acids in a given SAAV that diverge from the reference amino acids (departure_from_reference), ratio of amino acids in a given SAAV that diverge from the (cid_genes_from_Bonneau_et_al._2018[6],cid_Bonneau_pct_alignment), MLST and wsp contig 68,699 split 2 in MAG O12.