Movements of transposable elements contribute to the genomic plasticity and species diversification in an asexually reproducing nematode pest

Abstract Despite reproducing without sexual recombination, Meloidogyne incognita is an adaptive and versatile phytoparasitic nematode. This species displays a global distribution, can parasitize a large range of plants, and can overcome plant resistance in a few generations. The mechanisms underlying this adaptability remain poorly known. At the whole‐genome level, only a few single nucleotide variations have been observed across different geographical isolates with distinct ranges of compatible hosts. Exploring other factors possibly involved in genomic plasticity is thus important. Transposable elements (TEs), by their repetitive nature and mobility, can passively and actively impact the genome dynamics. This is particularly expected in polyploid hybrid genomes such as the one of M. incognita. Here, we have annotated the TE content of M. incognita, analyzed the statistical properties of this TE landscape, and used whole‐genome pool‐seq data to estimate the mobility of these TEs across twelve geographical isolates, presenting variations in ranges of compatible host plants. DNA transposons are more abundant than retrotransposons, and the high similarity of TE copies to their consensus sequences suggests they have been at least recently active. We have identified loci in the genome where the frequencies of presence of a TE showed substantial variations across the different isolates. Overall, variations in TE frequencies across isolates followed their phylogenetic divergence, suggesting TEs participate in the species diversification. Compared with the M. incognita reference genome, we detected isolate and lineage‐specific de novo insertion of some TEs, including within genic regions or in the upstream regulatory regions. We validated by PCR the insertion of some of these TEs inside genic regions, confirming TE movements have possible functional impacts. Overall, we show DNA transposons can drive genomic plasticity in M. incognita and their role in genome evolution of other parthenogenetic animal deserves further investigation.


| INTRODUC TI ON
Agricultural pests cause substantial yield loss to the worldwide lifesustaining production (Savary et al., 2019) and threaten the survival of different communities in developing countries. With a constantly growing human population, it becomes more and more crucial to reduce the loss caused by these pests while limiting the impact on the environment. In this context, understanding how pests evolve and adapt both to the control methods deployed against them and to a changing environment is essential. Among Metazoa, nematodes and insects are the most destructive agricultural pests. Nematodes alone are responsible for crop yield losses of ca. 11%, representing up to 100 billion € economic loss annually (Agrios, 2005;McCarter, 2009). The most problematic nematodes to worldwide agriculture belong to the genus Meloidogyne (Jones et al., 2013) and are commonly named root-knot nematodes (RKN) owing to the gall symptoms their infection leaves on the roots. The RKN species showing the wider geographical distribution and infecting the broadest diversity of plants reproduce asexually via mitotic parthenogenesis (Castagnone-Sereno & Danchin, 2014;Trudgill & Blok, 2001). This observation seems counterintuitive as animal species with strictly asexual reproduction are deemed less adaptive than their sexual relatives, are quite rare, and occupy shallow branches in the animal tree of life (Rice, 2002). In the absence of sexual reproduction, the combination of beneficial alleles from different individuals is impossible and the efficiency of selection is reduced due to linkage between conflicting alleles (Glémin et al., 2019;Hill & Robertson, 1966;Kondrashov, 1988;Muller, 1964). Consistent with the theory, population genomic analyses have revealed the efficacy of purifying selection is reduced in M. incognita as compared to two outcrossing species in the Caenorhabditis genus (Koutsovoulos, Marques et al., 2020).
Previous comparative genomic studies have shown the genomes of the most devastating RKN are polyploid because of hybridization events (Blanc-Mathieu et al., 2017;Szitenberg et al., 2017). In the parthenogenetic RKN M. incognita, the gene copies resulting from allopolyploidy diverge not only at the nucleotide level but also in their expression patterns, suggesting this peculiar genome structure could support a diversity of functions and might be involved in their higher parasitic success despite the absence of sexual reproduction (Blanc-Mathieu et al., 2017). This hypothesis seems consistent with the "general-purpose genotype" concept, which proposes successful parthenogens have a generalist genotype with good fitness in a variety of environments (Vrijenhoek & Parker, 2009). An alternative non-mutually exclusive hypothesis is the "frozen niche variation" concept, which proposes parthenogens are more successful in stable environments because they have a frozen genotype adapted to this specific environment (Vrijenhoek & Parker, 2009). Interestingly, the frequency of parthenogenetic invertebrates is higher in agricultural pests, probably because the anthropized environments in which they live are more stable and uniform (Hoffmann et al., 2008).
However, although a general-purpose genotype brought by hybridization might contribute to the wide host range and geographical distribution of these RKNs, this alone cannot explain how these parthenogenetic species evolve and adapt to new hosts or environments. For instance, initially avirulent populations of some of these RKN, controlled by a resistance gene in a tomato, are able to overcome the plant resistance in a few generations, leading to virulent subpopulations, in controlled laboratory experiments (Castagnone-Sereno, 2006;Castagnone-Sereno et al., 1994). Emergence of virulent populations not controlled anymore by resistance genes has also been reported in the field (Barbary et al., 2015).  (Koutsovoulos, Marques et al., 2020). Furthermore, the few identified SNV showed no significant correlation with either the geographical location, the host range, or the currently infected crop species. However, these SNV could be used as markers to confirm the absence of sexual meiotic recombination in M. incognita. Thus, the low nucleotide variability that was observed between isolates is probably not the main driver of the genomic plasticity underlying the adaptability and diversification of

M. incognita.
Consistent with these views, convergent gene copy-number variations were observed following resistance breaking down by two originally avirulent populations of M. incognita from distinct geographical origins (Castagnone-Sereno et al., 2019). The mechanisms supporting these gene copy numbers and other genomic variations possibly involved in the adaptive evolution of M. incognita remain to be described.
Transposable elements (TEs), by their repetitive and mobile nature, can both passively and actively impact genome plasticity. Being repetitive, they can be involved in genomic rearrangements leading to loss of genomic portions or expansion of gene copy numbers.
Being mobile, they can insert in coding or regulatory regions and have a functional impact on the gene expression or gene structure/ function itself. For instance, TE insertions have been shown to affect gene expression in a species-specific manner in amniotes (Zeng et al., 2018) and, in rodents, TE insertions account for ca. 20% of gene expression profile divergence between mice and rats (Pereira et al., 2009). At shorter evolutionary scales, differential presence/absence of TE across Arabidopsis populations revealed rare variants associated with extremes of gene expression (Stuart et al., 2016). TE insertions in coding regions can disrupt a gene, and this disruption might K E Y W O R D S bioinformatics/phyloinformatics, contemporary evolution, evolution of sex, genomics/ proteomics, host-parasite interactions, hybridization, transposons eventually have an adaptive effect. For example, a TE insertion has caused disruption of a Phytochrome A gene in some soybean strains, which caused photoperiod insensitivity and was in turn associated with adaptation to high latitudes in Japan (Kanazawa et al., 2009).
Moreover, in Drosophila, insertion of a TE in the CHKov1 gene caused four new alternative transcripts and this modification is associated with resistance to insecticide and viral infection (Aminetzach et al., 2005;Magwire et al., 2011). In parallel, although TE movements can provide beneficial genomic novelty or plasticity, their uncontrolled activity can also be highly detrimental and put the organism at risk.
For instance, some human diseases such as hemophilia (Kazazian et al., 1988) or cancers (Miki et al., 1992) are caused by TE insertions in coding or regulatory regions.
Concerning agricultural pests themselves, TEs are a major player of adaptive genome evolution by both passively and actively impacting the genome structure and sequence in some fungal phytopathogens (Faino et al., 2016). Whether TEs also play an important role in the genome plasticity and possibly adaptive evolution of parasitic animals, engaged in a continuous arms race with their hosts, remains poorly known. According to the Red Queen hypothesis, host-parasite arms race is a major justification for the prevalence of otherwise costly sexual reproduction (Lively, 2010) and, in the absence of sex, other mechanisms should provide the necessary plasticity to sustain this arms race.
From an evolutionary point of view, the parthenogenetic rootknot nematode M. incognita represents an interesting model to study the activity of TEs and their impact on the genome, including in coding or regulatory regions. Indeed, being a plant parasite, M. incognita is engaged in an arms race with the plant defense systems and point mutations alone are not expected to be a major mechanism supporting adaptation in this species (Koutsovoulos, Marques et al., 2020).
In a broader perspective, little is known yet about the TE dynamics in nematode genomes and their possible impact on adaptive evolution, including in the model Caenorhabditis elegans, despite being the first sequenced animal genome (The C. elegans Genome Sequencing Consortium, 1998). Transposition activity of Tc1 TIR element was shown to be positively linked to the overall mutation rate in C. elegans mutator strains, one of which is characterized by high transposition in the germline, hence constituting a considerable evolutionary force (Bégin & Schoen, 2007). However, these results may be hindered by the fact that, in wild-type C. elegans, although Tc1 excision frequency is substantial in somatic cells, it is negligible in the germ cells (Emmons & Yesner, 1984).
Besides Tc1, a more comprehensive analysis using population genomic approach in C. elegans represents the most advanced study of the TE dynamics in this species to date (Laricchia et al., 2017). By analyzing hundreds of wild populations of C. elegans, the authors have shown a substantial level of activity for multiple TE families in these genomes compared with the N2 reference strain. The study points at a population-wide variability of this activity, and, surprisingly, toward little evident phenotypic effect of this activity, even when TEs were found inserted into coding sequences. Concerning the possible functional impact of TE activity in nematodes, an investigation of TE expression in C. elegans germline in a single-cell framework has shown significant differences between the expression pattern of LTR, non-LTR retroelements, and DNA transposons, associated with differentiated vs. undifferentiated cell types (Ansaloni et al., 2019).
These complex cell-type-specific differential expression patterns suggest TE activity plays an important role in the C. elegans embryonic development, although the exact role remains elusive. Overall, while it is now clearly established that TEs are active in C. elegans and probably contribute to the genome plasticity, their possible functional implication or role in nematode adaptive evolution has not been shown so far.
In this study, we have tested whether movements of TEs could represent a mechanism supporting genome plasticity in M. incognita, a prerequisite for adaptive evolution. We have reannotated the 183.5-Mb triploid genome of M. incognita (Blanc-Mathieu et al., 2017) for TEs, and using stringent filters, we have only retained those harboring the characteristic features of known retro and DNA transposon orders, hence more likely to be active. We analyzed the statistical properties of the TE content, and the distribution of TE sequence identity levels to their consensuses was used as a reporter of the recentness of their activity. We have then tested whether the frequencies of presence/absence of these TEs across the genome varied between different isolates. To test for variations in frequencies, we have used population genomics data from eleven M. incognita isolates collected on different crops and locations and showing distinct ranges of compatible hosts (Koutsovoulos, Marques et al., 2020). From the set of TE loci that presented the most contrasted patterns of presence/absence across the isolates, we investigated whether some could represent isolate or lineage-specific insertions.
To estimate the possible functional impact of TE insertions, we checked whether some were inserted within coding or possible regulatory regions. Finally, we validated by PCR assays several of these insertions in coding or regulatory regions, predicted by population genomics data. Overall, our study represents the first estimation of TE activity as a mechanism possibly involved in the genome plasticity and the associated functional impact in the most devastating nematode to worldwide agriculture. Besides C. elegans, little was known about the role of TE in the genome dynamics of Nematoda, one of the most species-rich animal phylum. Because this study focuses on an allopolyploid and parthenogenetic animal species, it also opens new evolutionary perspectives on the fate and potential adaptive impact of TEs in these singular organisms.

| The genome of Meloidogyne incognita
We used the genome assembly published in (Blanc-Mathieu et al., 2017) as a reference for TE prediction and annotation (ENA assembly accession GCA_900182535, bioproject PRJEB8714) and for read-mapping of the different geographical isolates (Koutsovoulos, Marques et al., 2020), used for prediction of TE presence frequencies.
Briefly, the triploid M. incognita genome is 183.5 Mb long with ~12,000 scaffolds and a N50 length of ~38 kb. Although the genome is triploid, because of the high nucleotide divergence between the genome copies (8% on average), most of these genome copies have been correctly separated during genome assembly, which can be considered effectively haploid (Blanc-Mathieu et al., 2017;Koutsovoulos, Marques et al., 2020). This reference genome originally came from a M. incognita population from the Morelos region of Mexico and was reared on tomato plants from the offspring of one single female in our laboratory.

| The genome of Caenorhabditis elegans
We used the C. elegans genome (The C. elegans Genome Sequencing Consortium, 1998) assembly (PRJNA13758) to perform its repeatome prediction and annotation and compare our results with the literature as a methodological validation.

| Genome reads for 12 Meloidogyne incognita geographical isolates
To predict the presence frequencies at TE loci across different M. incognita isolates, we used whole-genome sequencing data from pools of individuals from 12 different geographical regions ( Figure S1; Table S1). One pool corresponds to the Morelos isolates used to produce the M. incognita reference genome itself, as described above. The 11 other pools correspond to different geographical isolates across Brazil as described in Koutsovoulos, Marques et al. (2020).
All the samples were reared from the offspring of one single female and multiplied on tomato plants. Then, approximately 1 million individuals were pooled and sequenced by Illumina paired-end reads (2 × 150 bp). Library sizes vary between 74 and 76 million reads (Koutsovoulos, Marques et al., 2020).
We used cutadapt-1.15 (Martin, 2011) to trim adapters, discard small reads, and trim low-quality bases in read boundaries (-max-n = 5 -q 20,20 -m 51 -j 32 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT). Then, for each library, we performed a fastqc v-0.11.8 (Andrews, 2010) analysis to evaluate the quality of the reads. FastQC result analyses showed that no additional filtering or cleaning step was needed and no further read was discarded.

| Methods
We performed the statistical analyses and the graphical representations using R' v-3.6.3 and the following libraries: ggplot2, cowplot, reshape2, ggpubr, phangorn, tidyverse, and ComplexUpset. All codes and analysis workflows are publicly available in the INRAE Dataverse (Kozlowski, 2020a(Kozlowski, , 2020d. For experimental validations, see . A diagram recapitulating the main steps of the analysis has been provided in supplementary Figure S2, as well as a decision tree summarizing the polymorphism characterization ( Figure S3).

| Meloidogyne incognita and Caenorhabditis elegans repeatome predictions and annotations
We predicted and annotated the M. incognita and C. elegans repeatomes following the same protocol as thoroughly explained in Koutsovoulos, Poullet et al. (2020). We define the repeatome as all the repeated sequences in the genome, excluding simple sequence repeats (SSR) and microsatellites. Then, following the above-mentioned protocol, we further analyzed each repeatome to retain only annotations with canonical signatures of transposable elements (TEs).
Below, we briefly explain each step and describe protocol adjustments.

| Genome preprocessing
Unknown nucleotides "Ns" encompass 1.81% of the M. incognita reference genome and need to be trimmed before repeatome predictions. We created a modified version of the genome by splitting it at N stretches of length 11 or more and then trimming all N, using dbchunk.py from the REPET package (Flutre et al., 2011;Quesneville et al., 2005). As this increases genome fragmentation and may, in turn, lead to false positives in TE detection, we only kept chunks of length above the L90 chunk length threshold, which is 4891 bp. This modified version of the genome was only used to perform the de novo prediction of the TE consensus library (below). The TE annotation was performed on the original whole reference genome.
The C. elegans reference genome was entirely resolved (no N), at the chromosome scale. Hence, we used the whole assembly as is to perform the de novo prediction analysis.

| De novo prediction: constituting draft TE consensus libraries
For each species, we used the TEdenovo pipeline from the REPET package to generate a draft TE consensus library.
Briefly, TEdenovo pipeline (i) realizes a self-alignment of the input genome to detect repetitions, (ii) clusters the repetitions, (iii) performs multiple alignments from the clustered repetitions to create consensus sequences, and (iv) eventually classifies the consensus sequence following Wicker's classification (Wicker et al., 2007) using structural and homology-based information. One of the most critical steps of this process concerns the clustering of the repetitions as it requires prior knowledge about assembly ploidy and phasing quality.
We ran the analysis considering the modified M. incognita reference assembly previously described as triploid and set the "min-NbSeqPerGroup" parameter to 7 (i.e., 2n+1). As the C. elegans assembly was haploid, we set the same parameter to 3.
All the remaining parameter values set in these analyses can be found in the TEdenovo configuration files (Kozlowski, 2020a).

| Automated curation of the TE consensus libraries
To limit the redundancy in the previously created TE consensus libraries and the false positives, we performed an automated curation step. Briefly, for each species, (i) we performed a minimal annotation (steps 1, 2, 3, 7 of TEannot) of their genome with their respective draft TE consensus libraries, and (ii) only retained consensus sequences with at least one full-length copy (FLC) annotated in the genome. All parameter values are described in the configuration files available in Kozlowski (2020a).

| Repeatome annotation
For each species, we performed a full annotation (steps 1, 2, 3, 4, 5, 7, and 8) of their genome with their respective cleaned TE consensus libraries using TEannot from the REPET package. The obtained repeatome annotations (excluding SSR and microsatellites) were exported for further analyses. All parameter values are described in the configuration files available in Kozlowski (2020a).
2.2.6 | Repeatome postprocessing: identifying annotations with canonical signatures of TEs Using in-house scripts (Kozlowski, 2020a), we analyzed REPET outputs to retain annotations with signatures of canonical transposable elements (TEs) from the rest of the repeatomes. The same parameters were set for M. incognita and C. elegans. Briefly, for each species, we only conserved TE annotations (i) classified as retrotransposons or DNA transposons, (ii) longer than 250 bp, (iii) sharing more than 85% identity with their consensus sequence (below this value, there is uncertainty on the correspondence between a TE annotation and its consensus if closely related consensuses sequences exist), (iv) covering more than 33% of their consensus sequence length Eventually, using in-house script , we selected the best nonoverlapping HMM profiles for each protein and then tagged corresponding genes with TE-related HMM profiles thanks to a knowledge-based function from the REPET tool "profileDB4Repet.py." We kept as genes with TE-related profiles all the genes with at least one TE-related HMM profile identified. The cleaned RNA-seq reads were retrieved from the previous analysis and re-mapped to the M. incognita annotated genome assembly (Blanc-Mathieu et al., 2017) using a more recent version of STAR (2.6.1) (Dobin et al., 2013) and the more stringent end-to-end option (i.e., no soft clipping) in 2-passes. Expected read counts were calculated on the predicted genes from the M. incognita GFF annotation as FPKM values using RSEM (Li & Dewey, 2011)  We then cross-referenced the obtained file with the list of the substantially expressed genes and the list of the TE-related genes previously produced to identify the TEs containing potential transposition machinery genes and their expression levels.

| Evaluation of TE presence frequencies across the different Meloidogyne incognita isolates
We used the PoPoolationTE2 v-1.10.04 pipeline (Kofler et al., 2016) to compute isolate-related support frequencies of both annotated and de novo TE loci across the 12 M. incognita geographical isolates previously described. To that end, we performed a "joint" analysis as recommended by the PoPoolationTE2 manual. Briefly, PoPoolationTE2 uses both quantitative and qualitative information extracted from paired-end (PE) reads mapping on the TE-annotated reference genome and a set of reference TE sequences to detect signatures of TE polymorphisms and estimate their frequencies in every analyzed isolate. Frequency values correspond to the proportion of reads in an isolate supporting the presence of a copy of the TE at a given locus.

| Preparatory work: creating the TE hierarchy and the TE-merged reference files
We used the canonical TE annotation set created above (Kozlowski, 2020a) and the M. incognita reference genome to produce the TEmerged reference file and the TE-hierarchy file necessary to perform the PoPoolationTE analysis (Kozlowski, 2020d).
We used getfasta and maskfasta commands (default parameters) from the bedtools suite to respectively extract and mask the sequences corresponding to canonical TE annotations in the reference genome. Then, we concatenated both resulting sequences in a "TE-merged reference" multi fasta file. The "TE-hierarchy" file was created from the TE annotation file from which it retrieves and stores the TE sequence name, the family, and the TE order for every entry.

| Reads mapping
For each M. incognita isolate library, we mapped forward and reverse reads separately on the "TE-merged-references" genome-TE file using the local alignment algorithm bwa bwasw v-0.7.17-r1188  with the default parameters. The obtained sam alignment files were then converted to bam files using samtools view v-1.2 ).

| Restoring paired-end information and generating the ppileup file
We restored paired-end information from the previous separate mapping using the sep2pe (--sort) tool from PoPoolationTE2-v1.10.03.
Then, we created the ppileup file using the "ppileup" tool from PoPoolationTE2 with a map quality threshold of 15 (--map-qual 15).
For every base of the genome, this file summarizes the number of PE read inserts spanning the position (physical coverage) but also the structural status inferred from paired-end read covering this site.

| Estimating target coverage and subsampling the ppileup to a uniform coverage
As noticed by R. Kofler, heterogeneity in physical coverage between populations may lead to discrepancies in TE frequency estimation.
Hence, we flattened the physical coverage across the M. incognita isolates by a subsampling and a rescaling approach.
We first estimated the optimal target coverage to balance information loss and homogeneity using the "stats-coverage" tool from PoPoolationTE2 (default parameter) and set this value to 15X. We then used the "subsamplePpileup" tool (--target-coverage 15) to discard positions with a physical coverage below 15X and rescale the coverage of the remaining position to that value.
Then, for each identified site, we estimated TE presence frequencies in each isolate using the "frequency" tool (default parameters).
Eventually, we paired up the signatures of TE polymorphisms using "pairupSignatures" tool (--min-distance −200; --max-distance --300 as recommended by R. Kofler), yielding a final list of potential TE polymorphism loci in the reference genome with their associated frequencies for each one of the isolates.

| Evaluation of PoPoolationTE2 systematic error rate in the TE frequency estimation
To estimate PoPoolationTE2 systematic error rate in the TE frequency estimation, we ran the same analysis (from the PE information restoration step) but comparing each isolate against itself (12 distinct analyses).
We then analyzed each output individually, measuring the frequency difference between the two "artificial replicates" in all the detected loci with FR signatures (see below for more explanations).
We tested the homogeneity of the frequency difference across the 12 analyses with an ANOVA and concluded that the mean values of the frequency differences between the analysis were not significantly heterogeneous (p value = 0.102 > 0.05). Hence, we concatenated the 12 analyses' frequency difference and set the systematic error rate in the TE frequency estimation to 2 times the standard deviation of the frequency differences, a value of 0.97%.

| Isolating TE loci with frequency variation across Meloidogyne incognita isolates
We parsed PoPoolationTE2 analysis output to identify TE loci with enough evidence to characterize them as polymorphic in frequency across the isolates.
PoPoolationTE2 output informs for each detected locus (i) its position on the reference genome, (ii) its frequency value for every sample of the analysis (e.g., each isolate), and (iii) qualitative information about the reads mapping signatures supporting a TE insertion.
In opposition to separate forward ("F") or reverse ("R") signatures, "FR" signatures mean both boundaries of locus are supported by significant physical coverage. Entries with such type of signature are more accurate in terms of frequency and position estimation. Hence, we only retained candidate loci with "FR" signatures. Then, for each locus, we computed the maximal frequency variation between all the isolates and discarded the loci with a frequency difference smaller than the PoPoolationTE2 systematic error rate in the TE-frequency estimation we computed (0.97%; see above). We also discarded loci where different TEs were predicted to be inserted. We considered the remaining loci as polymorphic in frequency across the isolates.

| Isolates phylogeny
We inferred the M. incognita isolates phylogeny according to their patterns of polymorphism in TE frequencies.
We first computed a Euclidean distance matrix from the isolate TE frequencies of all the detected polymorphic loci. We then used the distance matrix to construct the phylogenetic tree using the neighbor-joining (NJ) method (R' phangorn package v-2.5.5). We computed nodes' support values with a bootstrap approach (n = 500 replicates) using the boot.phylo function from the ape-v5.4 R package (Paradis & Schliep, 2019). The boot.phylo function performs a resampling of the frequency matrix (here the matrix with loci in columns, isolates in rows, and values corresponding to the TE presence frequencies).
We also created a phylogenetic tree using the SNV identified within coding regions for all isolates with raxml-ng v-0.9.0 (Kozlov et al., 2019) utilizing the model GTR+G+ASC_LEWIS and performing 100 bootstrap replicates. We compared both topologies using Itol v-4.0 viewer (Letunic & Bork, 2019).

| Polymorphism characterization
We exported the polymorphic TE positions as an annotation file, and we used bedtools intersect (-wao) to perform their intersection with the reference canonical TE annotation. We then crossreferenced the results with the filtered PoPoolationTE2 output and defined a decision tree to characterize the TE polymorphisms detected by PoPoolationTE2 as "reference TE polymorphism" (refpolymorphism), "unannotated," or "new" loci, as compared to the reference genome ( Figure S3).
We considered a reference TE annotation as polymorphic (e.g.,

ref-polymorphism locus) if:
(i) The position of the polymorphism predicted by PoPoolationTE2 falls between the boundaries of the reference TE annotation.
(ii) Both the reference TE annotation and the predicted polymorphism belong to the same TE consensus sequence.
(iii) The TE has a predicted frequency >75% in the reference isolate Morelos.
Canonical TE annotations that did not intersect with polymorphic loci predicted by PoPoolationTE2, or that presented frequency variations <1% across the isolates were considered as nonpolymorphic.
We classified as "new TE loci" all the polymorphic loci for which no canonical TE was predicted by REPET in the reference annotation (polymorphism position is not included in a reference TE annotation), but which were detected with a frequency >25% in at least one isolate different from the reference isolate Morelos, in which the TE frequency should be inferior to 1% and thus considered truly absent in the reference genome.
Finally, we classified as "unannotated TE loci" all the polymorphic loci, which did not correspond to a reference annotation but which were detected with a frequency >25% in the reference isolate Morelos (at least). Polymorphic loci having a frequency between 1% and 25% in Morelos isolate were considered ambiguous and were discarded.
Then, for each TE polymorphism, we investigated the homogeneity of the TE frequency between the isolates. We considered TE frequency was homogeneous between isolates when the maximum frequency variation between isolate was <= to 25%. Above this value, we considered the TE presence frequency was heterogeneous between isolates.

| Highly contrasted polymorphic TE loci (HCPTEs):
isolation, characterization, and experimental validation 2.5.1 | HCPTE isolation We considered as highly contrasted all the polymorphic loci for which (i) all the isolates had frequency values either <25% or >75%, and (ii) at least one isolate showed a frequency <25%, while another presented a frequency >75%. Polymorphic loci fitting with these requirements were exported as an annotation file in the bed format.

| HCPTE possible functional impact
We first identified the genes potentially impacted by the HCPTEs by cross-referencing the HCPTE annotation file with the gene annotation file, using the bedtools suite. We used the "closest" program (-D b -fu -io; b being the gene annotation file) to identify the closest (but not intersecting) gene downstream each HCPTE. We only retained the entries with a maximum distance of 1 kb between the HCPTE and gene boundaries. We identified the insertions in the gene using the "intersect" tool (-wo).

| Experimental validation of HCPTE loci
To experimentally validate in silico predictions of TE neo-insertions with potential functional impact, we selected five candidates among the HCPTE loci and performed a PCR experiment. To run this experiment, we used DNA remaining from extractions performed on the M. incognita isolates for a previous population genomics analysis (Koutsovoulos, Marques et al., 2020). We selected loci to be validated based on the following criteria: • The predicted insertion must be in a genic or potential regulatory region (max 1 kb upstream of a gene) as the most evident criterion for a potential functional impact.
• The element must be short enough (2.5 kb max) to be amplified by PCR and SANGER-sequenced using standard techniques and material.
• To validate the predicted impacted gene actually exists, it must be supported by substantial expression data in the reference isolate Morelos.
• To maximize the chances the genes have effects on biological traits characteristic of the root-knot nematodes, the impacted gene must be Meloidogyne-specific.
Once all these criteria were applied, we maximized the diversity of TE orders involved and this resulted in the 5 loci presented in Results section.

Primer design and PCR amplification
We designed primers for the PCR analysis using the Primer3Plus web interface (Untergasser et al., 2007). The set of 10 primers with the corresponding sequences and expected amplicon sizes with, or without TE insertion, is shown in Table S2  Corp.). PCR conditions were as follows: initial denaturation at 95°C for 5 min, followed by 35 cycles of 95°C for 30 s, 56°C for 30 s of annealing, and 72°C for 3 min of extension, and the program ends with a final extension at 72°C for 10 min. Aliquots of 5 µl were migrated by electrophoresis on a 1% agarose gel (Sigma Chemical Co.) for 70 min at 100 V. The size marker used is 1 kb Plus DNA Ladder (New England Biolabs Inc.), containing the following size fragments in bp: 100,200,300,400,500,600,700,900,1000,1200,1500,2000, 3000, 4000, 5000, 6000, 8000, and 10,000.

Purification and sequencing of PCR amplicons
Amplicon bands were revealed using ethidium bromide and exposure to ultraviolet radiation. PCR product bands were excised from the agarose gel with a scalpel and purified using MinElute Gel Extraction Kit (Qiagen) before sequencing, following the manufacturer's protocol. PCR products were sequenced by Sanger sequencing (Eurofins Genomics).
Forward (F) and reverse (R) sequences were blasted individually (https://blast.ncbi.nlm.nih.gov/; optimized for "somewhat similar sequences," default parameters) to the expected TE consensus sequence and to the genomic region surrounding the predicted insertion point (2 kb region: 1 kb upstream the predicted insertion point and 1 kb downstream). When no significant hit was found, the sequence was blasted against the Meloidogyne reference genomes available at (https://meloi dogyne.inrae.fr/), the whole TE consensus library, and the NR database on the NCBI blast website.

| The Meloidogyne incognita TE landscape is diverse but mostly composed of DNA transposons
We used the REPET pipeline (Flutre et al., 2011;Quesneville et al., 2005) to predict and annotate the M. incognita repeatome (see Methods). Here, we define the repeatome as all the repeated sequences in the genome, excluding simple sequence repeats (SSR or microsatellites). The repeatome spans 26.38% of the M. incognita genome length (Table S3). As we wanted to assess whether TE movements contributed to genomic plasticity, we applied a series of stringent filters on the whole repeatome to retain only repetitive elements harboring the characteristic signatures of known retro and DNA transposon orders, hereafter denominated "canonical" TEs (see Methods and Kozlowski, 2020a). We identified 480 different TE consensus sequences that allowed annotation of 9633 canonical TEs, spanning 4.67% of the genome (Table S4). Both retro (Class I) and DNA (Class II) transposons (Wicker et al., 2007)  As a technical validation of our repeatome annotation protocol (see Methods, Figure S2), we performed the same analysis in C. elegans, using the PRJNA13758 assembly (The C. elegans Genome Sequencing Consortium, 1998). We compared our results (Kozlowski, 2020b) with the reference report of the TE landscape in this model nematode (Bessereau, 2006). We estimated that the C. elegans repeatome spans 11.81% of its genome (Table S5), which is close to the 12% described in Bessereau (2006). The same resource also reported that MITEs and LTR respectively compose ~2% and 0.4% of the C. elegans genomes, while we predicted 1.8% and 0.2%.
Predictions obtained using our protocol are thus in the range of previous predictions for C. elegans, which suggest our repeatome prediction and annotation protocol are accurate.
The WormBook resource (Bessereau, 2006) mentioned that most of C. elegans TE sequences "are fossil remnants that are no longer mobile" and that active TEs are DNA transposons. This suggests a stringent filtering process is necessary to isolate TEs that are the most likely to be active (e.g., the "canonical" ones). Using the same postprocessing protocol as for M. incognita, we estimated that canonical TEs span 3.60% of the C. elegans genome, with DNA transposon alone representing 76.6% of these annotations ( Figure S4; Table S6).

| Canonical TE annotations show high similarity to their consensus sequences, and some present evidence for transposition machinery
Canonical TE annotations have a median nucleotide identity of 97% with their respective consensus sequences, but the distribution of identity values varies between TE orders (Figure 2; Table S7). Most of the TEs within an order share a high identity level with their consensuses, the lowest values being observed for Helitron and Maverick elements. Yet, more than half of these elements share above 94% identity with their consensuses ( Figure S5). Although it might be hypothesized the lower identities would be due to bigger length ( Figure S6), we showed no evident correlation between the % identity copies share with their consensus and the proportion of consensus length covered ( Figure S5). Even considering our inclusion threshold at minimum 85% identity (see Methods), the overall distribution of average % identities tends to be asymmetrical and skewed toward higher values (Figure 2).
Among DNA transposons, identity profiles of MITEs and TIRs to their consensuses were the most shifted to high values; one fourth of the TIR annotations share above 99% identity with their consensus (Figure 2; Tables S4 and S7).
Among retrotransposon, SINEs (present in very low numbers) and TRIMs show similar profiles with a quite narrow peak at more than 97% identity. Overall, these results indicate that notwithstanding small differences between orders, the canonical TEs show a high similarity with their consensuses.
High identity of TE annotations to their consensus can be considered a proxy of their recent activity (Bast et al., 2015;Lerat et al., 2019   Kozlowski, 2020b; Figure S2). The Morelos isolate from Mexico was the one used to produce the M. incognita reference genome (Blanc-Mathieu et al., 2017). The 11 other isolates come from different locations across Brazil, and present four different ranges of compatible hosts (referred to as R1, R2, R3, and R4; see Figure S1) and currently infected crop species (Koutsovoulos, Marques et al., 2020). Each isolate was reared from the offspring of a single female, and approximately 1 million individuals per isolate were pooled to gather enough material for DNA extraction and pool-seq paired-end Illumina sequencing. For each locus, each isolate has an associated frequency value representing the proportion of reads in the pool supporting the presence of a TE at this location.
We identified 3514 loci where the amplitude of frequency variation between at least two isolates was above our estimated PoPoolationTE2 error rate (0.00972, i.e., less than 1%; see Methods), and thus likely to represent a biological reality.
Overall, the distribution of within-isolate frequencies is bimodal (Figure 3a), and this pattern is common to all the isolates, including the reference Morelos isolate (Figure 3b)  frequencies might be due to the tendency of PoPoolationTE2 to underestimate high frequencies and to overestimate low frequencies.
However, true within-isolate variations in the presence/absence of a TE at a given locus have been experimentally confirmed via PCR experiments in another complementary study (Kozlowski, 2020c

| Variations in TE frequencies across isolates recapitulate their divergence at the sequence level
We performed a neighbor-joining phylogenetic analysis of M. incognita isolates based on a distance matrix constructed from TE frequencies (3514 loci; see methods). We also performed a maximum-likelihood (ML) analysis based on SNV in coding regions as previously identified in Koutsovoulos, Marques et al. (2020) adding the reference isolate Morelos.
As shown in Figure 4, the TE-based and SNV-based tree topologies are highly similar. In particular, the two trees allowed defining four highly supported clades, with bootstrap values ≥98. The four clades were identical, including branching orders for clades 2 and 4 (the two other clades containing each only two isolates). R1-6 and R2-1 positions slightly differed between the SNV-based (A) and TEbased (B) trees. However, in both trees R1-6 is more closely related to clusters 1 and 2 than the rest of the isolates, and similar observations can be drawn for R2-1 with clusters 3 and 4.

F I G U R E 4
Phylogenetic tree for Meloidogyneincognita isolates. (a) Maximum-likelihood (ML) phylogenetic tree based on SNV present in coding sequences. Branch length not displayed (see Figure S7 for a version with branch length displayed). (b) Neighbor-joining (NJ) phylogenetic tree based on TE frequency Euclidean distances between isolates. Branch length not displayed (see Figure S7 for a version with branch length displayed). In both trees, bootstrap support values are indicated on the branches. Isolates enclosed in the dashed area form a super-cluster composed of the clusters (1) and (2), and the isolate R1-6 (a) (b) Altogether, the similarity between the SNV-based and TE frequency-based trees indicates that most of the phylogenetic signal coming from variations in TE frequencies between isolates recapitulates the SNV-based genomic divergence between isolates and thus genome diversification within the species.

| Although most TE loci are stable, some show substantial variations including differential insertions/ deletions among isolates
As explained below (see also Methods and Figures S2 and S3 Figure 5).
Overall, 73.5% (2584/3514) of the loci with TE frequency variations could be assigned to one of the three categories of TE polymorphisms (b, c, d in Figure 5) and the decomposition per TE order is given in Figure 6 and Table S9. Most polymorphic loci (80.92%; 2091/2584) correspond to an already existing TE annotation in the reference genome whose presence is confirmed (frequency >75%) at least in the reference isolate Morelos but varies in at least another isolate. Note that the ancestral state being unknown, extreme cases of variations in these polymorphic loci could equally represent a TE insertion at least in the reference strain Morelos or a loss in one or several other isolates. These polymorphic loci encompass ~21.6% (2091/9702) of the canonical TE annotations, in total. These loci will be referred to as "polymorphic reference loci" from now on (Figure 5b), and they concern both retro and DNA transposons.
Then, we considered as "new TE loci", TEs present at a frequency >25% in at least one isolate at a locus where no TE was annotated in the reference genome and the frequency of TE presence was lower than the estimated error rate (~1%) in the reference Morelos isolate ( Figure 5d). In total, 11.11% (

| TIR and MITE elements are over-represented among TE polymorphisms
By themselves, MITE and TIR elements encompass 94.58% (2444/2584) of the categorized TE polymorphisms (Figure 6).
We showed that the polymorphism distribution varies significantly between the four categories presented in Figure 5 (chi-square test, p-value <2.2e-16), indicating that some TE orders are characterized by specific polymorphism types.
The analysis of the chi-square residuals ( Figure S8) shows MITEs and TIRs are the only orders presenting a relative lack of nonpolymorphic "stable" TEs. Hence, these two TE orders are significantly

| Some polymorphic loci with contrasted frequency variations between isolates most probably represent TE neo-insertions
We investigated the variability of TE presence frequency per locus between the 12 isolates for all the categorized polymorphic loci in the genome.
In ~3/4 (1911/2584) of the categorized polymorphic TE loci, although variations in presence frequency between isolates were above the estimated error rate (<1%), they remained at relatively low amplitude (maximum frequency variation between isolates ≤25% for a given locus) (see Methods; Figure S3). Most of these cases (97.95%; 1872/1911) concern loci where the TE is present at a high frequency in all isolates (>75%). These loci might be in which the TE is found with high frequencies (>75%) for some isolate(s), while it is absent or rare (frequency <25%) in the other(s).
These loci will be from now on referred to as HCP standing for "highly contrasted polymorphic" TE loci. Because they are highly contrasted, these loci might represent fixed differential TE insertions or deletions across isolates and will be the focus of the following analyses.
HCP TE loci encompass 19 MITE elements, 12 TIRs, and 2 LINEs (Table S10). We can also notice that some consensuses are more involved in HCP TE loci as two TE consensuses alone are responsible for 72.72% (18/33) of these polymorphisms (one MITE consensus involved in 10 HCP loci and one TIR consensus involved in eight such loci).
Interestingly, all the HCP TE loci correspond to new TE loci regarding the reference genome, meaning that no TE was annotated in the reference genome at this location and the TE presence frequency is <1% in the Morelos reference isolate. As described in Figure 7, most of these fixed new TE loci (20/33) are specific to an isolate and most probably represent isolate-specific neo-insertions conserved in the offspring rather than multiple independent losses.
However, we also found new TE loci shared by two (10/33), three (2/33), or even six isolates (1/33). Interestingly, all the shared new TE loci were between isolates present in a same cluster in the phylogenetic trees (TE-based and SNV-based in Figure 4), suggesting F I G U R E 7 Distribution of the 33 HCP new TE loci across isolates. The central plot shows how many and which isolate(s) share common highly contrasted polymorphic (HCP) new TE loci, every line representing an isolate. Columns with several dots linked by a line indicate shared HCP new TE loci between isolates. Each dot represents which isolate is involved. Columns with a single dot design isolate-specific HCP new TE loci. The top bar plot indicates how many HCP new TE loci the corresponding group of isolate shares. The left-side bar plot specifies how many HCP new TE loci are found in a given isolate they have been inherited by a common ancestor, then maintained at high frequency in the descending isolates. For example, two new TE loci are shared by isolates R4-4, R1-2, and R3-2, which belong to the same cluster 1, and one new TE locus is shared by isolates R4-3 and R1-3, which belong to the same cluster 2.
Hence, the phylogenetic distribution reinforces the idea that these new TE loci are more likely to represent branch-specific neoinsertions than multiple independent losses, including in the reference isolate Morelos. Even the new TE locus shared by 6 isolates follows this pattern as all the concerned isolates belong to the same super-cluster composed of the cluster 2 and 3 plus isolate R1-6 (dashed line in Figure 4). However, in this case a neo-insertion in the common ancestor of these isolates is equally parsimonious than a deletion in the ancestor of all the other isolates.
Isolates R1-2, R3-2, and R4-4 show the highest number of new TE loci. However, their profiles are quite different. In R1-2, 10/12 HCP TE loci are isolate-specific and thus likely represent neoinsertions, while most of the HCP TE loci involving R3-2 and R4-4 are new loci shared with closely related isolates and thus probably inherited from a common ancestor. This is also consistent with the topology and branch lengths of the SNV-based and TE-based phylogenies ( Figure S7), which shows that R1-2 is the most divergent isolate with the longest branch length, while R3-2 is quite close to R4-4 and has a relatively short branch.

| Functional impact of TE neo-insertion and validation of in silico predictions
Interestingly, two-thirds ( (Table S11). Conservation of these genes across multiple PPN but exclusion from the rest of the nematodes or other species suggests these genes might be involved in important functions relative to these organisms lifestyle, including plant parasitism itself.
To experimentally validate in silico predictions of TE insertions with potential functional impact, we performed PCR experiments on 5 of the 22 HCP new TE loci falling in coding or possible regulatory regions (see Methods for selection criteria). To perform these PCR validations, we used the DNA remaining from previous extractions performed on the M. incognita isolates for population genomic analysis (Koutsovoulos, Marques et al., 2020). Basically, the principle was to validate whether the highly contrasted frequencies (>75%/<25%) obtained by PoPoolationTE2 actually corresponded to absence/ presence of a TE at the locus under consideration (see Methods).
One isolate (R3-1) presented no amplification in any of the tested loci nor in the positive control. After testing the DNA concentration in the sample, we concluded that the DNA quantity was too low in this isolate and decided to discard it from the analysis.
For four of the five tested HCP new TE loci, we could validate by PCR the in silico predicted differential presence/absence of a sequence at this position, across the different isolates (Figure 8; Kozlowski, Hassanaly-Goulamhoussen et al., 2020).
In one of the five tested loci, named locus 1, we could (i) validate by PCR the presence of a sequence at this position for the isolates presenting a PoPoolationTE2 frequency >75% and absence for those having a frequency <25%; and (ii) also validate by sequencing that the sequence itself corresponded to the TE under consideration (a MITE).
This case is further explained in detail below and in Figure 8.
The PoPoolationTE2-estimated frequencies are higher than 75% in three isolates (R1-2, R3-2, and R4-4) derived from a common ancestor (cluster 4 in Figure 4) for one MITE. Thus, this MITE was probably inserted in this common ancestor and maintained at high frequency in the three descending isolates. We assumed the TE is absent from the rest of the isolates as all of them display frequencies <5%. To validate this differential presence across the isolates, we designed specific primers from each side of the estimated insertion point so that the amplicon should measure 973 bp with the TE insertion and 180 bp without.
The PCR results are consistent with the frequency predictions as only R1-2, R3-2, and R4-4 display a ~1 kb amplicon, while all the other isolates show a ~0.2 kb amplicon ( Figure 8). Hence, as expected, only the three isolates with a predicted TE frequency >75% at this locus exhibit a longer region, compatible with the MITE insertion.
To validate the amplified regions corresponded to the expected MITE, we sequenced the amplicons for the three isolates and aligned the sequences to the TE consensus and the genomic region surrounding the estimated insertion point (Kozlowski, Hassanaly-Goulamhoussen et al., 2020). Amplicon sequences of the three isolates covered a significant part of the TE consensus sequence length (>78%) with high identity (>87%) and only a few gaps (<5%). These results confirm that the inserted sequence corresponds to the pre- We also noticed that the inserted TE sequences slightly diverged between the isolates, while the genomic region surrounding the insertion point remains identical. Interestingly, the level of divergence in the TE sequence does not follow the phylogeny as R-4_4 is closer to R-1_2 than to R-3_2 (Table S12).
Finally, in the Morelos, R-2_1, and R-2_6 isolates, the sequencing of the amplicon validated the absence of insertions. Indeed, the sequences aligned on the genomic region surrounding the insertion point with high identity (99, 97, and 87%, respectively) but not with the MITE consensus.
Hence, we fully validated experimentally the presence/absence profile across isolates predicted in silico at this locus.
In the M. incognita genome, this MITE insertion is predicted to occur in the 3′ UTR region of a gene (Minc3s00026g01668). This gene has no obvious predicted function, as no conserved protein domain is detected and no homology to another protein with an annotated function could be found. However, the gene model is

| TE landscape in nematode genomes and possible recent activity in Meloidogyne incognita
In this analysis, we have annotated TEs in the genome of M. incognita and used variations in TE frequencies between geographical isolates across loci as a reporter of their activity. The M. incognita TE landscape is more abundant in DNA than in retrotransposons, and using the same methodology, we confirmed a similar trend in the genome of C. elegans, despite the Caenorhabditis and Meloidogyne genus being separated by >200 millions of years of evolution (Kumar et al., 2017). Interestingly, even if the methodology used was different, a similar observation was made at the whole Nematoda level (Szitenberg et al., 2016), suggesting a higher abundance of DNA transposons might be a general feature of nematode genomes.
We have shown 75% of the polymorphic TE loci in M. incognita display moderate-frequency variations between isolates (<25%), a It should be noted here that the total TE activity in the M. incognita genome is probably underestimated, in part because of the stringent filters we applied to eliminate false positives as much as possible, and in another part because of the intrinsic limitations of the tools, such as the incapacity of PoPoolationTE2 to detect nested TEs (Kofler et al., 2016).
We then evaluated how recent this activity could be, using % identity of the TE copies with their respective consensuses as a proxy for their age as previously proposed in other studies (Bast et al., 2015;Lerat et al., 2019). We showed that a substantial proportion of the canonical TE annotations were highly similar to their consensus, indicating most of these TE copies were recent in the genome. The probable recent hybrid origin of M. incognita (Blanc-Mathieu et al., 2017) is consistent with a recent TE burst in the genome. Indeed, as further explained in the last section of the discussion, it is well established that hybridization events can lead to a relaxation of the TE silencing mechanisms and consequently to a TE expansion (Belyayev, 2014;Guerreiro, 2014;Rodriguez & Arkhipova, 2018).
However, as suggested in Bourgeois and Boissinot (2019), the extent of this phenomenon might differ depending on the TE order. In M. incognita, MITEs and TIRs alone account for ~2/3 of the canonical TE annotations, but their fate in the genome seems to have followed different paths. Indeed, as illustrated in Figure 2, MITEs show a wide range of identity rates with their consensus, which suggests they might have progressively invaded the genome being uncontrolled or poorly controlled as suggested for the rice genome (Lu et al., 2017). On the opposite, almost all the TIR copies share high percentage identity with their consensuses, which could be reminiscent of a rapid and recent burst. Nevertheless, this burst could have quickly been under control as, according to chi-square residuals ( Figure S8), new TIR loci are significantly less numerous than expected owing to their abundance in the genome. These differences between the distribution of MITE and TIR identities to their consensuses are possibly linked to differences in the TE length itself. Indeed, TIRs are usually expected to be at least twice longer than MITEs and their accumulation may be subject to higher counter-selection than MITEs as length-dependent selection on TE persistence has previously been observed in different animals (Bourgeois & Boissinot, 2019). Interestingly, in C. elegans, the Tc1/ Mariner TIR DNA element was shown to be the most active while, so far, no evidence for active retrotransposition was shown in this species (Bessereau, 2006;Laricchia et al., 2017).
Because no molecular clock is available for M. incognita, it is impossible to evaluate more precisely when TE bursts would have happened and how fast each TE from each order would have spread in the genome. Such bursts can be very recent, including in animal genomes as exemplified by the P-element, which invaded the genome of some Drosophila populations in just 40 years (Anxolabéhère et al., 1988).
While an absolute dating of TE activities in M. incognita is currently not possible, a relative timing of the events regarding population diversification can still be deduced from the distribution of TE locus frequencies across isolates. Indeed, we have shown (Figure 7) that some new TE loci were shared between isolates and that in each case, the concerned isolates belonged to a same monophyletic cluster ( Figure 4).

| Functional impact of TE activity in Meloidogyne incognita and other nematodes
Meloidogyne incognita is a parthenogenetic mitotic nematode of major agronomic importance. This pest shows no sign of sexual recombination and only a few genome variations at the SNP level (Koutsovoulos, Marques et al., 2020). The molecular mechanisms underlying the genome plasticity necessary for adaptive evolution remain poorly known. In this study, we investigated whether TE movements could contribute to the M. incognita genome plasticity.
In M. javanica, a closely related root-knot nematode, comparison between an avirulent line unable to infect tomato plants carrying a nematode resistance gene and another virulent line that overcame this resistance led to the identification of a gene present in the avirulent nematodes but absent from the virulent ones. Interestingly, the gene under consideration is present in a TIR-like DNA transposon and its absence in the virulent line suggests this is due to excision of the transposon and thus that TE activity plays a role in M. javanica adaptive evolution (Gross & Williamson, 2011).
In  In M. incognita, almost 23% of the genes have been described as specific to plant-parasite species, without any recognizable homology in other species (Grynberg et al., 2020). Interestingly, TE movements can be involved in the emergence of species or genusspecific "orphan" genes (Jin et al., 2019;Ruiz-Orera et al., 2015;Wu & Knudson, 2018). Because some of the M. incognita genes impacted by TE insertions are specific to plant-parasite species and yet widely conserved among these parasites, a role in plant parasitism is possible.

| Ploidy, (a)sexuality, and hybridization: a complex interplay influencing TE load and composition
Meloidogyne incognita is an asexual (mitotic parthenogenetic), polyploid, and hybrid species. These three features are expected to impact TE load in the genome with various intensities and possibly conflicting effects.
Contradictory theories exist concerning the activity/proliferation of TEs as a function of the reproductive mode. The higher efficacy of selection under sexual reproduction can be viewed as an efficient system to purge TEs and control their proliferation. Supporting these views, in parasitoid wasps, TE load was shown to be higher in asexual lineages, induced by the endosymbiotic Wolbachia bacteria, than in sexual lineages (Kraaijeveld et al., 2012). However, whether this higher load is a consequence of the shift in reproductive mode or of Wolbachia infection remains to be clarified.
In an opposite theory, sexual reproduction can also be considered as a way for TEs to spread across individuals within the population, whereas in clonal reproduction, the transposons are trapped exclusively in the offspring of the holding individual. Under this view, asexual reproduction is predicted to reduce TE load as TEs are unable to spread in other individuals, and are thus removed by genetic drift and/or purifying selection in the long term (Wright & Finnegan, 2001).

Consistent with this theory, comparison of sexual and asexual
Saccharomyces cerevisiae populations showed that the TE load decreases rapidly under asexual reproduction (Bast et al., 2019).
Hence, whether the TE load is expected to be higher or lower in clonal species compared with sexual relatives remains unclear and other conflicting factors such as TE excision rate and the effective size of the population probably blur the signal (Glémin et al., 2019).
The breeding system has been shown to constitute an important factor influencing TE distribution in Caenorhabditis genomes (Dolgin et al., 2008): TEs in self-fertilizing populations seem to be selectively neutral and segregate at higher frequency than in outcrossing populations, where they are submitted to purifying selection.
Interestingly, at a broader scale, a comparative analysis of different lineages of sexual and asexual arthropods revealed no evidence for differences in TE load according to the reproductive modes (Bast et al., 2015). Similar conclusions were drawn at the whole nematoda phylum scale (Szitenberg et al., 2016), although only one apomictic asexually reproducing species (i.e., M. incognita) was present in the comparative analysis.
Polyploidy, in contrast, is commonly accepted as a major event initially favouring the multiplication and activity of TEs. This is clearly described with numerous examples in plants (Vicient & Casacuberta, 2017), and some examples are also emerging in animals (Rodriguez & Arkhipova, 2018). When hybridization and polyploidy are combined, this can lead to TE bursts in the genome. As originally proposed by Barbara McClintock, allopolyploidization produces a "genomic shock," a genome instability associated with the relaxation of the TE silencing mechanisms and the reactivation of ancient TEs (McClintock, 1984;Mhiri et al., 2019).
Hybridization, polyploidy, and asexual reproduction are com- Hence, further sampling of Meloidogyne species with diverse ploidy levels and reproductive modes will be necessary to disentangle the relative contribution of these three features on the TE abundance and composition.

| Concluding remarks
In this study, we used population genomics technique and statisti-

CO N FLI C T O F I NTE R E S T
The authors of this paper declare that they have no financial conflict of interest with the content of this article.

DATA AVA I L A B I L I T Y S TAT E M E N T
All the raw and filtered data generated in this study, as well as details of the experimental procedures, scripts, and datasets, have been deposited and made publicly available in the institutional INRAE Data Portal at this URL: https://data.inrae.fr/datav erse/TE-mobil ity-in-MiV3 and cited throughout the text where appropriate, with DOIs available in the references.