The impact of transposable elements on tomato diversity

Domínguez, Marisol; Dugas, Elise; Benchouaia, Médine; Leduque, Basile; Jiménez-Gómez, José M; Colot, Vincent; Quadrana, Leandro

doi:10.1038/s41467-020-17874-2

Download PDF

Article
Open access
Published: 13 August 2020

The impact of transposable elements on tomato diversity

Nature Communications volume 11, Article number: 4058 (2020) Cite this article

16k Accesses
76 Citations
79 Altmetric
Metrics details

Subjects

An Author Correction to this article was published on 21 May 2021

This article has been updated

Abstract

Tomatoes come in a multitude of shapes and flavors despite a narrow genetic pool. Here, we leverage whole-genome resequencing data available for 602 cultivated and wild accessions to determine the contribution of transposable elements (TEs) to tomato diversity. We identify 6,906 TE insertions polymorphisms (TIPs), which result from the mobilization of 337 distinct TE families. Most TIPs are low frequency variants and TIPs are disproportionately located within or adjacent to genes involved in environmental responses. In addition, genic TE insertions tend to have strong transcriptional effects and they can notably lead to the generation of multiple transcript isoforms. Using genome-wide association studies (GWAS), we identify at least 40 TIPs robustly associated with extreme variation in major agronomic traits or secondary metabolites and in most cases, no SNP tags the TE insertion allele. Collectively, these findings highlight the unique role of TE mobilization in tomato diversification, with important implications for breeding.

Super-pangenome analyses highlight genomic diversity and structural variation across wild and cultivated tomato species

Article Open access 06 April 2023

Identifying genetic variants underlying phenotypic variation in plants without complete genomes

Article 13 April 2020

Exploiting the diversity of tomato: the development of a phenotypically and genetically detailed germplasm collection

Article Open access 01 May 2020

Introduction

Tomatoes are the highest-value fruit and vegetable crop worldwide. Despite the recurrent genetic bottlenecks that have occurred since its domestication^1,2, tomato exhibits extensive phenotypic variation, and the diversity we see today among cultivars is thought to result mainly from selection of rare alleles with large effects³. Nonetheless, while genomics-enabled genetics has revolutionized our ability to identify loci underlying domestication and improvement traits in virtually any crop^4,5,6, our understanding of the genetic basis of crop diversity is still limited. This situation stems in part from the fact that, with few notable exceptions^7,8,9,10,11, most genome-wide association studies (GWAS) consider only single-nucleotide polymorphisms (SNPs) and short indels^12,13, when structural variants, which include gene presence/absence variants and typically segregate at low frequency, account for the largest amount of DNA sequence differences between individuals and cultivars^3,10,11,14. Furthermore, the majority of structural variants result from the mobilization of transposable elements (TEs), which by themselves are potentially an important source of large-effect alleles¹⁵. Indeed, many TEs insert near or within genes¹⁶, and because of their epigenetic control as well as through the transcription factor-binding sites they harbor, TEs have the ability to alter gene expression and rewire gene expression networks^16,17. Although numerous domestication and agronomic traits have been associated with particular TE insertions^{15,18,19,20,21,22}, the specific contribution of TEs to the phenotypic diversification of crop species is still poorly documented. Here, we assess through a systematic analysis of 602 resequenced genomes the prevalence and impact of TE insertion polymorphisms (TIPs) among wild and cultivated tomatoes. We show that TIPs tend to have large transcriptional effects when located within or near genes and long-read Nanopore transcriptomics reveals that intronic TE insertions can generate multiple transcript isoforms with potential phenotypic consequences. Furthermore, GWAS detects numerous TIPs associated with variations in major agronomic traits or secondary metabolites. Importantly, these TIPs tend to affect loci that are distinct from those tagged by SNPs, illustrating the interest of incorporating TIPs into genomic-assisted breeding programs. Collectively, our approaches and findings provide a framework to study the implication of TIPs to crop diversity.

Results

Tomato mobilome composition

The tomato reference genome (Solanum lycopersocum cv. Heinz 1706, release SL2.5) contains 665,122 annotated TE sequences belonging to 818 families²³. The vast majority of these annotations correspond to ancestral TE copies that have degenerated to different degrees and potentially lost their ability to transpose²⁴. To investigate the composition of the tomato mobilome, i.e., the set of TE families with recent mobilization activity, we analyzed short-read whole-genome resequencing data available for 602 tomato accessions^2,25,26. This dataset contains wild tomato relatives (Wild, Fig. 1a) and spans the Lycopersicon clade, which regroups wild tomatoes (S. pimpinellifolium, SP), early domesticated tomatoes (S. lycopersicum cerasiforme, SLC), and cultivated tomatoes (S. lycopersicum lycopersicum vintage and modern, SLL). To detect additional, i.e., non-reference, TE insertions in each genome sequence, we deployed a refined version of the SPLITREADER pipeline²⁷ (Fig. 1b, see “Methods”). We restricted our analysis to the 467 TE families with annotated copies longer than 1 kb in the reference genome. These families represent the full range of Class I LTR and non-LTR retroelements (i.e., GYPSY, COPIA, and LINE superfamilies) and Class II DNA transposons (i.e., MuDR, hAT, and CACTA superfamilies), which move through copy-and-paste and cut-and-paste mechanisms, respectively. After filtering low-quality calls (see “Methods”), 6906 non-reference TE insertions remained for downstream analysis (Supplementary Data 1). Most TE insertions were present in one or a few tomato accessions only (Fig. 1c), suggesting that they occurred recently. Nonetheless, cluster analysis based on these 6906 TIPs recapitulated the phylogenetic relationship between accessions previously determined using SNPs (Fig. 1d)^2,3.

TIPs were contributed by 337 TE families in total, which likely represent the near-complete composition of the tomato mobilome. Indeed, most TE families with TIPs could be detected using only ~200 of the 602 resequenced genomes (Fig. 1e), and the majority (84%) of TIPs resulted from the mobilization of GYPSY and COPIA LTR retrotransposons (Fig. 1f; Supplementary Fig. 1a). The COPIA RIDER family, which generated insertion mutations with important agronomic implications^21,22,28,29, contributes the highest number (507) of TIPs overall. Mobilome composition varies substantially among tomato groups and, as expected, is the richest in the genetically diverse SP group (~230 TE families, Fig. 1g). However, despite the loss of genetic diversity associated with domestication (Supplementary Fig. 1b)^2,3, the mobilome composition of early domesticated SLC is only marginally reduced compared with that of SP (210 vs. 230 TE families, Fig. 1g). This last observation is consistent with the recurrent hybridization between SLC and SP¹, and the unique ability of TEs to invade new genomes³⁰. In contrast, vintage and modern SLL have a more reduced mobile composition (~150 TE families, Fig. 1g), in keeping with the strong genetic bottleneck caused by the post-Columbian introduction of tomato to Europe.

TIP landscape and transcriptional impact

Whereas TE sequences present in the reference genome are enriched in pericentromeric regions²³, TIPs are distributed more equally along chromosomes (Fig. 2a). Nonetheless, superfamily-specific integration patterns are evident. For instance, TIPs corresponding to COPIA and many other TE superfamilies are found preferentially within or near genes, while GYPSY TIPs cluster in pericentromeric regions (Fig. 2a, b). Importantly, genes harboring TIPs are overrepresented in functions related to response to pathogens or other environmental stresses (Fig. 2c). This overrepresentation is driven by COPIA insertions and likely reflects integration preferences rather than relaxed purifying selection or detection biases, which should affect all types of TIPs. Indeed, experimental evidence indicates that COPIA integrates preferentially within environmentally responsive genes in Arabidopsis and rice³¹.

**Fig. 2: Landscape and transcriptional impact of TIPs.**

In many organisms, including plants and animals, TIPs have been associated with large transcriptomic changes^{14,27,32,33,34}. To assess the impact of TIPs on gene expression in tomato, we used RNA-seq data obtained from breaker fruits for 400 accessions³⁵. We considered all genes harboring a TIP within 1 kb, and compared transcript levels between accessions carrying or lacking the insertion. TIPs associated with two-fold or more changes in gene expression are proportionally more frequent when located in exons and introns (43% and 37%, respectively) than in other gene compartments (Fig. 2d). Furthermore, changes are either positive or negative, consistent with the notion that TE insertions can affect gene expression in multiple ways. To explore further these transcriptional effects, we compared RNA-seq coverage upstream and downstream of insertion sites (Fig. 2d). This analysis uncovered additional TIPs affecting gene expression, and revealed that between 20% and 28% of genic TIPs interfere with transcript elongation when exonic or intronic, respectively. Taken together, these results indicate that TIPs residing within the transcribed part of genes have pervasive and complex effects.

Consistent with the observed overrepresentation of TIPs within specific gene ontology categories, expression of immune- and stress-responsive genes was particularly affected by TIPs (Fig. 2e). For instance, we uncovered a rare MuDR-containing allele of the gene slDCL2a (Solyc06g048960), which is involved in resistance against RNA viruses³⁶. As the intronic insertion is associated with a severe reduction in transcript level, accessions carrying the rare allele could be more susceptible to viral attacks. Likewise, an exonic COPIA insertion within the CC-NB-LRR gene Ph-3 (Solyc09g092310), which confers broad resistance to Phytophthora infestans^37,38, is associated with transcript truncation and could therefore cause increased susceptibility to this pathogen. We also identified TE insertions with potential beneficial effects. For example, the exonic COPIA insertion in the gene slXTH9 (Solyc12g011030), which encodes a xyloglucan endotransglucosylase/hydrolase preferentially expressed during fruit ripening³⁹, is associated with a near-complete loss of expression. Given the key role of slXTH9 in fruit softening²⁴, this natural loss-of-function allele could potentially be harnessed to breed tomato fruits with harder texture and longer shelf life⁴⁰.

TIPs as an unregistered source of phenotypic variants

To assess more systematically whether TIPs are a potentially important source of phenotypic variation, we first measured the proportion of TIPs in high-linkage disequilibrium (LD, r² > 0.4) with SNPs. This proportion was much lower than for SNPs in high LD with other SNPs (Fig. 3a). This result was confirmed using a set of 56 visually validated TIPs (Supplementary Fig. 2a), indicating that the lower LD observed for TIPs compared with SNPs cannot be fully explained by reduced sensitivity and specificity of TIP detection. Conversely, and in agreement with previous findings in Arabidopsis³³, maize¹¹, grapevine¹⁰, and humans¹⁴, rare TIPs (MAF < 1%) tend to have lower LD with nearby SNPs than more common TIPs (Supplementary Fig. 2b, c). In addition, most TIPs in high LD with SNPs are located on chromosome 9 (Supplementary Fig. 2d), consistent with modern tomatoes harboring on that chromosome a large introgressed segment from wild tomatoes². Based on these observations and because TE insertions tend to generate large-effect alleles, we reasoned that even when low frequency variants, TIPs could still be used for TIP-GWAS⁹. We considered TIPs with MAF > 1% and with less than 20% of missing data in GWAS for 17 important agronomic traits in tomato, including determinate or indeterminate growth, simple or compound inflorescences, leaf morphology, as well as fruit color, shape, and taste. Importantly, given the reduced sensitivity and specificity of TIP detection, which can increase the probability of finding false associations, we curated all putatively associated TIPs by visual inspection. These TIP-GWAS uncovered a total of nine high-confidence loci associated with five traits, including fruit color and leaf morphology (Supplementary Fig. 3a). These two traits were previously linked to TE insertions^28,41, thus validating our TIP-GWAS approach. Moreover, association with leaf morphology is much stronger for the TIP than for any SNP (Fig. 3b–d), suggesting that TIP-GWAS was able to pinpoint the causal variant. In addition, most TIP associations could not be identified using SNPs (Supplementary Fig. 3b), demonstrating the interest of considering TIPs in addition to SNPs in GWAS. For instance, our TIP-GWAS revealed a strong association between a RIDER insertion within the gene PSY1, which encodes a fruit-specific phytoene synthase, and yellow fruit (Fig. 3e–h). Incidentally, our SNP-GWAS revealed another variant of PSY1 associated with yellow fruit (Fig. 3g). Local assembly using short reads indicated that this alternative allele, which we named r^Del to distinguish it from the previously identified r^TE allele, contains an ~6-kb deletion that bridges the last exon of PSY1 with the next gene (Solyc03g031870) downstream (Fig. 3i). Together, r^TE and r^Del account for 60% of yellow tomato accessions, and those carrying the r^TE allele display lower expression levels of PSY1 and yellower fruit than accessions with the r^Del allele (Fig. 3j, k). Moreover, we detected the r^TE and r^Del alleles in several SLC and SLL vintage accessions but in none of the wild tomatoes (S. pimpinellifolium) and wild relatives (Fig. 3l), which suggests that r^TE and r^Del arose after domestication. Also, while the RIDER insertion affected a common haplotype of PSY1 shared among early domesticated and improved tomatoes, the ~6-kb deletion affected a rare haplotype containing numerous SP-derived sequences (Supplementary Fig. 4). Together, these results suggest that the first tomato cultivar introduced in Europe during the sixteenth century, which was reported to be yellow⁴², harbored the r^TE allele.

To investigate further the specific contribution of TIPs to trait variation in tomato, we conducted SNP- and TIP-GWAS on 1012 metabolic phenotypes measured for more than 397 accessions^25,35. In total, 846 and 41 associations with 369 and 30 metabolites were identified by SNP- and TIP-GWAS, respectively (Fig. 4a). Of the 41 associations, 31 were confirmed by visual inspection of the underlying TIPs and were considered further. Remarkably, except in one case, the TE-containing allele is not tagged by any SNP, and 14 TIPs affect loci not identified by SNP-GWAS. Moreover, TIPs unlike SNPs are predominantly associated with variation in volatiles (Fig. 4b), a class of secondary metabolites that are implicated in defense response and interaction with other organisms⁴³. This skewing of TIP associations is readily explained if one considers that constraints are lower on secondary than on primary metabolism, and that on average, the effect size of TIPs is much larger than that of SNPs (Fig. 4c). Finally, because almost all of the TE-containing alleles detected using our TIP-GWAS are present in SLC accessions (Fig. 4d), their contribution to phenotypic diversification is higher among early domesticated tomatoes.

**Fig. 4: TIP associations with secondary metabolism.**

A key TIP for tomato flavor

Our TIP-GWAS revealed a COPIA LTR-retrotransposon insertion that is absent in modern cultivars and which is associated with high levels of 2-phenylethanol (Fig. 5a–d), a volatile that gives a pleasant flowery aroma to heirloom tomatoes⁴⁴. This TE insertion is located in the single intron of gene Solyc02g079490, which is preferentially expressed in ripe fruits and encodes a protein with high similarity (63% aa identify) with a cinnamyl alcohol Acyl-CoA transferase⁴⁵ (Supplementary Fig. 5a, b). Consistent with a potential role of Solyc02g079490 in the accumulation of 2-phenylethanol, the introgression line (IL) 2.3⁴⁶, which harbors the lowly expressed S. pennellii allele of Solyc02g079490⁴⁷, also accumulates more 2-phenylethanol compared with the modern cultivar M82 (Supplementary Fig. 5c)⁴⁸. Thus, Solyc02g079490 likely encodes a putative 2-phenylethanol Acyl-CoA transferase (PPEAT) involved in the esterification of 2-phenylethanol, which otherwise accumulates in fruits.

Although the intronic COPIA insertion does not appear to affect the expression levels of Solyc02g079490 (Fig. 5e), hereafter referred to as PPEAT, we noted numerous transcript isoforms in accessions carrying the insertion compared to a single predominant transcript otherwise (Supplementary Fig. 5d). To characterize these additional transcripts further, we performed full-length cDNA Nanopore sequencing of ripe fruit samples from two accessions, one carrying and one lacking the intronic COPIA insertion. We uncovered in this manner at least three additional transcript isoforms, all of which result from alternative splicing (Fig. 5f). Moreover, the COPIA-containing intron (>5 kb), which is one of the largest intron genome-wide based on our Nanopore sequencing (Supplementary Fig. 5e), is spliced out in most cases. Nonetheless, this large intron is retained in one isoform, thus leading to an unusually long transcript (Fig. 5g; Supplementary Fig. 5e). All of the alternative isoforms incorporate premature stop codons or encode proteins that lack highly conserved catalytic domains (Fig. 5g; Supplementary Fig. 5f). Based on these findings, we reanalyzed the RNA-seq data obtained for 400 accessions. This reanalysis confirmed that truncated isoforms are almost exclusively associated with the intronic COPIA insertion, and revealed that they make up around 60% of all PPEAT transcripts (Fig. 5h). Based on these additional findings, we propose that the COPIA insertion generated a hypomorphic PPEAT allele, which would explain the overaccumulation of 2-phenylethanol. This could be formally demonstrated in the future by removing the COPIA insertion through genome editing. Also, because the insertion is absent from wild relatives, but present at intermediate frequency in wild (S. pimpinellifolium) and SLC tomatoes (Fig. 5i), we speculate that the COPIA-containing allele of PPEAT predated domestication, and that it was selected in early domesticated tomatoes but not in modern varieties, which are notorious for their poor flavor⁴⁴.

Discussion

Cultivated tomato has a complex history of domestication and improvement, characterized by two successive genetic bottlenecks, followed since modern breeding by several introgression events from wild tomatoes and relatives to replenish the limited pool of disease-resistance genes^1,10,49. Despite a relatively narrow genetic diversity, the more than 25,000 cultivars grown around the world today exhibit an extraordinary phenotypic diversity, and the underlying allelic variants are being progressively identified, thanks to the advent of high-throughput genome sequencing. Here, we show that TIPs, which to date have been ignored from population genomic studies in tomato, are an important diversifying force to consider, as has been proposed for other plant species^{27,33,34,50,51,52,53,54,55}. For instance, GWAS in rice for grain length and width using respectively structural variants and TIPs uncovered associations that could not be detected using SNP data^8,9. Moreover, in the case of grain width, the associated TIP is very rare and in low LD with nearby SNPs. Likewise, we found that most TE insertions in tomato are low-frequency variants rarely tagged by SNPs. Thus, our findings reinforce the notions that TIPs and SNPs contribute distinct phenotypic variants, and that TIPs identified in GWAS as leading variants are likely causal^7,8,9,10, which opens up the way for their use for future breeding.

We based our study on TIPs using 467 TE families that represent most of the retrotransposons and DNA transposon families present in the reference tomato genome. However, we did not consider small-length TE families, such as SINEs and MITEs, which are also likely to contribute to phenotypic diversity⁵³, but that are difficult to analyze using short reads because of their small size and very high copy number in many genomes. The implementation of long-read sequencing should remedy this problem, as well as that of the low sensitivity and specificity of TIP detection using short-read sequencing technologies. Indeed, a landmark analysis of 100 tomato genomes based on long-read Nanopore sequencing revealed thousands of structural variants, mostly involving TEs that intersect genes and cis-regulatory regions⁵⁶. This and other studies using long-read sequencing^10,11,57 suggest that we will soon be in a position to assess comprehensively the contribution of TE insertions, as well as of the structural variants they can generate through recombination or other means, to crop diversity.

The composition of the tomato mobilome, as defined here, appears to be substantially reduced following the post-Columbian introduction of tomato to Europe. This observation may reflect the strong genetic bottleneck this introduction created, as well as the increased levels of inbreeding that ensued, as the latter favors the accumulation of deleterious mutations, and is therefore expected to compromise the long-term survival of accessions with high mobilome activity¹⁵. Whether introgression from wild germplasms used in modern breeding can alter this picture by enabling new TE mobilization remains to be determined.

Methods

Detection of TIPs

Illumina resequencing data from 602 tomato accessions were downloaded from EBI-ENA and aligned to the tomato genome reference (version SL2.5) using Bowtie2 v.2.3.2 (arguments –mp 13 –rdg 8,5 –rfg 8,5 –very-sensitive), and PCR duplicates were removed using Picard. The detection of TIPs was performed using an improved version of SPLITREADER⁵⁸. Our SPLITREADER (vbeta2.5) pipeline uses the information of both split- and discordant reads to call non-reference insertions. In addition, we genotyped the absence of non-reference TE insertions by analyzing local coverage around the insertion sites. SPLITREADER has four steps: (i) extraction of reads mapping discordantly or not at all to the reference tomato genome, (ii) mapping to a collection of reference TE sequences and selection of the reads aligning partially or discordantly, (iii) remapping selected reads to the reference genome sequence, and (iv) identification of a cluster of split- and/or discordant reads indicating the presence of a non-reference TE insertion. Specifically, for each tomato accession, we retrieved reads that did not map to the reference genome sequence (containing SAM flag 4) or that mapped discordantly (paired reads mapping to different chromosomes or to positions separated by more than ten times the average library size). These reads were then aligned (using Bowtie2 v.2.3.2 in-local mode to allow for soft clip alignments) to a joint TE library assembled from TE annotations²³ belonging to 467 TE families longer than 1 kbp and spanning the full range of Class I (Gypsy, Copia, and LINE) and Class II TEs (MuDR, hAT, and CACTA). Next, we selected all reads mapping to a TE sequence either partially (≥20 nt) or fully but with an unmapped mate. These reads were remapped to the tomato reference genome sequence (using Bowtie2 v.2.3.2 in-local mode to allow for soft clip alignments). Read clusters composed of at least two reads mapping in the right orientation (i.e., at least one discordant read in the +orientation upstream of the discordant read in the -orientation, or one 3′ soft-clipped read upstream of a 5′ soft-clipped read, or any combination of the cases described above) were taken to indicate the presence of a bona fide non-reference TE insertion. These sites were intersected across all accessions to identify those shared and supported in at least one individual by a minimum of three reads, including at least one upstream and one downstream. Negative coverage, as defined by the minimum WGS read depth over the upstream and downstream boundaries of a putative TE insertion site, was then calculated for each accession across all putative TE insertion sites. Accessions with negative coverage of more than five reads and lacking discordant or split-reads supporting the non-reference insertion were considered as noncarriers. Accessions with negative coverage of less than five reads and lacking discordant or split-reads supporting the non-reference insertion were considered as missing information or NA.

Validation of TIPs

Six hundred randomly chosen TIPs detected in S_habCGN157592 (ERR418101), S_pimLYC2740 (ERR418081), S_lycPI303721 (ERR418064), S_lycEA00325 (ERR418043), and S_lycLA2706 (ERR418039) and spanning the six TE superfamilies were inspected visually using IGV, and 82% TIPs were confirmed in at least one accession. Moreover, visual inspection across 516 accessions of 56 TE insertions confirmed visually in at least one carrier genome (i.e., 28,896 visual inspections) identified 287 cases of insertions being missed (i.e., false negatives) and none being wrongly called (i.e., false positives). Also, the presence/absence of two TIPs were assessed by PCR using gDNA extracted from 22 tomato accessions, and their status was confirmed in each case (Supplementary Fig. 6). gDNA was extracted using the CTAB method, and PCRs performed with Taq DNA polymerase (NEB) using the following primers: PPEAT-For1 (GGACACCGCGGAGTAAGAAA) + PPEAT-Rev1 (GACTAGACCACGTCAAGCCC), PPEAT-For2 (TTGGAGGCGCCTGATTTCTT) + PPEAT-INS-Rev1 (TCAAGGCATTCAACAGTTGTTTTG), PSY1-For1 (ACTCCATCTGGAGAACGGAC) + PSY1-Rev1 (CATGGAATCAGTCCGGGAGG), and PSY1-For2 (CATGGAATCAGTCCGGGAGG) + PSY1-INS-Rev3 (GACCCCCGTCCTTTCTGTTT). To further assess the specificity of our SPLITREADER pipeline, we evaluated the presence of TIPs detected in the M82 cultivar on the high-quality assembled genome sequence recently obtained using Nanopore long-reads available for this accession⁵⁹. Specifically, 1-kb sequence upstream and downstream of TIPs detected in M82 by our SPLITREADER pipeline were extracted from the Heinz 1706 reference genome (version SL2.5) and aligned using BLAT to the high-quality genome of M82. Consistent with the validation based on visual inspection, more than 70% of TIPs detected in M82 were also found in the reference M82 genome, with COPIA insertions showing the highest specificity (77%) (Supplementary Fig. 7). Furthermore, this estimated rate is similar to the one we obtained using the same pipeline to analyze numerous A. thaliana resequenced genome data and which we could validate experimentally using TE sequence capture^27,58. Using this last dataset, we estimated that the false-negative rate of our SPLITREADER approach is about 20% overall, the highest sensitivity being achieved for TIPs belonging to the COPIA, MuDR, and CACTA families⁵⁸. These rates of FP and FN are similar to those reported by others using multiple software developed to detect TIPs based on Illumina short reads⁶⁰.

SNP calling and phylogenetic analyses

Illumina resequencing data were aligned to the tomato genome reference v.2.50 using Bowtie2 v.2.3.2 with default parameters. The resulting alignment files were filtered to remove reads mapping to multiple locations using samtools with parameter -q 5, and to remove duplicated reads with Picard MarkDuplicates with default parameters (parameter REMOVE_ DUPLICATES = true). Finally, indels were realigned using GATK v4.1.8.0 RealignerTargetCreator and IndelRealigner successively with default parameters. Alignment files were used to call SNPs. For this, we ran GATK’s UnifiedGenotyper with default parameters in all 602 accessions simultaneously. We extracted SNPs at 8,760 positions genotyped in the SolCAP Infinium Chip SNP microarray as indicated in the tomato annotation (ITAG2.4_solCAP.gff3). We obtained a final matrix of 1,812 SNPs after removing ambiguous SNPs and SNPs in high-linkage disequilibrium using PLINK v1.90b6.9 with parameters—mind 0.1—geno 0.1—indep 50 5 3.5. A phylogenetic tree was estimated from the final matrix using the ape package in R v.3.4.4 and the neighbor-joining method including S. pennellii LA0716 as an outgroup. The resulting tree was plotted using the ggtree package v.1.4.11 in R v.3.4.4. Tomato accessions in the tree were classified manually taking into account previously described classifications and their positions in the tree relative to known classifications of species and type.

TIP-based population differentiation

A principal component analysis (PCA) using 6906 TIPs was performed using the prcomp function from the stats package v.3.2.3 in R v.3.4.4. The first two eigenvectors were retained to create a two-dimensional plot.

Genomic localization of TIPs and genes

A circos plot was constructed to represent the chromosomal distributions of genes and TEs, as well as the mappability of Illumina short reads. The number of genes and TEs annotated in the reference genome, as well as TIPs for the six superfamilies (GYPSY, COPIA, LINE, MuDR, hAT, and CACTA) were calculated in 500-kb windows using bedtools. To determine mappability, we aligned Heinz 1706 short-read resequencing data (SRA: SRR1572628) on the reference genome (version SL2.5). Mappability was defined as the fraction of uniquely mapped reads (MAPQ > = 10) in 10-kb windows. Gene ontology (GO) analyses were performed using AGRIGO v.1.2 [http://bioinfo.cau.edu.cn/agriGO/] and as input the Solyc ID of genes that contain a TE insertion within the limits of their annotation. The random expectation based on mappability bias was obtained by sampling a random set of 6906 uniquely mapped reads (MAPQ ≥ 10), and using as input for GO analysis the Solyc ID of genes that contain uniquely mapped reads within the limits of their annotation.

Impact of TIPs on gene expression

Raw RNA-Seq data of tomato fruit pericarp on orange stage were obtained from ref. ³⁵. Expression level per gene was calculated by mapping reads using STAR v2.5.3a63 on the tomato reference genome (version SL2.5) with the following arguments –outFilterMultimapNmax 50 –outFilterMatchNmin 30 –alignSJoverhangMin 3 –alignIntronMax 50000. Duplicated pairs were removed using picard MarkDuplicates. Counts over annotations (version ITAG2.4) were normalized using DESeq2⁶¹. To determine the transcriptomic impact of TIPs on nearby genes (located within 1 kb), the normalized transcript levels were compared between carriers and noncarrier accessions. Analysis was restricted to 1477 genes showing expression greater than 0 in at least one sample. Variation in full-length transcripts was calculated by comparing the ratio between the normalized number of reads that mapped downstream and upstream of a given TIP. This ratio was then compared between carriers and noncarrier accessions and binned by log2 fold changes (0.5–1.5], (1.5–2.5], [3.5–4.5], >4.5.

Linkage-disequilibrium analyses

For each TIP, we calculated the pairwise r² between the TIP and 300 SNPs located upstream and downstream, as well as the pairwise r² between all the 600 SNP–SNP polymorphic sites around the TIP using PLINK v2. We then contrasted the percentage of TIP–SNPs and SNP–SNP comparisons that are in high LD (r² > 0.4). Similar results were obtained when using r² > 0.2 as a threshold to define polymorphisms with high LD (Supplementary Fig. 8).

Genome-wide association studies

Phenotypic information for 17 important agronomic traits in tomato, including determinate or indeterminate growth, simple and compound inflorescences, leaf morphology, fruit color, shape, and taste for more than 150 accessions was retrieved by web data extraction. To this end, we performed a systematic web data scraping using Google search engine (googler, https://github.com/jarun/googler) followed by text pattern matching. We noted that the World Tomato Society webpage (https://worldtomatosociety.com/) compiles, in a consistent manner, phenotypic information for a large number of varieties commercialized by numerous seed banks. We thus focused our web data scraping on this webpage. Phenotypes were transformed either in boolean (1 or 0) or quantitative variables (Supplementary Data 2–17). For fruit color phenotype, accessions with red, purple–black, or pink fruits were considered as high lycopene-containing, and those with green, white, yellow, or orange fruits were considered as low lycopene-containing fruits and codified as 1 or 0, respectively. Metabolomic and volatile quantitative data from ripe fruits of 397 accessions were obtained from refs. ^25,35. SNPs and TIPs with MAF < 1% or more than 20% of missing data were excluded. SNP-GWAS was restricted to biallelic SNPs and LD-pruned using PLINK v1.90b6.9⁶² option-indep-pairwise 50 5 0.2. GWAS was performed using linear mixed models (LMM) encoded in the software EMMAX⁶³. SNP-based Kinship matrix was calculated (emmax-kin-intel64 -v -d 10) and included in the models as a random effect to control population structure and minimize false positives. Manhattan and qq plots for genome-wide association studies were performed using qqman package v.0.1.4⁶⁴. r² between the leading associated variant and all other associated variants in Fig. 3g was calculated using PLINK v1.90b6.9⁶² and represented by color code. Effect sizes of associations presented in Fig. 4c correspond to the beta values of the leading variant from each associated locus identified by the LMM. Given the lower sensitivity and specificity of TIP calling compared with SNPs, which could affect GWAS results, we inspected visually all associated TIPs, and we removed or corrected false-positive TIPs and negative insertion calls, respectively. Following these corrections, TIP-GWAS was performed again, and only associations with manually curated, high-confidence, TIPs were retained. In addition, the presence/absence of the two key TE insertions studied in detail (i.e., insertions within PSY1 and PPEAT) was assessed by PCR using genomic DNA extracted from 22 tomato accessions, and their status was confirmed in each case (Supplementary Fig. 6). Finally, we tested the robustness of our TIP-GWAS approach by randomizing 1000 times the set of carriers and non-carrier accessions, and running GWAS for the two traits with validated associations (i.e., fruit color and potato leaf)^28,41. In both cases, the number of permuted datasets with associations was below the family-wise significance threshold (p < 0.001 and p < 0.038, respectively; Supplementary Fig. 9).

Local assembly of r ^del allele

Visual inspection of RNA-seq coverage of accessions harboring the r^Del allele suggested a complex rearrangement. To assemble r^Del locus, WGS reads mapping concordantly or discordantly over PSY1 locus from accessions carrying the r^Del locus were extracted and locally assembled using SPAdes V3.13.1⁶⁵.

Haplotype analysis

SNPs within 10 kb of the PSY1 locus were retrieved for 602 accessions and used as input into fastPHASE⁶⁶ version 1.4.0. Default parameters were kept, except for the -Pzp option. For each SNP, haplotype membership with the highest likelihood was assigned.

Plant materials and growth conditions

Tomato seeds from S. lycopersicum cv. Heinz 1706 and TS-666 were grown in a growth chamber (Percival) using 20-l pots, 16/8-h photoperiod, 24 ± 3 °C, 60% humidity, and 200 ± 100 mmol m⁻² s⁻¹ incident irradiance. Pericarp for at least four ripe fruits from two plants was harvested 60 days after anthesis, immediately frozen in liquid N₂, and kept at −80 °C until use.

Full-length cDNA nanopore sequencing

Total RNA was extracted from 100 mg of ripe fruits using the Nucleo-spin RNA Plant mini kit (Macherey-Nagel). Library preparation and Nanopore sequencing were performed at the Ecole normale superieure genomic core facility (Paris, France). After checking RNA quality by Fragment Analyzer, 10 ng of total RNA was amplified and converted into cDNA using SMART-Seq v4 Ultra Low Input RNA kit (Clontech). About 17 fmol of cDNA was used for library preparation using the PCR Barcoding kit (SQK-PBK004 kit, ONT) and cleaned up with 0.6× Agencourt Ampure XP beads. About 2 fmol of the purified product was amplified during 18 cycles, with a 17-min elongation step, to introduce barcodes. Samples were multiplexed in equimolar quantities to obtain 20 fmol of cDNA, and the rapid adapter ligation step was performed. Multiplexed library was loaded on an R9.4.1 flowcell (ONT) according to the manufacturer’s instructions. A standard 72-h sequencing was performed on a MinION MkIB instrument. MinKNOW software (version 19.12.5) was used for sequence calling. Long-reads were mapped on the tomato reference genome (SL2.5) using minimap2⁶⁷ V2.11-r797 and visualized with IGV.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

Data supporting the findings of this work are available within the paper and its Supplementary Information files. A reporting summary for this article is available as a Supplementary Information file. All datasets generated and analyzed during the current study are available from the corresponding authors upon request. Long-read nanopore sequencing data have been deposited in the European Nucleotide Archive (ENA) under project PRJEB37834. Short-read sequencing data of tomato genomes reanalyzed in this study have been obtained from ENA under projects PRJNA259308, PRJEB5235, and PRJNA353161. The tomato reference genome (Solanum lycopersicum cv. Heinz, release SL2.5) used in this study was obtained from SOL genomics [ftp://ftp.solgenomics.net/tomato_genome]. The source data underlying Figs. 1a, f, 2c, d, 3d, e, j, 4a, b, d, 5d, e, h, i as well as Supplementary Fig. 3a are provided as a Source Data file. Source data are provided with this paper.

Code availability

Codes used to detect TE insertions are available at GitHub [https://github.com/baduelp/public]. Source data are provided with this paper.

Change history

21 May 2021
A Correction to this paper has been published: https://doi.org/10.1038/s41467-021-23578-y

References

Blanca, J. et al. Genomic variation in tomato, from wild ancestors to contemporary breeding accessions. BMC Genomics 16, 257 (2015).
PubMed PubMed Central Google Scholar
Lin, T. et al. Genomic analyses provide insights into the history of tomato breeding. Nat. Genet. 46, 1220–1226 (2014).
CAS PubMed Google Scholar
Gao, L. et al. The tomato pan-genome uncovers new genes and a rare allele regulating fruit flavor. Nat. Genet. 51, 1044–1051 (2019).
CAS PubMed Google Scholar
Huang, X. & Han, B. Natural variations and genome-wide association studies in crop plants. Annu. Rev. Plant Biol. 65, 531–551 (2014).
CAS PubMed Google Scholar
Eathington, S. R., Crosbie, T. M., Edwards, M. D., Reiter, R. S. & Bull, J. K. Molecular markers in a commercial breeding program. Crop Sci. 47, S–154 (2007).
Google Scholar
Meyer, R. S. & Purugganan, M. D. Evolution of crop species: genetics of domestication and diversification. Nat. Rev. Genet. 14, 840–852 (2013).
CAS PubMed Google Scholar
Song, J.-M. et al. Eight high-quality genomes reveal pan-genome architecture and ecotype differentiation of Brassica napus. Nat. Plants 6, 34–45 (2020).
CAS PubMed PubMed Central Google Scholar
Fuentes, R. R. et al. Structural variants in 3000 rice genomes. Genome Res. 29, 870–880 (2019).
CAS PubMed PubMed Central Google Scholar
Akakpo, R., Carpentier, M.-C., Ie Hsing, Y. & Panaud, O. The impact of transposable elements on the structure, evolution and function of the rice genome. N. Phytol. 226, 44–49 (2020).
Google Scholar
Zhou, Y. et al. The population genetics of structural variants in grapevine domestication. Nat. Plants 5, 965–979 (2019).
PubMed Google Scholar
Yang, N. et al. Genome assembly of a tropical maize inbred line provides insights into structural variation and crop improvement. Nat. Genet. 51, 1052–1059 (2019).
CAS PubMed Google Scholar
Varshney, R. K., Nayak, S. N., May, G. D. & Jackson, S. A. Next-generation sequencing technologies and their implications for crop genetics and breeding. Trends Biotechnol. 27, 522–530 (2009).
CAS PubMed Google Scholar
Manolio, T. A. et al. Finding the missing heritability of complex diseases. Nature 461, 747–753 (2009).
ADS CAS PubMed PubMed Central Google Scholar
Sudmant, P. H. et al. An integrated map of structural variation in 2,504 human genomes. Nature 526, 75–81 (2015).
CAS PubMed PubMed Central Google Scholar
Lisch, D. How important are transposons for plant evolution? Nat. Rev. Genet. 14, 49–61 (2013).
CAS PubMed Google Scholar
Sultana, T., Zamborlini, A., Cristofari, G. & Lesage, P. Integration site selection by retroviruses and transposable elements in eukaryotes. Nat. Rev. Genet. 18, 292–308 (2017).
CAS PubMed Google Scholar
Chuong, E. B., Elde, N. C. & Feschotte, C. Regulatory activities of transposable elements: from conflicts to benefits. Nat. Rev. Genet. 18, 71–86 (2017).
CAS PubMed Google Scholar
Studer, A., Zhao, Q., Ross-Ibarra, J. & Doebley, J. Identification of a functional transposon insertion in the maize domestication gene tb1. Nat. Genet. 43, 1160–1163 (2011).
CAS PubMed PubMed Central Google Scholar
Bhattacharyya, M. K., Smith, A. M., Ellis, T. H., Hedley, C. & Martin, C. The wrinkled-seed character of pea described by Mendel is caused by a transposon-like insertion in a gene encoding starch-branching enzyme. Cell 60, 115–122 (1990).
CAS PubMed Google Scholar
Kawase, M., Fukunaga, K. & Kato, K. Diverse origins of waxy foxtail millet crops in East and Southeast Asia mediated by multiple transposable element insertions. Mol. Genet. Genomics 274, 131–140 (2005).
CAS PubMed Google Scholar
Soyk, S. et al. Bypassing negative epistasis on yield in tomato imposed by a domestication gene. Cell 169, 1142–1155 (2017).
CAS PubMed Google Scholar
Xiao, H., Jiang, N., Schaffner, E., Stockinger, E. J. & van der Knaap, E. A retrotransposon-mediated gene duplication underlies morphological variation of tomato fruit. Science 319, 1527–1530 (2008).
ADS CAS PubMed Google Scholar
Jouffroy, O., Saha, S., Mueller, L., Quesneville, H. & Maumus, F. Comprehensive repeatome annotation reveals strong potential impact of repetitive elements on tomato ripening. BMC Genomics 17, 624 (2016).
PubMed PubMed Central Google Scholar
The Tomato Genome Consortium. The tomato genome sequence provides insights into fleshy fruit evolution. Nature 485, 635–641 (2012).
ADS Google Scholar
Tieman, D. et al. A chemical genetic roadmap to improved tomato flavor. Science 355, 391–394 (2017).
ADS CAS PubMed Google Scholar
Aflitos, S. et al. Exploring genetic variation in the tomato (Solanum section Lycopersicon) clade by whole-genome sequencing. Plant J. 80, 136–148 (2014).
PubMed Google Scholar
Quadrana, L. et al. The Arabidopsis thaliana mobilome and its impact at the species level. Elife 5, e15716 (2016).
PubMed PubMed Central Google Scholar
Busch, B. L. et al. Shoot branching and leaf dissection in tomato are regulated by homologous gene modules. Plant Cell 23, 3595–3609 (2011).
CAS PubMed PubMed Central Google Scholar
Roldan, M. V. G. et al. Natural and induced loss of function mutations in SlMBP21 MADS-box gene led to jointless-2 phenotype in tomato. Sci. Rep. 7, 4402 (2017).
ADS PubMed PubMed Central Google Scholar
Cavalier-Smith, T. How selfish is DNA? Nature 285, 617–618 (1980).
ADS CAS PubMed Google Scholar
Quadrana, L. et al. Transposition favors the generation of large effect mutations that may facilitate rapid adaption. Nat. Commun. 10, 3421 (2019).
ADS PubMed PubMed Central Google Scholar
Cridland, J. M., Thornton, K. R. & Long, A. D. Gene expression variation in Drosophila melanogaster due to rare transposable element insertion alleles of large effect. Genetics 199, 85–93 (2015).
CAS PubMed Google Scholar
Stuart, T. et al. Population scale mapping of transposable element diversity reveals links to gene regulation and epigenomic variation. eLife 5, e20777 (2016).
PubMed PubMed Central Google Scholar
Uzunović, J., Josephs, E. B., Stinchcombe, J. R. & Wright, S. I. Transposable elements are important contributors to standing variation in gene expression in Capsella grandiflora. Mol. Biol. Evol. 36, 1734–1745 (2019).
PubMed Google Scholar
Zhu, G. et al. Rewiring of the fruit metabolome in tomato breeding. Cell 172, 249–261 (2018).
CAS PubMed Google Scholar
Wang, Z. et al. A novel DCL2-dependent miRNA pathway in tomato affects susceptibility to RNA viruses. Genes Dev. 32, 1155–1160 (2018).
CAS PubMed PubMed Central Google Scholar
Zhang, C. et al. Fine mapping of the Ph-3 gene conferring resistance to late blight (Phytophthora infestans) in tomato. Theor. Appl. Genet. 126, 2643–2653 (2013).
CAS PubMed Google Scholar
Andolfo, G. et al. Defining the full tomato NB-LRR resistance gene repertoire using genomic and cDNA RenSeq. BMC Plant Biol. 14, 120 (2014).
PubMed PubMed Central Google Scholar
Saladié, M., Rose, J. K. C., Cosgrove, D. J. & Catalá, C. Characterization of a new xyloglucan endotransglucosylase/hydrolase (XTH) from ripening tomato fruit and implications for the diverse modes of enzymic action. Plant J. 47, 282–295 (2006).
PubMed Google Scholar
Seymour, G. B., Chapman, N. H., Chew, B. L. & Rose, J. K. C. Regulation of ripening and opportunities for control in tomato and other fruits. Plant Biotechnol. J. 11, 269–278 (2013).
CAS PubMed Google Scholar
Fray, R. G. & Grierson, D. Identification and genetic analysis of normal and mutant phytoene synthase genes of tomato by sequencing, complementation and co-suppression. Plant Mol. Biol. 22, 589–602 (1993).
CAS PubMed Google Scholar
Jenkins, J. A. The origin of the cultivated tomato. Economic Bot. 2, 379–392 (1948).
Google Scholar
Baldwin, I. T. Plant volatiles. Curr. Biol. 20, 392–397 (2010).
Google Scholar
Tadmor, Y. et al. Identification of malodorous, a wild species allele affecting tomato aroma that was selected against during domestication. J. Agric. Food Chem. 50, 2005–2009 (2002).
CAS PubMed Google Scholar
Kim, S.-J. et al. Allyl/propenyl phenol synthases from the creosote bush and engineering production of specialty/commodity chemicals, eugenol/isoeugenol, in Escherichia coli. Arch. Biochem. Biophys. 541, 37–46 (2014).
CAS PubMed Google Scholar
Eshed, Y. & Zamir, D. An introgression line population of Lycopersicon pennellii in the cultivated tomato enables the identification and fine mapping of yield-associated QTL. Genetics 141, 1147–1162 (1995).
CAS PubMed PubMed Central Google Scholar
Bolger, A. et al. The genome of the stress-tolerant wild tomato species Solanum pennellii. Nat. Genet. 46, 1034–1037 (2014).
CAS PubMed PubMed Central Google Scholar
Tieman, D. M. et al. Identification of loci affecting flavour volatile emissions in tomato fruits. J. Exp. Bot. 57, 887–896 (2006).
CAS PubMed Google Scholar
Razifard, H. et al. Genomic evidence for complex domestication history of the cultivated tomato in Latin America. Mol. Biol. Evol. 37, 1118–1132 (2020).
CAS PubMed PubMed Central Google Scholar
Carpentier, M.-C. et al. Retrotranspositional landscape of Asian rice revealed by 3000 genomes. Nat. Commun. 10, 24 (2019).
ADS CAS PubMed PubMed Central Google Scholar
Baduel, P., Quadrana, L., Hunter, B., Bomblies, K. & Colot, V. Relaxed purifying selection in autopolyploids drives transposable element over-accumulation which provides variants for local adaptation. Nat. Commun. 10, 5818 (2019).
ADS CAS PubMed PubMed Central Google Scholar
Eichten, S. R., Stuart, T., Srivastava, A., Lister, R. & Borevitz, J. O. DNA methylation profiles of diverse Brachypodium distachyon align with underlying genetic diversity. Genome Res. 26, 1520–1531 (2016).
CAS PubMed PubMed Central Google Scholar
Macko-Podgórni, A., Stelmach, K., Kwolek, K. & Grzebelus, D. Stowaway miniature inverted repeat transposable elements are important agents driving recent genomic diversity in wild and cultivated carrot. Mob. DNA 1, 1–17 (2019).
Google Scholar
Rogivue, A. et al. Genome-wide variation in nucleotides and retrotransposons in alpine populations of Arabis alpina (Brassicaceae). Mol. Ecol. Resour. 19, 773–787 (2019).
CAS PubMed Google Scholar
Ågren, J. A., Huang, H.-R. & Wright, S. I. Transposable element evolution in the allotetraploid Capsella bursa-pastoris. Am. J. Bot. 103, 1197–1202 (2016).
PubMed Google Scholar
Alonge, M. et al. Major Impacts of widespread structural variation on gene expression and crop improvement in tomato. Cell 182, 145–161 (2020).
CAS PubMed PubMed Central Google Scholar
Michael, T. P. et al. High contiguity Arabidopsis thaliana genome assembly with a single nanopore flow cell. Nat. Commun. 9, 541 (2017).
ADS Google Scholar
Baduel, P., Quadrana, L. & Colot, V. Efficient detection of transposable element insertion polymorphisms between genomes using short-read sequencing data. Preprint at https://doi.org/10.1101/2020.06.09.142331v1 (2020).
Alonge, M. et al. RaGOO: fast and accurate reference-guided scaffolding of draft genomes. Genome Biol. 20, 224 (2019).
PubMed PubMed Central Google Scholar
Vendrell-Mir, P. et al. A benchmark of transposon insertion detection tools using real data. Mob. DNA 10, 53 (2019).
CAS PubMed PubMed Central Google Scholar
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
PubMed PubMed Central Google Scholar
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
CAS PubMed PubMed Central Google Scholar
Kang, H. M. et al. Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 42, 348–354 (2010).
CAS PubMed PubMed Central Google Scholar
Turner, S. D. qqman: an R package for visualizing GWAS results using Q-Q and manhattan plots. Preprint at https://doi.org/10.1101/005165 (2014).
Nurk, S., Bankevich, A. & Antipov, D. Assembling genomes and mini-metagenomes from highly chimeric reads. Res. Comput. Mol. Biol. 10, 158–170 (2013).
Google Scholar
Scheet, P. & Stephens, M. A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am. J. Hum. Genet. 78, 629–644 (2006).
CAS PubMed PubMed Central Google Scholar
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

We thank members of the Colot group and especially Pierre Baduel for discussions and critical reading of the paper. We thank Zachary Lippman for sharing his high-quality genome assembly of M82 before publication. We also thank the World Tomato Society for making available phenotypic information of tomato cultivars. Support was from the Agence National de la Recherche (ANR-17-tomaTE to V.C. and J.J.G.), the Centre National de la Recherche Scientifique (MOMENTUM program to L.Q.), and France Génomique national infrastructure, funded as part of the Investissements d’Avenir program managed by the Agence Nationale de la Recherche (ANR-10-INBS-09).

Author information

Authors and Affiliations

Institut de Biologie de l’Ecole Normale Supérieure (IBENS), Centre National de la Recherche Scientifique (CNRS), Institut National de la Santé et de la Recherche Médicale (INSERM), Ecole Normale Supérieure, PSL Research University, 75005, Paris, France
Marisol Domínguez, Elise Dugas, Basile Leduque, Vincent Colot & Leandro Quadrana
Genomic facility, Institut de Biologie de l’Ecole Normale Supérieure (IBENS), Département de biologie, École normale supérieure, CNRS, INSERM, Université PSL, 75005, Paris, France
Médine Benchouaia
Institut Jean-Pierre Bourgin, INRAE, AgroParisTech, Université Paris-Saclay, 78000, Versailles, France
José M Jiménez-Gómez

Authors

Marisol Domínguez
View author publications
You can also search for this author in PubMed Google Scholar
Elise Dugas
View author publications
You can also search for this author in PubMed Google Scholar
Médine Benchouaia
View author publications
You can also search for this author in PubMed Google Scholar
Basile Leduque
View author publications
You can also search for this author in PubMed Google Scholar
José M Jiménez-Gómez
View author publications
You can also search for this author in PubMed Google Scholar
Vincent Colot
View author publications
You can also search for this author in PubMed Google Scholar
Leandro Quadrana
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

J.J.-G., V.C., and L.Q. conceived the project. E.D. and L.Q. performed the detection of TIPs. J.J.-G. performed the SNP calling. M.B., B.L., and L.Q. performed the ONT experiments. M.D. and L.Q. analyzed the data. V.C. and L.Q. wrote the paper with additional input from M.D. All the authors read and approved the paper.

Corresponding authors

Correspondence to Vincent Colot or Leandro Quadrana.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplemnetary Information

Peer Review File

Reporting Summary

Description of Additional Supplementary Files

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3

Supplementary Data 4

Supplementary Data 5

Supplementary Data 6

Supplementary Data 7

Supplementary Data 8

Supplementary Data 9

Supplementary Data 10

Supplementary Data 11

Supplementary Data 12

Supplementary Data 13

Supplementary Data 14

Supplementary Data 15

Supplementary Data 16

Supplementary Data 17

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Domínguez, M., Dugas, E., Benchouaia, M. et al. The impact of transposable elements on tomato diversity. Nat Commun 11, 4058 (2020). https://doi.org/10.1038/s41467-020-17874-2

Download citation

Received: 29 February 2020
Accepted: 23 July 2020
Published: 13 August 2020
DOI: https://doi.org/10.1038/s41467-020-17874-2

This article is cited by

Pangenome analysis reveals transposon-driven genome evolution in cotton
- Xin He
- Zhengyang Qi
- Maojun Wang
BMC Biology (2024)
Whole-genome sequencing and comparative genomics reveal candidate genes associated with quality traits in Dioscorea alata
- Ana Paula Zotta Mota
- Komivi Dossa
- Hana Chaïr
BMC Genomics (2024)
Taming transposable elements in livestock and poultry: a review of their roles and applications
- Pengju Zhao
- Chen Peng
- George E. Liu
Genetics Selection Evolution (2023)
A pangenome analysis pipeline provides insights into functional gene identification in rice
- Jian Wang
- Wu Yang
- Junliang Zhao
Genome Biology (2023)
Selection signatures and population dynamics of transposable elements in lima bean
- Daniela Lozano-Arce
- Tatiana García
- Jorge Duitama
Communications Biology (2023)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.