Oxford Nanopore and Bionano Genomics technologies evaluation for plant structural variation detection

Structural variations (SVs) are genomic rearrangements resulting from duplication, deletion, insertion, inversion and translocation events. SV detection was long limited to cytological approaches, and later to Next-Generation Sequencing (NGS) short reads and partitioned assemblies. Today, DNA long-read sequencing and optical mapping have revolutionized the understanding of SVs in genomes by greatly increasing detection power. This study investigates the performance of two techniques, 1) long-read sequencing with the MinION device (Oxford Nanopore Technologies, ONT) and 2) optical mapping with the Saphyr device (Bionano Genomics), for detecting and characterizing SVs in the genomes of two Arabidopsis thaliana ecotypes, Columbia-0 (Col-0) and Landsberg erecta 1 (Ler-1). We describe the SVs detected by aligning the best ONT assembly and the DLE-1 optical maps of A. thaliana Ler-1 against the public Col-0 TAIR10.1 reference genome. After filtering (SV > 1 kb), 1184 and 591 Ler-1 SVs were retained from ONT and Bionano data, respectively. A total of 948 Ler-1 ONT SVs (80.1%) corresponded to 563 Bionano SVs (95.3%), defining 563 common locations. Technology-specific locations were scrutinized to assess the gain in SV detection offered by either technology. ONT SVs were mostly detected near TE and gene features, and resistance genes appeared particularly affected. SVs linked to ONT sequencing errors were removed and false positives limited, while high-quality Bionano SVs were retained. Compared with the Col-0 TAIR10.1 reference genome, most SVs were found at the same locations by both technologies. The ONT assembly yielded more specific SVs than the Bionano maps, whereas Bionano was more efficient at characterizing large SVs.
Although the two technologies are complementary, ONT data appear better suited to large-scale population studies, while Bionano performs better at improving assemblies and describing the specificities of a genome relative to a reference.

number variations (CNV) including insertions/deletions (Indels) and presence/absence variations (PAV), and balanced events like inversions and translocations [1-4]. Several mechanisms explain SV formation, such as recombination errors generated by non-homologous end-joining and non-allelic homologous recombination, genome duplication and transposition [1,2]. Structural variations have been extensively studied in humans, and Ho et al. reviewed their impact on human diseases [4]. In plants, SVs have been shown to play a key role in genome evolution and to be responsible for phenotypic variation by affecting transposable elements (TEs) and genes [3, 5-8]. In particular, SVs have been found in stress-related and resistance genes [9-13], leading to local adaptation [14,15], or linked to other traits of agronomic interest such as tomato fruit flavor, rice grain size or poplar wood formation [16-18].
Nowadays, SV identification contributes to the construction of pangenome references, or super-pangenomes [19,20]. This new approach to building a reference is expected to better reflect the genetic diversity of a species, while expanding our understanding of genome evolution and of adaptive traits [21-25].
The development of new sequencing technologies has boosted the study of SVs, which until recently could only be detected with Comparative Genomic Hybridization (CGH) arrays or single nucleotide polymorphism (SNP) arrays [26-29]. Third-generation sequencing offers new opportunities to identify SVs at a larger scale through two approaches: linked short reads, as in 10x Genomics and Hi-C approaches [30], and long reads, as produced by Pacific Biosciences [31] and Oxford Nanopore Technologies (ONT) [32,33]. These approaches provide access to complex regions and are increasingly used to improve genome assemblies and to detect structural variations in humans [4, 34-37], in Arabidopsis thaliana ecotypes [24,38,39] and T-DNA insertion lines [40,41], and in other plants [42-44]. In parallel, Bionano Genomics has developed a physical mapping technology [45] that generates information on very large DNA molecules. These maps, called optical maps, are frequently used to improve and validate sequence assemblies and to detect SVs in animal genomes [36, 46-49] and, more recently, in plants [7,42,43,50]. Used alone or in combination, these third-generation technologies have made it possible to identify genetic rearrangements between individuals at the intraspecific level [50,51].
Comparisons between sequencing technologies or SV detection software are no longer uncharted territory [24,36,38,52]. However, ONT and Bionano have so far only been compared in animals (chimpanzee [49] and Drosophila [53]), not in plants. Here, we investigated the genomes of the two most studied A. thaliana ecotypes (Col-0 and Ler-1) with both ONT sequencing and Bionano optical maps, to compare the advantages of these two fundamentally different technologies, one sequencing-based and the other based on physical maps, for detecting and characterizing SVs in plants.
Cleaned Evry.Ler-1 ONT reads were aligned against the Ler reference genome with Minimap2 to estimate ONT data completeness [38,55]. In total, 98.9% of the Ler reference genome was covered by the Evry.Ler-1 ONT reads. The cleaned Evry.Ler-1 reads were also mapped against the Col-0 TAIR10.1 reference genome, covering 95.2% of the genome (Additional file 1: Table S3) [56]. The samtools depth tool [57] was then applied to this alignment against the Col-0 TAIR10.1 reference genome to estimate the coverage at each position. The average coverage in 100 kb windows was 46.9X, with depth fluctuations in centromeric regions (Fig. 1).
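The windowed coverage and the genome fraction covered can be derived from `samtools depth` text output (tab-separated chromosome, 1-based position, depth). The sketch below is a hypothetical illustration, not the authors' script: it assumes the default output, in which zero-depth positions are omitted unless `-a` is given, so missing positions are counted as depth 0, and it assumes chromosome lengths are supplied separately.

```python
from collections import defaultdict


def parse_depth(lines):
    """Parse `samtools depth` lines: chrom <TAB> pos (1-based) <TAB> depth (first sample only)."""
    for line in lines:
        if not line.strip():
            continue
        chrom, pos, depth = line.rstrip("\n").split("\t")[:3]
        yield chrom, int(pos), int(depth)


def window_mean_coverage(lines, chrom_lengths, window=100_000):
    """Mean depth per fixed-size window; positions absent from the input count as depth 0."""
    sums = defaultdict(lambda: defaultdict(int))
    for chrom, pos, depth in parse_depth(lines):
        sums[chrom][(pos - 1) // window] += depth
    result = {}
    for chrom, length in chrom_lengths.items():
        n_windows = (length + window - 1) // window
        means = []
        for w in range(n_windows):
            w_len = min(window, length - w * window)  # the last window may be shorter
            means.append(sums[chrom][w] / w_len)
        result[chrom] = means
    return result


def breadth_of_coverage(lines, genome_size):
    """Fraction of the genome covered by at least one read."""
    covered = sum(1 for _, _, depth in parse_depth(lines) if depth > 0)
    return covered / genome_size
```

Averaging the per-window means over all chromosomes would give a genome-wide figure comparable to the 46.9X reported above, and windows with unusually low or high means would flag regions such as centromeres.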
To select the assembler best suited to our data, de novo assemblies of Evry.Col-0 and Evry.Ler-1 were performed with Canu [54], RA [59] and SMARTdenovo (SDN, [60]). Based on general statistics (assembly size, contig number, N50), SMARTdenovo generated better assemblies for both ecotypes than Canu or RA (Additional file 1: Tables S4 and S5). The SDN assemblies consisted of 79 contigs for Evry.Col-0 (cumulative size = 117 Mb, N50 = 12.5 Mb, L50 = 5 contigs) and 101 contigs for Evry.Ler-1 (cumulative size = 117 Mb, N50 = 10.7 Mb, L50 = 5 contigs). RA assemblies were more fragmented, and chimeric contigs were identified in Canu assemblies after MUMmer alignment on the reference chromosomes (Additional file 2: Figs. S1A-C and S2A-C). For all assemblers tested, centromeric regions were covered by many small contigs. These results were also supported by the alignments of the Evry.Col-0 and Evry.Ler-1 assemblies on their respective reference chromosomes, Col-0 TAIR10.1 and Ler. The SDN assemblies were therefore used for the subsequent SV analyses.
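N50 and L50, two of the statistics used to rank the assemblers, follow the usual definitions: sort contigs from longest to shortest and accumulate their lengths until half of the total assembly size is reached. A minimal sketch:

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths.

    N50 is the length of the contig at which the cumulative length of contigs,
    sorted longest first, reaches half of the total assembly size; L50 is the
    number of contigs needed to reach that point.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("empty assembly")
```

For contigs of lengths 1 to 10, the total is 55 and the running sum first reaches 27.5 at the fourth-longest contig, so N50 = 7 and L50 = 4.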

Optical maps generation
Genomic DNA was labeled with the DLE-1 enzyme following the manufacturer's staining protocol. One run per ecotype on the Saphyr device yielded 577.5 Gb and 610.9 Gb of molecules for Evry.Col-0 and Evry.Ler-1, respectively. Molecules larger than 150 kb were selected, leading to a final coverage of about 600-fold based on the theoretical 130 Mb Arabidopsis   Tables S8 and S9).
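Fold coverage here is simply the total length of retained molecules divided by the genome size; at 600-fold over a 130 Mb genome, this corresponds to about 78 Gb of molecules longer than 150 kb. A minimal sketch, assuming molecule lengths are available as a list of integers in bp (the function name is hypothetical):

```python
def filtered_coverage(molecule_lengths, genome_size, min_length=150_000):
    """Fold coverage contributed by molecules strictly longer than min_length (bp)."""
    kept = sum(length for length in molecule_lengths if length > min_length)
    return kept / genome_size
```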
The average label density of the Evry.Ler-1 optical maps was estimated at 18.47 labels per 100 kb (Additional file 1: Table S7). However, DLE-1 label density decreased in centromeric regions owing to lower molecule depth and optical map breaks (Fig. 1, Additional file 2: Fig. S3A-E).
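A comparable in silico label density can be obtained by counting DLE-1 recognition motifs in a reference sequence. The sketch assumes CTTAAG as the DLE-1 motif (the value commonly reported for this enzyme; check it against Bionano's documentation) and works on a plain sequence string rather than on Bionano's own in silico digestion files.

```python
def count_motif(seq, motif):
    """Count possibly overlapping occurrences of motif in seq (case-insensitive)."""
    seq, motif = seq.upper(), motif.upper()
    count, start = 0, seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def label_density_per_100kb(seq, motif="CTTAAG"):
    """In silico labels per 100 kb. The assumed motif CTTAAG is its own reverse
    complement, so scanning one strand finds every site."""
    if not seq:
        raise ValueError("empty sequence")
    return count_motif(seq, motif) * 100_000 / len(seq)
```

Applied to sliding windows along a chromosome, the same count would show where label-poor regions are expected before any molecules are imaged.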

Structural variations detection
Structural variation detection was performed independently on the ONT and Bionano data, in two directions: 1) Evry.Ler-1 versus the Col-0 TAIR10.1 reference genome and 2) Evry.Col-0 versus the Ler reference genome. The types of structural variations detected in our study are described in Additional file 2: Fig. S4. Because the general SV characteristics (number, types and locations) were similar in both directions, only the results obtained from the Evry.Ler-1 assembly and optical maps against the Col-0 TAIR10.1 reference genome are presented in detail. SVs detected by comparing the Evry.Col-0 SDN assembly and optical maps with the Ler reference genome are described in Additional file 1: Tables S10-S14 and Additional file 2: Fig. S5A-E.
Comparing the Evry.Ler-1 assembly with the Col-0 TAIR10.1 reference genome using the MUMmer show-diff utility [61] revealed 2186 potential SVs. A total of 119 features of the types reference sequence junction (SEQ), break (BRK) and jump (JMP), located in centromeric and telomeric regions and near rDNA clusters, were considered to reflect unresolved regions of the Evry.Ler-1 assembly relative to the Col-0 TAIR10.1 reference genome and were filtered out (Additional file 1: Table S15).
The estimated ONT sequencing error rates of the trimmed, corrected sequences were 4.0% for Evry.Col-0 and 4.9% for Evry.Ler-1. Although these error rates are lower than previously reported [62], a size filter on ONT structural variations (> 1 kb, the SV detection limit for high-quality Bionano calls) was applied to avoid false-positive SV detection and to keep the results comparable with the Bionano technology. Of the 2186 potential SVs, 1184 (54.2%) were larger than 1 kb: 591 insertions (INS), 581 deletions (DEL) and 12 inversions (INV); no duplications were detected (Table 1 and Fig. 2A).
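The junction removal and the > 1 kb size filter, together with the per-type counts, median and cumulative sizes reported in Table 1, can be sketched as follows. The `SV` record is a hypothetical simplified layout (real show-diff rows have their own column format), and, unlike the study, which discarded SEQ, BRK and JMP features only in centromeric, telomeric and rDNA-adjacent regions, this sketch drops all of them. How INS and DEL calls were derived upstream is not modeled.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median


@dataclass
class SV:
    chrom: str
    start: int
    end: int
    sv_type: str  # "INS", "DEL", "INV", ... or show-diff junction types "SEQ", "BRK", "JMP"
    size: int     # SV length in bp


# show-diff feature types treated here as unresolved assembly junctions
UNRESOLVED = {"SEQ", "BRK", "JMP"}


def filter_svs(svs, min_size=1_000):
    """Drop unresolved junction features, then keep SVs strictly larger than min_size."""
    return [sv for sv in svs if sv.sv_type not in UNRESOLVED and sv.size > min_size]


def summarize(svs):
    """Return (counts per type, median size, cumulative size) for a non-empty SV list."""
    sizes = [sv.size for sv in svs]
    return Counter(sv.sv_type for sv in svs), median(sizes), sum(sizes)
```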
A 5 Mb insertion in the Evry.Ler-1 assembly was detected on Chr3 of the Col-0 TAIR10.1 reference genome (14,272,986..14,284,724); it resulted from a MUMmer detection error in a complex region associated with an rDNA cluster and was therefore removed from the final dataset. The Evry.Ler-1 ONT SVs had a median size of 3455 bp and a cumulative size of 7.7 Mb. SVs were equally distributed in size and number between INS and DEL. The INV category had higher median and average sizes than INS and DEL; with a cumulative size of 0.3 Mb, INV represented 3.9% of the total ONT SV size (Table 1). Structural variations were detected on all chromosomes, preferentially on chromosome arms, with no confident SV in the centromeres of Chr1, 3 and 4 (Fig. 1).
Optical map construction and SV detection by physical map comparison were carried out with the Bionano Solve™ interface (Bionano Genomics, version 3.3). A total of 797 SVs were found by comparing the Evry.Ler-1 optical maps with the in silico DLE-1 labeling of the Col-0 TAIR10.1 reference genome (Additional file 1: Table S15). When Bionano Solve detected one SV embedded within another, the larger SV was kept; this occurred at two independent locations on Chr1 (INS: 19,432,310..19,468,513 and DEL: 24,688,666..24,736,849). A 1 kb size filter was applied to the Bionano SVs, which was equivalent to removing deletions and insertions with a Bionano quality score < 10 (defined as poor quality by the manufacturer) (Additional file 1: Table S16). In addition, an INV on Chr2 (3,433,371..3,490,731) with no quality score was discarded. Thus, 591 SVs, representing 74.2% of all Evry.Ler-1 optical map SVs, were retained for further analysis. INS and DEL made up most of the Evry.Ler-1 optical map SVs (48.9 and 49.9%, respectively), the remaining  Fig. S3D). SVs were preferentially distributed along chromosome arms, and their detection was limited in centromeric regions because of decreased labeling there (Fig. 1).
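The two Bionano clean-up rules described above (keep the larger of two nested SVs; discard calls with a quality score below 10 or with no score) can be sketched as follows. The `BionanoSV` record is a hypothetical simplification of the Bionano Solve output, not its actual file format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BionanoSV:
    chrom: str
    start: int
    end: int
    sv_type: str
    score: Optional[float]  # None when no quality score is reported


def drop_nested(svs):
    """Where one SV lies entirely within another on the same chromosome, keep only the larger.

    SVs are sorted by start, longest first at equal starts; kept SVs then have
    strictly increasing ends, so checking the last kept SV is sufficient.
    """
    kept = []
    for sv in sorted(svs, key=lambda s: (s.chrom, s.start, -(s.end - s.start))):
        last = kept[-1] if kept else None
        if last and last.chrom == sv.chrom and last.start <= sv.start and sv.end <= last.end:
            continue  # embedded in a larger, already kept SV
        kept.append(sv)
    return kept


def passes_quality(sv, min_score=10):
    """Keep SVs with a reported score of at least min_score."""
    return sv.score is not None and sv.score >= min_score
```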

SVs comparison
SVs were compared based on their absolute start and end positions on the Col-0 TAIR10.1 reference genome. SV locations from the two technologies were considered comparable when they overlapped by at least 1 bp on the Col-0 TAIR10.1 reference genome.
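The 1 bp overlap criterion reduces to a standard closed-interval test. The sketch assumes SVs are given as `(chrom, start, end)` tuples with 1-based inclusive coordinates, as in the start..end notation used in this section, and uses a simple per-chromosome scan that is adequate for a few thousand SVs.

```python
from collections import defaultdict


def overlaps(a, b):
    """a and b are (chrom, start, end), 1-based inclusive; True if they share >= 1 bp."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def overlapping_pairs(ont_svs, bionano_svs):
    """Return (i, j) index pairs where ONT SV i and Bionano SV j overlap by at least 1 bp."""
    by_chrom = defaultdict(list)
    for j, b in enumerate(bionano_svs):
        by_chrom[b[0]].append((j, b))
    pairs = []
    for i, a in enumerate(ont_svs):
        for j, b in by_chrom[a[0]]:
            if overlaps(a, b):
                pairs.append((i, j))
    return pairs
```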
SV comparison metrics are presented in Table 2 and the numbers of overlapping locations in Fig. 2B. A total of 563 common locations were identified, comprising 948 (80.1%) of the Evry.Ler-1 ONT SVs and 563 (95.3%) of the Evry.Ler-1 optical map SVs. The cumulative sizes of these common SVs were 5.9 Mb for ONT and 6.9 Mb for Bionano, representing 4.5% and 5.3%, respectively, of the Col-0 TAIR10.1 reference genome size (based on 130 Mb). ONT SVs tended to be smaller than Bionano SVs (Table 2, Additional file 1: Tables S17 and S18).
To compare the median sizes of the ONT and Bionano variations (> 1 kb), we drew notched boxplots with and without large events (> 50 kb) (Fig. 4). Using a one-sided Wilcoxon rank-sum test, as in Dixon et al. (2018), all p-values were below the significance level α = 0.05; the median sizes of ONT SVs are therefore significantly smaller than those of Bionano SVs. Moreover, the median sizes of all insertions and of all deletions detected with Bionano were 30.5% and 24.6% larger, respectively, than with ONT. This difference should be interpreted in light of the > 1 kb filter applied to ONT SVs, which raises their median sizes in all categories.
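The one-sided rank-sum comparison can be sketched in pure Python with the normal approximation. This is an illustration of the test, not the exact implementation used in the study: it assigns average ranks to ties but applies no tie or continuity correction. With SciPy available, `scipy.stats.mannwhitneyu(x, y, alternative="less")` is the usual library route.

```python
import math


def rank_sum_less(x, y):
    """One-sided Wilcoxon rank-sum (Mann-Whitney) test of H1: x tends to be smaller than y.

    Uses average ranks for ties and the normal approximation without tie or
    continuity correction. Returns (U statistic for x, one-sided p-value).
    """
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = 0.5 * math.erfc(-z / math.sqrt(2))  # lower-tail P(Z <= z)
    return u1, p
```

Calling `rank_sum_less(ont_sizes, bionano_sizes)` on two lists of SV sizes returns a small p-value when ONT sizes are systematically lower.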
To go further, SVs identified by the ONT and Bionano technologies were assigned a two-letter svID code, the first letter referring to ONT SVs and the second to Bionano SVs, which distinguishes common (svID UU and MU) from specific (svID UN and NU) locations ("U" for "Unique location", "M" for "Multiple locations" and "N" for "No location"; Additional file 1: Tables S17 and S18).
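One way to derive these codes is to treat overlapping ONT and Bionano SVs as linked, group them into connected components, and count how many SVs of each technology fall into each group (0 → N, 1 → U, more than 1 → M). The sketch below is a hypothetical reconstruction, not the authors' procedure: it takes SV counts and a list of overlapping `(ont_index, bionano_index)` pairs as input and uses union-find. It can also produce UM or MM codes, which the study does not report.

```python
from collections import Counter, defaultdict


def sv_id_codes(n_ont, n_bionano, pairs):
    """Count two-letter svID codes (first letter ONT, second Bionano) per connected group."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        parent[find(a)] = find(b)

    nodes = [("O", i) for i in range(n_ont)] + [("B", j) for j in range(n_bionano)]
    for node in nodes:
        find(node)
    for i, j in pairs:
        union(("O", i), ("B", j))

    counts = defaultdict(lambda: [0, 0])  # group root -> [n ONT SVs, n Bionano SVs]
    for node in nodes:
        counts[find(node)][0 if node[0] == "O" else 1] += 1

    def letter(n):
        return "N" if n == 0 else ("U" if n == 1 else "M")

    return Counter(letter(o) + letter(b) for o, b in counts.values())
```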
At svID MU locations, the number of ONT SVs per Bionano SV ranged from 2 (59.5% of cases) to 22. The cumulative size of this category was approximately 4 Mb for both technologies, although ONT variants were 3.5 times more numerous than Bionano ones (538 vs 153). SV sizes in this category ranged from 1 kb (the filter threshold) to 87 kb for ONT and to 1.1 Mb for Bionano. Furthermore, Bionano median and average sizes were 2- and 4-fold larger, respectively. Unlike svID UU, the SV type at svID MU locations was "conforming" for only 68 (44.5%) locations, of which 58 (85.3%) corresponded to 2 ONT SVs for 1 Bionano SV. The remaining 10 (14.7%) locations comprised 3 or 4 ONT SVs for one Bionano SV.
The largest ONT SV was part of a complex SV (svID MU_102) consisting of four contiguous deletions on Chr4, which coincided with a single Evry.Ler-1 optical map deletion (Fig. 3C, Additional file 1: Tables S17 and S18). The largest Evry.Ler-1 optical map SV (svID MU_097) was a 1,143,224 bp inversion on Chr4 overlapping 22 Evry.Ler-1 ONT SVs (INS and DEL) (Fig. 3B, Additional file 1: Tables S17 and S18). To   Tables S17 and S18). Specific locations were more abundant with ONT (236 SVs with svID UN, detected with ONT only, 19.9%) than with Bionano (28 SVs with svID NU, detected with Bionano only, 4.7%), with cumulative sizes of 1.8 Mb and 0.3 Mb, respectively, and an ONT median size about twice as large (2656 bp for Evry.Ler-1 ONT SVs vs 1374 bp for Evry.Ler-1 optical map SVs). Mapped onto the Col-0 TAIR10.1 reference chromosomes, the specific Evry.Ler-1 ONT SVs showed a clear tendency to lie in the NOR and centromeres (Fig. 1). The largest specific ONT variant was a DEL on Chr3 (svID UN_124, Additional file 1: Table S17). The largest specific Bionano SV, also on Chr3, was an INV (svID NU_017, Additional file 1: Table 4). The overlap of Evry.Ler-1 ONT specific SVs with the Col-0 TAIR10.1 reference annotation showed percentages similar to those of the common SVs.
To better characterize the genes affected by ONT SVs at common locations, a GO-term over-representation test was performed with the PANTHER tool [63] available on the TAIR website (https://www.arabidopsis.org/tools/go_term_enrichment.jsp). Of the 1764 genes identified at common locations, 832 (47.2%) were uniquely assigned to a GO term and used in PANTHER (Additional file 1: Tables S19 and S20). Over-representation of defense response and ADP-binding terms was detected (Additional file 1: Table S21), whereas no GO-term enrichment was found for genes at ONT-specific locations (Additional file 1: Tables S22-S24).
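Over-representation tests of this kind ask how surprising it is to find at least k genes carrying a GO term in a list of n genes, given that K of the N background genes carry it. The sketch computes that upper-tail hypergeometric probability (the one-sided Fisher's exact test p-value) for a single term; it illustrates the principle and is not PANTHER's implementation. In practice, p-values across many GO terms are also corrected for multiple testing.

```python
from math import comb


def hypergeom_upper_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N background genes, K annotated, n drawn).

    Equivalent to a one-sided Fisher's exact test for over-representation.
    """
    upper = min(n, K)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total
```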
The main comparison criteria between the two technologies are summarized in Table 5. ONT and Bionano (with DLS labeling) were equally effective at detecting SVs smaller than 50 kb and SVs in gene regions. In our study, Bionano was more efficient for large events, whereas additional analyses are needed to detect these variations with ONT.

Discussion
Herein, we compared the performance of Oxford Nanopore and Bionano Genomics technologies for structural variation detection. For this, we performed long-read sequencing and optical mapping of two A. thaliana ecotypes, Columbia-0 (Col-0) and Landsberg erecta 1 (Ler-1). Long-read de novo assemblies were constructed with three different assemblers, and optical maps were assembled with Bionano Solve tools. Structural variations detected using the Col-0 TAIR10.1 [56] and Ler [38] genomic sequences as references were described and compared to reveal the relative strengths of the two technologies in detecting SVs.

Assemblies based on ONT and Bionano data for SV analyses
To obtain the best assembly based solely on long-read data, we used three different assemblers. After comparing assembly metrics, computation time and collinearity with the reference genomes, we found that SDN provided the best assembly, even though some collinearity breaks were observed, especially in centromeric regions. The metrics of the Evry.Col-0 and Evry.Ler-1 SDN assemblies were comparable to those of similar assemblies in previous studies [24,38,39,64], but the assembly sizes remained underestimated. Continuous improvements in protocols, genome assembly strategies and algorithms resulted in higher-quality genomic sequences for the subsequent analyses. The previously published Bionano A. thaliana optical map (KBS-Mac-74 genome [39]) used a BspQI labeling protocol that required about 10 times more maps to cover the entire KBS-Mac-74 genome than in our study (DLE-1 Bionano staining protocol), highlighting improvements in Bionano's protocol. In addition, no optical map was previously available for Columbia (Col-0) or Landsberg erecta 1 (Ler-1), making our map assemblies especially valuable for further studies.
Our high-quality optical maps allowed us to delineate the centromeric and nucleolar organizer regions (NOR), despite lower molecule density and although a loss of label concordance was observed between the Evry.Ler-1 maps and the Col-0 TAIR10.1 in silico reference maps. Moreover, fluctuations in ONT coverage and the accumulation of repetitive alignments in the same regions provide further evidence of the approximate locations of the centromeres and NOR. However, we identified several misassemblies in the course of our SV analyses between the Evry.Ler-1 SDN assembly and the Col-0 TAIR10.1 reference genome, highlighting how difficult it can be to obtain a reliable assembly, and thus to detect SVs, in these complex regions.

SV detection and comparison between the two technologies
We compared structural variations in Evry.Ler-1 and the reference genome Col-0 TAIR10.1. We chose this reference because of its high quality and the richness of the associated studies [24,38,39].
The cumulative SV sizes obtained for ONT and Bionano in our study are smaller than in previous studies [24,38]. Our size filter (SVs > 1 kb), which those studies did not apply, could explain this difference. In addition, the absence of duplications in the ONT assembly could reflect MUMmer's limited ability to detect this type of SV and, more generally, the complexity of detecting duplication events, as mentioned in Goel et al. (2019). The absence of duplications detected by Bionano could be explained by polymorphic duplications between the Evry.Ler-1 maps and the Col-0 TAIR10.1 reference genome, which would break collinearity, as described in Jiao and Schneeberger (2020), and by the small size of duplications (< 5 kb [24,64]), identified as the limit of Bionano detection.
Analyses with both technologies revealed a predominance of insertions, deletions and inversions, with larger median and average sizes for Bionano SVs. These SV types are homogeneously distributed along the chromosome arms. Most specific ONT SVs are located in centromeric and pericentromeric regions. Nonetheless, SV coverage decreased in these regions, contrary to a previous report by Pucker et al. (2019); this is probably due to technical problems such as assembly errors (for the ONT SMARTdenovo assembly) [65]. This decrease in SV coverage was also observed with Bionano, reflecting lower labeling density in these complex regions. This contrasts with previous results identifying more SVs in regions with lower meiotic recombination rates [24]. The filtering of SV  [41,56,65,67,68]. A combination of the best Col-0 TAIR10.1 sequence and the new high-quality optical map data obtained in this study will provide valuable resources to re-evaluate the assembly of complex regions. svID MU structural variations result either from too low a density of DLE-1 sites or from high divergence of genomic regions between the two ecotypes. In both cases, experimental validation is essential to confirm the number and type of SVs. Nevertheless, the fact that the largest events described (MU_097 (Chr4 INV), MU_102 (Chr4 DEL), MU_153 (Chr2 ONT SVs matching the second Bionano translocation) and MU_138 (Chr5 large inversion)) were retrieved in our study serves as a proof of concept for the ONT and Bionano technologies and the analysis parameters used.
Comparing the locations of Evry.Ler-1 ONT SVs with Araport11 annotations, we found that common and specific ONT SVs were preferentially associated with TE features and genes, as reported in Jiao and Schneeberger (2020). GO-term enrichment analysis of genes overlapping common ONT SVs revealed an over-representation of defense response and ADP-binding terms corresponding to resistance genes. This result is consistent with previous studies [13, 24, 38, 69-71] describing an association between structural variations and the clustered organization of resistance genes.

General conclusion
Because analyses of SV locations and their consequences rely heavily on the quality of SV identification and of the underlying assembly/mapping data, we compared the performance of ONT and Bionano technologies for structural variation detection. By applying stringent filters to the ONT assembly mapping approach and size filters to SVs, we showed that this methodology is an easy and efficient way to detect reliable SVs. Most of the detected SVs were also identified with Bionano optical maps, with high concordance despite differing characteristics (average and median size). Nevertheless, long-read sequencing technologies make it possible to detect SVs more accurately, while Bionano offers a broad overview of structural rearrangements. The choice of technology therefore has to be based on the characteristics of the locations to be studied. If these locations are known to be gene regions without repeated sequences, the analysis of an ONT assembly will be reliable and provide more confidence in the SV locations, and Bionano will add little. In contrast, if these regions are complex (rich in transposable elements, for example), analyzing structural variations from an ONT assembly will be more delicate, since the assembly itself and the alignments used for detection will be less reliable at these locations. ONT analyses of these regions cannot be taken at face value and will require validation (targeted experiments by labeling, PCR, detection by another technology, progeny analysis, etc.). Bionano technology, on the other hand, is effective at validating variation in these large complex regions. Combined with Bionano analyses that provide an overview and point to these areas, ONT analyses and their results gain in value. The major limitation of Bionano is the lack of access to sequence information. In addition, whole-genome SV analyses are currently mostly limited to model organisms.
However, because Oxford Nanopore long-read and Bionano Genomics optical map assemblies do not require prior knowledge of the genomic architecture or sequence of the studied organism, this approach expands the range of plant species or species complexes in which in-depth SV analyses can be performed. Unlike in animals, in plants the heterogeneity and size of genomes, polyploidy, heterozygosity, and species reference sequences that are sometimes very divergent and potentially of low quality make population analyses difficult, if not impossible. Population analysis with Bionano is therefore only possible when the reference is of very high quality and genomically very close to the other ecotypes. In contrast, these plant characteristics have less impact on variant detection with ONT, which is much more local.
ONT appears especially suitable for plant population analyses and Bionano more relevant for studying the plasticity of genome structure, highlighting the clear complementarity of these two technologies in SV analysis.

Plants
Arabidopsis thaliana Columbia-0 (accession number 186AV) and Landsberg erecta-1 (accession number 213AV) seeds were provided by the Versailles Arabidopsis Stock Center (National Research Institute for Agriculture, Food and Environment, Versailles, France, http://publiclines.versailles.inra.fr/). They were sown directly in soil and transplanted after 10 days. Plantlets were grown under a 16 h light/8 h night photoperiod in a growth chamber at 20 °C for 4-5 weeks. Before harvest, the plants were dark-treated for 3 days.

Oxford Nanopore sequencing (MinION) HMW DNA extraction
High Molecular Weight (HMW) DNA was extracted using a modified salting-out protocol. A total of 5 g of freshly harvested leaves was ground in liquid nitrogen with a mortar and pestle and transferred to 10 ml of extraction buffer (1.25% SDS, 100 mM Tris-HCl pH 8, 50 mM EDTA, 0.01% w/v PVP40) prewarmed to 50 °C in a 50 ml tube. Then 37.5 μl of beta-mercaptoethanol (0.375% final) and 10 μl of RNase A (Qiagen® 100 mg/ml) were added. The solution was incubated for 30 min at 50 °C with agitation (10 s at 300 rpm every 10 min). After incubation, 20 ml of TE (10:1) were added and mixed slowly, followed by 10 ml of 5 M KAc. The tube was kept on ice for 5 min, then centrifuged at 4 °C for 10 min at 500 g. The solution was split into two 15 ml tubes and centrifuged again under the same conditions. The supernatant was transferred to a 50 ml tube containing 1 volume of isopropanol, gently inverted 10 times, then centrifuged at 4 °C for 10 min at 5000 g. Pellets were washed with 20 ml of 70% ethanol, then centrifuged at 4 °C for 5 min at 5000 g. The supernatant was removed and the pellets were only partially dried before solubilization in 100 μl of TE (10:1) prewarmed to 50 °C. The DNA solution was then incubated at 50 °C for 10 min. Field-inversion gel electrophoresis (program 50-150 kb on a Pippin Pulse, Sage Science) was used to estimate DNA size, and DNA samples with molecules above 50 kb were kept. DNA purity was evaluated by spectrophotometry (OD260/280 and OD260/230 ratios).

Bionano optical maps ultra HMW DNA extraction
We performed the DNA extraction using the Base protocol n°30068 vD (Bionano Genomics) with minor adaptations. Three grams of very young fresh leaves from each genotype were harvested from the dark-treated rosettes. The samples were placed on aluminium foil on ice, then transferred to a 50 ml tube fitted with a screened cap that allows liquids to be poured off without losing the samples (Bio-Rad). The tubes were kept on ice during nuclei isolation. Samples were treated with a fixing solution containing 2% formaldehyde under a fume hood, then rinsed with fixing solution without formaldehyde. Fixed leaves were transferred to a square Petri dish with 4 ml of Plant Homogenization Buffer plus (HB+, i.e. HB supplemented with 1 mM spermine tetrahydrochloride, 1 mM spermidine trihydrochloride and 0.2% 2-mercaptoethanol). Whole leaves were chopped with a razor blade into 2 × 2 mm pieces, transferred to a new tube on ice, and 7.5 ml of HB+ were added. The pieces were blended with a TissueRuptor (Qiagen) for four cycles (20 s at maximum speed followed by 30 s of rest). Plant homogenates were filtered first through a 100 μm and then through a 40 μm cell strainer, and volumes were adjusted to 45 ml. Nuclei were centrifuged at 3840 g and 4 °C for 20 min, and the supernatants were discarded. Nuclei were gently resuspended in the residual buffer, 3 ml of HB+ were added, the tubes were swirled on ice and volumes were adjusted to 35 ml. Homogenates were centrifuged at 60 g and 4 °C for 3 min with minimum deceleration. The solutions were very carefully transferred to a new tube to avoid carry-over of debris and filtered again through a 40 μm cell strainer. Nuclei were centrifuged at 3840 g and 4 °C for 20 min, 3 ml of HB+ were added and the tubes were swirled on ice. Using Bionano Nuclei Purification by Density Gradient, the nuclei homogenate was layered on top of two solutions of different densities. After centrifugation at 4500 g and 4 °C for 40 min, the nuclei were located at the interface between the two solutions. They were recovered with a wide-bore tip in about 1 ml of solution, transferred to a 15 ml tube and adjusted to 14 ml with HB+. Nuclei were centrifuged at 2500 g and 4 °C for 15 min. All the buffer was removed and the nuclei were resuspended in 60 μl of HB+.
The nuclei solution was equilibrated at 43 °C for 3 min, and melted 2% agarose from the CHEF Genomic DNA Plug Kit (Bio-Rad) was added to reach a final agarose concentration of 0.82% in the plugs. Plugs were cooled on aluminum blocks chilled on ice. Plugs were purified with Bionano Lysis Buffer adjusted to pH 9 and supplemented with proteinase K and 0.4% 2-mercaptoethanol. Plugs were digested for 2 h at 50 °C in a Thermomixer, then the solution was refreshed and incubated again overnight. Plugs were treated with RNase for 1 h at 37 °C in the remaining solution. Plugs were washed three times in Wash Buffer (Bionano Genomics), then four times in TE 10:1. DNA was recovered as recommended by Bionano Genomics, as follows: plugs were melted at 70 °C for 2 min, immediately transferred to 43 °C and incubated for 45 min at 43 °C with 2 μl of Agarase (0.5 unit/μl). The melted plugs were recovered with wide-bore tips and dialyzed on a 0.1 μm membrane disk (Millipore) floating on 10 ml of TE for 1 h. DNA was quantified in triplicate with a Qubit according to the Bionano protocol. Two methods were used to estimate DNA molecule size: the Pippin Pulse and the Qcard Argus System (OpGen), which combs DNA onto a lane so that molecules can be visualized under a fluorescence microscope after staining. Samples with molecules above 150 kb were kept for labeling. Protocols were performed according to Bionano Genomics with 600 ng of DNA for both Col-0 and Ler-1 ecotypes. Direct Label and Stain (DLS) labeling consisted of a single enzymatic labeling reaction with the DLE-1 enzyme followed by DNA staining with a fluorescent marker. It was performed with 750 ng of DNA. Chips were loaded as recommended by Bionano Genomics.
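The volume of 2% agarose needed to reach 0.82% follows from the mass balance c_stock · v = c_final · (v_sample + v). The protocol does not state this volume; assuming the nuclei suspension is the 60 μl from the previous step, the sketch below gives about 41.7 μl. The function name is hypothetical.

```python
def volume_to_add(v_sample, c_stock, c_final):
    """Volume of stock solution (concentration c_stock) to add to v_sample of a
    solution free of that component so that the mixture reaches c_final.

    Solves c_stock * v = c_final * (v_sample + v) for v; units of v follow v_sample.
    """
    if not 0 < c_final < c_stock:
        raise ValueError("need 0 < c_final < c_stock")
    return c_final * v_sample / (c_stock - c_final)


# 60 ul of nuclei (assumed), 2% agarose stock, 0.82% target
print(round(volume_to_add(60, 2.0, 0.82), 1))
```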

ONT sequencing (MinION) and assembly
ONT libraries were prepared with the Oxford Nanopore SQK-LSK109 kit according to the following protocol. Genomic DNA, or DNA previously fragmented to 50 kb with a Megaruptor (Diagenode S.A., Liège, Belgium), was first size-selected using a BluePippin (Sage Science, Beverly, MA, USA). The selected DNA fragments were end-repaired and 3′-adenylated with the NEBNext® Ultra™ II End Repair/dA-Tailing Module (New England Biolabs, Ipswich, MA, USA). The DNA was then purified with AMPure XP beads (Beckman Coulter, Brea, CA, USA) and ligated to sequencing adapters provided by Oxford Nanopore Technologies (Oxford Nanopore Technologies Ltd., Oxford, UK) using Blunt/TA Ligase Master Mix (NEB). After purification with AMPure XP beads, the library was mixed with Running Buffer with Fuel Mix (ONT) and Library Loading Beads (ONT) and loaded on four MinION R9.4 SpotON Flow Cells per A. thaliana ecotype. The resulting FAST5 files were base-called using Albacore (versions 2.1.10 and 2.3.1) and FASTA files were produced as described in Istace et al. (2017). Canu version 1.5 (github commit ae9eecc) was used for initial read correction and trimming with the parameters minMemory=100G and corOutCoverage=10000. The corrected sequences were merged into one final FASTA file per ecotype, which was then used as the assemblers' input.
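Before assembly, the depth of the merged corrected FASTA file can be checked against the 130 Mb genome size used to parameterize the assemblers (mean depth = total sequenced bases / genome size). This is a minimal sketch using only the Python standard library; the file name and read content in the demo are hypothetical, and it is not the tooling used in the study.

```python
import gzip
import os
import tempfile

GENOME_SIZE_BP = 130_000_000  # genome size used to parameterize the assemblers


def fasta_total_bases(path: str) -> int:
    """Sum of sequence lengths in a FASTA file (gzip-compressed if *.gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    total = 0
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.startswith(">"):
                total += len(line.strip())
    return total


def sequencing_depth(total_bases: int, genome_size: int = GENOME_SIZE_BP) -> float:
    """Mean depth = total sequenced bases / genome size."""
    return total_bases / genome_size


# Tiny self-contained demo on a hypothetical corrected-reads file.
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp.write(">read1\nACGTACGT\n>read2\nAC\n")
n_bases = fasta_total_bases(tmp.name)
os.unlink(tmp.name)
print(n_bases)  # 10
```

On a real merged file, `sequencing_depth(fasta_total_bases(path))` returns the mean depth of the corrected reads over a 130 Mb genome.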
Assemblies were performed with the relevant genome size parameter set to, or coverage calculation based on, a 130 Mb genome size. Assemblers used with default parameters were Canu version 1.5 ([54], github commit 69b5f32), Rapid Assembler (RA, [59], https://github.com/lbcb-sci/ra commit 07364a1) and SMARTdenovo version 1.0 (with the option -c 1 to run the consensus step) ([60], https://github.com/ruanjue/smartdenovo commit 61cf13d). The MUMmer suite version 3.0 [61] was run with the parameters used in Zapata et al. 2016 [38]. To analyze the assemblies, they were aligned to the reference genome of Arabidopsis thaliana using nucmer with the options -c 100 -b 500 -l 50 -g 100 -L 50. The TAIR10.1 reference genome for A. thaliana Columbia-0 (Col-0, GCF_000001735.4) was chosen as it is the available sequence with the latest annotation. As Pucker et al. (2019) highlighted, its nuclear sequence is the same as the TAIR9 reference genome, but chloroplastic and mitochondrial sequences were added, which were necessary to detect translocations with Bionano technology. The reference genome of Arabidopsis thaliana Landsberg erecta was the one published by Zapata et al. in 2016 (Ler, GenBank LUHQ00000000.1, [38]). The alignments were filtered with delta-filter (options -1 -l 10000 -i 0.95) and visualized with mummerplot (options --fat --large --layout --png) or DNAnexus Dot (github commit 78e3317). These MUMmer parameters [38] allowed conserving exact matches larger than 50 bp and alignments longer than 10 kb with a minimal identity of 95%. To check assembly completeness and fragmentation, the assemblies were compared to each other based on their metrics (number of contigs, N50, cumulative genome size) and on their genome alignments to the references, generated with MUMmer and viewed with DNAnexus Dot (https://dnanexus.github.io/dot/).
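The assembly metrics (number of contigs, N50, cumulative size) and the length/identity criteria applied to the alignments can be expressed compactly. The sketch below is illustrative: `Alignment` is a hypothetical record type (for example filled from `show-coords` output), and `keep_alignment` only applies the > 10 kb and ≥ 95% identity thresholds stated above; it does not reproduce the one-to-one mapping step performed by delta-filter's `-1` option.

```python
from dataclasses import dataclass


def assembly_metrics(contig_lengths: list[int]) -> dict[str, int]:
    """Number of contigs, cumulative size and N50 of an assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        # N50: length of the contig at which half the assembly is reached
        if 2 * running >= total:
            n50 = length
            break
    return {"contigs": len(lengths), "cumulative_size": total, "n50": n50}


@dataclass
class Alignment:  # hypothetical record, e.g. parsed from show-coords
    ref_start: int
    ref_end: int
    length: int        # aligned length in bp
    identity: float    # percent identity, 0-100


def keep_alignment(aln: Alignment, min_len: int = 10_000,
                   min_identity: float = 95.0) -> bool:
    """Length/identity thresholds only; no one-to-one (-1) filtering."""
    return aln.length >= min_len and aln.identity >= min_identity


print(assembly_metrics([80, 70, 50]))
# {'contigs': 3, 'cumulative_size': 200, 'n50': 70}
```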
To evaluate the completeness of our ONT data, the corrected ONT reads were mapped onto the Col-0 TAIR10.1 reference genome with the Minimap2/2.15 aligner [55] using the -a -x map-ont parameters. The Samtools/1.6 depth tool with the -a option [57] gave the alignment depth at each Col-0 TAIR10.1 reference position. The sequencing error rate was inferred from the identity percentage obtained by aligning the Evry.Col-0 and Evry.Ler-1 trimmed corrected ONT reads onto the Col-0 TAIR10.1 and Ler reference genomes, respectively.
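One way to derive an identity (and hence an error rate) from SAM records is to combine the CIGAR string with the NM tag that minimap2 writes with `-a`: NM counts mismatches plus inserted and deleted bases, so matches = (M + I + D) − NM. The exact identity definition used in the study is not specified; the BLAST-like definition below (matches over aligned columns) is an assumption, and the function names are hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_columns(cigar: str) -> int:
    """M/=/X + I + D lengths; clips (S, H), skips (N) and padding (P) ignored."""
    total = 0
    for count, op in CIGAR_OP.findall(cigar):
        if op in "M=XID":
            total += int(count)
    return total


def alignment_identity(cigar: str, nm: int) -> float:
    """Identity = matches / (M + I + D), with matches = columns - NM."""
    columns = aligned_columns(cigar)
    if columns == 0:
        raise ValueError("no aligned columns in CIGAR")
    return (columns - nm) / columns


def pooled_error_rate(records: list[tuple[str, int]]) -> float:
    """1 - pooled identity over (CIGAR, NM) pairs from many reads."""
    columns = sum(aligned_columns(cigar) for cigar, _ in records)
    edits = sum(nm for _, nm in records)
    return edits / columns


print(alignment_identity("90M5I5D", 15))  # 0.85
```

Pooling edits and columns across reads weights long reads more heavily than averaging per-read identities would; which of the two the authors used is not stated.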

Bionano optical map assembly
As it can benefit the assembly steps, molecules were subsampled when flowcells yielded more than 90 Gb and 600X of data. This selection of molecules was made on each run with the Bionano RefAligner tool on the command line (version 1.3.8041.8044 with the -minlen 180 -randomize 1 -subset 1 nb_molec options) or with Bionano Access (version Solve3.3, Filter Molecule Object utility) (Additional file 1: Tables S6 and S7).
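The effect of the subsampling options can be illustrated as: discard molecules shorter than 180 kb, shuffle the remaining molecules, and keep the first nb_molec of them. This sketch reflects our reading of the `-minlen 180 -randomize 1 -subset 1 nb_molec` options (assuming `-minlen` is expressed in kb) and is not RefAligner itself; the coverage helper uses the 0.115 Gb genome size of the optical map assembly, and all function names are hypothetical.

```python
import random


def coverage(lengths_kb: list[float], genome_size_gb: float) -> float:
    """Molecule coverage = total molecule length / genome size (1 Gb = 1e6 kb)."""
    return sum(lengths_kb) / (genome_size_gb * 1e6)


def subsample(lengths_kb: list[float], n_molecules: int,
              min_len_kb: float = 180.0, seed: int = 1) -> list[float]:
    """Length filter, random order, then keep the first n_molecules."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    kept = [x for x in lengths_kb if x >= min_len_kb]
    rng.shuffle(kept)
    return kept[:n_molecules]


# Hypothetical run: 1000 molecules between 100 and 400 kb.
rng = random.Random(42)
molecules = [rng.uniform(100, 400) for _ in range(1000)]
selected = subsample(molecules, n_molecules=200)
print(len(selected), round(coverage(selected, 0.115), 3))
```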
Maps were then constructed with the Generate de novo Assembly tool of Bionano Solve™ (Bionano Genomics, version 3.3) using the options recommended by Bionano (with pre-assembly, non-haplotype without extend and split) and a 0.115 Gb genome size. The pre-assembly step calculates noise parameters that optimize the quality of the assembly (fewer and larger maps). When a reference FASTA file is provided, noise parameters are calculated by aligning the molecules to the reference; otherwise, they are estimated from a first rough assembly of the molecules. For both Col-0 and Ler-1 ecotypes, three maps were obtained: one without reference, one with the Col-0 TAIR10.1 reference genome and one with the Ler reference genome (Additional file 1: Tables S8 and S9). In our study, the metrics of these assemblies were very similar. This stability indicates that the noise parameters estimated either from the reference FASTA sequences or from our data were comparable, which supports the quality of the Bionano data and assemblies.

ONT variation detection
Structural variations were obtained with MUMmer's show-diff utility on the filtered alignments of the SMARTdenovo assemblies against the Col-0 TAIR10.1 and Ler reference genomes. One DIFF file was obtained per comparison. The six SV types (Gap, Duplication, Break, Jump, Inversion, Sequence) are described in Additional file 2: Fig. S4.
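A first summary of each DIFF file is the number of features per type. The sketch below assumes the usual show-diff layout, in which the second whitespace-separated column of each feature line holds the type mnemonic (GAP, DUP, BRK, JMP, INV, SEQ); header lines are skipped because their second field is not one of these types. The sample lines are hypothetical.

```python
from collections import Counter

SHOW_DIFF_TYPES = {"GAP", "DUP", "BRK", "JMP", "INV", "SEQ"}


def count_diff_types(lines) -> Counter:
    """Count show-diff features per type (second column of feature lines)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2 and fields[1] in SHOW_DIFF_TYPES:
            counts[fields[1]] += 1
    return counts


# Hypothetical excerpt of a DIFF file.
example = [
    "/data/TAIR10.1.fa /data/Ler-1_smartdenovo.fa",
    "NUCMER",
    "",
    "[SEQ]\t[TYPE]\t[S1]\t[E1]\t[LEN 1]",
    "Chr1\tGAP\t10500\t12800\t2301\t2210\t-91",
    "Chr1\tBRK\t1\t4999\t4999",
    "Chr2\tGAP\t300\t450\t151\t1800\t1649",
]
print(count_diff_types(example))  # Counter({'GAP': 2, 'BRK': 1})
```

With a file on disk, `count_diff_types(open(path))` gives the same per-type counts.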

Bionano variation detection
SV detection was performed on the optical maps built with the public reference and with our SMARTdenovo ONT assemblies, using the Convert SMAP to VCF file tool. The resulting VCF files describe all the structural variations between the optical maps and the considered reference. The variations were classified into four types: deletion, insertion, translocation and inversion. SV detection stringency is intrinsic, based on the number of aligned molecules (at least nine by default) and the number of labels across each variant breakpoint on the genome map (at least two by default) (Bionano tutorial: https://bionanogenomics.com/support-page/data-analysis-documentation/). The technology provides a confidence interval around breakpoint positions (CIPOS and CIEND in VCF files). In this study, these values were used to calculate the most extended positions of the Bionano SVs and avoid the effect of label fuzz.
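Following the VCF convention, CIPOS holds offsets relative to POS and CIEND offsets relative to END, so the most extended interval of an SV is [POS + min(CIPOS), END + max(CIEND)]. This is a minimal sketch of that calculation on a single record's INFO string; the example record is hypothetical, and values are parsed through `float` in case the offsets are not written as integers.

```python
def parse_info(info: str) -> dict[str, str]:
    """Split a VCF INFO field into key/value pairs (flags map to '')."""
    out = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        out[key] = value
    return out


def extended_interval(pos: int, info: str) -> tuple[int, int]:
    """Most extended [start, end] of an SV using CIPOS/CIEND offsets."""
    fields = parse_info(info)
    end = int(fields.get("END", pos))
    cipos = [int(float(x)) for x in fields.get("CIPOS", "0,0").split(",")]
    ciend = [int(float(x)) for x in fields.get("CIEND", "0,0").split(",")]
    start = max(1, pos + min(cipos))  # VCF positions are 1-based
    stop = end + max(ciend)
    return start, stop


print(extended_interval(1000, "SVTYPE=DEL;END=5000;CIPOS=-200,150;CIEND=-100,300"))
# (800, 5300)
```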
The low number of structural variations between the Evry.Col-0 optical maps and the Col-0 TAIR10.1 reference genome (and between the Evry.Ler-1 optical maps and the Ler reference genome) reflects the good collinearity between the maps and the references (Additional file 1: Table S25). These SVs indicated the locations of conflicts that could be due to mis-assemblies or intra-ecotype variation. Inter-ecotype detection allowed us to describe the variations between Evry.Col-0 and Evry.Ler-1.
Quality and length characteristics were used to better describe and filter SVs. Bionano Solve associates a quality score with each insertion (INS) and deletion (DEL), based on sensitivity and on the fraction of alternative calls in mixed assemblies that were called in the alternative genome assembly [from no quality (.) or poor (0) to confident quality (20)]. We observed that this indicator follows the same trend as SV size (Additional file 1: Tables S11 and S16). Moreover, SV abundances differ strongly between the two technologies at the extremes of the size range: for the smallest SVs (< 1 kb), ONT detected many more SVs, whereas for the largest (> 5 kb), Bionano detected proportionally more SVs. In our comparison analysis, a filter on query SV size (> 1 kb) was therefore applied to remove poor-quality Bionano SVs and to limit the effects of ONT sequencing errors and of the high sensitivity at small sizes. Confidence scores for translocation and inversion breakpoints were computed as p-values, reflecting the confidence (based on Mahalanobis distance) of positive calls. The recommended cutoffs are 0.1 and 0.01 for translocation and inversion breakpoint calls, respectively, and were used to eliminate an uncertain inversion on Chr2.
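The two Bionano filters can be combined into a single predicate. This sketch makes two assumptions that the text does not state explicitly: that the > 1 kb size filter applies to insertions and deletions while translocations and inversions are filtered on their breakpoint confidence, and that a call passes when its confidence is at or above the recommended cutoff. `BionanoSV` is a hypothetical record type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BionanoSV:  # hypothetical record type
    sv_type: str                          # "DEL", "INS", "TRA" or "INV"
    size_bp: int
    confidence: Optional[float] = None    # breakpoint confidence for TRA/INV


CONF_CUTOFFS = {"TRA": 0.1, "INV": 0.01}  # recommended cutoffs quoted in the text
MIN_SIZE_BP = 1_000


def passes_filters(sv: BionanoSV) -> bool:
    if sv.sv_type in CONF_CUTOFFS:
        # Assumption: higher confidence = more reliable; unscored calls dropped.
        return sv.confidence is not None and sv.confidence >= CONF_CUTOFFS[sv.sv_type]
    # Insertions/deletions: keep only SVs larger than 1 kb.
    return abs(sv.size_bp) > MIN_SIZE_BP


calls = [BionanoSV("DEL", 1500), BionanoSV("INS", 800),
         BionanoSV("INV", 20000, 0.005), BionanoSV("TRA", 0, 0.2)]
print([sv.sv_type for sv in calls if passes_filters(sv)])  # ['DEL', 'TRA']
```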

SV description
Custom-made R and Perl scripts were used to edit the outputs of other tools, describe ONT and Bionano SVs (types, size), locate SVs along the chromosomes and filter them. For ONT, SVs identified as assembly discordances were briefly described and discarded before comparison. These included the sequence (SEQ), break (BRK) and jump (JMP) ONT SVs, because they correspond to assembly or reference artifacts. Finally, size filters (more than 1 kb) were applied to account for the high ONT sequencing error rate and for low-quality Bionano SVs. For Bionano SVs, the most extended absolute positions of each SV were retained, taking into account the uncertainty around breakpoints due to the distance between two labels.
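The ONT-side filtering and conversion to BED can be sketched as follows. The input tuple layout is a hypothetical simplification of the DIFF records (1-based inclusive coordinates, type, size); the function drops the SEQ, BRK and JMP artifact types and SVs of 1 kb or less, and writes 0-based half-open BED lines. This is an illustration, not the authors' R/Perl scripts.

```python
ARTIFACT_TYPES = {"SEQ", "BRK", "JMP"}


def ont_sv_to_bed(records, min_size_bp: int = 1_000) -> list[str]:
    """records: (chrom, start_1based, end_1based, sv_type, size_bp) tuples.

    Drops assembly/reference artifact types and SVs <= min_size_bp, then
    emits BED lines (0-based start, half-open end).
    """
    lines = []
    for chrom, start, end, sv_type, size in records:
        if sv_type in ARTIFACT_TYPES or abs(size) <= min_size_bp:
            continue
        lo, hi = sorted((start, end))  # some features are reported end < start
        lines.append(f"{chrom}\t{lo - 1}\t{hi}\t{sv_type}\t{size}")
    return lines


records = [("Chr1", 1001, 3000, "GAP", 2000),
           ("Chr1", 5000, 5100, "GAP", 100),
           ("Chr2", 10, 20000, "JMP", 19990)]
print(ont_sv_to_bed(records))  # ['Chr1\t1000\t3000\tGAP\t2000']
```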
Comparisons of the SVs obtained with the ONT and Bionano technologies were based on the overlap of their absolute positions.
ONT and Bionano SV files were converted to BED format, and overlapping regions were identified with BEDtools (version 2.27.1, github commit cd82ed5, "bedtools intersect -wa -wb -a INPUT1.bed -b INPUT2.bed -loj > OUTPUT.bed"). The raw comparisons were then compiled and formatted into one final output file using custom-made R scripts. For each SV location, this file contained descriptors (SV size, type, quality) for both technologies, information on the type of conflict, and a two-letter code. The first letter of this code characterizes the ONT SVs at the location and the second letter the Bionano SVs: M ("Multiple") means more than one SV, U ("Unique") one SV, and N ("No") no SV. For example, the code "MU" means that the location harbored multiple ONT SVs corresponding to a unique Bionano SV. No UM location (a single ONT SV overlapping several Bionano SVs) was detected in our study. SV landscapes and occurrences were visualized with Circos/0.69.9 (perl/5.16.3 [72]).
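The two-letter location code can be illustrated by grouping transitively overlapping intervals from both technologies into locations and counting how many ONT and Bionano SVs fall in each. This is a sketch under an assumption: the authors built locations from `bedtools intersect -loj` output with R scripts, and their grouping rule may differ in edge cases (for example, two overlapping ONT SVs with no Bionano SV). Overlap follows the BED half-open convention.

```python
from collections import defaultdict


def letter(n: int) -> str:
    """N = no SV, U = unique SV, M = multiple SVs."""
    return "N" if n == 0 else ("U" if n == 1 else "M")


def location_codes(ont, bionano):
    """ont, bionano: lists of (chrom, start, end) BED intervals.

    Returns (chrom, start, end, code) for each group of transitively
    overlapping intervals; code = ONT letter + Bionano letter.
    """
    by_chrom = defaultdict(list)
    for source, intervals in (("ont", ont), ("bng", bionano)):
        for chrom, start, end in intervals:
            by_chrom[chrom].append((start, end, source))
    result = []
    for chrom in sorted(by_chrom):
        groups = []  # each group: [start, end, n_ont, n_bionano]
        for start, end, source in sorted(by_chrom[chrom]):
            # Half-open intervals overlap only if start < current group end.
            if groups and start < groups[-1][1]:
                group = groups[-1]
                group[1] = max(group[1], end)
            else:
                group = [start, end, 0, 0]
                groups.append(group)
            group[2 if source == "ont" else 3] += 1
        for start, end, n_ont, n_bng in groups:
            result.append((chrom, start, end, letter(n_ont) + letter(n_bng)))
    return result


ont = [("Chr1", 100, 200), ("Chr1", 150, 300), ("Chr1", 1000, 1100)]
bng = [("Chr1", 180, 400), ("Chr2", 50, 60)]
print(location_codes(ont, bng))
# [('Chr1', 100, 400, 'MU'), ('Chr1', 1000, 1100, 'UN'), ('Chr2', 50, 60, 'NU')]
```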

SV and annotation
SVs overlapping a gene and/or a TE were identified with bedtools intersect by comparing their absolute positions to the A. thaliana Col-0 annotations (July 11th 2019 release, TAIR10_GFF3_genes_transposons.gff). Lists of genes impacted by SVs were extracted for both technologies, and a GO-term enrichment analysis was performed using Fisher's exact test with a Bonferroni correction in PANTHER (release 20200407, with GO Ontology database DOI: https://doi.org/10.5281/zenodo.3873405, released 2020-06-01, [63], http://go.pantherdb.org/). Significance was evaluated based on a P-value ≤ 10⁻⁵ and an FDR value ≤ 0.01 [73].
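For a single GO term, an over-representation test of this kind compares the number of SV-impacted genes carrying the term with what random draws from the annotated genome would give. The sketch below computes a one-sided Fisher's exact p-value (hypergeometric upper tail) and a Bonferroni adjustment with the standard library only. It illustrates the statistics and is not PANTHER's implementation, which may, for example, use a two-sided test; the gene counts are hypothetical.

```python
from math import comb


def overrepresentation_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact p-value P(X >= k) for drawing n genes out of
    N, K of which carry the GO term (hypergeometric upper tail)."""
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def bonferroni(p_values: list[float]) -> list[float]:
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical: 5 impacted genes out of 10, all 5 carrying a term held by 5 genes.
p = overrepresentation_p(k=5, n=5, K=5, N=10)
print(p)  # 1/252, about 0.003968
print(bonferroni([p, 0.5]))
```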