Genome Assembly and Structural Variation Analysis of Luffa acutangula Provide Insights on Flowering Time and Ridge Development

Luffa spp. is an important worldwide cultivated vegetable and medicinal plant from the Cucurbitaceae family. In this study, we report a high-quality chromosome-level genome of the high-generation inbred line SG261 of Luffa acutangula. The genomic sequence was determined by PacBio long reads, Hi-C sequencing reads, and 10× Genomics sequencing, with an assembly size of 739.82 Mb, contig N50 of 18.38 Mb, and scaffold N50 of 56.08 Mb. The genome of L. acutangula SG261 was predicted to contain 27,312 protein-coding genes and 72.56% repetitive sequences, of which long terminal repeats (LTRs) were an important form of repetitive sequences, accounting for 67.84% of the genome. Phylogenetic analysis reveals that L. acutangula evolved later than Luffa cylindrica, and Luffa is closely related to Momodica charantia. Comparing the genome of L. acutangula SG261 and L. cylindrica with PacBio data, 67,128 high-quality structural variations (SVs) and 55,978 presence-absence variations (PAVs) were identified in SG261, resulting in 2424 and 1094 genes with variation in the CDS region, respectively, and there are 287 identical genes affected by two different structural variation analyses. In addition, we found that the transcription factor FY (FLOWERING LOCUS Y) families had a large expansion in L. acutangula SG261 (flowering in the morning) compared to L. cylindrica (flowering in the afternoon), which may result in the early flowering time in L. acutangula SG261. This study provides valuable reference for the breeding of and pan-genome research into Luffa species.

Plants 2024, 13 L. acutangula and L. cylindrica are two vegetable species commonly found in South and Southeast Asia.L. acutangula is widely grown; however, L. cylindrica is considered an underutilized crop [7].There are obvious phenotypic differences between L. cylindrica and L. acutangula.There is a profound variation in fruit size, length, shape, and color [7].The smooth gourd fruit is smooth and cylindrical, and the ridge gourd fruit has a tapering neck with some prominent longitudinal ridges [8].In addition, the two species can also be distinguished according to color and flowering time: L. cylindrica has bright yellow flowers that bloom in the early morning (4-8 a.m.), while L. acutangula has pale cream flowers that bloom in the late afternoon (5-8 p.m.) [2,8].It is very meaningful to study the difference in flowering time between these two species, which can contribute to exploring the different flowering mechanisms in these two species.
The process of introgressing desired traits acquired from interspecific hybridization into elite cultivars can be made more efficient using molecular breeding approaches.With the development of sequencing technology, genome assembly has also developed rapidly.The genome of L. cylindrica was sequenced and assembled for the first time using nextgeneration sequencing (NGS) technology (small insert (220 bp) library) in 2017, and a genome of 885.01 Mb was obtained [9].Subsequently, three genome assemblies of L. cylindrica have been completed utilizing the PacBio long read single-molecule real-time (SMRT) sequencing platform [10][11][12], while only one genome assembly of L. acutangula has been completed [10].
To explore genetic variability and diversity that exist within a species, structural variation (SV) studies have been recently reported for some important crop plants, including corn, soybean, and rice [13].Structural variations (SVs), including insertion, deletion, tandem repeat, inversion, translocation, copy number variations (CNVs) with a length of more than 50 bp, and chimerism variations present a more complicated situation [14].Compared with single-nucleotide polymorphisms (SNPs), SVs occupy a larger proportion in the variation base and have significant impact on variations in the genome [15].Numerous studies have indicated that SVs play a critical role in genome evolution and genetic control of agronomic traits such as flowering time, fruit size, and stress resistance [13,16] and gradually have become an increasingly important research field.Likewise, presence/absence variations (PAVs) also can contribute to trait variation [17].
The first attempt to perform a genome survey sequencing of Luffa cylindrica using next-generation sequencing (NGS) technology was carried out by An et al. [9].Recently, Zhang et al. (2020) reported a de novo assembly of the L. cylindrica genome, utilizing the Pacific Biosciences (PacBio) sequencing platform [12].Pootakham et al. previously assembled acutangula and cylindrica genomes and investigated alternative splicing events in Luffa [10], but little research has been conducted on the structural variability of L. acutangula and L. cylindrica.The narrow genetic and genomic resources obviously limited the breeding improvement of Luffa.In this study, we de novo assembled the genome of the high-generation inbred line L. acutangula SG261 by combining the PacBio long reads, Hi-C data, and 10× Genomics short reads.Genome annotation, comparative genome analysis, and structure variation identification were conducted based on the assembled genome sequences.PacBio long-read sequencing data were used to identify SVs between L. acutangula and L. cylindrica.The present study provides a valuable genetic resource for deciphering the genome evolution of Luffa species and is positioned to serve as the reference genome guiding the molecular breeding of Luffa crops.

Plant Materials
The highly inbred line SG261 of L. acutangula was used in this study.This line has distinctive ridges and large flowers (blooming at night) (Figure 1).The seedlings were given adequate sunlight, nutrients, and water for normal growth in a greenhouse at Jiangmen Institute of Agricultural Sciences, in Jiangmen, Guangdong, China.

Plant Materials
The highly inbred line SG261 of L. acutangula was used in this study.This line has distinctive ridges and large flowers (blooming at night) (Figure 1).The seedlings were given adequate sunlight, nutrients, and water for normal growth in a greenhouse at Jiangmen Institute of Agricultural Sciences, in Jiangmen, Guangdong, China.

Section Observation
The ridges of young (the day after pollination) and mature fruits (eighth day after pollination) of L. acutangula and L. cylindrica were sampled to make paraffin sections by Henan Honglin Educational Instrument Co., Ltd.(Henan, China).After the material was taken, it was fixed, rinsed, dehydrated, permeabilized, embedded, sectioned, spread, baked, and dehydrated.Then, the sections were stained with plant tissue staining solution Safranin O-fast green and observed on an Olympus microscope [18].

Genome Sequencing
Young fresh leaves of SG261 were collected for genomic DNA extraction and sequencing.Total DNA was extracted using the cetyltrimethylammonium bromide

Section Observation
The ridges of young (the day after pollination) and mature fruits (eighth day after pollination) of L. acutangula and L. cylindrica were sampled to make paraffin sections by Henan Honglin Educational Instrument Co., Ltd.(Henan, China).After the material was taken, it was fixed, rinsed, dehydrated, permeabilized, embedded, sectioned, spread, baked, and dehydrated.Then, the sections were stained with plant tissue staining solution Safranin O-fast green and observed on an Olympus microscope [18].

Genome Sequencing
Young fresh leaves of SG261 were collected for genomic DNA extraction and sequencing.Total DNA was extracted using the cetyltrimethylammonium bromide (CTAB) method to construct PacBio and short-read libraries.The short-read libraries with an insertion size of 350 bp were prepared using VAHTS Universal DNA library preparation kit (Vazyme, Nanjing, China) by the Beijing Genomics Institute (BGI), and the insertion fragments of the library were detected using the Agilent Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, USA).The library was sequenced on the MGI-SEQ 2000 sequencing platform to produce pair-end sequence data (2 × 150 bp) [19].For the PacBio library, high-molecularweight DNA was extracted as above.About 30 ug of high-molecular-weight DNA was Plants 2024, 13, 1828 4 of 18 used to prepare template library of 30-40 kb using the BluePippin Size Selection system (Sage Science, Beverly, MA, USA).The library was then prepared by adding specific splice sequences to both ends of the fragmented DNA.The library was sequenced on the PacBio Sequel II sequencing platform using standard sequencing for continuous long reads (CLRs) to generate long reads by SMRT sequencing chips [20].
To obtain Hi-C sequencing data, the chopped young leaves of SG61 were vacuumfixed with 2% fresh formaldehyde in NIB buffer for 45 min, and glycine was added to a concentration of 0.375 M. The reaction was carried out for 5 min, and then liquid nitrogen was added and ground to powder.Then, it was filtered through a layer of Miracloth membrane.Isolated cells were lysed, and proteins were broken down using proteinase K at 65 degrees centigrade.DNA was purified by QIAamp DNA Mini Kit (Qiagen), and then dotted flat-end junctions were removed using Dynabeads ® MyOne™ Streptavidin C1 (Thermofisher).Then, sequencing library was prepared using the NEBNext ® Ultra™ II DNA library Prep Kit and sequenced on the BGI MGI-SEQ 2000 platform (San Diego, CA, USA) using the 150 PE mode.

RNA-Seq Library Construction and Sequencing
Fresh, disease-free, big flower buds before flowering were selected.The mixed flower buds from 5 individual plants were used for RNA extraction and transcriptome sequencing in three biological replicates.Total RNA was extracted using Trizol reagent (Invitrogen, Waltham, MA, USA), and the RNA purity and integrity were assessed by NanoDrop 2000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA) and by Agilent 2100 Bioanalyzer (Agilent Technologies, CA, USA).After passing the quality inspection, the sequencing library was constructed using the VAHTS Universal V6 RNA-seq Library Kit for MGI (Vazyme, Nanjing, China) and quantified by Qubit 3.0 Fluorometer (Life Technologies, Carlsbad, CA, USA).The library was then assessed for size and quality by the Agilent 2100 Bioanalyzer (Agilent Technologies, CA, USA) and sequenced on the MGI-SEQ 2000 platform.
After assembling the genome contigs of SG261, Hi-C data were used to assign genome assembly to chromosome level.Hi-C clean reads were aligned onto the assembled genome by Juicer v1.5.7 [27], and then software 3D-DNA v180114 [28] was used to preliminarily cluster and orient the data.Then, the assembly genome was adjusted, reset, and aggregated by JuiceBox v1.11.08 [29] to improve the quality of chromosome assembly.The chromosome assembly genome was mapped to a published L. acutangula genome [10] using minimap2 v2.24 [30], and the final genome was obtained by reverse complementation and direction adjustment according to the mapped results.
To evaluate the assembly results, BWA-mem v0.7.17 [31] was used to map the Hi-C data to the assembled genome, and a Hi-C contact map was generated by HiCExplorer v3.5.3 software [32] with a 100 kb window and a threshold of −1.5 to 5. Finally, BUSCO v3.0.2 [33] was used to estimate the integrity of the assembly genome based on the Embryophyta_odb9 database (n = 1440).
Finally, a circular genome visualization map was constructed by Circos v0.69-8 [44] software to show the results of genome assembly and genome annotation.
The software minimap2 v2.21 was used to compare the SG261 assembly genome with other Luffa genomes (L.acutangula [10] and L. cylindrica [11]).The alignment results of mapq60 were retained, and the PAFR v0.0.2 software was used to visualize the alignment results and obtain the synteny dot plots.

Structural Variants Analysis
Short reads were used to identify SNP and INDEL between SG261 genome and L. cylindrica genome (downloaded from NCBI, SRR10818295).Short reads were mapped to the assembled genome using BWA v0.7.17 with parameters "-R '@ rg\tid: lae518\TPU: lae518\TSM: lae518'".Sambamba v0.8.0 [59] was used to mark and filter PCR duplication, and then SNP and InDel were called using bcftools v1.13 software [60].Finally, densities of SNPs and InDels were calculated with a window size of 400 kb and displayed on the Circos plot.
Long reads were used to identify structural variants (SVs) between the L. acutangula SG261 genome and the L. cylindrica genome [11].Minimap2 v2.21 was used to map the PacBio long reads of SG261 to the L. cylindrica genome with parameter "--MD -ax map-pb", and samtools v1.13 [61] was used for sorting and indexing.Then, SVs were called using Sniffles v1.0.12 [62] with parameter "-l 50 -t 45".The long reads of the L. cylindrica genome were downloaded from NCBI (SRP239503).Then, the long reads were mapped to SG261 genome, and SVs were called using the same method above.
Plants 2024, 13, 1828 6 of 18 We then filtered the SVs using the SG261 genome as a reference genome according to a previous study [13].First, we removed imprecise SVs and SVs on scaffolds.Then, SVs longer than 100 kb in length and genotype of "0/0" were removed.Finally, we identified regions prone to false SVs and removed SVs that intersected with them.To identify these regions, PaSS [63] was used to simulate long reads based on the L. acutangula genome [10].PacBio long reads of SG261 and long reads simulated by L. acutangula were mapped to the SG261 genome using minimap2, and SVs were called using Sniffles.SVs larger than 100 kb and imprecise SVs were then removed to obtain the final vcf file.Bedtools v2.30.0 [64] was used to extract the SVs between L. acutangula SG261 and L. cylindrica, which were demonstrated to intersect with the 2 vcf files.In addition, according to the genome annotation file, the genes affected by SVs were identified with Bedtools.
We identified PAVs between L. acutangula SG261 and the L. cylindrica genome using a sliding window method [17].We divided the L. cylindrica genome into 500 bp windows with 100 bp as the step size and then mapped it with the assembled genome using BWA to identify PAV.According to the genome annotation file, the genes affected by PAVs were identified with Bedtools.

GO Enrichment and Functional Annotation of Genes with SVs and PAVs
Genes with SVs and PAVs were submitted to the NT database, NR database, and Swissprot database for functional annotation using BLASTN v2.11 and diamond v2.0.9 software, respectively.In addition, we used SG261 genome as the background and SVs and PAV-associated genes as foreground for GO enrichment.Interproscan v5.52-86 was used to annotate the SG261 genome genes with parameters "-f TSV -iprlookup -goterms -pa".ClusterProfiler v4.0.0 [65] was used to enrich SV-related genes and PAV-related genes.p < 0.05 was a significant enrichment.

Analysis of Expansion and Contraction for Flowering-Time-Related Genes
According to the known flowering-related genes [66], we searched the annotation results of SV-related genes and found that there were 16 genes that might be related to flowering time.Then, we performed gene family expansion and contraction for some flowering-related gene families, including FT [67] and CO-like gene families [68], as well as unidentified FY (FLOWERING LOCUS Y) and EFM (EARLY FLOWERING MYB PROTEIN) families and some MYB [69], AP2/ERF [70], bZIP [71], NAC [72], and WAKY transcription factor families.We downloaded the sequences of these families in cucumber based on previous gene family analysis, and the unidentified FY and EFM were retrieved according to the NCBI database.The heatmap was obtained using TBtools [73].

Genome Sequencing and Assembly
A high-quality chromosome-level genome of the inbred line SG261 was assembled in the present study.Firstly, 74.96 Gb clean reads were obtained by the MGI-SEQ 2000 sequencing system (Table S1).Based on 17-mer frequency, the estimated genome size was 879.83 Mb with a heterozygosity rate of 0.49% and 73.00% repetitive sequences (Figure S1, Table S2).Secondly, a total of 142.75 Gb in PacBio clean reads was generated and used for contig assembly (Table S1).The assembled contig genome was composed of 283 contigs with a total length of 739.31 Mb, of which contig N50 was 18.38 Mb in length, with an average length of 2.61 Mb (Tables 1 and S4).Thirdly, 70.24 Gb of Hi-C data was obtained by the BGI MGI-SEQ 2000 sequencing system for assembly (Table S1).Based on Hi-C data, the contig sequences were anchor-corrected to obtain a scaffold-level genome.The 283 contigs were anchored to 31 scaffolds, including 13 pseudo-chromosomes and 18 scaffolds, of which the shortest was 50.01 Mb (chr12) and the longest was 64.19 Mb (chr01) (Table S5).The final pseudo-chromosomal level genome had a total length of 739.82 Mb, and the scaffold N50 was 56.08 Mb (Table 1).Finally, the Hi-C data were mapped to the assembled genome, and the Hi-C correlation heatmap obtained by HiCExplorer showed that the Hi-C-assisted assembly was high quality (Figure S2).We also aligned short reads to the assembled genome, and 99.51% of the data could be mapped to the assembled genome, indicating that the assembled genome contained most of the genome information.The completeness of the assembled genome was 91.25%, evaluated by BUSCO (Table S3), which indicated that the SG261 genome assembly was complete and could be used for subsequent analysis.Compared with other Luffa genomes [10][11][12], the assembled genome of SG261 had the highest N50.In addition, we found that the SG261 genome and another L. acutangula genome are larger than the three L. cylindrica genomes, which confirmed that the genome of L. acutangula is larger than that of L. cylindrica (Table 2).

Genome Annotation
EDTA was used to annotate the complete repetitive sequences of the assembled genome, and RepeatMasker was used to annotate the incomplete repetitive sequences based on the results of EDTA.The results showed that 536.83 Mb (72.56% of the assembled genome) of the sequences was annotated as repetitive sequences, including 67.84% of the retrotransposon long terminal repeats (LTRs) and 4.72% of the DNA type transposon elements (Figure 2).Copia and Gypsy elements are the main elements in LTRs, 276,809 and 261,064 in number, covering 178.81 Mb and 227.38 Mb, accounting for 24.17% and 30.73% of the genome, respectively (Table S6).In addition, we also annotated the tandem repeats (TRs) using TRF, which are enriched at centromeric regions (Figure 2).The accumulation of repetitive elements, especially LTRs, may be the reason for the difference in k-mer size and assembly size of the genome [74] and also lead to the large genome of Luffa [10].Comparing several Luffa genomes, it was found that although the genome of L. acutangula was generally larger than that of L. cylindrica, the length of non-repetitive sequences was less than that of L. cylindrica (Table S6).Therefore, a large number of repetitive sequences may lead to the larger genome of Luffa as compared to other Cucurbitaceae plants.
Plants 2024, 13, x FOR PEER REVIEW 9 of 20 24.17% and 30.73% of the genome, respectively (Table S6).In addition, we also annotated the tandem repeats (TRs) using TRF, which are enriched at centromeric regions (Figure 2).The accumulation of repetitive elements, especially LTRs, may be the reason for the difference in k-mer size and assembly size of the genome [74] and also lead to the large genome of Luffa [10].Comparing several Luffa genomes, it was found that although the genome of L. acutangula was generally larger than that of L. cylindrica, the length of non-repetitive sequences was less than that of L. cylindrica (Table S6).Therefore, a large number of repetitive sequences may lead to the larger genome of Luffa as compared to other Cucurbitaceae plants.A total of 27,312 protein-coding genes were annotated in the assembled genome, with an average transcript length of 1305.67 bp (Tables 2 and S6).Compared with other Luffa genomes [10][11][12], there are fewer protein-coding genes in SG261 than other Luffa genomes (Tables 2 and S6).In addition, we functionally annotated all protein-coding genes, and a total of 25,356 genes were annotated, accounting for 92.84% of all protein-coding genes (Table S7).A total of 27,312 protein-coding genes were annotated in the assembled genome, with an average transcript length of 1305.67 bp (Tables 2 and S6).Compared with other Luffa genomes [10][11][12], there are fewer protein-coding genes in SG261 than other Luffa genomes (Tables 2 and S6).In addition, we functionally annotated all protein-coding genes, and a total of 25,356 genes were annotated, accounting for 92.84% of all protein-coding genes (Table S7).

Synteny and Phylogenetic Analysis
Synteny analysis indicated that the SG261 assembled genome showed good linearity with the published L. acutangula genome [10] (Figure 3a).The SG261 assembled genome showed relatively low linearity with the L. cylindrica genome [11] with many mismatches of chromosomes.Among them, chromosomes 1, 9, 10, 11, and 13 of SG261 corresponded to chromosomes 4, 7, 6, 2, and 13 of the L. cylindrica genome, respectively, with same genome directions; chromosomes 2, 3, 4, 5, 7, and 8 of SG261 corresponded to chromosomes 12, 3, 5, 11, 10, and 9 of the L. cylindrica genome, respectively, with some reverse complementary strands (Figure 3b).In addition, we also found that the SG261 assembled genome has chromosome translocation between chromosomes 6 and 12 (chromosomes 1 and 8 of L. cylindrica), and there were small segments of chromosome inversion in chromosomes 7, 8, and 11 of the SG261 genome (chromosomes 10, 9, and 2 of L. cylindrica) (Figure 3b).Three Luffa genomes and 10 other Cucurbitaceae genomes were used for phylogenetic analysis, and the homologous gene clusters of 13 species were identified (Figure 3c).Through homologous gene cluster analysis, we found that the number of single-copy and two-copy genes in the three Luffa genomes were similar (Figure 3c, Table S8).Phylogenetic analysis indicated that the three Luffa genomes were clustered to a  Three Luffa genomes and 10 other Cucurbitaceae genomes were used for phylogenetic analysis, and the homologous gene clusters of 13 species were identified (Figure 3c).Through homologous gene cluster analysis, we found that the number of single-copy and two-copy genes in the three Luffa genomes were similar (Figure 3c, Table S8).Phylogenetic analysis indicated that the three Luffa genomes were clustered to a branch, and the evolution time of L. acutangula was later than that of L. cylindrica, which confirmed the close relationship of the three Luffa genomes.Bitter gourd (M.charantia) was the closest cucurbit vegetable to the Luffa species, followed by Cucurbita, Cucumis, Benincasa, Lagenaria, and Citrullus, which confirmed the results in the previous study [10,75].

Variation between Luffa acutangula SG261 and Luffa cylindrica
Generally, genomic variation can be mainly divided into single-nucleotide polymorphisms (SNPs), small insertion or deletions (InDels, ≤50 bp), and structural variations (SVs, >50 bp) [76].To explore the differences between L. acutangula and L. cylindrica, we used short-read data and long-read data to identify variants.Taking the L. acutangula SG261 genome as the reference genome, 11,505,354 variants were identified by the short reads mapped from the L. cylindrica genome, including 11,017,265 SNPs and 488,089 InDels, with an average of 14.89 SNPs and 0.66 InDels per kb.Interestingly, the density distribution of SNPs and InDels shared the same genomic regions with the gene density distribution (Figure 2).The density of SNPs and InDels was lower in the regions with dense repetitive elements but higher in the regions with high gene density.SNPs and InDels have the same density distribution in the genomes of pak choi, Chinese cabbage, and oilseed yellow sarson [17].Most SNPs and InDels in the gene region come from the intergenic region, and the gene sequence is relatively conservative [77].The high variation density of chromosome arm regions may be caused by the variation in intergenic regions.
Structural variations were identified using the PacBio long reads of L. acutangula SG261 and L. cylindrica.Analysis of SVs between the genomes of SG261 and L. cylindrica produced 154,594 SVs identified with SG261 as the reference genome and 181,460 SVs identified with L. cylindrica as the reference genome.Among the different types of SVs, there are more DUP and INVDUP (complex chimeric variants with both inversions and duplications on the chromosome) SVs identified in L. acutangula SG261 and more DEL, INS, INV, and BND SVs in L. cylindrica (Figure 4a, Tables S9 and S10).The length and number of different types of SVs and their distribution in the two genomes were the same (Figure 4c,f).We mainly focused on the SVs identified with L. acutangula SG261 as the reference genome.A total of 67,128 SVs were obtained after filtering (Table S11).The DEL and INS were the main types of variation in SVs, accounting for 43.8% and 35.6% of the total SVs before filtering and 60.75% and 38.17% of the total SVs after filtering (Figure 4d-h).The lengths of DEL and INS variants were mostly between 100 and 1000 bp (Figure 4f-i).For INV and DUP variants, the SVs before filtering were mainly distributed in the two ranges (1000-5000 bp and >10,000 bp).The distribution of SVs after filtering was similar to that of before filtering (Figure 4d,f,g,i and Table S12).Finally, we extracted 2424 genes with SVs in CDS regions based on the genome annotation results and the SV data.

Functional Annotation and GO Enrichment of SV and PAV Genes
In order to explore whether structural variation affects the phenotypic difference between L. acutangula and L. cylindrica, we performed functional annotation and GO enrichment for 2424 SV genes and 1094 PAV genes.We aligned SV and PAV genes to NT, NR, and Swissprot databases for functional annotation.A total of 2054 genes can be annotated in the 2424 SV genes.A total of 870 genes can be annotated in the 1094 PAV genes.

Functional Annotation and GO Enrichment of SV and PAV Genes
In order to explore whether structural variation affects the phenotypic difference between L. acutangula and L. cylindrica, we performed functional annotation and GO enrichment for 2424 SV genes and 1094 PAV genes.We aligned SV and PAV genes to NT, NR, and Swissprot databases for functional annotation.A total of 2054 genes can be annotated in the 2424 SV genes.A total of 870 genes can be annotated in the 1094 PAV genes.

Structural Variation Genes Involved in Ridge Development
Ridges are the most important phenotype in L. acutangula in contrast to the absence of ridges in L. cylindrica.Compared with L. cylindrica, the cortical cells in L. acutangula grow faster longitudinally, resulting in longer and larger cortical cells.In addition, there are more cortical cells in L. acutangula than L. cylindrica at the ridge position (Figure 5a,b).
GO enrichment of SV-and PAV-related genes enriched for pectinesterase activity and cell wall modification can be seen in Figure 5c-f.For example, the DEL (49,297,508-49,311,124 bp) of chromosome 9 affected the pectinesterase PPME1-like gene

Structural Variation Genes Involved in Ridge Development
Ridges are the most important phenotype in L. acutangula in contrast to the absence of ridges in L. cylindrica.Compared with L. cylindrica, the cortical cells in L. acutangula grow faster longitudinally, resulting in longer and larger cortical cells.In addition, there are more cortical cells in L. acutangula than L. cylindrica at the ridge position (Figure 5a,b).
GO enrichment of SV-and PAV-related genes enriched for pectinesterase activity and cell wall modification can be seen in Figure 5c-f.For example, the DEL (49,297,508-49,311,124 bp) of chromosome 9 affected the pectinesterase PPME1-like gene (Lac_SG261_ V1bChr09g017340.1)and the pectinesterase 63 gene (Lac_SG261_V1bChr09g017350.1); the DEL (50,615,779-50,616,960 bp) of chromosome 10 leads to the deletion of a large fragment of the pectinesterase/pectinesterase inhibitor 7 gene (Lac_SG261_V1bChr10g019340.1)and so on (Table S14).Pectin gels are capable of large changes in hydration and stiffness, which can alter the behavior of cells and tissues.For example, increasing the stiffness of cell wall pectin gel may result in decreased cell growth or if stiffened enough may cause cell-cell separation by gel fracture [78].Variation in pectinesterase may lead to variation in cell growth, thus affecting the growth of fruits.Therefore, these genes affected by SVs and PAVs may play a role in the development of the ridges of L. acutangula.

Analysis of Flowering-Time-Related Genes
Flowering is an important process that determines the success of plant reproduction.Plants must accurately combine internal and environmental signals to initiate the flowering process [66].Flowering time has a very important influence on plant fitness and yield [79].There are different flowering times between L. acutangula and L. cylindrica.The regulatory pathway of plant flowering has a relatively formed framework, and the flowering regulatory pathway map has been integrated in previous studies [66].Compared to SNPs, SVs can cause large-scale perturbations of cis-regulatory regions and are therefore more likely to quantitatively change gene expression and phenotypes [13].Based on the flowering regulatory pathway map and functional annotation of SV genes, we found 17 genes with SVs that may affect the flowering of Luffa (Table S15).These 17 genes are mainly affected by DEL-, INS-, and DUP-type SVs, among which a 78 bp insertion on chromosome 1 leads to an insertion of the PIE1 (Lac_SG261_V1bChr01g011540.1) gene; a 17,284 bp DEL of SG261 on chromosome 2 leads to the deletion of three adjacent FLOWERING LOCUS T (FT) genes (Lac_SG261_V1bChr02g021160.1-Lac_SG261_V1bChr02g021180.1)(Figure 6b, Table S15), the FT receives feedback from the upstream regulatory mechanism and has great influence on flowering [80]; a DUP on chromosome 8 leads to four duplications of three FLOWERING LOCUS Y (FY) genes (Lac_SG261_V1bChr08g012820.1, Lac_SG261_V1bChr08g012830.1, and Lac_SG261_V1bChr08g012880.1) (Figure 6a, Table S15), and the FY gene mutant blooms later in both long-day and short-day conditions [81,82].In addition, some transcription factors may affect flowering time (Table S15).These genes with SVs may affect the flowering time in Luffa.
To explore the quantitative changes in flowering-related gene families in evolution, we performed expansion and contraction analysis of some gene families and genes.We downloaded these gene families and their protein sequences in cucumber, constructed homologous gene clusters of all genes based on protein sequences using Orthofinder v2.5.4,and checked the expansion and contraction of gene families according to the gene clusters (Figure 7).There are more FY genes in SG261 than other species, which further verified the authenticity of DUP variation at the position of FY genes.However, there was no significant difference in the number of genes in some flowering-related gene families, such as CO-like, FT, and EFM gene families, between SG261 and L. cylindrica (Table S16).
influence on flowering [80]; a DUP on chromosome 8 leads to four duplications of three FLOWERING LOCUS Y (FY) genes (Lac_SG261_V1bChr08g012820.1, Lac_SG261_V1bChr08g012830.1, and Lac_SG261_V1bChr08g012880.1) (Figure 6a, Table S15), and the FY gene mutant blooms later in both long-day and short-day conditions [81,82].In addition, some transcription factors may affect flowering time (Table S15).These genes with SVs may affect the flowering time in Luffa.To explore the quantitative changes in flowering-related gene families in evolution, we performed expansion and contraction analysis of some gene families and genes.We downloaded these gene families and their protein sequences in cucumber, constructed homologous gene clusters of all genes based on protein sequences using Orthofinder v2.5.4,and checked the expansion and contraction of gene families according to the gene clusters (Figure 7).There are more FY genes in SG261 than other species, which further verified the authenticity of DUP variation at the position of FY genes.However, there was no significant difference in the number of genes in some flowering-related gene families, such as CO-like, FT, and EFM gene families, between SG261 and L. cylindrica (Table S16).

Conclusions
In this study, we reported a chromosome-scale reference genome assembly of L. acutangula.Our ribbed loofah genome enriches the loofah gene pool, and the sequenced varieties in this study are the best varieties selected and bred by our organization, which lays the data foundation for the subsequent research of our organization.In the structural variation analysis, the two ribbed loofah genomes allowed us to exclude the interference of intraspecific variation to a certain extent.The evolution time of L. acutangula was later than that of L. cylindrica, and there were some chromosomal translocations and large sequence inversions between L. acutangula and L. cylindrica.A total of 2424 and 1094 genes were affected by SVs and PAVs, respectively, in which 17 FT and FY genes may be related to the differences in flowering time observed in these Luffa species.This

Conclusions
In this study, we reported a chromosome-scale reference genome assembly of L. acutangula.Our ribbed loofah genome enriches the loofah gene pool, and the sequenced varieties in this study are the best varieties selected and bred by our organization, which lays the data foundation for the subsequent research of our organization.In the structural variation analysis, the two ribbed loofah genomes allowed us to exclude the interference of

Figure 2 .
Figure 2. Circos graph of the L.acutangula SG261 genome's characteristics.A, physical map of 13 chromosomes (Mb scale); B, gene density, the number of genes in 400 kb windows; C, SNP density, the number of SNPs between L. acutangula SG261 and L. cylindrica in 400 kb windows; D, InDel density, the number of InDels between L. acutangula SG261 and L. cylindrica in 400 kb windows; E, DNA type transposon elements (DNA-TEs) density, the coverage of DNA-TEs in 400 kb windows; F, tandem repeat (TR) density, the coverage of TRs in 400 kb windows; G, long terminal repeat (LTR) density, the coverage of LTRs in 400 kb windows; H, syntenic blocks.

Figure 2 .
Figure 2. Circos graph of the L. acutangula SG261 genome's characteristics.A, physical map of 13 chromosomes (Mb scale); B, gene density, the number of genes in 400 kb windows; C, SNP density, the number of SNPs between L. acutangula SG261 and L. cylindrica in 400 kb windows; D, InDel density, the number of InDels between L. acutangula SG261 and L. cylindrica in 400 kb windows; E, DNA type transposon elements (DNA-TEs) density, the coverage of DNA-TEs in 400 kb windows; F, tandem repeat (TR) density, the coverage of TRs in 400 kb windows; G, long terminal repeat (LTR) density, the coverage of LTRs in 400 kb windows; H, syntenic blocks.

Figure 3 .
Figure 3. Comparative genome analysis of L. acutangula SG261.(a) Comparisons between the genomes of L. acutangula SG261 and another L. acutangula.(b) Comparisons between the genomes of L. acutangula SG261 and L. cylindrica.(c) Phylogenetic tree of L. acutangula SG261 and other representative Cucurbitaceae genomes based on single-copy orthologous protein sequences.Bar charts display distribution of orthologous in L. acutangula SG261 and 12 other sequenced Cucurbitaceae genomes, multiple copies (brown), two copies (green), and single copy (blue).

Figure 3 .
Figure 3. Comparative genome analysis of L. acutangula SG261.(a) Comparisons between the genomes of L. acutangula SG261 and another L. acutangula.(b) Comparisons between the genomes of L. acutangula SG261 and L. cylindrica.(c) Phylogenetic tree of L. acutangula SG261 and other representative Cucurbitaceae genomes based on single-copy orthologous protein sequences.Bar charts display distribution of orthologous in L. acutangula SG261 and 12 other sequenced Cucurbitaceae genomes, multiple copies (brown), two copies (green), and single copy (blue).

Figure 4 .
Figure 4. Statistics of structural variations (SVs) between L. acutangula SG261 and L. cylindrica.(a) Statistical graph of the quantitative differences in different structural variation types between L. acutangula SG261 as reference and L. cylindrica as reference.(b) Statistical graph of the quantitative difference in different structural variation types between before and after filtering detected with L. acutangula SG261 as reference.(c) Violin plot of distribution law of different structural variation types detected with L. cylindrica as reference genome.(d) Bar chart of the proportion of different SV lengths in different structural variation types detected with L. acutangula SG261 as reference genome.(e) Percent pie chart of different structural variation types detected with L. acutangula SG261 as reference.(f) Violin plot of distribution law of different structural variation types detected with L. acutangula SG261 as reference genome.(g) Bar chart of the proportion of different SV lengths in different structural variation types after filtering detected with L. acutangula SG261 as reference genome.(h) Percent pie chart of different structural variation types after filtering detected with L. acutangula SG261 as reference genome.(i) Violin plot of distribution law of different structural variation types after filtering detected with L. acutangula SG261 as reference genome.

Figure 4 .
Figure 4. Statistics of structural variations (SVs) between L. acutangula SG261 and L. cylindrica.(a) Statistical graph of the quantitative differences in different structural variation types between L. acutangula SG261 as reference and L. cylindrica as reference.(b) Statistical graph of the quantitative difference in different structural variation types between before and after filtering detected with L. acutangula SG261 as reference.(c) Violin plot of distribution law of different structural variation types detected with L. cylindrica as reference genome.(d) Bar chart of the proportion of different SV lengths in different structural variation types detected with L. acutangula SG261 as reference genome.(e) Percent pie chart of different structural variation types detected with L. acutangula SG261 as reference.(f) Violin plot of distribution law of different structural variation types detected with L. acutangula SG261 as reference genome.(g) Bar chart of the proportion of different SV lengths in different structural variation types after filtering detected with L. acutangula SG261 as reference genome.(h) Percent pie chart of different structural variation types after filtering detected with L. acutangula SG261 as reference genome.(i) Violin plot of distribution law of different structural variation types after filtering detected with L. acutangula SG261 as reference genome.

Figure 5 .
Figure 5. Section observation of Luffa angular slices and GO enrichment of SV and PAV genes in L. acutangula SG261.(a) Section observation of angular of L. acutangula.(b) Section observation of angular of L. cylindrica.(c) Venn diagram shows the overlap of genes between SV and PAV.(d) GO enrichment map of SV-related genes.(e) GO enrichment map of PAV-related genes.(f) GO enrichment map of the same genes for SV and PAV.E, epidermal; Cc, collenchymatous cell; S, sclerenchyma; P, phloem; X, xylem; Pc, parenchyma cell.

Figure 5 .
Figure 5. Section observation of Luffa angular slices and GO enrichment of SV and PAV genes in L. acutangula SG261.(a) Section observation of angular of L. acutangula.(b) Section observation of angular of L. cylindrica.(c) Venn diagram shows the overlap of genes between SV and PAV.(d) GO enrichment map of SV-related genes.(e) GO enrichment map of PAV-related genes.(f) GO enrichment map of the same genes for SV and PAV.E, epidermal; Cc, collenchymatous cell; S, sclerenchyma; P, phloem; X, xylem; Pc, parenchyma cell.

Figure 6 .
Figure 6.Schematic diagram of structural variation in major flowering-related genes.(a) Schematic diagram of DUP variation at the location of FY gene.(b) Schematic diagram of DEL variation at the location of FT gene.

Figure 6 .
Figure 6.Schematic diagram of structural variation in major flowering-related genes.(a) Schematic diagram of DUP variation at the location of FY gene.(b) Schematic diagram of DEL variation at the location of FT gene.

Figure 7 .
Figure 7. Expansion analysis of flowering-related gene families and some transcription factor gene families.

Figure 7 .
Figure 7. Expansion analysis of flowering-related gene families and some transcription factor gene families.

Table 2 .
Comparison of several Luffa genomes.