Chloroplast genome data of Luffa acutangula and Luffa aegyptiaca and their phylogenetic relationships

Luffa acutangula and Luffa aegyptiaca are domesticated plants in the family Cucurbitaceae. They are mainly cultivated in the tropical and subtropical regions of Asia. The chloroplast genomes of many Cucurbitaceae species have been sequenced to examine gene content and evolution. However, the chloroplast genome sequences of L. acutangula and L. aegyptiaca have not been reported. We report the first complete sequences of the L. acutangula and L. aegyptiaca chloroplast genomes, obtained from Pacific Biosciences sequencing, and use them to infer evolutionary relationships. The chloroplast genomes of L. acutangula and L. aegyptiaca are 157,202 and 157,275 bp, respectively. Both genomes possess the typical quadripartite structure and contain 131 genes, including 87 protein-coding genes, 36 tRNA genes and 8 rRNA genes. We identified simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) in both chloroplast genomes. Polycistronic mRNA was examined in L. acutangula and L. aegyptiaca using RNA sequences from isoform sequencing to identify co-transcribed genes. The sizes and locations of the inverted repeats (IRs) were compared to those of other species and found to be relatively unchanged. Phylogenetic analysis confirmed the close relationship between L. acutangula and L. aegyptiaca in the Cucurbitaceae lineage and showed separation of the monophyletic Luffa clade from other species in the subtribe Sicyoeae. The results obtained from this study can be useful for studying the evolution of Cucurbitaceae plants.

Value of the Data
• L. acutangula and L. aegyptiaca chloroplast genomes are sources of molecular data that confirm complex evolutionary relationships and support the need for phylogenetic research in various plant groups.
• The complete chloroplast genome data could be utilized in the genetics, biotechnology, plant breeding, and ecology fields.
• The sequence variation among the chloroplast genomes of Luffa sp. and other representatives of the family Cucurbitaceae enhances the understanding of their phylogenetic relationships.
• Polymorphisms in the chloroplast genome (e.g., simple sequence repeats (SSRs) or single nucleotide polymorphisms (SNPs)) can be used to develop potential molecular markers and study evolutionary patterns of Luffa sp. and closely related species.

DNA extraction, sequencing and assembly
Young leaves of L. acutangula (ridge gourd) and L. aegyptiaca (smooth gourd) plants from Chia Tai Company Limited were collected at the National Omics Center, Thailand Science Park, Pathum Thani, Thailand in March 2019 for DNA extraction. Genomic DNA was extracted using a CTAB method [2]. Total DNA was examined using a NanoDrop One spectrophotometer (Thermo Scientific, Wilmington, USA) and visualized by pulsed-field gel electrophoresis (PFGE). High-quality DNA was used to construct PacBio libraries according to the 'Procedure & Checklist-20 Kb Template Preparation Using BluePippin Size Selection System' protocol and sequenced on the PacBio RSII system. The short PacBio reads were used to correct the long PacBio reads, and the corrected long reads were assembled using CANU version 1.4 software [3]. The resulting contigs were searched with BLAST against the plastid genome database to identify chloroplast contigs, which were used to construct the full chloroplast genomes.
Young leaves of L. acutangula and L. aegyptiaca seedlings (Chia Tai Co, Ltd) were also harvested, and genomic DNA was isolated using the High Pure PCR Template Preparation kit (Roche). Genomic DNA was examined using a NanoDrop One spectrophotometer (Thermo Scientific, Wilmington, USA). High-quality DNA was used to prepare Illumina HiSeq X Ten libraries, and 150 bp paired-end sequencing was performed by Novogene, Singapore according to standard Illumina protocols.

Chloroplast genome annotation
The assembled chloroplast genomes of L. acutangula and L. aegyptiaca were annotated using the GeSeq tool of MPI-MP CHLOROBOX [4], specifically with HMMER, tRNAscan and ARAGORN. An annotated genome map was generated using Organellar Genome DRAW (OGDRAW) [5]. Finally, the preliminary annotations were corrected manually to ensure that the correct start and stop positions were reported.

Codon usage analysis
The coding sequences of L. acutangula and L. aegyptiaca were used to calculate relative synonymous codon usage (RSCU) values using CodonW version 1.4.2 software [6]. Codon usage frequency was calculated and expressed as the number of codons encoding the same amino acid divided by the total number of codons [7].
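To make the RSCU measure concrete, the following minimal Python sketch computes it from a list of coding sequences. It is an illustration only, not the CodonW implementation: it uses the usual definition (the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally), the standard codon assignments, and treats the three stop codons as one synonymous family. The function name `rscu` and the toy input are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard codon assignments in TCAG order (plastid translation table 11
# uses the same codon-to-amino-acid assignments).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list):
    """Return {codon: RSCU} for a list of in-frame coding sequences.

    RSCU = observed codon count / (family total / number of synonymous codons).
    Amino acids that never occur in the input are omitted from the result.
    """
    counts = Counter()
    for seq in cds_list:
        seq = seq.upper().replace("U", "T")
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1

    # Group codons into synonymous families by encoded amino acid.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


if __name__ == "__main__":
    # Toy sequence (not real data): alanine encoded by GCT three times, GCC once.
    for codon, value in sorted(rscu(["GCTGCTGCTGCC"]).items()):
        print(codon, round(value, 2))
```

An RSCU value above 1 indicates that a codon is used more often than expected under equal synonymous usage, and a value below 1 indicates that it is used less often.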

Simple sequence repeat (SSR) analysis
The L. acutangula and L. aegyptiaca chloroplast genomes were scanned for simple sequence repeats (SSRs) using the MIcroSAtellite (MISA) identification tool [10]. The minimum length thresholds were set to ten repeats for mono-nucleotide repeats, four repeats for di- and tri-nucleotide repeats, and three repeats for tetra-, penta- and hexa-nucleotide repeats, according to the method of Ivanova and co-workers [11].
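The thresholds above can be illustrated with a short Python sketch that scans a sequence for perfect SSRs. This is a simplified stand-in for MISA, not its actual algorithm: it reports only perfect repeats, does not merge compound SSRs, and skips motifs that are themselves repeats of a shorter unit (for example, "AA" is not counted as a di-nucleotide motif). The names `MIN_REPEATS`, `is_primitive` and `find_ssrs` are hypothetical.

```python
# Minimum repeat counts per motif length, taken from the thresholds in the text.
MIN_REPEATS = {1: 10, 2: 4, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq):
    """Return sorted (start, end, motif, repeats) tuples; 1-based, end inclusive."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        i = 0
        while i + unit_len <= len(seq):
            motif = seq[i:i + unit_len]
            if set(motif) <= set("ACGT") and is_primitive(motif):
                # Count consecutive copies of the motif starting at i.
                reps = 1
                while seq.startswith(motif, i + reps * unit_len):
                    reps += 1
                if reps >= min_rep:
                    hits.append((i + 1, i + reps * unit_len, motif, reps))
                    i += reps * unit_len  # continue after this SSR
                    continue
            i += 1
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence (not real data): A x10, AT x4, and AGC x3 (below threshold).
    toy = "G" + "A" * 10 + "C" + "AT" * 4 + "G" + "AGC" * 3 + "T"
    for hit in find_ssrs(toy):
        print(hit)
```

In the toy sequence, the tri-nucleotide run is not reported because three copies fall below the four-repeat threshold.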

RNA editing analysis and polycistronic mRNA in chloroplast genomes
RNA sequences of L. acutangula [SRA accession number: SRR11445640] and L. aegyptiaca [SRA accession number: SRR11452010] generated by isoform sequencing (Iso-seq) were obtained from a previous study by Pootakham et al. (2020) [1]. These long-read sequences were mapped to their corresponding chloroplast genomes using BWA-MEM software [12]. Subsequently, RNA editing sites were checked by calling SNPs using GATK and comparing them to the genomic SNP data [13]. To identify co-transcribed gene clusters, the RNA reads were also aligned to their respective chloroplast genome sequences using BLASTN version 2.2.28, and single reads that spanned more than one gene were recorded.
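The last step, finding single reads that span more than one gene, amounts to an interval-overlap check between read alignments and gene annotations. The sketch below shows one way this could be done; it is not the authors' script. It assumes each read has a single alignment interval on a linear coordinate system (split alignments, strand and the circular genome are ignored), and the function name `cotranscribed_genes`, the `min_overlap` parameter and all gene names and coordinates are hypothetical.

```python
def cotranscribed_genes(genes, alignments, min_overlap=1):
    """Map read IDs to the genes they span, keeping reads that cover >= 2 genes.

    genes:      list of (gene_name, start, end), 1-based inclusive coordinates
    alignments: list of (read_id, start, end), 1-based inclusive coordinates
    min_overlap: minimum number of overlapping bases for a gene to count
    """
    clusters = {}
    for read_id, a_start, a_end in alignments:
        spanned = [
            name
            for name, g_start, g_end in genes
            # Length of the shared interval; zero or negative means no overlap.
            if min(a_end, g_end) - max(a_start, g_start) + 1 >= min_overlap
        ]
        if len(spanned) > 1:
            clusters[read_id] = spanned
    return clusters


if __name__ == "__main__":
    # Made-up annotations and alignments, for illustration only.
    genes = [("geneA", 1, 100), ("geneB", 200, 300), ("geneC", 500, 600)]
    alignments = [("read1", 50, 250), ("read2", 210, 290), ("read3", 90, 510)]
    for read_id, names in cotranscribed_genes(genes, alignments).items():
        print(read_id, "->", ", ".join(names))
```

Raising `min_overlap` is one way to avoid counting a gene that a read touches by only a few bases at its end.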