Complete chloroplast genome data for Cryptocoryne elliptica (Araceae) from Peninsular Malaysia

The aquatic plant genus Cryptocoryne, popular in the aquarium industry, comprises more than 50 described species and some 15 naturally occurring named and unnamed interspecific hybrids. Cryptocoryne elliptica has a restricted distribution in the northern part of Peninsular Malaysia. Destruction of its natural habitats for various human activities has led to a decline in numbers. Here, we report the complete chloroplast genome of C. elliptica and establish a molecular dataset for a maternally inherited genome. We used an Illumina NovaSeq 6000 protocol to partially sequence the genome of C. elliptica and used bioinformatic tools to reconstruct the chloroplast genome de novo. The assembled chloroplast genome is a circular DNA molecule 159,968 bp in length. It has a quadripartite structure composed of a large single-copy (LSC) region of 96,273 bp and a small single-copy (SSC) region of 15,205 bp, separated by a pair of inverted repeats (IRa and IRb), each of which is 24,245 bp. The chloroplast genome of C. elliptica encodes a total of 108 genes, comprising 74 protein-coding genes, 30 tRNA genes and 4 rRNA genes. In total, 204 SSR loci were identified, most of which were located within intergenic regions.

© 2022 The Author(s

Value of the Data
• The habitat of Cryptocoryne elliptica is disappearing due to land appropriation for various human activities, and the completed cp genome will provide a useful sequence-based resource for the species.
• These data will serve as a basis for researchers to conduct further chloroplast and SSR-derived studies, such as phylogenetic, population genetics and photosynthetic or oxidative metabolism studies of the species.
• The complete set of chloroplast genomic data will be utilized for comparative studies among different species of Cryptocoryne.

Data Description
Whole genome sequencing (WGS) produced 7.3 Gbp of paired-end data, comprising 48,663,546 raw reads with a GC content of 42.11% and a PHRED quality score (Q30) of 91.95%. From this output, 2,319,458 reads were assembled, and the percentage of the organelle genome in the WGS was 5.13%. Fig. S1 shows the coverage depth for each assembled nucleotide position. The assembly has a mean sequencing coverage depth of 2,278. The lowest coverage depth is 58, located within an intergenic region flanked by the trnF-GAA and ndhJ genes. The highest coverage depth is 3,678, located within the rps19 gene.

Fig. 1 shows the circular map of the chloroplast genome. In total, 108 unique genes were annotated, including 74 protein-coding genes, 30 tRNA genes and 4 rRNA genes (Table 1). Six of the protein-coding genes and the 3′ exon of rps12 were duplicated in the IR regions. Five of the tRNA genes and all four rRNA genes were also duplicated in the IR regions. The presence of one or two introns was identified in 16 genes, which included 10 protein-coding genes, one rRNA gene and six tRNA genes.

Table 1. Classification of the genes after annotation of the chloroplast genome. The annotated genes were categorized according to their function. Notation: underlined: contains one intron; underlined bold: contains more than one intron.

Table S1 shows the simple sequence repeats (SSRs) identified using MIcroSAtellite (MISA) annotation. Mononucleotide, dinucleotide, trinucleotide, tetranucleotide, pentanucleotide and hexanucleotide motifs are present in the sequence. The percentage of each SSR motif type is visualized as a pie chart in Fig. 2. Mononucleotide motifs account for the highest percentage (45.6%) and hexanucleotide motifs for the lowest (1%). A complete list of SSR loci is provided in Table S2.
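To illustrate how SSR loci of the six motif classes can be detected and tallied by type, the following is a minimal Python sketch. It is not MISA itself and not the authors' pipeline. The minimum repeat thresholds (10, 6, 5, 5, 5, 5 for mono- to hexanucleotide motifs) are an assumption modelled on commonly used MISA settings, since the article does not report the thresholds applied. The function names `find_ssrs` and `ssr_type_percentages` are hypothetical.

```python
import re
from collections import Counter

# Minimum number of repeats per motif length (1 = mono ... 6 = hexa).
# ASSUMPTION: these follow commonly used MISA settings; the article does
# not state which thresholds were used.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    Positions are 1-based. This is a simplified scan: compound and
    imperfect repeats are not handled.
    """
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit,
            # e.g. "AA" (mononucleotide) or "ATAT" (dinucleotide).
            if any(motif == motif[:d] * (unit_len // d)
                   for d in range(1, unit_len) if unit_len % d == 0):
                continue
            hits.append((m.start() + 1, motif, len(m.group(0)) // unit_len))
    return sorted(hits)


def ssr_type_percentages(hits):
    """Return {motif_length: percentage of all SSR loci}."""
    counts = Counter(len(motif) for _, motif, _ in hits)
    total = sum(counts.values())
    return {k: 100.0 * n / total for k, n in sorted(counts.items())}


if __name__ == "__main__":
    # Toy sequence, not taken from the C. elliptica genome.
    toy = "G" + "A" * 12 + "CG" + "AT" * 7 + "CCG" + "TTC" * 5 + "G"
    loci = find_ssrs(toy)
    for start, motif, n in loci:
        print(start, motif, n)
    print(ssr_type_percentages(loci))
```

Applied to a full chloroplast sequence, the per-type percentages from such a tally are the kind of summary shown in Fig. 2; exact counts depend on the thresholds chosen.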

Plant material and isolation of genomic DNA
Individual plants were collected from Bukit Panchor, Penang and re-planted in our greenhouse. Fresh leaf tissue was cleaned carefully before being flash frozen in liquid nitrogen and ground using a mortar and pestle. A Qiagen DNeasy Plant Minikit was used for DNA extraction, following the manufacturer's protocol with several modifications. First, 130 mg of fresh leaf was used, and the AP1 lysis buffer was pre-heated in a 65 °C water bath prior to adding the powdered leaf tissue. In addition, the incubation time for tissue lysis was increased from 10 to 20 min, with vortexing at three intervals during the incubation period. In the final step, the elution buffer volume was decreased from 200 to 70 μL. Genomic DNA quality was verified using gel electrophoresis, NanoDrop quantification and fluorometric quantification. Subsequently, 1000 ng of genomic DNA was used for library preparation.

Fig. 2. The percentage of each SSR type in Cryptocoryne elliptica. The highest percentage of SSR motifs is represented by mononucleotides (45.6%), followed by dinucleotides (22.5%), tetranucleotides (14.2%), trinucleotides (12.8%), pentanucleotides (3.9%) and, finally, hexanucleotides (1%).

Library preparation and sequencing
An Illumina paired-end DNA library (average insert size of 350 bp) was constructed using the TruSeq DNA PCR-Free library preparation kit, according to the manufacturer's protocol. At least 1000 ng of genomic DNA is generally required for library preparation with this kit, so the amount of extracted genomic DNA was sufficient. The paired-end library was sequenced with 150 bp reads on a NovaSeq 6000 platform (Illumina, USA).
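As a quick cross-check of the sequencing output, the expected yield follows from the number of reads multiplied by the read length. The short Python sketch below assumes that the 48,663,546 raw reads reported in the Data Description count individual reads of 150 bp each; the function name `expected_yield_bp` is hypothetical.

```python
def expected_yield_bp(n_reads, read_length_bp):
    """Total sequenced bases, assuming every read has the same length."""
    return n_reads * read_length_bp


raw_reads = 48_663_546   # raw reads reported in the Data Description
read_length = 150        # bp per read, as sequenced

yield_bp = expected_yield_bp(raw_reads, read_length)
print(f"{yield_bp / 1e9:.2f} Gbp")  # prints 7.30 Gbp
```

Under that assumption the result agrees with the reported 7.3 Gbp of paired-end data.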

Chloroplast genome assembly
Quality control of the raw paired-end reads (48,663,546 reads) was performed using FastQC [2] to get a general overview of data quality. De novo assembly was done using NOVOPlasty version 4.2.1 [3], which was run as a Perl script on a Linux workstation, following the recommendation to use untrimmed WGS data for complete circular genome output. A second assembly was done with Novowrap version 1.12 [4], using three different chloroplast genes (psaB, psaC and rbcL) as seeds, to check the reproducibility of the results and to identify the quadripartite boundaries. Further validation was done using the Burrows-Wheeler aligner [5] and BWA-MEM [6] to align the raw reads back to the assembled contig. The alignments were then converted to SAMtools format before obtaining the read depth for all nucleotide positions of the assembled genome. Statistical analysis was done using R.
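The per-position depth summary described above (mean, lowest and highest coverage) can be sketched as follows. The authors performed this analysis in R; the Python version here is only an illustration. It assumes the depth values were exported as the three-column, tab-delimited text produced by `samtools depth` (reference name, 1-based position, depth). The function name `summarize_depth` and the example lines are hypothetical.

```python
import statistics


def summarize_depth(lines):
    """Summarize per-position depth from `samtools depth`-style lines.

    Each line is expected to hold: reference name, 1-based position, depth,
    separated by tabs. Returns the mean depth plus the (position, depth)
    pairs with the lowest and highest depth.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        _ref, pos, depth = line.split("\t")[:3]
        records.append((int(pos), int(depth)))
    if not records:
        raise ValueError("no depth records found")
    return {
        "mean": statistics.mean(d for _, d in records),
        "min": min(records, key=lambda r: r[1]),
        "max": max(records, key=lambda r: r[1]),
    }


if __name__ == "__main__":
    # Toy input, not real data from this study.
    example = ["cp\t1\t10", "cp\t2\t30", "cp\t3\t20"]
    print(summarize_depth(example))
```

With a real depth file, the lines can be passed directly from an open file handle, for example `summarize_depth(open("depth.txt"))`, where `depth.txt` is a placeholder file name.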

Gene annotation
The web-based program GeSeq (https://chlorobox.mpimp-golm.mpg.de/geseq.html) [7] was used with default parameters to annotate the assembled genome and predict protein-coding genes, as well as tRNA and rRNA genes. The Chloe program was invoked to enhance the annotation accuracy [8]. The previously reported Colocasia esculenta chloroplast genome [9] was used to identify the coding regions of cp genes. Subsequently, GB2Sequin [10] was used to generate a five-column, tab-delimited feature table for submission of the results to GenBank. Finally, a circular chloroplast genome diagram was drawn using OrganellarGenomeDRAW (OGDRAW) [1].
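For readers unfamiliar with the five-column, tab-delimited feature table used for GenBank submission, the sketch below writes one by hand in Python. It is not GB2Sequin and does not reproduce its output. The sequence identifier `Seq1`, the coordinates and the helper name `feature_table` are hypothetical and chosen only for illustration. In this format, a feature line carries start, stop and feature key, qualifier lines begin with three empty columns, and minus-strand features are written with start greater than stop.

```python
def feature_table(seq_id, features):
    """Build a five-column feature table as a single string.

    `features` is a list of dicts with keys: start, stop, key, and
    optionally strand ("+" or "-") and qualifiers (list of (name, value)).
    """
    lines = [f">Feature {seq_id}"]
    for feat in features:
        start, stop = feat["start"], feat["stop"]
        if feat.get("strand", "+") == "-":
            # Minus-strand features are listed with start > stop.
            start, stop = stop, start
        lines.append(f"{start}\t{stop}\t{feat['key']}")
        for name, value in feat.get("qualifiers", []):
            # Qualifier lines leave the first three columns empty.
            lines.append(f"\t\t\t{name}\t{value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical coordinates, not taken from the C. elliptica annotation.
    demo = [
        {"start": 100, "stop": 1527, "key": "gene", "strand": "+",
         "qualifiers": [("gene", "rbcL")]},
        {"start": 2000, "stop": 2072, "key": "tRNA", "strand": "-",
         "qualifiers": [("product", "tRNA-Phe")]},
    ]
    print(feature_table("Seq1", demo), end="")
```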

Ethics Statement
This article does not contain any studies with human participants or animals performed by any of the authors.

Supplementary Information
Fig. S1. Coverage plot for the assembled contig of Cryptocoryne elliptica genome sequences.
Table S1. MISA annotation from Cryptocoryne elliptica chloroplast genome sequences.
Table S2. The complete list of SSR loci identified from Cryptocoryne elliptica chloroplast genome.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.