Genomic Data Reveal the Genetic Variation among Natural Mangifera casturi Kosterm. Hybrids, an Underutilized Fruit Tree Under ‘Extinct in the Wild’ Status from Kalimantan Selatan, Indonesia

Background: Mangifera casturi Kosterm. is an endemic local mango from Kalimantan Selatan, Indonesia. The scarcity of genetic information on this fruit has severely restricted research into its genetic variation and phylogeny. This study aimed to obtain genomic information on M. casturi using next-generation sequencing, to develop microsatellite markers, and to perform Sanger sequencing for DNA barcoding analysis. Results: Clean reads from the Kasturi accession of M. casturi were assembled de novo using the Ray assembler, producing 259,872 scaffolds with an N50 of 1,445 bp. Fourteen polymorphic microsatellite markers were developed from 11,040 sequences containing microsatellite motifs. In total, 55 alleles were detected, with a mean of 3.93 alleles per locus. The microsatellite analysis revealed broad genetic variation in M. casturi. Phylogenetic analysis was performed using the internal transcribed spacer (ITS), matK, rbcL, and trnH-psbA. The phylogenetic tree based on chloroplast markers placed Kasturi, Mawar, Pelipisan, Pinari, and Hambawang in one group, with M. indica as the maternal ancestor. In contrast, the ITS tree indicated several Mangifera species as ancestors of M. casturi. Conclusions: This study strongly suggests that M. casturi originated from cross-hybridization involving multiple ancestors. We further hypothesize that crosses between F1 hybrids of M. indica and M. quadrifida and other Mangifera spp. produced the observed high genetic variation. The genetic information obtained here is also a resource for breeding, improvement, and conservation studies of this species.


Background
Mangifera casturi Kosterm. (Anacardiaceae), or Kalimantan mango, is an endemic fruit from Kalimantan Selatan, Indonesia, and is classified as extinct in the wild on the IUCN Red List [1]. This local mango belongs to the genus Mangifera within the Anacardiaceae [2] and is classified as a common ancestor of Mangifera in Indonesia [3]. The fruit is known by various local names, such as kasturi, mawar, pelipisan, and pinari [2]. Based on phylogenetic analysis using single nucleotide polymorphisms (SNPs), M. casturi has been proposed to be a natural hybrid of M. indica and M. quadrifida [4]. Because M. casturi bears small fruits with an attractive purple color and a distinctive aroma, it is a prospective genetic resource for improving mango varieties in the future [5]. It also contains useful metabolites, including lupeol, an antioxidant and anticancer agent [6]. However, genomic information on this local mango is still limited: in nucleotide repositories such as NCBI, only one nucleotide accession (MF678493.1) and one SRA study (SRP183190) have been deposited.
Sequencing technology has recently advanced significantly, from Sanger sequencing to next-generation approaches such as whole genome sequencing (WGS), which can provide comprehensive genomic information on a species [7]. Such data make it easier to develop genetic markers such as microsatellites, which have advantages over other markers, such as RAPD and AFLP, that have been used in other Mangifera species [8,9]. Because microsatellite markers are codominant, they can resolve variation at several levels and are widely used in population genetics [10]. Microsatellite markers have been used to characterize genetic variation in M. indica [11].
DNA barcoding markers such as rbcL and matK [12] and trnH-psbA [13], as well as the internal transcribed spacer (ITS) and the second internal transcribed spacer (ITS2) of nuclear ribosomal DNA [14], have been widely used for phylogenetic analysis at various taxonomic levels. The chloroplast markers can resolve relationships at the genus or family level and, because they are inherited from the maternal parent, trace the maternal lineage. ITS, in contrast, is a nuclear region and can therefore reflect both parental lineages. This study aimed to obtain genomic information on M. casturi using next-generation sequencing to develop microsatellite markers, and to use Sanger sequencing for DNA barcoding. Currently, there is no clear information on the genetic variation of M. casturi or on its ancestry, which has been attributed to natural hybridization between M. indica and M. quadrifida. These markers were also used to assess the genetic variation among M. casturi hybrids.

Results
In this study, 11.01 Gb of M. casturi sequence data were generated by high-throughput sequencing on an Illumina HiSeq 4000 (2 × 150 bp paired-end reads). The raw data were deposited in DDBJ under accession number DRA011022. After filtering, 10.95 Gb of clean reads were obtained and assembled de novo using the Ray assembler. The assembly comprised 259,872 scaffolds with an N50 of 1,445 bp and a maximum scaffold length of 144,601 bp (Table 1). Assembly completeness was assessed with BUSCO; complete and single-copy BUSCOs (S) accounted for 42.3% (Table 2). Microsatellites were identified using the MISA program: 11,040 sequences containing microsatellite motifs were found, of which 770 contained more than one microsatellite (Table 3). Trinucleotide motifs were the most abundant (52.77%), followed by dinucleotide motifs (33.3%) (Table 3).
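N50, reported above with the scaffold count, is the scaffold length L such that scaffolds of length L or longer together contain at least half of all assembled bases. The minimal Python sketch below illustrates that definition; the function name `n50` and the toy lengths are illustrative assumptions, not output of the Ray assembler or data from this study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total number of assembled bases.
    """
    if not lengths:
        raise ValueError("no sequences given")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold downwards until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy scaffold lengths in bp (not data from this study).
toy_lengths = [9000, 7000, 5000, 3000, 2000, 1500, 500]
print(n50(toy_lengths))  # 7000
```

Sorting in descending order is what makes N50 a property of the longest scaffolds: a few long scaffolds can dominate the statistic even when most scaffolds are short.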
Fourteen candidate loci were selected for primer design (Table 4). All primers amplified successfully, and the amplified sequences were registered in DDBJ under the accession numbers listed in Table 4. Eight samples were used to validate the markers and determine allele sizes with the QIAxcel® system. The 14 primer pairs produced 55 alleles in total, with a mean of 3.93 alleles per locus (Table 4), and all loci were polymorphic (Table 5). Loci mc-230178 and mc-58089 each produced six alleles, whereas mc-122955 and mc-88387 each produced two. Kasturi and Mawar shared the same alleles at loci mc-176197, mc-21672, and mc-88075. At locus mc-88387, only the Kasturi sample failed to amplify, suggesting a null allele in Kasturi. This locus may therefore be useful for distinguishing Kasturi from otherwise similar Mangifera accessions such as Mawar. A UPGMA tree was constructed from the 14 loci (Fig. 2). M. quadrifida and Rawa-rawa were placed in the same clade. All accessions of M. casturi fell in the same clade as M. indica, with M. foetida (Hambawang) as the out-group. Mawar was most closely related to M. indica. Kasturi and Pelipisan grouped in the same clade, and several markers showed shared alleles between them, indicating that they are more closely related to each other than to Mawar. Pinari, however, was genetically distinct from the other M. casturi accessions, and Mawar was also relatively distant from the remaining M. casturi accessions. Phylogenetic analysis was also performed using three widely used chloroplast markers (Fig. 3).
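The per-locus allele counts and the mean of 3.93 alleles per locus (55 alleles over 14 loci) follow from a simple tally of distinct allele sizes. The Python sketch below shows one way to do that tally; the genotype table, locus names, and accession names are hypothetical placeholders rather than the values in Tables 4 and 5, and failed amplifications (such as the putative null allele at mc-88387) are simply skipped.

```python
from statistics import mean

# Hypothetical genotypes (allele sizes in bp) for illustration only.
# None marks a failed amplification, e.g. a putative null allele.
genotypes = {
    "locus_A": {"acc1": (160, 182), "acc2": (160, 182), "acc3": (160, 170)},
    "locus_B": {"acc1": None, "acc2": (210, 214), "acc3": (214, 214)},
}


def alleles_per_locus(table):
    """Count the distinct allele sizes observed at each locus, skipping failed PCRs."""
    counts = {}
    for locus, calls in table.items():
        observed = set()
        for call in calls.values():
            if call is not None:
                observed.update(call)
        counts[locus] = len(observed)
    return counts


counts = alleles_per_locus(genotypes)
# A locus is polymorphic if more than one allele is observed.
polymorphic = [locus for locus, n in counts.items() if n > 1]
print(counts)                      # {'locus_A': 3, 'locus_B': 2}
print(mean(list(counts.values())))  # 2.5
print(round(55 / 14, 2))           # 3.93
```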

Discussion
The genus Mangifera originates from Southeast Asia and produces polyembryonic seeds, whose embryos originate from gametes or from nucellar cells [2]. Most Mangifera flowers are either hermaphroditic or male [32]. Self-pollination occurs in many species; however, self-incompatibility has been reported in several mango types [33]. This suggests that crosses can occur among Mangifera varieties and species [2,34]. Several interspecific hybrids arising from natural cross-hybridization have been reported, including M. odorata (Kuini), a natural hybrid between M. indica and M. foetida [9].
Across the 14 microsatellite loci, the four M. casturi accessions showed different allele sizes. This high level of genetic variation among M. casturi accessions may have arisen from interspecific hybridization. Kasturi, Mawar, and Pelipisan were more closely related to one another than to Pinari. Kasturi and Mawar were very similar in fruit size, although Kasturi fruits were more oval than those of Mawar. Pelipisan fruits, in contrast, were more oval and larger. Pinari had the largest fruits among the M. casturi accessions; local people classify it in the M. casturi group because of its purplish skin, which resembles that of the other M. casturi accessions.
Intraspecific genetic variation can arise from multiple cross-hybridization events involving several species. For instance, AFLP analysis of the natural hybrid Kuini (M. odorata) revealed cross-hybridization between M. indica and M. foetida, together with a backcross between the F1 hybrid and M. foetida [4,9]. Based on SNP analysis of double-digest restriction-site-associated DNA (ddRAD) data, M. casturi was proposed to be a natural hybrid between M. indica and M. quadrifida, with the F1 hybrid subsequently backcrossed to M. indica. Morphologically, M. casturi closely resembles M. quadrifida, with purplish skin and small fruits [4].
In the allopolyploid mangosteen (Garcinia mangostana), microsatellite markers revealed cross-hybridization involving multiple ancestors, including G. malaccensis, G. celebica, and G. porrecta [22]. Our microsatellite analysis showed that the four M. casturi accessions (Kasturi, Mawar, Pelipisan, and Pinari) differed in their alleles across the microsatellite loci. However, all four accessions shared alleles at locus mc-8693 (allele sizes 160/182), suggesting that they derive from a common ancestor. In contrast, the allelic differences observed at the microsatellite loci suggest that the four M. casturi accessions may have undergone cross-hybridization with multiple ancestors.
DNA barcoding with matK and rbcL revealed very high nucleotide similarity among the four M. casturi accessions, indicating that they share the same maternal ancestor and that M. indica is one of the maternal ancestors. The trnH-psbA phylogenetic tree provided additional evidence: Pinari showed a maternal ancestor different from that of the other M. casturi accessions. It is possible that one of the M. casturi hybrids crossed with another Mangifera species acting as the maternal parent. The ITS phylogenetic tree also showed that three M. casturi accessions, excluding Pinari, belonged to the same subgroup, which contrasts with the previous hypothesis that M. casturi is a cross between M. indica and M. quadrifida. Our results also support the hypothesis that crosses between F1 hybrids and other Mangifera spp. produced the variation observed in natural M. casturi populations.

Conclusions
The results of this study demonstrate broad genetic variation in M. casturi, which represents an important genetic resource for breeding and for improving mango characteristics in the future. More intensive conservation efforts are needed, as M. casturi is currently classified as extinct in the wild and its habitat is severely threatened. Kalimantan Selatan is well known for its abundant coal deposits, which could be exploited extensively in the near future; this poses a serious threat to the survival of this local mango. Moreover, M. casturi has never been officially recognized as a variety by the authorities. The results of this study can help breeders and local governments officially register this valuable germplasm.

Methods
M. casturi accessions were collected from Banjar, Kalimantan Selatan, in the southern part of Borneo Island (Fig. 1; Supplementary Table 1). For whole genome sequencing, genomic DNA was isolated from M. casturi (Kasturi accession) using a DNeasy Power Plant kit (Qiagen) following the manufacturer's protocol. DNA quality and quantity were assessed using a NanoPhotometer® NP80 Touch spectrophotometer (IMPLEN). Genomic DNA samples were sent to Novogene-AIT (Singapore) for 2 × 150 bp paired-end (PE) sequencing on an Illumina HiSeq 4000 system. Raw read quality was checked using FastQC [15], and reads were filtered with fastp using default parameters to obtain clean reads [16]. Clean reads were assembled using the Ray assembler [17] on the Maser platform [18]. Assembly completeness was then assessed by BUSCO analysis [19] on the Maser platform with default parameters.

Microsatellite marker development and validation
Microsatellites were identified using the MISA program [20] with the following minimum repeat numbers: six for dinucleotide motifs and five for tri-, tetra-, penta-, and hexanucleotide motifs. The maximum distance between two microsatellites for them to be counted as a compound microsatellite was set to 100 bases. Primers were designed using the web version of Primer3 [21].
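The search criteria above can be illustrated with a small regular-expression scanner. This is a simplified sketch, not MISA itself: it finds only perfect repeats, uses the minimum repeat numbers stated above, skips motifs that are repeats of a shorter unit, and treats the 100-base setting as the maximum gap for grouping SSRs into compound loci (an assumed interpretation). The function names `find_ssrs`, `is_primitive`, and `compound_groups` are hypothetical.

```python
import re

# Minimum repeat numbers from the text: 6 for dinucleotide motifs and
# 5 for tri- to hexanucleotide motifs. Mononucleotide runs are not scanned.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
# Assumed meaning of the 100-base setting: SSRs this close are compound.
MAX_COMPOUND_GAP = 100


def is_primitive(motif):
    """False if the motif is a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq):
    """Return (start, end, motif, repeats) for perfect SSRs, sorted by start."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // k))
    return sorted(hits)


def compound_groups(hits):
    """Merge consecutive SSRs separated by <= MAX_COMPOUND_GAP bases."""
    groups = []
    for hit in hits:
        if groups and hit[0] - groups[-1][-1][1] <= MAX_COMPOUND_GAP:
            groups[-1].append(hit)
        else:
            groups.append([hit])
    return groups


print(find_ssrs("GC" + "AT" * 7 + "GC"))   # [(2, 16, 'AT', 7)]
print(find_ssrs("TT" + "CAG" * 5 + "TT"))  # [(2, 17, 'CAG', 5)]
```

Because each motif length is scanned independently, a real tool also has to reconcile overlapping calls and motif phase, which this sketch does not attempt.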
Genomic DNA was isolated using a slightly modified CTAB method [22], and its quality and quantity were assessed using a NanoPhotometer® NP80 Touch (IMPLEN). Microsatellite markers were analyzed using a microsatellite PCR kit (QIAGEN). The PCR master mix contained 3.2 µL RNase-free water, 5 µL 2x Type-it Multiplex PCR Master Mix, 0.4 µL Q-Solution, 0.2 µL 10 µM forward primer, and 0.2 µL 10 µM reverse primer. PCR was performed on a SimpliAmp Thermal Cycler (Applied Biosystems) under the following conditions: initial denaturation at 95°C for 5 min; 32 cycles of denaturation at 95°C for 30 s, annealing at 57°C for 1 min 30 s, and extension at 70°C for 30 s; and a final extension at 60°C for 30 min. Amplicons were checked by electrophoresis on a 1% agarose gel in TAE buffer at 100 V for 20 min. Before capillary electrophoresis on the QIAxcel® system (Qiagen), samples were diluted twice and then run using a QIAxcel DNA High-Resolution Kit. Allele sizes were confirmed and processed manually using QIAxcel ScreenGel Software (Qiagen).
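The per-reaction recipe above sums to 9.0 µL of reaction components; the template DNA volume is not stated. When many reactions are run (for example, eight samples per primer pair), the per-reaction volumes are often scaled into one master mix. The helper below is a hypothetical sketch: the 10% pipetting overage is an assumption, and the µL unit for Q-Solution is inferred from the other components.

```python
# Per-reaction volumes in µL as listed in the text. Template DNA is not
# included because its volume is not stated.
PER_REACTION_UL = {
    "RNase-free water": 3.2,
    "2x Type-it Multiplex PCR Master Mix": 5.0,
    "Q-Solution": 0.4,
    "forward primer (10 µM)": 0.2,
    "reverse primer (10 µM)": 0.2,
}


def master_mix(n_reactions, overage=0.1):
    """Scale per-reaction volumes to n reactions plus a pipetting overage.

    The default 10% overage is an assumption, not part of the original protocol.
    """
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2) for name, vol in PER_REACTION_UL.items()}


mix = master_mix(8)  # e.g. eight validation samples for one primer pair
for name, vol in mix.items():
    print(f"{name}: {vol} µL")
print("components per reaction:", round(sum(PER_REACTION_UL.values()), 2), "µL")
```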
The molecular data were analyzed in PHYLIP version 3.695 using the unweighted pair group method with arithmetic mean (UPGMA), and the resulting dendrogram was edited in MEGA X [23].
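UPGMA repeatedly merges the two closest clusters and sets the distance from the merged cluster to every other cluster to the size-weighted average of its members' distances, which yields an ultrametric tree. The Python sketch below illustrates the method on a hypothetical four-taxon distance matrix; the distance measure the authors fed to PHYLIP is not stated, so the input here is purely illustrative and this is not PHYLIP's implementation.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a UPGMA tree and return it as a Newick string.

    names: taxon labels; dist: dict mapping frozenset({a, b}) -> distance.
    """
    # Active clusters: key -> (cluster size, node height). The key doubles as
    # the cluster's Newick text, so it stays unique as clusters merge.
    clusters = {name: (1, 0.0) for name in names}
    d = dict(dist)
    while len(clusters) > 1:
        # Merge the closest pair of active clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))] / 2
        size_a, height_a = clusters.pop(a)
        size_b, height_b = clusters.pop(b)
        merged = f"({a}:{height - height_a:g},{b}:{height - height_b:g})"
        # Arithmetic-mean update: size-weighted average of the members' distances.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = (size_a + size_b, height)
    (root,) = clusters
    return root + ";"


# Hypothetical distance matrix for four taxa (not data from this study).
pairs = {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
         ("A", "D"): 12, ("B", "D"): 12, ("C", "D"): 12}
dist = {frozenset(p): v for p, v in pairs.items()}
tree = upgma(["A", "B", "C", "D"], dist)
print(tree)  # (D:6,(C:3,(A:1,B:1):2):3);
```

The resulting Newick string can be opened in a tree viewer such as MEGA X for editing.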

DNA barcoding and ancestral phylogenetic analysis
For DNA barcoding, we used three chloroplast regions, matK [24], rbcL [25], and trnH-psbA [26], and one nuclear region, the internal transcribed spacer (ITS) [27]. The barcode regions were amplified using KOD Plus (Toyobo) according to the manufacturer's protocol, and the PCR products were cleaned using ExoSAP-IT™ PCR Product Cleanup Reagent (Applied Biosystems). Cycle sequencing was then carried out with a BigDye™ Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems), followed by purification with a BigDye XTerminator™ Purification Kit (Applied Biosystems) according to the manufacturer's protocol. The products were sequenced on a 3500 Series Genetic Analyzer (Applied Biosystems). Sequence data were analyzed using Sequencing Analysis Software v6.0 (Applied Biosystems) and processed with ATGC-MAC version 7 (Genetyx Co.) and MEGA X [23].
Phylogenetic trees were inferred by the maximum likelihood method in MEGA X [28]. The best-fit nucleotide substitution model for each marker was selected in MEGA X [29,30], and branch support was assessed with 10,000 bootstrap replicates [31].
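MEGA X performs the bootstrap internally; the sketch below only illustrates the nonparametric bootstrap principle behind the support values: alignment columns are resampled with replacement, a tree is inferred from each pseudo-replicate, and the percentage of replicates recovering a clade is reported. The inference step `closest_pair` is a deliberately crude, hypothetical stand-in for maximum likelihood tree search, and the toy alignment is invented for illustration.

```python
import random
from collections import Counter
from itertools import combinations


def bootstrap_alignment(alignment, rng):
    """One pseudo-replicate: alignment columns resampled with replacement.

    alignment maps taxon name -> aligned sequence; all sequences share a length.
    """
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {taxon: "".join(seq[i] for i in cols) for taxon, seq in alignment.items()}


def clade_support(alignment, infer_clades, replicates=10_000, seed=1):
    """Percentage of pseudo-replicates in which each clade is recovered.

    infer_clades maps an alignment to a set of clades (frozensets of taxa);
    here it stands in for a maximum likelihood tree search.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(replicates):
        counts.update(infer_clades(bootstrap_alignment(alignment, rng)))
    return {clade: 100 * n / replicates for clade, n in counts.items()}


def closest_pair(alignment):
    """Hypothetical toy inference: the pair of taxa with the lowest p-distance."""
    def p_distance(a, b):
        s, t = alignment[a], alignment[b]
        return sum(x != y for x, y in zip(s, t)) / len(s)

    pair = min(combinations(sorted(alignment), 2), key=lambda ab: p_distance(*ab))
    return {frozenset(pair)}


toy = {"A": "ACGTACGTAA", "B": "ACGTACGTAA", "C": "ACGAACCTTA", "D": "TCGAACCTGG"}
print(clade_support(toy, closest_pair, replicates=1000))
```

Because A and B are identical in the toy alignment, every pseudo-replicate recovers them as the closest pair, so that clade receives 100% support; real data rarely behave so cleanly.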

Declarations
Ethics approval and consent to participate