DNA barcode: a potential tool for identifying ‘Hoa Loc’ mango cultivar in Vietnam

Mango is one of the most valuable fruiting plants and occupies a crucial position in Vietnam’s agriculture. Various indigenous mango cultivars originate from Vietnam. DNA barcoding is an appropriate approach to mango authentication because it overcomes the limitations of morphology-based methods. In this study, 33 samples representing 19 mango cultivars were analysed by amplifying and sequencing the internal transcribed spacer (ITS) and the maturase K gene (matK). Both barcode candidates were amplified successfully in all samples. ‘Hoa Loc’, a high-quality native mango cultivar, was discriminated from the others by 52 variable sites in the ITS sequence analysis and by 27 variable sites in matK. The results also showed that the noncoding ITS sequence has a high genetic distance among these cultivars and can be proposed as a promising DNA barcode for mango identification, based on both sequence quality and discrimination power.


Introduction
Mango (Mangifera spp.), one of the long-standing fruits, is favoured not only by Vietnamese but also by international consumers. Vietnam ranks 14th in the world in mango production, with over 700,000 tons of various varieties (Pariona, 2018), and its mangoes have been exported to nearly 40 countries, including China, European countries, the Republic of Korea, Japan, Australia and New Zealand (Vietnam Economic News, 2019). With a good opportunity to expand further in the global market, mango cultivation is required to meet Global GAP standards, and it is also vitally important to confirm and trace the origin of mangoes and to protect the trademarks of varieties such as the well-known 'Hoa Loc' mango. In recent years, various methods have been developed to identify mango cultivars with different characteristics and to distinguish a specific variety from others that share similar morphological traits. These methods, however, have been criticised for their low accuracy and for their limitations in analysing processed mango products. To overcome these drawbacks, the application of DNA barcodes to the identification of a wide range of plant species has attracted increasing attention from researchers. A DNA barcode is a short DNA segment of a given genome that serves as a unique code sequence, allowing researchers to distinguish one organism from others (Fazekas et al., 2012; de Vere et al., 2015). Recently, databases of DNA barcodes have been widely developed, attracting great interest from scientists around the world, and this is expected to become a trend in future scientific studies (Kress, 2017).
Considered a novel alternative tool, the DNA barcode is believed to greatly facilitate the classification and identification of new species, as well as the retrieval of information from living plants or from plant products whose tissues are degrading after processing (Barcaccia et al., 2016; Mishra et al., 2016; Kress, 2017). DNA barcodes therefore have various applications in scientific studies as well as in practice (Yang et al., 2012; Shneyer and Rodionov, 2019). Ten years ago, the chloroplast gene matK was assigned as a DNA barcode for plant authentication by the Consortium for the Barcode of Life (Shneyer and Rodionov, 2019). The ITS was evaluated by Chinese researchers (Zhang et al., 2016) because it contains both conserved and rapidly evolving regions (Larranaga and Hormaza, 2015). Zhao et al. (2018) reported that the ITS2 region discriminated between species of the genus Zanthoxylum with absolute accuracy. Additionally, the ITS region could be used to distinguish species in the Gentianaceae family (Zhang et al., 2016), cocoa accessions (Ha et al., 2017), the Plantaginaceae family (Ay et al., 2018) and sugarcane genotypes (Jomhe et al., 2018). Because it identifies species quickly and reliably, the DNA barcode enables scientists to obtain results within only four hours, provided that the reference library has been accurately constructed. This tool can potentially be developed and applied in accreditation activities. Taking these advantages into account, this study aimed to evaluate the variants among mango cultivars based on the ITS and matK sequences, and to apply the DNA barcode to the identification of the 'Hoa Loc' mango of Vietnam.

Material
Mango leaves were collected in the Mekong Delta region of Vietnam, in Tien Giang province (Cai Be district and the Southern Fruit Research Institute (SOFRI)) and in Vinh Long, Hau Giang, Soc Trang, An Giang and Can Tho provinces. Leaves of the ortet plants of Hoa Loc mango (the highly evaluated plants from which a clone is produced by reproduction or propagation) were collected at SOFRI.

Variation calling analysis
The sequences and nucleotide compositions of the DNA fragments amplified with the matK and ITS primers were evaluated for each mango sample using BioEdit software. The DNA sequences of the same cultivar were aligned so that a common (consensus) sequence could be selected. Subsequently, these consensus sequences were compared with one another to find variable sites of nucleotide substitution using the MEGA X program (version 10.0.5).
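The two steps described above (a consensus per cultivar, then a comparison between consensus sequences) can be illustrated with a minimal Python sketch. This is not the BioEdit or MEGA X procedure used in the study; it is a simple majority-rule stand-in that assumes the sequences are already aligned to equal length. The function names and the toy sequences are hypothetical and are not data from this study.

```python
from collections import Counter


def consensus(aligned_seqs):
    """Majority-rule consensus of equal-length, pre-aligned sequences.

    Assumption: a column with a tie for the most common base is marked 'N'.
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for column in zip(*aligned_seqs):
        counts = Counter(column)
        base, top = counts.most_common(1)[0]
        tied = [b for b, c in counts.items() if c == top]
        result.append(base if len(tied) == 1 else "N")
    return "".join(result)


def variable_sites(consensus_by_cultivar):
    """Map each 1-based position that differs to the base seen in each cultivar."""
    names = list(consensus_by_cultivar)
    seqs = [consensus_by_cultivar[n] for n in names]
    sites = {}
    for pos, column in enumerate(zip(*seqs), start=1):
        if len(set(column)) > 1:
            sites[pos] = dict(zip(names, column))
    return sites


# Toy, made-up sequences (hypothetical, for illustration only).
samples = {
    "cultivar_A": ["ACGTAC", "ACGTAC", "ACGTTC"],
    "cultivar_B": ["ACCTAC", "ACCTAC"],
}
cons = {name: consensus(seqs) for name, seqs in samples.items()}
sites = variable_sites(cons)
print(cons)
print(sites)
```

In this toy case the single discordant read in cultivar_A is outvoted, so only position 3 is reported as variable between the two consensus sequences. Gaps and ambiguity codes would need explicit handling in real data.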

Results and Discussion
Amplification of ITS and matK regions
A comparison with the DNA ladder showed that the ITS regions were 700-750 base pairs (bp) in size (Figure 1). All PCR products appeared on the agarose gel as clear, bright bands, except for well 7, where no band was visible; this sample was therefore loaded again for confirmation. All confirmed PCR samples were then sequenced. On the 2% agarose gel, the bands were visible and comparable in size. Furthermore, few by-products appeared on the gel, indicating that the primers bound specifically to the DNA templates. All amplified samples were evaluated and met the quality criteria for DNA sequencing.
Sequencing of the ITS PCR products showed that the sizes of these DNA fragments were comparable to those amplified with the primers of White et al. (1990). In addition, no noise was observed in the sequences obtained, so they could be used for detailed analysis. The matK primers used in this study had the same sequences as those of Pei (2012), but the amplification produced shorter DNA fragments than in the previous report (750 bp versus 930 bp). Positions with noisy signal were excluded from the analysis.
Genetic diversity was observed among the mango cultivars: when the nucleotide composition of X.HOALOC was compared with that of the other varieties, 52 variable sites were found (Table 4). In particular, the Hoa Loc cultivar was readily detectable through its genotypic difference at nucleotide 624, where 'Hoa Loc' had an A, X.UC, X.CHAU-TG and X.NGOCVAN-VCAQMN had a C, and the other mango cultivars had a G.
Furthermore, CONSENSUS-HOALOC had G, G and G at nucleotides 559, 589 and 633, respectively, while other varieties, including DUDU-AG, CONSENSUS-XUC, XCHAU-TG and XNV-VCAQMN, had A, A and C at the corresponding positions.
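The diagnostic ITS sites reported above (A at 624; G at 559, 589 and 633) suggest a simple screening check, sketched below in Python. The sketch assumes that a query sequence has already been aligned to the same coordinate system as Table 4, which is essential because the positions are alignment positions. The helper names and the toy sequences are hypothetical; only the four position and base pairs come from the text.

```python
# Diagnostic ITS sites for 'Hoa Loc' as reported in the text
# (1-based positions in the authors' alignment).
HOA_LOC_ITS_PROFILE = {559: "G", 589: "G", 624: "A", 633: "G"}


def base_at(aligned_seq, position):
    """Return the base at a 1-based alignment position."""
    return aligned_seq[position - 1].upper()


def matches_profile(aligned_seq, profile):
    """True only if every diagnostic position carries the expected base."""
    return all(
        len(aligned_seq) >= pos and base_at(aligned_seq, pos) == base
        for pos, base in profile.items()
    )


def make_toy_sequence(length, bases_by_position, filler="T"):
    """Build a made-up sequence with chosen bases at chosen 1-based positions."""
    seq = [filler] * length
    for pos, base in bases_by_position.items():
        seq[pos - 1] = base
    return "".join(seq)


# Hypothetical sequences for illustration only; not data from this study.
toy_hoa_loc = make_toy_sequence(700, HOA_LOC_ITS_PROFILE)
toy_other = make_toy_sequence(700, {559: "A", 589: "A", 624: "C", 633: "C"})

print(matches_profile(toy_hoa_loc, HOA_LOC_ITS_PROFILE))
print(matches_profile(toy_other, HOA_LOC_ITS_PROFILE))
```

A check on a handful of sites is only a first screen; a match against the full set of variable sites in Table 4 would be the more cautious test.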
Do Tan Khang et al.

Table 4: Nucleotide positions of variants among mango cultivars in the ITS region
The genetic analysis of the matK regions with BioEdit showed that the 16 sequences contained many variable sites, 27 of which could potentially be used to distinguish Hoa Loc mango from the other cultivars. Of the 33 samples in total, consensus sequences were derived for five groups of mango cultivars: CONSENSUS-X.HOALOC (X.HOALOC-VCAQMN, X.HOALOC-TG, X.HOALOC-CT, X.HOALOC-BT, X.HOALOC-AG); CONSENSUS-X.BUOI (X.BUOI-TG, X.BUOI-CD [the fragments were not long enough to be analysed], X.BUOI-NK, X.BUOI-ST, X.BUOI-AG [the sequences obtained were noisy and too short for analysis (Table 5)], X.BUOI-HG); CONSENSUS-X.DAILOAN (X.DAILOAN-TG, X.DAILOAN-CT, X.DAILOAN-AG); CONSENSUS-X.TUQUY (X.TUQUY-CT, X.TUQUY-HG); CONSENSUS-X.UC (X.UC-CT, X.UC-VL). The X.KEO-AG sample was not analysed, probably because of insufficient DNA or noise in its sequence. The sequences of CONSENSUS-X.UC and CONSENSUS-X.DAILOAN were not evaluated because of noise caused by overlapping double peaks. The results revealed a clear genotypic variation between CONSENSUS-X.HOALOC and the remaining 15 varieties at nucleotides 54, 55, 58, 59, 60, 65, 247, 250, 251, 280, 302, 305, 308, 500, 816, 817, 818 and 820. However, CONSENSUS-X.BUOI and CONSENSUS-X.HOALOC were found to share the same nucleotide at position 55. Notably, at sites 291, 292, 293, 482, 483, 484, 485, 486 and 487, CONSENSUS-X.HOALOC had nucleotides that were absent from the other cultivars (Table 5).
In plants, the ITS region is widely used for identification at the genus or species level (Sang, 2002; Alvarez and Wendel, 2003; Razafimandimbison et al., 2004). The nuclear ITS sequence showed a high level of interspecific distance among 10 Mangifera species in central Sumatra, with 22.6% variable sites (Fitmawati et al., 2016).
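As a quick consistency check on the matK positions listed above for CONSENSUS-X.HOALOC, the two lists can be tallied in Python. The positions are copied from the text; the variable and function names are hypothetical. Grouping the Hoa Loc-only sites into runs of consecutive positions is an added illustration, and any biological reading of those runs is an assumption that the text does not state.

```python
# matK positions copied from the text (1-based alignment positions).
differing_sites = [54, 55, 58, 59, 60, 65, 247, 250, 251, 280,
                   302, 305, 308, 500, 816, 817, 818, 820]
hoa_loc_only_sites = [291, 292, 293, 482, 483, 484, 485, 486, 487]


def runs(positions):
    """Group positions into runs of consecutive sites."""
    groups = []
    for p in sorted(positions):
        if groups and p == groups[-1][-1] + 1:
            groups[-1].append(p)
        else:
            groups.append([p])
    return groups


total = len(differing_sites) + len(hoa_loc_only_sites)
print(len(differing_sites), len(hoa_loc_only_sites), total)
print(runs(hoa_loc_only_sites))
```

The two lists contain 18 and 9 positions, which together give the 27 matK sites reported for distinguishing Hoa Loc. The Hoa Loc-only sites fall into two contiguous blocks, 291 to 293 and 482 to 487.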
A difference in the number of variable sites between the nuclear noncoding sequence (ITS) and matK has also been observed in various plant species. A study on DNA barcoding of endangered Paphiopedilum species in Malaysia reported that matK was the most promising barcode, with high sequence quality (100%), high accuracy in BLASTn (100%) and clear resolution of species in the neighbour-joining phylogenetic tree (100%) (Rajaram et al., 2019). Yang et al. (2017) evaluated species discrimination in 35 Chinese oak species. Their results indicated that there were 96 variable sites among these species in the ITS sequences, far more than the 34 found in the matK gene. The sequence characteristics of four regions (ITS, matK, rbcL and trnH-psbA) in Codonopsis showed that the ITS had the highest percentage of variable sites (52.19%) (Wang et al., 2017). Additionally, DNA barcode data from eight species of Paphiopedilum, an ornamental flower, showed the divergence of the ITS sequence to be 32.7%, while that of matK was only 10%. Thus, it is reasonable to suggest that the ITS sequence is a promising DNA barcode candidate relative to chloroplast markers in various plant species. This finding is supported by the studies of Zhang et al. (2016) and Braukmann et al. (2017).

Conclusion
The findings of this study partly revealed the sites of genetic variation that distinguish Hoa Loc mango from other varieties, through an analysis of the matK and ITS regions. Nearly twice as many sites were found in the ITS region as in matK (52 versus 27, a ratio of about 1.9), suggesting that ITS could provide better identification of 'Hoa Loc' mango than matK.