Genetic polymorphism and cluster analyses based on the conserved DNA-derived polymorphism markers in the selected Harumanis and non-Harumanis mango varieties

Various mango varieties have been cultivated in Malaysia for decades, and the fruit plays a significant role in trade nationwide. Harumanis is the most outstanding mango variety in terms of taste and quality, which has led to a premium price of up to 8.57 USD per kilogram. This has driven fraud in which Harumanis is substituted with cheaper mango varieties such as Tong Dam and Susu, which have similar morphological features. Morphological characteristics are commonly used to differentiate Harumanis mango from other varieties, although this approach is inefficient, less stable, and affected by environmental factors. This research aimed to evaluate the genetic polymorphism in three mango varieties and assess the potential of conserved DNA-derived polymorphism (CDDP) as a DNA marker in differentiating Harumanis and non-Harumanis mango samples. A total of fifteen Harumanis and non-Harumanis mango samples were studied. A total of 371 bands were amplified by a set of six CDDP primers for fourteen mango leaf samples. The percentage of polymorphism observed for all six primers was higher than 65%. Primer WRKY-R1 showed the highest polymorphism percentage and polymorphism information content, at 100% and 0.44, respectively, making it the most efficient CDDP primer to differentiate Harumanis and non-Harumanis mango varieties in this study. Primer WRKY-F1 showed the highest resolving power at 8.57, with the highest number of loci at 15. The UPGMA dendrogram constructed based on CDDP data revealed that the fourteen samples were grouped into four major clusters, with all different varieties forming their own clade. The study demonstrated that CDDP markers can be effectively used in the characterization of different mango genotypes and in genetic diversity analysis, facilitating the development of DNA fingerprinting of the leading Harumanis mango, as well as better management of mango fruit resources in Malaysia.


Introduction
Mango (Mangifera indica L.), belonging to the family Anacardiaceae, is a common and popular tropical fruit available throughout the year in Malaysia. There are various challenges in mango cultivation, including postharvest diseases (Kadam et al., 2022), fruit fly attacks (Trisnaningsih et al., 2022), climate variability (Asare-Nuamah et al., 2022), improper harvesting age (Astuti et al., 2022), diverse dry matter content (DMC) at harvest period (de Freitas et al., 2022), intravarietal variability (Begum et al., 2014) and lack of systematic genetic knowledge (Rahman et al., 2018). (Afiqah et al., 2014; Lawson et al., 2019). However, Harumanis is the most prominent mango variety (Zakaria et al., 2018). This variety is oblong in shape with a conspicuous beak, and the skin remains green or yellowish green when the fruit is ready to be harvested. Besides, this special variety is well known for its attractive appearance and sweet taste with high nutritional value, and as an icon of Perlis (the northernmost state in Malaysia). The demand for Harumanis is higher than its supply due to the limited flowering and fruit-bearing season, hence its premium price.
Characterization and identification of mango is commonly accomplished by using morpho-physiological markers assessing several traits such as fruit length, thickness, width, weight, percentage of stone, peel, and pulp, as well as total soluble solids (TSS) (Begum et al., 2014). A more advanced approach is visible imaging based on shape and mass (Ibrahim et al., 2016). Although the advanced approaches have their own strengths, they still rely on morpho-physiological characteristics and visual inspection. However, the visual inspection of morphological characteristics may not be stable, as it is affected by ecological and environmental conditions (Singh et al., 2009; Begum et al., 2013).
In Perlis, Harumanis plays an important role in trade and has been exported to foreign countries such as Japan. The high value of Harumanis is influenced not only by its taste but also by its limited supply, since it can only be found in Perlis, Malaysia (Zakaria et al., 2018). The high value and popularity of Harumanis have made the fruit a target of economic adulteration involving substitution and misrepresentation (Zakaria et al., 2018). This happens when deceitful sellers substitute Harumanis with cheaper varieties that possess similar morphology, such as Tong Dam and Susu.
The situation gives rise to another challenge in Harumanis trading, which is the identification of the Harumanis variety. Buyers usually have difficulty identifying authentic Harumanis mango through the common morphological approach, which relies on visual inspection. A few varieties of mango such as Susu and Tong Dam have almost identical morphology to Harumanis and thus could be mistaken for it. In addition, visual inspection may not be accurate, as physical appearance is usually affected by environmental conditions (Begum et al., 2013). To complicate things, phenotypic variability is not limited to differences between varieties (intervarietal) but also occurs within the same variety (intravarietal).
Several efforts have previously been made to understand and evaluate the morphological variability within the Harumanis mango variety selected from different locations and tree ages (Yusuf, Rahman, Zakaria and Wahab, 2018; Yusuf et al., 2018; Yusuf et al., 2020). Morphological characterization studies of this fruit are limited in number, but the general conclusion is that this variety has a complex inheritance pattern and that its morphological variability is affected by environmental conditions.
Genotypes that exhibit mutations in their genome lead to polymorphism among individuals or populations. Amplification of specific DNA regions using specific markers has made it possible to determine polymorphism among individuals and populations. Molecular markers such as random amplified polymorphic DNA (RAPD), start codon targeted (SCoT), conserved DNA-derived polymorphism (CDDP), amplified fragment length polymorphism (AFLP), and inter simple sequence repeat (ISSR) have been extensively used in genetic diversity studies.
Morphological traits backed with genotype data should be an auxiliary tool in understanding species variability, since morphological traits are subject to various factors. To date, the RAPD marker has been utilised to assess the genetic diversity and population structure of various mango varieties (Jena et al., 2021). Harumanis identification based on the chloroplast DNA sequence has also been reported; however, it is unable to differentiate between Harumanis and non-Harumanis mango efficiently (Rahman et al., 2018; Rahman et al., 2020).
Currently, a systematic and direct comparison using polymorphic DNA markers in differentiating Harumanis and non-Harumanis cultivars is still lacking. Sufficient data related to the Harumanis genome is needed in order to develop a specific and useful molecular DNA marker for this special industrial variant. According to Collard and Mackill (2009), the plant genome contains conserved genes that can be amplified by a conserved DNA-derived polymorphism (CDDP) primer. CDDP primers are designed to target conserved sequences of plant functional genes involved in abiotic and biotic stresses or plant development (Ma et al., 2022). Thus, CDDP has advantages compared to random markers in quantitative trait loci (QTL) mapping applications (Andersen and Lübberstedt, 2003).
The banding patterns of well-characterised plant genes are generated by the short tags of the CDDP marker (Poczai et al., 2013) (Aouadi et al., 2019; Igwe et al., 2021; Golian et al., 2022; Ma et al., 2022). To our knowledge, this is the first report on using CDDP markers to evaluate genetic diversity in M. indica germplasm in Malaysia. This study aimed to evaluate the ability of CDDP primers to detect polymorphism and to understand the genetic diversity of selected Harumanis and non-Harumanis mango variety populations.

Biological samples and DNA extraction
A total of fifteen leaf samples were collected from three varieties of mango (Harumanis, Tong Dam, and Susu), with five samples for each variety. One leaf sample of Harumanis mango was collected from a greenhouse in the Institute of Sustainable Agrotechnology (INSAT) UniMAP, while the other four samples were collected from an orchard at Jabatan Pertanian Perlis in Perlis. All five samples of Susu mango leaves were collected from Jabatan Pertanian Perlis. Tong Dam mango leaf samples were collected from an orchard at Pusat Pertanian Gajah Mati in Kedah. The leaf samples were labelled as HM 64, HM 70, HM 72, HM 78, and HM 28 for Harumanis, TD 1, TD 2, TD 3, TD 4, and TD 5 for Tong Dam, and S4, S5, S6, S7, and S9 for Susu mangoes. The sample information and collection locations of the leaf samples used in this study are tabulated in Table 1.
After collection, the leaves were washed under running tap water and kept in clean, clearly labelled plastic bags. The leaf samples were then stored in a -20°C freezer until subjected to further analyses (Begum et al., 2013). DNA was extracted using the CTAB extraction method as described by Yi et al. (2018) with slight modification. The leaves were ground using a mortar and pestle, followed by the addition of 500 µL of pre-warmed CTAB buffer (Edwards et al., 1991). The samples were placed in microcentrifuge tubes containing pre-warmed CTAB buffer until the volume reached 1 mL. The DNA was eluted in 25 µL of sterile distilled water and stored at -20°C. The DNA quality was verified using agarose gel electrophoresis (1% w/v) and the Gel Documentation System.

Polymerase chain reaction of universal trnL-F and CDDP primers
Since the study employed newly tested CDDP primers on local mango samples, the quality of genomic DNA was validated by performing PCR amplification using a universal primer pair of trnL-F as control (Hollingsworth et al., 2011; Rahman et al., 2018). Six conserved DNA-derived polymorphism (CDDP) primers were used to analyse the genetic polymorphism of fifteen mango leaf samples. These primers amplify functional regions of four genes, WRKY, ERF, MADS, and KNOX, with the universal primer trnL-F as control (Table 2) (Collard et al., 2009; Hollingsworth et al., 2011; Rahman et al., 2018; Ma et al., 2022). Each of the CDDP primers has a different length. The PCR amplification was carried out in a thermal cycler with the six CDDP primers according to the protocol suggested by Collard and Mackill (2009).
The PCR reaction mixture consisted of 1× Lucigen Master mix, 1 μM CDDP primers (…). The PCR amplification was conducted with an initial denaturation at 94°C for 3 min, followed by 35 cycles of denaturation at 94°C for 1 min, annealing at 50°C for 1 min, and elongation at 72°C for 2 min. The final elongation step was held for 5 min at 72°C (Collard and Mackill, 2009). The list of CDDP primers is tabulated in Table 2. The amplicons were then subjected to gel electrophoresis for analysis of the banding pattern.

Data analysis
The bands were scored as either present (1) or absent (0) across the samples. The specific bands were recorded from the banding pattern obtained for each primer and sample. Based on this binary matrix, the distance matrix was calculated using the Jaccard coefficient, and the UPGMA dendrogram was constructed using online accessible software (http://genomes.urv.cat/UPGMA/). The Unweighted Pair Group Method with Arithmetic Average (UPGMA) was developed by Sokal and Michener in 1958 and is a straightforward approach to building a phylogenetic tree from a distance matrix (Weiß and Goker, 2011). UPGMA is the simplest method of tree construction. The result of this analysis was presented as a dendrogram.
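The distance and clustering steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the online tool used in the study: the function names, the nested-tuple tree format, and the four toy samples (A to D) are assumptions introduced here, and the toy profiles are not data from this work. Clusters are merged by the average of all pairwise Jaccard distances between their members, which is the UPGMA criterion.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance (1 - shared/union) between two 0/1 band profiles."""
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    union = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return 0.0 if union == 0 else 1.0 - shared / union


def upgma(profiles):
    """Build a UPGMA tree from {sample name: 0/1 profile}.

    Leaves are sample names; internal nodes are (left, right, height) tuples,
    where height is half the average distance between the merged clusters.
    """
    # Pairwise Jaccard distances between individual samples.
    leaf_dist = {
        frozenset((p, q)): jaccard_distance(profiles[p], profiles[q])
        for p, q in combinations(profiles, 2)
    }
    # Each cluster is (tuple of member names, tree built so far).
    clusters = [((name,), name) for name in profiles]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            members_i, members_j = clusters[i][0], clusters[j][0]
            total = sum(
                leaf_dist[frozenset((p, q))] for p in members_i for q in members_j
            )
            avg = total / (len(members_i) * len(members_j))
            if best is None or avg < best[0]:
                best = (avg, i, j)
        avg, i, j = best
        merged = (
            clusters[i][0] + clusters[j][0],
            (clusters[i][1], clusters[j][1], avg / 2),
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][1]


# Hypothetical toy profiles (four samples, four band positions).
toy_profiles = {
    "A": [1, 1, 0, 0],
    "B": [1, 1, 1, 0],
    "C": [0, 0, 1, 1],
    "D": [0, 1, 1, 1],
}
toy_tree = upgma(toy_profiles)
```

For real data sets, an established implementation of average-linkage clustering would normally be preferred over this hand-written loop; the sketch is meant only to make the two steps explicit.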
Critical parameters of the CDDP primers, including the total number of bands for the loci (TNB), number of polymorphic bands (NPB), number of monomorphic bands (NMB), total number of amplified bands (TAF), polymorphism ratio or percentage (%P), polymorphism information content (PIC) and resolving power (Rp), were calculated using Microsoft Excel software. PIC was calculated using the formula provided in Roldán-Ruiz et al. (2000). Rp was calculated as Rp = ∑Ib, where Ib = 1 − (2 × |0.5 − p|) and "p" is the proportion of genotypes containing the band (Prevost and Wilkinson, 1999).
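As a rough illustration of how these parameters follow from the binary matrix, the sketch below computes them for one primer in Python rather than in a spreadsheet. Several points are assumptions made here and not stated in the text: the function name `primer_stats`, the layout of the input (one row per band position, one column per sample), the per-band form PIC = 2f(1 − f) attributed to Roldán-Ruiz et al. (2000), and the choice to report the primer PIC as the mean over its bands. The example matrix is hypothetical.

```python
def primer_stats(matrix):
    """Summarise one primer from a 0/1 matrix (rows = bands, columns = samples)."""
    n_samples = len(matrix[0])
    # Proportion of samples showing each band (p in the Rp formula).
    freqs = [sum(row) / n_samples for row in matrix]

    tnb = len(matrix)                           # total number of band positions
    nmb = sum(1 for f in freqs if f == 1)       # bands present in every sample
    npb = tnb - nmb                             # bands absent from at least one sample
    taf = sum(sum(row) for row in matrix)       # total amplified bands across samples
    pct_p = 100.0 * npb / tnb                   # polymorphism percentage
    # Assumed per-band PIC = 2 f (1 - f), averaged over the primer's bands.
    pic = sum(2 * f * (1 - f) for f in freqs) / tnb
    # Rp = sum of band informativeness, Ib = 1 - 2|0.5 - p|.
    rp = sum(1 - 2 * abs(0.5 - f) for f in freqs)

    return {"TNB": tnb, "NPB": npb, "NMB": nmb, "TAF": taf,
            "%P": pct_p, "PIC": pic, "Rp": rp}


# Hypothetical primer with three band positions scored in four samples.
example = primer_stats([
    [1, 1, 1, 1],   # monomorphic band
    [1, 0, 1, 0],   # band present in half of the samples
    [1, 0, 0, 0],   # band present in a quarter of the samples
])
```

A band present in exactly half of the samples contributes the maximum to both PIC and Rp, whereas a monomorphic band contributes nothing to either.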

DNA extraction, polymerase chain reaction and band scoring analysis
In the band scoring analysis of CDDP primers, out of the 15 mango genotype samples analyzed, a single sample (S7) was eliminated from further analysis because it consistently produced negative results. S7 displayed no bands after PCR amplification using any of the six primers. There could be a few reasons for such results, such as an insufficient amount of DNA template in the reaction mixture, or inhibition or contamination occurring during PCR amplification (Bacich et al., 2011; Mubarak et al., 2020). Difficult templates such as GC-rich regions are challenging to amplify due to the formation of stable and complex secondary structures within the DNA template that could block DNA polymerase during the PCR reaction and lead to ineffective amplification (Obradovic et al., 2013).
Therefore, only the remaining 14 samples were used for further CDDP analysis. It is also important to note that the CDDP technique has so far not been tested on the local Mangifera indica genome. Amplified bands were scored for presence (1) or absence (0) for each primer to generate the binary 0/1 matrix. A specific example of band scoring for CDDP primer ERF1 is tabulated in Table 3. The results of the band scoring analysis of the CDDP primers WRKY-F1, WRKY-R2, ERF1, KNOX-1, KNOX-2, and MADS-1 were summarized by totalling the band scores of each primer across the samples in Table 4. The six CDDP primers used yielded a total of 371 bands, with an average of 62 bands per primer.
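The scoring step can be pictured as turning, for each primer, the set of bands seen in every lane into a presence/absence table. The sketch below is a minimal illustration under stated assumptions: it presumes that bands have already been assigned to discrete size classes (in bp), and the function name, the input layout, and the sample labels and band sizes in the example are hypothetical rather than taken from Table 3.

```python
def build_binary_matrix(bands_by_sample):
    """Convert {sample: set of band size classes} into a 0/1 matrix.

    Returns (samples, positions, matrix), where matrix[r][c] is 1 if the band
    at positions[r] was observed in samples[c] and 0 otherwise.
    """
    samples = list(bands_by_sample)
    # Every band size observed in at least one sample, largest first (gel order).
    positions = sorted(set().union(*bands_by_sample.values()), reverse=True)
    matrix = [
        [1 if pos in bands_by_sample[s] else 0 for s in samples]
        for pos in positions
    ]
    return samples, positions, matrix


# Hypothetical band size classes (bp) for three lanes of one primer.
samples, positions, matrix = build_binary_matrix({
    "lane1": {500, 300, 100},
    "lane2": {300, 100},
    "lane3": {100},
})
```

A lane with no bands at all, as observed for S7, would simply appear as a column of zeros, which is why such a sample carries no information for the distance calculation.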

Evaluation of polymorphism percentage, polymorphism information content and resolving power of CDDP primer across samples
Table 5 presents the main results of the tested CDDP markers across all 14 samples used in this study. Monomorphic bands are defined as bands of similar size that are present in all the studied samples, amplified by the same primer which generated one-direction reads. For instance, in Table 3, primer ERF1 amplified monomorphic bands of PCR product at a size of ≤100 bp (bands 8 and 9), which are present in all samples (TD1, TD2, TD3, TD4, TD5, HM64, HM70, HM72, HM78, HM28, S4, S5, S6, and S9). Polymorphic bands, in contrast, are defined as bands of PCR products that are present in certain samples and absent in others. Hence, the percentage of polymorphism in Table 5 was calculated by dividing the number of polymorphic bands by the total number of bands. DNA markers that reveal genotypic differences between individuals due to marker sequence differences are called polymorphic markers, while DNA markers that cannot be used to discriminate between genotypes are called monomorphic markers (Amiteye, 2021).
According to the results in Table 5, the highest percentage of polymorphism obtained is 100%, yielded by the WRKY-R1 CDDP primer. WRKY genes encode transcription factors that are involved in diverse biotic or abiotic stress responses as well as in developmental or physiological processes (Jiang et al., 2015). Thus, WRKY markers might be one of the powerful target regions to evaluate the genetic polymorphism of Harumanis and non-Harumanis mango varieties. The lowest percentage of polymorphism is demonstrated by the MADS-1 CDDP primer, with 66.67% polymorphism. The MADS gene is involved in controlling floral organ initiation and development. MADS-box genes have been discovered in several fruits, with most of them suggested to be involved in early fruit development (Elitzur et al., 2010).
All CDDP primers in this study have a high potential to be used for evaluating the genetic polymorphism in Harumanis, Tong Dam and Susu mangoes, as the lowest percentage of polymorphism recorded is 66.67% (Table 5). This value is higher than the polymorphism percentage recorded in the CDDP marker analysis of in vitro propagated grapevine (Vitis vinifera L.), which ranged from 11% to 33% for the CDDP-3 and CDDP-11 primers, respectively (Aljuaid et al., 2022). The percentage of polymorphism recorded by CDDP markers in a study of wheat cultivars (25% to 83.3% for the ERF-1 and KNOX-1 primers, respectively) was also lower than our result, but slightly higher than that reported by Aljuaid et al. (2022) (Hamidi et al., 2014).
The polymorphic information content (PIC) value was calculated for all the CDDP markers used in this study. The PIC value ranges from a minimum of zero for monomorphic markers to a maximum of 0.5 for markers that are present in 50% of the plants (Roldán-Ruiz et al., 2000).
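The stated range can be checked with a short calculation. The block below assumes the per-band form of the PIC commonly attributed to Roldán-Ruiz et al. (2000), in which $f_i$ denotes the fraction of plants showing band $i$; the symbol names are chosen here for illustration.

```latex
\begin{align*}
\mathrm{PIC}_i &= 2 f_i \,(1 - f_i), \qquad 0 \le f_i \le 1, \\
\mathrm{PIC}_i &= 0 \quad \text{when } f_i = 0 \text{ or } f_i = 1 \text{ (monomorphic band)}, \\
\frac{d\,\mathrm{PIC}_i}{d f_i} &= 2 - 4 f_i = 0
  \;\Longrightarrow\; f_i = 0.5, \qquad
  \mathrm{PIC}_i^{\max} = 2 \times 0.5 \times 0.5 = 0.5 .
\end{align*}
```

Under this form, a band shared by exactly half of the plants is the most informative, and the value falls symmetrically toward zero as the band becomes either rare or nearly universal.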
The mean PIC value for all CDDP markers tested is 0.32. The highest and lowest PIC values were recorded by primer WRKY-F1 and primer WRKY-R1 at 0.44 and 0.25, respectively. The markers in this study were considered useful and informative, since the degree of polymorphism is classed as low for a PIC of 0 to 0.10, medium for a PIC of 0.10 to 0.25, high for a PIC of 0.30 to 0.40, and very high for a PIC of 0.40 to 0.50 (Serrote et al., 2020). These results are comparable with the Quercus infectoria Oliv. genetic polymorphism study, where KNOX-2 had the highest PIC at 0.41, while WRKY-R1 had the lowest PIC at 0.22 (Ahmed et al., 2022). That result is in line with the finding in this study, in which the lowest PIC value was also recorded for WRKY-R1.
The resolving power (Rp), or the ability of each primer to detect polymorphism in the tested individuals, was calculated according to Prevost and Wilkinson (1999). The mean value of resolving power for all the tested CDDP markers is 6.17. The CDDP primer WRKY-F1 recorded the highest Rp value of 8.57, while MADS-1 gave the lowest value of 2.57 (Table 5). The mean value of Rp in this study is lower than that in the analysis of CDDP markers in Tunisian Pistacia vera L. but slightly higher than that in the Amomum tsao-ko plant, which are 7.51 and 5.91, respectively (Aouadi et al., 2019; Ma et al., 2022). The comparatively good Rp values of the CDDP primers used in this study confirm their ability to discriminate among different genotypes.

Cluster analysis of different mango varieties
Table 6 presents the distance matrix computed with the Jaccard coefficient based on the bands obtained using the CDDP primers. The UPGMA tree constructed using Jaccard's distance coefficient clustered the 14 samples into 4 major clusters, denoting distinct diversity and relationships among the samples (Figure 3). These clusters are further divided into clades. For example, Cluster 3 and Cluster 4 consist of another three clades. Interestingly, all varieties of mangoes used in this study formed exclusive clusters. Based on the UPGMA tree, the cultivars are grouped in the same cluster since they are probably polyclonal varieties with a high degree of genetic similarity. The dendrogram suggests that the Harumanis samples used in this study are related even though they were collected from different planting sites. Genetic distance is a measure of the genetic divergence between samples. The Susu mangoes S6 and S5 have the smallest genetic distance, 0.111, indicating that the S6 and S5 samples are closely related (Figure 3 and Table 6). While S6 has a distance of 0.5 when compared to all Harumanis samples, the most distant relationship, between Susu mango and Harumanis mango, is at 0.778. Among the Harumanis mango samples, the most closely related are HM 28 and HM 70 at 0.095, and both are clustered in the same clade of Cluster 1. The largest genetic distance among the Harumanis mango accessions is 0.609, between HM 28 and HM 64. Genetic variation within the same mango variety is manifested by the genetic distance values among its samples and is caused by multiple factors. The variation permits flexibility and survival of the variety in the face of changing environmental conditions. These results were expected, since the Harumanis, Tong Dam, and Susu mango samples used in this study were collected at different planting sites and may have originated from different areas, which is reflected in their genetic variability. Previously, a wide range of dissimilarity values was reported in the Baneshan mango cultivar, which may reflect considerable intracultivar variability. The existence of considerable intracultivar variability in Baneshan mango may be due to the fact that farmers are not supplied with uniform planting material (Begum et al., 2014).
Overall, this study indicates the high potential of CDDP primers for genetic polymorphism analysis in differentiating Harumanis from non-Harumanis mango varieties in the future. CDDP analysis could be an effective supplement to common markers such as RAPD and SSR for useful data mining. The selection of techniques is based on the particular targets, finances, available instruments, and other constraints (Kafkas et al., 2008; Pavlović et al., 2012; Bilčíková et al., 2021). Furthermore, CDDP markers have also been proven to be an effective approach in evaluating genetic polymorphism in various crops such as mushrooms, apples, and Indian mango (Bilčíková et al., 2021; Jena et al., 2021; Golian et al., 2022).

Conclusion
This is the first report on the utilization of CDDP markers in a genetic diversity study of local varieties of M. indica, covering both their use in differentiating Harumanis from non-Harumanis mango varieties and their potential for assessing the genetic diversity of selected mango germplasm in Malaysia. Data generated by DNA-based markers are valuable in creating a mango database suitable for crop improvement and conservation programs, as well as in optimizing the management of mango resources in Malaysia.

Figures 1
Figures 1 and 2 show the results of genomic DNA extraction and the PCR amplification products of the 15 samples with the universal primers trnL-F3 and trnL-F4 as control, visualized under UV light. The results show that all the PCR products obtained are within the estimated product size range of 100 bp to 250 bp. Therefore, the genomic DNA obtained in this study was considered to be of good quality and suitable for PCR, except for Sample S7. Sample S7 in Lane 14 produced a faint band, which could be due to the low concentration of DNA present in the sample. Validation of genomic DNA quality by PCR using a selected set of primers has also been conducted previously for DNA extracted from forest elephant dung samples (Kouakou et al., 2022), maize (Hamdan et al., 2016) and plant material (Sika et al., 2015).
None of the clusters contained more than one of the three investigated varieties. Cluster 1 comprises two samples of Harumanis (HM 70 and HM 28) that were collected from the orchard in Jabatan Pertanian Perlis and a greenhouse in INSAT, respectively. Cluster 2 includes the other three samples of Harumanis (HM 64, HM 72, and HM 78). Clusters 3 and 4 contain the non-Harumanis samples, Susu and Tong Dam mangoes, respectively, with each variety forming its own cluster. Cluster 3 contains all four samples of Susu mangoes, and Cluster 4 contains the five samples of Tong Dam mangoes.

Figure 3. UPGMA dendrogram of 14 samples based on six primers. The UPGMA dendrogram constructed based on CDDP data revealed that the fourteen samples were grouped into four major clusters, with all different varieties forming their own clade.

Table 2. trnL-F region and CDDP marker primers used in CDDP analysis.

Table 3. Bands scored for the ERF1 primer across the samples ("1" represents the presence of a band, while "0" represents the absence of a band).

Table 5. Results of the tested CDDP markers across all 14 samples used in this study, including the number of bands generated based on the available loci for each primer, such as the total number of bands (TNB), number of polymorphic bands (NPB) and

https://doi.org/10.26656/fr.2017.8(5).294 © 2024 The Authors. Published by Rynnye Lyan Resources