SSR genetic diversity assessment of popular pigeonpea varieties in Malawi reveals unique fingerprints

Background: Pigeonpea (Cajanus cajan (L.) Millsp.) is a drought tolerant legume of the Fabaceae family and the only cultivated species in the genus Cajanus. It is mainly cultivated in the semi-arid tropics of Asia and Oceania, Africa and America. In Malawi, it is grown as a source of food and income and for soil improvement in intercropping systems. However, varietal contamination due to natural outcrossing causes significant quality reduction and yield losses. In this study, 48 polymorphic SSR markers were used to assess the diversity among all pigeonpea varieties cultivated in Malawi to determine if a genetic fingerprint could be identified to distinguish the popular varieties. Results: A total of 212 alleles were observed with an average of 5.58 alleles per marker and a maximum of 14 alleles produced by CCttc019 (Marker 40). Polymorphic information content (PIC) ranged from 0.03 to 0.89 with an average of 0.30. A neighbor-joining tree produced 4 clusters. The most commonly cultivated varieties, which include released varieties and cultivated land races, were well-spread across all the clusters observed, indicating that they generally represented the genetic diversity available in Malawi, although substantial variation was evident that can still be exploited through further breeding. Conclusion: Screening of the allelic data associated with the five most popular cultivated varieties revealed 6 markers (CCB1, CCB7, Ccac035, CCttc003, Ccac026 and CCttc019) that displayed unique allelic profiles for each of the five varieties. This genetic fingerprint can potentially be applied for seed certification to confirm the genetic purity of seeds that are delivered to Malawi farmers.


Introduction
Pigeonpea, Cajanus cajan [L] Millsp. is a drought tolerant crop and one of the most important legumes grown in the tropics and sub tropics. As a rich source of protein for humans it is largely used to supplement cereal staples [1,2] and is also a good source of fodder. In the southern Great Plains of the United States of America, pigeonpea provides high quality forage when other feeds are less productive [3] and poultry performs particularly well when pigeonpea is included in their diet [4,5]. The extensive and deep root system of pigeonpea fixes atmospheric nitrogen and improves the quality and structure of soils [6] and perennial pigeonpea types provide fuel wood and material for basket weaving and roofing in African villages [7]. Due to this versatility, pigeonpea is an established and valued crop among small-scale farmers in Malawi.
Analyzing genetic relationships in species is important for revealing diversity. In addition to displaying the existing variability among cultivars [8] genetic diversity provides valuable information on target trait availability and diversity for successful breeding programs [9,10]. Molecular markers are useful tools for genetic diversity assessment of various crops and among them simple sequence repeats (SSRs) are popular since they reveal more variation e.g. in pea [11], rice [12], maize [13] and wheat [14]. This is also the case for pigeonpea [15,16,17,18,19,20]. In fact SSRs in pigeonpea are bound to be more informative as most have now been mapped in the pigeonpea genome [21,22].
Pigeonpea production in Malawi has increased from 64 to 193 kt between 2005 and 2010 making Malawi Africa's top pigeonpea producer in 2010 [23]. It is an economically important crop for small-scale farmers especially in southern Malawi as it provides food security, is highly nutritious, improves soils and serves as a valuable cash crop [24]. Traditional varieties are predominantly cultivated, leaving potential for increased production if farmers had access to improved high yielding varieties [25]. In addition, production suffers greatly from low quality seeds that result from mixing and/or contamination with pathogen propagules. Natural outcrossing, which can be as high as 45% in pigeonpea, is the major source of varietal contamination [26] and causes significant yield losses for farmers in Malawi. This is further exacerbated by the lack of effective channels to avail sufficient high quality seeds to farmers by various stakeholders [27]. Besides, since genetic purity directly affects pigeonpea yields, access to pure seeds is essential [28]. It is therefore important to determine the general level of purity of each pigeonpea variety available in Malawi and to what extent varieties become mixed. This will help determine how to maintain seed purity. This study aimed to assess the diversity of all known pigeonpea varieties cultivated in Malawi using SSR markers and to determine if the released varieties could be distinguished to establish a basis for future tracking of dissemination and adoption of improved and released varieties.

DNA extraction
Seventy-nine varieties (Table 1), representing all accessions held in the Malawi gene bank as well as released varieties of pigeonpea in Malawi, and one variety (ICP 2309) used as a control in allele scoring, were obtained with the assistance of ICRISAT-Lilongwe and planted in a screen house in Nairobi, Kenya. Two weeks after germination, DNA was extracted from leaves of 5 individual seedlings of each genotype according to the protocol described by Mace et al. [29], omitting the phenol:chloroform extraction step. For accessions that failed to germinate or produced fewer than 5 seedlings, DNA was instead extracted from the seeds using the protocol described by Sharma et al. [30], with the homogenization solution modified to contain 5 M NaCl, 2% (w/v) Sarcosyl, 100 mM Tris and 20 mM EDTA. For all samples, DNA quality was determined by electrophoresis on a 0.8% (w/v) agarose gel stained with 5 μL/100 mL GelRed® (Biotium Inc., USA), while the quantity was determined by spectrophotometry (Nanodrop© 1000, Thermo Scientific, USA). All the DNA samples were then diluted to 10 ng/μL and used for PCR.
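The dilution of each DNA sample to the 10 ng/μL working concentration follows the usual C1·V1 = C2·V2 relationship. The short Python sketch below illustrates the arithmetic only; the function name and the 100 μL final volume are assumptions made for illustration and are not taken from the protocol, while the 574 ng/μL stock is the mean leaf-DNA concentration reported in the results (0.574 μg/μL).

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul=10.0, final_volume_ul=100.0):
    """Return (stock_ul, diluent_ul) needed to reach the target concentration.

    Uses C1 * V1 = C2 * V2. The 100 uL final volume is an illustrative
    assumption, not a value stated in the study.
    """
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already more dilute than the target")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


# Mean leaf-DNA concentration reported in the study: 0.574 ug/uL = 574 ng/uL
stock_ul, water_ul = dilution_volumes(574.0)
print(f"{stock_ul:.2f} uL stock + {water_ul:.2f} uL water")
```

For concentrated leaf extracts the stock volume becomes very small, so in practice an intermediate dilution may be easier to pipette accurately.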

SSR amplification and optimization
PCR was done using 48 publicly available polymorphic markers (Table S1). All the forward primers contained a 5′-M13 tag (CACGACGTTGTAAAACGAC) to allow incorporation of a fluorochrome during the PCR process [31]. The fluorochromes used were 6-carboxyfluorescein (FAM®), NED®, VIC® and PET® (Life Technologies Corporation, USA). Each PCR reaction contained 1× PCR buffer (20 mM Tris-HCl (pH 7.6); 100 mM KCl; 0.1 mM EDTA; 1 mM DTT; 0.5% (v/v) Triton X-100; 50% (v/v) glycerol), 2 mM MgCl2, 0.16 mM dNTPs, 0.16 μM of a labeled M13-primer, 0.04 μM M13-forward primer, 0.2 μM reverse primer, 0.2 units of Taq DNA polymerase (SibEnzyme Ltd., Russia) and 30 ng of template DNA. Each reaction was made up with sterile water to a final volume of 10 μL. Reactions were performed on a thermocycler (GeneAmp PCR system 9700®, Applied Biosystems, USA) with initial denaturation at 94°C for 5 min, followed by 35 cycles of 94°C for 30 s, 59°C for 1 min and 72°C for 2 min, and a final elongation at 72°C for 20 min. For markers that did not amplify with this PCR protocol, changes were made only to the annealing temperature, first using the published annealing temperatures for the respective markers and then testing annealing temperatures calculated from the SSR primer sequences in the first step of BioMath Calculators (http://www.promega.com/a/apps/biomath/index.html?calc=tm). A group of eleven SSR primers that failed to amplify after these annealing temperature adjustments were submitted to gradient PCR using a Techne TC-5000 Thermo cycler® (Bibby Scientific Group, United Kingdom), which allocated a different annealing temperature to each column in a 96-well PCR plate. The temperatures used for this study ranged from 48.8°C in column 1 to 61.1°C in column 12, with intervals of 1.05°C increasing for each of the 12 columns. Amplification was confirmed by electrophoresis on a 2% (w/v) agarose gel stained with GelRed® (Biotium, USA) and visualized under UV light.
Depending on the efficiency of amplification, 2.5 μL-3.5 μL of 3 to 4 different amplification products were co-loaded along with the size standard, GeneScan™-500 LIZ® (Applied Biosystems, USA) and Hi-Di™ Formamide (Applied Biosystems, USA) and separated by capillary electrophoresis using an ABI Prism® 3730 Genetic analyzer (Applied Biosystems, USA) [32].

Allele calling and fingerprinting
Fragment analysis was performed with GeneMapper 4.0 (Applied Biosystems, USA) and allelic data for each marker were analyzed with PowerMarker V3.25 [33] and DARwin V.5.0.158 software [34]. A table of summary statistics including major allele frequencies, number of alleles, heterozygosity and polymorphic information content (PIC) of each marker was produced using PowerMarker software. DARwin software was used to produce a pair-wise dissimilarity matrix calculated using the following formula (Equation 1):

$$d_{ij} = 1 - \frac{1}{L}\sum_{l=1}^{L}\frac{m_l}{\pi}$$

where $d_{ij}$ was the dissimilarity between units $i$ and $j$, $L$ was the number of loci, $\pi$ was the ploidy and $m_l$ was the number of matching alleles for locus $l$. This matrix was then used to display the unweighted neighbor-joining dendrogram and the principal coordinate analysis with the same software. Allelic results were investigated to identify markers with the potential to provide a DNA fingerprint for cultivated and released pigeonpea varieties from Malawi. The ideal fingerprinting markers were considered to be those that can unambiguously discern all the varieties from one another. It was highly unlikely that a single marker would fit this criterion and more likely that a set of markers would have to be considered for this purpose. In order to identify such a set of markers, the following steps were followed. First, the allelic data for the target varieties were selected from the complete dataset and considered in isolation from the gene bank and reference data. Second, the data were screened to eliminate all the markers that had low success in PCR amplification (≥ 40% missing data), were monomorphic, or were heterogeneous (provided multiple different alleles within an accession). If a marker presented a different allele for two individuals within an accession, it was considered heterogeneous and not included. If a marker presented a different allele for only a single individual, it was considered homogeneous and included, provided that it was polymorphic across all the accessions under consideration.
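The pair-wise dissimilarity described above can be illustrated with a few lines of Python. This is only a sketch under stated assumptions: it takes the dissimilarity to be the simple-matching form d_ij = 1 − (1/L) Σ m_l/π, counts matching alleles as the multiset intersection of the two genotypes, and skips loci with missing data in either individual. The function name, the data layout and the allele sizes in the example are hypothetical and are not taken from the study's data or from DARwin itself.

```python
from collections import Counter


def simple_matching_dissimilarity(geno_i, geno_j, ploidy=2):
    """d_ij = 1 - (1/L) * sum_l (m_l / ploidy) over loci scored in both units.

    geno_i, geno_j: dict mapping marker name -> tuple of allele sizes,
    or None for missing data (pairwise deletion is an assumption here).
    """
    loci = [m for m in geno_i
            if geno_i[m] is not None and geno_j.get(m) is not None]
    if not loci:
        raise ValueError("no locus scored in both units")
    total = 0.0
    for m in loci:
        # m_l: number of alleles shared by the two genotypes at this locus
        matches = sum((Counter(geno_i[m]) & Counter(geno_j[m])).values())
        total += matches / ploidy
    return 1.0 - total / len(loci)


# Hypothetical diploid genotypes (allele sizes in bp are made up)
unit_a = {"M1": (200, 200), "M2": (150, 154), "M3": (196, 210)}
unit_b = {"M1": (200, 200), "M2": (150, 150), "M3": (196, 214)}
d_ab = simple_matching_dissimilarity(unit_a, unit_b)
print(round(d_ab, 3))
```

Computing this value for every pair of individuals gives the symmetric matrix from which a neighbor-joining tree or a principal coordinate analysis can be built.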

DNA quality and quantity
All 328 samples extracted from fresh leaves provided good quality, high molecular weight DNA. However, the 67 DNA samples extracted from seeds showed some degradation. The average concentration of DNA from leaf samples was 0.574 μg/μL, ranging from 0.066 μg/μL to 1.342 μg/μL. DNA extracted from seeds contained more contaminants and had lower concentrations, ranging from 0.096 μg/μL to 0.689 μg/μL. The mean A260/280 for DNA obtained from fresh leaves was 1.9, ranging from 1.7 to 2.1, while that of DNA obtained from seeds was 1.6, ranging from 1.6 to 2.0.

PCR amplification
Amplification was successful for 37 of the 48 SSRs tested. For the remaining 11 SSRs, PCR conditions were optimized by testing various annealing temperatures between 48.8°C and 61.1°C. This was successful for a further eight SSRs, at the annealing temperatures shown in Table 2. The other three markers (CCcttc001, Cccta003 and CCttc007) still did not amplify and were excluded from this study. Another three markers (CCttc006, CCttc012 and CCtc020) amplified in fewer than 50% of the samples. Further optimization by reducing the fluorescent dye for one of these markers (CCttc006) resulted in amplification. However, the fluorescent signals from these amplification products were too low to be detected during capillary electrophoresis, which prevented allele scoring in GeneMapper® software, and this marker was also excluded.

Allele scoring and analysis
Allelic data obtained from GeneMapper V4.0 analysis after capillary electrophoresis were analyzed using PowerMarker. This showed that one marker, CCttc008, appeared to amplify two different loci, since each sample produced two distinct alleles of different sizes, one 251 bp and the second ranging from 251 bp to 255 bp. This marker was therefore not useful for discerning genetic diversity in this germplasm. Similarly, marker Ccat006 was highly heterozygous, i.e. each sample produced two different alleles and these often differed among the individuals within an accession, which complicated the interpretation of the allelic data for this marker within the germplasm. This marker was therefore also excluded from further data analysis. DNA samples from genotypes MW 2243_3, MW 2243_4 and MW 2355_7, which were extracted from seeds, performed poorly, with only 33% of allelic data available, and these individuals were also excluded from the subsequent analysis.
Following this data curation, the data matrix obtained comprised 38 markers and 392 genotypes. This data set was analyzed by PowerMarker to produce a table of summary statistics (Table 3). Two hundred and twelve alleles were revealed, with an average of 5.58 alleles per marker and a maximum of 14 alleles produced by marker CCttc019. PIC, an indicator of how well a marker is able to distinguish the samples tested due to the diversity of alleles detected across the samples, ranged from 0.03 to 0.89 with an average of 0.30. DARwin software was used to produce a dissimilarity matrix, which was displayed as a neighbor-joining tree (Fig. 1) and also used for principal coordinate analysis (PCoA) (Fig. 2). Four major clusters were evident. Clusters I and IV comprised some of the gene bank materials as well as all five released varieties. All of the gene bank materials grouped together in Clusters II, III and IV. The landraces also grouped in Cluster III, apart from ICP 13076, which was in Cluster I.
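To make the PIC statistic concrete, the sketch below computes it from a list of observed alleles at one marker. It assumes the widely used definition PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j², where p_i is the frequency of allele i; the function name and the example allele sizes are hypothetical. The last lines simply re-derive the reported mean of 5.58 alleles per marker from the 212 alleles and 38 markers stated above.

```python
from collections import Counter
from itertools import combinations


def pic(alleles):
    """Polymorphic information content for one marker.

    alleles: flat list of allele calls (e.g. fragment sizes in bp)
    pooled over all individuals; missing calls must be removed first.
    """
    counts = Counter(alleles)
    n = sum(counts.values())
    freqs = [c / n for c in counts.values()]
    sum_sq = sum(p * p for p in freqs)
    cross = sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    return 1 - sum_sq - cross


# Hypothetical marker with two equally frequent alleles
two_allele_pic = pic([100, 100, 104, 104])
print(two_allele_pic)

# Mean alleles per marker reported in the study: 212 alleles over 38 markers
mean_alleles = round(212 / 38, 2)
print(mean_alleles)
```

A monomorphic marker gives a PIC of 0, and the value rises toward 1 as more alleles occur at similar frequencies, which is why markers with intermediate to high PIC are the natural candidates for a fingerprint.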
Screening of the allelic data associated with the varieties for which a DNA fingerprint was to be developed (the released and improved varieties ICEAP 00040, ICEAP 00020, KAT60/8, ICEAP 00068 and ICEAP 00557, and Mtawanjuni, a popular traditional cultivar) revealed that 6 markers (CCB1, CCB7, Ccac035, CCttc003, Ccac026 and CCttc019) could unambiguously discern these six varieties from one another. These markers were also highly homozygous and the amplified fragments were easy to score. The fingerprint developed with these 6 markers is presented in Table 4.
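Whether a chosen marker set "unambiguously discerns" a group of varieties can be checked mechanically by comparing their allelic profiles pair by pair. The following Python sketch shows one way to do this; the function name, the placeholder variety and marker names (V1 to V3, M1, M2) and the allele sizes are all hypothetical and do not reproduce Table 4.

```python
from itertools import combinations


def indistinguishable_pairs(profiles, markers):
    """Return pairs of varieties whose alleles are identical at every marker.

    profiles: dict variety -> dict marker -> consensus allele (or tuple).
    An empty result means the marker set separates all varieties.
    """
    pairs = []
    for a, b in combinations(sorted(profiles), 2):
        if all(profiles[a][m] == profiles[b][m] for m in markers):
            pairs.append((a, b))
    return pairs


# Hypothetical consensus alleles (bp) for three placeholder varieties
profiles = {
    "V1": {"M1": 200, "M2": 150},
    "V2": {"M1": 200, "M2": 154},
    "V3": {"M1": 204, "M2": 150},
}
print(indistinguishable_pairs(profiles, ["M1"]))
print(indistinguishable_pairs(profiles, ["M1", "M2"]))
```

In this toy example neither marker separates all three varieties on its own, but the two together do, which mirrors the reasoning that a set of markers rather than a single marker is needed.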

DNA extraction and amplification
High quality DNA was obtained in this study, even without the phenol:chloroform extraction step prescribed by Mace et al. [29]. The average total amount of DNA obtained from leaves was 55 μg, which is higher than the 7.5 μg reported for pigeonpea by Mace et al. [29]. The mean A260/280 of DNA extracted from fresh leaves was 1.9 while that of DNA extracted from seeds was 1.6. Omitting the phenol:chloroform step made the extraction both safer and cheaper, since phenol is hazardous and expensive to dispose of [35]. DNA extracted from the seeds was degraded and of lower quality than that obtained from leaf material, likely due to the polysaccharides and polyphenols present in pigeonpea seeds [1]. These compounds co-precipitated with the DNA after the addition of isopropanol/ethanol:sodium acetate [30] and inhibited Taq DNA polymerase activity in the subsequent SSR genotyping [36], which would explain the recalcitrance to PCR amplification of the 3 DNA samples that were obtained from seeds.
PCR optimization is an important step to ensure the successful amplification of the target DNA fragment. All aspects of a PCR protocol can be considered in optimization [37,38]. However, this study focused only on the annealing temperature and primer concentration. Amplification for 37 of the 48 primer pairs was successful using a fixed annealing temperature of 59°C, the standard protocol that worked well in our hands. Eight of the remaining 11 primer pairs successfully amplified the target SSR loci when the annealing temperature was adjusted, as indicated in Table 2. For markers CCttc006, CCttc012 and CCtc020, it was necessary to increase the amount of forward primer and reduce the concentration of the fluorescently labeled M13 primer in the PCR reaction mixture. However, with the reduced fluorescent label, the resultant fragments did not incorporate enough fluorescence to be detected by the laser during capillary electrophoresis. This has been experienced before in other studies that used labeled M13 sequences according to the method described by Schuelke [31]. All in all, 45 of the 48 markers tested (94%) did amplify by PCR and this was considered sufficient for this study. However, not all markers amplified equally well and another 7 (CCttc031, CCac009, CCcttc001, Cccta003, CCac036, CCtta015, CCttc007) had to be excluded from analysis. Although this represented a substantial amount of data that was excluded from the analysis, the final number of 38 good markers compared well with other published studies on genetic diversity analysis, where 30 to 40 SSR markers were typically considered adequate, e.g. in pigeonpea [19], sorghum [39], groundnut [40], wheat [41] and rice [42].

Genetic diversity
Allelic data analysis showed an average of 5.58 alleles per marker. This was higher than in other pigeonpea diversity studies published to date, which used similar markers on cultivated varieties [15,16]. Diversity in cultivated pigeonpea is generally reported to be low [1,43]. This was observed even when other types of markers were used, e.g. diversity arrays technology (DArTs) and amplified fragment length polymorphisms (AFLPs) [44,45]. In contrast, studies that included wild species reported higher PIC and allele number averages [17]. Despite the relatively low polymorphism, the markers used in this study grouped the genotypes clearly into four major groups. After ten thousand bootstrap iterations the highest bootstrap value was observed in Cluster I. Other clusters showed lower confidence levels, which could be due to the low polymorphism or limited genome coverage of the SSRs used [46].
Most of the released varieties (ICEAP 00040, ICEAP 00020, KAT60/8, ICEAP 00068 and ICEAP 00557) were developed from Kenyan and Tanzanian varieties and subsequently introduced to Malawi [47,48]. ICEAP 00068 and ICEAP 00557 are released varieties originating from Tanzania, which grouped in different clusters (I and IV respectively). Of the released varieties that were developed in Kenya (ICEAP 00040, ICEAP 00020 and KAT60/8), ICEAP 00020 and KAT60/8 grouped together in Cluster I, whereas ICEAP 00040 was in Cluster IV. All these released varieties were selected and improved for traits such as disease resistance, high  [49]. ICEAP 00040 and ICEAP 00020 are medium and long duration maturity genotypes, respectively, which are resistant to Fusarium wilt, while ICEAP 00068, of medium duration, is susceptible to wilt but is popular with farmers as it yields large grains [48]. ICPV 9145 and ICP 13076 were ICRISAT-India accessions collected from Kenya, although they grouped in different clusters (III and I, respectively). Both genotypes and ICPV 87105 have moderate resistance to Fusarium wilt [50]. The obvious genetic differences observed between ICPV 9145 and ICP 13076 in this study could indicate different sources or mechanisms of Fusarium wilt resistance in these two varieties. This should be further investigated in association mapping studies so that, if confirmed, this diversity can be exploited in future breeding programs. Although individuals of the same genotype grouped together for the most part, some were spread out among different clusters, such as ICP 9145 and ICEAP 00040. This was probably due to contamination or mixture of the seeds. Two landraces, Mtawanjuni and ICP 9145, grouped with gene bank materials. Mtawanjuni is a popular traditional cultivar in Malawi. It is a high yielding medium duration variety, which farmers prefer due to its relatively good insect resistance.
ICP 9145 is a Kenyan landrace and one of the first varieties to be introduced to Malawi in 1987. It is high yielding and has resistance to Fusarium wilt [24].
From the neighbor-joining tree (Fig. 1) the most commonly cultivated varieties in Malawi, which include the four released varieties and four landraces from the region, were spread across three of the clusters observed, indicating that they generally represented the genetic diversity available in Malawi. However, cluster III and cluster IV showed only two released varieties each and none in cluster II; thus there is substantial variation that can still be exploited through further breeding. The markers used in this study were not known to be linked to any traits of interest and this should be the next step in pigeonpea genomics to allow visualization of which varieties harbor important traits such as the differing maturity duration, number and duration of flowering times during a season, high yields, large, cream colored seeds, insect resistance (especially to pod borers) and Fusarium wilt resistance [22]. Markers linked to these traits will allow scientists to determine sources and mechanisms controlling these traits. In addition, germplasm containing these traits can be identified and the traits transferred to the best yielding and most popular varieties [10,51]. Markers linked to these traits will also allow pyramiding the traits into a select few varieties. The recent sequencing of the pigeonpea genome is a major step in this direction [52]. Natural outcrossing, due to insect pollination, is high in pigeonpea and is difficult and expensive to control in the field since plants have to be isolated under insect-proof nets if outcrossing is to be avoided [26]. In Malawi, this causes contamination of seeds in farmers' fields since many farmers plant more than one variety on their farms or have neighbors who plant different varieties whose flowering times overlap. For example, after obtaining pure Mtawanjuni seeds used in this study from breeders, other seeds of this variety were obtained randomly from different Malawi farmers. 
The seeds obtained from the farmers had five different seed coat colors and none was similar to seeds obtained from breeders. Such contamination can cause yield losses due to loss or dilution of insect or Fusarium wilt resistance and often closes market opportunities when mixtures give rise to different seed colors or seed size [28].

Genetic fingerprint
To our knowledge, there is no available software that can screen allelic data and identify markers suited for a DNA fingerprint. Therefore, this study developed a set of logical criteria to identify markers that would provide such a fingerprint. The six markers identified for the DNA fingerprint generally had low heterozygosity and intermediate to high PIC scores according to the PowerMarker results for the entire dataset (Table 3). Since the resulting numbers of markers and genotypes were both small, the fingerprint could be determined visually and is presented in Table 4. In all cases, at least four out of the five individuals presented the same alleles, except for individual ICEAP 00557/3 and marker CCac026, where missing data reduced this number to 3/5. CCttc019 was a heterozygous marker, which presented a monomorphic allele of 196 bp for all individuals across all the released varieties. This allele was excluded from the fingerprint and only the second, polymorphic alleles from all varieties were included. When the combination of alleles for each variety across the six markers is considered, this preliminary DNA fingerprint for pigeonpea can discern each variety with confidence. In a similar way, the advantage of SSR marker assays has been demonstrated in pigeonpea hybrid breeding, where they were used to ensure the genetic purity of hybrids and their parents [53,54]. However, this fingerprint needs to be further tested for robustness, repeatability and ability to discern admixtures due to cross pollination.
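The screening criteria used here (low amplification success, monomorphism and within-accession heterogeneity) are simple enough to express as a short script. The Python sketch below is one possible reading of those criteria, not the procedure actually run in this study: the function name, the data layout, the use of the most common genotype as the accession consensus, and the interpretation of "polymorphic" as at least two different consensus genotypes are all assumptions, and the example allele sizes are made up.

```python
from collections import Counter


def screen_marker(calls, max_missing=0.40, max_deviants=1):
    """Decide whether one marker is a fingerprint candidate.

    calls: dict accession -> list of genotypes, one per individual,
           with None for missing data.
    Returns (keep, reason).
    """
    all_calls = [g for inds in calls.values() for g in inds]
    missing = sum(g is None for g in all_calls) / len(all_calls)
    if missing >= max_missing:            # >= 40% missing data
        return False, "missing"
    consensus = {}
    for accession, inds in calls.items():
        scored = [g for g in inds if g is not None]
        if not scored:
            return False, "missing"
        top, n_top = Counter(scored).most_common(1)[0]
        # one deviating individual is tolerated, two or more are not
        if len(scored) - n_top > max_deviants:
            return False, "heterogeneous"
        consensus[accession] = top
    if len(set(consensus.values())) < 2:  # same genotype in every accession
        return False, "monomorphic"
    return True, "ok"


# Hypothetical calls for two accessions with five individuals each
good = {"A": [(200, 200)] * 5, "B": [(204, 204)] * 4 + [(200, 200)]}
mixed = {"A": [(200, 200)] * 3 + [(204, 204)] * 2, "B": [(204, 204)] * 5}
print(screen_marker(good), screen_marker(mixed))
```

Markers that pass such a filter would still need to be checked in combination, since the fingerprint depends on the joint allelic profile across the whole marker set.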

Conclusion
This study set out to investigate the level of genetic diversity in all cultivated Malawi pigeonpea varieties with SSR markers. While this was successful, it was observed that the level of diversity is low and further studies should exploit more new SSR markers, such as those identified from resequenced pigeonpea genomes. It is also recommended that such studies include wild pigeonpea genotypes as they could reveal a new genetic resource. It was however noted that the released varieties are representative of the genetic base available in Malawi pigeonpea.
With a small number of markers it was possible to create a genetic fingerprint of six important pigeonpea varieties in Malawi. Although this needs to be tested further, it indicates the potential of using SSR markers to discern pigeonpea varieties. Moreover, use of more polymorphic markers will increase the number of genotypes that can be included in the fingerprint. Such a fingerprint can be used to detect seed contamination, which is a major cause of low yields, and to help ensure the availability of high quality seeds for Malawi farmers.
Adequate high quality DNA was obtained from leaves despite omitting the phenol:chloroform extraction step. This, together with the advent of new methods that eliminate the use of hazardous substances during DNA extraction, shows that DNA extraction is becoming safer and cheaper.

Conflict of interest
The authors declare that they have no conflict of interest.