Construction of castor functional markers fingerprint and analysis of genetic diversity

To provide a molecular basis for identifying castor bean germplasm resources and selecting good hybrid combinations, fingerprints were constructed and genetic diversity was analyzed for 52 castor bean materials from 12 regions in 5 countries by using Functional Markers (FMs) associated with fatty acid metabolism-related genes. A total of 72 alleles were amplified by 29 pairs of FMs, with an average of 2.483 alleles per marker, and the polymorphic information content ranged from 0.103 to 0.695. Shannon's information index (I), observed heterozygosity (Ho) and expected heterozygosity (He) were 0.699, 0.188 and 0.436, respectively. The clustering results indicated that the castor germplasm could be divided into two groups at a genetic similarity coefficient of 0.59. The genetic identity among the 12 regions ranged from 0.518 to 0.917, and the genetic distance ranged from 0.087 to 0.658. A total of 5 pairs of core primers were screened to construct a digital fingerprint of the castor germplasm resources, which could distinguish all 52 germplasms. This study provides a scientific basis for screening high-quality castor germplasm resources and broadening the genetic basis of castor breeding at the molecular level.


Introduction
Castor (Ricinus communis L., 2n = 2x = 20), a dicotyledonous annual or perennial shrub belonging to the family Euphorbiaceae, is one of the world's top ten oil crops and has high social value (Costa et al., 2006). It originated in East Africa (Vavilov, 1951) and is now widely cultivated in tropical and subtropical regions (Govaerts et al., 2000), with India, Brazil and China as the main producing countries (Downey et al., 1989). The oil content of castor seeds ranks among the highest of the seed oil crops, reaching 46% to 55%. Castor oil has unique physicochemical properties and is widely used in machinery, aerospace, pharmaceuticals, paint, soap, cosmetics, lubricants, textiles, printing, dyeing, energy and environmental protection, composite materials and phytoremediation (Brigham, 1993; Ogunniyi, 2006).
Castor is regarded as an economically important oilseed crop, with demand increasing by 3-5% annually (Anjani, 2012). However, the low genetic diversity of castor and the relative lag in genetic research have left the crop without high-yield and high-quality varieties, and farmers lack enthusiasm for planting it, resulting in a significant reduction in castor planting area. The growing demand for castor products and the decreasing supply of castor have caused a serious supply-demand disequilibrium in the market. For example, castor oil and its derivatives in developed countries such as the United States are mainly acquired through import (Roetheli et al., 1991). Since 2014, the import dependence rate for castor raw materials in China, itself a main producing country, has also exceeded 90% (information from the 2014 to 2018 Annual Meetings of the Chinese Academy of Agricultural Engineering Castor Technology and Economics Branch). It is therefore of great significance to broaden the genetic background of castor germplasm, improve the utilization rate of germplasm resources, enhance selection efficiency and put castor breeding on a more scientific footing in order to cultivate high-quality, stable and resistant castor varieties.
Functional Markers (FMs) are molecular markers developed on the basis of polymorphisms in gene sequences. Different allelic variations of these genes are directly related to phenotypes (Andersen and Lübberstedt, 2003). With the accumulation of a large number of gene EST sequences in public databases, more FMs have been developed, which provide guidance for the accurate evaluation and efficient use of genetic information (Gupta and Rustgi, 2004). FMs have become a powerful means for germplasm resource evaluation, hybrid genetic purity detection and genetic diversity identification (Fjellstrom et al., 2004; Kumar et al., 2014; Liu et al., 2012; Simões et al., 2017). Several studies have reported on the application of FMs in different plant species. Fjellstrom et al. (2004) developed functional markers for three rice blast resistance genes for screening new rice varieties. A large number of FMs related to agronomic traits, processing quality and disease resistance have also been developed in wheat, where they play an important role in molecular selective breeding (Liu et al., 2012). In addition, Kumar et al. (2014) used TRAP (a type of functional marker) to study the genetic diversity of 263 chickpea accessions preserved in the USDA-ARS Western Regional Plant Introduction Station. At present, few reports are available on functional markers related to castor fatty acid metabolism. Only Simões et al. (2017) have published an article on TRAP marker development in castor, and no report has been found on systematic marker diversity analysis and fingerprint construction for castor oil-related genes. The castor fatty acid metabolism-related genes are single-copy, and it is more effective to use such single-copy genes for FM development and for the related fingerprinting and genetic diversity studies.
On the basis of the published castor genome sequence, this study randomly selected several genes related to fatty acid metabolism and screened 29 pairs of FMs for genetic diversity analysis and fingerprinting of 52 castor germplasm resources (from 12 regions in 5 countries). By identifying the differences in genetic background among castor germplasm resources, the study provides an effective way to protect castor varieties and a theoretical basis for breeding castor varieties and broadening the genetic basis of castor breeding.

Materials and Methods
The 52 materials used in the test (Tab. 1) originated from 12 regions in 5 countries and included both wild materials and varieties. The wild materials are representative materials selected from the collection of the molecular breeding laboratory of the Agricultural College of Guangdong Ocean University in South China. The remaining materials are representative varieties selected from various countries and regions. All materials were planted in 2008 in the experimental field of the Agricultural College of Guangdong Ocean University in a randomized arrangement with three replications, 5 repeats per material, a plant spacing of 0.8 m and a row spacing of 1 m.
Experimental methods
DNA was extracted from young leaves by using the modified CTAB method (Couch and Fritz, 1990). With reference to the castor EST sequences published in 2010, primers for castor fatty acid metabolism-related genes were designed and then synthesized by Shanghai Sheng Gong Co., Ltd. From the developed primers, 29 pairs that amplified well and gave clear and stable bands were used to PCR-amplify the genomic DNA of the 52 castor materials. PCR amplification was performed in a 20 μl reaction system: 1.5 μl template DNA (20 ng/μl), 0.4 μl Taq enzyme (3 U/μl), 2 μl 10 × PCR Buffer, 0.2 μl dNTP (10 mmol/L), 2 μl primer (2 μmol/L) and 13.9 μl double distilled water. Under optimized experimental conditions, the reaction procedure was as follows: initial denaturation at 94°C for 5 min; 35 cycles of denaturation at 94°C for 30 s, annealing at 55°C for 50 s and extension at 72°C for 1 min; and a final extension at 72°C for 5 min. The amplified products were separated and detected by 6% non-denaturing polyacrylamide gel electrophoresis.
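The reaction recipe above can be checked and scaled with a few lines of arithmetic. The following is only an illustrative sketch: the component volumes are those listed in the text, while the function name and the 10% pipetting overage are assumptions that do not come from the study.

```python
# Per-reaction volumes (in microlitres) as listed for the 20 ul PCR system.
REACTION_UL = {
    "template DNA (20 ng/ul)": 1.5,
    "Taq enzyme (3 U/ul)": 0.4,
    "10x PCR buffer": 2.0,
    "dNTP (10 mmol/L)": 0.2,
    "primer (2 umol/L)": 2.0,
    "double distilled water": 13.9,
}


def master_mix(n_reactions, overage=0.10):
    """Scale every component except the template DNA for n reactions.

    overage is a hypothetical allowance for pipetting loss (an assumption,
    not a value reported in the text).
    """
    factor = n_reactions * (1.0 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in REACTION_UL.items()
        if not name.startswith("template")
    }


# The listed components should add up to the stated 20 ul reaction volume.
total_ul = round(sum(REACTION_UL.values()), 1)

# Master mix for one primer pair across the 52 materials.
mix_52 = master_mix(52)
```

The template DNA is left out of the master mix because it differs for each material and is added to each tube separately.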

Data analysis
Clear amplicons were scored as "1" when a band was present and "0" when it was absent, and a "0, 1" matrix was thus established. The number of observed alleles (Na), number of effective alleles (Ne), observed heterozygosity (Ho), Shannon's information index (I) and the genetic distance and genetic identity among the 12 regional groups were calculated by using POPGENE version 1.32 (Tehrani et al., 1998). PIC values and genotypes were calculated by using PowerMarker version 3.25 (Liu and Muse, 2005). The genetic similarity coefficients were calculated by the method of Nei and Li (1979). The similarity matrix was clustered by the UPGMA method (Sneath and Sokal, 1973), and the NTSYS software was used to construct the clustering diagram (Rohlf, 1993).
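As an illustration of the similarity step, the Nei and Li (1979) coefficient for two "0, 1" band profiles is twice the number of shared bands divided by the total number of bands in both profiles. The sketch below assumes this standard form of the coefficient; the function names and the three band profiles are hypothetical and are not data from this study.

```python
def nei_li_similarity(a, b):
    """Nei and Li coefficient: 2 * shared bands / (bands in a + bands in b)."""
    if len(a) != len(b):
        raise ValueError("band profiles must have the same length")
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    total = sum(a) + sum(b)
    return 2.0 * shared / total if total else 0.0


def similarity_matrix(profiles):
    """Pairwise similarity for every ordered pair of materials."""
    names = list(profiles)
    return {
        (p, q): nei_li_similarity(profiles[p], profiles[q])
        for p in names
        for q in names
    }


# Hypothetical 0/1 band profiles for three materials (illustration only).
profiles = {
    "acc1": [1, 0, 1, 1, 0, 1],
    "acc2": [1, 0, 1, 0, 0, 1],
    "acc3": [0, 1, 1, 0, 1, 0],
}
sim = similarity_matrix(profiles)
```

Here acc1 and acc2 share three bands out of seven scored in total, so their similarity is 6/7, while acc1 and acc3 share only one band.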

SSR marker polymorphism analysis
A total of 29 pairs of FM primers were used for polymorphism detection in the 52 castor materials. The amplification results (Tab. 2) showed that the 29 primer pairs amplified 72 alleles in the 52 germplasm resources. The number of alleles amplified per primer pair ranged from 1 to 4, with an average of 2.48 per locus. The average number of effective alleles (Ne) was 1.951, with the lowest value in OM19 (1.000) and the highest in OM3 (3.113). Shannon's information index (I) ranged from 0.000 to 1.240, with an average of 0.687. The observed heterozygosity (Ho) of the primers had a maximum of 0.941 and a minimum of 0. The average polymorphism information content was 0.397, ranging from 0.103 to 0.695.
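The per-locus statistics reported above can be illustrated from allele frequencies. The sketch below assumes the standard textbook definitions (Ne = 1/Σp², I = -Σp ln p, He = 1 - Σp², and the PIC formula of Botstein et al.); POPGENE and PowerMarker may apply sample-size corrections that are not reproduced here, and the example frequencies are hypothetical.

```python
import math


def effective_alleles(freqs):
    """Ne = 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in freqs)


def shannon_index(freqs):
    """I = -sum(p_i * ln p_i), skipping alleles with zero frequency."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2), without small-sample correction."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygous = sum(p * p for p in freqs)
    cross = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1.0 - homozygous - cross


# Hypothetical allele frequencies (not taken from Tab. 2).
monomorphic = [1.0]        # a single allele
two_alleles = [0.5, 0.5]   # two equally frequent alleles

ne_mono = effective_alleles(monomorphic)
i_mono = shannon_index(monomorphic)
ne_two = effective_alleles(two_alleles)
pic_two = pic(two_alleles)
```

Under these definitions a locus with a single allele gives Ne = 1 and I = 0, which matches the lowest values reported above (Ne = 1.000 for OM19 and a minimum I of 0.000).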

Construction of DNA fingerprinting
Based on the amplification results of the 29 pairs of FM primers, the number of genotypes detected by each primer pair was analyzed together with the primer diversity (Tab. 2). Taking into account the PIC of each primer pair, the number of alleles and the difficulty of scoring the bands, the primer pairs that detected the most genotypes were selected to distinguish the 52 castor materials. As shown in Tab. 2, each primer pair alone distinguished only 2 to 7 genotypes, so no single primer pair could separate all materials. All germplasm resources were therefore distinguished by the combination of the primer pairs OM3, ACC3, PEPC10, PEPC6 and ACC12, which yielded 52 genotypes across the tested materials. Following the primer order OM3, ACC3, PEPC10, PEPC6 and ACC12, the amplified fragment sizes of each variety were converted into binary codes, which were then concatenated into a single string of digits. A unique code, namely the fingerprint, was thus obtained for each variety (Tab. 3). The 5 pairs of highly polymorphic primers obtained in this study provide a convenient and fast way of recording DNA fingerprints at the molecular level for the identification of castor materials.
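The encoding step can be sketched as follows. This is a minimal illustration under stated assumptions: each primer pair contributes one 0/1 digit per scored band, and the digits are concatenated in the primer order given above. The number of bands per primer, the material names and the band scores are hypothetical and do not reproduce Tab. 3.

```python
PRIMER_ORDER = ["OM3", "ACC3", "PEPC10", "PEPC6", "ACC12"]


def fingerprint(bands, primer_order=PRIMER_ORDER):
    """Concatenate the per-primer 0/1 band scores into one code string."""
    return "".join(
        "".join(str(bit) for bit in bands[primer]) for primer in primer_order
    )


def all_unique(codes):
    """True if every material has a different fingerprint."""
    return len(set(codes.values())) == len(codes)


# Hypothetical band scores for two materials (illustration only).
scores = {
    "material_A": {
        "OM3": [1, 0, 1, 0], "ACC3": [1, 0, 0], "PEPC10": [0, 1, 1],
        "PEPC6": [1, 0], "ACC12": [0, 1, 0],
    },
    "material_B": {
        "OM3": [0, 1, 1, 0], "ACC3": [1, 0, 0], "PEPC10": [0, 1, 1],
        "PEPC6": [1, 0], "ACC12": [0, 1, 0],
    },
}
codes = {name: fingerprint(bands) for name, bands in scores.items()}
distinct = all_unique(codes)
```

In this example the two materials differ only at OM3, which is already enough to give them different codes; a check such as all_unique is one way to confirm that a primer combination separates every material in a collection.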

Cluster analysis
The cluster analysis was performed by using the UPGMA method (unweighted pair group method with arithmetic mean) to construct the clustering tree (Fig. 1). The results showed that the genetic similarity coefficients of the 52 materials ranged from 0.59 to 0.94. The materials were divided into two large groups at a genetic similarity coefficient of 0.59 and into several smaller groups at a coefficient of 0.67. Not all materials from the same region were clustered into the same group, so the genetic grouping of materials was not determined entirely by region. Nonetheless, there was an obvious trend for the groups to be closely related to the regions. The materials from Taiwan and Malaysia formed separate groups, namely Groups A and D, and the materials from Hainan were distributed in small subgroups of Groups G and J. Groups C, F, G and H include all materials from the neighboring provinces of Guangdong and Guangxi.
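For readers unfamiliar with UPGMA, the procedure repeatedly merges the two closest clusters and sets the distance from the merged cluster to every other cluster to the size-weighted average of the two old distances. The sketch below is a small generic implementation, not the NTSYS routine used in the study; it works on distances (for example 1 minus the similarity coefficient, which is an assumption here), and the four labels and their distances are hypothetical.

```python
from itertools import combinations


def upgma(dist):
    """UPGMA clustering.

    dist maps each unordered pair of labels (a, b) to a distance.
    Returns the merges in order as (cluster_1, cluster_2, distance),
    where each cluster is a tuple of the original labels.
    """
    # Every cluster is a tuple of leaf labels; start from singletons.
    d = {frozenset(((a,), (b,))): v for (a, b), v in dist.items()}
    clusters = {c for pair in d for c in pair}
    merges = []
    while len(clusters) > 1:
        # Find the closest pair among the current clusters.
        x, y = min(
            combinations(sorted(clusters), 2),
            key=lambda pair: d[frozenset(pair)],
        )
        merges.append((x, y, d[frozenset((x, y))]))
        clusters -= {x, y}
        merged = tuple(sorted(x + y))
        # Average-linkage update, weighted by the sizes of x and y.
        for z in clusters:
            d[frozenset((merged, z))] = (
                len(x) * d[frozenset((x, z))] + len(y) * d[frozenset((y, z))]
            ) / (len(x) + len(y))
        clusters.add(merged)
    return merges


# Hypothetical distances between four materials (illustration only).
distances = {
    ("A", "B"): 0.10, ("A", "C"): 0.40, ("B", "C"): 0.50,
    ("A", "D"): 0.70, ("B", "D"): 0.80, ("C", "D"): 0.60,
}
merges = upgma(distances)
```

With these values A and B join first at 0.10, C joins them at the average distance 0.45, and D joins last at 0.70. Cutting such a tree at a chosen level is what produces groupings like those described for similarity coefficients of 0.59 and 0.67.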

Population distance, genetic identity and UPGMA cluster analysis of different populations
Genetic distance and genetic identity were compared among germplasm from 12 regions in 5 countries, including China, Malaysia and Pakistan. The results revealed that the genetic distance among the 12 regions ranged from 0.087 to 0.658 (Tab. 4). The highest genetic distances, between 0.594 and 0.658, were recorded between the materials from Taiwan and those from Malaysia and Hainan. The genetic distances between materials from Shandong and France (0.087) and between materials from Guangxi and Guangdong (0.113) were small. The genetic identity among the 12 castor populations ranged from 0.518 to 0.917. The materials from Taiwan had the lowest genetic identity with those from Malaysia and Hainan (0.518, 0.552), whereas the materials from France and Shandong (0.917) and from Guangxi and Guangdong (0.893) were the most closely related. The higher the genetic identity between two populations, the more frequent the genetic exchange between them is likely to be. The UPGMA clustering results for the 12 populations (Fig. 2) were consistent with the results of the cluster analysis of the individual materials (Fig. 1).
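The reported distance and identity values are consistent with Nei's standard relation D = -ln(I) between genetic distance D and genetic identity I. The text does not state which measure was computed, so this relation is an assumption here; the short check below simply applies it to the identity values quoted above and compares the results with the quoted distances.

```python
import math


def nei_distance(identity):
    """Nei's standard genetic distance, D = -ln(I)."""
    return -math.log(identity)


# (genetic identity, genetic distance) pairs quoted in the text.
reported = {
    "France vs Shandong": (0.917, 0.087),
    "Guangxi vs Guangdong": (0.893, 0.113),
    "lowest identity, highest distance": (0.518, 0.658),
    "second value for Taiwan comparisons": (0.552, 0.594),
}

recomputed = {
    label: round(nei_distance(identity), 3)
    for label, (identity, _) in reported.items()
}
```

Each recomputed distance agrees with the quoted value to three decimal places, which suggests that the two sets of figures describe the same population pairs from opposite directions.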

Discussion
DNA fingerprinting is a powerful tool for testing the authenticity and purity of varieties and offers the advantages of speed and accuracy. It has been used for resource diversity and purity identification in many species, including corn (Wang et al., 2011) and watermelon (Zhang et al., 2012). The development of molecular marker technology has enriched the methods available for variety identification. FMs are molecular markers developed on the basis of polymorphisms in gene sequences, and different allelic variations of these genes are directly related to the phenotype (Andersen and Lübberstedt, 2003). The fatty acid metabolism-related genes in castor are single-copy, so FMs associated with castor oil content are among the ideal markers for identifying castor materials. The 52 experimental materials involved in this study come from a wide range of sources, covering 12 regions in 5 countries, and 21 of them are wild materials from South China. Studies have pointed out that Chinese castor materials are likely to have originated in South China and that the genetic diversity of wild materials is higher than that of cultivated materials (Fan et al., 2019). The genetic diversity of wild materials from South China is slightly higher than that of materials from other places. Therefore, the castor materials used in this study have rich genetic backgrounds and can represent the majority of castor materials. Since the five selected primer pairs can completely separate the 52 germplasms of this experiment, they are likely to distinguish the majority of castor materials known so far. However, as germplasm resources continue to be enriched, the discriminating power of these five primer pairs may decrease, so primers may need to be added or replaced according to specific conditions. Cluster analysis of the 52 castor germplasm resources was carried out by using FM molecular markers.
At a genetic similarity coefficient of 0.59, the 52 materials were divided into 2 groups, and at a coefficient of 0.67 they were further divided into several groups. Although not all materials from the same region were clustered in the same group, there was an obvious trend for the groups to be related to the regions to a great extent, which is consistent with the results of previously reported genetic diversity analyses of castor. Allan et al. (2008) used 16 pairs of AFLP markers and 9 pairs of SSR markers for genetic diversity analysis of 200 castor materials from 41 regions in 35 countries on 5 continents. Senthilvel et al. (2017) used 45 SSR markers for genetic analysis of 144 castor inbred lines. Kallamadi et al. (2015) used RAPD, ISSR and SCoT markers to analyze 35 castor materials from 7 regions of the world. Recently, Agyenim-Boateng et al. (2019) used SRAP markers to analyze the genetic diversity of 473 castor bean materials from South China.
The genetic distance and genetic identity of materials from the 12 regions showed that the materials from Taiwan are genetically distant from those from Hainan and Malaysia. This may be attributed to differences in geographical location, which significantly reduced inter-regional genetic exchange and separated these materials, as identified by the cluster analysis (Figs. 1 and 2). French materials are closely related to Shandong materials, perhaps because breeders in one region used parent materials from the other. Among the materials from Guangdong, Guangxi and Hainan, the Guangdong and Hainan materials are the most distantly related, followed by Guangxi and Hainan, and then Guangdong and Guangxi, which is consistent with the findings of Wang et al. (2019).

Conclusion
As shown in this study, with the application of fatty acid metabolism-related functional marker technology, five pairs of primers, OM3, ACC3, PEPC10, PEPC6 and ACC12, can be used to distinguish the 52 castor materials from 12 regions in 5 countries. As these 52 castor materials come from a wide range of sources, they can represent the majority of castor materials; it can therefore be expected that the five primer pairs selected in this study can distinguish most of the castor germplasm resources in the world. This study provides a technical basis for identifying castor materials and protecting germplasm resources.