The Development of EST-SSR Markers from Transcriptome Sequencing and Genetic Diversity of Twenty Genotypes with High Yields of Cornus wilsoniana, an Important Wood Oil Plant

Oil plants are used not only as essential nutrients in food but also broadly as ingredients of industrial products. Cornus wilsoniana Wanger is an important native woody oil plant in China, characterized by high yield and high oil content. Its fruit oil can be used as a balanced dietary oil; it also has a hypolipidemic function and helps to overcome EFA deficiency. Genomic information is currently not available for Cornus wilsoniana, which hampers its genetic improvement. In this study, 8713 EST-SSRs were identified from the transcriptome sequencing of Cornus wilsoniana. Most of these EST-SSRs were composed of dimer and trimer repeats. The AG/CT motif was the most common dimeric EST-SSR motif (53.5%), whereas CG/CG microsatellites (0.44%) were present only at low abundance. Among the trimeric microsatellites, AAG/CTT (5.53%) and ATC/ATG (4.27%) were the most common motifs. There were no obvious dominant motifs among the tetra-, penta-, and hexanucleotide motifs. Fifteen polymorphic EST-SSR primer pairs were developed and used for the genetic diversity analysis of twenty genotypes of Cornus wilsoniana with high fruit output. The number of alleles per locus ranged from 2 to 6 with an average of 2.87, and the PIC value ranged from 0.05 to 0.58 with an average of 0.36. The average genetic diversity over all SSR loci for the 20 genotypes was 0.427, ranging from 0.049 to 0.659. All the loci were polymorphic and clearly distinguished the genotypes. Cluster analyses (NJ tree, UPGMA) identified a similar pattern of variation. The 15 developed EST-SSRs are informative, codominant and reliable, and could be applied in the improvement programs of Cornus wilsoniana in the future.
Keywords: Woody oil plant; Genetic diversity; Polymorphism; Cornus wilsoniana
Molecular Plant Breeding 2021, Vol.12, No.5, 1-10 http://genbreedpublisher.com/index.php/mpb

Background
Oil plants are widespread in nature and are used not only as essential nutrients in food but also broadly as ingredients of industrial products. There are approximately 300 million square kilometers of forest land throughout China, where oil plants comprise 1 554 species belonging to 697 genera and 151 families (He et al., 2013a; He et al., 2013b). Cornus wilsoniana Wanger, a member of Cornaceae, is one of the important native woody oil plants in China, characterized by high yield and high oil content. It is distributed in forest land at elevations of 100~1000 m, with an average temperature of 18°C~25°C and an annual precipitation of 1 000~1 570 mm. It can maintain maximum productivity for over 50 years, and its life span exceeds 200 years. A mature tree can produce an average of 50 kg of dry fruits per year (Li et al., 2015). The oil content of dry fruits is 33%~36%, and the oil has been used as edible oil for over 100 years. In Chinese folk recipes, consumption of this oil is believed to prevent hyperlipidemia (Fu et al., 2012). Our previous study demonstrated that the oil has a strong hypolipidemic effect, which is attributed to its very high content of PUFA (38.86%), especially linoleic acid and γ-linolenic acid. The activities of the two n-6 PUFA were possibly related to their high affinity to PPARδ. In addition, the SFA/MUFA/PUFA ratio and the EPA content of the oil are both favorable. Therefore, the fruit oil of Cornus wilsoniana could be used as a balanced dietary oil, not only having a hypolipidemic function but also helping to overcome EFA deficiency (Fu et al., 2012). Cornus wilsoniana was approved as a novel source of woody plant food oil by the Chinese government in 2013. The General Office of the State Council of China promulgated the "Opinion on accelerating the development of woody oil industry" (No. 2014-68) at the end of 2014. These measures have greatly raised the interest of private oil production companies and local governments.
According to our incomplete estimates, the annual seed output of wild Cornus wilsoniana plants amounts to 600 000 kg in China. This cannot meet the demands of the woody plant oil market. Large numbers of grafted seedlings with high seed yield are needed for afforestation. Unlike other oil crops such as soybean, maize, rapeseed and sunflower, few genetically improved varieties of Cornus wilsoniana are available (Huang et al., 2016). Hence, the selection and deployment of genetically improved materials have been put on the agenda. Large-scale plus tree surveys and the selection of superior individuals have been carried out across the natural distribution of Cornus wilsoniana based on phenotypic characteristics (Lin et al., 2010; Zhu et al., 2012; Zhang et al., 2016).
Genetic improvement of forest tree species is usually based on recurrent selection for general combining ability. That is, the frequency of desirable genes in the population is increased progressively through cycles of selection and crossing (Shelbourne et al., 1989). Because this process involves sexual reproduction for progeny testing, seed has been the traditional output from genetic improvement programs. However, traditional genetic improvement of forest tree species is a time-, resource- and cost-intensive activity, which greatly restricts the genetic improvement process. Recently, El-Kassaby introduced a breeding strategy called "breeding without breeding" (BWB), which has proven highly convenient for tree breeding. The efficiency of this strategy has been evaluated by progeny testing, parental selection, and construction of pedigrees. The BWB strategy has been demonstrated in a number of tree species and has been used extensively in different areas of forest tree breeding, including phylogenetic analysis, mating systems, estimation of genetic parameters and breeding values, and spatial variation (Yuan et al., 2016). Selecting superior individuals directly from open-pollinated progeny plantations, coupled with identifying the phylogenetic relationships and genetic diversity of the selected materials using molecular markers, could shorten the breeding cycle.
Microsatellites, also called simple sequence repeats (SSRs), are short repeat units, usually 2 to 8 nucleotides in length, that occur widely in the genomes of eukaryotes and prokaryotes. SSR markers are popular in population genetics, forensic studies, and paternity analyses because they are informative (multi-allelic, meaning that they have more than two forms of a gene), codominant (heterozygous individuals can be distinguished from homozygous individuals), and reliable (Smith et al., 1997; Jones and Ardren, 2003). The development of SSR primers is the prerequisite for any study based on SSR markers. However, no genomic information is available for Cornus wilsoniana. In our previous study, 11 SSR primer pairs from Cornus controversa (Yoshihiro et al., 2010), which belongs to the same genus as Cornus wilsoniana (Cornus, Cornaceae), were used to amplify DNA samples from Cornus wilsoniana. None of the 11 SSRs developed for Cornus controversa produced polymorphic bands. To help bridge this knowledge gap, our objective was to develop a set of EST-SSR markers based on transcriptome sequencing of Cornus wilsoniana and to examine their polymorphisms. These markers then allowed us, in this paper, to analyze the genetic diversity and molecular phylogeny of twenty genotypes selected from naturally open-pollinated populations for their fruit output.

Materials and Methods

Plant materials
Six samples for high-throughput transcriptome sequencing were collected from fruits of two genotypes at three developmental stages in the germplasm bank of Cornus wilsoniana at the Hunan Academy of Forestry. Twenty samples for SSR amplification were collected from fresh leaves of grafted seedlings in the germplasm banks of Cornus wilsoniana at Longshan forest farm, Lechang city, Guangdong province; Xishan forest farm, Chongqing city; and Yangkou national forest farm, Shunchang county, Fujian province.

DNA extraction
Genomic DNA of the above samples was extracted using the QIAamp DNA Mini Kit.

High-throughput transcriptome sequencing
The six samples were delivered to Shanghai GR Biotechnology Co. Ltd for transcriptome sequencing using the RNA-Seq technique.

Identification of EST-SSRs
EST-SSRs were searched using MISA (http://pgrc.ipk-gatersleben.de/misa/), with a unit size of 3~5 and repeats >5, in the previous assembly of 271 Mb reads derived from the transcriptome sequences of Cornus wilsoniana.
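To make the search step concrete, the following is a minimal Python sketch of how perfect microsatellites can be located in a sequence with a regular expression. It is not MISA and does not reproduce its output; the function name `find_ssrs`, the motif-length range (2 to 6) and the minimum repeat number (5) are illustrative assumptions rather than the settings of this study.

```python
import re

def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=5):
    """Return (start, motif, repeat_count) for perfect SSRs in seq.

    Hypothetical helper: thresholds are illustrative assumptions,
    not the MISA configuration used in the paper.
    """
    seq = seq.upper()
    hits = []
    for unit in range(min_unit, max_unit + 1):
        # A motif of length `unit` followed by at least min_repeats - 1 copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "AGAG"), so each locus is reported once.
            if any(motif == motif[:k] * (unit // k)
                   for k in range(1, unit) if unit % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit))
    return sorted(hits)

# Toy sequence (made up): (AG)6 and (AAG)5 embedded in flanking bases.
toy = "TTC" + "AG" * 6 + "CCAT" + "AAG" * 5 + "GT"
print(find_ssrs(toy))
```

Positions are 0-based. A production tool would additionally handle compound and interrupted repeats, which this sketch ignores.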

Primer design and screening
Primers flanking the EST-SSRs were designed using Premier 3.0 software (Premier Biosoft International, Palo Alto, CA). A total of 84 primer pairs met the requirements for expected product size (100~300 bp) and Tm value (minimum 55°C, optimum 57°C). Primer screening included two steps: screening for unique amplification (unicity) and screening for polymorphism.

PCR amplification
The PCR reactions were carried out in a total volume of 20 μL, which contained 1 μL template DNA (approximately 30 ng), 0.2 μL of each primer, 10 μL Mix (Taq polymerase, dNTPs and PCR buffer), and 8.8 μL sterile distilled water. The PCR amplification was performed on a PCR 310A (Hangzhou, China) with a program of 94°C for 5 min; 35 cycles of 94°C for 30 s, 57°C for 30 s, 72°C for 20 s, and a final extension for 10 min at 72°C. The PCR products were separated and detected by 3730xl DNA Analyzer (ABI, U.S.A.).

Data analysis
Band data from EST-SSR amplification were converted from fragment sizes to a binary 0/1 matrix, scored as present (1) or absent (0). The diversity level of the gene loci was evaluated with the polymorphic information content (PIC, Botstein et al., 1980). The PIC provides an estimate of the discriminatory power of a locus by taking into account the number and the relative frequencies of the alleles. PIC values vary from 0 (monomorphic) to 1 (highly discriminative, with many alleles present at equal frequencies). PIC, allele numbers and gene diversity of each EST-SSR were calculated using MolKin v3.0 (http://www.ucm.es/info/prodanim/html/JP_Web.htm).
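For reference, with allele frequencies p_i at a locus, the PIC of Botstein et al. (1980) is conventionally written as PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², and Nei's gene diversity as H = 1 - Σ p_i². The short Python sketch below evaluates both; the function names and the example frequencies are hypothetical and are not taken from MolKin or from the data of this study.

```python
from itertools import combinations

def gene_diversity(freqs):
    """Nei's gene diversity (expected heterozygosity): 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)

def pic(freqs):
    """Polymorphic information content (Botstein et al., 1980)."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term

# Hypothetical biallelic locus: one rare allele copy among the
# 40 allele copies carried by 20 diploid genotypes.
rare = [39 / 40, 1 / 40]
print(round(gene_diversity(rare), 3), round(pic(rare), 3))

# Two equally frequent alleles give the biallelic maximum.
print(gene_diversity([0.5, 0.5]), pic([0.5, 0.5]))
```

With a single rare allele copy the two statistics are about 0.049 and 0.048, which illustrates how a locus with two alleles can still be nearly uninformative.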
Genetic distances among the 20 genotypes of Cornus wilsoniana based on the EST-SSRs were determined using PowerMarker v3.25 (Liu and Muse, 2005). Dendrograms were generated by UPGMA (unweighted pair-group method with arithmetic means) cluster analysis and by the neighbor-joining method as implemented in PowerMarker v3.25, and the trees were viewed using FigTree v1.4.3. Population assignment analysis of the 20 genotypes was completed using GenAlEx 6.503 (http://biology.anu.edu.au/GenAlEx/).
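As an illustration of the clustering step, the following Python sketch implements plain UPGMA on a small pairwise distance table. It is a teaching sketch only, not the PowerMarker implementation; the labels `g1` to `g4` and their distances are invented for the example.

```python
from itertools import combinations

def upgma(labels, distances):
    """Cluster labels by UPGMA.

    distances maps (a, b) pairs to a distance. Returns the tree as nested
    tuples plus the list of merges as (left, right, node_height).
    """
    dist = {frozenset(pair): d for pair, d in distances.items()}
    sizes = {label: 1 for label in labels}  # cluster -> number of leaves
    merges = []
    while len(sizes) > 1:
        # Join the two closest clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: dist[frozenset(p)])
        height = dist[frozenset((a, b))] / 2
        na, nb = sizes.pop(a), sizes.pop(b)
        new = (a, b)
        # Distance to the new cluster is the size-weighted mean of the old ones.
        for c in sizes:
            dist[frozenset((new, c))] = (
                na * dist[frozenset((a, c))] + nb * dist[frozenset((b, c))]
            ) / (na + nb)
        sizes[new] = na + nb
        merges.append((a, b, height))
    return next(iter(sizes)), merges

# Made-up distances among four hypothetical genotypes.
toy = {
    ("g1", "g2"): 0.1, ("g3", "g4"): 0.2,
    ("g1", "g3"): 0.6, ("g1", "g4"): 0.6,
    ("g2", "g3"): 0.6, ("g2", "g4"): 0.6,
}
tree, merges = upgma(["g1", "g2", "g3", "g4"], toy)
print(tree)
```

UPGMA assumes a constant rate of divergence and yields a rooted tree, whereas neighbor-joining does not make that assumption and yields an unrooted tree, which is one reason to compare the two.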

Results

Characterization of the EST-SSR motif distribution in the transcriptome of Cornus wilsoniana
In total, 126,173 transcripts and 78,496 singletons were obtained from a previous assembly of 217 Mb reads derived from the Cornus wilsoniana transcriptome sequences of the fruits. A total of 174,326 sequences from the transcriptome sequencing were examined for EST-SSR development. The total size of the examined sequences was 49,770,105 bp. The total number of identified EST-SSRs was 8713. The number of sequences containing more than one EST-SSR was 597. The average distance between EST-SSRs was approximately 15.95 kb. The EST-SSR unit sizes were not evenly distributed. Most of these EST-SSRs were composed of dimer and trimer repeats, and the dinucleotides were the dominant motifs. Specifically, the abundances of di-, tri-, tetra-, penta-, and hexanucleotide motifs among these EST-SSRs were 6,635 (76.15%), 1,970 (22.61%), 78 (0.90%), 15 (0.17%), and 15 (0.17%), respectively (Table 1). Taking sequence complementarity into account, the frequencies of the classified repeat types of the identified EST-SSR motifs are summarized in Table 2. The AG/CT motif was the most common dimeric EST-SSR motif (53.5%), whereas CG/CG microsatellites (0.44%) were present only at low abundance. Among the trimeric microsatellites, AAG/CTT (5.53%) and ATC/ATG (4.27%) were the most common motifs. There were no obvious dominant motifs among the tetra-, penta-, and hexanucleotide motifs.
Table 1 Characterization of 8,713 identified EST-SSR motifs (columns: repeat motif; repeat number, 5 to 13; total, %)
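Grouping motifs "considering sequence complementarity" means that a motif, its cyclic rotations and its reverse complement are counted as one class (for example, GA, AG, TC and CT all belong to AG/CT). The Python sketch below shows one way to derive such class labels and recomputes the unit-size shares from the counts given above; the function names are hypothetical and the labelling rule is an assumption about how the classes in Table 2 were formed.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def rotations(motif):
    """All cyclic rotations of a motif, e.g. AAG -> AAG, AGA, GAA."""
    return [motif[i:] + motif[:i] for i in range(len(motif))]

def motif_class(motif):
    """Label shared by a motif, its rotations and its reverse complement."""
    motif = motif.upper()
    reverse_complement = motif.translate(COMPLEMENT)[::-1]
    forward = min(rotations(motif))
    reverse = min(rotations(reverse_complement))
    return "/".join(sorted((forward, reverse)))

for m in ("GA", "TC", "CG", "TTC", "ATG"):
    print(m, "->", motif_class(m))

# Shares of each unit size, using the counts reported in the text.
counts = {"di": 6635, "tri": 1970, "tetra": 78, "penta": 15, "hexa": 15}
total = sum(counts.values())
shares = {unit: round(100 * n / total, 2) for unit, n in counts.items()}
print(total, shares)
```

The recomputed shares agree with the percentages reported for the five unit sizes, and the counts sum to the 8713 EST-SSRs identified.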

Overall EST-SSR diversity
Twenty genotypes of Cornus wilsoniana from two provinces were evaluated using 15 EST-SSR markers. A total of 43 alleles were detected at the 15 loci (Table 3). A wide range of allelic variants was observed for each locus. The number of alleles per locus ranged from 2 (SW24, SW33, and SW47) to 6 (SW09), with an average of 2.87 alleles across the 15 loci. The PIC measures the value of a marker for detecting polymorphism within a population; it depends on the number of detectable alleles and the distribution of their frequencies, and has been shown to be a general measure of how informative a marker is (Guo and Elston, 1999): the higher the PIC value, the more informative the marker. The PIC values of the 15 primers exhibiting polymorphisms among the 20 individuals ranged from a minimum of 0.05 (SW47) to a maximum of 0.58 (SW48), with an average of 0.36, suggesting that the EST-SSR markers developed had a moderate level of polymorphism. According to Nei's gene diversity (Nei, 1973), the highest genetic diversity value (0.659) was observed at locus SW48, the lowest (0.049) at locus SW47, and the mean diversity was 0.427 (Table 3).

Genetic relationship between 20 genotypes of Cornus wilsoniana according to EST-SSR profiles
Genetic distance reflects the genetic relationships among materials. Simple sequence repeats (SSRs) are high-resolution markers that can identify different individuals within the same species. Combining phenotypic selection, genetic distance-based phylogenetic analysis of the selected individuals using SSR markers, and field deployment based on phylogenetic relationships would simplify breeding activities and decrease inbreeding. A phylogenetic dendrogram of the 20 genotypes of Cornus wilsoniana with high fruit output from two provinces was constructed based on the similarity matrix derived from the profiles of the 15 SSR loci. The 20 individuals were divided into four main cluster groups (Figure 1). Group A included three genotypes (#1, #4, and #15), Group B included four genotypes (#2, #5, #6, and #7), Group C included six genotypes (#9, #10, #11, #12, #13, and #18), and Group D included seven genotypes (#3, #8, #14, #16, #17, #19, and #20). Within these four cluster groups, it was interesting to find that individuals from the same collection site were grouped into one clade: #1 and #15 in Group A; #2 and #7, and #5 and #6 in Group B; #9 and #13, and #10 and #11 in Group C; and #16, #17, #19 and #20 in Group D. In this study, the SSRs clearly illustrated the well-documented similarities reflected by specific alleles at all the loci studied. The genetic distance-based results observed in the unrooted neighbor-joining tree revealed four major groups and agreed with the genetic similarity analysis using UPGMA (Figure 2).

Discussion
SSRs have already been used broadly for a variety of purposes, such as genetic diversity analysis. SSRs can be found in either the coding or the non-coding regions of the genome, whereas EST-SSRs score only the expressed regions of the genome. Compared with genomic SSRs, EST-SSRs have some advantages: they can be developed at no additional cost from transcriptome sequencing data and may be used across a number of related species. More importantly, EST-SSRs could illustrate marker-trait associations. In the present study, 8713 EST-SSRs were identified from the transcriptome sequencing of Cornus wilsoniana. Fifteen polymorphic EST-SSR primer pairs were developed and successfully used for the genetic diversity analysis of twenty genotypes of Cornus wilsoniana in relation to fruit output. These EST-SSR markers are informative, codominant and reliable, and could be applied in the improvement programs of Cornus wilsoniana in the future. By contrast, SSR primers from Cornus controversa, which belongs to the same genus (Cornus, Cornaceae), failed to produce polymorphic bands in DNA samples from Cornus wilsoniana.
In this study, dimer and trimer repeats were found to be the most abundant, which is in agreement with reports on other plants. The AG/CT motif was the most common dimeric EST-SSR motif in Cornus wilsoniana.
Owing to the high efficiency of the EST-SSRs, the 20 genotypes of Cornus wilsoniana were clearly distinguished by population assignment analysis. The genotypes clustered into two distinct groups according to the source of the individuals (Figure 3). Only #18 from the LC (Lechang, Guangdong province) group was much closer to the JX (Jiangxi province) group, which suggests that #18 possibly originated from JX. According to historical records, farmers in Lechang city planted seedlings of Cornus wilsoniana from Jiangxi province.
Pairwise values of Nei's genetic distance (D) between the genotypes analyzed were computed from the combined data for the 15 primers. The pairwise distance ranged from 0.0000 to 0.9259 (Table 4). The highest genetic distance (0.9259) was observed between #7 and #15, indicating that these two genotypes are genetically more divergent than pairs with lower genetic distance values. This value reflects their genetic dissimilarity and is consistent with their origins: #7 was collected at Yudu county in JX (N26°12′56.86″, E115°37′6.29″) and #15 was collected at LC (N25°06′51.44″, E113°19′24.93″). On the other hand, the lowest genetic distance (0.0000) was found between #9 and #13, indicating that they are very close in their genetic make-up. Both were collected from the same location, and their genetic distance reveals a close genetic relationship between them.
The results of the present study may be fine-tuned by using more informative markers and a more diverse set of genotypes. In cluster analysis, EST-SSRs make it possible both to assess the genetic diversity of elite genotypes and to select genotypes with higher genetic diversity. These findings demonstrate the reliability, usefulness, and efficiency of EST-SSRs for analyzing genomic diversity, identifying genotypes, screening plus trees as parental breeding material, and protecting plant varieties in the improvement programs of Cornus wilsoniana.
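The text does not state which of Nei's distance measures was used. As an illustration only, and assuming Nei's (1972) standard genetic distance, D = -ln(J_xy / sqrt(J_x J_y)), the Python sketch below computes D between two diploid multilocus genotypes. The function names and the allele sizes are hypothetical.

```python
import math

def allele_freqs(genotype):
    """Allele frequencies within one individual at one locus."""
    freqs = {}
    for allele in genotype:
        freqs[allele] = freqs.get(allele, 0.0) + 1.0 / len(genotype)
    return freqs

def nei_standard_distance(x, y):
    """Nei's (1972) standard distance between two multilocus genotypes.

    x and y are lists with one tuple of alleles per locus, in the same
    locus order. Assumption: this is the measure meant by "Nei's D".
    """
    jx = jy = jxy = 0.0
    for gx, gy in zip(x, y):
        fx, fy = allele_freqs(gx), allele_freqs(gy)
        jx += sum(p * p for p in fx.values())
        jy += sum(p * p for p in fy.values())
        jxy += sum(p * fy.get(a, 0.0) for a, p in fx.items())
    if jxy == 0.0:
        return math.inf  # no shared alleles at any locus
    return -math.log(jxy / math.sqrt(jx * jy))

# Made-up fragment sizes (bp) at two loci for two individuals.
ind_a = [(150, 150), (200, 204)]
ind_b = [(150, 154), (200, 204)]
print(nei_standard_distance(ind_a, ind_a))
print(round(nei_standard_distance(ind_a, ind_b), 4))
```

Under this measure, two individuals with identical genotypes at every locus have D = 0, which is how a pair such as #9 and #13 can show a distance of 0.0000.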