Genome Wide Characterization, Comparative and Genetic Diversity Analysis of Simple Sequence Repeats in Cucurbita Species

Simple sequence repeats (SSRs) are widely used in map construction and in comparative and genetic diversity analyses. Here, 103,056 SSR loci were found in Cucurbita species by in silico PCR. In general, the frequency of these SSRs decreased as motif length increased, and di-nucleotide motifs were the most common type. Within each repeat type, the SSR frequency decreased sharply as the repeat number increased. The majority of the SSR loci were suitable for marker development (84.75% in Cucurbita moschata, 94.53% in Cucurbita maxima, and 95.09% in Cucurbita pepo). Using these markers, cross-species transferable SSR markers between C. pepo and other Cucurbitaceae species were developed, and the complicated mosaic relationships among them were analyzed. In particular, the main syntenic relationships between C. pepo and C. moschata or C. maxima indicated that the chromosomes in the Cucurbita genomes were highly conserved during evolution. Furthermore, 66 core SSR markers were selected to measure the genetic diversity of 61 C. pepo germplasm accessions, which were divided into two groups by both Structure and unweighted pair group method with arithmetic mean (UPGMA) analyses. These results will promote the use of SSRs in basic and applied research on Cucurbita species.


Introduction
The Cucurbita genus (2n = 2x = 40), belonging to the Cucurbitaceae family, contains more than 13 species [1]. Most Cucurbita species are wild resources, and only three domesticated species, Cucurbita maxima, Cucurbita moschata, and Cucurbita pepo, are widely cultivated and have become important food crops globally [2]. At present, Asia has the largest pumpkin cultivation area, and China is the main producer of pumpkins. In 2012, the planting area of pumpkin in China was approximately 3.8 × 10⁴ hm², and the total output reached 7.0 × 10⁶ tons (http://www.fao.org/faostat/zh/#data/QC/visualize, 2020). Owing to their long history of cultivation and domestication, Cucurbita species show greater diversity in fruit shape, size, and color than other Cucurbitaceae species [3]. Furthermore, Cucurbita species have strong roots and adapt well to different biotic and abiotic stresses, such as cold, viruses, and salinity, so they are widely used as rootstocks in grafting [4,5]. Although they are a common global crop, fundamental [...] be useful for research on the population structure, genetic diversity, molecular-assisted selection, and map-based cloning in Cucurbita species.

Plant Materials
All of the pumpkin accessions used in this study were introduced from the National Crop Germplasm Resource Platform of China (platform of vegetable germplasm resources) in 2018. Four of the accessions came from Russia, one from America, and 56 accessions were from 17 provinces in China. The number and sources are shown in Table S1.

Genome SSR Identification and Development in Cucurbita Genomes
The genome information of watermelon, melon, cucumber, and pumpkin was downloaded from http://cucurbitgenomics.org/ (2020). To develop a set of highly polymorphic SSR primers for future studies, microsatellites with motif lengths from 2 to 8 bp were considered; mononucleotide repeats were excluded because of the difficulty of distinguishing bona fide microsatellites from sequencing or assembly errors. The microsatellite identification tool (MISA) was used to identify and analyze SSR markers, including perfect and compound microsatellites. The specific screening criteria were as follows: repeats with a minimum length of 18 bp (for di- and tetra-nucleotides), 20 bp (for penta-nucleotides), 24 bp (for hexa-nucleotides), 21 bp (for hepta-nucleotides), and 24 bp (for octa-nucleotides). The oligonucleotide primers for these SSRs were designed from the flanking genomic sequences using Primer3 software (v. 1.1.4). Primers were designed to generate amplicons of 100-300 bp in length with the following minimum, optimum, and maximum values for the Primer3 parameters: primer length (bp): 18-20-24; Tm (°C): 50-55-60. All other parameters were left at the program defaults.
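The length thresholds above can be illustrated with a minimal Python sketch. It is not MISA and does not reproduce its output; it only shows how a minimum SSR length in bp translates into a minimum repeat count per motif length. The function name find_perfect_ssrs is hypothetical, the 18 bp threshold for tri-nucleotides is an assumption (the text lists no value for them), and compound SSRs are not handled.

```python
import re

# Minimum total SSR length (bp) per motif length, taken from the screening criteria.
# ASSUMPTION: the text gives no tri-nucleotide threshold; 18 bp is used here for illustration.
MIN_LENGTH_BP = {2: 18, 3: 18, 4: 18, 5: 20, 6: 24, 7: 21, 8: 24}


def find_perfect_ssrs(seq):
    """Return a sorted list of (start, motif, repeat_count) for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, min_bp in MIN_LENGTH_BP.items():
        min_repeats = -(-min_bp // motif_len)  # ceiling division
        # One motif copy followed by at least (min_repeats - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. ATAT or AAAA),
            # so that mononucleotide runs and redundant calls are excluded.
            is_redundant = any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            )
            if not is_redundant:
                hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


example = "GGC" + "AT" * 9 + "CCG"  # an 18 bp (AT)9 repeat flanked by unrelated bases
print(find_perfect_ssrs(example))
```

Under these assumptions, an (AT)9 run meets the 18 bp threshold, whereas an (AT)8 run (16 bp) does not.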

In Silico PCR and Synteny Analysis of Cross-Species SSR Markers
Using the SSR markers from the pumpkin (C. pepo MU-CU-16) genome as a reference, we comparatively analyzed the genome SSR information of cucumber (Gy14), melon (DH92), watermelon (97103), C. moschata cv. Rifu, and C. maxima cv. Rimu. This was performed with a custom Perl script that used the NCBI BLASTN program as a search engine with an expected value of 10 and filtering. We allowed up to five nucleotide mismatches at the 5'-end of the primer, no mismatches at the 3'-end, and a minimum of 90% overall match homology. To establish the syntenic relationships of chromosomes between C. pepo and C. sativus, C. lanatus, C. melo, C. maxima, and C. moschata, we discarded SSR markers with multiple physical locations in the same genome, retaining only the SSR markers that had a single in silico PCR product in each genome. In addition, shared SSR markers located on unanchored scaffolds were filtered out. The SSR marker-based syntenic relationships were finally visualized as blocks in Circos software v.0.55 [29].
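The primer-matching rules and the single-product filter can be sketched in Python as follows. This is not the authors' Perl/BLASTN pipeline. The function names and the data layout are hypothetical, and the length of the "3'-end" window (five bases here) is an assumption, since the text does not define it.

```python
def primer_site_ok(primer, site, three_prime_len=5,
                   max_five_prime_mismatches=5, min_identity_pct=90):
    """Check one primer against one candidate binding site of equal length.

    ASSUMPTION: the 3'-end is taken to be the last `three_prime_len` bases.
    """
    if len(primer) != len(site):
        return False
    mismatches = [a != b for a, b in zip(primer.upper(), site.upper())]
    split = len(primer) - three_prime_len
    five_prime = sum(mismatches[:split])
    three_prime = sum(mismatches[split:])
    matches = len(primer) - five_prime - three_prime
    return (
        three_prime == 0
        and five_prime <= max_five_prime_mismatches
        and 100 * matches >= min_identity_pct * len(primer)
    )


def keep_single_hit(hits_by_marker):
    """Keep only markers with exactly one predicted product in a genome.

    hits_by_marker maps a marker ID to a list of (chromosome, position) hits.
    """
    return {m: hits[0] for m, hits in hits_by_marker.items() if len(hits) == 1}


primer = "ACGTACGTACGTACGTACGT"
print(primer_site_ok(primer, "TGGTACGTACGTACGTACGT"))  # two 5' mismatches, 90% identity
print(primer_site_ok(primer, "ACGTACGTACGTACGTACGA"))  # mismatch at the 3' end
```

Note that for a 20 nt primer the 90% identity threshold already limits the total to two mismatches, so the five-mismatch allowance only becomes the binding constraint for longer query sequences.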

Genomic DNA Extraction, PCR Amplification, and Electrophoresis Detection
Genomic DNA of all the materials was extracted using 1 g of young leaf sample with the cetyl trimethyl ammonium bromide (CTAB) method [30]. The extracted DNA was dissolved in 1× Tris-EDTA (TE) buffer (Solarbio, Cat: T1121). The concentration and purity were detected by the Nanodrop-2000 nucleic acid analyzer. The extracted DNA was diluted to 30 ng/µL as working solution and kept at 4 °C.
Each PCR reaction contained 1 µL of template DNA, 0.5 µM each of forward and reverse primers, 5 µL mastermix (GenStar, Cat: A012-105), and 3 µL ddH2O. The amplification was carried out as follows: an initial denaturing step at 95 °C for 5 min, 94 °C for 30 s, followed by 6 cycles of 68-58 °C for 45 s. Each cycle was reduced by 2 °C, each annealing time was 1 min, and 72 °C for 1 min; 30 cycles of 94 °C for 30 s, 50 °C for 30 s, and 72 °C for 1 min. In the last cycle, primer extension was performed at 72 °C for 10 min.
The PCR products were analyzed by 9% polyacrylamide gel electrophoresis, and a 100 bp DNA ladder was used as the reference marker. After electrophoresis, silver staining was performed to display the PCR products, and photos were taken for preservation.

Calculation of Clustering
The expected heterozygosity (He), observed number of alleles (Na), effective number of alleles (Ne), observed heterozygosity (Ho), and the Shannon-Weaver index (I) were calculated using POPGENE software v.1.32 (Canada, University of Alberta). The polymorphic information content (PIC) of the SSR markers was computed using EXCEL (China, WPS of JINSHAN). An SSR marker with a PIC below 0.25 was considered a low polymorphic marker, and a marker with a PIC above 0.5 was considered highly polymorphic.
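As an illustration of these per-marker statistics, the following Python sketch computes Ne, I, and PIC from a list of allele frequencies at one locus. The text does not state which formulas were used; this sketch assumes the commonly used definitions (Ne = 1/Σp², I = −Σp·ln p, and the PIC formula of Botstein et al.), and the function names and the "intermediate" label are hypothetical.

```python
import math


def allele_stats(freqs):
    """Return (Ne, I, PIC) for one locus given its allele frequencies (summing to 1).

    ASSUMPTION: standard textbook definitions, not necessarily the exact
    formulas implemented in the software used in the study.
    """
    homozygosity = sum(p * p for p in freqs)
    ne = 1.0 / homozygosity                                  # effective number of alleles
    shannon = -sum(p * math.log(p) for p in freqs if p > 0)  # Shannon index (natural log)
    n = len(freqs)
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    pic = 1.0 - homozygosity - cross
    return ne, shannon, pic


def pic_class(pic):
    """Classify a marker using the thresholds given in the text (0.25 and 0.5)."""
    if pic < 0.25:
        return "low"
    if pic > 0.5:
        return "high"
    return "intermediate"


ne, shannon, pic = allele_stats([0.5, 0.5])
print(round(ne, 2), round(shannon, 3), round(pic, 3), pic_class(pic))
```

For two equally frequent alleles this gives Ne = 2, I = ln 2, and PIC = 0.375, which falls between the two thresholds.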
The amplification bands of each SSR primer were separated by polyacrylamide gel electrophoresis. The band patterns were visualized with silver staining, and gel images were taken with a digital camera. At each band position, the presence of a band was scored as "1", the absence of a band as "0", and a missing band as "−1". We used GenAlEx 6 software [31] to compute a distance matrix from the scored SSR marker data, transformed it into a triangular matrix, and saved it as a MEGA file. Finally, the MEGA file was imported into MEGA 6.0 software (USA, Tamura, K team), and the unweighted pair group method with arithmetic mean (UPGMA) algorithm was selected in the "phylogeny" dropdown menu to draw the cluster diagram [32].
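The scoring and clustering steps can be sketched in pure Python as shown here. This does not reproduce the GenAlEx or MEGA calculations. It assumes a simple mismatch distance (the proportion of scored band positions that differ, skipping positions scored −1 in either accession) and a textbook UPGMA that returns only the tree topology as nested tuples. The function names are hypothetical.

```python
def pairwise_distance(a, b):
    """Proportion of band positions that differ; positions scored -1 are skipped."""
    pairs = [(x, y) for x, y in zip(a, b) if x != -1 and y != -1]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(names, profiles):
    """Cluster accessions by UPGMA and return the topology as nested tuples."""
    n = len(names)
    clusters = {i: (names[i], 1) for i in range(n)}  # id -> (subtree, size)
    dist = {
        (i, j): pairwise_distance(profiles[i], profiles[j])
        for i in range(n)
        for j in range(i + 1, n)
    }
    next_id = n
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of clusters
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        # Size-weighted average distance from the merged cluster to every other cluster.
        new = {}
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            new[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        dist = {key: d for key, d in dist.items() if i not in key and j not in key}
        dist.update(new)
        clusters[next_id] = ((tree_i, tree_j), size_i + size_j)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Illustrative band profiles only (not data from the study).
names = ["A", "B", "C"]
profiles = [[1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 1, 1]]
print(upgma(names, profiles))
```

In this toy example, A and B differ at one of four positions and are joined first; C joins the A-B cluster last.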
The software Structure v.2.3 (USA, UChicago; Britain, Oxon) was used to analyze the population structure [33,34]. An admixture model with correlated allele frequencies was used to estimate the number of populations. For each K-value (ranging from 1 to 5), ten independent runs were performed with a burn-in period of 100,000 followed by 500,000 Markov chain Monte Carlo iterations. The optimal K-value corresponds to the peak of ΔK = mean(|Ln″P(D)|)/sd(LnP(D)). Based on the Structure results, the most probable K-value was determined using Structure Harvester (http://taylor0.biology.ucla.edu/struct_harvest/, 2020).
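The ΔK statistic can be illustrated with a short Python sketch. It is not the Structure Harvester code. It assumes consecutive K-values and computes the second difference from the mean LnP(D) of the replicate runs at each K, divided by the standard deviation at that K. The function name and the input values are hypothetical and purely illustrative.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Return {K: delta K} for interior K-values.

    lnp_by_k maps each K to the list of LnP(D) values from its replicate runs.
    ASSUMPTION: K-values are consecutive integers, and the second difference is
    taken on the per-K means.
    """
    ks = sorted(lnp_by_k)
    means = {k: mean(lnp_by_k[k]) for k in ks}
    result = {}
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        sd = stdev(lnp_by_k[k])
        if sd > 0:  # delta K is undefined when replicate runs are identical
            second_diff = abs(means[next_k] - 2 * means[k] + means[prev_k])
            result[k] = second_diff / sd
    return result


# Made-up LnP(D) values for illustration only (three runs per K instead of ten).
runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-800.0, -801.0, -799.0],
    3: [-790.0, -795.0, -785.0],
    4: [-785.0, -790.0, -780.0],
}
dk = delta_k(runs)
print(dk, "best K:", max(dk, key=dk.get))
```

Because ΔK needs the neighbouring K-values on both sides, runs for K = 1 to 5 yield ΔK only for K = 2 to 4.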

The Frequency and Distribution of Different SSR Types in Cucurbita Genomes
A total of 103,056 microsatellite sequences were identified in the Cucurbita genome, including 34,375 SSR loci in the 269.9 Mb draft genome sequence of C. moschata cv. Rifu, 30,577 SSR loci in the 271.4 Mb draft genome sequence of C. maxima cv. Rimu, and 38,104 SSR loci in the 263 Mb draft genome sequence of C. pepo MU-CU-16 (Table S2). Cucurbita pepo had the largest number of markers with the smallest reference genome size, indicating the highest average density of markers (145 SSR/Mb). To obtain more information, we used C. pepo with a higher marker density as the control for the following comparative genomic analysis.
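The totals and densities in this paragraph can be checked with a few lines of Python using only the counts and genome sizes quoted above (the variable names are illustrative):

```python
# SSR loci and draft genome sizes (Mb) as reported in the text.
genomes = {
    "C. moschata": (34375, 269.9),
    "C. maxima": (30577, 271.4),
    "C. pepo": (38104, 263.0),
}

total_loci = sum(loci for loci, _ in genomes.values())
density = {species: loci / size_mb for species, (loci, size_mb) in genomes.items()}

print(total_loci)
for species, value in density.items():
    print(species, round(value), "SSR/Mb")
```

The three counts sum to 103,056, and the densities round to 127, 113, and 145 SSR/Mb for C. moschata, C. maxima, and C. pepo, respectively.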
Here, we analyzed repeat types ranging from di-nucleotide to octa-nucleotide. Among all of these nucleotide motifs, di-nucleotide motifs (41.0%) were the most common type, accounting for 41.78%, 39.90%, and 41.01% of the total SSR loci discovered in C. moschata, C. maxima, and C. pepo, respectively, followed by tri-nucleotide motifs (16.97%, 19.19%, and 17.88%, respectively), whereas octa-nucleotide motifs (3.78%, 3.76%, and 3.38%, respectively) were the least represented repeat type in the three Cucurbita genomes (Table S2). In general, the frequency of the total SSR loci decreased with the increase in motif length, except for hepta-nucleotide SSRs.
We further examined the distribution of SSR motifs with regard to their repeat numbers (Figure 1). For all repeat types, the SSR frequency decreased sharply as the repeat number increased, and this change was more pronounced in the longer SSR motifs (Figure 1). Consequently, the mean repeat number of the di-nucleotides was the highest of all the repeat types. The analysis of individual SSR types revealed that some specific motifs were more prevalent than others in each class (Figure S1). For example, the AT motif was the most frequent di-nucleotide type in all three genomes, accounting for 31.61% (in C. moschata), 28.81% (in C. maxima), and 30.45% (in C. pepo) of the total di-nucleotide loci. Similarly, the AAT, AAAT, AAAAT, AAAAAT, AAAAAAT, and AAAAAAAT motifs (AATAATAT motif in C. maxima) were the most frequent types in each class. These results indicated that AnT-rich motifs were the most abundant of all SSR motifs in the C. moschata, C. maxima, and C. pepo genomes. We also investigated the SSR density on each chromosome of the three Cucurbita species and found that the density of microsatellite loci was not correlated with chromosome size (Table S3). For example, in the C. moschata genome, the longest chromosome (Chr04) had a medium SSR density, while Chr02, which is much shorter than Chr04, had the highest SSR density. A similar trend was also observed in the other two genomes, indicating that the distribution of SSRs was uneven across the Cucurbita chromosomes. To better understand the distributions of different SSR motifs, we further checked their frequencies on each chromosome (Figure 2). Our results showed that the distribution of different SSR types on the chromosomes corresponded with their frequencies and SSR density in the whole Cucurbita genomes.
The genomic sequences containing these microsatellites were screened for PCR primer design, and 94,272 SSR loci were found to contain suitable flanking sites for SSR primer design. While C. moschata had the lowest proportion of SSRs suitable for primer design (84.75%), the percentages in C. maxima and C. pepo reached 94.53% and 95.09%, respectively (Table S2). Though the di-nucleotide repeat types were the most frequent in all three genomes, they did not perform well in primer design. Interestingly, the hexa-nucleotide repeat types had the highest ratio of SSRs suitable for primer design in all three genomes, followed by the penta-nucleotide repeat types, indicating that longer motifs were more suitable for primer design in Cucurbita species. Finally, a total of 91,248 SSR primers (28,194 in C. moschata, 28,061 in C. maxima, and 34,993 in C. pepo) were designed, with some primers covering more than one SSR locus in the case of compound SSRs (Tables S4-S6).

Chromosome Synteny Relationships of C. pepo with Other Cucurbitaceae Species
To understand the universality and correlation of SSR markers among Cucurbitaceae crops, we compared and analyzed the cross-species SSR markers between C. pepo and other Cucurbitaceae species by in silico PCR. We identified 391 cross-species SSR markers between C. pepo and C. sativus, 425 between C. pepo and C. melo, 717 between C. pepo and C. lanatus, 11,732 between C. pepo and C. maxima, and 15,274 between C. pepo and C. moschata (Tables S7-S11). The ratio of collinear blocks to inversion blocks was 26:26 between the C. pepo and C. sativus genomes, 25:36 between the C. pepo and C. melo genomes, 51:38 between the C. pepo and C. lanatus genomes, 154:158 between the C. pepo and C. maxima genomes, and 153:152 between the C. pepo and C. moschata genomes. Interestingly, the ratio of collinear blocks to inversion blocks was nearly 1:1 among the three Cucurbita species. Each C. pepo chromosome shared 3-36 SSR markers with C. sativus, C. lanatus, or C. melo. However, most of the C. pepo chromosomes shared a larger number of SSR markers (3-1,436) with C. maxima or C. moschata. The C. pepo syntenic block CpeCma7 had the largest number of shared SSR markers (i.e., 296), between C. pepo chromosome Cpe1 and C. maxima chromosome Cma4.
The physical positions of those common shared markers were compared. The main syntenic relationships between C. pepo and other Cucurbitaceae species are listed in Table 1, and the syntenic relationships visualized for C. pepo with C. lanatus, C. melo, and C. sativus are shown in Figure 3. The main syntenic relationships among the chromosomes revealed complex mosaic patterns. In Figure 3, each C. pepo chromosome was syntenic to more than two chromosomes in other Cucurbitaceae species. The C. pepo chromosomes Cpe9 and Cpe16 had the simplest syntenic pattern with watermelon, and each of them was mainly syntenic to one watermelon chromosome (Table 1). Cpe9 was syntenic to watermelon chromosome W5, and 14 commonly shared SSR markers were found between Cpe9 and W5. From the markers CpeSSR15544 to CpeSSR16107, there were three blocks belonging to watermelon chromosome W5, and each block contained at least four SSR markers. According to the continuous physical positions of these markers on both of the reference genomes, the syntenic blocks CpeWM37 and CpeWM38 showed an inversion pattern, and the syntenic block CpeWM39 showed a collinear pattern between C. pepo and C. lanatus. Similar comparisons were carried out between C. pepo and C. sativus or C. pepo and C. melo using the cross-species SSR markers. The C. pepo chromosomes Cpe7, Cpe8, Cpe11, and Cpe20 had the simplest syntenic pattern with C. sativus, and each of them was only syntenic to one cucumber chromosome. Meanwhile, the simplest syntenic patterns between C. pepo and C. melo were mainly found on chromosomes Cpe15, Cpe18, Cpe19, and Cpe20. The most complicated syntenic pattern was found on C. pepo chromosome Cpe1, which corresponded to five chromosomes of C. moschata, four chromosomes of C. maxima, seven chromosomes of C. lanatus, three chromosomes of C. sativus, and five chromosomes of C. melo.  
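The distinction between collinear and inversion blocks can be illustrated with a small Python sketch. The text does not describe the exact rule used to call a block, so the majority rule below is an assumption, and the function name and coordinates are hypothetical.

```python
def block_orientation(markers):
    """Classify one syntenic block from its shared markers.

    markers is a list of (position_in_C_pepo, position_in_other_genome) pairs.
    ASSUMPTION: a block is called collinear when marker order mostly increases
    in both genomes and inverted when it mostly decreases in the other genome.
    """
    ordered = sorted(markers)  # sort by C. pepo position
    targets = [target for _, target in ordered]
    rising = sum(b > a for a, b in zip(targets, targets[1:]))
    falling = sum(b < a for a, b in zip(targets, targets[1:]))
    if rising > falling:
        return "collinear"
    if falling > rising:
        return "inversion"
    return "ambiguous"


# Made-up coordinates for illustration only.
print(block_orientation([(100, 5000), (200, 5400), (300, 6100), (400, 6500)]))
print(block_orientation([(100, 9000), (200, 8200), (300, 7700), (400, 7100)]))
```

A block whose markers run in the same order on both reference genomes is reported as collinear; one whose order is reversed on the second genome is reported as an inversion.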
[Table 1, fragment] (3), Cmo17 (24); Cma3 (5), Cma4 (1,103), Cma9 (9), Cma17 (19); Cla1 (5), Cla5 (21), Cla6 (4), Cla7 (13); Csa3 (6), Csa5 (28), Csa6 (3); Cme3 (4), Cme6 (4)
The syntenic relationships among different Cucurbita species were simple and clear. For instance, each of the 20 chromosomes in C. pepo was mainly syntenic with one chromosome in C. moschata or C. maxima (Figure 4), implying that the chromosomes in the Cucurbita genomes were highly conserved during evolution. Our results also showed that there were three main relationship patterns among the C. pepo, C. maxima, and C. moschata genomes: (1) eleven chromosomes showed a linear relationship between C. pepo and C. maxima or C. moschata, such as Cpe2-Cmo1-Cma1, with most of the cross-species markers on the corresponding chromosomes showing collinear patterns; (2) eight chromosomes showed an inverted relationship between C. pepo and C. maxima or C. moschata, for example, chromosome Cpe1 of C. pepo was inverted relative to chromosome Cmo4 of C. moschata and Cma4 of C. maxima; and (3) a mosaic pattern was found between C. pepo and C. maxima or C. moschata, for example, Cpe4-Cmo11-Cma11.
Figure 3. Chromosome synteny between C. pepo and C. sativus was based on 391 cross-species markers; synteny between C. pepo and C. melo was based on 425 cross-species markers; synteny between C. pepo and C. lanatus was based on 717 cross-species markers. W1-W11 represent the eleven chromosomes of C. lanatus, M01-M12 represent the twelve chromosomes of C. melo, C01-C07 represent the seven chromosomes of C. sativus, and LG01-LG20 represent the twenty chromosomes of C. pepo. Syntenic blocks are connected by lines of the same color from the C. pepo chromosomes.
Figure 4. Chromosome synteny between C. pepo and C. moschata was based on 14,276 cross-species markers; synteny between C. pepo and C. maxima was based on 10,655 cross-species markers. Cpe1-Cpe20 represent the twenty chromosomes of C. pepo, Cmo1-Cmo20 represent the chromosomes of C. moschata, and Cma1-Cma20 represent the chromosomes of C. maxima. The syntenic relationships between C. pepo and C. moschata are connected with green lines, and the syntenic relationships between C. pepo and C. maxima are connected with yellow lines.

The Genetic Diversity and Population Structure Analysis of the C. pepo Germplasm
In our preliminary study, approximately 400 SSR markers were screened using 61 accessions of C. pepo germplasm. Finally, a total of 66 core SSR markers were selected based on the allelic number, the genomic coverage, and the efficiency of PCR amplification (Table S12). These markers produced clear banding patterns and were evenly distributed on the chromosomes. In this study, 276 alleles were detected by the 66 SSR markers in the 61 C. pepo accessions, with an average of 4.18 alleles per SSR marker. Na ranged from two to nine. The highest Na, nine, was detected by SSR010246, SSR026560, SSR026918, SSR027656, and SSR026980, followed by SSR011546, SSR003315, and SSR026797 with eight alleles. Ne varied from 1.03 to 6.07 with an average of 2.31. The Shannon-Weaver index (I) ranged from 0.083 to 1.96 with an average of 0.83. The PIC value ranged from 0.03 to 0.83 with an average of 0.43 (Table S13).
We further used a model-based approach for population structure analysis of the 61 C. pepo accessions. According to the Structure results, ∆K showed a significant peak at K = 2, indicating that the 61 accessions used in this study could be clearly divided into two groups (Figure S2), named group I and group II. The five C. pepo subsp. ovifer accessions (2, 29, 30, 31, and 45) were clustered into group I (8.20%), and all of them were wild materials. Most of the C. pepo subsp. pepo accessions were clustered into group II (91.80%), and all of these were cultivated materials (Figure 5A). This indicated that the SSR markers we used could clearly distinguish the cultivated materials from the wild materials. The backgrounds of the cultivated accessions were narrow, except for accession 45 in group I, which should have a complex genetic background, similar to accessions 14 and 16 in group II. The UPGMA analysis revealed that the 61 C. pepo accessions were divided into two clusters (Figure 5B), which was consistent with their population structure. The five C. pepo subsp. ovifer accessions were clustered together at the base of the phylogenetic tree, which further supported our population structure analysis.

Frequency, Distribution, and Characterization of Microsatellites in Three Cucurbita Genomes
With the development of sequencing technology, the discovery and mining of genomic SSR loci has been successfully applied in many plant species, such as cotton [35,36], foxtail millet [37], cucumber [11], watermelon [13], tobacco [38], and melon [12]. Cucurbita moschata, C. maxima, and C. pepo are important species that are cultivated worldwide, and their draft genomes were released several years ago. However, there remains little information on the development of genome-wide SSR markers in Cucurbita species, which has strongly limited their genetic research. In the present study, genome-wide microsatellites were identified and characterized in the three Cucurbita species. A total of 34,375, 30,577, and 38,104 SSR loci were detected in the C. moschata, C. maxima, and C. pepo genomes, respectively. The smallest genome size and the largest number of microsatellites were both found in C. pepo, indicating that there was no direct correlation between genome size and the number of microsatellites. The density of the SSR markers in the three Cucurbita species was approximately 113-145 SSR/Mb, which is lower than that in cucumber (552 SSR/Mb) but comparable to that in melon (109 SSR/Mb) and watermelon (111 SSR/Mb) [11][12][13]. In addition to the natural differences among genomes, many other factors could affect SSR density, such as the software and parameters used for microsatellite detection. We suspect that the main reason for the difference in SSR density between Cucurbita species and cucumber was the different selection criteria for the SSR loci, e.g., the repeat types (di- to octa-nucleotides versus mono- to penta-nucleotides) and the minimum lengths (18 bp versus 12 bp).
We further analyzed the distribution and frequency of microsatellites in the three Cucurbita species (Figures 1 and 2). In most cases, a negative correlation was observed between the microsatellite frequency and the number of repeat units. Consistent with previous studies in watermelon and melon, the di-nucleotide repeats were the most abundant SSRs, followed by tri-, tetra-, penta-, hepta-, hexa-, and octa-nucleotide repeats [12,13]. The predominant repeat type varies among species. For example, the density of tetra-nucleotide repeats was the highest in C. sativus (164.2 SSR/Mb), Populus trichocarpa (144.9 SSR/Mb), Medicago truncatula (102.8 SSR/Mb), and Vitis vinifera (171.3 SSR/Mb), whereas the density of tri-nucleotide repeats was the highest in Arabidopsis thaliana (146.6 SSR/Mb), Glycine max (103.1 SSR/Mb), and Oryza sativa (220.1 SSR/Mb) [11]. Some studies have revealed that the di-nucleotide motifs with high repeat numbers are more abundant and polymorphic compared to those with short repeat units [39]. The reason is that di-nucleotide repeats are much less frequent in coding regions than in non-coding regions [40,41]. It has also been reported that exon regions contain more triplet SSRs than other repeats, and that triplet SSR motifs may be related to high frequencies of certain amino acids [42,43]. These SSRs in the coding sequence may have the potential to affect all aspects of genetic function, including gene regulation, development, and evolution. However, the function of genes that contain SSRs and the role of these SSR motifs in plant genes are less studied and poorly understood [44]. It is interesting to note that many bacterial SSRs in the intergenic regions have regulatory functions [45], and whether the SSR motifs in the intergenic regions of Cucurbita species play a role in specialization or gene regulation should be further studied.
Motifs with a low number of repeats were predominant, and the AT-rich motifs in particular contributed a large proportion of all types of di-nucleotide repeats in the Cucurbita species (Figure S1). The AT or AAT type is more common in dicots [13], which is consistent with our results. Recently, the characterization of SSR markers in bitter gourd showed that the tri-nucleotide repeat units were the main type, with an overrepresentation of A/T, AT/AT, AAT/ATT, and AAAT/ATTT motifs in all kinds of repeat types [46]. This has also been found in other genomes [11,47,48]. In contrast, the frequency of the GC or CCG type was much lower at the genomic level [49,50], and the GC, TC, or GA types have relatively stable structures. Most of the AT types are distributed in non-genic regions, while the TC/GA types are primarily distributed in coding sequences [38].

Chromosome Synteny Analysis between C. pepo and Other Cucurbitaceae Species
Chromosome synteny analysis has been conducted in many species, such as cucumber, watermelon, and melon, but few studies have been conducted on the chromosome synteny among different Cucurbita species or between Cucurbita species and other Cucurbitaceae crops. In this study, the genome-wide SSR development from the three Cucurbita genomes made it possible to identify their syntenic relationships at high resolution via in silico PCR analysis. Though the sizes of the pumpkin genomes are similar to those of other sequenced Cucurbitaceae species, the number of cross-species SSR markers in the Cucurbita genus is much higher. Compared to the hundreds of shared markers in previous studies [14], we identified many more cross-species transferable SSR markers in the Cucurbita genus that were used for chromosome synteny analysis. The whole-genome duplication (WGD) event in Cucurbita, which has not been observed in other sequenced Cucurbitaceae species, such as cucumber [8], melon [10], and watermelon [9], may be a possible reason for the high abundance of SSR markers.
According to the cross-species transferable SSR markers, 52, 61, and 89 syntenic blocks distributed on all chromosomes were identified between C. pepo and cucumber, melon, and watermelon, respectively (Figure 3). Similar homoeologous blocks were detected by whole-genome comparison [22], suggesting that the cross-species transferable SSR markers are useful and reliable in genome comparisons and chromosome synteny analyses. In most cases, multiple synteny blocks were detected between C. pepo and other Cucurbitaceae species because of chromosome fission. The most complicated syntenic pattern existed on chromosome Cpe1 of C. pepo, which was syntenic to seven watermelon chromosomes, indicating that complicated structural changes occurred after their divergence from a common ancestor. The ratio of collinear blocks to inversion blocks was nearly 1:1 in Cucurbita, and the reason for this may be that genome duplication and inter-chromosomal exchanges occurred randomly during chromosome evolution.
Based on the cross-species transferable SSR markers, we identified more highly conserved syntenic blocks among the Cucurbita species than between C. pepo and melon, cucumber, or watermelon. We found that each block among the three Cucurbita species contained many more shared SSR markers, and these homoeologous chromosomes were highly conserved, which further confirmed their close evolutionary relationships in the Cucurbitaceae family. For example, the C. pepo syntenic block contained more markers than that in melon [12]. Owing to the WGD during chromosome evolution and speciation, the number of chromosomes and cross-species markers increased. However, those blocks were highly conserved during chromosome evolution among different Cucurbitaceae species. The chromosomal pair analysis by cross-species SSR markers showed that there were eight large-scale inversions on different chromosomes between C. pepo and C. moschata or between C. pepo and C. maxima, indicating that C. pepo experienced more complex evolutionary processes (Figure 4). Interestingly, Chr4 contained a mosaic region among Cucurbita species. This might be due to genome duplication, large-scale inter-chromosomal exchanges, or long-term evolutionary forces. Whether the partial inversion of chromosome 4 in C. pepo will affect the mapping, cloning, and study of some traits is worth exploring in the future.

The Genetic Diversity and Population Structure of C. pepo Germplasm
Previously, because of the scarcity of genomic sequences, there were limited molecular markers available to study the genetic diversity and population structure of Cucurbita species. Though the genetic diversity of Cucurbita species has been evaluated using sequence-related amplified polymorphism (SRAP), AFLP, SSR, RAPD, and inter-simple sequence repeats (ISSRs), most of the markers used have high randomness, lack precise location information, and have low genomic coverage and poor polymorphism, which greatly limit their application [18,51,52]. With the draft genomes available for three cucurbit crops, we developed 91,248 SSR markers with precise physical locations on chromosomes and evaluated the genetic diversity of 61 pumpkin accessions using 66 core SSR markers.
The population structure of the 61 accessions revealed that the background of some materials was mixed between group I and group II, suggesting that these accessions may have undergone gene exchange between the two subspecies. The materials were collected from different provinces in China, and they were clearly classified into two subspecies, subsp. ovifer (or subsp. texana) and subsp. pepo, which is consistent with previous studies [21,51]. However, the three subspecies of C. pepo classified by Decker are C. pepo subsp. fraterna (Bailey) Andres, C. pepo subsp. texana (Scheele) Filov, and C. pepo subsp. pepo [53]. The putative ancestor of C. pepo, namely, subsp. fraterna from northeastern Mexico, has been considered a wild gourd [54]. The population structure and UPGMA results indicated that these accessions of C. pepo in China come from a common ancestor. Thus, there are great prospects for germplasm improvement.
The Cucurbita genus contains several economically important crops, but its breeding has lagged behind that of other Cucurbitaceae crops. The limited number of high-quality cultivars cannot meet production requirements. Thus, different breeding programs can be facilitated using marker-assisted selection. The whole-genome SSR markers detected in this study will promote the development and utilization of SSRs in basic and applied research on Cucurbita species.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/horticulturae7060143/s1, Figure S1: The top five types of each SSR repeat motif and their frequencies in C. moschata, C. maxima, and C. pepo, Figure S2: The optimal K-values analysis by using Structure Harvester, Table S1: The list of the C. pepo introduction accessions, Table S2: The distribution of different nucleotide repeats in the genome of three Cucurbita species, Table S3: The distribution of SSR loci on different chromosomes in C. moschata, C. maxima, and C. pepo, Table S4: The identified SSR markers in C. moschata, Table S5: The identified SSR markers in C. maxima, Table S6: The identified SSR markers in C. pepo, Table S7: List of cross-species SSR markers between C. pepo and C. sativus identified by in silico PCR, Table S8: List of cross-species SSR markers between C. pepo and melon identified by in silico PCR, Table S9: List of cross-species SSR markers between C. pepo and watermelon identified by in silico PCR, Table S10: List of cross-species SSR markers between C. pepo and C. maxima identified by in silico PCR, Table S11: List of cross-species SSR markers between C. pepo and C. moschata identified by in silico PCR, Table S12: The total SSR markers in C. pepo genetic diversity and population structure analysis, Table S13: Polymorphism and allelic diversity of SSR markers in C. pepo materials.
Data Availability Statement: All generated or analyzed data during this study are reflected in the present article.