A High-Density EST-SSR-Based Genetic Map and QTL Analysis of Dwarf Trait in Cucurbita pepo L.

As one of the earliest domesticated species, Cucurbita pepo (including squash and pumpkin) is rich in phenotypic polymorphism and has huge economic value. In this research, using 1660 expressed sequence tags-simple sequence repeats (EST-SSRs) and 632 genomic simple sequence repeats (gSSRs), we constructed the highest-density EST-SSR-based genetic map in Cucurbita genus, which spanned 2199.1 cM in total and harbored 623 loci distributed in 20 linkage groups. Using this map as a bridge, the two previous gSSR maps were integrated by common gSSRs and the corresponding relationships around chromosomes in three sets of genomes were also collated. Meanwhile, one large segmental inversion that existed between our map and the C. pepo genome was detected. Furthermore, three Quantitative Trait Loci (QTLs) of the dwarf trait (gibberellin-sensitive dwarf type) in C. pepo were located, and the candidate region that covered the major QTL spanned 1.39 Mb, which harbored a predicted gibberellin 2-β-oxidase gene. Considering the rich phenotypic polymorphism, the important economic value in the Cucurbita genus species and several advantages of the SSR marker were identified; thus, this high-density EST-SSR-based genetic map will be useful in Pumpkin and Squash breeding work in the future.


Introduction
The genus Cucurbita (2n = 2x = 40) is perhaps the most polymorphic genus in fruit characters in the botanical family Cucurbitaceae [1]. At least five species of Cucurbita have been domesticated by now and three of them, C. pepo L., C. moschata Duchesne, and C. maxima Duchesne, are cultivated world wild [2,3]. Among the many species in the Cucurbita genus, C. pepo has the most important economic status and a wide range of uses. Besides the fact that the mature and immature fruit, seed, seed oil, and seed extract of C. pepo have a great market, its flowers can also be used as food in some places [3][4][5].
Dwarf plant architecture is important for improving agricultural efficiency and reducing the agricultural production costs of modern farming [6][7][8]. C. pepo and C. maxima are two of the few

A High-Density SSR-Based Genetic Map
In order to augment the number of SSR markers in C. pepo, we picked up 1613 novel EST-SSR loci from transcriptome data (Data S1). In combination with 632 previously reported gSSRs [14,26], we then tested the polymorphism of 2245 SSRs between the parents of the F 2 population used in this study, revealing a total of 1559 SSRs in both parents that yielded PCR products. Of these, 646 exhibited polymorphism between parents, including 133 previously reported gSSRs and 513 EST-SSRs (Data S1).
Following the removal of the ambiguous markers, 623 polymorphic SSR markers remained, including 127 gSSRs and 496 EST-SSRs (Table S1). These were then plotted onto 20 LGs to produce a genetic map that encompasses 2199.1 cM and is estimated to cover 93.3% of Ge (expected genome length) (2356.3 cM). Lengths of the 20 LGs range between 67.35 cM in LG20 and 190.97 cM in LG1, while marker numbers range from 16 in LG13 and LG15 to 74 in LG1 (Table 1, Figure 1 and Data S2). LGs represented linkage groups, from LG1 to LG20.

Collinearity Analysis
To integrate two gSSR maps constructed by Gong et al. [14,24], we counted the common gSSR markers located in our map and two previous maps. In total, 74 common markers detected with previous C. pepo map, distributed in 20 previous linkage groups (20LGp in total); meanwhile 62 common markers detected with a previous C. moschata map were distributed in 23 previous linkage groups (27 LGm in total) (Data S2). Overall, the map constructed in this research showed good collinearity with the two previous maps; most of the linkage groups showed a consistent marker order, such as LGp16 vs.
LG4 and LGm1a vs. LG2. A deranged marker order was observed between a few linkage groups, such as LGp3a vs.
LG9 and LGm3 vs. LG4 ( Figure S2). Some separated linkage groups, such as LGp5 and LGp2, LGm21a and LGm21b, were integrated into one linkage group according to the common markers within our genetic map. Some unique linkage groups, such as LGp10a, LGp15, LGp3a and LGp9 with common gSSRs were distributed in different linkage groups during this research ( Figure S2).
To validate the accuracy of this map, and collate the respondence relationships of the chromosomes between published genomes, the unigene sequences of EST-SSRs located in this map were aligned to three sets of published Cucurbita genomes using tBLASTx [13,29]. In total, 496 EST-SSR were located in our map: 469 (94.56%), 475 (95.77%), and 468 (94.35%) hit the C. pepo, C. moschata and C. maxima genomes, respectively (Table S2). The linkage map of C. pepo constructed in this research. Blue, red, and yellow bars indicated the EST-SSR marker, gSSR marker developed from Cucurbita and gSSR marker developed from the cucumber genome, respectively. The scaleplate on the left indicated genetic distance (centimorgan as unit, abbreviated cM).
LGs represented linkage groups, from LG1 to LG20.

Collinearity Analysis
To integrate two gSSR maps constructed by Gong et al. [14,24], we counted the common gSSR markers located in our map and two previous maps. In total, 74 common markers detected with previous C. pepo map, distributed in 20 previous linkage groups (20LGp in total); meanwhile 62 common markers detected with a previous C. moschata map were distributed in 23 previous linkage groups (27 LGm in total) (Data S2). Overall, the map constructed in this research showed good collinearity with the two previous maps; most of the linkage groups showed a consistent marker order, such as LGp16 vs.
LG4 and LGm1a vs. LG2. A deranged marker order was observed between a few linkage groups, such as LGp3a vs.
LG9 and LGm3 vs. LG4 ( Figure S2). Some separated linkage groups, such as LGp5 and LGp2, LGm21a and LGm21b, were integrated into one linkage group according to the common markers within our genetic map. Some unique linkage groups, such as LGp10a, LGp15, LGp3a and LGp9 with common gSSRs were distributed in different linkage groups during this research ( Figure S2).
To validate the accuracy of this map, and collate the respondence relationships of the chromosomes between published genomes, the unigene sequences of EST-SSRs located in this map were aligned to three sets of published Cucurbita genomes using tBLASTx [13,29]. In total, 496 EST-SSR were located in our map: 469 (94.56%), 475 (95.77%), and 468 (94.35%) hit the C. pepo, C. moschata and C. maxima genomes, respectively (Table S2).
In general, the loci located in our map consisted of their corresponding location in the C. pepo physic map, which suggests the high quality of our genetic map ( Figure 2A). This map also showed high collinearity with the physic map of C. moschata and C. maxima ( Figure S3), especially the C. moschata map, which suggested that a high karyotype stability also existed between C. pepo and C. moschata, not only between C. moschata and C. maxima [29]. All linkage groups showed good coverage, with three physic maps except LG8, LG17, and LG20 with less than 70% coverage (Table S2). Using our map as a bridge, the correspondence relationship of the chromosomes between C. pepo and C. moschata/C. maxima were also collated (Table S2).
Unexpectedly, one large deviation that was observed in the alignment between LG4 and Cp4.1LG04 of the C. pepo genome but not in alignment between LG4 and Chr11 of C. moschata genome ( Figure 2B). Furthermore, LG4 showed very good collinearity with LGp12 (developed by Gong et al. 2008a) ( Figure 2B), which suggested a specific chromosome inversion existed in Cp4.1LG04. To obtain more detailed information of this inversion, all CDS (1629 in total) predicted in Cp4.1LG04 of the C. pepo genome were aligned to the C. moschata genome by tBLASTx. Totally, 1617 hits to Chr11 of C. moschata, in this, CpLG04 showed very good collinearity to Chr11 of C. moschata from 2.0 Mb to 2.6 Mb and from 8.  LGp12, LG4 and Cp4.1LG04 (C. pepo genome), LG4 and CmoChr11 (C. moschata genome). The common gSSRs between LG4 and LGp12 were represented by grey broken lines. The in order hits between LG4 and Cp4.1LG04, CmoChr11 were represented by grey line and disorder hits were represented by the dark green line. EST-SSRs are represented by the blue bars and gSSRs by red bars in LG4; (C) collinearity between Cp4.1LG04 and CmoChr11. In order and disorder hits were represented by grey and dark green lines respectively. The disorder regions were marked according to the alignment result.

QTLs for Dwarf Trait
Plant height depends on the total number of nodes and the length of each internode in a stem. In the Cucurbita genus, early research showed changing dominant-recessive relationships between the young and mature stages [11,31,32]. To get detailed QTL information of a dwarf trait, we investigated the nodes number and internode length of the two parental lines and their hybrid (P1/P2/F1) at the young and mature developmental stages, respectively, in three independent LGp12, LG4 and Cp4.1LG04 (C. pepo genome), LG4 and CmoChr11 (C. moschata genome). The common gSSRs between LG4 and LGp12 were represented by grey broken lines. The in order hits between LG4 and Cp4.1LG04, CmoChr11 were represented by grey line and disorder hits were represented by the dark green line. EST-SSRs are represented by the blue bars and gSSRs by red bars in LG4; (C) collinearity between Cp4.1LG04 and CmoChr11. In order and disorder hits were represented by grey and dark green lines respectively. The disorder regions were marked according to the alignment result.

QTLs for Dwarf Trait
Plant height depends on the total number of nodes and the length of each internode in a stem. In the Cucurbita genus, early research showed changing dominant-recessive relationships between the young and mature stages [11,31,32]. To get detailed QTL information of a dwarf trait, we investigated the nodes number and internode length of the two parental lines and their hybrid (P 1 /P 2 /F 1 ) at the young and mature developmental stages, respectively, in three independent environments. No significant difference was found in the number of nodes between the two parental lines at the young and mature stages ( Figure S4A). Conversely, a significant difference was found in internode length between the two parental lines at both the young and mature stages, suggesting that the dwarf trait was controlled by the internode length rather than the internode number in C. pepo ( Figure S4B).
In C. maxima the internode length of the hybrid was similar to the dwarf parent at the young stage and with intermediate value at the mature stage [11]. This developmental reversal was also found in our research ( Figure S4B). The average value of internode length in F 2 population skewed toward the dwarf parent at a young stage and toward the hybrid at the mature stage. The frequency distribution among the test lines suggested that the dwarf was dominant to vein at the young stage because no significant segregation was observed in BC 1 P 2 (Tables S3 and S4).
Using joint analysis (LOD ≥ 4.0), two and three QTLs associated with dwarf traits at the young and mature stages were detected respectively ( Figure 3). In this, the two most significant QTLs (qCpDy1 and qCpDm1) at different developmental stages shared similar candidate regions and LOD values, which spanned a genetic distance with 3.37 cM (from PU051973 to end of LG20) corresponded to a physical distance of 1.39 Mb. This region was also similar to the major dwarf QTL region qCmB2 in C. maxima ( Figure S4C and Table S5).
The early research in C. maxima found that dwarf line SQ026 was gibberellin-sensitive, in which the internode length could be rescued to the vein parental internode length after being treated with exogenous gibberellin (GA). To confirm if the similar mechanism existed in C. pepo, the dwarf parent HM-S2 in this research was treated with gradient exogenous GA and the vein parent Jinganlu was treated with gradient Paclobutrazol (PAC, GA synthesis inhibitor). The results of the exogenous hormone treatment showed the dwarf type in C. pepo was also a GA-sensitive dwarf type ( Figure S4D).
Although several experimental observations in C. pepo are similar with the previous results in C. maxima (F 1 hybrid developmental reversal, major QTL position, and gibberellin sensitivity), qCmB2, which designated as a GA20ox controlling dwarf trait in previous research, was denied by HMMER prediction. Only one GA2ox and one GA20ox detected in Cp4.1LG12 of C. pepo genome and in Chr03 of C. maxima genome, respectively, and only one GA2ox gene related to the gibberellin synthesis in the candidate region ( Figure S4C). environments. No significant difference was found in the number of nodes between the two parental lines at the young and mature stages ( Figure S4A). Conversely, a significant difference was found in internode length between the two parental lines at both the young and mature stages, suggesting that the dwarf trait was controlled by the internode length rather than the internode number in C. pepo ( Figure S4B). In C. maxima the internode length of the hybrid was similar to the dwarf parent at the young stage and with intermediate value at the mature stage [11]. This developmental reversal was also found in our research ( Figure S4B). The average value of internode length in F2 population skewed toward the dwarf parent at a young stage and toward the hybrid at the mature stage. The frequency distribution among the test lines suggested that the dwarf was dominant to vein at the young stage because no significant segregation was observed in BC1P2 (Tables S3 and S4).
Using joint analysis (LOD ≥ 4.0), two and three QTLs associated with dwarf traits at the young and mature stages were detected respectively ( Figure 3). In this, the two most significant QTLs (qCpDy1 and qCpDm1) at different developmental stages shared similar candidate regions and LOD values, which spanned a genetic distance with 3.37 cM (from PU051973 to end of LG20) corresponded to a physical distance of 1.39 Mb. This region was also similar to the major dwarf QTL region qCmB2 in C. maxima ( Figure S4C and Table S5).
The early research in C. maxima found that dwarf line SQ026 was gibberellin-sensitive, in which the internode length could be rescued to the vein parental internode length after being treated with exogenous gibberellin (GA). To confirm if the similar mechanism existed in C. pepo, the dwarf parent HM-S2 in this research was treated with gradient exogenous GA and the vein parent Jinganlu was treated with gradient Paclobutrazol (PAC, GA synthesis inhibitor). The results of the exogenous hormone treatment showed the dwarf type in C. pepo was also a GA-sensitive dwarf type ( Figure S4D).
Although several experimental observations in C. pepo are similar with the previous results in C. maxima (F1 hybrid developmental reversal, major QTL position, and gibberellin sensitivity), qCmB2, which designated as a GA20ox controlling dwarf trait in previous research, was denied by HMMER prediction. Only one GA2ox and one GA20ox detected in Cp4.1LG12 of C. pepo genome and in Chr03 of C. maxima genome, respectively, and only one GA2ox gene related to the gibberellin synthesis in the candidate region ( Figure S4C).

Ancient Tetraploid Origin
Sequence order conservation (synteny) provides the basis for the transfer of genomic information between species. Thus, to determine if C. pepo experienced an additional whole genome duplication (WGD) event after the divergence from the Bennicase tribe, we aligned an unigene sequence from the 496 EST-SSR markers in our genetic map against the watermelon, melon, and cucumber genomes using tBLASTx. In total, 374, 315, and 340 hits were identified according to the alignment result with the watermelon, melon, and cucumber genomes respectively. In general, the alignment results showed a pairwise relationship existing around 20 linkage groups, for example, a large number of loci in LG7 and LG9 with hits in Chr8 of the watermelon genome, and a large number of loci in LG2 and LG6 with hits in Chr5 and Chr10 ( Figure 4A). These pairwise relationships were also confirmed by the alignment results with the melon and cucumber genomes ( Figure S5A and S5B). After removing the ambiguous hits, finally, 282 hits (75.4% in 374) with watermelon genome (Figure 4B), 264 hits (83.8% in 315) with the melon genome ( Figure S5C), and 284 hits (83.5% in 340) with the cucumber genome ( Figure S5D) showed distinguishable evidence for the tetraploid origin in C. pepo.
Generally, 20 C. pepo LGs can be classified into three categories based on their syntenic relationships with watermelon, melon, and cucumber. Among these, category A includes 12 LGs (LG20 and LG18, LG5 and LG14, LG16 and LG12, LG17 and LG3, LG2 and LG6, and LG7 and LG9) that exhibit approximate two-to-one relationships with chromosomes in the three sequenced genomes. Similarly, category B comprises three LGs (LG4, LG8, and LG15), of these, the front part region of LG4 showed pairwise relationship with LG8 and the back part region of LG4 showed a pairwise relationship with LG15. Category C includes the remaining five LGs (LG1, LG10, LG11, LG13, and LG19), the front part region of LG1 showed a pairwise relationship with LG10, the middle part region of LG1 showed a pairwise relationship with the front part region of LG11, in which the end part region showed a pairwise relationship with LG19, the end part region of LG1 showed a pairwise relationship with LG13 ( Figure 4B and Figure S5C,D).
Because of a lack of enough EST-SSR loci in our genetic map, some small segments with a pairwise relationship reported in a previous genome sequencing report could not be detected in this research, however, the relationships conducted by our data roughly coincided with our genome sequencing results. For example, Cma08 and Cma17, Cma03 and Cma07, which showed a typical pairwise relationship, were also proven in our research (LG16 and LG12, LG20 and LG18, respectively). These results suggested that the high-density EST-SSR-based map could be used in an ancient polyploid event, roughly discovering and decrypting the chromosomes evolutionary relationship.

Ancient Tetraploid Origin
Sequence order conservation (synteny) provides the basis for the transfer of genomic information between species. Thus, to determine if C. pepo experienced an additional whole genome duplication (WGD) event after the divergence from the Bennicase tribe, we aligned an unigene sequence from the 496 EST-SSR markers in our genetic map against the watermelon, melon, and cucumber genomes using tBLASTx. In total, 374, 315, and 340 hits were identified according to the alignment result with the watermelon, melon, and cucumber genomes respectively. In general, the alignment results showed a pairwise relationship existing around 20 linkage groups, for example, a large number of loci in LG7 and LG9 with hits in Chr8 of the watermelon genome, and a large number of loci in LG2 and LG6 with hits in Chr5 and Chr10 ( Figure 4A). These pairwise relationships were also confirmed by the alignment results with the melon and cucumber genomes ( Figure S5A and S5B). After removing the ambiguous hits, finally, 282 hits (75.4% in 374) with watermelon genome (Figure 4B), 264 hits (83.8% in 315) with the melon genome ( Figure S5C), and 284 hits (83.5% in 340) with the cucumber genome ( Figure S5D) showed distinguishable evidence for the tetraploid origin in C. pepo.
Generally, 20 C. pepo LGs can be classified into three categories based on their syntenic relationships with watermelon, melon, and cucumber. Among these, category A includes 12 LGs (LG20 and LG18, LG5 and LG14, LG16 and LG12, LG17 and LG3, LG2 and LG6, and LG7 and LG9) that exhibit approximate two-to-one relationships with chromosomes in the three sequenced genomes. Similarly, category B comprises three LGs (LG4, LG8, and LG15), of these, the front part region of LG4 showed pairwise relationship with LG8 and the back part region of LG4 showed a pairwise relationship with LG15. Category C includes the remaining five LGs (LG1, LG10, LG11, LG13, and LG19), the front part region of LG1 showed a pairwise relationship with LG10, the middle part region of LG1 showed a pairwise relationship with the front part region of LG11, in which the end part region showed a pairwise relationship with LG19, the end part region of LG1 showed a pairwise relationship with LG13 ( Figures 4B and S5C,D).
Because of a lack of enough EST-SSR loci in our genetic map, some small segments with a pairwise relationship reported in a previous genome sequencing report could not be detected in this research, however, the relationships conducted by our data roughly coincided with our genome sequencing results. For example, Cma08 and Cma17, Cma03 and Cma07, which showed a typical pairwise relationship, were also proven in our research (LG16 and LG12, LG20 and LG18, respectively). These results suggested that the high-density EST-SSR-based map could be used in an ancient polyploid event, roughly discovering and decrypting the chromosomes evolutionary relationship.  LGs (LG20 and LG18, LG5 and LG14, LG16 and LG12, LG17 and LG3, LG2 and LG6, and LG7 and LG9) that show roughly two-to-one relationships with chromosomes of watermelon, melon, and cucumber. Category B includes three LGs (LG4, LG8, and LG15). The front region of LG4 corresponds with LG8 while the backend region of LG4 corresponds to LG15. No syntenic region is shared between LG8 and LG15 in the three genomes. Category C includes five LGs (LG1, LG10, LG11, LG13, and LG19). The backend region of LG11 corresponds to LG19 while the front region of LG11 corresponds to the middle region of LG1. The front region of LG1 corresponds to LG10 and the backend region of LG1 corresponds to LG13.

EST-SSR with Higher Polymorphism than gSSR
Compared with SNP, SSR had higher polymorphic potential and lower cost (not relying on sequencing data) [7,22,23]. In Cucurbita genus, four SNP-based genetic maps have been published in recent years, but only two SSR-based genetic maps published ten years ago and with the disadvantage of a lower marker density. In this research, using 634 gSSRs and 1613 novel EST-SSRs, we constructed the first high-density EST-SSR-based genetic map in Cucurbita genus, the total mapped loci number in this map was lower than one of the SNP maps in C. pepo (GBS, RILs population) and one SNP map in C. moschata (dd-RAD, F2 population) but higher than the two SNP maps in C. pepo (EST-SNP, F2 population) and C. maxima (GBS, F2 population) [11][12][13]30]. According to common gSSR information and alignment results with published genome data, the other gSSRs located in the previous two gSSR maps but not in this map could also be integrated with the corresponding chromosome. Considering the rich phenotypic polymorphism in Cucurbita genus, these gSSR and EST-SSR markers will be useful in constructing molecular fingerprintings and primary mapping of the agronomic traits.
In general, as EST-SSRs are derived from transcribed genomic regions, they could be transferred across various species but with a lower polymorphic potential, compared with gSSR [15,33]. However, in this research, EST-SSR showed higher polymorphism than gSSR (31.8% versus 24.5%) (Table S1), and these differences are likely because of the selection strategy in the transcriptome data. We preferred to employ EST-SSR with higher A/T content motifs while at the same time avoiding those with higher G/C content motifs because these potentially exhibit lower levels of polymorphism [34,35].

Collinearity Analysis Reveals an Inversion Existed Between Our Map and Ref Genome
In Cucurbita genus, three sets of genome data were published in the last year [28,29], however, the corresponding relationship between chromosomes in C. pepo and in C. maxima/C. moschata is still LG16 and LG12, LG17 and LG3, LG2 and LG6, and LG7 and LG9) that show roughly two-to-one relationships with chromosomes of watermelon, melon, and cucumber. Category B includes three LGs (LG4, LG8, and LG15). The front region of LG4 corresponds with LG8 while the backend region of LG4 corresponds to LG15. No syntenic region is shared between LG8 and LG15 in the three genomes. Category C includes five LGs (LG1, LG10, LG11, LG13, and LG19). The backend region of LG11 corresponds to LG19 while the front region of LG11 corresponds to the middle region of LG1. The front region of LG1 corresponds to LG10 and the backend region of LG1 corresponds to LG13.

EST-SSR with Higher Polymorphism than gSSR
Compared with SNP, SSR had higher polymorphic potential and lower cost (not relying on sequencing data) [7,22,23]. In Cucurbita genus, four SNP-based genetic maps have been published in recent years, but only two SSR-based genetic maps published ten years ago and with the disadvantage of a lower marker density. In this research, using 634 gSSRs and 1613 novel EST-SSRs, we constructed the first high-density EST-SSR-based genetic map in Cucurbita genus, the total mapped loci number in this map was lower than one of the SNP maps in C. pepo (GBS, RILs population) and one SNP map in C. moschata (dd-RAD, F 2 population) but higher than the two SNP maps in C. pepo (EST-SNP, F 2 population) and C. maxima (GBS, F 2 population) [11][12][13]30]. According to common gSSR information and alignment results with published genome data, the other gSSRs located in the previous two gSSR maps but not in this map could also be integrated with the corresponding chromosome. Considering the rich phenotypic polymorphism in Cucurbita genus, these gSSR and EST-SSR markers will be useful in constructing molecular fingerprintings and primary mapping of the agronomic traits.
In general, as EST-SSRs are derived from transcribed genomic regions, they could be transferred across various species but with a lower polymorphic potential, compared with gSSR [15,33]. However, in this research, EST-SSR showed higher polymorphism than gSSR (31.8% versus 24.5%) (Table S1), and these differences are likely because of the selection strategy in the transcriptome data. We preferred to employ EST-SSR with higher A/T content motifs while at the same time avoiding those with higher G/C content motifs because these potentially exhibit lower levels of polymorphism [34,35].

Collinearity Analysis Reveals an Inversion Existed Between Our Map and Ref Genome
In Cucurbita genus, three sets of genome data were published in the last year [28,29], however, the corresponding relationship between chromosomes in C. pepo and in C. maxima/C. moschata is still unknown. According to the alignment result between the genetic map and the three genomes of Cucurbita, we collated the correspondence relationships of each of the chromosomes. Overall, the genetic map showed good collinearity with three published genomes, which consisted of the previous conclusion about karyotype stability in Cucurbita [29]. The collinearity with C. moschata is better than with C. maxima, this could be explained by C. pepo with a closer relationship with C. moschata, which was proven in previous research [3,36].
Theoretically, this map should show the best collinearity with the C. pepo genome but not with C. moschata, this unexpected result was due to one large segmental inversion detected in LG4 vs. CpLG04. However, we are not sure whether the chromosomal inversion in LG4 detected in this research actually exists on the true C. pepo genome.
On the one hand, although the mapping parental lines (both C. pepo subsp. pepo, zucchini and pumpkin cultival-type) used in this research are different than the parental lines (C. pepo subsp. pepo vs. C. pepo subsp. ovifera, zucchini and scallop cultival-type) used in the genome sequencing program, inter-subspecific crosses (C. pepo. subsp. pepo vs. C. pepo. subsp. ovifera, pumpkin and crockneck cultival-type) were also used in Gong's research, which give a good collinearity between LG4 and LGp16 ( Figure 2B). On the other hand, chromosome inversions that occur on the chromosomes often lead to infertility in offspring [37][38][39]; however, no distinguishable productive isolation was found in our breeding work around zucchini, scallop, crockneck, and pumpkin cultival-types lines and their hybrids, which makes us even more skeptical about whether this chromosome inversion really exists.
Given the important economic value of some species in C. pepo and the fundamental role of the genome sequence data in the mining gene related to the agronomic trait, we suggested that some of the following experiments still may be needed, such as fluorescent in situ hybridization (FISH) and high density genetic maps using other parental lines, to get a definitive conclusion.

Dwarf Trait in C. maxima and C. pepo Perhaps Controlled by Same Gene
In this research, the candidate region of the dwarf trait was mapped to the C. pepo genome for the first time, and the linkage group which harbored the dwarf gene (LG20) was the same as Gong et al.'s work [14], in which the Bush gene (B) mapped to LGp12; however, the major dwarf gene in our map and the Bush gene in Gong et al.'s map [14] were not in a common candidate region ( Figure S4C). Considering only two SSRs, located in LGp12 (CMTm61 and CMTp36) in Gong et al.'s map [14], we suggested the gSSR and EST-SSR markers located in LG20 could be used in Gong et al.'s population [14]. After getting a more accurate candidate region and then comparing the results from this experiment, we confirmed whether the dwarf gene shared a common region around different parental lines in C. pepo.
The characters of the dwarf trait observed in this research showed a large similarity with the characters reported in Zhang et al.'s work [11] (developmental reverse in hybrid, major QTL candidate region, and hormone response), however, the LOD value of major dwarf QTL in both qCpDy1 and qCpDm1 were much higher than qCmB2, and the other two dwarf QTLs (qCmB1 and qCmB3) reported in C. maxima were not detected in C pepo. These differences suggested that the molecular mechanism control dwarf trait in C. pepo were not exactly the same as in C. maxima.
For the phenomenon of developmental reversal, Zhang et al. [11] proposed that there were other genes involved in a plant grown to the mature stage in C. maxima. However, in C. pepo, the major dwarf QTL at the young and mature developmental stages shared a common region and both had a large LOD value ( Figure 3 and Table S5), which suggested the dwarf and reversal phenomenon were perhaps controlled by the same gene, and the developmental reversal was due to different expression value of this gene at different developmental stages.
The major dwarf QTL qCmB2 in C. maxima was designated as a GA20ox in Zhang et al.'s work [11], and the dwarf phenomenon was explained by a lower expression level in the dwarf parent. However, this in not supported by HMMER prediction in which only a GA2ox gene exists in the candidate region ( Figure S4C). Furthermore, a lower GA20ox expression level seems hard to explain why the internode length of a hybrid to a dwarf parent is especially close at a young stage. Based on the dosage effect, the hybrid should be with intermediate or similar internode length to vein parental, because half of the GA20ox gene was still expressed normally in the hybrid. Research related to the GA20ox mutant has shown the vein was dominant or semi-dominant to the dwarf [40][41][42], but research in C. maxima and C. pepo suggested that the dwarf was dominant to vein [11,32,43]. These results suggested the the dwarf molecular mechanism in Cucurbita could not be explained simply by a depressed expression level in GA20ox.
Although one GA2ox was detected in the candidate region, due to lacking enough molecular markers in candidate region, we are not sure if this GA2ox gene is the gene that controls the dwarf phenotype, because there are also a large number of reports about transcription factors that control dwarf traits by regulating the expression of a GA synthesis gene [44][45][46]. To get a more detailed explanation about the dwarf molecular mechanism in C. pepo, we will develop more markers from the resequencing data of the two parental lines used in this research and employ a larger F 2 population in the future.

EST-SSR-Based Genetic Map Could Be Useful in Revealing Ancient WGD Event
Based on the alignment results in this research, it is difficult to determine whether the ancient polyploid event happened in C. pepo belongs to an Auto-ploid type or an Allo-ploid type for lacking large amounts of sequencing data information [47]. However, we discovered strong evidence of the tetraploid origin in C. pepo and collated pairwise linkage groups from the alignment results with genomes of adjacent species, which is consistent with the conclusion analyzed from the genome De novo sequencing data [28,29]. Considering that whole genome De novo sequencing is still expensive for most species without a reference genome, and the ancient polyploid event happened frequently [48,49], we suggested that it is still an efficient and lower cost method to research chromosomal evolution through developing EST-SSR or EST-SNP from the transcriptome data and constructing a high density genetic map to align with known genomes [16,50,51].

Plant Materials and Phenotyping
Two inbred lines differing in several phenotypes were used as the mapping parents for this study ( Figure S1). Of these, the P 1 (Jinganlu) belongs to the C. pepo subsp. pepo var. Pumpkin, while the P 2 (HM-S2) belongs to C. pepo subsp. pepo var. Zucchini. Phenotypic evaluations were carried out three times for the replicates and SPSS 17.0 was used to perform the phenotypic data analysis. One six-generation population (P 1 /P 2 /F 1 /BC 1 /BC 2 /F 2 ) was grown at the CAAS experimental station (N39.56 • , E118.18 • ) in the autumn of 2012 (BJA). Plant height, internode number and internode length of two parental lines, hybrid, F 2 population and BC population were investigated at young (25 days after sowing) and mature (60 days after sowing) developmental stages, respectively. To validate the developmental reversal phenomenon in a hybrid and dominant-recessive relationship in the segregating population, two parental lines and their hybrid (P 1 /P 2 /F 1 ) were grown at the langfang experimental station (N39.36 • , E116.36 • ) in the autumn of 2014 (LFA) and one four-generation population (P 1 /P 2 /F 1 /F 2 ) was grown at the langfang experimental station (N39.36 • , E116.36 • ) in the summer of 2015 (LFS), investigating the plant height, internode number and internode length at the young and mature developmental stages, respectively.

SSR Search
A total of 2245 SSR markers were utilized in this study, including 132 gSSRs from the cucumber genome, isolated with PCR products for C. moschata [26], and 500 gSSR (specifically, 307 from C. moschata and 193 from C. pepo) as developed by Gong et al. [14] using single-stranded hybridization.
A further 1613 EST-SSRs were also developed for this study using transcriptome data for C. pepo (http: //www.icugi.org). All the primers were synthesized by Sangon Biotech (Sangon, Shanghai, China).

DNA Extraction and PCR Amplification
Genomic DNA from young leaves was extracted using the CTAB method with minor modifications. DNA quality was assessed using 1.2% agarose gel, and quantity was assessed using NanoDrop 2000 Spectrophotometer (Thermo Scientific, Waltham, MA, USA). A polymerase chain reaction (PCR) was carried out on Bio-Rad T100 thermal cycler (Bio-Rad, Hercules, CA, USA). PCR reactions were prepared in 12.5 µL volumes containing 10 × Taq polymerase buffer (500 mM KCl, 100 mM Tris-HCI pH 8.5, and 1 mg/mL gelatin), 1.0 mM MgCl 2 , 0.5 mM dNTPs, 5 pM of each primer, 0.3 U Taq polymerase, and 25 ng of template DNA, respectively. The final volume was adjusted to 12.5 µL with sterile distilled water, and PCR amplifications were set at 95 • C for 10 min, followed by 35 cycles at 94 • C for 1 min and 20 s at a specific annealing temperature for a specific primer pair and then at 74 • C for 30 s, before final extension at 74 • C for 10 min. All PCR products were separated on a 6% non-denaturing PAGE electrophoresis system and were visualized using silver staining [52].

Genotyping and Map Construction
Both parents and their F 1 offspring were used to select polymorphic markers; markers that had product lengths ranging between 80 bp and 300 bp and that also showed polymorphism between parents were used to construct the linkage map. A total of 187 individuals from the F 2 population (BJA) were used to generate the linkage map. Segregation distortion at each marker locus was tested against the expected ratio for F 2 (1:2:1) using a Chi-square test. Those markers that showed highly significant segregation distortion (p < 0.01) were excluded from the map construction. Linkage analysis was then performed using the regression function in the software JoinMap v4.1. Thus, linkage groups (LGs) were determined using a LOD threshold of 4.0 with the Kosambi mapping function used for linkage analysis. The parameters for this analysis were rec = 0.4, LOD = 1.0, Jump = 5.0, and markers that had linkage lengths that were too long were excluded (>50 cM). The observed genome length (Go) was calculated by summing the observed map length of all the linkages. The expected length of each LG was then estimated using method of Chakravarti et al. [53] and expected genome length (Ge) was then estimated by summing the lengths of the estimated LGs. The observed genome map coverage (GCo) was defined as the ratio between the total length of the map Go and Ge [54].

Collinearity Analysis
Unigene sequences for all EST-SSR loci in the genetic map were aligned against the C. pepo, C. maxima, C. moschata, watermelon, melon, and cucumber genomes using tBLASTx (E-value < e −10 ) [55]. For a visual representation of the analysis results, the R package Omiccircos was used to draw the circos-plot [56].

Detection of Dwarf QTLs and Hormone Application
The QTLs of dwarf trait were detected by IciMapping V4.1 (available online: https:// www.integratedbreeding.net/386/breeding-services/more-software-tools/icimapping) based on the inclusive composite interval mapping (ICIM) model [57]. The ICIM-ADD method was used to detect additive and dominant QTLs and estimate relative genetic parameters. The threshold value was determined by 1000 permutations and the type I error set as p < 0.05, the PIN value = 0.001, step value = 0.5 cM.
Gibberellin (GA) and Paclobutrazol (PAC) were resolved in ethanol. For the treatments, the chemical reagents were diluted to a desired concentration using water and spray to 25-day-old and 60-day-old plants. The blank solvent (add appropriate ratio ethanol) were used as mock treatment, all treatments were performed twice at intervals of 4 days.

Candidate Gene Analysis
HMMER and BLAST were used to screen the putative gibberellin synthesis genes [58]. The HHM profile of the GA20ox, GA3ox and GA2ox family gene domain was downloaded from the Pfam database (PF03171, https://pfam.xfam.org/family/PF03171) and the peptide sequence data of C. pepo and C. maxima genomes was downloaded from the ICUGI (http://cucurbitgenomics.org/). The local BLAST program V2.7.1 was used to remove the repetitive sequence of putative GA20ox, GA3ox and GA2ox family gene, and TBtool V0.664 (available online: http://cj-chen.github.io/tbtools/) was used to locate the respective genes on the reference genome.