The USDA cucumber (Cucumis sativus L.) collection: genetic diversity, population structure, genome-wide association studies, and core collection development

Germplasm collections are a crucial resource for conserving natural genetic diversity and provide a source of novel traits essential for sustained crop improvement. Optimal collection, preservation, and utilization of these materials depend on knowledge of the genetic variation present within the collection. Here we used high-throughput genotyping-by-sequencing (GBS) to characterize the United States National Plant Germplasm System (NPGS) collection of cucumber (Cucumis sativus L.). The GBS data, derived from 1234 cucumber accessions, provided more than 23,000 high-quality single-nucleotide polymorphisms (SNPs) distributed at high density across the genome (~1 SNP/10.6 kb). The SNP markers were used to characterize genetic diversity, population structure, phylogenetic relationships, linkage disequilibrium, and population differentiation of the NPGS cucumber collection. These results, providing a detailed genetic analysis of the U.S. cucumber collection, complement NPGS descriptive information regarding geographic origin and phenotypic characterization. We also identified genome regions significantly associated with 13 horticulturally important traits through genome-wide association studies (GWAS). Finally, we developed a molecularly informed, publicly accessible core collection of 395 accessions that represents at least 96% of the genetic variation present in the NPGS collection. Collectively, the information obtained from the GBS data provides deep insight into the diversity present in the collection and the genetic relationships among its accessions, and will serve as a valuable resource for genetic analyses, gene discovery, crop improvement, and germplasm preservation.


Supplementary Note: Phenotypic data collection
Over the last 30 years, cucumber (C. sativus L.) data for yield, fruit quality, and resistance to biotic and abiotic stresses have been collected by the Cucurbit Breeding Program at North Carolina State University. For GWAS in this study, we used historical data on 13 agronomic traits of cucumber plant introduction (PI) accessions, collected over multiple years and locations (http://cucurbitbreeding.com).
Fruit shelf life was evaluated through three traits: fruit weight loss, firmness loss, and shriveling. Fruits were weighed before storage and again after 2 weeks of storage, and weight loss was expressed as percent loss. Shriveling was also rated after 2 weeks of storage using a 0-9 visual scale: 0 = none (no shriveling of the fruit skin), 1-3 = slight, 4-6 = moderate, and 7-9 = severe (skin very shriveled). Firmness was measured as the force (N) required to penetrate the exocarp and mesocarp of the fruit, both before and after 2 weeks of storage, and was likewise expressed as percent loss 6.
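The percent-loss calculation and the 0-9 shriveling scale described above can be sketched in Python. This is a minimal illustration, assuming percent loss is computed relative to the pre-storage measurement, i.e. (before - after) / before × 100; the function names and example values are hypothetical and do not come from the original data.

```python
def percent_loss(before: float, after: float) -> float:
    """Percent loss relative to the pre-storage value (assumed definition)."""
    if before <= 0:
        raise ValueError("pre-storage value must be positive")
    # Multiply before dividing so simple inputs give exact results.
    return (before - after) * 100.0 / before


def shriveling_class(rating: int) -> str:
    """Map a 0-9 visual shriveling rating to its descriptive class."""
    if not 0 <= rating <= 9:
        raise ValueError("rating must be between 0 and 9")
    if rating == 0:
        return "none"
    if rating <= 3:
        return "slight"
    if rating <= 6:
        return "moderate"
    return "severe"


# Hypothetical fruit: 400 g before storage, 352 g after 2 weeks.
weight_loss = percent_loss(400.0, 352.0)
# Hypothetical firmness readings in newtons, before and after storage.
firmness_loss = percent_loss(50.0, 40.0)
print(weight_loss, firmness_loss, shriveling_class(5))
```

The same `percent_loss` function serves both weight and firmness, since the text expresses both traits as percent loss over the same 2-week storage period.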
Yield data were collected as plot means and consisted of the number and weight of total fruits per plot. Yield was expressed as thousands of fruits ha⁻¹ for fruit number and Mg ha⁻¹ for fruit weight to facilitate comparison with other studies 7.
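Converting per-plot yield to the per-hectare units above is simple scaling by plot area. The note does not state the plot size, so the plot area here is a hypothetical parameter; the sketch assumes fruit weight is recorded in kilograms per plot (1 Mg = 1000 kg).

```python
M2_PER_HA = 10_000.0  # square metres in one hectare


def fruit_number_per_ha(fruits_per_plot: float, plot_area_m2: float) -> float:
    """Fruit count per plot -> thousands of fruits per hectare."""
    return fruits_per_plot * (M2_PER_HA / plot_area_m2) / 1_000.0


def fruit_weight_mg_per_ha(kg_per_plot: float, plot_area_m2: float) -> float:
    """Fruit weight per plot (kg, assumed) -> Mg per hectare."""
    return kg_per_plot * (M2_PER_HA / plot_area_m2) / 1_000.0


# Hypothetical 20 m^2 plot yielding 60 fruits weighing 25 kg in total.
print(fruit_number_per_ha(60, 20.0), fruit_weight_mg_per_ha(25.0, 20.0))
```

Both conversions share the same factor (hectare over plot area, divided by 1000), which is why thousands of fruits ha⁻¹ and Mg ha⁻¹ pair naturally as reporting units.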
Flowering data were recorded for 21 days after the appearance of the first staminate flower on the earliest plant. The number of days from planting to first staminate flower was recorded for each plant. Pollinations were made on the earliest and latest cultigens to observe fruit, skin, and spine type 8.
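The flowering measurement can be sketched as a date difference with a 21-day cutoff. This assumes that a plant whose first staminate flower appears more than 21 days after the earliest flower in the trial is simply left unrecorded (returned as `None`); the function name and dates are hypothetical.

```python
from datetime import date
from typing import Optional

WINDOW_DAYS = 21  # recording window after the trial's first staminate flower


def days_to_flower(planting: date,
                   trial_first_flower: date,
                   plant_first_flower: Optional[date]) -> Optional[int]:
    """Days from planting to a plant's first staminate flower.

    Returns None if the plant did not flower, or flowered after the
    assumed 21-day recording window had closed.
    """
    if plant_first_flower is None:
        return None
    if (plant_first_flower - trial_first_flower).days > WINDOW_DAYS:
        return None
    return (plant_first_flower - planting).days


planting = date(2020, 5, 1)          # hypothetical planting date
trial_first = date(2020, 6, 5)       # earliest staminate flower in the trial
print(days_to_flower(planting, trial_first, date(2020, 6, 10)))
```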
Root size data were collected 3 months after planting. Roots were washed free of soil with tap water, and the roots of all plants were measured. Root length was obtained by stretching each root system to the same width (150 mm) and height (5 mm) and then measuring its length in mm 9.
All data were summarized as the mean of several ratings or data points.
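Summarizing repeated observations as a mean can be sketched by grouping records by accession and trait. The record layout, one `(accession, trait, value)` tuple per rating or data point, is an assumption for illustration, and the accession names below are hypothetical placeholders rather than real PI numbers.

```python
from collections import defaultdict
from statistics import mean


def summarize(records):
    """Average all observations for each (accession, trait) pair.

    records: iterable of (accession, trait, value) tuples, e.g. one row
    per plot, year, or location (assumed layout).
    """
    grouped = defaultdict(list)
    for accession, trait, value in records:
        grouped[(accession, trait)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical shriveling ratings for one accession across three environments.
example = [
    ("accession_A", "shriveling", 4),
    ("accession_A", "shriveling", 6),
    ("accession_A", "shriveling", 5),
]
print(summarize(example))
```

Grouping on the (accession, trait) pair keeps traits with different units separate, so each mean is taken only over comparable measurements.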