Physical mapping of QTL for tuber yield, starch content and starch yield in tetraploid potato (Solanum tuberosum L.) by means of genome wide genotyping by sequencing and the 8.3 K SolCAP SNP array

Tuber yield and starch content of the cultivated potato are complex traits of decisive importance for breeding improved varieties. Natural variation of tuber yield and starch content depends on the environment and on multiple, mostly unknown genetic factors. Dissection and molecular identification of the genes and their natural allelic variants controlling these complex traits will lead to the development of diagnostic DNA-based markers, by which precision and efficiency of selection can be increased (precision breeding). Three case-control populations were assembled from tetraploid potato cultivars based on maximizing the differences between high and low tuber yield (TY), starch content (TSC) and starch yield (TSY, arithmetic product of TY and TSC). The case-control populations were genotyped by restriction-site associated DNA sequencing (RADseq) and the 8.3 k SolCAP SNP genotyping array. The allele frequencies of single nucleotide polymorphisms (SNPs) were compared between cases and controls. RADseq identified, depending on data filtering criteria, between 6664 and 450 genes with one or more differential SNPs for one, two or all three traits. Differential SNPs in 275 genes were detected using the SolCAP array. A genome wide association study using the SolCAP array on an independent, unselected population identified SNPs associated with tuber starch content in 117 genes. Physical mapping of the genes containing differential or associated SNPs, and comparisons between the two genome wide genotyping methods and two different populations identified genome segments on all twelve potato chromosomes harboring one or more quantitative trait loci (QTL) for TY, TSC and TSY. Several hundred genes control tuber yield and starch content in potato. They are unequally distributed on all potato chromosomes, forming clusters between 0.5–4 Mbp width. The largest fraction of these genes had unknown function, followed by genes with putative signalling and regulatory functions. 
The genetic control of tuber yield and starch content is interlinked. Most differential SNPs affecting both traits had antagonistic effects: The allele increasing TY decreased TSC and vice versa. Exceptions were 89 SNP alleles which had synergistic effects on TY, TSC and TSY. These and the corresponding genes are primary targets for developing diagnostic markers.


Background
Crop yield is essential in agriculture and therefore a major selection criterion in any crop breeding program. New cultivars with higher yield per area unit are required for producing enough food for a growing world population under limited availability of arable land. Other than in cereals, a high portion of the total yield of root and tuber crops (root or tuber weight per plant, per plot or area unit) consists of water. In case of the cultivated potato (Solanum tuberosum L.), the dry matter content of the tubers ranges from about 18 to 26% of the total yield. The main component is the glucose polymer starch [1] which accounts for approximately 80% of the dry matter. Next to starch, the sugars sucrose, glucose and fructose make up between 0.7% and 10% of the fresh weight [2]. Starch and sugars are interconverted in dormant tubers in response to environmental signals such as the storage temperature [3]. The tuber starch content (TSC) in percent of the total tuber weight is an important quality measure. It has an optimal range for different end uses of the potato crop such as direct consumption (10-17%), processing (14-20%) and industrial starch production (up to 25%). Tuber starch yield (TSY), that is the arithmetic product of the total yield and the percentage of starch, is a particularly relevant parameter for the production of industrial potato starch for multiple purposes [4]. Tuber yield (TY), starch content (TSC) and starch yield (TSY) are therefore decisive characters for the development of new potato varieties for different end uses.
The natural variation of tuber yield and starch content, and consequently of starch yield, depends on multiple genetic and environmental factors. The development of stably high yielding cultivars with optimal tuber starch content and several other quality criteria requires multiple year and location trials. Tuber starch content is reliably estimated by determining the tuber specific gravity via the underwater weight, which is linearly correlated with dry matter and starch content [5]. The assessment of tuber yield by weighing is simple but unreliable in the early stages of selection, because too few tubers are available for conducting replicated field trials. Sufficient tuber numbers become available only after several years of vegetative multiplication. Dissection and molecular identification of the genetic factors that underlie the natural variation of tuber yield and starch content facilitate the identification of superior allele combinations in parents as well as progeny by diagnostic DNA-based markers. Prudent use of diagnostic markers can increase the selection efficiency by identifying optimal parental combinations and reducing the number of progeny to be assessed in the field (precision breeding).
During the last twenty-five years, linkage mapping of quantitative trait loci (QTL) for tuber yield and starch content (or specific gravity or dry matter), using various types of DNA-based markers, has been performed in a number of diploid and tetraploid, intra- as well as interspecific potato families comprising between 50 and 230 full sib clones [6][7][8][9][10][11]. Irrespective of the marker density, which ranged from very low genome coverage with restriction fragment length polymorphism (RFLP) markers to high genome coverage with amplified fragment length polymorphism (AFLP) or single nucleotide polymorphism (SNP) markers, one to two QTL per chromosome and trait were distinguished. When tuber yield and starch content were evaluated in the same family, QTL for both traits were in several cases detected by the same markers, suggesting that the underlying genes are physically tightly linked, or alternatively, that the same genes have pleiotropic effects on both traits, e.g. [6,11]. Between two and eight QTL for tuber yield were mapped per family. The most consistent QTL for tuber yield in different genetic backgrounds were located on potato chromosomes I, II, V, VI and XII. In the case of tuber starch content, from one to sixteen QTL were dissected per family, which were distributed on all twelve potato chromosomes (summarized in [12]). The QTL with the largest effects on both tuber yield and starch content were located on potato chromosome V. The genetic resolution was low in these linkage studies, and in most cases the diagnostic power of QTL-linked markers was not assessed in genetic materials other than the experimental family used for QTL mapping.
Association mapping in populations of individuals related by descent increases the genetic resolution, as it takes advantage of recombination events over multiple meiotic generations in multiple parents [13]. Association mapping in populations of cultivars from advanced breeding programs has the advantage of yielding markers that have diagnostic value in a wide genetic background suitable for variety development. In the cultivated potato, association analysis of tuber traits, among others tuber yield, starch content and starch yield, was performed during the last decade in populations comprising between 150 and 300 tetraploid varieties and breeding clones [12, 14-21]. Most of these association studies were based on the candidate gene approach, meaning that genotyping was targeted at known genes selected in a functional context. Populations were genotyped for DNA variants in genes that function in starch and sugar metabolism [12, 14-18, 20, 22], in enzymatic discoloration upon mechanical impact (tuber bruising) [18] and in genes resulting from comparative protein profiling of tubers with contrasting sugar content [21]. Thirty nine candidate loci showed DNA marker trait associations, mostly with tuber starch content, several of those also with starch yield and only a few with tuber yield. One association study in two variety panels used more than three thousand untargeted AFLP markers. In this study four and eleven loci were found that were associated with underwater weight (indicative of tuber starch content) [19].
The candidate gene approach was appropriate for association mapping of tuber starch content because the biochemistry and molecular genetics of starch metabolism is well understood in plants including the potato. Many genes encoding enzymes and transporters functional in carbohydrate metabolism have been cloned, sequenced and functionally characterized [23,24]. The candidate gene approach is less suitable though for the identification of diagnostic markers for tuber yield, as numerous, mostly unknown metabolic, regulatory and developmental processes have a potential influence on this highly complex trait. Examples for processes affecting tuber yield are carbon and energy supply [25][26][27] and tuberisation control [28][29][30]. Nevertheless, allelic variants of ten genes functional in carbohydrate metabolism, enzymatic discoloration or tuberisation were found to be associated with tuber yield and in some cases also with starch yield [12,14,17,18].
Recently, new potato genomic resources became available, which allow a more comprehensive approach to the dissection of QTL and possibly the identification of the causal genes. The annotated genome sequence provides physical maps of the twelve potato chromosomes, to which any QTL can be anchored by means of linked or associated sequence based markers [31,32]. This allows comparisons between QTL mapping experiments in different genetic backgrounds and integration of the results. A first 8.3 K SNP chip allows the genotyping of 8303 genome wide SNPs including the allele dosage in tetraploid potato [33][34][35][36][37][38]. Moreover, next generation sequencing technologies make SNP genotyping by sequencing possible, for example by restriction-site associated DNA sequencing (RADseq) [39,40].
In this paper we address the question of the number and physical position of QTL for tuber yield, starch content and starch yield based on genome wide SNP genotyping of tetraploid potato cultivars using both RAD sequencing and the 8.3 K SolCAP SNP array [38]. We compare the results obtained with these two genotyping methods in the same three case-control populations, which were selected for having contrasting phenotypic means of tuber yield, starch content and starch yield. The results of the case-control studies are compared with the results of a genome wide association study (GWAS) using the SolCAP SNP chip, for tuber starch content in an independent population of cultivars. We also assess the reproducibility of diagnostic markers identified in previous association studies based on the candidate gene approach. In addition, we report novel candidate genes for tuber yield, starch content and starch yield.

Plant material and phenotypes
Two populations of tetraploid varieties and breeding clones were used for association mapping. The QUEST population consisted of 264 varieties and breeding clones, which were evaluated in Northern Spain in two years at two locations for tuber yield (TY), starch content (TSC) and starch yield (TSY) [12]. A subset of 90 genotypes of the QUEST population was selected for SNP genotyping with the 8.3 k SolCAP potato SNP array and by RAD sequencing, based on the phenotype of TY, TSC and TSY (Additional file 1). The PIN184 population comprised 184 tetraploid breeding clones and is described in [41]. This population was evaluated in Northern Germany in three years at two locations for tuber starch content (TSC) measured by specific gravity [42]. Adjusted entry means for TSC, TY and TSY were calculated as described [12,41] and are used throughout this study.

RAD library preparation and sequencing
Ninety six RAD (Restriction site Associated DNA) libraries for paired-end sequencing were prepared from 90 QUEST genotypes and six duplicate genotypes according to Etter et al. [43] with modifications according to Bus et al. [44]. In brief, 2 μg genomic DNA of each genotype were restricted with KpnI. The 96 libraries were individually barcoded with a custom made P1 adapter, which was ligated to the KpnI cut sites. Adapter ligations were incubated at 16°C overnight. The barcodes were designed to be separated from each other by at least six mutational steps. The 96 libraries were then pooled into two samples comprising 48 genotypes each. The two samples were sheared by ultrasound and DNA fragments of 300 to 400 bp were extracted after size separation from 1.5% agarose gels. An A-overhang was added to the blunt ends and the common P2 adapter was ligated as before to the template. P1 and P2 adapters were kindly provided by Anja Bus and Benjamin Stich (MPI for Plant Breeding Research). After a PCR (polymerase chain reaction) enrichment step, a second size selection of 300-400 bp fragments was performed. Sample quality was assessed on a 2100 Bioanalyzer (Agilent Technologies, Böblingen, Germany). The two samples were custom sequenced in two lanes of an Illumina HiSeq2000 system at the Max-Planck Genome Centre Cologne using GAIIx chemistry.
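The barcode design rule described above (at least six mutational steps between any two barcodes) can be checked with a short script. The following is a minimal sketch, assuming that a "mutational step" means a single base substitution (Hamming distance) and that all barcodes have equal length. The barcode sequences and function names are hypothetical and were invented for illustration; they are not the adapters used in this study.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance over all pairs of barcodes."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def assign_read(read, barcodes, max_mismatch=1):
    """Return the single barcode that matches the start of the read with at
    most max_mismatch substitutions, or None if no or several barcodes match."""
    n = len(barcodes[0])
    prefix = read[:n]
    if len(prefix) < n:
        return None
    hits = [b for b in barcodes if hamming(prefix, b) <= max_mismatch]
    return hits[0] if len(hits) == 1 else None


# Hypothetical 8-nt barcodes (assumption, for illustration only).
EXAMPLE_BARCODES = ["AAAAAAAA", "CCCCCCAA", "GGGGGGCC", "TTTTTTGG"]

if __name__ == "__main__":
    print(min_pairwise_distance(EXAMPLE_BARCODES))
    print(assign_read("AAAATAAAGGTACCTT", EXAMPLE_BARCODES))
```

With a minimum distance of six, a read whose barcode carries a single sequencing error is still closer to its true barcode than to any other, so tolerating one mismatch during read sorting does not create ambiguous assignments.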

Sequence analysis and SNP detection after RAD sequencing
Paired-end reads were combined from both sequencing lanes and sorted according to the barcode, allowing maximally one mismatch. No mismatch was allowed at the restriction site. The sequences were mapped against the potato genome sequence (version v2.1.11) [32] using the Bowtie software package version 0.12.8 with default settings [45]. Reads that did not map to a unique genomic position were excluded from further analysis. The sequence reads of individual genotypes were assigned to the three case-control populations for TSC, TY and TSY. Further analysis was performed with the sequence reads pooled for cases and controls. Bi-allelic SNPs were called within case-control populations using the Genome Analysis Toolkit GATK [46]. Fisher's exact test was implemented with a custom-made Perl script to test whether the allele frequencies of a bi-allelic SNP differed between a pair of genotype pools with contrasting phenotypic means. FDR (false discovery rate) values were obtained with Bonferroni's correction for multiple testing and SNPs significant at FDR < 0.05 were selected. The SNPs in annotated loci were obtained by combining the SNP genomic positions (pseudomolecules version v2.2.11) with the annotation file of the potato genome sequence [32]. Non-synonymous SNPs were identified with the software package SnpEff [47].
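The test for differential allele frequencies between two pools can be illustrated as follows. This is a minimal Python sketch, not the custom Perl script used in the study. It assumes that each SNP is summarized as a 2x2 table of allele read counts (reference and alternative allele in the HIGH and LOW pools), that the two-sided p-value sums all tables that are at most as probable as the observed one, and that the corrected values are Bonferroni-adjusted p-values (the values called FDR in the text). Function names and example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: HIGH pool, LOW pool. Columns: reference and alternative allele
    counts. The p-value sums the hypergeometric probabilities of all tables
    with the same margins that are no more probable than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    p_obs = prob(a)
    # Small relative tolerance guards against floating point ties.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-9)))


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Hypothetical allele read counts for three SNPs (assumption).
    tables = [(30, 10, 12, 28), (20, 20, 21, 19), (40, 0, 5, 35)]
    raw = [fisher_exact_two_sided(*t) for t in tables]
    for t, p, q in zip(tables, raw, bonferroni(raw)):
        print(t, p, q, q < 0.05)
```

Because the Bonferroni adjustment scales with the number of tests, a threshold of 0.05 applied to roughly 600,000 SNPs corresponds to a very small unadjusted p-value per SNP.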

Genome wide genotyping with the SolCAP SNP array
Genomic DNA was extracted from freeze dried leaf tissue as described [12,41]. DNA samples were diluted to 50 ng/μl AE-buffer (10 mM Tris-Cl, 0.5 mM EDTA, pH 9.0). The same 90 QUEST genotypes as used for RAD sequencing plus six technical replicates and the complete PIN184 population were custom genotyped with the Infinium 8303 potato array (8.3 k SolCAP SNP array) at the Life & Brain Center (Department of Genomics, Bonn, Germany) on an Illumina iScan system, applying the Infinium assay. The genotypes of bi-allelic SNP markers in the PIN184 population were called using the software package 'fitTetra' [48]. The package was run using the function fitTetra with the option try.HW = F. SNP genotypes were assigned manually to the 96 QUEST genotypes using GenomeStudio software version 2011.1 (Illumina). One of the five possible genotypes (AAAA, AAAB, AABB, ABBB, BBBB) was assigned to each individual for each SNP. The quality of the genotype clusters of differential/associated SNPs was visually examined a posteriori with GenomeStudio software version 2011.1 (Illumina) and very low quality SNPs were discarded. To detect SNPs with differential allele frequency in the QUEST case-control populations, Pearson's goodness-of-fit test (chi-square test) was applied to test the null hypothesis that the allele frequencies of a bi-allelic SNP in a pair of genotype pools with contrasting phenotypic means were equal. The null hypothesis was rejected at the significance level α = 0.01. The chi-square tests were performed using SAS software (version 9.1). Duplicates of six QUEST genotypes were included in the analysis. For each duplicated genotype, Pearson's correlation coefficient between the genotypic data of the two replicates was calculated to assess the technical repeatability of the genotyping method.
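The comparison of allele frequencies between two genotype groups can be sketched as follows. This is an illustration only, not the SAS procedure used in the study. It assumes that allele counts are obtained by summing the allele dosage over the tetraploid genotypes of each group, and that the test is a Pearson chi-square test on the resulting 2x2 table with one degree of freedom and no continuity correction. Function names and example genotypes are hypothetical.

```python
from math import erfc, sqrt

# Copies of allele A in each of the five tetraploid genotype classes.
DOSAGE = {"AAAA": 4, "AAAB": 3, "AABB": 2, "ABBB": 1, "BBBB": 0}


def allele_counts(genotypes):
    """Return (count of allele A, count of allele B) for a group."""
    a = sum(DOSAGE[g] for g in genotypes)
    return a, 4 * len(genotypes) - a


def chi_square_2x2(high, low):
    """Pearson chi-square test on allele counts of two groups.

    high, low: (count A, count B). Returns (chi2, p) for 1 degree of
    freedom; for 1 df the upper tail probability is erfc(sqrt(chi2 / 2)).
    """
    a, b = high
    c, d = low
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    if 0 in rows or 0 in cols:
        raise ValueError("test undefined: a row or column total is zero")
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / (rows[0] * rows[1] * cols[0] * cols[1])
    return chi2, erfc(sqrt(chi2 / 2))


if __name__ == "__main__":
    # Hypothetical groups of 24 genotypes each (assumption).
    high_group = ["AAAA"] * 12 + ["AABB"] * 12
    low_group = ["BBBB"] * 12 + ["AABB"] * 12
    chi2, p = chi_square_2x2(allele_counts(high_group), allele_counts(low_group))
    print(chi2, p, p < 0.01)
```

Summing dosages treats the four alleles of a tetraploid genotype as independent observations, which is a simplification that this sketch shares with any allele-count based test.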

SNP detection by amplicon sequencing and pyrosequencing
PCR amplification of genomic fragments, amplicon sequencing, pyrosequencing, SNP detection and scoring were performed as described previously [12,49]. Amplicon sequences, primer sequences and annealing temperatures are provided in Additional file 2.

Association analysis
In the QUEST population, association analysis of SNPs in candidate genes was performed as described in Schönhals et al. [12] using a mixed linear model which included kinship and population structure. Kinship and population structure were calculated based on the marker data of 183 microsatellite alleles at 29 loci distributed on the twelve potato chromosomes. In the PIN184 population, genome wide association analysis of SolCAP SNPs was performed using the kinship model K1 (a mixed linear model) described in Mosquera et al. [36]. The kinship matrix was calculated based on the genotypes of 241 SolCAP SNPs that were selected for equal distribution on the twelve potato chromosomes.

Results
Case-control populations for tuber starch content (TSC), yield (TY) and starch yield (TSY)
Based on the adjusted means for TSC, TY and TSY of 264 genotypes of the QUEST population [12], ninety genotypes were selected in order to assemble genotype groups with the most contrasting phenotypic means (case-control populations). The same genotype could be a member of more than one group (Fig. 1, Additional file 1). Each of the case-control populations for TSC and TY consisted of two groups of 24 genotypes with high (cases) and low (controls) mean values of TSC and TY. The case-control population for TSY consisted of 21 genotypes with high TSY (cases) and 24 genotypes with low TSY (controls). The mean phenotypic differences between the HIGH and LOW groups were highly significant (p < 0.001). The two groups selected for TSC differed also for TSY (p < 0.001) but not for TY (p = 0.732), the groups selected for TY differed also for TSY (p < 0.001) but not for TSC (p = 0.576), whereas the groups selected for TSY differed also for TSC and TY (p < 0.001).
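The assembly of groups with contrasting phenotypic means can be illustrated with a simple ranking. This is a minimal sketch under the assumption that cases and controls are the genotypes with the highest and lowest adjusted means for a single trait. The actual selection in this study may have involved additional considerations, for example keeping the groups for TSC similar in TY. The function name and example values are hypothetical.

```python
def select_case_control(adjusted_means, n_high=24, n_low=24):
    """Split genotypes into cases (highest means) and controls (lowest means).

    adjusted_means: dict mapping genotype name to adjusted mean of one trait.
    Returns (cases, controls) as lists of genotype names.
    """
    if n_high < 1 or n_low < 1:
        raise ValueError("group sizes must be positive")
    if n_high + n_low > len(adjusted_means):
        raise ValueError("groups would overlap: too few genotypes")
    ranked = sorted(adjusted_means, key=adjusted_means.get)
    return ranked[-n_high:], ranked[:n_low]


if __name__ == "__main__":
    # Hypothetical adjusted means of tuber starch content in percent.
    tsc = {"G1": 12.1, "G2": 18.4, "G3": 15.0, "G4": 20.2, "G5": 11.3, "G6": 16.7}
    cases, controls = select_case_control(tsc, n_high=2, n_low=2)
    print(sorted(cases), sorted(controls))
```

Applying the function separately to TSC, TY and TSY yields three case-control populations in which the same genotype can appear more than once.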

SNPs with differential allele frequencies in QUEST case-control populations obtained by RAD sequencing (RADseq)
The case-control populations for TSC, TY and TSY were genotyped by RAD sequencing (European Nucleotide Archive (ENA): Study accession No. PRJEB10900, http://www.ebi.ac.uk/ena/data/view/PRJEB10900). SNPs were identified by mapping the sequence reads to the potato genome sequence. SNP allele frequencies were estimated after pooling the data of the HIGH and LOW groups for TSC, TY and TSY. The results are summarized in Table 1. Of approximately 600,000 SNPs detected in each case-control population, 10.2%, 7.7% and 3.4% showed differential allele frequencies (FDR < 0.05) between cases and controls in the populations for TSC, TY and TSY, respectively (Table 1: data filtering criteria 1 and 2). Further analysis was restricted to 25,501 SNPs with differential allele frequencies (FDR < 0.05) in 6664 annotated genes (Table 1: data filtering criteria 3 and 4, Additional file 3). Eighty percent of the annotated genes (5333) contained two or more differential SNPs, predominantly for TSC and TY, and 34% of these genes (1799) contained multiple SNPs differential for TSC, TY and TSY singularly and combinations thereof (see below). Eighteen percent of the differential SNPs (4668, FDR < 0.05) caused amino acid changes in 2308 deduced proteins (Table 1: data filtering criteria 5 and 6). Stringent filtering of the data for differential SNPs with FDR < 0.0001 and causing amino acid changes resulted in 582 SNPs in 450 genes (Table 1: data filtering criterion 7, genes are highlighted green in Additional file 4). One or more differential SNPs with FDR < 0.001 were present in 3144 genes (Table 1). Twenty three percent (6000) of the SNPs had differential allele frequencies in two case-control populations, in TSC and TY (3005), TSC and TSY (665) or in TY and TSY (2330) (FDR < 0.05) (Table 1: data filtering criterion 3, Additional file 6).
The 3005 SNPs with differential allele frequencies in both the TSC and TY case-control populations showed antagonistic effects: the allele frequency in the HIGH-TSC and LOW-TY population was higher compared with the LOW-TSC and HIGH-TY population, and vice versa. This indicated that the allele with a positive effect on TSC was compromised by a negative effect on TY and vice versa. SNPs with differential allele frequencies in both TSC and TSY, and both TY and TSY case-control populations showed the same direction of effect. This was to be expected as TSY is derived from TSC and TY. Six hundred and forty eight SNPs in 468 genes showed differential allele frequencies in all three case-control populations (Additional file 7). In the majority of these cases, a positive effect of the SNP allele on TY and TSY was also confounded by a negative effect on TSC and vice versa. Importantly, the alleles of 88 SNPs in 71 genes showed the same or a neutral direction of effect on all three traits (highlighted green in Additional file 7). The positions of these 71 genes on the physical chromosome maps are shown in Figs. 2 and 3 (chromosomes c). Twenty six of the 88 SNPs in 18 genes caused amino acid changes (Table 2).
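The distinction between antagonistic and unidirectional allele effects described above can be expressed as a comparison of signs. The following is a minimal sketch, assuming that for each trait the frequency of the same SNP allele is known in the HIGH and the LOW pool, and that the direction of effect is the sign of the difference between them. It is meant to be applied only to SNPs already found to be differential. The function names and example frequencies are hypothetical.

```python
def direction(high_freq, low_freq):
    """Return +1 if the allele is more frequent in the HIGH pool,
    -1 if it is more frequent in the LOW pool, and 0 if equal."""
    diff = high_freq - low_freq
    return (diff > 0) - (diff < 0)


def classify_effects(freqs):
    """Classify the allele effect across traits.

    freqs: dict mapping trait name to (frequency in HIGH, frequency in LOW)
    of the same allele. Traits with equal frequencies count as neutral
    and are ignored when comparing directions.
    """
    signs = {direction(h, l) for h, l in freqs.values()} - {0}
    if not signs:
        return "neutral"
    return "same direction" if len(signs) == 1 else "antagonistic"


if __name__ == "__main__":
    # Hypothetical allele frequencies (assumption, for illustration only).
    snp_a = {"TSC": (0.60, 0.30), "TY": (0.20, 0.50)}
    snp_b = {"TSC": (0.60, 0.30), "TY": (0.50, 0.20), "TSY": (0.70, 0.40)}
    print(classify_effects(snp_a))
    print(classify_effects(snp_b))
```

In this scheme an allele enriched in the HIGH-TSC pool but depleted in the HIGH-TY pool is labelled antagonistic, whereas an allele enriched in the HIGH pools of all traits is labelled as having the same direction of effect.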

SolCAP SNPs with differential allele frequencies in QUEST case-control populations
The same genotypes as used for genotyping by RAD sequencing were genotyped for 8303 SolCAP SNPs. Genotyping was highly reproducible as assessed by comparing the results of replicated genotypes. Genotype calling resulted in 7454 SNPs (89.8%) that could be assigned to one of the five genotype classes AAAA, AAAB, AABB, ABBB and BBBB. The SNPs were tested for differential allele frequency between cases and controls and ordered according to the p-value. SNPs with differential allele frequencies at p < 0.01 were selected and further analysed. This resulted in 204 differential SNPs in the TSC, 85 in the TY and 55 in the TSY case-control populations. Eight of these SNPs did not map to the physical chromosome maps and were not considered further. When the differential SNPs from the three case-control populations were combined, a total of 306 SNPs in 275 genes on all potato chromosomes showed differential allele frequencies (Figs. 2 and 3, chromosomes a and b, Additional file 8). Eighty three percent of the differential SolCAP SNPs co-localized within less than 0.5 Mbp with 81 peaks in the distribution of genes with differential RADseq SNPs (Additional file 8). If a SNP was significant at p < 0.01 for one trait, values of p < 0.05 for the other two traits were also included in the comparison among the case-control populations. Based on this comparison, forty five SNPs showed differential allele frequencies in both the TSC and TY case-control populations. As observed for the same category of SNPs in RADseq, these SNPs showed an antagonistic allele effect: the allele with higher frequency in the HIGH-TSC compared to the LOW-TSC population had lower frequency in the HIGH-TY compared to the LOW-TY population and vice versa. Forty three and seventeen SNPs showed differential allele frequencies in both the TY and TSY and both the TSC and TSY case-control populations, respectively. The allele effect on both traits had the same direction (Additional file 8).
Fifteen SNPs showed differential allele frequencies in all three case-control populations. With one exception (solcap_snp_c1_5656), the allele effect in the TSC population was opposite to the effect in both the TY and TSY populations (Additional file 8). The SNP solcap_snp_c1_5656 was located in locus DMG401025958 annotated as 'glycine-rich protein' [50]. The same gene was detected by differential RADseq SNPs.
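The co-localization criterion used in this section (a SolCAP SNP lying within less than 0.5 Mbp of a peak in the distribution of genes with differential RADseq SNPs) can be sketched as a distance check on the physical map. This is an illustration under the assumption that each peak is represented by a single position on a chromosome. Chromosome names, positions and function names are hypothetical.

```python
WINDOW_BP = 500_000  # 0.5 Mbp


def colocalizes(snp, peaks, window_bp=WINDOW_BP):
    """True if the SNP lies on the same chromosome as a peak and
    less than window_bp away from it.

    snp: (chromosome, position in bp); peaks: iterable of such tuples.
    """
    chrom, pos = snp
    return any(c == chrom and abs(pos - p) < window_bp for c, p in peaks)


def fraction_colocalized(snps, peaks, window_bp=WINDOW_BP):
    """Fraction of SNPs that co-localize with at least one peak."""
    if not snps:
        raise ValueError("no SNPs given")
    return sum(colocalizes(s, peaks, window_bp) for s in snps) / len(snps)


if __name__ == "__main__":
    # Hypothetical peak and SNP positions (assumption).
    peaks = [("chr05", 4_500_000), ("chr09", 60_000_000)]
    snps = [
        ("chr05", 4_800_000),   # 0.3 Mbp from a peak
        ("chr05", 9_000_000),   # too far from the peak
        ("chr09", 59_600_000),  # 0.4 Mbp from a peak
        ("chr01", 4_500_000),   # same position, different chromosome
    ]
    print(fraction_colocalized(snps, peaks))
```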

Association analysis of genes with differential SNPs in the QUEST case-control populations
In order to test to what extent SNPs with differential allele frequency in the QUEST case-control populations show association with TSC, TY and/or TSY in the whole QUEST population (n = 264), we selected eleven candidate genes for association analysis (Table 3). Primers for PCR amplification were designed for gene fragments which included 32 differential SNPs in the case-control populations. In total 83 SNPs and one deletion were scored in the whole QUEST population, either in amplicon sequences or by pyrosequencing (Additional file 2). Based on association analysis using a mixed linear model which considered kinship and population structure, twenty SNPs in nine genes were associated with TSC, TY, TSY or all three traits (Table 4). Sixteen of these twenty associated SNPs were also detected by differential allele frequencies in the case-control populations, whereas four were novel.

Association analysis of SolCAP SNPs with tuber starch content in the PIN184 population
The PIN184 population was evaluated for tuber starch content over three years in replicated field trials. The phenotypic distribution of TSC (adjusted means) in the PIN184 population is shown in Fig. 4. Genotyping the PIN184 population for 8303 SolCAP SNPs yielded 6286 SNPs with genotype calls [36]. Association analysis using a mixed model that corrected for kinship resulted in 127 SolCAP SNPs in 117 genes which were associated with TSC at p < 10⁻⁴. Seventy five percent of the associated SolCAP SNPs co-localized within less than 0.5 Mbp with 50 peaks in the frequency distribution of genes with differential RADseq SNPs in the QUEST case-control populations (Additional file 9). The positions of the genes with associated SolCAP SNPs on the potato physical chromosome maps are shown in Figs. 2 and 3, chromosomes a.
Higher correspondence was observed between the experiments when the comparison was based on the genes containing differential or associated SNPs. Seventy two identical genes were detected by SolCAP SNP genotyping as well as RAD sequencing in one, two or all three QUEST case-control populations (Additional file 10). Importantly, forty two identical genes were detected in both the PIN184 and the QUEST case-control populations. Three genes, annotated as methyltransferase (DMG400031262), kinase (DMG400029885) and copper ion binding protein (DMG400027986) on chromosomes V and IX were detected in all three experiments (Table 5). When the window for the comparison was widened to genomic regions, where genes associated with TSC in the PIN184 population co-localized with differential SNPs for TSC, TY and TSY in the QUEST case-control populations, either within 0.5 Mbp with SolCAP SNPs and/or with peaks in the frequency distribution of genes with differential RADseq SNPs, 46 genomic regions between
Fig. 2 Physical maps of potato chromosomes I to VI. The maps are based on the pseudomolecules version 4.03 [31], which are represented as vertical bars. Chromosomes a: To the left are shown as horizontal blue lines the positions of the genes detected by association of SolCAP SNPs with TSC in the PIN184 population (details in Additional file 9). To the right are shown as horizontal blue lines the positions of the genes tagged by SolCAP SNPs with differential allele frequency in the QUEST case-control population for TSC (details in Additional file 8). The length of the lines is proportional to the p value. The scale (− log10(p)) is shown on top of the chromosome maps. Chromosomes b: To the left are shown as horizontal red lines the positions of the genes tagged by SolCAP SNPs with differential allele frequency in the QUEST case-control population for TY. To the right are shown as horizontal green lines the positions of the genes tagged by SolCAP SNPs with differential allele frequency in the QUEST case-control population for TSY; otherwise as in chromosomes a (details in Additional file 8). The numbers between chromosomes a and b indicate the positions of the 46 genomic regions, where QTL for TSC detected by GWAS in the PIN184 population overlap with QTL for TSC, TY and TSY detected by differential SNPs in the QUEST case-control populations (details in Additional files 4 and 5). Chromosomes c: To the left is shown as horizontal purple lines the frequency distribution of 3144 genes with differential RADseq SNPs, at least one with FDR < 0.001 (Table 1 Table 2). They are identified by the last five digits of the PGSC0003DMG locus number. Included here are also the positions of the eleven candidate genes with differential SNPs that were tested for association in the whole QUEST population (indicated by *) (Table 3), and the positions of 14 candidate genes (underlined) associated with TSC, TY or TSY in previous association studies that were also detected in the QUEST case-control populations (Table 6). The approximate position of PHO1A on the short arm of chromosome III, which is not included in pseudomolecules version 4.03, is also shown.

Discussion
SNPs with differential allele frequencies in QUEST case-control populations for TSC, TY and TSY and their corresponding genes
Comparative RAD sequencing of the QUEST case-control populations resulted in a large number of differential SNPs distributed over the whole potato genome, despite a stringent Bonferroni correction for multiple testing. The genotypes in the case-control subpopulations were pooled in order to homogenize different genetic backgrounds of the individual genotypes, analogous to the well known bulked segregant analysis (BSA) [51]. As in BSA, the homogenization of genetic background by genotype pooling was probably incomplete and resulted in an unknown fraction of differential SNPs that are unlinked to QTL for TSC and TY. This could be the reason for a background noise of genome wide distributed differential SNPs. The fact that only nine of eleven tested genes with differential SNPs could be validated by association analysis in the whole QUEST population also indicated a contamination of the RADseq data with false positive differential SNPs. Validation of physical linkage, or possibly identity, of individual candidate genes with QTL for TSC, TY and TSY is therefore required. This can be done by association analysis in the unselected QUEST population, by testing for reproducibility with different genotyping methods and in different populations as performed in this study, and by functional analysis of the most promising candidate genes. The majority of differential SNPs in the QUEST case-control populations showed differential allele frequency between the HIGH-TSC and LOW-TSC populations, based on SNP genotyping by RAD sequencing as well as by the SolCAP SNP chip. The same observation applied to the number of genes with differential SNPs. Possible reasons for this observation are that more genes control tuber starch content than tuber yield, or that many genes with small effect on tuber yield were below detection level.
Alternatively, the phenotypic selection of the HIGH-TY and LOW-TY genotypes might have been less effective due to the lower heritability of tuber yield compared with starch content [12], resulting in less power to detect genetic effects on tuber yield. Twenty percent of the SNPs differential for TSC were also differential for TY but with antagonistic allele effect. The allele with a positive effect on TSC was compromised by a negative effect on TY and vice versa. This antagonism neutralized the effects on TSY. The number of differential SNPs for TSY was therefore lower compared with its components TSC and TY. The same effect was observed for most SNPs differential in all three case-control populations. In these cases however, the allele effect on TSY was not completely neutralized by the antagonistic effect on TSC and TY. This shows that the relative amounts of starch and water, which are the major constituents of total tuber yield, are partly controlled by a common set of genes and explains the previously observed negative phenotypic correlation between tuber starch content and yield [16,18]. In contrast, the alleles of 88 RADseq SNPs and one SolCAP SNP showed the same or a neutral direction of effect on TSC, TY and TSY. The corresponding genes were located on all chromosomes, with clusters on the south arms of chromosomes I, II, III, VI, IX and XII and the north arms of chromosomes V, VI, VIII, X, XI and XII (Figs. 2 and 3). These SNPs with unidirectional effect are particularly interesting for breeding applications, as they allow the simultaneous selection of positive alleles for tuber starch content and yield. More SNPs were differential for both TY and TSY than for TSC and TSY, and the same was true for the corresponding genes. This indicates that tuber starch yield was more strongly influenced by genes controlling tuber yield than starch content.
Eighty percent of the genes detected by RAD sequencing contained multiple differential SNPs, and the largest fraction of these contained different SNPs that were differential for TSC, TY, TSY and combinations thereof (Table 1). Many of these SNPs were most likely redundant due to high linkage disequilibrium (LD) between SNPs within the same gene or between physically closely linked genes. A direct estimate of LD was not possible in the QUEST case-control populations, however, because the genotypes had to be pooled. The allele counts of individual genotypes were too low for statistical analysis. Indications of LD were different SNPs with nearly identical allele counts in the HIGH and LOW populations, and the local peaks observed in the frequency distribution of genes with differential RADseq SNPs on the physical chromosome maps (see below). On the other hand, the allele counts of many differential SNPs within the same locus were not obviously correlated. Genome wide genotyping of the QUEST case-control populations clearly showed that hundreds of genes influence TSC and TY. Even if they function independently, many of them will be located physically close enough to each other that their allele effects overlap due to LD between physically linked loci. Consequently, multiple differential SNPs with independent allele counts in the HIGH and LOW populations will be observed within the same locus. The same observation can be explained by pleiotropic effects of multiple alleles of the same gene.
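The comparison of pooled allele counts between HIGH and LOW populations under a Bonferroni correction can be sketched as follows. This section does not name the test statistic that was used, so the Pearson chi-square test on a 2x2 allele-count table is an assumption made here for illustration, as are the function names, the significance level of 0.05 and the toy counts.

```python
import math


def chi2_pvalue_2x2(a, b, c, d):
    """P value of the Pearson chi-square test (1 d.f.) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 1.0  # an empty row or column carries no evidence for a difference
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with one degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def differential_snps(counts, alpha=0.05):
    """Return SNPs whose p value passes the Bonferroni threshold alpha / number of tests."""
    threshold = alpha / len(counts)
    hits = {}
    for snp, (high_ref, high_alt, low_ref, low_alt) in counts.items():
        p = chi2_pvalue_2x2(high_ref, high_alt, low_ref, low_alt)
        if p < threshold:
            hits[snp] = p
    return hits


# Invented pooled allele counts: (HIGH ref, HIGH alt, LOW ref, LOW alt).
toy_counts = {
    "snp_A": (90, 10, 40, 60),
    "snp_B": (52, 48, 50, 50),
    "snp_C": (70, 30, 65, 35),
}

hits = differential_snps(toy_counts)
print(sorted(hits))
```

In this toy example only `snp_A` passes the corrected threshold. Pooling means the test sees allele counts per population and not per genotype, which is why LD between SNPs cannot be estimated from such tables.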

Comparison between genome wide genotyping methods
The three QUEST case-control populations for TSC, TY and TSY were genotyped for genome wide SNPs by RAD sequencing on the one hand and with the 8.3 K SolCAP SNP chip on the other. The main difference between the two genotyping methods was the genome coverage, which was two orders of magnitude higher in RAD sequencing. The average physical distance between SNPs was approximately 1.25 kbp in RAD sequencing and 100 kbp between SolCAP SNPs. The average SNP frequency in the genome of tetraploid potatoes is one SNP in every twenty to thirty base pairs [52]. Both genotyping methods were therefore far from capturing the full DNA diversity among potato cultivars. The selection of SNPs for the SolCAP chip was technically biased by the requirements for allele detection and quantification with the Infinium SNP assay (ca. 50 bp of sequence surrounding the SNP should be free of other SNPs) and to some extent biologically, as a portion of the 8303 SNPs was selected from candidate genes [38]. RAD sequencing was technically biased by the frequency and distribution of the KpnI restriction site used for generating the RAD libraries. The SolCAP SNPs represented a small fraction of the DNA variation among five US varieties and the European variety Bintje [34], whereas RAD sequencing captured a larger fraction of the DNA variation of ninety mostly European cultivars and breeding clones [12]. The detection of identical differential SNPs by both the SolCAP chip and RAD sequencing was therefore rather unexpected. Nevertheless, six differential SolCAP SNPs were also detected by RAD sequencing, which confirms that the corresponding genes are closely linked to a QTL. The correspondence between the two genotyping methods increased when the comparison was based on the identity of the genes detected with both methods, mostly with different SNPs. In this case, 72 of 275 genes (26%) with differential SolCAP SNPs were also detected by differential RADseq SNPs. 
When the window for the comparison was further widened to physical genome segments, where differential SolCAP SNPs co-localized with peaks in the frequency distribution of genes with differential RADseq SNPs, 83% of the differential SolCAP SNPs fulfilled this condition. This indicates that short distance LD within genes and longer distance LD between physically linked genes (haplotype blocks) strongly influenced the numbers of the detected differential SNPs and their corresponding genes.
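The marker spacing and gene-level overlap quoted in this comparison can be checked with simple arithmetic, sketched here in Python. The SNP numbers (600,000 RADseq SNPs, 8303 SolCAP SNPs) and the gene counts (72 of 275) are taken from the text. The genome length of 750 Mbp is an assumption, chosen because it reproduces the stated RADseq spacing of 1.25 kbp, and the function names are hypothetical.

```python
def mean_spacing_bp(genome_length_bp, n_snps):
    """Average physical distance between SNPs if they were evenly spread."""
    return genome_length_bp / n_snps


def percent_overlap(shared, total):
    """Share of `total` items that are also found by the other method, in percent."""
    return 100.0 * shared / total


ASSUMED_GENOME_BP = 750_000_000  # assumption, not a value given in the text

radseq_spacing = mean_spacing_bp(ASSUMED_GENOME_BP, 600_000)
solcap_spacing = mean_spacing_bp(ASSUMED_GENOME_BP, 8_303)
gene_overlap = percent_overlap(72, 275)

print(f"RADseq: one SNP per {radseq_spacing / 1000:.2f} kbp")
print(f"SolCAP: one SNP per {solcap_spacing / 1000:.0f} kbp")
print(f"gene-level overlap: {gene_overlap:.0f}%")
```

Under this assumption the SolCAP spacing comes out at roughly 90 kbp, in line with the approximate 100 kbp stated above, and the ratio of the two spacings is close to two orders of magnitude.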
The numbers of genes with differential SNPs obtained with RAD sequencing were one order of magnitude higher than with the SolCAP SNP array, most likely due to inflation by extensive LD between differential SNPs in physically closely linked genes, only one of which might have a genuine effect on the phenotype. Although a direct estimate of LD was not possible in the case-control populations (see above), we assumed that LD caused the peaks observed in the frequency distribution of genes with differential RADseq SNPs on the physical chromosome maps (Figs. 2 and 3, Additional file 5). We further assumed that the peak width from less than 0.5 Mbp up to 4 Mbp was an indicator for the size of haplotype blocks in the QUEST population. Recently, an LD decay between 0.6 and 2.5 Mbp in euchromatic regions was reported in a population of 537 tetraploid potato varieties [53], which is in good agreement with most peak widths observed in this study. The observation of larger, three to four Mbp wide peaks on chromosomes I, III, VI, X and XII might be the result of overlapping haplotype blocks and lack of resolution. We conclude from these observations that several hundred rather than several thousand genes control the natural variation of tuber starch content, yield and starch yield in the QUEST population.
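The idea of reading haplotype block size from peaks in the frequency distribution of genes can be sketched as a simple windowed count along a chromosome. This is a minimal illustration under stated assumptions and not the procedure used to produce Figs. 2 and 3: the window size of 0.5 Mbp, the minimum of three genes per window, the function names and the gene positions are all invented for the example.

```python
from collections import Counter


def find_peaks(positions_bp, window_bp=500_000, min_genes=3):
    """Return (start, end) intervals of adjacent windows holding at least `min_genes` genes."""
    counts = Counter(pos // window_bp for pos in positions_bp)
    dense = sorted(w for w, n in counts.items() if n >= min_genes)
    runs = []
    for w in dense:
        if runs and w == runs[-1][1] + 1:
            runs[-1][1] = w  # extend the current run of adjacent dense windows
        else:
            runs.append([w, w])  # start a new run
    return [(start * window_bp, (end + 1) * window_bp) for start, end in runs]


def colocalizes(position_bp, peak_intervals):
    """True if a SNP position falls inside any of the peak intervals."""
    return any(start <= position_bp < end for start, end in peak_intervals)


# Invented start positions (bp) of genes with differential SNPs on one chromosome.
toy_positions = [
    100_000, 200_000, 450_000,        # window 0
    600_000, 700_000, 900_000,        # window 1, adjacent to window 0
    5_100_000,                        # isolated gene, no peak
    9_000_000, 9_100_000, 9_200_000,  # window 18
]

peak_intervals = find_peaks(toy_positions)
peak_widths_mbp = [(end - start) / 1e6 for start, end in peak_intervals]
print(peak_intervals, peak_widths_mbp)
```

In the toy data two adjacent dense windows merge into one peak of 1 Mbp and a single dense window gives a peak of 0.5 Mbp. The `colocalizes` helper corresponds to the widened comparison in which a SNP from one method counts as reproduced when it lies inside a peak found by the other.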
Comparison between different genetic backgrounds
GWAS in the PIN184 population with the SolCAP SNP chip allowed a comparison of QTL for TSC in different genetic backgrounds, which were evaluated in different geographical regions. The QUEST population was evaluated in northern Spain [12] and the independent PIN184 population in northern Germany [41]. The overlap between the populations was again low at the level of identity between associated SNPs in the PIN184 and differential SNPs in the QUEST case-control populations. The seven SolCAP SNPs that were indeed identical and the corresponding genes are particularly interesting for breeding applications, as they showed reproducible effects on TSC irrespective of genetic background and environment. Among those was the starch phosphorylase gene PHO1a on chromosome III, a well characterized functional candidate gene that was also associated with TSC in two additional, independent populations, although with different SNPs [14,16,18]. There is evidence that PHO1a is one of the causal genes for tuber starch content QTL [17]. At the level of gene identity, 42 of 127 (33%) genes with associated SolCAP SNPs in the PIN184 population were also detected by differential SNPs in the QUEST case-control populations (Table 5). The highest overlap between QTL in the two genetic backgrounds was again observed when the comparison was extended to physical genome segments, where associated SolCAP SNPs in the PIN184 population co-   [14,18]. The same populations have been genotyped for DNA polymorphisms in candidate genes, the majority of which function in carbohydrate metabolism and transport. The QUEST population has also been genotyped with SNPs in several candidate genes [12]. These previous association analyses identified DNA variants in 39 genes that were associated with TSC, TY and/or TSY (details in Additional file 11). 
Under the cut-off conditions used for data analysis in the present study, fifteen of these genes (38%) were detected by differential SNPs in the QUEST case-control populations: twelve by RADseq SNPs, one by a SolCAP SNP and two by both RADseq and SolCAP SNPs. SolCAP SNPs in two of these genes were also associated with TSC in the PIN184 population (Additional file 11, Figs. 2 and 3, chromosomes c). Less stringent conditions for p values and analysed genomic sequences could increase the overlap between the results of the candidate gene approach and the genome wide approach.
Eight of the ten candidate loci associated with TSC, TY or TSY in the whole QUEST population [12] were not detected by RAD sequencing in the QUEST case-control populations. Reasons could be the incomplete genome coverage by RAD sequencing, and that the case-control study design is not fully equivalent to association mapping in an unselected population. Conversely, in the present study, only one half of 32 differential SNPs in the QUEST case-control populations but nine of the eleven corresponding genes were reproducible by association analysis in the whole QUEST population. Similarly, the PHO1a locus was detected by the same SolCAP SNP in the QUEST case-control population for TSC as well as in the unselected PIN184 population (see above), whereas the marker for a specific PHO1a allele diagnostic for higher tuber starch content [16,17] was not associated in the whole QUEST population [12]. These comparisons show that the results obtained with different study designs and genotyping strategies in different genetic backgrounds partially overlap. Such comparisons are novel in potato genome analysis and highly useful for selecting the most promising candidate genes for breeding applications as well as further functional studies.
Novel candidate genes for TSC, TY and TSY
RAD sequencing provided the highest genome coverage of the three experiments described in this study. RAD sequencing therefore offered the best chance that some among the 6664 genes with differential RADseq SNPs are causal for TSC-, TY- and TSY-QTL. The 2308 genes with differential SNPs causing amino acid changes should be enriched for such genes, although we cannot exclude that some amino acid changes deduced from sequence alignment are artefacts caused by the algorithm used for mapping the sequence reads [54]. Stringent filtering for differential SNPs with the smallest FDR values (FDR < 0.0001) and causing amino acid changes resulted in 450 genes, which might be considered as primary functional candidates for TSC-, TY- and TSY-QTL. Hardly any of these genes has been functionally characterized in potato. Except for one hexose transporter (DMG400031832), this set did not include genes with a direct function in starch metabolism [17]. The largest fraction comprised genes with unknown function or unspecified function with respect to the pathway involved (e.g. DMG400008990, DMG400028376 and DMG400038781). Next to this, signalling, transcriptional and posttranscriptional regulation seem to have important roles for TSC and TY. Multiple genes with highly differential SNPs fell in this category; they were annotated as receptor-like kinases (e.g. DMG401027271), transcription factors (e.g. DMG400000459), protein kinases and phosphatases (e.g. DMG400002885, DMG400011440), F-box proteins and ubiquitin ligases (e.g. DMG401031260, DMG400009902). Protein homeostasis, regulated by synthesis (e.g. DMG401026390), conformation stability by chaperones (e.g. DMG400008355) and degradation by proteases (e.g. DMG400028471) or the 26S proteasome pathway (see above), belongs in the same context of cellular regulation. A number of genes encoding 'chromo' (chromatin organization modifier) domain containing proteins (e.g. DMG400043455) suggest that transcriptional regulation through chromatin structure and modification [55] contributes to the natural variation of TSC and TY. Chromatin organization might also be a biological explanation for the fact that several genes typical for retroelements such as 'integrase core domain containing protein' (e.g. DMG400015917) and 'Gag-pol polyprotein' (e.g. DMG400038533) were found even among the 450 top candidate genes [56]. Alternatively, the effects observed at these loci might be indirectly caused by LD, as they were mostly located in central, heterochromatic chromosomal regions where LD is very extensive [53]. Numerous genes encoding structural components of the cell wall and the cytoskeleton as well as cell wall modifying enzymes also contained highly differential SNPs. Most remarkable in this category were two clusters of extensin genes on chromosome XII between 51 and 53 Mbp and between 60.7 and 60.8 Mbp, which are excellent candidates for the TSC-, TY- and TSY-QTL on the long arm of chromosome XII. Extensins have essential roles in building and maintaining the growing primary cell wall [57]. Transport processes seem to be important as well (e.g. DMG400022183). The category 'biotic stress' was represented by multiple genes with differential SNPs, such as putative genes for pathogen resistance (e.g. DMG400006800), the PR1 gene (DMG400037874) and a cluster of methylketone synthase genes possibly involved in insect resistance [58] between 85.0 and 85.3 Mbp on chromosome I in QTL region 5. In the case of biotic stress, the effects on TY, TSC and TSY are clearly indirect, caused by the fact that susceptibility to pests and pathogens reduces crop yield.
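The stringent filtering step described above, which keeps genes that carry at least one differential SNP with FDR < 0.0001 and an amino acid change, can be expressed as a short Python sketch. The record layout, the field names and the placeholder gene labels are assumptions for illustration. Only the FDR threshold and the amino acid criterion come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DifferentialSnp:
    gene: str                 # placeholder gene label, not a real identifier
    fdr: float                # false discovery rate of the differential test
    changes_amino_acid: bool  # True if the SNP alters the encoded protein


def primary_candidates(snps, max_fdr=1e-4):
    """Genes with at least one SNP that passes the FDR cut-off and changes an amino acid."""
    return sorted({s.gene for s in snps if s.fdr < max_fdr and s.changes_amino_acid})


# Invented records covering the four possible combinations of the two criteria.
toy_snps = [
    DifferentialSnp("gene_a", 1e-6, True),   # passes both criteria
    DifferentialSnp("gene_a", 1e-2, False),  # a second, weaker SNP in the same gene
    DifferentialSnp("gene_b", 1e-5, False),  # highly differential but synonymous
    DifferentialSnp("gene_c", 1e-3, True),   # amino acid change but FDR too high
    DifferentialSnp("gene_d", 5e-5, True),   # passes both criteria
]

candidates = primary_candidates(toy_snps)
print(candidates)
```

Both criteria must be met by the same SNP, so a gene with one highly differential synonymous SNP and one weakly differential amino acid change would not qualify under this reading of the filter.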

Conclusions
This is the first study that physically dissects and maps QTL for potato tuber starch content, yield and starch yield based on genome wide, high throughput genotyping methods and the potato genome sequence. Comparative genotyping of three case-control populations by RAD sequencing and the 8.3 k SolCAP SNP array showed that 8303 SolCAP SNPs were not sufficient for the comprehensive tagging of QTL for TSC, TY and TSY in the potato genome, whereas 600,000 RADseq SNPs tagged most QTL redundantly due to LD within genes and between physically linked genes. Neither method fully captured the DNA diversity present in tetraploid potato cultivars. Comparative RAD sequencing of the QUEST case-control populations identified, depending on the criteria for data filtering, between 450 and 6664 annotated genes with differential SNPs for TSC, TY, TSY and combinations thereof. Based on a small sample of eleven genes, approximately half of the differential SNPs in 80% of the genes could be confirmed by association analysis in the unselected QUEST population. The 450 genes that contained highly differential SNPs causing amino acid changes are considered primary functional candidates for controlling the natural variation of tuber yield and starch content. TSC and TY are partially controlled by identical genes with pleiotropic effects. In the majority of cases, a positive effect on yield was confounded by a negative effect on starch content and vice versa. However, 89 SNP alleles had synergistic effects on TSC, TY and TSY. These SNPs and the corresponding 72 genes are primary targets for the development of diagnostic markers for breeding applications. The positions of the genes with differential SNPs on the physical chromosome maps of potato suggest that at least 200 QTL regions with one or more underlying genes on all chromosomes control the natural variation of tuber starch content and yield, and consequently starch yield, in the QUEST population. 
GWAS for tuber starch content in the independent, unselected PIN184 population using the SolCAP SNP chip identified SNPs associated with TSC in 117 genes. The comparison of the physical QTL maps of the QUEST and the PIN184 populations identified genomic regions that harbour QTL for TSC, TY and TSY across genetic backgrounds and environments.