Untargeted metabolic quantitative trait loci (mQTL) analyses reveal a relationship between primary metabolism and potato tuber quality

Recent advances in ~omics technologies such as transcriptomics, metabolomics and proteomics along with genotypic profiling have permitted to dissect the genetics of complex traits represented by molecular phenotypes in non-model species. To identify the genetic factors underlying variation in primary metabolism in potato we have profiled primary metabolite content in a diploid potato mapping population, derived from crosses between Solanum tuberosum and wild relatives, using gas chromatography time of flight mass spectrometry (GC-TOF-MS). In total 139 polar metabolites were detected of which we identified metabolite quantitative trait loci (mQTLs) for ~72% of the detected compounds. In order to obtain an insight into the relationships between metabolic traits and classical phenotypic traits we also analysed statistical associations between them. The combined analysis of genetic information through QTL coincidence and the application of statistical learning methods provide information on putative indicators associated with the alterations in metabolic networks that affect complex phenotypic traits.


INTRODUCTION
The variation observed in phenotypic trait values in plants is often of quantitative nature and it remains challenging to unravel the genetic basis of these traits. Quantitative trait locus (QTL) mapping is currently the most commonly used approach to dissect the genetic factors underlying complex traits.
The goal of QTL mapping is to identify genomic regions associated with a specific complex phenotype by statistical analysis of associations between genetic markers and phenotypic variation (Doerge, 2002). Recently, advances in high-throughput analysis and analytical detection methods have facilitated more integrated approaches to measure global phenotypic variation at the molecular level.
Metabolite profiling is a rapidly evolving technology which has significantly increased the possibilities of performing high-throughput analysis of hundreds to thousands of compounds in a range of plants including complex crop species. Metabolite composition is of great importance in crop plants as a number of important traits such as biotic and abiotic stress resistance, post-harvest processing and nutritional value are largely dependent on the metabolic content (Fernie and Schauer, 2009).
In potato breeding metabolomic studies have progressively increased in importance as many potato tuber traits such as content and quality of starch, chipping quality, flesh colour, taste and glycoalkaloid content have been shown to be linked to a wide range of metabolites (Coffin et al., 1987;Dobson et al., 2008). As a result, tuber quality can be assessed by assaying a range of metabolites. Gas chromatography (time of flight) mass spectrometry (GC-TOF-MS) has been shown to be useful for the rapid and highly sensitive detection of a large fraction of plant metabolites covering the central pathways of primary metabolism (Roessner et al., 2000;Lisec et al., 2006). In potato, untargeted metabolomic approaches by GC-MS have been successfully applied to assess changes in metabolites under different conditions (Roessner et al., 2000;Urbanczyk-Wochniak et al., 2005), to evaluate the metabolic response to various genetic modifications (Roessner et al., 2001;Szopa et al., 2001;Davies et al., 2005), and to explore the phytochemical diversity among potato cultivars and landraces (Dobson et al., 2008;Dobson et al., 2009). Additionally, metabolite profiling has been applied to monitor changes during key stages in the tuber life cycle (Davies, 2007). Untargeted approaches have thus generated a substantial amount of data providing important information concerning compositional metabolite changes upon perturbation and phytochemical diversity in potato. However, so far these studies have focused on applications of metabolite profiling as an evaluation and comparative tool.
Technological developments and improved data processing techniques now also allow the use of metabolite profiling to obtain further insight into the genetic factors controlling metabolic traits (Keurentjes, 2009). Exploration of the genetic factors underlying metabolite variation in mapping populations is particularly advantageous. The genetic variation between related individuals in a segregating mapping population can be exploited to locate the genomic regions involved in the regulation of the observed metabolite variation (Keurentjes et al., 2006). tuber shape (van-Eck et al., 1994), carotenoid biosynthesis (Wolters et al., 2010) and methionine content (Kloosterman et al., 2010).
In this study we have used the C x E population to explore the genetic basis of primary metabolites. To this end, untargeted GC-TOF-MS metabolic profiling was carried out on a core set of individuals of the C x E population. In order to investigate the variation in the detected metabolite levels in the individuals of the population we applied a genetical genomics approach (Jansen and Nap, 2001). QTL analysis of metabolite levels resulted in the identification of genomic regions associated with the metabolic variation. In addition, we performed a parallel QTL analysis for starch and cold sweetening related traits and genetically inferred links were established between these phenotypic traits and primary metabolites. We further applied multivariate analysis to the combined data sets of starch related traits and metabolic profiles to determine the predictive power of metabolites on a given phenotypic trait. For this we used a Random Forest (RF) (Breiman, 2001) approach to find significant associations between phenotypic and metabolic traits independent of genetic information. Putative predictors were tested and confirmed in an independent collection of potato cultivars.
Our results show the value of combining biochemical profiling with genetic information to identify associations between metabolites and phenotypes. This approach reveals previously unknown links between phenotypic traits and metabolism and thus facilitates the discovery of biomarkers for agronomically important traits.

Metabolite profiling
In order to assess the content and variation of polar primary metabolites present in the C x E diploid potato population untargeted GC-TOF-MS-based metabolite profiling was performed. The GC-TOF-MS method was applied to the polar aqueous methanol extracts of dormant tubers of 97 genotypes and the parental clones of the CxE population. After processing of the raw data 139 representative masses were obtained, consisting of reconstituted mass spectra (see Materials and Methods). The distribution of trait values for the detected compounds across all the genotypes was wide, with coefficients of variation higher than 50% for the majority of metabolites (~52%, Fig 1). This large variation can in part be explained by the segregation of genetic factors and therefore is amenable to genetic mapping approaches. further inspection of the spectra and retention indices. However, we included these unknown compounds in the cluster analysis to investigate the degree of association with identified metabolites.
Compounds from the same class, such as amino acids or carbohydrates, generally clustered together.
The correlation coefficients within the identified amino acids ranged between 0.6 and 0.9 (Fig. 2) and two metabolites were considered to be highly correlated if the absolute correlation coefficient had a value ≥ 0.6. Such high correlations have been reported before in potato between amino acids (Roessner et al., 2001;Dobson et al., 2008). It has been suggested that this correlation may reflect the mechanism of general amino acid control in plants (Halford et al., 2004). Interestingly, within the amino acid cluster the branched amino acids isoleucine, leucine, and valine clustered separately from the aromatic amino acids tyrosine, phenylalanine and tryptophan, as was also reported in earlier metabolomics studies in different potato cultivars (Roessner et al., 2001;Noctor et al., 2002;Dobson et al., 2008). Amino acids that are biosynthetically linked such as serine, glycine, and threonine also show a strong correlation ( Fig. 2). This suggests that much of the variation in amino acid content is genetically controlled by just a few master regulators. However, this was not the case for all related amino acids. The Pearson correlation coefficients among GABA (γ-aminobutyric acid), glutamic acid and proline were less than 0.2, although they are closely linked biosynthetically as members of the glutamate family. Other amino acids, such as glutamic acid and asparagine, show weak correlations (<0.4) with the major cluster of amino acids. This could suggest that the genetic regulation of these biosynthetic routes occurs independently from that of the main cluster of amino acids. In addition, most of the detected carbohydrates, such as mannose and fructose, also form a cluster ( Fig. 2). In contrast, sucrose clusters with a group of organic acids rather than with the other carbohydrates. The clustering of organic acids is, however, less distinct, and this is likely due to the diverse biochemical origins of these compounds.

Identification of metabolic QTLs (mQTLs)
To determine if the variation observed in metabolite levels could indeed be explained by allelic differences in genetic factors, we performed metabolic QTL (mQTL) analysis on the metabolic profiles.
The software package Metanetwork was used to map the metabolite variation. Metanetwork applications (Fu et al., 2007) were designed from data collected from recombinant inbreed lines (RILs), hence, in order to adjust the software applications to a cross-pollinated species like potato, we used two separate linkage maps: one for the female parent C and one for the male parent E. Overall, the mQTL for tyrosine on chromosome eight was also detected in a previous study on the same population (Werij et al., 2007). On chromosome five, we found eight mQTLs for glutamic acid, mannose, tryptophan and a number of unknown compounds. On chromosomes three, six and ten no significant mQTLs were detected. Fig. 3a shows the QTL profiles of all the metabolites mapped to the C parent map in a heat map.

E-parent map:
In this map, 160 significant mQTLs were detected for 85 representative masses (Fig.   3b). For 33 masses, only one QTL was detected and a maximum of six mQTLs for one metabolite (identified as quinic acid). The highest number of mQTLs on a single chromosome was 71 on chromosome five. This chromosome also contributed the most to the total explained variation of all detected mQTLs. A single genomic region, spanning three adjacent markers, accounted for the highest density of detected mQTLs (34). This region is known to be involved in plant maturity (van Eck and Jacobsen, 1996;Collins et al., 1999;Oberhagemann et al., 1999) and as such exerts many pleiotropic effects on development-related traits. The majority of compounds mapping to this region were classified as amino acids, organic acids and carbohydrates. This is not unexpected, as rapid changes in primary metabolism are known to occur during the later stages of maturation. Interestingly, similar to observations in the C-map, some amino acids that are biochemically related shared an mQTL at the plant maturity locus, e.g., glycine and threonine. Other amino acids like phenylalanine, lysine, valine and methionine also mapped to the plant maturity region. Some of the identified compounds mapping to this region also showed significant mQTLs in other chromosomes, both in the C and the E map. For example, four more mQTLs were detected for L-threonine, in chromosomes one, two and ten in the E-parent map and chromosome seven in the C-parent map. For methionine another mQTL was detected in chromosome two of the C-parent; and for valine one additional mQTL was detected in chromosome three of the E-parent indicating complex regulation of these traits.

Association between phenotypic and metabolic traits
Traits of agronomic importance in potato such as starch and cold sweetening are expected to be associated with primary metabolites. To investigate this relationship in more detail we carried out a parallel QTL analysis for phenotypic starch and cold sweetening related traits determined for this population for two years of harvest: 2002and 2003(Supplemental table S5, (Werij, 2011. Having mapped both metabolic and phenotypic QTLs to the two parental maps, we investigated the level of co-regulation of these two sets of traits by determining co-localizing QTLs. As expected, a substantial number of the phenotypic traits mapped to the plant maturity locus at chromosome five. However, a number of significant QTLs for essential metabolites were also detected outside this region indicating a possible regulation independent of the developmental stage and therefore these mQTLs are of particular interest. and 30 minutes. This position coincided with mQTLs for two unknown compounds (No 044 and 081).
A fourth trait, starch grain particle size, mapped to a different region of chromosome eight on the Cmap which co-localized with mQTLs for amino acids, organic acids and carbohydrates. Chip colour difference between reconditioning and harvest mapped to chromosome six on the C parent map and chromosomes three and nine on the E-parent map but only the latter two regions co-localized with a limited number of mQTLs.
Two traits, chip colour after harvest and chip colour difference between storage-harvest mapped to chromosomes nine and three of the E-parent map. These positions coincide with genomic regions where mQTLs for succinic acid, fumaric acid, butanoic acid and Unknown044 were detected.
The strongest association, however, was detected between starch phosphorylation and a number of  Table S6), indicating a general high reproducibility of the expression of this trait. The observation that two to three QTLs could be mapped in each year together with a high broad sense heritability (H 2 = 0.5) further indicates that a substantial part of the trait variation can be attributed to genetic factors. In addition, a number of co-localizing mQTLs were identified (see above) that suggest links between starch phosphorylation and metabolic processes. To rank the associations between starch phosphorylation and the 139 representative masses, we used a multivariate Random Forest (RF) regression approach (Breiman, 2001). The starch phosphorylation measurements of the two years were used separately as a response variable regressed against the 139 representative masses over all population individuals and significantly associated metabolites were recorded .
Using starch phosphorylation measurements of the 2002 harvest and all of the detected compounds, the RF model explained 16% of the variance at a permutation threshold of 1 0 metabolites were significantly associated with phosphate content at this threshold (Table II). Univariate correlation analyses between these significant metabolites and starch phosphorylation yielded absolute R 2 values ranging between 0.07 and 0.26 (of which two have a negative Pearson correlation value). For the 2003 harvest the RF model explained 33% of the variance at a permutation threshold of α =0.001. In this case, eight metabolites were found to be significantly associated with starch phosphorylation (Table II). The correlations ranged from 0.12 to 0.39 (of which one has a negative Pearson correlation value). Interestingly, all the significantly contributing metabolites from the 2003 model were also identified using the 2002 data, again illustrating the high reproducibility between the two years. From these eight compounds seven showed co-localizing QTLs with at least one of the starch phosphorylation QTLs: β -alanine, GABA, L-aspartic acid, alanine, butanoic acid, Unknown027 and Unknown033 (Supplemental Table S4).
As a third independent line of investigation we performed RF regression on a subset of cultivars from a potato collection available in our laboratory (year of harvest 2007). All 214 cultivars of this collection were analysed for starch phosphorylation and from these 30 cultivars were selected covering the whole distribution range of this trait (Supplemental Fig. S1) These 30 cultivars were analysed for metabolite content with the same analytical procedure as used for the C x E population. RF regression was performed using the starch phosphorylation measurements and the metabolites detected in this set of cultivars (data not shown). The resulting RF model explained 30% of the variation in starch phosphorylation and nine compounds were found to contribute significantly at a permutation threshold of α =0.001 (Table II). The univariate correlations between these significant compounds and starch phosphorylation ranged between 0.09 and 0.41 (of which four have a negative Pearson correlation value) .
A comparison of the significant predictive compounds after RF analysis in the two sets of potato material (i.e. C x E (2002,2003) and cultivar collection) revealed one compound in common, β -alanine.
In the C x E population β -alanine shows a positive correlation with starch phosphorylation in both years. This positive trend was also observed in the selected cultivar set (Supplemental Fig. S2).
Because a robust metabolic predictor of a phenotypic feature is preferably valid across different potato sources we consider β -alanine as a reliable metabolite significantly linked to the level of phosphorylation of starch in potato tubers.

DISCUSSION
The use of an untargeted metabolomics approach permits a quantitative assessment of a wide range of metabolites and allows the detection of known and unknown metabolites. Untargeted metabolomics approaches have been successfully applied to experimental plant populations to uncover loci controlling variation of metabolites. (Overy et al., 2005;Keurentjes et al., 2006;Morreel et al., 2006;Schauer et al., 2006;Tieman et al., 2006;Lisec et al., 2008;Rowe et al., 2008).
In this study we used untargeted GC-TOF-MS metabolite profiling to assess the quantitative variation in polar metabolites present in dormant tubers of a diploid potato population. The observed variation in 1 1 this population enabled us to locate genomic regions involved in the regulation of a range of polar primary metabolites. Primary metabolites, consisting mainly of carbohydrates, amino acids and organic acids, have an essential role in plant growth and development. In potato, carbohydrates are important for a number of agronomic traits related to tuber quality such as starch content and cold sweetening. In this study, the same genetic material was used to detect QTLs for starch and cold sweetening related traits and metabolic traits.
We investigated associations between phenotypic and metabolic traits through QTL co-localization, correlation analysis and Random Forest analyses. The detection of a QTL identifies the existence of at least one polymorphic locus that is contributing to the variation observed for a given trait (Causse et al., 2004). When QTLs for two different traits co-localize this might indicate the existence of a common regulator that controls the variation of both traits. This is of special value in the search of candidate genes for traits with complex modes of inheritance or for which most of the genetic basis is unknown.
However, it cannot be excluded that co-localizing genomic regions contain genes that are closely linked but are involved in different biological processes. Due to the limited resolution of QTL mapping and a finite number of markers, co-localization of (unrelated) QTLs is inevitable when large data sets are involved (Lisec et al., 2008). Therefore we performed independent statistical tests to confirm true positive associations between metabolites and phenotypes. In addition, we have validated putative metabolic biomarkers in an independent set of potato varieties. Our stringent selection criteria resulted in the determination of a strong relationship between potato starch phosphorylation and primary metabolism. Furthermore, our analyses resulted in identification of β -alanine as an important predictor for the degree of phosphorylation of starch in potato tubers.

Mapping of metabolic variation in potato tubers
Mapping populations are very suitable to identify loci controlling the variation of a given trait. In this study we aimed to explore the variation of metabolic and phenotypic traits present in dormant potato tubers. Our results show that we could assign genomic regions involved in metabolite variation for approximately 72% of the detected metabolites.
The abundances of metabolites that share an mQTL, especially for major loci of qualitative traits, are expected to correlate because they co-segregate in a mapping population. For instance, L-leucine and lysine share an mQTL on chromosome nine and are positively correlated. Metabolites sharing an mQTL often belong to the same biochemical pathway. For example, phenylalanine and tyrosine share a common biosynthetic pathway and hence are found to be co-regulated. Alternatively, co-locating QTLs can be the result of closely linked independent regulators (Lisec et al., 2008). In case of a shared regulator a direct, or even causal, relationship can be expected between traits whereas in the latter case the two traits are independently controlled. This distinction should be reflected in correlation analysis with higher values expected for co-regulated traits. In contrast, high correlation values between traits can also be expected when environmental factors that affect multiple traits simultaneously are in play. The resulting decrease in signal to noise ratio would be reflected in low heritabilities and QTL detection power. We therefore have applied independent lines of investigation, including genetic, correlation and Random Forest analyses to reliably identify biologically meaningful relationships between metabolites and complex phenotypes.
In targeted studies, QTLs were found for some of the metabolites that were also detected in our analyses. In a previous study an mQTL for tyrosine was detected on chromosome eight in the C parent map (Werij et al., 2007). This amino acid has been reported to be associated with the level of enzymatic discoloration (Corsini et al., 1992), although other studies have reported otherwise (Mondy and Munshi, 1993). In agreement with the results of Werij et al.2007 we did not find overlapping QTLs for tyrosine and enzymatic discoloration. In addition, we confirmed the QTL detected by Werij et al. 2007 and also detected two more QTLs for tyrosine at chromosomes five and eleven of the E-parental map. This difference is likely to be explained by differences in analytical techniques used in the two studies. Additionally, revisions in the linkage map that was used in our study may have influenced the power of detection of QTLs.
Another interesting metabolite that was also mapped in previous studies is methionine. The level of this amino acid is important for the nutritional value of potato. Moreover, it is the precursor of metabolites important to potato flavour (e.g., methional) and attempts have been made to enhance the methionine content to improve flavour (Di et al., 2003;Dancs et al., 2008). In earlier work on the C x E population two QTLs, in chromosomes three and five, were detected underlying the variation of this amino acid content in tubers (Kloosterman et al., 2010). In agreement with that study, we detected QTLs for methionine that mapped on chromosomes three and five of the E parent map and an additional QTL on chromosome two of the C parent map.
The significant mQTLs detected in both parental maps were unequally distributed over the genome, indicating hot and cold spots for metabolite regulation. A well-known locus involved in plant maturity is located at chromosome five, where a major QTL for this trait has been detected in the C x E population (van Eck and Jacobsen, 1996). Plant maturity has been shown to be closely linked to a number of traits, including resistance and developmental traits (Collins et al., 1999;Oberhagemann et al., 1999;Bormann et al., 2004), although the underlying genetic factor has not been identified thus far.
Products of primary metabolism such as carbohydrates and amino acids are expected to influence the degree of plant maturity and vice versa. We therefore anticipated that a large number of metabolic traits would show association with the plant maturity region on chromosome five. Nonetheless, a substantial number of mQTLs for amino acids, organic acids and carbohydrates have not been reported before and were identified outside this region. This finding highlights the importance of other genomic regions in the regulation of primary metabolite accumulation despite the pleiotropic effects displayed by the plant maturity region.
In addition, a number of mQTLs mapped to multiple positions, which indicates complex regulation. addition to multiple other loci and plant maturity only at one may indicate that metabolism is at least partly under developmental control or possibly another factor upstream.

Putative predictors of starch phosphorylation
QTL co-localizations can be useful to identify metabolites involved in the regulation of phenotypic traits. This is of special importance for traits of which the genetic basis is unknown providing a valuable tool to search for candidate genes. However, one should be cautious when making such assumptions because two different traits that share the same regulatory region are not necessarily involved in the same molecular or biological process. In a specific genomic region genes might be present that are linked but that have different enzymatic functions. The phenotypic traits evaluated in this study are known to be related to carbohydrate metabolism and consequently metabolites involved in this pathway are likely to be linked to these traits. Nevertheless, QTL co-localizations can disclose unknown associations and moreover identify candidate predictors of trait variation (Lisec et al., 2008).
One of the aims of this study was to test to what extent phenotypic and metabolic QTLs co-localize in order to identify metabolites associated to phenotypic features. We focused on starch phosphorylation as a phenotypic case study. Potato starch has a particularly high content of phosphate groups in comparison to other plant species. The degradation of starch is dependent on reversible phosphorylation of the glucans at the surface of the starch granule (Zeeman et al., 2010) and although a direct link between the content of phosphate groups and starch degradation has not been found it has been shown that alterations in the starch phosphorylating enzymes lead to an excess of starch accumulation in the plant (Caspar et al., 1991;Zeeman and Rees, 1999;Yu et al., 2001). In potato the high phosphate content of starch affects the viscosity and formation of stable starch pastes (Wiesenborn et al., 1994;Viksø-Nielsen et al., 2001) which is important for the diversified uses of starch in industry.
Here, we show that a number of amino acid mQTLs co-localise with trait QTLs for starch phosphorylation. To measure the strength of the genetically inferred links between the detected metabolic and phenotypic QTL co-localizations we examined the associations and predictive power of the metabolite data for starch phosphorylation using RF regression analysis.
The application of multivariate statistical methods to assess associations between metabolites and phenotypic traits has been successfully applied in a number of studies. An approach using canonical correlation analysis to test the predictive power of metabolic composition for biomass traits in Arabidopsis revealed a number of metabolites related to biomass and growth (Meyer et al., 2007). In potato a Partial Least Squares (PLS) analysis was used to discover metabolites that function as predictors for susceptibility to black spot bruising and chip quality (Steinfath et al., 2010). The validity of these results was tested in a collection of potato cultivars and in a set of individuals of a segregating population where metabolic and phenotypic information obtained from a first environment was used to predict phenotypic properties from the metabolic data obtained from a second environment. Those results demonstrate the application of multivariate data analysis and the value of independent validation to discover a small set of metabolites that can be used as biomarkers for a phenotypic trait of interest. We used Random Forest analyses to predict starch phosphorylation from a GC-MS data set. A similar approach was used to predict flesh colour and enzymatic discoloration from transcriptomics and liquid chromatography -mass spectrometry (LC-MS) data sets (Acharjee et al., 2011). This study resulted in the successful identification of associated genes and metabolites, of which some were known to be involved in the regulation of the traits under study. Correspondingly, in our study, the application of RF regression resulted in a list of highly ranked metabolites, representing the most important compounds associated with starch phosphorylation. Inspection of the annotation of these included a number of unknown metabolites and more interestingly a few amino acids for which we also detected mQTLs coinciding with starch phosphorylation QTLs. Amongst these relevant metabolites, After a dormant phase, potatoes develop from a sink to a source organ that will subsequently support the growth and development of the new sprout. Owing to a higher content of phosphate groups, starch may be more easily mobilized and converted into resources for the growing sprout. Vitamine B 5 is www.plantphysiol.org on August 31, 2017 -Published by Downloaded from Copyright © 2012 American Society of Plant Biologists. All rights reserved. used in the synthesis of coenzyme A (CoA), an acyl carrier protein. CoA is required in many central metabolic processes and it is essential in the conversion of pyruvate to acetyl-CoA to enter the tricarboxylic acid cycle (TCA cycle) (Chakauya et al., 2006). In addition, CoA is fundamental in the biosynthesis of fatty acids, polyketides, depsipeptides and peptides (Kleinkauf, 2000). β -alanine constitutes an important part in the biosynthesis pathway of Vitamine B 5 and the presence of this amino acid may be indicative for the formation of many essential metabolites for plant development and furthermore, it may act as an indicator of the mobilization of storage resources. In this study, we identified β -alanine associated with starch phosphorylation as well as a number of other metabolites for which it also might be predictive. Our approach has been shown to be instrumental in generating hypotheses about functional relationships between metabolites and phenotypes. In addition, it may help for a gradual understanding of metabolic processes contributing to observed phenotypic features of interest.
Our current data demonstrate the benefits of the applied methods for a broad untargeted metabolomics approach in potato. In this study we combined genetic information through mQTL and phenotypic QTL analysis and non-genetic information through regression of trait values to predict phenotypic traits from metabolomics analysis. We identified candidate metabolites which can be informative for phenotypic traits of interest.
The advances in metabolomics have opened up the way to high-throughput approaches allowing the analysis of variation of a large number of samples in a reasonable amount of time. In addition, advanced statistical methods enable us to explore and monitor different profiling techniques in nonmodel species. A multilevel integrative approach to study organisms as a system of genetic, proteomic and metabolic events may enable us to achieve a higher level of understanding of the interactions occurring in a biological system of interest. In potato, although this field is still in its infancy, some examples have already shown the advantages of such approaches to identify, and hypothesize about, the components in biologically relevant pathways (Acharjee et al., 2011). Furthermore, the genome sequence of potato (Consortium, 2011) has now revealed genes specific to this highly heterozygous crop, bringing a platform that will ultimately facilitate the elucidation of the genetic basis of complex traits of high importance in breeding for tuber quality.

Plant material
The C x E population The diploid population (C x E) consisting of a total of 251 individuals was obtained from a cross between two heterozygous diploid potato clones, USW5337.3 (coded C: Solanum phureja x Solanum tuberosum ) and 77.2102.37 (coded E: S.vernei x S.tuberosum). This population has been of special use to study the inheritance and genetic mapping of traits related to tuber quality (i.e tuber shape, tuber size, eye depth, flesh colour, among others). The development and characteristics of the population and the parental lines are described in detail in (Celis-Gamboa, 2002)  other centered of these normally distributed traits (data not shown), demonstrating the large amount of transgression present in this population . In addition, the female and male parent show very similar plant maturity phenotypes, with the female C showing a slightly earlier maturity phenotype than the male E.
A subset of 97 genotypes of this population was grown in two subsequent years (2002 and 2003) during the normal potato growing season (April-September) in Wageningen, The Netherlands. For each genotype, tubers were collected from three plants. Harvested tubers were either used for phenotypic analysis or mechanically peeled and immediately frozen in liquid nitrogen before being ground into fine powder and stored at -80 o C.
Phenotypic analyses for 26 starch and cold sweetening related traits were performed for both years of harvest (Supplemental Table S5). Metabolite profiling was carried out on the ground material of tubers of the 2003 harvest.

Potato cultivars
Potato cultivars used for independent confirmation and further statistical analysis were part of the potato collection available at Wageningen UR Plant Breeding. This collection consists of 221 tetraploid potato cultivars which were provided by Dutch breeding companies and gene banks. Phosphate measurements were carried out for 214 potato cultivars. In accordance with the distribution of the trait values (Supplemental Fig. S2), we selected 30 cultivars representing high, medium and low phosphate content.

Determination of starch phosphorylation
The degree of phosphorylation of starch was determined in a colorimetric assay. 20 mg starch was mixed with 250 µl of 70% HClO 4 and incubated at 250°C for 25 min. Then 50µl of 30% H 2 O 2 (w/v) was added and incubated for another 5 min. After cooling, 2ml of H 2 O was added to reach a final concentration of HClO 4 8.75% (w/v).
The colour reagent consisted of 0.75% (w/v) (NH 4 )6Mo 7 O 24 .4H 2 O, 3% (w/v) FeSO 4 .7H 2 O and 0.75% SDS (Sodium Dodecyl Sulphate) (w/v) dissolved in 0.375 M H 2 SO 4 . 100 µl of the sample extract, or a standard solution, was mixed with 200 µl of the colour reagent solution in a microtiter plate and incubated for 10 minutes at room temperature. The absorbance was measured at 750 nm in a microplate reader using 8.75% HClO 4 as a blank. A calibration curve of PO 4 dissolved in HClO 4 ( 0 -500 µM ) was used to determine the phosphate content.

Extraction and derivatization of potato tuber metabolites for GC-MS analysis
Relative metabolite content was determined as described in (Weckwerth et al., 2004) with modifications specific to the potato material. Briefly, polar metabolite fractions were extracted from ~100 mg fresh weight (FW) of tuber powder. 1.4 ml of a single phase solvent mixture of methanol:chloroform:water (2.5:1:1) was added to the ground tuber powder in a 2 ml Eppendorf tube.
Alanine-d3 was used as deuterated internal standard and ribitol was used as a representative internal standard and they were all mixed in one solution. In the water phase: 25 ul of a solution containing the www.plantphysiol.org on August 31, 2017 -Published by Downloaded from Copyright © 2012 American Society of Plant Biologists. All rights reserved. aforementioned internal standard solution was added. After vortexing, the closed tubes were sonicated for 15 min. After 5 min. of centrifugation the supernatant was transferred into a new Eppendorf tube and 400 µl of water was added. The mixture was thoroughly mixed by vortexing and centrifuged for 10 min at 21000 rfc. The methanol/water supernatant (polar phase) was carefully transferred into a new Eppendorf tube. Aliquots of the polar phase (100 µl) were dried by vacuum centrifugation for 12-16 hours.

GC-MS data processing methods
Data pre-processing Raw data were processed by ChromaTOF software 2.0 (Leco instruments) and MassLynx software (Waters Inc.) and further analysis was performed using MetAlign software (Lommen, 2009) to extract and align the mass signals (s/n≥ 2). Mass signals that were present in less than 2 samples were discarded. Signal redundancy per metabolite was removed by means of clustering and mass spectra were reconstructed (Tikunov et al., 2005). This resulted in 139 reconstructed polar metabolites (representative masses).

Compound identification
The mass spectra of the representative masses were subjected to tentative identification by matching to the NIST05 (National Institute of Standards and Technology, Gaithersburg, MD, USA; http://www.nist.gov/srd/mslist.htm) and Wiley spectral libraries and by comparison with retention indices calculated using a series of alkanes and fitted with a second order polynomial function (Strehmel et al., 2008). Library hits were manually curated, and a series of commercial standards were used to check annotation. Compound identification is limited to the availability of spectra in the libraries used. The identities of the detected compunds are listed in Supplemental Table S1

Data normalization and Multivariate Analysis
Mass intenstity values of the representative masses were normalized using isotope labeled d3-alanine as an internal standard. Relative amounts of the compounds were obtained by normalizing the intensity of individual masses to the response of the internal standard. The ratio between the mass intensity value of the putative compound and the d3-alanine internal standard was then scaled by multiplying the resulting value by the average of the d3-alanine mass intensity across all samples.
Normalized values were log-transformed in GeneMaths XT version 2.12 software (www.appliedmaths.com). These data were used for cluster analysis using Pearson's correlation coefficient and UPGMA for hierachical clustering method.

Metabolic and phenotypic QTL analysis
Metabolite QTL analyses were performed using the software package Metanetwork (Keurentjes et al., 2006;Fu et al., 2007). Metanetwork applies a two-part model and a p-value is determined for each part of the model. In this study, p-values and QTL thresholds were determined as described in (Keurentjes et al., 2006). Since Metanetwork is not designed for cross-pollinated species, two separate linkage maps were used in our analysis: one for the female parent C and one for the male parent E. The number of markers specific to the C parent map is 218 and for the E parent map 178 with an average spacing between markers of 6.1 and 3.9 cM respectively. The significance QTL threshold value was estimated by Metanetwork. Empirical thresholds for significant mQTLs were calculated separately for both parental maps, C-parent map: -10 log(p) = 3.43, (p=0.00037) and Eparent map: -10 log(p) = 3.19, (p=0.00065).
Phenotypic measurements containing missing data cannot be analysed by Metanetwork, hence, QTL analyses for phenotypic data were performed using the software package MapQTL ® Version 6.0 (Van Ooijen, 2009). QTL LOD thresholds were calculated per trait using a permutation test (N = 10,000) provided in MapQTL ® .
Broad sense heritability was estimated for starch phosphorylation measurements over the two years (2002,2003) according to the formula H 2 = V G / (V G + V E + V G * E ), where V G is the variance among genotypes and V E is the year variation. One phosphate content measurement per year was used in a mixed model to calculate variance components for genotypes, years and residual (=genotype * year).