Mapping taste and flavour traits to genetic markers in lettuce Lactuca sativa

Highlights
• NMR spectroscopy is an effective and rapid alternative to chromatographic methods.
• Stable QTL were identified for sugars and bitter sesquiterpene lactones.
• We identify breeding markers for flavour and nutritional traits.
• Candidate genes are identified for key metabolites.


Introduction
Lettuce is an important Asteraceous leafy vegetable crop consumed across the world, and in the western diet is one of the main constituents of salads by weight. It is a source of folates, polyphenols, terpenoids and dietary fibre, and has a low glycemic index (GI) (Shi et al., 2022). These factors make it one of the most commonly consumed vegetables (Statistica, 2023) associated with leading a healthy lifestyle. The absolute concentration of secondary metabolites is lower than in other leafy vegetables, particularly brassicas and spinach; however, the large quantities consumed make it rate highly in total contribution of polyphenols to western diets (Song et al., 2010).
Major traits of consumer interest that determine the acceptability and potential market size of a crop are taste, flavour and health benefits. These traits are determined by the concentrations of sugars and beneficial secondary metabolites, such as polyphenols, sesquiterpenoids, and fatty acids. These metabolites can affect consumer liking and buying preference (Chadwick et al., 2016), as seen in the trend toward consuming food considered healthy, which additionally drives up consumption of lettuce and increases the consumption of beneficial compounds.
Sesquiterpenes are a significant class of plant secondary metabolites which are characteristic of the Asteraceae and have evolved as phytoalexins and anti-feedants in lettuce to protect the crop against chewing insect pests. In lettuce these are predominantly sesquiterpene lactones (STLs); more than 2 % of the plant's dry weight can be made up of these compounds (Heinrich et al., 1998). Lettuce is perceived as being bitter by many consumers; sensitivity to bitterness is particularly prominent in younger children, who represent a major target group for healthy eating, and for whom fruit and vegetable consumption is lacking (Saksvig et al., 2005, Cooke, 2007, Rodrigues et al., 2020). STLs, which are detected by hTAS2R receptors (Brockhoff et al., 2007), are the primary source of bitterness in lettuce. A range of health benefits have been reported for STLs isolated from across the plant kingdom, primarily from the Asteraceae, including those present in lettuce (Bischoff et al., 2004, Zhang et al., 2016, García et al., 2020, Wang et al., 2020). The health benefits, which include anticancer and anti-inflammatory effects, and antifungal, anxiolytic, analgesic and antitrypanosomal activities (Moujir et al., 2020), are broadly attributed to the presence of an α-methylene-γ-lactone (αMγL) functional group (Kupchan et al., 1971, Macías et al., 1996, Hehner, Heinrich et al., 1998, Chadwick et al., 2013, Schomburg et al., 2013, Kemboi et al., 2022) and an unsaturated carbonyl moiety (Chen et al., 1998, Lyß et al., 1998, Siedle et al., 2004) as a part of the lactone ring structure.
In addition to sesquiterpene lactones, phenolics represent another major nutritional group in lettuce. Quercetin, luteolin, chlorogenic acid, chicoric acid, caftaric acid and coumaric acid and their derivatives are present in lettuce, and both profile and abundance are highly variable between varieties. Up to 2 mg.g−1 (fresh weight) of polyphenolics has been reported (Heimler et al., 2012), and polyphenolics are considered a major component of the health-giving properties of plants (Llorach et al., 2004, García-Macías et al., 2007, Fraga et al., 2019, Catalkaya et al., 2020). Their taste is slightly bitter and astringent, though with less of an impact on overall bitterness than STLs.
Concentration of sugars counterbalances bitterness (Green et al., 2010). Sugar content of food is an international concern in terms of contributing to diet-related disease when consumed in high quantities, particularly within processed foods (Machado et al., 2020). Nonetheless, there is a strong consumer preference for sweeter and less bitter lettuce (Chadwick et al., 2016). The health benefits of consuming leafy vegetables, such as fibre and secondary metabolites, are considered to outweigh the negative impacts of consuming sugars at the concentrations at which they are found in lettuce (~50 nmol.kg−1 fresh weight; Gent, 2012). Sugars remain an important breeding target to generate new vegetable varieties with good consumer acceptability.
In order to understand the environmental impact on sesquiterpene metabolism and to search for stable QTL which were minimally affected by the growing environment, we used a mapping population derived from L. sativa cv. Salinas x L. serriola UC96US23. The population of 104 F9 lines was grown in three environments: an indoor controlled environment with limited stress, and two field-based trials simulating commercial practice, one with minimal soil nitrogen and one with soil nitrogen provided above concentrations found in common commercial practice.

Nuclear Magnetic Resonance (NMR) spectroscopy as a rapid tool in determining the metabolome of lettuce
NMR spectroscopy enables the simultaneous measurement of a large range of molecules from a variety of biochemical classes in a rapid and reproducible manner. The high-throughput nature of NMR makes it an ideal tool for metabolomic profiling and allows the biomolecular fingerprints of large sample sets to be reliably assessed in a relatively short time window. Previous work has characterised the metabolomes of lettuce leaves by 1H and 13C NMR (Sobolev et al., 2005) and drawn comparisons between similar varieties (Sobolev et al., 2010, Lanzotti et al., 2022). In addition, 1H NMR spectroscopy has been applied to identify differences in genotype, and the impact of environmental conditions on the metabolome of various crops, including tomato (Mattoo et al., 2006, Schauer et al., 2006), bell pepper (Villa-Ruano et al., 2018), maize (Piccioni et al., 2009) and tea (Rubel Mozumder et al., 2020), and has been shown to have utility in distinguishing critical developmental and physiological differences in a wide variety of foods (Mannina et al., 2012). This approach has the advantage of minimal required sample processing, reducing the degradation of unstable molecules, and permitting a broad range of structurally diverse compounds to be assessed within a single sample, broadening an untargeted approach to metabolomic studies.
The metabolic pathways for synthesis of sugars, polyphenols, and sesquiterpenoids are well studied; however, it is currently not known which stages in metabolism, catabolism, translocation, or utilisation have the greatest overall effect on metabolite concentration in lettuce at a commercially relevant developmental stage. We used a wide cross of L. sativa cv. Salinas (a commercial variety) and L. serriola accession UC96US23 (a wild accession), a mapping population shown to have high variation in flavour and nutritional components. To establish stability in different commercially relevant environments, we matched a conventional field trial with standard fertiliser application with a lower nitrogen trial to simulate reduced fertiliser inputs, and a controlled environment to simulate indoor farming. In this study, QTL analysis has been used to identify genomic regions that have the greatest influence on quantitative metabolite traits. Subsequent fine mapping to refine the locus can be utilised to elucidate the genetic basis underlying the trait (Korstanje and Paigen, 2002).

Plant growth and processing
F9 recombinant inbred lines (RILs) were supplied by the Michelmore lab (Genome Center, UC Davis, USA) and 102 RILs plus their parents, L. sativa cv. Salinas and the wild L. serriola UC96US23, were propagated by Tozer Seeds (Cobham, UK). For these studies, one trial of the whole population was grown in controlled environment rooms (Weiss Gallenkamp, Loughborough, UK) at The University of Reading. Temperatures were kept at a constant 25 °C, with a 16 h day length, a light level of 170 µmol.m−2.s−1 and humidity of 60 %. Plants were harvested after 49 days, at a mature, commercially viable stage prior to floral transition. Six plants of each line were grown, from which four representative plants were sampled to account for and remove lost and non-representative plants.
Two field trials (high N and residual (low) N) were grown at the trials site (Tozer Seeds, Cobham, UK) in four randomised blocks for each treatment; high nitrogen plots were fertilised to 180 kg N.ha−1, while low nitrogen was the residual level available on the plot, which averaged 50 kg N.ha−1. The residual N plots represent the minimum recommended by the RB209 fertiliser guide (MAFF, 2000), while the high N plot represents an over-fertilisation of around 50 kg N.ha−1 compared to the accepted maximum value recommended for growers. Six identical plants were grown per plot, and the four most representative plants of each genotype were harvested. Field trials were harvested after 39 days.
For all trials, all aerial parts were immediately flash frozen in liquid N, and subsequently lyophilised and milled to a powder at Rothamsted Research, UK. Four plants from each plot were homogenised, and each of the four blocks from each environment was treated as an independent replicate for further analysis.

1H NMR analysis
Lettuce leaves were lyophilised and 20 mg of this material was suspended in 1 ml of NMR buffer (containing 0.05 % w/v of the internal standard trimethylsilylpropionate [TSP] in 80:20 D2O:CD3OD) (Sigma-Aldrich, MO, USA) and mixed thoroughly. The solution was heated to 50 °C for 10 min in a waterbath, centrifuged at 6,000 x g for 5 min, and the supernatant was collected. The supernatant was subsequently heated to 90 °C for 2 min in a waterbath and rapidly cooled. Before analysis the samples were centrifuged for 5 min at 15,500 x g and the supernatant was assessed once on a Bruker 700 MHz spectrometer (MA, USA) equipped with a cryoprobe for enhanced sensitivity. For each sample, a one-dimensional 1H NMR spectrum was acquired using a standard pulse sequence with water peak suppression (recycle delay-90°-t1-90°-tm-90°-acquisition), with a 90° pulse of 12 µs. The recycle delay was set at 2 s and the mixing time at 100 ms. For each lettuce sample, 128 scans were collected after 8 dummy scans into 64,000 data points. A spectral width of 20 ppm was used and acquisition time per scan was 3.41 s. Each homogenised lettuce sample was analysed once, and was resampled if data collection failed.
All spectra were calibrated to the internal standard TSP at δ 0.00 ppm, and baseline and phase distortions were manually corrected using Topspin 2.1 software (Bruker, MA, USA). In-house spectral libraries, online resources, and previously published data were used to assign discriminatory peaks arising from the statistical models, and metabolites of interest were identified by measuring the 1H NMR spectra of purified standards where available (Sessa et al., 2000, Sobolev et al., 2005, Wishart et al., 2013). A complete list of identified peaks is provided in Supplementary Table S1. Redundant signals arising from the water and TSP in the samples were excised from all spectra. The remaining spectra were manually aligned and normalised using a total area approach, and peaks of interest were integrated using in-house Matlab (MA, USA) scripts. In addition, 2D NMR experiments were performed on the parent lines from the controlled environment trial to inform metabolite identification (1H-COSY, 1H-TOCSY and 1H-DIPSI) using the same instrument and conditions as reported above. While this analytical method cannot accurately give absolute concentrations of metabolites, the relative values are equally powerful for QTL analyses.
These processed spectral data were used to construct principal components analysis (PCA) models and orthogonal projection to latent structures-discriminant analysis (OPLS-DA) models in Matlab using in-house scripts. For the OPLS-DA model, lettuce species (L. sativa v L. serriola) was used as the response vector and the predictive performance (Q2Y) of the model was assessed using a seven-fold cross-validation approach. The significance of this predictive performance was evaluated using a permutation approach (1,000 permutations, P<0.05).
For the OPLS-DA model, the covariance of each spectral datapoint with class (L. sativa v L. serriola) is plotted, and the colour indicates those datapoints that are significantly associated with class (red, P<0.05; black, not significant).
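The excision, total area normalisation and peak integration steps described above can be illustrated with a minimal sketch. This is not the in-house Matlab code; the function names, the excised ppm windows and the toy spectrum are all assumptions made for illustration, and the integration is a plain sum over a uniform grid rather than a calibrated integral.

```python
def excise(ppm, intensity, regions):
    """Drop every point whose chemical shift lies inside a (low, high) ppm window."""
    kept = [(p, y) for p, y in zip(ppm, intensity)
            if not any(low <= p <= high for low, high in regions)]
    return [p for p, _ in kept], [y for _, y in kept]


def total_area_normalise(intensity):
    """Scale a spectrum so that its summed intensity equals 1."""
    total = sum(intensity)
    if total == 0:
        raise ValueError("spectrum has zero total area")
    return [y / total for y in intensity]


def integrate(ppm, intensity, low, high):
    """Relative peak area: summed normalised intensity between two shifts."""
    return sum(y for p, y in zip(ppm, intensity) if low <= p <= high)


# Toy spectrum on a coarse 0.5 ppm grid (hypothetical values, not study data).
toy_ppm = [i * 0.5 for i in range(21)]        # 0.0 to 10.0 ppm
toy_intensity = [1.0] * 21
toy_intensity[0] = 50.0                       # stands in for the TSP signal
toy_intensity[10] = 80.0                      # stands in for residual water

# Hypothetical exclusion windows for TSP and water.
clean_ppm, clean_intensity = excise(toy_ppm, toy_intensity,
                                    regions=[(-0.2, 0.2), (4.5, 5.0)])
normalised = total_area_normalise(clean_intensity)
relative_area = integrate(clean_ppm, normalised, 3.0, 4.0)
```

Because every spectrum is scaled to the same total area, the integrals are relative rather than absolute quantities, which is the limitation noted in the text.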
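As a rough illustration of how a cross-validated Q2 and its permutation-based significance can be computed, the sketch below uses a single-predictor least-squares model as a stand-in for OPLS-DA, which is not implemented here. The fold assignment, the function names and the toy data are assumptions, not the scripts used in this study.

```python
import random


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def q2_cross_validated(x, y, n_folds=7):
    """Q2 = 1 - PRESS / total sum of squares, with interleaved folds."""
    n = len(x)
    mean_y = sum(y) / n
    total_ss = sum((yi - mean_y) ** 2 for yi in y)
    press = 0.0
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        test = [i for i in range(n) if i % n_folds == fold]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        press += sum((y[i] - (slope * x[i] + intercept)) ** 2 for i in test)
    return 1.0 - press / total_ss


def permutation_p_value(x, y, n_perm=1000, seed=0):
    """Share of label permutations whose Q2 matches or beats the observed Q2."""
    rng = random.Random(seed)
    observed = q2_cross_validated(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if q2_cross_validated(x, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data: one spectral feature for 7 + 7 samples, class coded 0 or 1.
# These values are hypothetical and only illustrate the calculation.
feature = [1.0, 1.2, 0.8, 1.1, 0.9, 1.3, 0.7, 3.0, 3.2, 2.8, 3.1, 2.9, 3.3, 2.7]
species = [0] * 7 + [1] * 7
toy_q2 = q2_cross_validated(feature, species)
```

The permutation step shuffles the class labels while leaving the spectra untouched, so the resulting p-value reflects how often a model of this form would predict as well by chance.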

Isolation of sesquiterpenes for use as standards
Sesquiterpenes which could not be accessed commercially were isolated chromatographically from L. serriola, to provide reference spectra for NMR spectroscopy. Spectral data for STLs are shown in Fig. S1. Plants were grown under glasshouse conditions at The University of Reading (UK) until initiating floral transition, when they were cut on alternate days and the latex harvested over two weeks. The extracts were prepared by suspending 1 ml of latex in 10 ml of ethyl acetate (SLS, Nottingham, UK), vortexing to mix, and centrifuging at 6,000 x g for 5 min to form a pellet. The supernatant was dried and resuspended in 1 ml of DMSO. Samples were separated by Prep-HPLC using 100 % MeOH solvent and a Phenomenex 'Prodigy' 5 µm ODS-3 100 Å, 250 x 21.2 mm column (Phenomenex, CA, USA), and fractions were collected manually according to the retention time of the peaks previously identified by HPLC-MS. All peaks were subsequently run by HPLC-MS as described in Chadwick, Gawthrop et al. (2016) to confirm the identity and purity of the sesquiterpenoid standards. These were dried completely and resuspended in 1 ml of NMR solution (0.05 % TSP w/v 80:20 D2O:CD3OD) to allow for comparison with sample data.

QTL analysis
MapQTL 6 (Kyazma, Netherlands) (van Ooijen, 2011) was used to conduct QTL analysis in all cases. A linkage map generated by the Michelmore lab (UC Davis, USA; Truco et al., 2013) was used to map the QTL. Predicted means were calculated for QTL mapping using Restricted Maximum Likelihood (REML). A logarithm of odds (LOD) threshold was determined for each trait by permutation testing (1000 permutations) at a significance level of p = 0.05. QTL were initially detected by interval mapping (IM) using a regression model. Significant markers were selected and used to inform automatic cofactor selection. Cofactors were taken into account for a final multiple QTL mapping (MQM) to confirm QTLs.
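To make the LOD score and the permutation-derived threshold concrete, the following sketch uses single-marker regression in a RIL population with two genotype classes. It is a simplification and not the MapQTL interval mapping or MQM procedure; the function names, the genotype coding and the toy data are assumptions.

```python
import math
import random


def lod_at_marker(genotypes, phenotypes):
    """LOD = (n / 2) * log10(RSS_null / RSS_marker) for one marker."""
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss_null = sum((y - grand_mean) ** 2 for y in phenotypes)
    rss_marker = 0.0
    for allele in set(genotypes):
        group = [y for g, y in zip(genotypes, phenotypes) if g == allele]
        group_mean = sum(group) / len(group)
        rss_marker += sum((y - group_mean) ** 2 for y in group)
    if rss_marker == 0:
        return 0.0 if rss_null == 0 else math.inf
    return (n / 2) * math.log10(rss_null / rss_marker)


def permutation_threshold(markers, phenotypes, n_perm=1000, alpha=0.05, seed=0):
    """Genome-wide threshold: the (1 - alpha) quantile of the maximum LOD
    obtained after shuffling phenotypes against the marker genotypes."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    max_lods = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        max_lods.append(max(lod_at_marker(m, shuffled) for m in markers))
    max_lods.sort()
    index = min(n_perm - 1, int((1 - alpha) * n_perm))
    return max_lods[index]


# Toy population of 20 lines and two markers (hypothetical values).
linked_marker = [0, 1] * 10              # allele at a marker affecting the trait
unlinked_marker = [0] * 10 + [1] * 10    # allele at an unrelated marker
noise = [0.3, -0.1, -0.3, 0.2, 0.1, -0.2]
toy_phenotypes = [2.0 * g + noise[i % 6] for i, g in enumerate(linked_marker)]
toy_markers = [linked_marker, unlinked_marker]
```

Taking the maximum LOD across all markers within each permutation is what makes the threshold genome-wide rather than per-marker.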

Candidate gene identification
Candidate genes were identified by consultation of KEGG (Kyoto Encyclopaedia of Genes and Genomes) metabolic pathways for sesquiterpenes, phenolics and sugars. Gene sequences encoding these enzymes were taken from Arabidopsis thaliana where possible; otherwise the closest phylogenetic relative to lettuce was used. In some instances, sequences from fungi and microbes were explored where no plant relative could be found with the equivalent enzyme.

Mapping of synthesis genes
In an automated procedure using a Perl script, all sequences were BLASTed against the full draft L. sativa sequence (Lsat.1.v3.ScaffoldSequence) as the reference sequence, and aligned to the best matches using the Exonerate aligner (Slater and Birney, 2005). Sequences were returned if they met the following stringency tests: 10 % of the full sequence, minimum allele frequency of 30 %, maximum rate of missing data of 30 %, minimum rate of segregation of 30 % and introns of less than 3000 bp. The maximum number of results returned per sequence was limited to the five strongest matches. Where several candidate gene identities aligned to the same sequence, the sequence was tentatively identified as the candidate gene with the strongest alignment. SNPs were identified within the mapping population by consulting sequence data for the RILs, and the markers were aligned to the existing genetic map using JoinMap 4 (Kyazma, Netherlands).
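The stringency tests listed above amount to a filter followed by a ranking, which could be sketched as follows. This is not the Perl script used in the study. The `Hit` record and its field names are hypothetical, and reading "10 % of the full sequence" as a minimum aligned fraction is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One alignment of a candidate sequence to a scaffold (hypothetical record)."""
    query: str
    scaffold: str
    score: float
    coverage: float           # aligned fraction of the full query (assumed meaning)
    allele_frequency: float   # frequency of the rarer allele among the RILs
    missing_rate: float       # fraction of RILs without a genotype call
    segregation_rate: float
    longest_intron_bp: int


def passes_stringency(hit):
    """Apply the thresholds given in the text to a single alignment."""
    return (hit.coverage >= 0.10
            and hit.allele_frequency >= 0.30
            and hit.missing_rate <= 0.30
            and hit.segregation_rate >= 0.30
            and hit.longest_intron_bp < 3000)


def best_hits(hits, limit=5):
    """Keep the strongest passing alignments for one query sequence."""
    passing = [hit for hit in hits if passes_stringency(hit)]
    return sorted(passing, key=lambda hit: hit.score, reverse=True)[:limit]


# Hypothetical alignments for one query: seven pass, two fail.
example_hits = [Hit("geneA", f"scaffold_{i}", float(10 * i), 0.5, 0.4, 0.1, 0.4, 500)
                for i in range(1, 8)]
example_hits.append(Hit("geneA", "scaffold_8", 99.0, 0.5, 0.4, 0.1, 0.4, 3000))  # intron too long
example_hits.append(Hit("geneA", "scaffold_9", 99.0, 0.5, 0.2, 0.1, 0.4, 500))   # allele frequency too low
```

Filtering before ranking means a high-scoring alignment that fails any single test cannot displace a weaker alignment that passes all of them.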

NMR as a means of profiling the metabolome
From 285 independent NMR signals, 36 compounds were identified (Supplementary Table S1). Identified compounds of interest included amino acids, sugars, fatty acids, and polyphenols. Patterns of compound concentration were consistent with data from previous investigations using NMR and chromatographic methods (Sobolev et al., 2005).
Sesquiterpene lactone standards were isolated from lettuce midrib latex and purified by HPLC. Lactucin, lactucopicrin, and 8-deoxylactucin-15-oxalate were all identified and verified by mass spectrometry (Supplementary Fig. S1). Standards were individually assessed by NMR spectroscopy. Despite being distinct in isolation, common structural backbones meant they could not be individually identified in a complex sample, with the exception of 15-p-hydroxyphenylacetyllactucin-15-oxalate. For this reason, all other STL data reported were identified and quantified by HPLC-MS.

Variation in metabolite composition between species and growing environments
The parents of the mapping population, L. sativa cv. Salinas and L. serriola UC96US23, were selected to capture a wide interspecific genetic variation and to include a commercially relevant genotype. From the PCA model constructed on the full spectral profiles of both genotypes grown in different conditions, clear species-dependent metabolic variation was observed. Interestingly, genotype-specific variation was greater than that dependent on growing conditions, predominantly seen in separation along PC1 (capturing 37 % of variation) (Fig. 1A). As species-dependent variation was greater than environment-induced variation, an OPLS-DA model was constructed on the L. sativa (n = 7) and L. serriola (n = 7) profiles grown in all conditions. A significant model with strong predictive performance (Q2Y = 0.69; P = 0.001; Fig. 1C) was obtained. This showed that L. serriola contained greater amounts of sucrose, chlorogenic acid, chicoric acid, and pheophytin a, while L. sativa contained greater amounts of glucose and fructose. Metabolite integrals extracted from the NMR data were compared to investigate the impact of growing conditions across the species. This difference between the genotypes is substantially less under field conditions than in the controlled environment trial (Fig. 2), but statistically significant differences remain between the parental genotypes, particularly the 1.35 (p = 0.03) and 2.38 (p = 0.021) fold higher fructose concentrations for L. sativa observed in the low and high nitrogen trials respectively. In low nitrogen conditions L. serriola accumulated more glucose than L. sativa, and lower fructose and glucose.
Sucrose was found to be the most abundant sugar in all environments, and the two field environments led to greater accumulation of sucrose, fructose, and glucose in both species compared to the controlled environment (Fig. 3). Fructose accumulated at 7.97-fold higher concentration in L. sativa under controlled conditions compared to the wild species.
L. serriola had higher concentrations of phenolics than L. sativa (Fig. 1A, 1C; 5-8 ppm), particularly caftaric acid (2.6-6 fold), chicoric acid (1.5-5 fold) and chlorogenic acid (1.9-4.5 fold), which are the most abundant phenolic acids in lettuce. Caftaric acid accumulated to up to 6-fold greater concentrations in the wild parent compared to the domesticated cultivar. Chicoric acid levels were highest in the controlled environment for L. serriola, but only 25 % as high in the controlled environment for L. sativa compared to the two field trials. Caffeic acid, o-coumaric acid and chlorogenic acid levels were highest in the two field trials. Phenolic concentrations were amongst the most variable of the molecules assessed. RIL 11 was an outlier with high levels of chicoric, chlorogenic and caftaric acid, though RIL 103 had a higher total content of phenolics based on the common peak at 6.92 ppm. RIL 54 consistently contained the lowest levels of these compounds. L. serriola was consistently higher than L. sativa in the case of all phenolic acids analysed (Supplementary Table S3).
Concentration of organic acids in the two parent types was similar; malic acid remained stable across all conditions, while formic acid and fumaric acid concentrations increased within the field trials relative to the controlled environment (CE), with the greatest environmental influence coming from high nitrogen conditions (Fig. 3). Pheophytin (a form of chlorophyll a) was nearly 10-fold higher in L. serriola than its domesticated relative under controlled conditions, and around three times higher in field conditions, which is consistent with the darker green colour observed in the wild species. However, whereas L. sativa increased pheophytin accumulation in the field compared to the controlled environment, L. serriola did not, perhaps indicating an adaptive capacity of the former to respond to light conditions that enable more photosynthesis to take place.
A total of 11 essential amino acids were identified (Tables 1, 2). The largest differences were observed for isoleucine, which was 4.2 x higher in L. sativa, and glutamate, which was up to 2.42 x higher in L. serriola, under low nitrogen conditions compared to the controlled environment. Overall, total amino acid content was similar between the genotypes, indicating differences in profile arise from alternate pathways of protein synthesis rather than the overall capacity of the system. The amino acids glutamine and alanine, along with choline, were amongst the least variable molecules assessed, though only in the controlled environment (Supplementary Table S3).

Phenotype variation within the mapping population
A PCA model built on the full spectral profiles of the progeny populations showed separation of the controlled environment trial from the two field trials along the second principal component (PC) (capturing 14.5 % of the variation) (Fig. 1B). From the loadings for PC2 (Fig. 1D), the metabolite features explaining this separation were higher amounts of sucrose, inulin, and glucose in the controlled condition and lower amounts of branched chain amino acids, succinate, choline and malic acid compared to the residual/high nitrogen trials. Across the mapping population, sucrose variation was highest in the controlled environment (a factor of 4 times more concentrated in RIL 57, the most abundant in sucrose, compared to the least concentrated, RIL 61). Variation was lowest in the high nitrogen trial, where RIL 56 was 2.7 times higher than RIL 26 (Supplementary Table S3). By contrast, glucose concentration was highly variable in the field trials. Variance was lowest in the controlled environment, where the parental genotypes L. sativa and L. serriola were highest and lowest respectively and no transgressive segregation was observed (Supplementary Fig. S2). Most variance was observed in the residual nitrogen trial, where glucose concentration was over four times higher in RIL 58 than in RIL 3, which had the lowest glucose content of any RIL across all trials. Glucose abundance was consistently low in the wild parent L. serriola and the lowest across the population in the controlled environment trial. Inulin was functionally absent in the residual nitrogen trial for some RILs, and was on average nine times lower than in either of the other trials. Only RIL 123 accumulated higher inulin in these conditions, over double the next highest, RIL 15.
Genotypic segregation was observed across the metabolome. Parental lines typically represented extreme phenotypes in the context of the mapping population, although varying degrees of transgressive segregation were observed for most metabolites, particularly in the field trials (select data are shown in Supplementary Fig. S2). The controlled environment gave rise to the greatest spread of variation across the population. This indicates genetic plasticity across the population. ANOVAs performed on log10 transformed data showed that there was significant genetic variability for all the chemotyping traits (data not shown; p < 0.05), indicating that 1H NMR derived datasets were suitable for further QTL analysis.
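The test for genetic variability can be illustrated with a one-way ANOVA on log10-transformed values, treating each line as a group and its replicate blocks as observations. The sketch below computes only the F statistic and its degrees of freedom; a p-value would additionally require the F distribution, which is omitted. The function name and the toy values are assumptions, and the ANOVA design actually used in the study is not specified here.

```python
import math


def anova_f_log10(groups):
    """One-way ANOVA F statistic on log10-transformed replicate values.

    groups maps a line name to its positive replicate measurements.
    Returns (F, df_between, df_within).
    """
    logged = {name: [math.log10(v) for v in values]
              for name, values in groups.items()}
    all_values = [v for values in logged.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)

    ss_between = 0.0
    ss_within = 0.0
    for values in logged.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in values)

    df_between = len(logged) - 1
    df_within = len(all_values) - len(logged)
    if df_between < 1 or df_within < 1 or ss_within == 0:
        raise ValueError("need at least two groups with replicate variation")
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical relative peak areas for two lines, two replicates each.
toy_groups = {"line_A": [10.0, 1000.0], "line_B": [1000.0, 100000.0]}
toy_f, toy_df_between, toy_df_within = anova_f_log10(toy_groups)
```

The log10 transform is applied before any sums of squares are formed, so the between-line and within-line variation are compared on a multiplicative scale.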

QTL mapping
A total of 187 QTL were obtained. Of these, 63 unique QTL were associated with 29 identifiable compounds plus total antioxidant activity. QTL hotspots appear on chromosomes 4, 7 and 9 (Table 1 and Fig. 4).
The co-locating QTL for fructose on linkage group (LG) 9 at 185 cM was identified in all three growing environments, accounting for 22 % (high nitrogen) to 36 % (controlled environment) of the total variation for this trait. Sucrose also maps to the same locus, implying a genetic control early in sugar metabolism underlying this region. The glucose QTL on linkage group 4 at 165 cM, which is present in the high nitrogen trial and is driven by the L. serriola allele, co-locates with a QTL driven by L. sativa from the controlled environment. An additional QTL for glucose, with a higher concentration associated with the L. serriola allele, appears on linkage group 5 at 155 cM in both high and low nitrogen field trials, accounting for 14.8 % and 15.9 % of the population variation respectively. Interestingly, this locus co-located with a QTL for 15-p-hydroxyphenylacetyllactucin-8-sulfate derived from NMR data, which was also driven by the L. serriola allele, and with QTL for multiple phenolic acids in the high nitrogen trial. Additional QTL for 8-deoxylactucin-15-oxalate and 15-p-hydroxyphenylacetyllactucin-8-sulfate were identified from HPLC data, on linkage groups 3A and 4 respectively. These accounted for 23-31 % of variance in the population.

Fig. 3. Fold change in selected compounds between the controlled environment and the two respective field trials for each of the parents. Scores for each parental genotype were derived from averages of all NMR run peak areas. Shown is the fold change of each compound identified in each of the low (residual) nitrogen trial and the high nitrogen trial, relative to the controlled environment, for each genotype. Bars to the right of centre represent higher concentrations of metabolite in the field trial relative to the controlled environment; bars to the left of centre show lower concentrations of metabolites in the field trial relative to the controlled environment.

Table 1
Selected QTLs detected by MQM mapping for all metabolic traits in the RIL mapping population across three environments. Peak ppm gives the precise peak isolated for these QTL from a number of potential candidates. Linkage group represents the chromosome number to which the QTL corresponds. C3A is part of chromosome 3, but the two parts cannot currently be linked by existing markers. All distances (Marker Position and QTL interval) are given in cM. QTL interval is the area in which the LOD score is within 2 of the peak value, and represents the extent to which we can be confident of finding a QTL. LOD is the log of the odds score. Variance indicates the percentage of phenotypic variation within the population that can be explained by that QTL.
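The QTL interval defined in the Table 1 legend, the region in which the LOD score stays within 2 of the peak value, can be derived from a LOD profile as in the sketch below. The function name and the toy profile are assumptions for illustration and do not correspond to any QTL reported here.

```python
def lod_support_interval(positions_cm, lod_scores, drop=2.0):
    """Return (left, peak, right) map positions in cM.

    The interval is the contiguous region around the highest LOD score
    in which the LOD stays within `drop` units of that peak.
    """
    if not lod_scores or len(positions_cm) != len(lod_scores):
        raise ValueError("positions and LOD scores must be non-empty and equal in length")
    peak = max(range(len(lod_scores)), key=lambda i: lod_scores[i])
    cutoff = lod_scores[peak] - drop

    left = peak
    while left > 0 and lod_scores[left - 1] >= cutoff:
        left -= 1
    right = peak
    while right < len(lod_scores) - 1 and lod_scores[right + 1] >= cutoff:
        right += 1
    return positions_cm[left], positions_cm[peak], positions_cm[right]


# Hypothetical LOD profile sampled every 10 cM along one linkage group.
toy_positions = [10.0, 20.0, 30.0, 40.0, 50.0]
toy_lods = [1.0, 3.5, 5.0, 3.2, 0.5]
toy_interval = lod_support_interval(toy_positions, toy_lods)
```

The interval is resolved only to the spacing of the evaluated map positions, so a denser profile gives a tighter estimate of the boundaries.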
A QTL for total antioxidant activity, accounting for 15.8 % of population variation, co-locates with the chicoric acid QTL at 133 cM on C9 and is close to a chlorogenic acid QTL at 143 cM, implying this region is important for the accumulation of these antioxidant molecules in wild lettuce. The chicoric acid QTL on C9 accounts for 21.2 % of variation and the chlorogenic acid QTL locates to 143 cM on LG9, accounting for 30.3 % of variation. In each case there was a higher concentration of metabolite in L. serriola. The controlled environment revealed the most QTL.

Identification and mapping of candidate genes
A total of 403 distinct protein sequences were extracted from KEGG and reverse translated into transcriptome sequences. Sequences related to sugar (293), polyphenolics (44), and sesquiterpene synthesis (64) pathways were screened, with 136 sugar, 49 polyphenolic and 51 STL sequences assigned an identity and, by using BLAST, matched to scaffolds of the v3 lettuce genome (Truco et al., 2013). These were added to the existing marker map using Kyazma JoinMap 4 software to identify possible co-location with existing QTL. Six genetic sequences were identified which aligned with related QTL (Table 2). Of particular interest was a sequence for squalene monooxygenase, on Lsat.1.v3.g.5.228, an enzyme which catalyses the oxidation of the triterpenoid squalene (EC 1.14.14.17), which mapped to linkage group 5 at 156 cM. This co-located with the only sesquiterpene QTL generated from NMR, that for 15-p-hydroxyphenylacetyllactucin-15-sulphate from the low N population (NMR peak 6.835 ppm).
After quality checks of the 293 sugar sequences, 136 SNP-containing sequences were added to the marker map. A sequence from Arabidopsis identified as L-iditol 2-dehydrogenase (Lsat.1.v3.g.2.145; EC 1.1.1.14), involved in metabolism of various sugars, mapped to linkage group 2 at 169 cM, the same location as a QTL for sucrose (Fig. 4), indicating that this is a potential dehydrogenase enzyme responsible for sucrose variability in the mapping population. This QTL was identified in high N conditions, and showed that higher levels of the compound were driven by the L. serriola genotype; it could be a way to increase palatability in domesticated lettuces.
A candidate gene mapping to the same location as a relevant QTL was sucrose-phosphate synthase (EC 2.4.1.14), which synthesises sucrose-6-phosphate from UDP-glucose. This mapped to linkage group 6 at 152 cM and to a QTL derived from NMR peak 3.543 ppm, which represents overlapping sucrose and fructose peaks. This QTL was responsible for increased sucrose/fructose with the L. sativa genotype and was only identified within the controlled environment.
Additionally, a gene sequence derived from the protein sequences of a cellulose synthase (EC 2.4.1.12), which converts fructose to cellulose, mapped to linkage group 9 at 3.9 cM. This co-located with fructose QTL (NMR peaks 3.573 ppm and 3.538 ppm) from the controlled environment, which resulted in higher concentrations of fructose in L. sativa.
A sequence identified as flavonoid 3′,5′-hydroxylase (EC 1.14.13.88) by ourselves and Zhang (2018) mapped to linkage group 5 at 152 cM. This enzymatic class is involved in several locations within the flavonoid metabolism pathway (Kanehisa and Goto, 2000), including anthocyanin and phenolic acid production. The sequence underlies QTL for multiple phenolic acids (NMR peaks 7.253 ppm, 6.425 ppm and 6.452 ppm; Supplementary Table S2). However, as these peaks represent protons in the phenolic backbone, it was not possible to identify which individual metabolite(s) are represented. This QTL was found in the low N population only, and showed that higher levels of the compound were driven by the L. serriola genotype. This was the only candidate gene found to underlie our QTL for phenolic compounds. QTL analysis of the population identified three genetic hotspots for phenolic traits: one on linkage group 2 at around 133 cM (putative caffeoyl-CoA methyltransferase, Lsat.1.v3.g.2.1786), one on linkage group 4 at around 50 cM (flavone synthase, Lsat.1.v3.g.4.815), and a third that includes the flavonoid 3′,5′-hydroxylase candidate gene locus on linkage group 5, centred about 145 cM (Lsat.1.v3.g.5.3110) (Supplementary Table S2).

Variation between population parents
The main difference between the parental genotypes was found to be a substantial change in the concentration of sugars, especially with regard to fructose, which has the highest relative sweetness of the sugars (Pangborn, 1963). This is a consequence of conventional breeding techniques being used to select more palatable lettuce from the wild progenitor and increase fructose content by 9 x in the domesticated genotype. Phenolics, however, have been selected against during the domestication process, typically on the basis of their acidic and bitter flavour (Drewnowski and Gomez-Carneros, 2000) and lack of a clear phenotypic benefit to conventional breeders. Modern consumers are more aware of the health benefits of eating vegetables, which polyphenols help to provide (Spencer, 2009, Mubarak et al., 2012, Lima et al., 2014, Fraga et al., 2019, Aravind et al., 2021), and so it is possible that, as breeding programmes develop to reflect consumer desire for enhanced nutritional traits in their food crops, the loci and markers associated with phenolic compounds will become commercially important.

Table 2
QTL which map to candidate genes. Candidate gene sequences from KEGG pathways were added to the existing marker map. Five QTL co-located with these candidate genes when they were remapped. Map distances changed due to the mapping algorithm used when updating the marker maps, so for comparison, the flanking markers in the original marker map are included. In each case the QTL which maps to the candidate gene also maps to the corresponding markers in the original map.

Sub-optimal supply of nitrogen caused a strong trend for the domesticated lettuce phenotype to converge with that of the wild type, resulting in a narrower range of concentrations for each metabolite studied.The domesticated parent appears to be more affected by the nitrogen availability than L. serriola, as the metabolic profile of L. sativa is substantially changed between high and low nitrogen environments, in contrast to the wild parent which showed very little impact of growing in different concentrations of nitrogen.From this we can infer that the domesticated plant is better adapted to exploit commercial growing conditions, but retains some plasticity when challenged with adverse conditions.The difference between the chemotype profile seen in the controlled environment compared to field conditions in both species highlights the usefulness of utilising multiple trial conditions to promote identification of breeding markers that are stable in different

Variation between environments
Differences in absolute amounts of a metabolite between genotypes frequently contrasted with the capacity of a genotype to display phenotypic plasticity in response to different growing environments. We observed greater sugar accumulation in L. serriola when it was exposed to field conditions than in the controlled environment; notably, fructose content was increased by nearly 6.5× in the low N trial. Glucose and sucrose concentrations, while higher in the domesticated lettuce under most conditions, were higher in the wild parent genotype under low N. The reason for this is not fully understood, but it may be that the wild species retains the capacity to adapt to more variable conditions, or that the highly domesticated lettuce has been selected for adaptation to a particular growing environment in the case of sugar accumulation. However, L. sativa showed the greatest plasticity across the three environments in terms of polyphenol and pheophytin content, both of which were much more abundant in the field trials than in the controlled environment trial, indicating a greater level of environmental adaptability. These data indicate that the domesticated plant is still capable of reverting toward a more resilient phenotype when environmental pressure makes this beneficial. The low N trial also led to much higher concentrations of the phenolic acids in both parental genotypes; these compounds act as antioxidants and herbivory defence molecules in plants and are therefore produced under stress (Treutter, 2006, Skłodowska et al., 2011, Mithöfer and Boland, 2012).

Phenotype variation across the mapping population
PCA of the populations showed a great deal of overlap between the two field-grown trials, despite extreme changes in nitrogen application, while the controlled environment clustered separately. This is a consequence of the field trials exposing the plants to conditions that are more similar to each other than to the controlled environment. Genetic variation within the trial populations produced a range of chemotypes largely within the range of the parental chemotypes, with some variation extending beyond this as the result of unmasked genes. This concept of transgressive segregation is critical for the breeding of beneficial traits into new cultivars, as it relies on adding genetic information from the wild parent. The data presented in this paper show that there is the genetic potential to identify QTL and associated markers which can be used as the basis for breeding more extreme beneficial traits into the already heavily domesticated L. sativa cultivar. Some peaks appear to be linked without any known relationship between the compounds; for example, on linkage group 5 at 155-160 cM there are QTL for glucose, STLs and phenolics. This may result from genetic linkage; it is therefore useful, in terms of breeding, to examine correlations between concentrations of functionally unrelated compounds, as linkage may cause undesirable phenotypes to be inherited with the beneficial trait. Zhang et al. (2007) suggested that hotspots relating to pre- and postharvest quality traits may be due to related metabolic regulation. The present analysis has shown that groups of metabolites linked to the synthesis or regulation of the same class of compound cluster together in hotspots. We have observed this with the clustering of phenolic acid QTL and with the clustering of sugar QTL, both on linkage group 9. This indicates that the locus harbours a gene or genes regulating early stages of the metabolic pathway, prior to its divergence into the synthesis or catabolism of individual metabolites within the broader group. This population derives from a wild parent and a domesticated parent which has previously undergone significant breeding for agricultural and consumer traits. It is therefore understandable that some regions contain more QTL than others. Clusters were identified on LG1, LG3, LG4, LG7 and LG9. Clustering of QTL is important with regard to breeding because of pleiotropic effects in these regions. Ideally, adopting a marker underlying a QTL cluster of interest could introduce a number of desirable effects at once (Li et al., 2022). Fine mapping opens up the potential to identify genes underlying QTL, improving our understanding of the mechanisms controlling absolute concentrations of compounds in a species.

Metabolite QTL
A number of QTL were found relating to nutritional and taste traits. QTL for sugars and bitter sesquiterpene lactones were identified. Additionally, we identified QTL for numerous acids, which can increase perceived sweetness through mixing interactions with bitter tasting compounds by complementing sweetness (Drewnowski, 2001). A QTL for FRAP antioxidant potential was found, which co-locates with QTL for chicoric acid, a phenolic acid and known antioxidant. The primary location for all these traits is on LG9 in two major hotspots, around 135 cM and 185 cM. We identified two QTL regions where the wild parental allele caused higher levels of glucose (LG5 at 155 cM and LG4 at 167 cM), each accounting for around 15 % of the phenotypic variation within the population. Each of these regions presents potential to increase sugars and improve the taste of existing lettuce cultivars, while increasing phenolic acid levels may complement the sweetness of commercial cultivars and provide a good source of antioxidants, which are related to improved health (Selma et al., 2009, Wootton-Beard and Ryan, 2011, Del Rio et al., 2013). Preferential breeding for sugar content as a major factor of taste has resulted in a higher sugar content in the L. sativa parent than in the undomesticated L. serriola parent. By comparison, phenolic acid and fatty acid concentrations were higher in the L. serriola genotype, indicating that these compounds may have been selected against. The parent contributing the higher concentration differed between amino acids. Phenylalanine, leucine, glutamine and aspartate concentrations were higher in the wild parent, while accumulation of valine and isoleucine was associated with the domesticated genotype. It is unclear how this change in amino acid profile affects the ability of the plants to tolerate different conditions, though some polyphenols are derived from amino acids via the shikimate pathway (Hisaminato et al., 2001). Hence there may be differences in the ability of a plant to synthesise secondary metabolites, such as beneficial phenolics, from their amino acid precursors, particularly in nitrogen-limited conditions.
Of particular importance to breeding goals was the hotspot in the low nitrogen trial on linkage group 5, at 155 cM. This region contained a series of QTL driven by the L. serriola allele. QTL located here were for glucose, multiple phenolic acids, including chlorogenic acid, and 15-p-hydroxyphenylacetyllactucin-15-oxalate. The STL 15-p-hydroxyphenylacetyllactucin-15-oxalate is not believed to make any contribution to taste perception and is one of the few non-bitter STLs (Chadwick et al., 2016). The combination of a glucose QTL and a QTL for an STL which is not believed to affect taste is of great utility, especially as candidate genes underlying these QTL were identified. These findings offer clear opportunities for breeders to use marker-assisted breeding to generate novel cultivars which taste better and which have enhanced nutritive qualities compared to the Salinas cultivar from which the mapping population used in this study was derived.

QTL × environment interaction
Three environments were investigated: an indoor controlled growth chamber, and two field trials grown simultaneously in which plots received either high or low nitrogen. One major QTL hotspot for fructose on linkage group 9, which also influences sucrose, remained stable in all environments and is therefore a useful region from which to develop breeding markers for this trait. Other QTL for sugars, and also QTL for phenolic acids, were stable over two of the three trials. Most QTL regions were not stable across the environments, implying that a strong environmental factor was influencing most traits. This can also be seen in the metabolomics data, with wide-ranging differences in metabolite profile across the different environments.
A total of 120 QTL, approximately two thirds of those found in this study, were identified from the controlled environment, where biotic and abiotic stresses were minimised. As phenotypic variation was most limited under the stress of the low nitrogen condition, this environment gave rise to the fewest QTL, with environmental factors taking precedence over genetic factors.
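The stability classification described above can be sketched in a few lines. This is only an illustration under stated assumptions: the QTL records are hypothetical placeholders rather than data from this study, the environment labels are invented, and the 10 cM tolerance used to decide that two peaks belong to the same locus is an arbitrary choice.

```python
from collections import defaultdict

# Hypothetical QTL records: (trait, linkage group, peak position in cM, environment).
# Illustrative values only; they are not results from this study.
QTL = [
    ("fructose", 9, 183.0, "controlled"),
    ("fructose", 9, 184.5, "high_N"),
    ("fructose", 9, 182.0, "low_N"),
    ("glucose", 5, 155.0, "low_N"),
    ("chicoric acid", 9, 133.0, "controlled"),
    ("chicoric acid", 9, 135.0, "high_N"),
]


def stability(qtl, tolerance_cm=10.0):
    """Count the environments in which each trait has a QTL at roughly the same locus.

    Peaks for the same trait and linkage group are treated as one locus when they
    lie within `tolerance_cm` of the first peak recorded for that locus
    (an assumed threshold, not one taken from the paper).
    """
    loci = defaultdict(list)  # (trait, lg) -> list of [anchor_cm, set of environments]
    for trait, lg, pos, env in qtl:
        for locus in loci[(trait, lg)]:
            if abs(locus[0] - pos) <= tolerance_cm:
                locus[1].add(env)
                break
        else:
            # No existing locus is close enough, so start a new one at this peak.
            loci[(trait, lg)].append([pos, {env}])
    return {
        (trait, lg, anchor): len(envs)
        for (trait, lg), entries in loci.items()
        for anchor, envs in entries
    }


result = stability(QTL)
for (trait, lg, anchor), n_env in sorted(result.items()):
    print(f"{trait} LG{lg} ~{anchor} cM: detected in {n_env} of 3 environments")
```

A locus counted in all three environments would correspond to a stable QTL in the sense used here, while a count of one would indicate an environment-specific QTL.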

Associating candidate genes with existing marker map
Candidate gene analysis, using potential genes of interest directly as markers, has utility for rapidly identifying the genes which underlie QTL, by highlighting sequences under the QTL which encode enzymes involved in synthesis of the metabolites concerned.
Mapping candidate genes in this way will expose the stages of metabolic pathways which have the greatest regulatory impact on the final concentration of metabolites within the mapping population, provide powerful potential markers for plant breeders, and lead to a better understanding of metabolite biosynthesis. Identifying strong candidates can speed up breeding and help us to understand the pathways being tailored. Within the scope of this study we relied on information from characterised biochemical pathways rather than identifying all genes in each region, and consequently were not able to identify the genes underlying some major loci.
Several of the major QTL hotspots, such as that for fructose at LG9 183 cM, chicoric acid at LG9 133 cM and chlorogenic acid at LG9 143 cM, did not have candidate genes associated with them. The genes which underlie these loci may be involved in regulation of the major synthesis pathway: enzyme modulators account for only 5 % of the total number of genes (Mi et al., 2013), and a series of other regulators, promoters and transcription factors may instead be responsible for the segregation within the population derived from these loci. RNAseq offers the opportunity to investigate genetic regulators beyond the transcription of enzymes involved in metabolite biosynthesis. The candidate gene approach was nonetheless useful for identifying genes of interest underlying some QTL.

Mapping of sugar synthesis genes
The most significant fructose QTL was observed in all three populations on linkage group 9 at 183 cM. However, this did not co-locate with any of the candidate genes identified. We can be confident that the location of the QTL is correct owing to its high LOD score and stability across all populations and differing analysis techniques. The sugar candidate genes which did map underneath QTL were those for sucrose-phosphate synthase, cellulose synthase, fructose-1,6-bisphosphatase I, and L-iditol 2-dehydrogenase. L-iditol 2-dehydrogenase and fructose bisphosphatase aldolase I both mapped under sucrose QTL on linkage group 2. L-iditol 2-dehydrogenase is an oxidoreductase which converts sorbitol to fructose and is involved in the metabolism of other pentoses (Kanehisa and Goto, 2000). Fructose-1,6-bisphosphatase and fructose-2,6-bisphosphatase have previously been implicated in sucrose metabolism (Rufty and Huber, 1983, Stitt and Heldt, 1985, Cho et al., 2012), despite ostensibly being involved in fructose metabolism; this therefore appears to be a suitable candidate for driving the segregation between the types of sugars at this locus. It is interesting to note that in this population the wild type, which could be expected to have a lower sucrose content due to selective breeding of the domesticated variety favouring sugar production, was instead driving the increase in sucrose concentration at this locus. This can be accounted for by understanding that this gene is responsible for converting sucrose to starch; it is therefore likely that the higher sucrose concentration is a result of lower starch stores in the wild type, driven by a selective pressure for domesticated plants to keep a greater store of starch.
Other candidate genes were sucrose-phosphate synthase, which co-locates with the QTL for a peak representing both fructose and sucrose on linkage group 6 at 158 cM, and cellulose synthase on linkage group 9 at 3.9 cM, a glycosyltransferase involved in generating cellulose from UDP-glucose. While the cellulose synthase gene co-locates with a QTL for fructose rather than glucose, there may be a similar modification in the fructose pathway.

Mapping of flavonoid synthesis genes
The only flavonoid candidate gene sequence to map to a QTL was the oxidoreductase flavonoid 3′,5′-hydroxylase. This enzyme is involved in a number of reactions, with nine separate occurrences shown in the KEGG pathway, and co-locates with three peaks representing a proton present in multiple phenolics, one for chlorogenic acid and another for caftaric acid. This appears to be a very likely candidate for the gene underlying the QTL, as it would be expected to catalyse modifications in a range of compounds, and this was observed in the QTL hotspot. As with the major fructose QTL on linkage group 9, neither of the two linkage group 9 QTL for chicoric acid and chlorogenic acid co-located with any of the candidate genes, and we therefore reason that this is most likely the result of a regulatory gene in this location rather than an enzymatic step directly involved in metabolism of the phenolic acids. Other genes involved in flavonoid biosynthesis are spread across all chromosomes, with highly concentrated regions on LG2 at 113 cM, LG4 at 50 cM and LG5 at 90 cM, which implies heavy co-inheritance of genes at these locations. However, none of these were responsible for segregation within our mapping population.

Mapping of sesquiterpenoid synthesis genes
One candidate gene sequence mapped to a QTL for a sesquiterpenoid. This was the squalene monooxygenase gene sequence AT1G58440, which mapped to NMR peak 6.835 ppm in the low nitrogen trial. This peak was attributed to the sesquiterpene lactone 15-p-hydroxyphenylacetyllactucin-8-sulphate. Squalene monooxygenase is an oxidoreductase thought to catalyse a rate-limiting step in sterol biosynthesis (Ma et al., 2007). Squalene is a triterpenoid in a separate pathway to the sesquiterpenes, but is also derived from farnesyl pyrophosphate, a precursor to sesquiterpene backbone synthesis. This appears to be a very likely candidate for the gene underlying the QTL, as it could be expected to catalyse modifications in a range of terpenoids including those found in lettuce; while squalene is not present in lettuce, a sesquiterpenoid oxidoreductase would likely have high homology to this triterpenoid oxidoreductase. A sesquiterpenoid oxidoreductase with high homology to squalene monooxygenase is therefore likely to be the gene influencing this QTL. Only two other QTL for sesquiterpene lactones could be identified in the population, both from HPLC data derived from the controlled environment trial, but no candidate genes aligned to either of these QTL.

Conclusion
We demonstrate that NMR spectroscopy can be an effective means of returning metabolomics data for the purpose of QTL mapping. The speed and completeness of the untargeted NMR approach are sufficient to generate robust QTL for compound classes, if not specific compounds. This approach allows QTL mapping of chemotypes applicable to many fields of plant research. We were able to use this method to identify a range of QTL without losing the fidelity necessary to direct effective plant breeding.
We identified a significant QTL for fructose on chromosome 9 which was present across all growing conditions, co-locating with a QTL for polyphenols. This QTL accounted for 36 % of variation in fructose across our population, and together all identified QTL in this population can account for 80 % of the variance in the controlled environment. These QTL have use as breeding targets to generate lettuce varieties which have a sweet taste that appeals to consumer preferences, while maintaining net nutritional benefit. Within the scope of this study, these QTL were stable in response to high and residual nitrogen fertilisation in a UK summer climate, and in controlled environments. QTL stability could not be established for more diverse growing conditions, such as those in major production areas in Andalucia, Spain and California, USA, nor under the novel light programs of modern vertical farming.
We also identified a QTL and candidate gene regulating the sesquiterpene lactone 15-p-hydroxyphenylacetyllactucin-8-sulphate on linkage group 5. This is of particular importance as it has previously been demonstrated to have a lower bitter taste threshold than other sesquiterpene lactones, and represents a suitable target to incorporate into plant varieties to maintain plant defence and health functionality without adversely affecting flavour. This paper provides novel insight into the regulation of key quality traits relating to taste and nutrition in lettuce, with clear candidate genes and breeding targets identified that will enable the genetic improvement of lettuce for the benefit of growers and consumers.

Fig. 1. Biochemical variation in the ¹H NMR spectral profiles of L. serriola and L. sativa. A) Scores plot (PC1 v PC3) from the principal components analysis (PCA) model comparing the species grown under controlled, high nitrogen and residual nitrogen conditions. B) Scores plot (PC1 v PC2) from the PCA model comparing all offspring grown under controlled, high nitrogen and residual nitrogen conditions. C) Orthogonal projection to latent structures-discriminant analysis (OPLS-DA) model comparing the metabolic profiles of L. serriola and L. sativa grown under all conditions (Q²Y = 0.69; p = 0.001). Coefficient plot indicates how metabolites covary with species, with red peaks indicating those that are significantly associated with species (p < 0.05). D) Loadings plot highlighting the metabolic features contributing to the PC1 scores in the PCA model comparing offspring from different growing conditions. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 2. Percentage of L. serriola compound concentration relative to L. sativa in each tested environment. Differences in concentration of each compound identified in L. sativa and L. serriola in each experimental condition. Plants showed the most extreme phenotypes in the controlled environment, with up to a ten-fold difference in metabolite concentration (pheophytin 9.9 times more concentrated in L. serriola). L. serriola consistently contained more polyphenols than L. sativa. Bars to the right of centre represent higher concentrations of a metabolite in L. sativa relative to L. serriola; bars to the left of centre show lower concentrations in L. sativa relative to L. serriola. Scores for each parental genotype were derived from averages of all NMR run peak areas.

Fig. 4. QTL map showing all markers. Red bars represent traits within the high nitrogen trial. Green bars represent traits within the low nitrogen trial. Black bars represent traits within the controlled environment trial. Blue bars represent traits within the controlled environment trial which were not identified by NMR but by the appropriate assay. QTL driven by L. sativa are shown as bars labelled with a plus (+) symbol before the name and those driven by the wild parent L. serriola are shown with a minus (−). Map positions are given in cM, listed to the right of each linkage group. QTL are listed by the ¹H NMR peak ppm and, where applicable, the identity. Where multiple peaks co-locate exactly they are listed on the same bar; these typically result from multiple NMR peaks corresponding to a single compound. Bars represent a 1 LOD interval, with the whiskers representing a 2 LOD interval.
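For readers unfamiliar with the bars and whiskers in Fig. 4, the following minimal sketch shows one common way a LOD-drop support interval can be read off a LOD profile. It is an illustration only: the positions and LOD scores are invented, the function name is hypothetical, and the interval is taken on the marker grid without interpolation, which may differ from the procedure used by the mapping software in this study.

```python
def lod_support_interval(positions, lod, drop=1.0):
    """Return (left, right) positions bounding the contiguous region around the
    peak in which the LOD score stays within `drop` units of the peak value."""
    peak_idx = max(range(len(lod)), key=lod.__getitem__)
    threshold = lod[peak_idx] - drop

    # Walk outwards from the peak while the score remains above the threshold.
    left = peak_idx
    while left > 0 and lod[left - 1] >= threshold:
        left -= 1
    right = peak_idx
    while right < len(lod) - 1 and lod[right + 1] >= threshold:
        right += 1
    return positions[left], positions[right]


# Hypothetical LOD profile along one linkage group (positions in cM).
positions = [170, 175, 180, 185, 190, 195]
lod = [2.0, 4.5, 7.2, 8.0, 6.4, 3.1]

one_lod = lod_support_interval(positions, lod, drop=1.0)  # analogous to the bar
two_lod = lod_support_interval(positions, lod, drop=2.0)  # analogous to the whiskers
print(one_lod, two_lod)
```

The 2 LOD interval always contains the 1 LOD interval, which is why the whiskers extend beyond the bars.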
Additive effect indicates which parental allele causes a positive change in trait value. Positive values indicate that the domesticated parental allele increased the trait value, while negative values indicate that the wild type allele increased the trait value.