Introduction

The recent interest in selection models that incorporate marker data and kinship to estimate breeding values (Meuwissen et al. 2001) is reviving interest in approaches that allow data for phenotypic traits to be combined and balanced. Multi-trait indices (MTIs) are an approach to select for more than one characteristic. MTIs are expected to be particularly valuable for crops where quality traits are of importance for market success. MTIs based on economic values have been widely used and described in the animal breeding literature (Cottle and Coffey 2013; De Haas et al. 2013; Laske et al. 2012; Visscher et al. 1994). For plants this approach is advocated (Eathington et al. 2007; Merk et al. 2012), but such MTIs are generally not described in the literature. One exception are the models based on net merit that were explored for wheat in order to integrate grain yield, plant characteristics and milling quality with economic weights (Heffner et al. 2011). Trait valuation in the market is lacking for many characteristics considered important for vegetable crops. In the absence of economic data for trait value, an alternative approach may be to develop MTIs based on the historical characteristics of varieties that are proven to be successful in the market.

The performance of processing tomato varieties needs to conform to growers’ and processors’ expectations, the United States Department of Agriculture grade standards (U.S. Department of Agriculture 1990), and consumer preferences. Growers prefer high yield, uniform ripening, long field storage of fruit, resistance to a variety of root and foliar pathogens, and a plant habit suitable for mechanical harvest (Stevens and Rick 1986). Processors require a pH no higher than 4.4 for food safety (Anthon et al. 2011) and prefer high soluble solids content (Brix) to maximize the output of processed product per input of raw product (Nichols 2006). They require paste viscosity and color that meets industry standards (U.S. Department of Agriculture 1990), and the absence of physical damage (Processing Tomato Advisory Board 2003). Consumers purchase based on product appearance, taste and perception of health benefits (Gould 1992). The development of tomato varieties meeting all these expectations is challenging. As many as 50 traits were described as important for processing tomato breeding (Monti 1979), though which of these many characteristics should receive priority in crop improvement is less clear. Many traits are highly affected by the environment, and thus have a low heritability. Examples of such characteristics include the ease of peel removal by steam or lye (“peelability”) (Garcia and Barrett 2006), yield, fruit color, pH, and Brix (Merk et al. 2012). Some traits are negatively correlated such as Brix and yield, which complicates breeding for both traits simultaneously (Grandillo et al. 1999; Merk et al. 2012; Stevens and Rudich 1978). To facilitate the selection of genotypes combining desirable characteristics, the use of MTI was developed early in the history of plant breeding (Smith 1936). Although the use of indices that combine selection criteria weighted by importance has been proposed to classify processing tomato varieties for selection (Merk et al. 2012; Monti 1979), few equations have been described. The product of yield and Brix values (YxB) was introduced as a way to estimate the final quantity of processed tomato for paste (Eshed et al. 1996; Tanksley et al. 1996), and as a way to balance two traits that are strongly and negatively correlated (Grandillo et al. 1999).

In this study, we analyzed historical data in order to evaluate the best combination of phenotypic traits associated with market share and life span of a variety. To accomplish this, we assembled 22 years of quality data from the Processing Tomato Advisory Board (PTAB), and phenotypic measurements collected during the same period by the University of California Cooperative Extension (UCCE) statewide variety evaluation trials. Using a random effect model to combine, and analyze this large historical dataset, we explored patterns of variety use and associated them with specific traits. Our approach makes the assumption that the varieties with the highest use and those maintained in the market for the longest time are the ones that fit growers’ and processors’ expectations. We think that this assumption is consistent with the objective of seed companies which seek to maximize market-share and maintain varieties in the market for as long as possible. Our objectives were: (1) to evaluate the genetic contributions to yield and fruit quality for processing tomato varieties grown in California, (2) to define as a model the traits or combination of traits that correlate with success of tomato varieties in California, and (3) to test the models developed as MTI.

Materials and methods

Data sources and datasets

Two sources of data were used in this analysis. The first was the Processing Tomato Advisory Board (PTAB) statewide reports from 1992 to 2013, retrieved from the PTAB website (Processing Tomato Advisory Board 2013). This dataset summarized results of tests on each load of tomatoes, for each California County, prior to delivery to processing plants in California. The dataset represented 9,254,478 loads of 927 varieties, corresponding to the majority of tomato shipments processed in California during the period considered. The data were collected in 24 different counties. Counties were separated into seven groups based on a North–South gradient. There were two to five counties per group. The groups were, from North to South: (1) Glenn, and Butte, (2) Colusa, Sutter, and Yuba, (3) Yolo, Solano, and Sacramento, (4) Contra Costa, Alameda, and San Joaquin, (5) Santa Clara, Stanislaus, Merced, and Madera, (6) Monterey, San Benito, Fresno, Kings, and Tulare, and (7) Kern, Santa Barbara, Ventura, and Imperial. These groups were considered levels of the location factor for the PTAB dataset. The measurements were collected on two samples of 50 lb (22.7 kg) taken from various parts of a load of approximately 25 tons (22.3 metric tons). Measurements used in this analysis were the number of loads delivered, the soluble sugar content (Brix), fruit color, and pH. Brix was measured with a handheld or digital bench type refractometer with temperature compensation. From 1992 to 1995, color was measured with the Agtron E-5M instrument on homogenized samples. After 1995, a light emitting device (LED) was used (Barrett and Anthon 2008). All colors are expressed as PTAB color scores (Bane 2005). From 2001, pH was measured on raw fruits (Processing Tomato Advisory Board 2003). Measurements for pH were taken for 572 varieties, in 23 counties, representing 5,699,991 loads. The PTAB data were summarized weekly per county, weekly across California, yearly per county, and yearly across California. For our objectives, the most balanced and informative data were the data summarized yearly per county. This dataset also included, for each year, the rank of the varieties based on the tonnage accepted by processing plants in California.

The second dataset was the UCCE statewide processing tomato variety evaluation trial reports from 1996 to 2013. Electronic files covering years 1997–2013 were made available by the California Tomato Research Institute (CTRI). Missing data were filled in using hard-copies of CTRI reports. We transcribed data for each variety and trait by location for observational and replicated trials. Over the 18 years of trials analyzed here, a total of 394 varieties were tested. The data were collected in 13 different locations. These individual field trials were considered levels of the location factor in our random models. Varieties were either tested in each location as a Randomized Complete Block Design (RCBD) with four blocks (replicated trials), or were not replicated (observational trials). The trials were planted in farmers’ fields and were managed with the same practices as used for production. From 1996 to 2001, all trials were directly seeded. From 2002 to 2008, trials were directly seeded or transplanted, depending on location. After 2008 all trials were transplanted. Furrow irrigation was used in all fields from 1996 to 2005. From 2006 to 2008, fields were irrigated by furrow irrigation or by drip irrigation. From 2009 all trials were irrigated with a drip system. Brix, color and pH were evaluated with the same methods as used for the PTAB data. Yield was measured in tons per acre. The data available for each year were Least Square Means (LSMeans) per location for each variety. From 1996 to 1999 and from 2001 to 2003, some varieties had two LSMeans reported for a single location, corresponding to data evaluated from different harvest dates. In the evaluations from 1992 to 1995 only LSMeans for each variety across trials were reported. Therefore, it was not possible to calculate best linear unbiased predictors (BLUPs) adjusted for genotype by environment interactions (described below). These data were not used in our analysis.

Resistances claimed in each variety were extracted from either the UCCE reports or seed catalogues. The average number of disease resistances present was calculated over time across the entire population, and in the varieties ranking in the top 10.

Data were quality checked, with specific attention paid to variety names. Company consolidations, transition of variety status from experimental to commercial, and changes in trial management resulted in numerous varieties being reported with synonymous names. For example Halley, BOS3155, BOS 3155, Halley OS 3155 are variant names for a single variety. Homogenization of names across datasets was performed to insure a unique name for each genetic variety. Entries with names suggesting that the loads contained a mix of experimental varieties were discarded (e.g.: “exp”, “R&D”, “trial”).

Best linear unbiased predictors for the phenotypic data, variance partitioning and heritability

Best linear unbiased predictors (BLUPs) per variety across years and per year across varieties were calculated for each trait for both datasets. The following random model was used:

$$ Y_{ijk} = \mu + g_{i} + l_{j} + yr_{k} + \!g\!:\!l\!_{ij} + g\!:\!yr_{ik} + \!l:yr\!_{jk} + \!g:l\!:\!yr\!_{ijk} + \varepsilon_{ijk} $$

with Yij the value of the phenotypic trait for the ith genotype (gi) evaluated in the jth location (lj), during the kth year (yrk). Interactions were denoted by a column, µ and ɛijk represented the grand mean and the residual, respectively. Analysis was conducted using the R software version 3.01 (R Core Team 2013). The models were fitted with the function lmer of the package lme4 (Bates et al. 2012). BLUPs were extracted from the models using the function ranef within lme4. The variance estimates for each factor were retrieved from the summary table generated by the lmer function. The percentage of total variance explained by genotype, year, location, and their interactions were calculated for each trait in each dataset. The broad sense heritability was calculated using the formula \( H^{2} = \frac{{\sigma_{\text{g}}^{2} }}{{\sigma_{\text{g}}^{2} + \sigma_{\text{gl}}^{2} + \sigma_{\text{gy}}^{2} + \sigma_{\text{gly}}^{2} + \sigma_{\varepsilon }^{2} }} \), with \( {{\sigma}}_{\text{g}}^{2} \), the genetic variance, \( {{\sigma}}_{\text{gl}}^{2} \), \({{\sigma}}_{\text{gy}}^{2} \) and \( {{\sigma}}_{\text{gly}}^{2} \) the variance of the interactions between genotype and location, year, and year by location, respectively. \( {{\sigma}}_{\varepsilon }^{2} \) represents the variance of the error.

The BLUPs calculated for each year across varieties were used to evaluate the trend for each trait between 1992 and 2013 for the PTAB dataset and from 1996 to 2013 for the UCCE trials. The same calculation was done considering only varieties ranking in the top 10 in a given year. Local Polynomial regression (Loess) and moving simple average over five years were used to smooth the plots of the phenotypic traits versus time in order to visualize trends in the data. The same conclusions were derived from the two methods, and Loess results are presented here. The function loess implemented in the R software version 3.01 (R Core Team 2013) was used to estimate the fitted curves and their standard errors. They were visualized using the ggplot2 package (Wickham 2009) in the R software.

Success metrics

Two different variables were used to evaluate success in the market as an estimate of performance: life span and market share. Market share was the percentage of loads produced by a variety during the time it was present in the PTAB data. Life span was the number of years a variety was present in the PTAB data. These values were calculated between 1992 and 2013 across the entire PTAB dataset. Success metrics were transformed with a logarithmic function (loge) in order to evaluate their correlations with other traits and to use them as dependent variable in general linear models.

Combining UCCE and PTAB datasets

We created datasets by combining data for only those varieties with BLUPs for yield from UCCE trials, success metrics from PTAB data, and BLUPs for Brix, color, and pH either from UCCE trials or PTAB data. Two datasets were created. The first one contained 198 varieties with BLUPs for yield, Brix, color and pH from UCCE trials. These data will be referred to as the UCCE set. The second dataset contained 258 varieties with BLUPs for Brix, color, and pH calculated from PTAB reports and yield data from UCCE trials. This dataset will be referred to as the Combined set. There were 196 varieties in common to both datasets.

Correlations between phenotypic traits

Correlations between traits were evaluated on two different sets of data. The BLUPs of the 196 varieties in common between both datasets were used to evaluate correlations in the datasets used to develop the MTIs. In order to test for correlations between other traits considered important for processing, we also used data gathered in 2013, 2014, and 2015 by AgSeeds Unlimited in 38 locations in California (AgSeeds 2016). This dataset included measurements for fruit weight, Brix of raw juice, pH of raw juice, and several viscosity measurement: Juice Bostwick, predicted paste Bostwick, Oswald capillary viscometer measurements, and centistokes measurements. These correlations were based on data available for 92 varieties. The data were summarized by AgSeeds in three regions (North, Central, and South). BLUPs were calculated with a model similar to the one described for UCCE and PTAB datasets, with the exception that the three-way interaction between variety, location, and year was not included. Pearson correlations were calculated using the cor.test function within the R core package of R software version 3.01 (R Core Team 2013).

Development and testing of MTI

The UCCE and the Combined sets, containing BLUPs for Yield, Brix, color, pH, market share and life span values were used to develop models predicting success metrics based on phenotypic data. The equations obtained were then used as MTIs.

The optimization of models was performed with the R function glmulti from the glmulti package (Calcagno and De Mazancourt 2010). Models were fit using each dataset (UCCE or Combined) independently. Models were fit by considering only the main effects, or by also including the two way interactions (option level = 1 or 2, respectively). The initial equation used as an input for the glmulti function was of the form \( Y = \sum_{i = 1}^{n} X_{i} \), with Y the logarithmic transformation of the success metric, Xi the ith independent variable (yield, Brix, color, and pH, and their two way interactions), and n the number of independent variables. For the models with only main effects, n = 4. For models with main effects and two-way interactions, n = 10. We used the option method “h” to run exhaustive screening in which all possible models were tested and fit with a general linear model (option fitfunction = “glm”). The models with either the best Bayesian Information Criterion (BIC) or the best Akaike Information Criterion (AIC) were then selected (option crit = “bic”, or “aic”). The models obtained were of the form \( Y = \sum\nolimits_{i = 1}^{m} {a_{i} X_{i} } \), with Y the logarithmic transformation of the success metric, Xi the ith factor, ai the coefficient weighting the ith factor, and m the number of factors in the final model (m ≤ n).

The MTIs were tested using two different approaches: a leave-one-out cross validation method, and by testing model estimates developed in one dataset with observed values from the other dataset. Models were developed and tested simultaneously during the leave-one-out cross validation process. One variety was taken out of the dataset, and a model was developed using the function glmulti based on the data from the other varieties (training set). The coefficient for each factor was stored. The value of the MTI was calculated for the variety excluded from the training population (tested individual) using its phenotypic data and the coefficients developed with the training set. The process was repeated until each variety was excluded from the training set. Predictions for tested individuals from 198 runs for the UCCE set and 258 runs for the Combined set constituted the testing sets. The prediction abilities of the MTIs were evaluated on the testing set, using the MTI value calculated for each variety when it was excluded from the training set.

The average and standard deviation for each coefficient were calculated from the cross-validation runs, for each dataset separately. The coefficient of some independent variables had standard deviations higher than their averages. This situation happened when the independent variable had a null coefficient for the models developed in most of the runs of the cross validation. These coefficients were considered not reliable and were set to zero for the calculation of the MTIs. MTIs developed based on the UCCE set were then tested on the Combined set, and vice versa.

Predictions based on single traits, on products of pairs of traits, and on MTIs were compared. These models were evaluated based on two criteria: the correlation between phenotypic values and success metrics (prediction accuracy) and the ability to select varieties with proven success in the historical data (percentage of co-selection). The Pearson correlation coefficient between phenotypic values and success metric was used to evaluate the prediction accuracy of the models. We also evaluated the ability of the MTIs to select varieties known to be successful in the market. The upper 20% of varieties based on phenotypic values, and the upper 20% of varieties based on success metrics were selected. The percentage of varieties selected in both pools was calculated and called percentage of co-selection:

$$ \frac{{{\text{No.}}\;{\text{of}}\;{\text{varieties}}\;{\text{selected}}\;{\text{based}}\;{\text{on}}\;{\text{phenotype}}\;{\text{and}}\;{\text{based}}\;{\text{on }}\;{\text{success}}}}{{{\text{No.}}\;{\text{of}}\;{\text{varieties}}\;{\text{in}}\; 2 0 {\text{\% }}\;{\text{of }}\;{\text{the}}\;{\text{population}}}} \times 100. $$

In addition, percentages of co-selection were calculated based on a random sample of varieties. Twenty percent of the population was randomly sampled, and the upper 20% of the total population based on success metrics were selected. The percentage of co-selection between the random sample and the selection based on the success metric was calculated. The random sampling was repeated 10,000 times with replacement, and the mean and standard deviation for the percentage of co-selection of a random sample were calculated across the repetitions.

Results

Phenotypes of varieties

BLUPs for the traits recorded in PTAB or UCCE datasets are summarized for each variety in the supplementary file (Supplementary Table S1). Processing tomatoes grown in California between 1992 and 2013 had an average Brix of 5.21 (standard deviation (std) = 0.43), a color score of 24.7 (std = 2.0) and a pH of 4.41 (std = 0.08). The varieties tested in the UCCE trials between 1996 and 2013 had an average yield of 42.1 tons per acre (std = 11.9 tons per acre), a Brix of 5.15 (std = 0.52), a color score of 24.0 (std = 2.3), and a pH of 4.40 (std = 0.10). Yield significantly increased from 2006 to 2013, going from 40 tons per acre to 49 tons per acre (Fig. 1a). A similar trend was observed for the varieties in the top 10. The most successful varieties tended to have a higher yield between 1999 and 2007 compared to varieties not in the top 10, but this difference was not significant, with an exception in 2007 (P value = 0.02). Over the years Brix stayed relatively constant, with BLUPs ranging from 5.12 to 5.33, when considering the entire population (Fig. 1b). Varieties ranking in the top 10 ranged from 5.07 to 5.29, and tended to have lower Brix than the rest of the population (Fig. 1b), but this difference was not significant. BLUPs of color scores ranged from 24.1 to 25.8, with lower values before 1995 (Fig. 1c). A similar trend was observed for varieties from the top 10 (Fig. 1c). BLUPs for pH calculated for each year showed an increase from 4.38 (2001) to 4.47 (2009), followed by a decrease, with values reaching 4.40 in 2012 (Fig. 1d). Varieties ranking in the top 10 showed a BLUP for pH that was consistently lower by 0.02–0.03 units compared to the entire population. Varieties ranking in the top 10 had a significantly lower pH than the other varieties in 2007 and from 2010 to 2012 (P-value from 0.01 to 0.05) (Fig. 1d).

Fig. 1
figure 1

Trends of the value of yield (a), Brix (b), color (c) and pH (d) over the years. Yield data are from the University of California Cooperative extension (UCCE) trials. Brix, color and pH data are from the Processing Tomato Advisory Board (PTAB) dataset. Grey dots are best linear unbiased predictors (BLUPs) calculated across all varieties evaluated a given year. Black dots are BLUPs calculated for the 10 varieties with the highest number of load produced a given year. Error bars represent the standard errors. The curves are the local polynomial regression (loess curve) and their standard deviation. The BLUPs for each year were calculated with the random model \( Y_{ijk} = \mu + g_{i} + l_{j} + yr_{k} + g:l_{ij} + g:yr_{ik} + l:yr_{jk} + g:l:yr_{ijk} + \varepsilon_{ijk} \), with Yijk the value of the phenotypic traits, gi the ith genotype (variety), lj the jth location, yrk the kth year. Components separated by columns represent interactions. µ and ɛijk denotes the grand mean and the residual respectively

Data for disease resistances was found for 216 varieties out of the 927 varieties in the PTAB dataset. The resistances reported the most frequently by seed companies were for root knock nematode (Mi), bacterial speck (Pto), Verticilium race I, Fusarium wilt races I and II, tomato spotted wilt virus (TSWV), and bacterial canker (partial resistance). More rarely, resistances to bacterial spot (partial resistance), grey leaf spot (partial resistance), tobacco mosaic virus (Tm2 a), Fusarium wilt race III, and dodder (partial resistance) were reported. Varieties included from two to six resistances. On average, over the entire population, a variety had 3.8 resistances. The average number of resistances claimed per variety in 1992 was 2.9 ± 0.9, and increased to 4.0 ± 0.8 in 2013 (Fig. 2). The varieties in the top 10 averaged 2.0 ± 0.0 resistance genes in 1992 compared to 4.5 ± 0.5 in 2013 (Fig. 2). The number of claimed resistances in a variety was significantly different between 1992 and 2013 when considering all varieties (P-value = 0.000052), and when considering only varieties from the top 10 (P-value = 0.00025). The average number of resistances in a variety across the entire population, in a given year, reached four in 2012 (Fig. 2). However, varieties in the top 10 averaged four claimed resistances by 2001.

Fig. 2
figure 2

Trend over time of the average number of reported disease resistance. Data for disease resistance was recovered for 216 varieties. Grey dots are average number of disease resistance calculated across all varieties evaluated a given year. Black dots are average number of disease resistance calculated for the 10 varieties with the highest number of load produced a given year. Error bars represent the standard deviation. The curves are the Local polynomial regression (loess curve) and their standard deviation

Variance partitioning and heritability

The proportion of variance explained by varieties was interpreted as genetic variance. Brix had the highest genetic variance, with 25.5% (H2 = 0.36) and 20.3% (H2 = 0.28) of the variation in the UCCE and PTAB data, respectively (Fig. 3; Table 1). The percentage of genetic variance explained for yield was 6.8%, and its broad sense heritability was 0.17 (Fig. 3; Table 1). The percentage of variance explained by genetics for a given trait was always higher in the PTAB dataset compared to the UCCE data. Variance partitioning and heritability for all traits are summarized in Fig. 3 and in Table 1, respectively.

Fig. 3
figure 3

Proportion of variance explained by the effect of genotype, year, location, their interactions, and error. For a the University of California Cooperative Extension (UCCE) trials between 1996 and 2013, and b the Processing Tomato Advisory Board (PTAB) data between 1992 and 2013. Bar graphs are scaled to percentage of total variance. The variances were calculated from the random model \( Y_{ijk} = \mu + g_{i} + l_{j} + yr_{k} + g:l_{ij} + g:yr_{ik} + l:yr_{jk} + g:l:yr_{ijk} + \varepsilon_{ijk} \), with Yijk the value of the phenotypic traits, gi the ith genotype (variety), lj the jth location, yrk the kth year. Components separated by columns represent interactions. µ and ɛijk denotes the grand mean and the error, respectively

Table 1 Total variance, genetic variance, and heritability for yield, fruit brix, color, and pH

Correlation between phenotypic traits

Pearson coefficients for correlations between traits were calculated with phenotypic BLUPs evaluated in UCCE trials and in the PTAB dataset, using 196 varieties with no missing data. A significant negative correlation was observed between yield and Brix from UCCE trials (P-value = 2.1 × 10−4, ρ = −0.26) (Table 2). This correlation was marginally non-significant when Brix was estimated from the PTAB dataset (P-value = 0.07, ρ = −0.13). Yield was also negatively correlated with pH, but this association was only detected with phenotypic data from UCCE trials (P-value = 1.1 × 10−2, ρ = −0.18) (Table 2). Color was also negatively correlated with pH (P-value = 3.9 × 10−2, ρ = −0.15 with UCCE data, and P-value = 2.1 × 10−3, ρ = −0.22 with PTAB data) (Table 2). High correlations were observed between the values for the same traits when compared between UCCE and PTAB datasets (P-value <1.0 × 10−20, ρ ranged between 0.61 and 0.72) (Table 2).

Table 2 Pearson correlation coefficients for phenotypic traits

The AgSeeds dataset (AgSeeds 2016) contained fewer varieties and replications, but allowed us to examine the correlation between a larger set of fruit quality traits. Pearson coefficients for correlations between these traits are reported in Table 3. A significant correlation between pH and color was again observed. In the AgSeed dataset color was measured as a/b ratio, which has a negative correlation with the PTAB color measurement (Barrett and Anthon 2008). As expected, the a/b ratio showed a positive correlation with pH (Table 3). Raw pH was negatively correlated with Oswald capillary viscometer measurements, and centistokes measurements. Fruit weight was positively correlated with Brix, and negatively correlated with pH, Ostwald viscosity and centistokes viscosity.

Table 3 Pearson correlation coefficients between traits for quality data from the AgSeeds variety trials from 2013 to 2015

Success metrics

Considering all PTAB data between 1992 and 2013, an average of six varieties represented half of the production each year (Fig. 4). Based on the percentage of total loads, the top variety used in a single year ranged from a low of 9.8% (in 2004) to a high of 26.1% (in 1996). Over the twenty-two year period, market share values for varieties represented in the UCCE and Combined datasets ranged from 2.0 × 10−4 to 11.8%. Fourteen varieties had a market share value higher than 2% in the UCCE set and 18 varieties had a market share value higher than 2% in the Combined set. The 927 varieties represented between 1992 and 2013 spent an average of 4.7 years in the market. The most popular varieties, based on loads, tended to stay in the market longer, with those appearing in the top 50 lasting for an average of 8.0 years, and those appearing among the top 10 lasting for an average of 12 years. Among the 258 varieties with complete phenotypic data, BOS Halley 3155 remained in the market during the entire 22 years considered for this study. Twenty-nine varieties in the UCCE set and 53 in the Combined set had a life span longer than 10 years. The histograms for market share, life span and their logarithmic transformation are displayed for the Combined data (Fig. 5).

Fig. 4
figure 4

Variety numbers used in California processing tomato production. a The total number of varieties used per year in California and b the number of varieties representing the top 25% used (square), the top 50% used (triangle), the top 75% used (simple cross), and the top 90% used (double cross). The number of variety were extracted from the Processing Tomato Advisory Board (PTAB) yearly data from 1992 to 2013

Fig. 5
figure 5

Histograms of the success metrics life span (a) and market share (c), and of their logarithmic transformation (b, d) for the 258 varieties used for modeling success based on phenotypic data

Ability of MTIs and phenotypic traits to predict life span and market share

The models developed are summarized in Table 4. Yield was necessary to predict success in all the MTIs. Yield alone lead to the best main effect model predicting market share developed with the UCCE set using either AIC or BIC criterion (models M1 and M3). All other main effect and interaction models had at least one other independent variable (Table 4). The models based only on main effects always had a higher correlation with success metrics compared to the models based on both main effects and interactions (Tables 5, 6).

Table 4 Coefficients for the models predicting market share or life span based on phenotypic traits
Table 5 Prediction accuracy of main effects, products of pairs of traits and multi-trait indices
Table 6 The prediction ability of main effects, two way interactions and Multi-trait indices evaluated relative to percentage of co-selectiona

The selected MTIs and Yield showed significant positive correlation with both market share and life span, in UCCE and Combined datasets (Table 5). The variable pH showed a significant negative correlation for both success metrics in both datasets. Brix was significantly negatively correlated with life span when evaluated in the Combined dataset. Yield alone and MTI13 (0.12*Yield − 8.23*pH) had the highest correlation with market share (Table 5). MTI4 (0.31*Yield − 13.13*pH + 1.04*Brix*Yield + 4.51*Yield*pH + 102.21*Brix*pH), MTI13 (0.12*Yield − 8.23*pH), and MTI15 (0.11*Yield − 0.25*Color − 8.87*pH) had higher or equal correlation with life span, compared to yield (Table 5).

The ability of single phenotypic traits or their combinations to be used to select successful varieties was evaluated based on percentage of co-selection. Yield alone and MTI9 (0.14*Yield + 1.21*Brix) had the highest percentage of co-selection for market share with both the UCCE and Combined datasets (Table 6). MTI13 (0.12*Yield − 8.23*pH) and MTI15 (0.11*Yield − 0.25*Color − 8.87*pH) had higher or equal percentages of co-selection for life span, compared to yield with both datasets (Table 6).

Discussion

With the recent development of high-throughput genotyping tools (Sim et al. 2012a), their use on a large number of accessions (Blanca et al. 2015; Sim et al. 2012b), and the release of sequence data for nearly 440 tomato genomes (Lin et al. 2014; The 100 Tomato Genome Sequencing Consortium et al. 2014), there is an abundance of information to facilitate selection. These resources have generated interest in developing models predicting breeding values for tomato improvement (Duangjit et al. 2016; Hernández-Bautista et al. 2016; Yamamoto et al. 2016a, b). The potential for genomic selection has been investigated in a number of grain crops (Barabaschi et al. 2016; Bassi et al. 2016). Adoption of these approaches for vegetable crops has lagged for a number of reasons including small population sizes dictated by working with perishable crops, the lack of a tradition for collecting objective data, and the need to balance selection for a number of traits. Vegetable breeders have begun to emphasize the collection of quantitative data for an increasing number of production and quality characteristics. The ability to identify, quantify and combine the most valuable traits will be a key for maintaining or increasing genetic gain across production and quality characteristics.

Strengths and weaknesses of the dataset

We used two historical datasets, PTAB and UCCE, to describe phenotypic characteristics of the processing tomato varieties grown in California. In contrast to traditional experimentation where data analyses are planned before the lay out of the experimental design, we worked with data collected for other purposes than this study. Within this context, the two sources of data available have strengths and weaknesses. The PTAB dataset reflected quality traits of fruit delivered to the processing companies, which gave an accurate representation of the phenotypes used by the industry. Furthermore, for each year, the PTAB data reflects variety evaluation across a large number of locations. The PTAB data represented 9.2 M loads, and is thus the most comprehensive set of data for processing tomato quality. The UCCE trials were grown in farmers’ fields, and were subjected to the same agronomic practices as plants used for production and were organized with a traditional experimental design. Both datasets are highly unbalanced because different sets of varieties were evaluated every year. However, many varieties overlapped across years and locations, allowing us to use a statistical model with random effects to partition variance between genetic and environmental factors and to evaluate adjusted means for each variety. Despite the lack of balance, both datasets represent phenotypes associated with the varieties grown in California over a twenty year period, trial environments are relevant, and the data are highly replicated.

The phenotypic traits for each variety were estimated using BLUPs which are adjusted values based on random effects in the statistical models. The random model used to calculate BLUPs comprised location, year, and their interactions with variety. One component of the large error variance in the PTAB data is due to variation within location. Location information in PTAB tracks broadly to counties. We therefore defined “location” based on a North–South gradient and we combined counties into groups of similar latitudes. This criterion reflects one potential level of differences and is more likely to reflect an environmental variable than the somewhat arbitrary political delimitation of a county. However, these “location” levels likely comprised several environments due to cultural practices, soil types, and climatic conditions which also vary within a county or a group of counties. The PTAB data were not collected in a manner that allowed us to account for this detailed level of variation. The PTAB data do represent a very large number of environments and varieties with variation due to local practices, local conditions, and interactions accounted for in the error in our random model.

Despite potential weakness due to a lack of balance and the vague interpretation of the location factor, these data represent the most comprehensive phenotypic information available for any vegetable crop. When we compare analysis from these imperfect datasets with analysis based on a randomized complete block design we obtain similar results. The proportion of phenotypic variance due to genetic estimates from both PTAB and UCCE data were similar to the ones obtained for a large population of modern tomato varieties tested using classic experimental design (Merk et al. 2012).

Historical trends

The evaluation of tomato varieties across years in the UCCE trials showed an increase of yield between 2006 and 2013. The yield increase observed reflects genetic changes influencing traits such as increases in photosynthetic rates, leaf area and nitrogen use efficiency; and architectural changes related to canopy structure (Barrios-Masias and Jackson 2014). There was also an increase in the average number of resistance genes carried by processing tomato varieties from 1992 to 2013, a trend which also reflects genetic improvement potentially leading to higher yield. Changes in agronomic practices could also explain part of the yield increase. During the time-frame of data collection, processing tomato production in California shifted from furrow to drip irrigation, and from direct seeding to transplantation. These techniques also likely increase yield (Leskovar et al. 2014; Mitchell et al. 2012). Brix did not show significant changes over time and was stable at around 5.2 units. The negative correlation between yield and Brix is well documented and thought to be the main factor slowing Brix improvement (Grandillo et al. 1999; Merk et al. 2012; Stevens and Rudich 1978). In addition, the varieties carrying resistance to tomato mosaic virus (ToMV) and to tomato yellow leave curl virus (TYLCV) tend to have a low Brix (Giordano et al. 2000), but are preferred by growers in recent years due to an increased presence of virus vectors. These examples reveal tensions among traits considered important by the industry, yet difficult to combine in a unique variety. The yield increase and the Brix stability observed reflect continued selection for Brix. There was an increase in pH between 2001 and 2009 for the overall population, and for the varieties ranking in the top 10. This increase was previously attributed to the practice of extended field holding of fruits (Anthon et al. 2011). The subsequent decrease of pH observed from 2009 to 2013 suggests the possibility that management was altered as a result of outreach efforts aimed at countering the trend toward undesirable increases in pH. It is interesting to note that despite trend changes due to cultural practices, the most successful varieties consistently showed a lower pH relative to the population as a whole. The association between pH and success metrics will be discussed further, below.

Factors driving variety success

The regressions obtained between phenotype (single trait or MTIs) and the success metrics were significant, but correlation coefficients suggest that the models only explain 20–44% of the variation (Table 5). These moderate correlations indicate that other factors also shape the success of a variety. Genetic factors that were not considered in our MTIs included disease resistance and several processing traits not measured in the datasets. The average number of resistances increased faster for the varieties ranking in the top 10 than for the rest of the population. Thus resistance appears to be a trait valued by the market, and incorporating it into models could lead to a better prediction of variety success. However, data reflecting disease resistance was only available from descriptions in seed catalogues and was incomplete. Resistance data was available for 216 varieties out of the 927 total varieties in the PTAB data. Resistance information was only available for 113 varieties in the datasets used to develop models. Modeling the role of resistance will require a more complete dataset.

Non-genetic factors also influence success of varieties. The role of processors in the choice of the varieties cultivated by the contracted growers is difficult to evaluate. The contracts between growers and processors in California are negotiated by the California Tomato Growers Association (CTGA), but are different from one processor to another and subject to annual changes. The contracts are set with a base price per ton, with a linear deduction or increase depending on quality criteria (Hueth and Ligon 2002). Contracts have an impact on both the agricultural techniques used by the farmers, and on the varieties grown (Alexander et al. 2007). The contracts are confidential, thus it is difficult to use these documents to weigh traits in the context of a breeding program. Similarly, the human factors involved in variety choices, such as trust in a given brand or company, are difficult to evaluate. As evidence of confounding effects, market share was significantly associated with seed company (Kruskal–Wallis test, P-value = 6.0 × 10−8) (Table S1). In addition processing tomato varieties can be divided into several categories based on their properties and use, such as multiuse, solid, and viscosity (Gould 1992; Stevens and Rick 1986). The market share or life span of a variety could be influenced by its market category, particularly for the categories representing niches for the industry. Processing categories were not available for all varieties, and most of the varieties for which we recovered this information were classified as multi-use (83%). Thus the MTIs represent the processing tomato market regardless of the variety categories, and should be considered a multi-use index.

Weights of the multi trait indices

Yield alone ranked among the highest indices for ability to predict variety success in the market. This observation confirms that yield has been the main driver of variety success. The equation Y × B has been suggested to balance selection for two important traits (Eshed et al. 1996; Grandillo et al. 1999; Tanksley et al. 1996). However, the value of this equation does not significantly correlate with variety success (Table 5). The YxB index demonstrated a percentage of co-selection higher than for a random sample, but lower than yield alone and lower than the best MTIs we developed (Table 6).

Three of the MTIs we developed predicted success of varieties with approximately the same or better accuracy than yield alone. The use of these MTIs in breeding programs could increase genetic gains for fruit quality traits. These MTIs were MTI9 [0.14*yield + 1.21*Brix], MTI13 [0.12*Yield − 8.23*pH], and MTI15 [0.11*Yield − 0.25*Color − 8.87*pH]. MTIs with three or more components have a slightly lower ability to predict success compared to their equivalent with two components; however they may be more suitable to use in a breeding program aimed at improving fruit quality. Yield consistently appeared as an important trait to predict success. MTI13, and MTI15 weight pH negatively with larger coefficients than we expected. Although this result is consistent with food safety goals of low pH (Anthon et al. 2011), it was surprising because pH is rarely mentioned as a trait of high importance by plant breeders. It is possible that pH is also an indirect measure of other traits. For example in the AgSeed data, low pH was found to be correlated with high fruits weight, which is desirable for diced products (Barrett et al. 2006). Fruit with low pH also tended to have high viscosity, which is desirable for paste production (Hui and Evranuz 2015). In addition low pH was correlated with low a/b ratio, an indication of good color. More importantly, pH differed significantly by company, with Heinz and De Ruiter (Monsanto), two companies with varieties ranking high for market share, having lower pH than the rest of the market. It is therefore possible that pH is a variable that reflects trait-structure within the landscape of breeding companies serving the California market. MTI9 [0.14*yield + 1.21*Brix], is particularly adapted in the context of a breeding program for varieties used for tomato product such as pastes and would be a desirable substitute for the YxB index. MTI15 [0.11*Yield − 0.25*Color − 8.87*pH], favors tomatoes with deeper color and may be well adapted for breeding tomatoes used for whole-peel or diced products. An alternative strategy would be to select for yield while using a threshold for quality traits, with Brix set at a minimum threshold of 5.2, and PTAB color scores at a maximum threshold of 24 units.

Conclusion

In the absence of consistent economic data valuing attributes, weighing traits based on their ability to predict success provides an alternative for processing tomato breeders to develop MTIs. We used historical data to develop models combining and weighting yield and quality traits in order to predict variety success in the market. For processing tomato, selected models using yield, fruit Brix, color, or pH, correlated with market share with coefficients ranging from 0.30 to 0.43. These models were also superior to random sampling for their percentage of co-selection. The models developed can be used as MTIs to select superior individuals in a breeding population. This method could be applied to other crops for which accurate measurement of success and phenotypic data are available for a large number of varieties.