Genome-wide association study and genomic prediction in citrus: Potential of genomics-assisted breeding for fruit quality traits

Novel genomics-based approaches such as genome-wide association studies (GWAS) and genomic selection (GS) are expected to be useful in fruit tree breeding, which requires much time from the cross to the release of a cultivar because of the long generation time. In this study, a citrus parental population (111 varieties) and a breeding population (676 individuals from 35 full-sib families) were genotyped for 1,841 single nucleotide polymorphisms (SNPs) and phenotyped for 17 fruit quality traits. GWAS power and prediction accuracy were increased by combining the parental and breeding populations. A multi-kernel model considering both additive and dominance effects improved prediction accuracy for acidity and juiciness, implying that the effects of both types are important for these traits. Genomic best linear unbiased prediction (GBLUP) with linear ridge kernel regression (RR) was more robust and accurate than GBLUP with non-linear Gaussian kernel regression (GAUSS) in the tails of the phenotypic distribution. The results of this study suggest that both GWAS and GS are effective for genetic improvement of citrus fruit traits. Furthermore, the data collected from breeding populations are beneficial for increasing the detection power of GWAS and the prediction accuracy of GS.


Results
Linkage disequilibrium and population structure. The levels of linkage disequilibrium (LD) in the parental population (111 varieties; Supplementary Table S1) and combined parental and breeding populations (Supplementary Table S2; 787 genotypes in total) were evaluated by calculating the squared allele-frequency correlation (r 2 ) between each pair of SNPs on the same chromosome. The r 2 values were plotted against the physical distance for the parental population and the combined population; curves fitted for the relationships between r 2 and physical map distance indicated that LD extended over 1 Mb (r 2 > 0. 25) and decayed to relatively low levels (r 2 < 0.10) within 20 Mb in both populations (Fig. 1). The mean LD between adjacent SNPs was similar (r 2 = 0. 45) in both populations, although the decay of LD was slightly faster in the combined population. The patterns of LD decay were similar in all chromosomes ( Supplementary Fig. S1).
The genetic structure of the parental population was estimated by using hierarchical clustering, ADMIXTURE clustering, and principal component analysis (PCA). Hierarchical clustering divided the parental population into two major clusters ( Fig. 2A). The smaller cluster contained mainly pummelos, which are large-fruit varieties ( Fig. 2C; Supplementary Table S1). The larger cluster was further divided into two major sub-clusters ( Fig. 2A). The smaller sub-cluster contained mainly mandarins, which are small-fruit varieties ( Fig. 2C; Supplementary  Table S1). According to ADMIXTURE clustering, the optimal number (K) was estimated as 4, because cross-validation (CV) error rapidly decreased until K = 4 and changed little thereafter, although K = 6 corresponded to the lowest CV error (Supplementary Fig. S2). The deduced optimal K value was similar to those reported by Curk et al. 17 . The cluster containing large-fruit varieties but not the sub-cluster containing small-fruit varieties showed clear genetic differentiation (Fig. 2B). Similarly, PCA indicated the presence of two main clusters, one with large and the other one with small fruit size ( Supplementary Fig. S3). Whereas part of the parental population belonged to the clusters of large-or small-fruit varieties, most varieties showed no clear separation in hierarchical clustering, ADMIXTURE clustering, or PCA. PCA also suggested low stratification in the combined population ( Supplementary Fig. S4).
Two genes, Ciclev10031587 m.g and Ciclev10031003 m.g, were found between the significant SNPs on Chr. 4 for ColorP and ColorF; these genes were located 490 kb and 17 kb apart from the nearest significant SNPs, respectively, within a high-LD region ( Supplementary Fig. S7). These genes are annotated as phytoene desaturation 1 (pds1) and carotenoid cleavage dioxygenase 4 (ccd4), respectively. A significant SNP on Chr. 3 for FruH and Peeling resided in the gene Ciclev10018456 m.g, which is annotated as callose synthase. This gene was also located in the region flanking a significant SNP for FleH. The common significant SNP on Chr. 4 for Weight, ColorP, Bitter, and Acid resided in the gene Ciclev10031681 m.g, which is annotated as glutamate dehydrogenase.
Accuracy of prediction models for 17 fruit traits in the parental population. Ten-fold CV showed that prediction accuracy was high for Weight, FruH, ColorP, Peeling, ColorF, and FirmLM (r ≥ 0.7), intermediate for fruit shape (Shape), smoothness of pericarp (SmoothP), Aroma, FleH, Juicy, Seed, Bitter, taste (Taste), Brix, and Acid (0.3 ≤ r < 0.7), and low for appearance (Appear) (r < 0.3) (Fig. 4A). GBLUP (RR and GAUSS), Ridge Regression, and Bayesian regressions were more accurate for most traits than the other methods, whereas the accuracies of Random Forest, Elastic Net, and Lasso were trait-dependent. While the prediction accuracy level of a single method considerably varied among traits, the mean prediction of all 11 methods showed consistent accuracy, i.e., an upper-middle level of accuracy among the methods, for all traits. The non-linear method GBLUP (GAUSS) had the highest prediction accuracy for Acid. Because most of the traits (except Weight, Brix, and Acid) were scored as ordinal categorical traits, several prediction models specific for categorical responses were used. Because the prediction accuracy of these models was lower than that of models for continuous responses ( Supplementary Fig. S8), the latter models were used for subsequent analyses. We built prediction models that included the information from GWAS for the 17 traits. For 15 traits (except Appear and Bitter), GBLUP (RR) that treats all SNPs as random effects was more accurate than multiple linear regression (MLR) that treats the three most significant SNPs (r < 0.6; red points in Supplementary Fig. S5) as fixed effects, or GBLUP (RR) that treats the three most significant SNPs as fixed effects and other SNPs as random effects, even though MLR had high prediction accuracies (r > 0.7) for ColorP, ColorF, and FirmLM (Fig. 4B).
A multi-kernel model considering dominance and additive genetic effects was also evaluated. For Acid and Juicy, this model was more accurate than a model with only additive effects, whereas opposite results were obtained for Appear, Shape, SmoothP, FleH, Seed, and Taste (Fig. 4C). For the other traits, accuracies were similar for both models. The additive genetic effects were the major factor contributing to genetic variation for 14 traits (except Appear, Juicy, and Bitter) (Supplementary Table S6).
Prediction for distribution tails. As described above, a diversity of fruit weight was found in the parental population ( Fig. 2; Supplementary Fig. S9A). Because the fruit weight of pummelos, such as 'Egami buntan' , 'Banpeiyu' , 'Mato buntan' , and 'Hirado buntan' , was very large (>800 g), we hypothesized that a genetic system differs between the varieties with extremely large fruits and other varieties. To validate the hypothesis, we tried to predict fruit weight in the large-fruit group (Fig. 2) by using a model built based on the data of other citrus varieties. The results showed a high accuracy of such prediction (Fig. 5A). Higher accuracy was attained by GBLUP (RR) (r = 0.89) than by GBLUP (GAUSS) (r = 0.74), whereas the accuracy of GBLUP (RR) was slightly lower than that of GBLUP (GAUSS) for all ranges of fruit weight (Fig. 4A). The extremely large fruit weight in pummelos was properly predicted, especially with GBLUP (RR). Similarly, GBLUP (RR) had a slightly higher accuracy for the small-fruit group (Fig. 5B). We also applied rank-ordered 4-fold CV for all 17 traits. GBLUP (RR) attained higher accuracy than did GBLUP (GAUSS) (including the number of outliers) for distribution tails for all the traits ( Supplementary Fig. S9).

Prediction for the breeding population.
To evaluate the prediction accuracy in the breeding population, three different training populations for modeling were compared. For 13 traits (except Shape, ColorP, Bitter, and Taste; Fig. 6, Supplementary Fig. S10), the accuracy of the model based on the breeding population was on average 1.35 times that of the model based on the parental population, indicating that information from the breeding population improved prediction accuracy. The highest accuracy was attained in the model based on the combined population for 14 traits (except Aroma, Bitter, and Taste). The results were similar when prediction accuracy was evaluated on a single-family basis ( Supplementary Fig. S11).

Discussion
The LD pattern is a key factor influencing the power of GWAS and the accuracy of GS, because these two approaches are based on LD between markers and causal polymorphisms 9,11,13,18 . Generally, the range of LD is smaller in heterozygous outcrossing species than in homozygous selfing species 19,20 , and therefore a large number of markers are needed to cover the genome of outcrossing species. In the citrus population used in this study, we observed a wide range of LD (r 2 > 0.25 at 1 Mb; Fig. 1), even though most genotypes obtained from conventional cross-breeding are highly heterozygous 21 . Because the citrus population has experienced a population bottleneck, i.e. it had a limited number of founders 3 , the wide range of LD 19,20 was expected to increase. Citrus cultivars are developed through a small number of generations, and are mainly maintained through vegetative propagation. These breeding and management systems make recombination unlikely to occur and may also maintain LD 22 . The mean r 2 between adjacent markers has been used to determine target marker densities for GS. For high-heritability traits, a mean r 2 of 0.15 is sufficient, but for low-heritability traits increasing the r 2 to 0.2 improved prediction accuracy 23 . In the citrus population used in this study, mean r 2 (0.45) was higher than the above value, suggesting that marker densities in the population would be sufficient for GS.
Population structure extends the range of LD 24 and can cause spurious associations in GWAS 11 . Hierarchical clustering, ADMIXTURE clustering, and PCA showed that some varieties in the parental population formed clusters related to fruit weight, but most varieties showed no clear separation. The wide range of LD observed in the parental population was probably due to its small size rather than to the population structure. Given the wide range and high level of LD and the weak population structure, GWAS and GS would be effective in citrus populations. For most traits, the number of significant SNPs and the -log 10 (p) values of peaks detected in GWAS were greater with the combined (787 genotypes in total) than with the parental population (111 varieties), which would increase the chance of finding good markers for MAS. These results indicate that GWAS using the combined population has higher statistical power for detecting QTLs than GWAS using only the parental population, and that the QTLs detected were shared between the parental and breeding populations. For some traits, the significant SNPs differ between the combined and parental populations, possibly because of the differences between the populations in detection power, allele frequency, or population structure.
In addition to the sample size, the difference in the LD pattern between the combined and parental populations can influence the resolution of GWAS. For example, multi-breed GWAS had a higher resolution than within-breed GWAS in dairy cattle 25 . This difference was explained by narrower QTL regions because of the short conserved regions of LD across breeds 26 . In the present study, the LD decay was rapid in the combined population, although the mean LD between adjacent SNPs was similar (r 2 = 0.45) in both populations, implying that the LD in the combined population is conserved over shorter distances than that in the parental population. The short conserved regions of LD could improve the precision of QTL detection in the combined population by narrowing down the QTL regions.
GWAS in the combined population detected SNPs significantly associated (false discovery rate <0.05) with 11 traits. The most significant association was detected for Weight on Chr. 8 where one QTL named FW8 for fruit weight has been identified in a bi-parental QTL mapping study that used a mandarin F 1 population derived from 'Fortune' × 'Murcott' 27 . The most significant SNP detected for Weight resided in the same region as FW8, confirming a single major QTL for fruit weight on Chr. 8.
Identification of seven common significant SNPs for ColorP and ColorF suggests a close genetic linkage between QTLs controlling these traits, resulting in high phenotypic correlation. Common significant SNPs for other traits could be explained similarly. Significant SNPs for ColorP and ColorF on Chr. 4 were located in multiple QTLs for fruit flavedo and juice color identified on the linkage group MUR4.1 27 . Two genes, Ciclev10031587 m.g and Ciclev10031003 m.g, annotated as pds1 and ccd4 respectively, are colocalized with the QTLs on MUR4.1 27 . The pds1 gene is involved in the carotenoid biosynthetic pathway in citrus 28,29 and Arabidopsis 30 , whereas ccd4 is involved in carotenoid accumulation in peach 31,32 . The pericarp and flesh color of citrus fruits is determined by carotenoid accumulation 3 .
A significant SNP for FruH and Peeling on Chr. 3 resides in the callose synthase gene (Ciclev10018456 m.g). The polysaccharide callose is a component of plant cell wall 33 . Cell wall polysaccharides are associated with cell wall elasticity and viscosity, which influence fruit firmness. Softening of fruit tissues in tomato and avocado is caused by the degradation of cell wall polysaccharides 34, 35 .
A single significant SNP for Acid detected on Chr. 4 is consistent with those for Weight, ColorP, and Bitter, and with the largest (although not significant) peak SNP for Brix. This SNP resides in the gene Ciclev10031681 m.g, which is annotated as glutamate dehydrogenase. In avocado, the activity and protein content of GDH remain constant during fruit development, but increase ~4-fold during fruit ripening 36 . Generally, the acidity decreases while the sugar content increases during fruit maturation 3 . GDH could have pleiotropic effects on fruit quality traits involved in citrus fruit maturation. If so, common significant SNPs could be used as common markers for such traits in MAS programs.
The prediction accuracy of GBLUP (RR) that treats all SNPs as random effects, was highest for all the traits except Appear and Bitter, implying that all the traits except Appear and Bitter are controlled by many genes and/ or QTLs. For Shape, SmoothP, Aroma, Taste, and Brix, for which the levels of prediction accuracy were intermediate, even though no significant association was detected. These results suggest that GS is more accurate for these Figure 6. Prediction accuracy of the breeding population. Prediction accuracy was measured as the Pearson's correlation coefficient (r) between predicted genotypic values and phenotypic values. The prediction accuracy was calculated for combined all families. Only the mean prediction accuracy of all the methods is shown (Supplementary Fig. S10). Training populations are shown in the figure key. When the breeding population or combined parental and breeding populations were used as training sets, one family was excluded and its phenotype was predicted.
Scientific RepoRts | 7: 4721 | DOI:10.1038/s41598-017-05100-x traits than MAS based on significant SNPs. On the other hand, high prediction accuracies (r > 0.7) of MLR for ColorP, ColorF, and FirmLM imply that MAS performed by using a few markers enables the selection at low-cost genotyping for these traits.
A multi-kernel model that considers both additive and dominance effects improved the accuracy for Acid and Juicy, suggesting that the effects of both types are important for these traits. The highest prediction accuracy of GBLUP (GAUSS) for Acid supports this idea. Whereas GBLUP (RR) captures additive genetic effects, GBLUP (GAUSS) captures non-additive genetic effects (e.g., the dominance effect) by modeling non-linear relationships between markers and the phenotype 37,38 . For the other traits, prediction accuracy was similar or lower with the model considering additive and dominance effects, suggesting that additive models are sufficient to explain genetic variations in the population. Consistent with this assumption, additive genetic effects were the major factor contributing to genetic variation. Using models that consider both additive and dominance effects may not always be beneficial, as suggested by several previous studies. Adding dominance and epistatic effects to additive genetic effects did not help genomic prediction in rice, suggesting a high correlation between multiple variance components 39 . In wheat, considering only additive effects resulted in equal or even higher prediction accuracy than considering both additive and dominance effects 40 , even though considering both types of effects increased the prediction accuracy in another simulation study that used a large population 41 . These results suggest that models considering dominance effects are quite sensitive to the number of individuals in the population 40 .
The prediction for the tails of fruit weight distribution was accurate for large-fruit varieties. The extremely large fruit weight of pummelos was properly predicted, especially with GBLUP (RR), even though the accuracy of GBLUP (RR) for the whole range of distribution was slightly lower than that of GBLUP (GAUSS). This implies that the genetic architecture of the extremely large fruit weight of pummelos can be explained by the additive accumulation of fruit weight-related genes of "non-large-fruit" varieties, which include a number of mandarins and sweet oranges. Cultivated pummelos were originally selected from a single progenitor species, C. maxima, whilst some of the cultivated mandarins were developed by introgression from C. maxima to the ancestral mandarin species C. reticulata 42 ; sweet oranges (C. sinensis) originated from pummelo and mandarin 43 . Thus, mandarins and sweet oranges are related to pummelos, which may have contributed to the high prediction accuracy for the extremely large fruit weight of pummelos. For other traits, RR also outperformed GAUSS in the prediction of the distribution tails, suggesting that RR is much more robust and accurate than GAUSS when used for this purpose. This is an important implication for the use of GS in breeding programs, because the prediction of distribution tails is important for increasing or decreasing genotypic values of target traits.
For the breeding population, the prediction accuracy was lower in the model built using only the parental population than in the model built using the combined population. This result suggests that the high genetic diversity in the parental population is not always useful to predict a small variation in the breeding population. Generally, the prediction accuracy across populations that are of low relevance is lower than that within a population 44 . Lower prediction accuracies across populations than within populations were also observed in apple 45,46 , grapevine 47 and maize 48 . On the other hand, across-population evaluation is reportedly preferable to that within populations when the populations are closely related to each other, marker density is high, or the number of phenotypic records is small 49,50 . The advantage of using data from multiple populations should be discussed from the viewpoint of the costs of genotyping and phenotyping, as pointed out by Muranty et al. 46 . In the present study, the prediction accuracy was improved by combining the parental and breeding populations. The shorter conserved region of LD in the combined population than in the parental population could improve not only GWAS resolution but also the prediction accuracy by narrowing down the QTL regions. Our results suggest that multiple populations closely related to each other are useful to predict a small variation in the breeding population.
The results of this study suggest that GWAS and GS have a good potential for genetic improvement of fruit quality traits in citrus. Declining sequencing cost enables genomics-assisted breeding with very large numbers of DNA markers, and it has been anticipated that this approach would improve fruit tree breeding by avoiding constraints imposed by the long juvenile period and large plant size 51 . In the present study, the GWAS power was increased and the prediction accuracy was improved by combining the breeding and parental populations. The results support the idea of breeding-assisted genomics, in which the data collected from actual breeding programs will be beneficial for plant functional genomics 16 . Because the phenotypic data are accumulated every year, combining marker genotypic data in a breeding program and gathering them will increase the detection power of GWAS and the prediction accuracy of GS. The combination of genomics-based approaches, GWAS and GS, and the data from actual breeding programs will facilitate both functional genomics and fruit tree breeding.  Table S1), and 35 full-sib families with a total of 676 F 1 individuals, called the breeding population (Supplementary Table S2), were used in this study. The breeding population was derived from crosses among 45 varieties, of which 41 were included in the parental population. All genetic materials were maintained at NIFTS (Nagasaki and Shizuoka, Japan).

Plant materials and fruit assessment.
We evaluated 17 fruit quality traits for the parental and breeding populations by using instrumental, sensory, or visual assessment methods (Table 1). Five fruits were sampled from one tree of each genotype in December of each year, when many genotypes were maturing. The mean values obtained from 2008 to 2014 were estimated by fitting a mixed linear model (MLM); year was treated as a fixed effect and variety as a random effect to remove the year effect. The best linear unbiased prediction (BLUP) estimated by the MLM was used for subsequent analysis as the dependent variables of regression models for GWAS and genomic prediction models. The MLM was implemented in the "lmer" function of the R package lme4 ver. 1.1-7 52 . All traits except Weight, Brix, and Acid had ordinal categorical values (Table 1).
Next-generation sequencing data of 12 citrus varieties was obtained with Illumina HiSeq 2000 in paired-end mode for the development of a SNP Array for genotyping 768 SNPs 53 , and a SNP array for genotyping 1,536 SNPs was designed from the sequencing data as described by Shimizu et al. 53 . SNP genotyping for 768 or 1,536 arrays was performed as described by Shimizu et al. 53 . Parentage tests were performed using a function of GUGS (General Utilities for Genotyping Study) software (T. Shimizu, in preparation) with known hybrid trios. Finally, 1,841 verified SNPs were obtained (Supplementary Data S1). For further analysis, each of the 1,841 SNP genotypes was converted to 1 (AA homozygotes), −1 (BB homozygotes), or 0 (AB heterozygotes). The rate of missing SNP genotypes was 0.17. We used BEAGLE ver. 3.3.2 54 to impute the missing genotypes; the mean imputation accuracy (r 2 ) was 0.97.
Estimation of linkage disequilibrium. To estimate LD between each pair of SNPs within the same chromosome, we calculated the squared correlation coefficients (r 2 ) of SNP genotypes and plotted them against physical distance between the corresponding markers in Mb. Local polynomial regression with kernel weight was conducted using the "locpoly" function in the R package KernSmooth ver. 2.23-13 55 to test the relation between the r 2 values and physical map distances. Physical distances between adjacent markers were 0.056-2,243 kb (mean, 156.4 kb).
Population structure analysis. The genetic structure of the parental population was estimated using hierarchical clustering, ADMIXTURE clustering, and PCA. Hierarchical clustering based on Ward's method 56 with Euclidean distance was conducted by using the R function "hclust". Model-based clustering implemented in the software ADMIXTURE ver. 1.3.0 57 infers population structure by estimating individual admixture proportions from multi-locus SNP data by using a maximum-likelihood method. The number of ancestral populations (K) in the parental population varying from 1 to 10 was assumed when using ADMIXTURE. To choose the optimal K value, 5-fold CV was performed. PCA was conducted using the R function "prcomp". The optimal number of principal components (PCs) in the parental population was determined by estimating the variances of PC scores. To locate the breeding population in the PCA space of the parental population, the PC scores of the breeding population were estimated based on PCA of the parental population. A heat map was generated to estimate the phenotypic population structure of fruit weight (log-converted Weight) by using R package ggplot2 ver. 1.0.0 58 .
Regression models for GWAS. Association analysis was conducted by using an MLM 59 implemented in the "GWAS" function of the R package rrBLUP ver. 4.3 37 . To avoid spurious associations due to population structure, a kinship matrix and four PCs were included in the MLM as fixed effects. The kinship matrix was computed by using the "A.mat" function of the R package rrBLUP ver. 4.3 37 . The number of PCs was selected based on the variances of PC scores. The variance of PC score rapidly decreased until PC4, and gradually thereafter ( Supplementary Fig. S6). Annotations of the genes containing significant SNPs were obtained from the Phytozome version 12.0.1 (https://phytozome.jgi.doe.gov/pz/portal.html/) and CitrusCyc Pathways version 3.0 databases (http://pathways.citrusgenomedb.org/). The LD heatmaps of the regions surrounding the peaks on Chr. 4 for ColorP and ColorF were constructed by using the R package LDheatmap ver. 0.99-1 60 .
Genomic prediction models. We used 12 methods to evaluate prediction accuracy. A MLM-based method, genomic best linear unbiased prediction (GBLUP), was performed by using the "kinship.BLUP" function of the R package rrBLUP ver. 4.3 37 with linear ridge kernel regression (RR) and non-linear Gaussian kernel regression (GAUSS). We also used the R package glmnet ver. 1.9-8 61 for three different linear regression methods: Ridge Regression (alpha = 0), Lasso (alpha = 1), and Elastic Net (alpha = 0.5). The latter is a combination of ridge regression and lasso. A non-linear decision tree-based ensemble learning method, Random Forest, was run in the R package randomForest ver. 4.6-10 62 . The Bayesian linear regression models Bayesian Ridge Regression (BRR), Bayesian Lasso (BL), BayesA, BayesB, and BayesC were also used for the modeling. These models were implemented in the R package BGLR ver. 1.0.3 63 .
Because 14 of the 17 traits analyzed in this study were scored as ordinal categories (Table 1), seven regression models for the categorical data were also tested. The categorical data was obtained by rounding off the mean continuous values. We used the ordinal probit model-based BayesA and BayesB 64,65 , and the R package BGLR for BRR, BL, BayesA, BayesB, and BayesC. The BGLR package supports models for categorical traits.
To evaluate the effect of peak SNPs with high -log 10 (p) values detected in GWAS, the accuracies of three models were compared. The first model was MLR implemented in the R function "lm" 66,67 . The top three peak SNPs with high -log 10 (p) values of each trait were entered in MLR unless the correlation coefficient between the SNPs was ≥0.6 to prevent multicollinearity. The proportions of the variance explained by the top three peak SNPs for each trait were also estimated by the MLR model. The second model was GBLUP (RR) that treats the effects of the top three SNPs as fixed effects and the effects of other SNPs as random effects. The last model was the commonly used version of GBLUP (RR), which treats all SNPs as random effects.
To take into account the high observed heterozygosity (H o ) of the parental population (>0.50) 21 , a multi-kernel model that considers both additive and dominance genetic effects was compared with a single-kernel model that considers only additive effects. Both models were implemented in the R package of BGLR ver. 1.0.3 63 . The additive relationship matrix (kinship matrix) was computed by the "A.mat" function of the R package rrBLUP, whereas the dominance relationship matrix was calculated as D in equation (6) in Vitezica et al. 68 using a modified "A.mat" function. The additive (σ a 2 ) and dominance (σ d 2 ) genetic variances and residual variance (σ e 2 ) were estimated by using the multi-kernel model. Narrow-sense heritability (h 2 ) of each trait was computed as the ratio of σ a 2 to the total phenotypic variance σ σ σ σ σ σ σ To verify the models, we evaluated the prediction accuracy by 10-fold CV using the parental population randomly partitioned into each fold. The CV was repeated 5 times, and the same partition patterns were adapted to all prediction models in each CV. The prediction accuracy was defined as Pearson's correlation coefficients (r) between observed and predicted genotypic values and regarded as r = 0 at r < 0.
To evaluate the prediction accuracy for the tails of the distribution of fruit weight, we employed the linear ridge kernel regression model GBLUP (RR) and the non-linear Gaussian kernel regression model GBLUP (GAUSS). All varieties except the large-and small-weight groups were used as a training set, whereas the largeand small-weight groups were used as a test set to assess the difficulty of the prediction of the distribution tails. The training set was also used for leave-one-out CV as a control. The prediction accuracy for the distribution tails was also evaluated for other traits. Rank-ordered 4-fold CV of the parental population was performed. The prediction accuracy was calculated in each partition. To show the tails of the distribution, phenotypic variations of the traits were visualized as jitter plots superimposed onto boxplots by using the R packages ggplot2 ver. 1.0.1 58 and Rmisc ver. 1.5 69 .
To evaluate the prediction accuracy in the breeding population, all 12 methods were applied. Three different training populations for modeling were compared: only the parental population, only the breeding population (with one family excluded), and both the parental and breeding populations combined (with one family excluded); the phenotype of the excluded family was predicted to mimic actual breeding in which the phenotypic information of a family targeted by GS is generally not available. The prediction accuracies for each family or combined all families were calculated.