Identification of SNPs and InDels associated with berry size in table grapes integrating genetic and transcriptomic approaches

Berry size is considered as one of the main selection criteria in table grapes breeding programs, due to the consumer preferences. However, berry size is a complex quantitive trait under polygenic control, and its genetic determination of berry weight is not yet fully understood. The aim of this work was to perform marker discovery using a transcriptomic approach, in order to identify and characterize SNP and InDel markers associated with berry size in table grapes. We used an integrative analysis based on RNA-Seq, SNP/InDel search and validation on table grape segregants and varieties with different genetic backgrounds. Thirty SNPs and eight InDels were identified using a transcriptomic approach (RNA-Seq). These markers were selected from SNP/InDel found among segregants from a Ruby x Sultanina population with contrasting phenotypes for berry size. The set of 38 SNP and InDel markers was distributed in eight chromosomes. Genotype-phenotype association analyses were performed using a set of 13 RxS segregants and 41 table grapes varieties with different genetic backgrounds during three seasons. The results showed several degrees of association of these markers with berry size (10.2 to 30.7%) as other berry-related traits such as length and width. The co-localization of SNP and /or InDel markers and previously reported QTLs and candidate genes associated with berry size were analysed. We identified a set of informative and transferable SNP and InDel markers associated with berry size. Our results suggest the suitability of SNPs and InDels as candidate markers for berry weight in seedless table grape breeding. The identification of genomic regions associated with berry weight in chromosomes 8, 15 and 17 was achieved with supporting evidence derived from a transcriptome experiment focused on SNP/InDel search, as well as from a QTL-linkage mapping approach. New regions possibly associated with berry weight in chromosomes 3, 6, 9 and 14 were identified.


Background
Grapes (Vitis vinifera L.) is the main fruit crop of temperate regions, with a harvested area of 6.9 million of hectares in 2017 and 74.3 million tons of fruit produced throughout the world in the same period [1]. The grape is a long-lived liana adapted to a diverse range of climates [2][3][4][5]. It has a high level of genetic and phenotypic diversity, represented by the estimated 5000-10, 000 existing cultivars of grapes [6,7], including the whole genus Vitis with over 50 species [8][9][10].
Although high genetic diversity is an advantage for the success of breeding programs [11], it also represents a challenge. The characterization of the existing genotypes is it is very useful to optimize the breeding programs as well as to assist in the development of new varieties. However, juvenility of grapes delays phenotypic evaluation a few years, in particular when fruit characteristics are the main target. This represents a high cost in labor and land space dedicated to the segregant maintenance and evaluation. Therefore, the use of molecular tools providing phenotypic information at early stages of development for new lines can assist the selection (or discarding) of a subset of genotypes, and thus save resources and time.
Due to consumer preferences, the main selection criteria in table grape breeding include berry size and seedlessness (or a reduced seed production) [12], along with organoleptic traits such as flavor and aroma, among others. A strong correlation between berry size and seed content has been described, which is explained by the expression of a particular growth hormone family (gibberellins) in this organ [13][14][15]. Berry size is frequently handled in the field using exogenous applications of gibberellic acid (GA 3 ), with different purposes in wine and table grapes varieties; in the former, the aim is to augment bunch loosening and aeration, while in the latter it is used mainly to increase berry size [16,17], being especially important in seedless grape varieties to achieve commercial size.
Berry size is a polygenic quantitative trait. Numerous processes such as cell multiplication, cell wall modification, water and sugar transport probably define its expression [18][19][20], implying that numerous genes could contribute to explain berry size phenotypic variance. Despite its relatively high heritability and productive impact, the genetic determination of berry size is not yet fully understood [18,19].
The genes VvCEB1 and VvNAC26 were recently reported as major factors determining berry size. VvCEB1 is a bHLH transcription factor involved in the regulation of cell size in cv. Cabernet Sauvignon [26], while VvNAC26 was proposed as a factor determining berry size variation in a wide collection of grape genotypes [27]. Recently we reported and identified a group of candidate genes associated with berry size using transcriptomic data. The candidate genes encode for bHLH transcription factors, a GDSL esterase/lipase, a stilbene synthase, HSP17.9-D, among others [20]. Despite these findings, the genetic determinism and the molecular mechanisms behind berry development are complex and futher analysis and data are required to reveal their genetic architecture. Also, it is necessary to take into account the differential domestication process for table and wine grapes regarding berry size [28]. In genotypes being selected for the production of wine, the tendency was to smaller berries, in order to acummulate more anthocyanin pigments that are concentrated in the skin [29]. Wine cepages (in particular red ones) require certain degree of astringency, what is provided by the tannins present in seeds [28]. Thus, wine and table grapes diverged also in the growing response to the presence of gibberellins, provided mainly by seeds.
Berry size is an economically important trait for grape varieties [17]; it constitutes one of the main selection criteria for table grapes breeding programs among berry quality traits, based on tendency to select larger berries and clusters, in contrast to wine varieties harboring small seeded berries. Therefore, the identification of the genetic basis of berry size variation would provide invaluable knowledge about its genetic determinants [30], in search of molecular tools that could improve the breeding of the species by marker-assisted selection [20].
The development of molecular markers for early selection of seedlings is undoubtedly highly valuable [31,32]. However, markers for fruit quality traits are still scarce [33]. Among the few examples, the marker for presence/ absence of palatable seeds has been successfully applied in MAS [34], as well as the markers related to Muscat aroma [33,35] and to fungus diseases, such as powdery mildew resistance [36][37][38], which are in use mostly in breeding programs intended for table grape varieties.
SNP/InDel represent a valuable source of genetic variability as drivers of new genes or allelic variants, which might be selected by natural or artificial ways when resultant phenotypes exhibit advantageous traits [28]. Marker search based on transcriptome sequencing using high-throughput sequencing has led to rapid discovery of polymorphisms focused on coding regions, avoiding highly repetitive genome regions. It has also become a powerful tool for the detection of causal variants/mutations [39][40][41][42]. Despite the large efforts and current wealth of information from transcriptome sequencing data (RNA-seq), its application for the identification of molecular markers such as SSR, SNP and InDel is still limited [43]. However, some few examples have been reported [42][43][44].
The aim of this study was to perform marker discovery using a transcriptomic approach, in order to identify and characterize SNP and InDel markers associated with berry size in table grape. For this we used a 'Ruby seedless' x 'Sultanina' (RxS) progeny (N = 139), a population containing segregants with contrasting phenotypes for berry size. Among the hundreds of markers initially identified, 30 SNPs and eight InDels were selected and experimentally validated in (RxS) segregants, in table grape varieties and in representative genotypes of a V. vinifera core collection.

Results
Global identification of SNPs and InDels related to berry size in the V. vinifera genome A global search of single nucleotide polymorphisms (SNPs) and insertion/deletion (InDels) related to berry size was performed, using available data from an RNA-Seq experiment based on 14 samples collected at fruit setting and 6-8 mm berry diameter stages, as previously described [20] (Additional file 1: Table S1).
The SNP/InDel calling was performed following the best practices recommended by the GATK software [45] and previously aligned to sequences of the Vitis vinifera reference genome using Tophat [46] (Additional file 2: Figure S1). A total of 771,118 SNPs and 100,205 short InDels were identified. Of the InDels, 55,989 corresponded to insertions and 44,216 to deletions (Fig. 1). The SNPs were rather homogeneously distributed along V. vinifera chromosomes (chr); the largest number was found in chr 18 (57,443) and the lowest in chr 17 (26,455), with an average of 37,709 SNPs per chromosome (Additional file 3: Figure S2A). Many of them were located in intronic regions (35.7%) of the reference genome, followed by exonic regions (14.7%), intergenic (14.3%), downstream (16.3%) and upstream (10.3%) (Additional file 4: Figure S3). There were 7.2% nonsynonymous coding and 8.6% intragenic SNPs. The substitution rate between non-synonymus and synonymous was 0.98 in exonic regions. A total of 488,517 transitions and 284,341 transversions were observed, corresponding to 63 and 37%, respectively, with a transition/transversion ratio (Ts/Tv) of 1.7. The largest number of InDels was also found in chr 18 (7809) and the lowest on chr 10 (3251), with an average of 5095 InDels per chromosome (Additional file 3: Figure S2B). InDels were located frequently in intron regions (34.4%), following by downstream (16%), intergenic (11.6%), upstream (10.4%), and 8.5% were found in exon regions (Additional file 5: Figure S4). SNP and InDel density varied among chromosomes; however, the highest density was observed for both polymorphisms in chr 8 (2.15 SNPs per Kb, 0.32 InDels per Kb), while the lowest density for both was observed in chr 9 (1.32 SNPs per Kb, 0.18 InDels per Kb) (Additional file 6: Figure S5).
In silico selection of SNPs and InDels associated to berry size In order to select putative polymorphisms associated to berry size, we defined a set of criteria that candidate makers should fulfill (See methods section). Missing genotype data (e.g., the absence of reads for the identified polymorphisms) represented the most significant filter, corresponding to 84.5% in the case of SNPs and 77% in the case of InDels (Additional file 2: Figure S1). After this filtering, 41,034 SNPs and 7446 InDels were obtained. The remaining SNPs and InDels were selected using a fixation index (F st ) of 1 [47], which implies that the allele frequency for each SNP or InDel of the same phenotypic class in 'large berry' (LB) segregants differed from those observed in 'small berry' (SB) segregants. F ST analysis selected 382 SNPs and 16 InDels associated to berry size. Putative SNPs were mainly distributed in five chromosomes: 17 (37%), 9 (24%), 15 (12%), 6 (9%) and 8 (7%). The putative InDels were located mainly in chr 6 and 9 (25% each) and chr 11 (12.5%).
Functional anntotation was performed using snpEff software [48], according to the V. vinifera reference genome (PN40024) (Additional file 2: Figure S1). In addition, a functional enrichment analysis (Gene Ontology) was performed to determine main over-represented biological processes in the 232 genes where the 382 SNPs were located. We used the ShinyGO platform [49,50] (Additional file 7: Figure S6). Our results showed that GO categories for biosynthetic process, response to stimulus, developmental process, localization and methylation were over-represented in the candidate gene set (p < 0.01) (Additional file 7: Figure S6).
Subsequently, 177 putative SNPs were selected due to their effects on CDS, 84 of them as 'non-synonymous coding' and 93 as intragenics (snpEff software, see methods). InDels were analyzed for frameshift, intragenics, intronics, as well as 'splice site donor' and 'splice site acceptor'. After the depth coverage analysis, defined as the total number of reads overlapping a given genomic position [51], 68 SNPs and 12 InDels were selected to design specific primers to perform their validation using High Resolution Melting Analysis (qPCR-HRM).
Therefore, there were experimental steps that caused the reduction of the markers to be evaluated, including sequences where primers and PCR products were not possible to be obtained.
In summary, primers were obtained for 45 SNPs and 12 InDels. SNP/Indel primers were discarded considering: 1) no amplification products were observed, or 2) ambiguous or not discriminative HRM/melting curves were obtained. Subsequently, sets of 34 and 10 primers were selected, respectively, for SNPs and InDels to be confirmed by Sanger sequencing. From them, 30 SNPs and eight InDelS were confirmed, representing a validation rate of 88% (30/34) and 80% (8/10), respectively.

Experimental validation of SNP and InDel candidates associated with berry size by qPCR-HRM
To confirm and validate SNPs and InDels associated with berry size by qPCR-HRM, primers were designed for 30 SNPs and eight InDels, respectively (Additional file 8: Table S2, and Additional file 9: Table S3). SNP-calling quality indexes varied between 3899 and 110,535. Primer pairs amplifying fragments among 79 to 183 bp were selected, optimizing HRM and melting curves in order to have confident results.
Then polymorphisms were evaluated in a set of six seedless RxS segregants with contrasting phenotypes for berry weight, plus the parents. An equal number of transitions and transversions were validated; the most frequent changes were T-> C/C-> T and A-> Segregation patterns of parentals were 'abxaa' and 'abxab', indicating in the first case that each parental grouped with small berry (SB) or large berry segregant (LB) groups, according with [20]; in the case of the latter, both parents were heterozygous for the polymorphism and grouped with the SB or LB segregant sets.
This group of 38 markers was linked to 32 genes, 27 genes harboring SNPs and eight genes showing InDels (Tables 1 and 2). Interestingly, two genes presented SNPs and InDels in their sequences, one coding for a probable serine incorporator (Vitvi09g01251, two SNPs/ one InDel), and a gene coding for chloroplastic pheophorbide oxygenase (Vitvi06g00063, one SNP/one InDel); the latter gene has been related to plant senescence [52].
The group of 32 genes presenting polymorphisms associated to berry weight is a diverse group (Tables 1 and  2), including genes coding for histone deacetylase 6 (Vit-vi17g00894), reported as a chromatin remodelling which contributes to transcriptional repression [53]. In addition, five genes coding for proteins with unknown function, three for transcription factors corresponding to the Myb family, a BTF3 homolog 4 and repressor MYB5 were identified. None of these genes showed significantly different expression when comparing large and small berry genotypes at different berry development times [20]. GO categories biosynthetic process and localization were over-represented in this set of genes (p < 0.05) (Additional file 10: Figure S7).

Genotype-phenotype association analysis of SNPs/InDels for berry weight
The results were analyzed using a global linear model (GLM) [54], to determine if there is a statistical association of phenotypic data regarding berry weight with the information provided by the candidate markers. Individuals from the RxS progeny showed significant associations for SNP and InDel markers located in chr 6, 8, 15 and 17 with berry fresh weight (BFW) (p < 0.05) (Table 3a). A group of six candidate SNPs located in chromosome 17 encompassing a region of 6.8 Mbp exhibited high association with berry weight (Table 3a).
Significant associations between SNP and InDel markers and berry quality traits including bunch weight (BWe), berry length (BL), berry width (BWi) and berry volume (BV) were determined (Table 3b, c, d, and e). Markers significantly associated with BWe were found only in chr 17 (Table 3b). Markers significantly associated to BL and BWi were found in chr 6, 8, 15 and 17 (BL) and in chr 6, 9, 15 and 17 for BWi (Table 3c and d). Markers significantly associated with BV were detected in chr 6, 14 and 17 (Table 4e).
The previous results are concordant with the findings using Random Forest (RF) analysis (Additional file 11: Figure S8) and unbalanced ANOVA (Additional file 12: Table S4), which support complementary information of markers with association to BFW as well as estimating their possible effects on this trait. According to the RF regression results, nine candidate SNPs including seven SNPs located on chr 17, one on chr 14 and one on chr 3, plus one InDel on chr 6 (TSRNAINDELS120073669) were significantly associated to berry weight using 1000 permutations (p < 0.05) (Additional file 11: Fig. S8). In fact, six SNPs located in chr 17 ( Fig. 2) were the best candidate markers for BFW (p < 0.001) (Additional file 11: Fig. S8). The marker contributions to BFW trait estimated by the unbalanced ANOVA suggest that they ranged from 1.4 to 33.0% (Additional file 12: Table S4), showing a non-additive participation.
Subsequently, SNP and InDel markers were evaluated to determine their informativeness in a set of 41 varieties with different genetic backgrounds (Fig. 3), including 21 table grapes varieties representing the most planted in the Chilean vineyard and other countries (Additional file 13: Table S5), and a core collection of 20 varieties harboring a large proportion of the genetic diversity for the Prole 'Orientalis' of V. vinifera (Additional file 14: Table S6).
Results obtained using a GLM model showed several degrees of association of these markers with BFW (10 to 30%) (Table 4a). In fact, SNPs and InDels significantly associated to BFW (p < 0.05) were identified on chr 3, 6, 8 and 15 (Table 4a). Significant associations between SNP and InDel markers and berry length (BL) and berry width (BWi) were also identified (p < 0.05) (Table 4b and  Association results for BL and BWi traits varied from 11 to 25% and 11 to 32%, corresponding respectively, to the proportion of explained variation. TSRNASNPS120468689 and TSRNASNPS120206839 markers, located in chr 3 and 15, respectively, showed significant associations with the three evaluated traits in at least one season.

Integration of differentially expressed genes, SNP/InDel polymorphisms and QTLs for berry weight
Physical coordinates were compared between markers and candidate genes to evaluate the co-localization of  Table S7. Figure 4 shows the co-localization between polymorphisms and DE genes. DE genes significantly correlated with component 1 of the principal component analysis (PCA-1), which explains the largest phenotypical variance between large (LB) and small berry (SB) segregants in fruit setting and berry 6-8 mm diameter stages, are shown in orange and cyan, indicating over-expression in SB and LB, respectively.
Four SNPs and 25 DE genes were found in chr 8, one of them significantly correlated with PCA-1 (− 0.96), coding for a putative mitochondrial 2-oxoglutarate/malate carrier (Vit-vi08g01801) suggesting over-expression in LB segregants. Eight SNPs plus three InDels and 25 DE genes were found in chr 9, two of them coding for 18.6 kDa class III heat shock protein (HSP) (Vitvi09g00045) and anthranilate Nbenzoyltransferase protein 2 (Vitvi09g01229), both negatively correlated with PCA-1 (− 0.95 and − 0.96) and overexpressed in LB segregants. One SNP and 28 DE genes were identified in chr 14, two of them coding for a copper chaperone (Vitvi14g00270) and a probable galacturonosyltransferase 13 (Vitvi14g02028), both significatively correlated with PCA-1 (− 0.97 and 0.96,), over-expressed respectively in LB and SB segregants. Five SNPs and one InDel, as well as 15 DE genes were identified in chr 15, one of them coding for one O-acyltransferase WSD1 (Vitvi15g00960) highly correlated with PCA-1 (1), indicating its over-expression in SB segregants. Seven SNPs and 25 DE genes were found in chr 17, seven of them significantly correlated with PCA-1. Two Table 1 Summary of 30 SNPs with association to berry fresh weight (BFW), derived from transcriptomic approach based on seedless segregants exhibiting contrasting phenotypes for berry weight. All of them were validated in RxS segregants and grapevine varieties as well (Continued)   (Fig. 4).

Discussion
Identification of SNPs and InDels comparing segregants with contrasting phenotypes for berry size Single Nucleotide Polymorphisms (SNPs) and insertion/ deletions (InDels) are the most abundant polymorphisms in every species studied, including plant genomes [55,56]; both are ubiquitous in genomes, and although less frequent than SNPs, InDels exhibit greater diversity in size [57][58][59]. The massive utilization of SNP markers in genetic studies in plants is based on their stability, availability, discriminative power and reproducibility [18,[60][61][62][63].
Our approach was based on the comparison of the transcriptomes of contrasting individuals (segregants of RxS cross) in early stages of berry development (Additional file 1: Table S1). These stages correspond to the first part of the double sigmoid growing curve characteristic for grape berries. Both stages are coincident with  Table 3 Genotype-phenotype association analysis results including markers that showed significant associations with berry fresh weight (BFW) (A), bunch weight (BWe) (B), berry length (BL) (C), berry width (BWi) (D) and berry volume (BV) (E), as determined in a set 13 RxS segregants with contrasting phenotypes for berry size. P-values of associations and variance explained by each SNP/InDel marker (R 2 ) are indicated for the GLM models obtained for seasons 2010, 2011 and 2012. Statistically significant associations are highlighted with an asterisk (p < 0.05) A.
The first approach in this study was directed to the study of SNPs located in DE genes inferred from transcriptomic analyses that compared seedless segregants with contrasting berry sizes [20]. However, these SNPs showed a rather modest association with berry size. Due to this, in a second approach all polymorphisms initially identified were included in the analysis regardless of their genomic location, following the pipeline for filtering and selection presented in Additional file 2: Figure   S1. The observed distribution of SNPs between V. vinifera chromosomes was similar to that previous reported by [30], the largest number in chr 18 (57,443) and the lowest in chr 17 (26,455) (Additional file 3: Figure S2A).
Considering the results of the analysis of the transcriptomic data, it is relevant to discuss the identification of SNPs (14.3%) and InDels (11.6%) in non-coding regions such as intergenic sequences, some of which could be regulatory regions. The most plausible explanation for this finding could derive from the fact that the grapevine annotation, as in other plant genomes, has been basically an automated process, and because of that possibly harboring some errors. The detection of polymorphisms in Table 4 Genotype-phenotype association analysis results including markers that showed significant associations with berry fresh weight (BFW) (A), berry length (BL) (B) and berry width (BWi) (C) determined in a set of 41 grape varieties including table grape varieties (21) as well as core collection varieties (20), harboring a large proportion of the genetic diversity for Prole 'Orientalis' of Vitis vinifera. P-values of associations and variance explained by each SNP/InDel marker (R 2 ) are indicated for the GLM models obtained for seasons 2016, 2017 and 2018. Statistically significant associations are in bold and highlighted with an asterisk (p < 0.05) A. intergenic regions is concordant with findings previously reported using similar RNA-seq approaches [43,69,70]. These results might imply that some coding regions or even complete genes have not been considered in the current annotations [69]. In fact, several efforts have been developed to curate the Vitis vinifera gene model annotation [71], and RNA-Seq data represent a valuable tool that could be useful to confirm gene model. Furthermore, evidence obtained studying the Sultanina genome [28], allowed the identification of many transcripts that did not match to the ones deduced from the reference genome, PN-40024. This inconsistency in part could be explained by the presence of larger than expected transcripts in 'Sultanina'. Therefore, in using the reference genome for the description of our transcripts, it is possible that some of these "intergenic" sequences are indeed part of genic or regulatory sequences, being in fact "false intergenic" sequences. In any case, considering the difficulties inherent to the annotation of gene functions or determine the functional impact of polymorphisms located at non-coding or regulatory regions, SNPs located in coding regions were selected for further experimental evaluations [43]. Interestingly, SNPs and InDels were also found within intronic regions accordingly with our RNA-Seq data. Even though transcriptomic protocols specifically enhance mature mRNA by the selection of polyadenylated RNA or reduce the presence of ribosomal RNAs [72], the presence of polymorphisms at introns could be explained by pre-mRNAs retention [70,72], at least in part. Moreover, our results are in agreement with previously reported studies in Populus nigra were 15% of SNPs were located at intronic regions, using a similar approach to analyze transcriptomic data for polymorphisms discovery [70]. In fact, one of the InDels described here, located at an intronic region (TSRNAIND ELS120073761), was experimentally confirmed (Table 2).

BFW
By using a series of filtering approaches, we identify 38 polimorphisms for berry size, 30 SNPs and eight InDels (Tables 1 and 2). They were biallelic and with a MAF of at least 20%. Twenty-one percent of the SNPs were located in exons, with non-synonymous and intragenic changes representing 7.2 and 8.6%, respectively. These results agreed with data reported in maize using Hap-Map2, in which 21% of the SNPs were present in intragenic regions. This evidence suggests that polymorphisms in coding sequences are less frequent than in non-coding regions [73]. The polymorphisms were located in eight chromosomes of V. vinifera, chr 3, 6, 8, 9, 14, 15, 17 and 19 (Fig. 2). Interestingly, 82% of these 38 polymorphisms are in only four chromosomes (6, 9, 15 and 17). In addition, co-localizations of SNPs and InDels in chr 6, 9 and 15 were verified (Fig. 2).
SNPs located in coding regions with non-synonymous and intragenic effects were preferently selected. Functional non-synonymous SNPs were selected in coding regions based on possible effects of these changes over genes function, which may result in changes in aminoacidic residues of the encoded protein, interfering with transcription or protein synthesis [73,74], as well as changes in structural properties or altered functions in proteins. Due to their significant effect, nonsynonymous coding SNPs are considered the most significant drivers of expression regulation, and therefore ideal markers for association studies of quantitative complex traits [73,75]. However, they exhibit a lower frequency than silent substitutions [73,74]. Using a similar strategy in rice ca. 200 genes containing nonsynonymous SNPs related to Rhizoctonia solani were identified [73], and non-synonymous SNPs associated to drought tolerance in maize were also reported [76]. The selection criteria for InDels was less strict due to the much lower number of this type of polymorphism.

Evaluation of genotype-phenotype association of SNPs/ InDels markers and berry size
The platform we selected for the validation SNP and InDel markers was PCR coupled with High Resolution Melting (qPCR-HRM) analysis, a post-PCR detection methodology. It was chosen because of its high sensitivity, speed, accuracy and feasibility for the analysis of double-stranded PCR products dissociation, such as SNPs, InDels [28,66,77,78] and SSRs [79]. The amplification conditions were suited for each SNP/InDel marker (Additional file 8: Table S2, and Additional file 9: Table S3), obtaining reliable HRM and melting curves for segregants and varieties analysed.
Genotype-phenotype association was performed using a subset of 13 segregants of RxS cross, including six segregants with contrasting phenotypes for berry weight which were sequenced, and seven additional segregants Fig. 4 Co-localization of SNP/InDel candidates for berry size and DE genes between large and small berry segregants in V. vinifera chromosomes. Thirty-eight candidate polymorphisms for berry weight were identified using a search of structural variants based on a transcriptome experiment [20]. SNPs and InDels are represented in blue and red, respectively. A subset of candidate genes associated with berry weight was also co-  (Table 3a). We observed complementary information from GLM [54], RF and unbalanced ANOVA which confirmed the 10 most important polymorphisms associated to berry weight and estimated their possible effects on this trait. Six of the seven candidate SNPs identified in chr 17, within a region that encompasses 6.8 Mbp, presented a genotype-phenotype association up to 50%; one InDel described in chr 6 (TSRNAINDELS120073669) was significatively relevant for berry weight. The four SNPs identified in chr 8 covering a region of approximately 1.8 Mbp, they presented a genotype-phenotype association up to 50%, but less stable than other markers.
Estimates of the polymorphism contribution in berry weight using unbalanced ANOVA suggested that they ranged between 1.4 to 33% (Additional file 12: Table S4), along with the possible polymorphic linkage associated to specific regions in V. vinifera chromosomes. These results could imply that their contribution in berry weight is not additive, something that could be explained considering that ANOVA is an univariant analysis, evaluating one single marker per test, and that possible allelic interactions between markers, such as epistasis, were not considered in the model. Altogether, these results suggest that there is a good chance for both groups of markers, SNPs and InDels, to be used as a substrate to develop selection markers applicable in table grape breeding programs. SNP and InDel markers significantly associated to other quality traits were also identified, including BWe, BL, BWi and BV (Table 3b, c, d, and e), which might suggest a possible interplay between genetic determinants of grape berry-related traits [30].
The multiple evidence observed considering distribution and association of polymorphisms to berry size suggested the presence of regions probably linked to this trait, which explained different levels of the phenotypic variance for berry weight, based on the similar associations observed between polymorphisms, behaving as blocks or QTLs, probably derived from low recombination events.
In relation to the informativeness of SNP and InDel markers on berry weight trait in grapes varieties, our results emphasize the importance of a wide validation using bi-parental populations as well as varieties with different genetic backgrounds. Analysis performed in biparental populations are frequently population-specific and need to be tested in wider genetic backgrounds [80].
Significant associations between markers identified on chr 3, 6, 8 and 15 and BFW were determined using a GLM model (10 to 30%) (p-value< 0.05) (Table 4a). In addition, our results also showed SNP and InDel markers significantly associated to grape berry-related traits such as BL and BWi (Table 4b and c). The next step will be focused in the evaluation of this set of markers in a broad group of table grapes varieties, in order to select the best combination of them as selection tools for BFW.
Integration of transcriptome analysis, regions containing candidates SNPs/InDels and QTLs described for berry size SNP and InDel markers identified in this study were allocated in eight chromosomes of V. vinifera (Fig. 2) and associated which genes involved in complex metabolic pathways. These results are concordant with overrepresented GO categories identified in these set of genes such as biosynthetic and developmental processes, response to stimulus, transport and localization as well as chromatine remodeling (Additional file 7: Figure S6, and Additional file 10: Figure S7).
The co-localization of regions containing SNP and/or InDel markers with previously reported QTLs for berry weight were analyzed. Significant QTLs have been reported in nine linkage groups. For example, QTLs for berry weight located in LG8, LG11 and LG17 were reported [18], explaining 7 to 31% of phenotypic variance. Our study identified possible co-localizations in three LGs: LG8 and LG17 [18] and LG15 [23].
The physical distance between the QTL described in LG8 by [18], identified in the SxG population and the region where four candidate SNPs for berry weight were located, corresponding to approximately 0.8 Mbp. Five SNPs and one InDel were identified for the QTL associated with berry weight in LG15 [23]. However, the physical coordinates of this QTL were not available in order to estimate the distance between them.
The main QTL located in LG17 was identified in two populations. The first was in seedless table grapes (MTP3140), which included cv. 'Sultanina' in one parental's pedigree, while the second corresponded to the cross of the wine varieties 'Syrah' × 'Grenache' (SxG). The estimated distance between the QTL interval in LG17 and the region of 6.8 Mbp where the group of six SNPs with highest genotype-phenotype association was 1.9 to 2.1 Mbp, depending on the population analyzed [18]. The identification of this QTL in wine (100% seeded) and table grapes (partially seedless) populations might suggest that the association with berry weight is independent of differential major characteristics such as the presence/absence of seeds. The identification of these QTLs in the same genomic regions for wine and table grape genetic backgrounds strongly suggests that this is a key genetic factor related to berry size.
Thus, the multiple evidence presented in this study confirm previously reported QTLs associated with berry size in LG15 [23], LG8, LG11 and LG17 [18], reinforcing the idea that several genes distributed in different genomic regions (LG) are probably supporting the final expression of this trait.
However, neither SNP nor InDel markers associated with BFW were identified in this study for the QTLs described in LG1 and LG12 [21], LG5 and LG13 [22], and LG18 [24,34], although the last is related to a major QTL controlling seed content [34]. A plausible explanation could be based in the selection of seedless segregants with contrasting phenotypes for BFW, which might mask the effect of QTL located at LG18.
Significant associations between SNPs and BFW were recently reported in chr 1 related to candidate gene NAC26 [27] and in chr 17, 18 and 19 [30]. Considering the polygenic control of BFW, it has been proposed that different causal polymorphisms might segregate among populations [18,30].
The observed distribution of SNPs and InDels identified in this study allowed us to find four new regions associated with berry weight in chr 3, 6, 9 and 14. The 38 SNP and InDel markers associated with berry size were distributed in 32 genes; none of which was reported as differentially expressed (DE) between segregants with contrasting phenotypes for berry size in the previous work of our research group [20]. The co-localization of DE genes derived from transcriptome analysis and regions where polymorphisms were identificated allowed us to find at least 190 DE genes to be considered as potentially related to berry size (Fig. 4). Twenty-one of them are particularly interesting due to their relationship with PCA-1 and showing a higher discrimination value between large and small seedless segregants. This group of genes included functional categories associated to transcription regulation, cell wall modification, transport of metal ions, water and organic acids, response to biotic/abiotic stress, protein degradation and protein-kinase activation [20]. Therefore, the colocalization among DE genes and the regions containing SNPs and InDels might support the observed association between candidate polymorphisms and berry size, specially in the case of the 68 DE genes involved in berry size determination [20].
For instance, the single SNP reported at chr 3 was associated to a predicted Ca2 + −dependent phospholipidbinding protein (Table 1) possibly related to annexins which have a relevant role in signalling and adaptation as key factors in cold, oxidative, saline, and abscisic acid (ABA) stress responses [81]. Four DE genes significantly correlated with PCA-1 were also identified in chr 3, coding for a F-box At2g16365 protein, a putative secretory protein, an AMP-activated protein kinase gamma regulatory subunit and a pathogenesis-related protein 5 (Additional file 15: Table S7); the first and third are possibly involved in protein degradation/proteasome and protein modification/kinase, respectively, and the last in stress/ defense responses [20]. Chr 6 presented four InDels and three SNPs, colocalizing a gene coding for GCN5-related Nacetyltransferase (GNAT) considered as a possible transcription activator associated with chromatin assembly and DNA replication [82] as well as expansin-A8 possibly associated with cell expansion (Fig. 4). Four SNPs were identified in chr 8, co-localizing putative mitochondrial oxoglutarate/malate transporter protein 2, also upregulated in large berry segregants, possibly associated with vacuole transport and cell turgor in plants [83] (Fig.  4; Additional file 15: Table S7).
Of the eight SNPs and three InDels identified in chr 9, two genes were up-regulated in large berry segregants coding for heat shock protein 18.6 kDa class III and anthranilate N-benzoyltransferase 2 protein, both related to stress and defense response, co-localized with the polymorphism. Genes coding for WAX2 protein and a putative boron transporter 2, respectively associated to drought in plants [84] and cell wall integrity [85,86] are also located in the same genomic region (Fig. 4).
Five SNPs and one InDel found in chr 15 where related to genes coding for a beta-fructofuranosidase (invertase), a phosphatidate cytidylyltransferase, a scarecrow-like protein 6, a transketolase 10 and a serine/threonine-protein kinase AtPK2/AtPK19, possibly involved in signaling, activation and biosynthetic process. Seven out of 21 DE genes found in chr 17 were related to berry size determination [20] including expansin-A15, up-regulated in large berry segregants which could be finally responsible for the large berry phenotype and also between SNP variants described in this region and berry size ( Fig. 4; Additional file 15: Table S7). SNPs markers and the QTL on LG17 colocalize with a candidate domestication locus of 5 Mbp described by [10] in the study of the genetic structure of V. vinifera, supporting the idea of a positive selection of allele(s) for larger fruit, such as the SNPs described here, during the domestication process. The putative orthologue in V. vinifera of the gene coding for cytochrome P450 78A, a gene that partially controls fruit weight in tomato and has been associated with its domestication [87], also co-localized in LG17 with the major QTL for berry weight described by [18]. This may suggest conservation of the mechanism of fruit weight control among physiologically and phylogenetically distant species.
Our results are in line with the hypothesis that variations in coding regions as well as in the promotor of the genes are equally important in the determination of complex traits [73]. Equivalent results were observed in maize, where the analysis of quantitative traits showed that 79% of the variations might be explained by SNPs and InDels located in the genes or up to 5 Kb upstream of them [76].
Finally, using an integrated analysis based on RNA-Seq, SNP/InDel search and validation on table grape segregants and grape varieties with different genetic backgrounds, we identified a set of informative and transferable SNP and InDel markers associated with berry size. These results are of interest to achieve better understanding of a complex trait like berry size and could facilitate successful transferability of molecular markers suitable to be integrated in marker-assisted grape breeding programs.

Conclusions
We identified a set of informative and transferable SNP and InDel markers associated with berry size, using a combined analysis based on RNA-Seq, SNP/InDel search and validation on table grape segregants and grape varieties with different genetic backgrounds. Our results suggest the suitability of SNPs and InDels as candidate markers for berry weight in seedless table grapes. The identification of regions possibly associated with berry weight in chromosomes 8, 15 and 17 was achieved with supporting evidence derived from a transcriptome experiment focused on a SNP/InDel search, as well as a QTL-linkage mapping approach. In addition, new regions possibly associated with berry weight were identified in chromosomes 3, 6, 9 and 14.

Plant material
The population derived of 'Ruby seedless' x 'Sultanina' cross (RxS, n = 139) is planted at the La Platina Experimental Station of the Instituto de Investigaciones Agropecuarias (INIA-La Platina), located in Santiago, Chile; vines are conducted through a trellis system ('Spanish parron'), grafted over cv. 'Sultanina', with two to four replicates being managed under standard conditions as previously described by [20]. This germplasm belongs to the table grape breeding program of INIA. It is available for experimental purposes and, as such, has been used in a number of previous studies [15,20,24,34]. In this case, it has been phenotyped during the 2010 to 2012 seasons for berry size and related subtraits, as well as for a number of traits such as sugar content and titratable acidity, seed weight, rachis architecture and GAresponsiveness, among others.
To evaluate the informativeness of the SNP and InDel markers, a set of 41 varieties with different genetic backgrounds was genotyped, including 21 table grape varieties which include the most planted in Chilean vineyards and other countries (Additional file 13: Table  S5) and 20 varieties harboring a large proportion of the genetic diversity for Prole 'Orientalis' of V. vinifera (Additional file 14: Table S6). This germplasm is established at INIA-La Platina, and is managed by the Genetic Resources Group; it has been phenotyped during several seasons (for instance, from 2016 to 2018) for a number of traits such as berry and seed fresh weight, berry length and berry width, among others (Fig. 3). Vines are conducted through trellis system ('double cordon'), with three to five replicates per genotype. Grapevine varieties are managed under standard conditions for watering, fertilization, pest and disease control and pruning. All field experiments followed institutional guidelines as well as local legislation.

Experimental design and sample collection
Both parents plus six segregants of the RxS cross (N = 139) with contrasting phenotypes for berry size and weight (Fisher test, p < 0.05), i.e., small berry (SB) and large berry (LB), all seedless, were selected and sampled for transcriptome analysis according to [20] (Additional file 1: Table S1). Berry samples were collected at fruit setting (FST) and berries of 6-8 mm diameter (B68), also corresponding to 30 and 45 days after flowering (phenological stages of E-L 27 and E-L 31 [88,89]). Each genotype was sampled in two or three replicates (clones), later considered as biological replicates.

RNA extraction, sequencing and data analysis
Pericarp and mesocarp tissues were homogenized and analyzed together for the RNA-Seq experiments, as previously described by [20]. Total RNA was isolated from 3 to 4 g of frozen tissue using the modified hot borate method [90]. RNA integrity was determined using a BioAnalyzer prior to sequencing, selecting samples with RNA Integrity Number (RIN) values over 7.0. Sample sequencing was performed as described [20]. A total of 14 samples of V. vinifera were obtained from a group of six segregants and parents from the RxS cross (Additional file 1: Table S1). This subset was part of an experiment previously described [20]. Segregants sharing the same phenotype (SB or LB) were considered as biological replicates in sequencing experiments. Approximately 10 million single-end reads were obtained per sequenced library, with an average length of 50 bp. Quality trimming (Q20) and alignment to the grapevine reference genome (PN40024 12X.v1) [91] was done as described [20]. Multiple reads with more than 20 hits were discarded. The reference grapevine genome and the gene annotation were downloaded from the GENOSCOPE database [92]. Subsequently, the gene annotation was updated using VCost.v3 annotation [93,94]. The RNA-Seq data used in this study is available at the NCBI's Sequence Read Achieve [95] (http://www.ncbi.nlm.nih.gov/sra) with SRA Study accession number SRX366617 [96].

SNP and InDel calling and annotation
A global search of SNP and InDel polymorphisms was developed using reads aligments and the Tophat mapper, with previous removal of PCR-duplicates using remove_duplicates tool from Picard Tools [97,98] (Additional file 2: Figure S1). The Structural Variant Calling of Genome Analysis Toolkit (GATK) [46] and the Haplotype caller tool were used to identify both SNPs and InDels; for the latter, re-alignments were performed in the regions analyzed (Additional file 2: Figure S1). Polymorphisms are named as 'TSRNASNP' or 'TSRNAI NDEL' using consecutive numbers. Polymorphisms were subsequently filtered using Variant Call Format (VCF 4.0) tools [47]. The following criteria were used to select putative polymorphisms associated with berry size: i) selection of biallelic polymorphisms; ii) no missing data; iii) a minimun separation of 100 bp ('thin'); and and iv) a MAF (Minor Allele Frequency) of 0.2, previously evaluated at four levels of 0.1, 0.2, 0.3 and 0.4 (Additional file 2: Figure S1). Also, a fixation index (Fst) of 1 or 0 was also applied among libraries of small berry (SB) and large berry segregants (LB), considering the expected genotypical class for each SNP or InDel. For example, the same genotypic class for one SNP or InDel predicted for the three SB indivuals should be different than the genotypic class in LB individuals.
The functional annotation of selected polymorphisms was performed using the Variant Annotation and Effect prediction tool SnpEff software [48,99], according to the gene model based on the Vitis vinifera genome reference (PN40024 12X.1) (Additional file 2: Figure S1). Intragenic and non-synonymous coding SNPs were selected.

Gene ontology analysis
A gene ontology (GO) enrichment analysis was performed ShinyGO v0.61 tool [49,50]. The frequency of query genes was compared with the complete reference genome for V. vinifera (PN40024), searching for possible enrichment in biological processes. Enrichment analyses were based on a hypergeometric distribution followed by FDR correction. Significant GO terms (p < 0.05) were reported.

Primer design
Specific primers were designed to validate SNPs and InDels using PRIMER 3 software [103] with the following parameters: amplicon length among 80 to 160 bp, primers length between 18 and 23 bp, and annealing temperature (Tm) range within 58°C-62°C. Primer dimer formation as well as primer complementarity were checked in silico using Operon software [104]. Primers were synthetized by Integrated DNA Technologies (IDT) (Coralville, Iowa, US). Additional file 8: Table S2 and Additional file 9: Table S3 summarized the primers used in High-Resolution Melting analysis (qPCR-HRM).

DNA extraction
Total DNA was extracted from young leaves according with [28]. The purified DNA was treated with DNAasefree RNAase A and measured as described [28]. DNA samples with concentrations higher or equal to 40 ng/uL were used in the analyses.
Experimental validation of SNPs/InDels associated to berry size using High-Resolution Melting analysis (qPCR-HRM) The validation of SNPs was developed according to the protocol of [28], with modifications. Reactions were optimized for each pair of primers considering annealing temperature (58°C, 60°C and 62°C) and primer concentration (0.2 μM, 0.4 μM, and 0.6 μM) (Additional file 8: Table S2). For InDels, the conditions reported by [77,78] were optimized considering annealing temperature (58°C, 60°C and 62°C) and primer concentration (0.3 μM, 0.5 μM and 0.7 μM) (Additional file 9: Table  S3). Real time PCR and HRM reactions were done and scored as described by [28]. Reactions were based on EvaGreen®, using a primer concentration that ranged from 0.2 μM to 1.1 μM, plus 1 ng of DNA template in a total reaction volume of 10 μL. Three replicates were used per individual. Genotypes were manually assigned by examining melting and HRM plots previously normalized and derivatized.
Polymorphisms were confirmed by sequencing amplification products (Macrogen Inc., Korea), and sequenced samples were used as controls. The qPCR-HRM amplicons of varieties 'Sultanina' and 'Ruby seedless' as well as RxS segregants were quantified using the method described by [28]; samples were sequenced only if DNA concentration was higher or equal to 20 ng/uL. SNP/InDel amplified fragments were aligned to the reference grapevine genome using Sequencher software (Gene Codes Corporation).
Genotype-phenotype association analysis between SNPs/ InDels and berry weight using random Forest analysis, GLM model, unbalanced variance analysis (ANOVA) Three methods were used to determine significant associations between SNP/InDel markers and berry quality traits: GLM based on TASSEL, Random Forest and unbalanced variance analysis (ANOVA), using a subset of 13 RxS segregants. Phenotypic data were normalized using logarithmic transformation to fulfill the normality assumption in the statistical analyses [27].
Genotype-phenotype association analysis were performed using SNP/InDel markers for berry fresh weight (BFW), bunch weight (BWe), berry length (BL), berry width (BWi) and berry volume (BV), in seasons 2010, 2011 and 2012, using the Global Linear Model (GLM) implemented in software TASSEL v.5.0 [54].. The group of 13 seedless segregants from RxS cross and both parentals were genotyped with using the 38 SNP and eight InDel markers. The group of segregants included the six segregants previously sequenced as well as seven segregants of the RxS cross, all of them phenotyped during three consecutive seasons. GLM finds the ordinary least squares solution for each marker-trait combination [105,106]. Variance explained and p-values for each SNP/InDel marker were recorded. Subsequently, a Random Forest (RF) analysis as well as unbalanced ANOVA were developed in order to determine significant associations between SNP/InDel markers and BFW. RF corresponds to a multivariate analysis mainly used for statistical classification and regression. An RF model is an ensemble of unpruned classification or regression trees created using bootstrap samples of the training data and random feature selection in tree induction [107]. The importance of the variable was estimated for SNPs and InDels markers with association to berry fresh weight, which can be interpreted according to the relative ranking of the significant predictors [108] based on the RF regression of the phenotypic values observed in the group of 13 seedless segregants with contrasting phenotypes for berry weight and the genotypical data. The analysis was performed using 1000 permutations test, the 'randomForest'package of R statistical software and a p-value of 0.05.
Then a one-way variance analysis (ANOVA) was performed using unbalanced data and the group of 38 candidate markers in order to determine their individual effect (R 2 ) on berry weight according with the proposed model, using the stats package of R (R 3.6.2) [109]. The coefficient of determination of the model was performed using R and considered the sum of mean squares, where yij is the observation i of berry fresh weight phenotype according to the genotype of j for marker x; μ is the overall mean; xj represents genotype j of the marker x; and ε is the error term.
Estimation of the effect of the SNP and InDel markers associated with berry weight using GLM model and a set of 41 grapes varieties Genotype-phenotype association analyses were performed using phenotype and genotype data derived from a set of 41 varieties and SNP/InDel markers for berry fresh weight (BFW), berry length (BL) and berry width (BWi), in seasons 2016, 2017 and 2018, using the Global Linear Model (GLM) implemented in software TASSEL v.5.0 [54]. For each marker-trait combination, GLM finds the ordinary least squares solution as described in [105]. Variance explained and p-values for each SNP/InDel marker were recorded.