Genome-Wide Association Mapping of Correlated Traits in Cassava: Dry Matter and Total Carotenoid Content

Cassava is a starchy root crop cultivated in the tropics for fresh consumption and commercial processing. Primary selection objectives in cassava breeding include dry matter content and micronutrient density, particularly provitamin A carotenoids. These traits are negatively correlated in African germplasm. This study aimed to identify genetic markers associated with these traits and to determine whether linkage, pleiotropy, or both were responsible for the observed negative correlation. Genome-wide association mapping was performed using 672 clones genotyped at 72,279 single nucleotide polymorphism (SNP) loci. Root yellowness was used as an indirect measure of variation in carotenoid content. Two major loci for root yellowness were identified on chromosome 1, at positions 24.1 and 30.5 Mbp. A single locus for dry matter content, colocated with the 24.1 Mbp peak for carotenoids, was identified. Haplotypes at these loci explained 70% and 37% of the phenotypic variability for root yellowness and dry matter content, respectively. Evidence of megabase-scale linkage disequilibrium (LD) around the major loci for the two traits, together with detection of the major dry matter locus in independent analyses of the white- and yellow-root subpopulations, suggests that physical linkage rather than pleiotropy is the more likely cause of the negative correlation between the target traits. Moreover, candidate genes for carotenoid biosynthesis (phytoene synthase) and starch biosynthesis (UDP-glucose pyrophosphorylase and sucrose synthase) occur in the vicinity of the locus identified at 24.1 Mbp. These findings elucidate the genetic architecture of carotenoids and dry matter in cassava and provide an opportunity to accelerate breeding for these traits.


INTRODUCTION
Balding, 2009). To control for these confounding factors, three standard GWAS models were compared: a simple one-way ANOVA model with no correction (naïve model); a general linear model (GLM) with the first five principal components (PCs) of the SNP matrix as covariates (GLM + 5PCs); and a mixed linear model (MLM) using the five PCs and a marker-estimated kinship matrix (Yu et al., 2006). The model correcting for both kinship and the five PCs had the lowest inflation factor, as determined from quantile-quantile (QQ) plots, and therefore the best control of false positives (Supplementary Figure 1). The association analyses were implemented in TASSEL (Bradbury et al., 2007; Zhang et al., 2010). Association test P-values were considered significant when more extreme than the Bonferroni threshold (experiment-wise type I error rate of 0.05).
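
As an illustration of the two quantities used above, the Python sketch below computes a Bonferroni cutoff and a median-based genomic inflation factor (λ) from a list of P-values. It is a minimal sketch, not TASSEL's implementation: it assumes all 72,279 SNPs count as tests (the number actually tested after filtering may differ) and uses the common definition of λ as the median 1-df χ² statistic implied by the P-values divided by its null expectation (about 0.455). The function names are hypothetical.

```python
import math
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Expected median of a 1-df chi-square statistic under the null: z(0.75)^2, about 0.455.
_CHI2_1DF_NULL_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P-value cutoff that keeps the experiment-wise type I error at alpha."""
    return alpha / n_tests


def genomic_inflation(p_values: list[float]) -> float:
    """Median-based inflation factor lambda from two-sided P-values in (0, 1].

    Each P-value is converted to the 1-df chi-square statistic it implies,
    chi2 = z^2 with z = Phi^-1(p / 2); lambda is the observed median of these
    statistics divided by the null median. Values near 1 mean little inflation.
    """
    chi2 = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / _CHI2_1DF_NULL_MEDIAN


if __name__ == "__main__":
    cutoff = bonferroni_threshold(0.05, 72_279)  # assumes all 72,279 SNPs were tested
    print(f"P < {cutoff:.2e}  (-log10 P > {-math.log10(cutoff):.2f})")
    # Evenly spaced (null-like) P-values should give lambda close to 1.
    n = 10_001
    null_p = [(i - 0.5) / n for i in range(1, n + 1)]
    print(f"lambda under the null: {genomic_inflation(null_p):.3f}")
```

For 72,279 tests the cutoff is roughly 6.9e-7 (a -log10 P of about 6.16). A λ close to 1 corresponds to a QQ plot that follows the diagonal except in the extreme tail.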

The patterns and extent of linkage disequilibrium (LD) in a population not only determine the achievable resolution of association mapping (Hamblin et al., 2011) but also have strong implications for the interpretation of association peaks. Therefore, the rate of LD decay and the local patterns of LD along each chromosome were determined by calculating intrachromosomal pairwise squared correlations (r²) using PLINK (Purcell et al., 2007).

Phenotypic variability: We investigated the phenotypic variation in dry matter content, as well as the carotenoid-based intensity of yellow root color, which was measured using a visual color chart and a chromameter. Dry matter content varied widely in the Genetic Gain population, ranging from 8.4% to 45% (average 28.6%, Table 1). About two-thirds of the evaluated clones had white storage roots, while the remainder showed a range of yellow color, suggesting varying levels of carotenoid content. On average, the visual score was 1.7, ranging from 1 (white) to 7 (yellow). The average chromameter measure of yellow color intensity (b* value) was 20.8, ranging from 11.1 (white) to 40.8 (yellow). Dry matter was approximately normally

Exploration of the LD landscape along chromosome 1 uncovered a megabase-scale region of low recombination extending from 22 to 33 Mbp and surrounding the association peaks for dry matter content and yellow color (Figure 7). This region was recently shown to coincide with a large Manihot glaziovii introgression segment (Bredeson et al., 2016) that traces back to early breeding for resistance to cassava mosaic and cassava brown streak viruses in the 1930s (Hahn et al., 1980). Clustering of the Genetic Gain population based on identity-by-descent (IBD) relationships (i.e., a measure of how many alleles at a given marker in a pair of samples came from the same ancestral chromosome), calculated using only markers from this extensive LD region (2,150 SNPs from markers S1_21567540 to S1_34950326), revealed at least two major groups of accessions (Supplementary Figure 2), indicating the presence of a few major haplotypes associated with the LD blocks.

GWAS for dry matter content in white-root and yellow-root subpopulations: If the phenotypic association between dry matter and carotenoid content, and the colocation of their association signals (~24.1 Mbp region), are largely caused by physical linkage rather than by pleiotropy, the major dry matter locus should be detectable in both white-root and yellow-root germplasm when each is analyzed independently. We therefore split the Genetic Gain dataset into white-root (n=427) and yellow-root (n=210) subpopulations and repeated the GWAS. Clones at the borderline between yellow and white roots were excluded from these analyses. To mitigate the loss of power caused by double-fitting markers in the MLM, both as a fixed effect tested for association and as a random effect through the kinship matrix (Lippert et al., 2011; Listgarten et al., 2012), the MLM analysis was carried out using a kinship matrix calculated from all markers except those on chromosome 1.
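
A kinship matrix that omits the chromosome being tested can be sketched in Python as below. The text does not say which kinship estimator was used; this sketch assumes the VanRaden (2008) genomic relationship matrix, one common marker-based choice, and a hypothetical data layout with one list of 0/1/2 dosages per SNP. Dropping the chromosome 1 markers means the SNPs tested there are no longer also absorbed into the random kinship effect.

```python
def kinship_excluding(dosages, chromosomes, excluded_chrom):
    """VanRaden-style kinship K = Z Z' / (2 * sum p(1 - p)) from markers off `excluded_chrom`.

    dosages: one list per SNP, holding that SNP's 0/1/2 dosage for each individual.
    chromosomes: chromosome label of each SNP, aligned with `dosages`.
    """
    kept = [row for row, chrom in zip(dosages, chromosomes) if chrom != excluded_chrom]
    if not kept:
        raise ValueError("no markers left after excluding the chromosome")
    n_ind = len(kept[0])
    kinship = [[0.0] * n_ind for _ in range(n_ind)]
    scale = 0.0
    for row in kept:
        p = sum(row) / (2 * n_ind)       # frequency of the counted allele
        scale += 2 * p * (1 - p)         # expected heterozygosity of this SNP
        z = [x - 2 * p for x in row]     # centre dosages on their expectation 2p
        for i in range(n_ind):
            for j in range(n_ind):
                kinship[i][j] += z[i] * z[j]
    if scale == 0:
        raise ValueError("all remaining markers are monomorphic")
    return [[value / scale for value in row] for row in kinship]
```

The result is symmetric, and markers on the excluded chromosome have no influence on it, which is the property the analysis above relies on.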

We recovered the major dry matter content association signal in both the white-root and yellow-root subpopulations (Figure 8). Although it coincided with the locus identified in the population-wide GWAS, the association signal in the white-root subpopulation was much broader, extending from 24 to 33 Mbp and largely overlapping the broad LD region on chromosome 1 (Figure 8). In contrast, the association signal in the yellow-root subpopulation was relatively narrow. A survey of the underlying LD pattern in the same chromosomal region for the yellow-root subpopulation revealed a site of recombination.

approximately 1:1 ratio for white versus light-yellow roots, suggesting that the yellow-root parent was heterozygous for the functional allele at the PSY2 locus. Kizito et al. (2007) reported a QTL for dry matter content, in a biparental population genotyped using SSR markers, that also corresponds to this region of chromosome 1. More recently, Esuma et al. (2016) reported that a single genomic region on chromosome 1 underlies the variation in total carotenoid content in eight S1 and S2 partially inbred families. This peak, around 24.66 Mbp, is close to our first locus, tagged by SNP S1_24121306. However, that study did not examine the genetic architecture of dry matter content.
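
The inference from a 1:1 segregation ratio rests on simple Mendelian expectations: if one parent is heterozygous at a single locus and the other is homozygous for the alternative allele, half of the progeny are expected to receive each parental allele. A χ² goodness-of-fit test checks whether observed counts are consistent with that ratio. The minimal Python sketch below assumes a single locus; the function name and the counts in the example are hypothetical, not data from the cited study.

```python
import math


def chi_square_1to1(count_a: int, count_b: int) -> tuple[float, float]:
    """Goodness-of-fit test of two observed class counts against a 1:1 expectation.

    Returns (chi-square statistic, P-value) with 1 degree of freedom, using the
    1-df upper tail P(X > x) = erfc(sqrt(x / 2)).
    """
    expected = (count_a + count_b) / 2
    stat = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    return stat, math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    # Hypothetical progeny counts: 52 white-root and 48 light-yellow-root clones.
    stat, p = chi_square_1to1(52, 48)
    print(f"chi2 = {stat:.2f}, P = {p:.2f}")
```

With these hypothetical counts the statistic is 0.16 and P is about 0.69, so a 1:1 ratio would not be rejected.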

While total carotenoid content in the Genetic Gain collection was not directly measured, previous studies of diverse cassava germplasm have consistently reported a strong linear relationship between yellow color and carotenoid content (Pearson's correlation coefficient, r, ranging from 0.81 to 0.89) (Iglesias et al., 1997; Chávez et al., 2005; Marín Colorado et al., 2009; Akinwale et al., 2010; Sánchez et al., 2014). Hence, the results obtained here should be useful for breeding efforts aimed at improving carotenoid content. Nevertheless, we propose to quantify total carotenoids and their constituents in a future study to corroborate the current findings.
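
Squaring the reported correlation coefficients shows how much of the variance in carotenoid content root yellowness accounted for in those studies:

```latex
r^2 = 0.81^2 \approx 0.66 \quad\text{to}\quad r^2 = 0.89^2 \approx 0.79
```

That is, yellowness captured roughly two-thirds to four-fifths of the variation in carotenoid content, leaving about 21% to 34% unexplained, which is consistent with the proposal to quantify carotenoids directly.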

Given the importance of dry matter content in cassava, and the fact that we found a single genomic region associated with this trait, further studies are warranted to fine-map the causal locus and validate its identity. Doing this effectively would require populations that lack the wild introgression segments on chromosome 1, which would reduce LD and allow higher mapping resolution. Additionally, specialized mating designs such as nested association mapping (NAM) populations (Yu et al., 2008), built from strategically selected sets of parents, would reduce the confounding effect of population structure. Given our marker density and sample size, this study is sufficiently powered to detect large-effect alleles that are common in the studied germplasm. Detecting additional QTLs of small effect will require a larger association panel genotyped at higher density.
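
The power statement can be made concrete with a standard back-of-the-envelope calculation. The Python sketch below approximates the power of a single-SNP additive test at the Bonferroni level, assuming the marker explains a fraction q of the phenotypic variance, the causal variant is genotyped, and no kinship or PC correction is applied (an MLM will generally behave somewhat differently). It uses the approximate non-centrality n*q/(1 - q) with a normal approximation to the two-sided test; the function name and the effect sizes looped over are illustrative assumptions.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def gwas_power(n: int, var_explained: float, alpha: float) -> float:
    """Approximate power of a two-sided single-marker test at per-test level alpha.

    Non-centrality n * q / (1 - q) for a marker explaining a fraction q of the
    phenotypic variance; kinship/PC correction and imperfect LD are ignored.
    """
    z_crit = -_STD_NORMAL.inv_cdf(alpha / 2)  # upper critical value, computed stably
    shift = math.sqrt(n * var_explained / (1 - var_explained))
    return (1 - _STD_NORMAL.cdf(z_crit - shift)) + _STD_NORMAL.cdf(-z_crit - shift)


if __name__ == "__main__":
    alpha = 0.05 / 72_279  # Bonferroni level, assuming all SNPs were tested
    for q in (0.01, 0.02, 0.05, 0.10):
        print(f"q={q:.2f}: n=672 -> {gwas_power(672, q, alpha):.2f}, "
              f"n=3000 -> {gwas_power(3000, q, alpha):.2f}")
```

Under these assumptions, a marker explaining 1% of the variance has well under 5% power with 672 clones at the Bonferroni level, whereas a large-effect locus is detected almost surely; this is the sense in which small-effect QTLs call for larger panels.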

The use of a broad cassava diversity panel in GWAS not only provides the foundation to map genomic regions associated with natural variation in dry matter and carotenoid content but also allows us to unravel the genetic cause of the negative correlation between these traits, that is, pleiotropy versus genetic linkage. In the context of breeding to simultaneously increase carotenoid and dry matter content, the observed negative association between these traits in our germplasm is undesirable. Several lines of evidence pointed to genetic linkage rather than pleiotropy as the likely cause of the observed association. Firstly, the genomic region harboring the QTLs for yellow color and dry matter content was found to occur in chromosomal segments that exhibit low overall

broader, suggesting that the favorable alleles were located in a non-recombining haplotype.

Thirdly, strong candidate genes for dry matter (UDP-glucose pyrophosphorylase and sucrose synthase) and carotenoid content (phytoene synthase) were found in the vicinity of the major association region (24.1 Mbp) on chromosome 1. The presence of these genes hints at possibly distinct biological causes for the observed associations with the two traits. These hypotheses need to be tested through functional genetic studies of these candidate genes.

Taken together, these findings suggest that the phenotypic correlation between dry matter and carotenoid content is mainly caused by physical linkage of the loci underlying these traits.

that are legacies of the historical breeding program in East Africa (Hahn et al., 1980; Jennings, 1994). From these results, we expect the mapping resolution to vary widely across the cassava genome, depending mainly on whether a locus of interest lies inside or outside the large LD blocks.

This study represents significant progress toward dissecting the genetic architecture of two key breeding-target traits in cassava. The major loci associated with variation in carotenoid content and the single locus associated with dry matter content provide markers that should be useful for marker-assisted selection for these traits. Although the results of the present study suggest that genetic linkage is more likely to be responsible for the negative correlation between the studied traits, further investigation is needed to confirm or reject this

Figure 1. Distribution of phenotypes for TCHART, dry matter content, and chromameter (b*).