Regional Heritability Mapping Provides Insights into Dry Matter Content in African White and Yellow Cassava Populations

The HarvestPlus program for cassava (Manihot esculenta Crantz) fortifies cassava with β-carotene by breeding for carotene-rich tubers (yellow cassava). However, a negative correlation between yellowness and dry matter (DM) content has been identified. We investigated the genetic control of DM in white and yellow cassava. We used regional heritability mapping (RHM) to associate DM with genomic segments in both subpopulations. Significant segments were subjected to candidate gene analysis and candidates were validated with prediction accuracies. The RHM procedure was validated via a simulation approach and revealed significant hits for white cassava on chromosomes 1, 4, 5, 10, 17, and 18, whereas hits for the yellow were on chromosome 1. Candidate gene analysis revealed genes in the carbohydrate biosynthesis pathway including plant serine–threonine protein kinases (SnRKs), UDP (uridine diphosphate)-glycosyltransferases, UDP-sugar transporters, invertases, pectinases, and regulons. Validation using 1252 unique identifiers from the SnRK gene family genome-wide recovered 50% of the predictive accuracy of whole-genome single nucleotide polymorphisms for DM, whereas validation using 53 likely genes (extracted from the literature) from significant segments recovered 32%. Genes including an acid invertase, a neutral or alkaline invertase, and a glucose-6-phosphate isomerase were validated on the basis of an a priori list for the cassava starch pathway, and also a fructose-biphosphate aldolase from the Calvin cycle pathway. The power of the RHM procedure was estimated as 47% when the causal quantitative trait loci generated 10% of the phenotypic variance (sample size = 451). Cassava DM genetics are complex and RHM may be useful for complex traits.

and planted mostly inclined on ridged soils (Keating et al., 1988). Botanical seeds are used mainly in breeding programs with up to three seeds produced per pod (Iglesias et al., 1994, Iglesias andHershey, 1991). Storage roots are generally harvested 7 to 24 mo after planting (El-Sharkawy, 2003). Dry matter is the major product from cassava roots apart from moisture and traces of water-soluble vitamins and pigments (Holleman and Aten, 1956;Barrios and Bressani, 1967;Lim, 1968). On average, cassava DM is made up of about 90% carbohydrates (mainly starch), 2% protein, 1% fat, 3% minerals and ash, and 4% fiber (Holleman and Aten, 1956;Barrios and Bressani, 1967;Lim, 1968). This starch deposit makes cassava attractive for the food industry and other industries that rely heavily on starch as their primary raw material (Lim, 1968). The value of cassava derives from a combination of fresh root yield and the percentage of DM that can be extracted from fresh roots, referred to as dry yield. Fresh cassava roots with a high DM content are also preferred by local farmers and processors (Kawano et al., 1987;Safo-Kantanka and Owusu-Nipah, 1992;Enidiok et al., 2008), who transform cassava roots into valuable staples consumed by many in developing countries. With 263 million metric tons produced in 2012 (Food and Agriculture Organization of the United Nations, 2013), cassava has become an indispensable staple in the world and improvement of cassava for high dry yield is needed. This improvement should also endeavor to increase micronutrient content, as it is much needed in the cassava-consuming regions of the world. Biofortification is a successful genetic improvement technique for increasing micronutrient content in staple crops (Meenakshi et al., 2010;Bouis et al., 2011) and represents a promising approach for solving the problem of micronutrient malnutrition around the world (Meenakshi et al., 2007(Meenakshi et al., , 2010Pfeiffer and McClafferty, 2007).
The target of biofortification is to increase the content of essential micronutrients such as iron, zinc, and Vitamin A (Meenakshi et al., 2007(Meenakshi et al., , 2010Pfeiffer and McClafferty, 2007), hence improving the health of millions of people who depend on these staples for daily nutrition. The biofortification process is facilitated by plant breeding (Meenakshi et al., 2010;Bouis et al., 2011). Since the early 2000s, the HarvestPlus initiative (Meenakshi et al., 2007;Pfeiffer and McClafferty, 2007) has been tasked with biofortification of staple crops including cassava, sweet potato [Ipomoea batatas (L.) Lam.], maize (Zea mays L.), rice (Oryza sativa L.), and wheat (Triticum aestivum L.). Biofortification of cassava is geared toward breeding varieties containing increased levels of Provitamin A, or β-carotene, a precursor for Vitamin A. The so-called 'yellow cassava' (Liu et al., 2010;HarvestPlus, 2009;Aniedu and Omodamiro, 2012;La Frano et al., 2013) is designed to address public health issues including child mortality, impaired vision and night blindness, reduced immunity to diseases, and other consequences of vitamin A deficiency (Liu et al., 2010;HarvestPlus, 2009).
Breeding for the required levels of Provitamin A necessitates the accumulation of β-carotene in cassava roots (Aniedu and Omodamiro, 2012;La Frano et al., 2013). Many breeding programs use yellow flesh color as a proxy for measuring the β-carotene levels in cassava despite the fact that yellowness is more of an indication of total carotenoids in the root (Chávez et al., 2005;Ssemakula et al., 2007;Akinwale et al., 2010). This protocol is used to visually preselect lines containing β-carotenoids prior to quantification of different carotenoid levels with high-performance liquid chromatography protocols (Kimura et al., 2007;Adewusi and Bradbury, 1993). Breeding for farmer-preferred biofortified cassava involves the development of high yielding clones with high DM and high β-carotene accumulation in a single clone or variety (Ceballos et al., 2004;Raji et al., 2007). Incorporating all these characteristics in a single variety of cassava makes for a challenging breeding task. Some studies have shown that there is a negative genetic correlation between DM and yellow root flesh color in cassava, making this breeding task even more challenging, since the target is toward full adoption of Provitamin A varieties by local farmers and processors (Akinwale et al., 2010;Vimala et al., 2009). It is therefore useful to understand the genetic control of DM content and β-carotene accumulation in cassava to facilitate the breeding of farmer-preferred varieties.
Regional heritability mapping is a relatively new procedure for identifying loci affecting quantitative traits (Nagamine et al., 2012;Riggio and Pong-Wong, 2014;Riggio et al., 2013;Shirali et al., 2015). Unlike singlemarker genome-wide association analysis (GWAS) methods, which the lack power to detect rare genetic variants (Bodmer and Tomlinson, 2010;Gibson, 2012;Wood et al., 2014), RHM can capture both rare and common genetic variants, giving it more power to identify loci that cannot be detected by standard GWAS (Nagamine et al., 2012;Riggio and Pong-Wong, 2014;Riggio et al., 2013). Regional heritability mapping has been shown to detect both common and rare genetic variants implicated in disease traits in human genomics (Shirali et al., 2015;Uemoto et al., 2013;Zeng et al., 2016) and recently in tree genomics (Resende et al., 2017). Regional heritability mapping is a suitable method for capturing the effect of a genomic block or segment, since it can identify genomic segment-trait associations for regions spanning multiple loci (Nagamine et al., 2012;Riggio and Pong-Wong, 2014;Riggio et al., 2013;Caballero et al., 2015). A multimarker mapping approach like RHM may identify both common and rare variants involved in the expression of DM in white and yellow subpopulations of African cassava. To the best of our knowledge, this is the first attempt to use the RHM procedure in an annual crop.
The objectives of this study were:

Cassava Phenotypic Data for Discovery
We used phenotypic data collected from the Genetic Gain (GG) population trials conducted by the cassava breeding program at the IITA, Ibadan, Nigeria for our analysis. The GG population (713 clones) is an elite population bred from the 1970s to 2007 by the cassava breeding program at the IITA Okechukwu and Dixon, 2008;Ly et al., 2013). Most GG clones are of African origin with such good performance that they were advanced to multienvironment uniform yield trials. For this study, we used clonal evaluation trials of the GG population planted in an augmented design. The clonal evaluation trials use an unreplicated incomplete block design consisting of a layout of between 18 and 30 blocks with 22 accessions and two checks in each block. Accession plots were a single row (1 by 1m spacing) of five-plant stands without borders. All checks were included in the analysis. A few trials were replicated twice. These trials were conducted in three locations in Nigeria: Ibadan (7.40°N, 3.90°E), Mokwa (9.3°N, 5.0°E), and Ubiaja (6.66°N, 6.38°E) between 2013 and 2015. Three core agronomic traits were measured for these trials, including the fresh weight of harvested roots expressed in tons per ha (t ha -1 ) (fresh root yield, FYLD); the percentage of DM of storage roots, which measures root dry weight as the percentage of the root fresh weight; and pulp color a binary trait rated on a scale from 1 (white to light cream root flesh) to 2 (deep cream to yellow root flesh). The DM trait was measured via the oven method: 100 g of grated root sample (with thorough mixing of 10-15 randomly selected roots from a plot) were collected per accession and oven-dried. Dry matter content was then measured as residual weight after oven drying. We further divided the GG population (713 clones) into two subpopulations of white (451 clones) and yellow (262 clones) cassava with the pulp color trait where clones with a score of 1 for this trait were grouped into the white population and those with score 2 into the yellow population.

Cassava Phenotypic Data Used for Validation
To validate the results from the RHM analysis, we used data from a population called the GS-C1, which consisted of the progeny of clones from the GG population described above. Phenotypes from the GS-C1 were obtained from clonal evaluation trials of 1651 clones split into trials at three locations: Ibadan, Mokwa, and Ikenne (6°52ʹN, 3°43ʹE). These trials were planted with an augmented design consisting of 20 to 30 blocks with 22 to 24 clones and two checks in each block. Plots were a single row of five-plant stands (1 by 1m spacing) without borders and without replication, and trials were planted during 2014 and 2015. Cassava trait measurements for this population were as described earlier, except that no strict distinction between yellow and white flesh color was used because the GS-C1 plants were mostly white and cream clones; thus we performed validation analysis with all clones.

Cassava Genotype Data
DNA was extracted with the DNeasy Plant Mini Kits (Qiagen, Germantown, MD) from 713 clones from the 2013 Genetic Gain trial at IITA and was quantified with PicoGreen (Thermo Fisher Scientific, Waltham, MA). Genotyping-by-sequencing was used for genotyping (Elshire et al., 2011) these clones. Six 95-plex and one 75-plex ApeKI libraries were constructed and sequenced on an Illumina HiSeq (Illumina, San Diego, CA) , with one lane per library. Single nucleotide polymorphisms were called from the sequence data using the TASSEL pipeline Version 4.0 (Glaubitz et al., 2012) with an alignment to the M. esculenta Version 6 reference genome (Goodstein et al., 2012). The marker data were converted to a dosage format (0, 1, 2) and missing genotypic data were imputed with Beagle software (Ayres et al., 2011). The final dataset consisted of 177,201 SNPs scored in 713 clones. Members of the GS-C1 population used in the validation analysis were genotyped in 2014 as described above. Single nucleotide polymorphisms from both populations were called together with the TASSEL pipeline (Glaubitz et al., 2012) and missing genotypes also imputed with Beagle (Ayres et al., 2011), yielding the same number of SNPs as above.

Data Analysis
Genome-wide RHM Regional heritability mapping was performed using the following procedure (Nagamine et al., 2012;Riggio and Pong-Wong, 2014;Riggio et al., 2013 where y is a response variable (DM), X is a known incidence matrix for fixed effects β (including grand mean and a nested effect of trial within year within location), Z is a known incidence matrix for the clonal additive genomic effects u 1 for the target genomic segment and u 2 for the whole-genome SNPs. K u1 and K u2 are the genomic relationship matrices calculated from the SNPs with the procedure of VanRaden (2008) as: where G is the genomic relationship matrix; M is a centered marker matrix coded as -1,0,1; and p is the major allele frequency vector. Other components of the model include the genomic variance for the target genomic segment ( 1 2 u s ) and the total genomic variance for the whole genome ( 2 2 u s ); 2 e s is the genomic error variance and e indicates the residuals from the model. Model 1 was fitted using the R EMMREML package (Akdemir and Okeke, 2014). Note that the K u2 genomic relationship matrix serves to control statistically for population structure effects, like the kinship matrix does in standard GWAS. c. Following model fitting in Step (b) above, the genomic heritability for each target genomic segment was computed as follows: where h 2 is genomic heritability for a target genomic segment and the variance components are as described above.  (Akdemir and Okeke, 2014). P-values were obtained with the pchisq function in R (R Development Core Team, 2016). e. Local false discovery rate (LFDR) was estimated with the R qvalue package (Storey and Tibshirani, 2003;Storey et al., 2015).
f. Genomic segment LFDRs were then plotted across the genome in a Manhattan plot with a cutoff of 0.05 being used to assess significance. We performed the RHM procedure separately for the white and yellow cassava subpopulations of GG. No defined population structure was found on in the GG population in a previous GWAS study (Wolfe et al., 2016). Therefore, the genomic relationship matrix from the whole-genome SNPs in the RHM was sufficient to account for structure in this analysis (in fact, we refer to this more as background effect).

Candidate Gene Analysis
We identified candidate genes from the significant hits of the RHM analysis based on annotations for the Version 6 M. esculenta genome on phytozome (Goodstein et al., 2012). We used plant physiology information to narrow down the list of genes associated with carbohydrate biosynthesis, including genes that are functional in starch and sugar biosynthesis, cell wall loosening and degradation, and root sink and plant growth pathways. We performed validation tests on candidates selected on the basis of their prediction accuracies in the GS-C1 population as described below.

Validation Models and Procedures
We conducted validation analyses for the significant hits from the RHM analysis and for the RHM procedure itself. Validation here was geared toward understanding the prediction accuracies obtained from genes and gene families on significant RHM segments. Validation proceeded as described below.

Validation with SnRK Genes (a Candidate Gene Family)
To obtain genotypic data for this analysis, we searched the Phytozome M. esculenta Version 6.1 web portal (Goodstein et al., 2012) using the keyword 'serine threonine kinases' to recover all instances of this in the cassava genome, resulting in 2408 hits. We filtered the resulting list to remove all hits not containing gene ontology or eukaryotic orthologous group function definitions for the keyword 'serine threonine kinase'. We then manually added genes containing known serine threonine kinases that did not contain a function definition, such as the sucrose nonfermenting 1 gene (a list of these genes is provided in the Supplemental Table S1). We extracted all markers within 2.5 kb of the start and end of each gene model with the Bedtools intersect function (Quinlan and Hall, 2010), resulting in 7203 unique SNPs. We refer to these SNPs as candidate SNPs below. For validation of these candidate SNPs on the GS-C1 data, we fitted the following model ( where y is a vector of the raw phenotypic values for DM, X is the known incidence matrix for fixed effects β (including grand mean and a nested effect of trial within year within location), Z is known incidence matrix for clonal additive candidate genomic effects s and whole-genomic effects g. For K s and K g , we used the candidate SNPs and the remaining SNPs from the whole genome excluding the candidate SNPs, respectively, to generate genomic relationship matrices for the 1651 clones of the GS-C1 population as above. A third kinship matrix, K rand , was generated as a control from 7203 SNPs anchored to 2000 randomly selected genes from the cassava genome and used in Model 2 in place of K s , whereas we calculated K g by using SNPs from the whole genome excluding those in K rand . Other components of the model include the SnRK candidates' genetic variance ( 2 s s ) and the genetic variance from other parts of the genome ( 2 g s ), 2 e s is the error variance, and e represents the residuals from the model. Model 2 was fitted with the EMMREML package. To assess prediction accuracies, we fitted another model as follows (Model 3, Eq. [17]-Eq. [21]): where most components of Model 3 remain same as those in Model 2, apart from the genetic effect u having an identity matrix I as its covariance matrix, signifying that the 1651 GS-C1 validation clones were unrelated. Model 3 was also fitted with the R EMMREML package. Model 2 was fitted using a fivefold a cross-validation scheme with 10 repeats, and prediction accuracies were obtained for this cross-validation scheme by a correlation of ŝ of each clone from Model 2 to its û value from Model 3.

Validation with 53 Candidate Genes Extracted from Plant Physiology Literature and 53 Randomly Selected Genes from the Significant RHM Regions
We performed a second procedure to validate the 53 candidate genes identified in significant hit regions in the RHM analysis based on plant physiology literature (Table  1). Using the cassava genome's unique gene identifiers from Phytozome (Goodstein et al., 2012), we extracted all markers within 2.5 kb flanking the start and end of each gene as before, resulting in 400 unique SNPs. We refer to these SNPs as 'likely candidate SNPs'. We also picked 53 single-copy genes at random from within the significant RHM regions and anchored them to 395 SNPs as controls for the likely candidate SNPs. We term these the 'unlikely candidate SNPs'. To validate these, we also fitted the Genomic Best Linear Unbiased Prediction Model 2 with the following modifications: (i) for K s , we used K 53, which was a genomic relationship matrix calculated from the 400 likely candidate SNPs for the 1651 clones of the GS-C1 population (as above); (ii) we calculated K g with SNPs from the whole genome excluding the likely candidate SNPs; (iii) K rand was also calculated as above (as a control) from 402 SNPs anchored to 53 randomly selected genes from the cassava genome (with 7.5 kb flanking the start and end of these genes); (iv) K unlikely was calculated from the 395 unlikely candidate SNPs. These were also used in place of K s in Model 2 with their appropriate K g being calculated as other SNPs in the genome excluding those in K rand and K unlikely . Other components of the model were as described for Model 2 and prediction accuracies were obtained in the same way.
To assess the prediction accuracy of the whole-genome SNPs, we also fitted a model analogous to Model 3 with the covariance of u coming from a genomic relationship matrix with whole-genome SNPs. We term this the predictive accuracy of the whole-genome SNPs.

Validation With All Genes within 1 Mb of the Significant RHM List and an a priori List of Starch Genes in Cassava
We performed another validation procedure to provide a validation for all the genes identified in the significant hit regions in the RHM analysis, including those shown in Table 1 and those not shown because they were not selected on the basis of the information from the literature. By using the cassava genome's unique gene identifiers from Phytozome (Goodstein et al., 2012), we extracted all SNPs within a 1-Mb region centered on each of these candidates with Bedtools (http://bedtools.readthedocs.io/en/latest/, accessed 20 Nov. 2017), resulting in 2297 SNPs from 650 unique genes. We refer to these SNPs as the RHM-region SNPs. In addition, we extracted the SNPs anchored to 123 unique genes in the cassava starch pathway compiled by Saithong et al. (2013), resulting in 419 SNPs. We refer to these SNPs as cassava starch SNPs. To validate these SNPs, we fitted Model 2 with genomic relationship matrices calculated as above from the RHM-region and cassava starch SNPs, in place of K s , with their appropriate K g calculated from remaining SNPs. We also picked 650 single-copy genes at random, excluding the significant RHM regions, and anchored them to approximately 2300 SNPs as controls for the RHM-region and cassava starch SNPs. We refer to these as Random-650 SNPs. We calculated K random-650 from these SNPs and an appropriate K g . These kernels were also fitted in Model 2 as K s and K g , respectively. In addition to the prediction accuracies from these candidates, we validated genes in the RHM-region set by searching for them in two a priori lists compiled by Saithong et al. (2013), including one for the cassava starch pathway and another for the Calvin cycle pathway. The RHM-region genes that made this list were considered to be validated.  Assessing the RHM's Power via the "Hide a Causal SNP" Procedure To validate the RHM procedure, we performed an analysis similar to the classical "hide a causal SNP" approach as follows: a. Chromosomes were divided into 100 SNP segments in sliding windows with 50 SNPs overlapping between adjacent segments.
b. Five adjacent segments were randomly selected on each chromosome.
c. On the third segment, effects were added to a random SNP to inflate the phenotypic variance of the DM trait by 10%.
d. Genomic relationship matrices were made for these segments but for Segment 3, the random pseudocausal SNP was excluded during calculation of the genomic relationship matrix.
e. Subsequently, Steps (b) to (d) of the RHM procedure above were performed, resulting in P-values for these five adjacent segments. Steps (a) to (e) were repeated twelve times, resulting in 216 tests.
f. We then calculated the P-value from the RHM analysis on our data that corresponded to the LFDR threshold of 0.05 and used this as the significance threshold.
g. The power of the RHM analysis was then calculated as the number of times any of the five segment P-values were significant, given the significance threshold from Step (f) above.
h. To make a decision on the bounds set for extracting adjacent candidate genes from the M. esculenta genome for a significant segment in the RHM analysis, the number of times either the first or fifth segment's P-values were significant, conditional on the third segment having a higher P-value, was also calculated. This reflected how far away adjacent segments captured causal variants.

Regional Heritability Mapping for DM in White and Yellow Cassava Populations
The genomic heritabilities for DM in white and yellow cassava, based on whole-genome SNPs, were 0.57 and 0.48, respectively. These heritabilities are somewhat higher than those found by Ly et al. (2013), presumably because they worked with more locations and years, and thus experienced higher genotype × environment interactions. We observed different genetic control patterns for DM in the white and yellow cassava subpopulations, as shown by the Manhattan plots from the RHM analysis (Fig. 1). Significant genomic segments for the white cassava DM were observed on chromosomes 1, 4, 5, 10, 17, and 18; for the yellow cassava, a significant segment was only observed on chromosome 1 (Fig. 1). Because of the difference between the sample sizes of both subpopulations, it is unclear if the DM genetic control patterns between these subpopulations were different. A nonsignificant but strong signal was also observed on chromosome 9 of both cassava subpopulations.

Candidate Gene Analysis
By using information from the estimates of the mean linkage disequilibrium between genomic segments per chromosome ( Fig. 2A), the distribution of the length of genomic segments in our analysis (Fig. 2B), and information on the number of times adjacent segments captured causal variations in the simulation analysis, we set the bound for the region where candidate genes were sought to 1.0 Mb (500 kb flanking each hit), representing from two to three genomic segments adjacent to the top hit genomic segment.

Candidates for the White and Yellow Cassava Subpopulations
For the top RHM hits in both cassava gene pools, we identified possible candidate genes and transcriptional regulators adjacent to these hits based on their involvement in the carbohydrate biosynthesis pathway, including members of the SnRKs family, members of the UDP-glycosyltransferase family (including starch and sucrose synthases), and UDPsugar transporters; specific plant transcriptional factors including members of the β helix-loop-helix (bHLH) family and mini-zinc fingers; and other genes involved in cell wall processes, root storage, and development including pectinases and β-vacuolar processing enzymes. We show a list of these genes in Table 1. An additional candidate gene, phosphofructokinase, was associated with the nonsignificant peak on chromosome 9, which was more pronounced in the yellow cassava germplasm.

Validation Results for SnRKs
The predictive accuracy of the whole-genome SNPs was 0.54 (SD of the cross-validation repeat cycle = 0.03). With the set of candidate SnRK SNPs, the prediction accuracies from the cross-validation with Model 2 were 0.26 (SD of the cross-validation repeat cycle = 0.04) and 0.12 (SD of the cross-validation repeat cycle = 0.06) for the candidate and random SNPs, respectively. The predictive ability of the genome-wide SnRK candidates (7203 SNPs) had ~50% of the total prediction accuracy from our set of genomewide SNPs (177,201) for the GS-C1 population.

Validation With 53 Likely Candidate Genes Extracted from Plant Physiology Literature and 53 Unlikely Candidate Genes from the Significant RHM Regions
With the likely candidate SNPs from the genes identified for all the top hitting genomic segments genome-wide (shown in Table 1), prediction accuracies from the cross-validation with a modified Model 2 were 0.17 (SD of the cross-validation repeat cycle = 0.03), those for the 53 unlikely genes randomly selected from the top hitting genomic segments genome-wide were 0.14 (SD of the cross-validation repeat cycle = 0.02) and those for the SNPs from 53 random genes from the cassava genome were 0.06 (SD of the cross-validation repeat cycle = 0.08).

Validation With all Genes within 1 Mb of the Significant RHM List and an a priori List of Starch Genes in Cassava
Using the RHM-region, cassava starch, and Random-650 SNPs, the prediction accuracies from the cross-validation with a modified Model 2 were 0.17 (SD = 0.04), 0.18 (SD = 0.03), and 0.03 (SD = 0.01), respectively. Based on two a priori lists compiled by Saithong et al. (2013), including one for the cassava starch pathway and another for the Calvin cycle pathway, we found three RHM-region genes on the cassava starch pathway list, including an acid invertase (Manes.01G076500), a glucose-6-phosphate isomerase (Manes.18G060600), and a neutral or alkaline invertase (Manes.04G006900). However, in the Calvin cycle pathway list, we found one RHM-region gene, namely fructose-biphosphate aldolase (Manes.04G007900). These genes are known to play key roles in starch biosynthesis and storage (Junker, 2004;Ap Rees, 1992;Appeldoorn et al., 1997;Renz et al., 1993).
To assess if these genes were significantly enriched in the RHM regions, we performed a simple calculation by multiplying the 650 genes in the RHM region by the 123 genes in the cassava starch pathway (Saithong et al., 2013) and divided them by the total number of genes in the cassava genome (33,030). The result was 2.4, which is the expectation of a Poisson process of obtaining the genes in the cassava starch pathway. However, we calculated the probability of drawing three cassava starch pathway genes from the genome at random, resulting in P = 0.22, indicating no significant enrichment.

Assessing the RHM's Power via the "Hide a Causal SNP" procedure
We calculated the statistical power of the RHM procedure to detect simulated causal effects from 216 analyses as the number of times any of the five segments' P-values were significant. The P-value from the RHM analysis on our data that corresponded to the LFDR threshold of 0.05 was 0.00024, which became our significance threshold for this analysis. We found that 102 tests were significant out of a total of 216, representing 47% statistical power to detect the simulated causal region. To set the bounds for how far in the genome to cover when extracting candidate genes from a significant RHM segment, we also calculated the number of times the P-values from the first or fifth genomic segments were significant, conditional on the third segment's P-value being higher. With a total of 216 analyses, 27 cases had significant P-values on Segment 3 and 15 cases had significant P-values from Segments 1 or 5 when the P-values from Segment 3 were higher. This represents 15% coverage further away from the causal segment. With this information, we chose an adjacent span of 500,000 kb pairs flanking a significant RHM segment as the bounds for extracting adjacent candidate genes. A summary of the prediction accuracies from validated candidates are shown in Table 2 DISCUSSION The RHM results in the high DM and white cassava populations clearly demonstrate the polygenic nature of the DM trait. Dry matter is composed of carbohydrates (mostly starch), cell wall components, and fiber, as well as other nonstarchy polysaccharides. Thus it is not surprising that this trait is complex and controlled by many genes. In addition, the RHM procedure in this study showed a 47% power for detecting association with a sample size of less than 500, given the polygenic nature of this trait.

Serine-Threonine Protein Kinases may be Involved in the Regulation of Cassava Carbohydrate Biosynthesis
The SnRK gene family in plants is homologous to the sucrose nonfermenting 1 protein kinase family in yeast and the adenosine monophosphate-activated protein kinase gene family in mammals. Its members have gained recognition as critical elements in transcriptional, metabolic, and developmental regulation in plants Halford and Hardie, 1998;Polge and Thomas, 2007;Xue-Fei et al., 2012;Crozet et al., 2014;Jossier et al., 2009). The most studied member of this family is SnRK1 (Halford and Hardie, 1998;Polge and Thomas, 2007). Serine-threonine protein kinases play a vital role as global regulators of C metabolism and mediate cross-talk between metabolic and other plant signaling pathways (Halford and Hardie, 1998;Polge and Thomas, 2007;Xue-Fei et al., 2012). SnRK1 was shown to play a key role in seed filling and maturation and in embryo development in pea (Pisum sativum L.) (Radchuk et al., 2006(Radchuk et al., , 2010. In potato (Solanum tuberosum L.) and wheat, SnRK1 phosphorylates and inactivates key enzymes in the sugar and starch biosynthesis pathway, affecting sucrose synthase, trehalose phosphate synthase, and α-amylase (Purcell et al., 1998;Laurie et al., 2003). In potato, it stimulates the redox activation of adenosine 5'-diphosphate (ADP)glucose pyrophosphorylase in response to high sucrose levels (Geigenberger, 2003;Tiessen et al., 2003). Antisense expression of SnRK1 resulted in a reduction in the expression of sucrose synthase in potato tubers (Purcell et al., 1998) and α-amylase in cultured wheat embryos . However, the overexpression of SnRK1 in potatoes resulted in a significant increase in starch accumulation in tubers and a decrease in glucose levels resulting from a dramatic increase in the activity and expression levels of sucrose synthase and ADP-glucose pyrophosphorylase (McKibbin et al., 2006). SnRK1 is activated by high cellular sucrose or low glucose, or a dark period (Rolland et al., 2002). The model of sugar and starch biosynthesis in potato from McKibbin et al. (2006) showed SnRK1 to be at the heart of these processes. By using RHM analysis in the white cassava population, we identified significant genomic segments containing some of the proteins or enzymes in the model given in this illustration (McKibbin et al., 2006) including SnRKs, UDP-glycosyltransferases and UDP-sugar transporters, an ADP-type starch synthase 2, and a neutral invertase. Glycosyltransferases are a family of enzymes involved in carbohydrate biosynthesis of which sucrose and starch synthases are members (Momma and Fujimoto, 2012). With the RHM procedure and candidate gene analysis, several of these known carbohydrate biosynthesis enzymes (Table 1, Fig. 3) were putatively associated with the cassava DM trait.

Other Possible Candidates that are Involved in Sugar and Starch Biosynthesis in Cassava
Other proteins located within significant genomic segments that are also involved in the carbohydrate biosynthesis pathway include invertase inhibitors, which have been shown to form complexes with SnRKs and lead to reduced accumulation of reducing sugars and increased accumulation of starch in potatoes (Lin et al., 2015), and BAK1, a brassinosteroid insensitive 1-associated receptor-like kinase and a member of the somatic embryogenesis receptor-like kinase (SERK) subfamily involved in regulation of root development (Du et al., 2012). BAK1/SERK1 positively controls starch granule accumulation in Arabidopsis thaliana (L.) Heynh. root tips (Du et al., 2012). With a transgenic sweet potato overexpressing a DNA-binding one zinc finger protein encoded by a SRF1 gene [a member of the mini-zinc finger family of plant-specific transcription factors (Takatsuji, 1998(Takatsuji, , 1999], Tanaka et al. (2009) showed that transgenic roots had significantly higher DM content in storage roots, increased starch content per unit of storage root fresh weight, and a drastic decrease in glucose and fructose levels. SRF1 was shown to modulate carbohydrate metabolism in sweet potato storage roots via negative regulation of vacuolar invertase (Tanaka et al., 2009). Several enzymes, including pectinases, pectin esterases, cellulase synthase, and galacturonosyltransferases, found in the significant RHM regions in white and yellow cassava may be involved in plant cell wall loosening and degradation which may be linked to C partitioning in cassava. In fact galacturonosyltransferase, a member of the CAZy (Cantarel et al., 2009) GT8 family of glycosyltransferases, is involved in pectin and hemicellulose biosynthesis (Cantarel et al., 2009;Atmodjo et al., 2011;de Godoy et al., 2013). Galacturonosyltransferase-silenced tomato (Solanum lycopersicum L.) fruits showed altered pectin composition and decreased starch accumulation (de Godoy et al., 2013). Cassava galacturonosyltransferases may interfere with C metabolism, partitioning, and allocation, as seen in tomato (de Godoy et al., 2013). In their expression profile study using samples from different stages of cassava root development, Yang et al. (2011) found a significant upregulation of the enzymes involved in plant cell wall loosening and degradation. The bHLH family of transcription factors is a large family in plants involved in flavonoids, the carotenoid pathway, and anthocyanin pigmentation of tuber skin and flesh (from yellow to white and purple) in potato (De Jong et al., 2004;Zhang et al., 2009;Tai et al., 2013) and may interact with sucrose transporter to perform this function (Krügel et al., 2012). Phytochrome-interacting factors form a subfamily of bHLH transcription factors and PIF1 (a member of this subfamily) has been shown to directly regulate the expression of phytoene synthase (Toledo-Ortiz et al., 2010), a major driver of carotenoid production in plants and the first and main rate-determining enzyme of the carotenoid pathway (Toledo-Ortiz et al., 2010;Maass et al., 2009). It is not clear how bHLH may link with sugar biosynthesis and transport or play a role in starch accumulation in yellow cassava clones, but this may translate to the frequently observed negative correlation between DM and yellow root flesh color in African cassava (Esuma et al., 2016;Akinwale et al., 2010). Interestingly, cassava breeders in Colombia have not found any negative correlation between carotenoids and DM in their germplasm, and, in fact, have made gains in both traits by using a rapid cycling recurrent selection scheme (Ceballos et al., 2013).

Some Experimental Studies that Reflect Possible Roles of Candidate Genes in the Cassava Tuber
By using the RHM analysis, we identified ( Fig. 3) a number of cassava genes in the heterotrophic plant cell starch or sucrose metabolism pathway (Junker, 2004). We describe a few steps in this pathway, concentrating mostly on where we have identified candidate genes (candidate genes are in parentheses henceforth with phytozome gene identifiers). After sucrose is imported into the cytosol by a sucrose transporter (Manes.05G099000, Manes.18G054200), it is converted into hexose sugars via two paths involving the enzymes sucrose synthase (shown in the center of Fig. 3) and invertase (shown to the left in Fig. 3) (Manes.04G006900, Manes.01G076500) (Junker, 2004;Ap Rees, 1992;Appeldoorn et al., 1997;Renz et al., 1993). Sucrose transport is much more pronounced in the sink tissues that switch to storage mode (Weschke et al., 2000(Weschke et al., , 2003. A transgenic study using sucrose transporter 4-RNAi mutant potato plants showed an increase in tuber yield and starch accumulation, and also induced early tuberization (Chincinska et al., 2008). It is worth noting that the cytosolic neutral invertase tends to play a larger role in sink organs than does the vacuolar acid invertase. Studies on maize null mutants of the cytosolic invertase (Mn1) had miniature seeds caused by arrested endosperm development (Miller and Chourey, 1992), whereas overexpression of Mn1 increased grain yield and starch content (Li et al., 2013). Similar studies in rice, tomato, and cotton (Gossypium hirsutum L.) have also found consistent phenotypes with cytosolic neutral invertase (Wang et al., 2008;Zanor et al., 2009;Wang and Ruan, 2012). Other studies on vacuolar invertase inhibitors showed a significant reduction of cold-induced sweetening in potato tubers (via a reduction in sucrose accumulation in tubers) by restricting the activities of vacuolar acid invertase (McKenzie et al., 2013;Brummell et al., 2011). These studies suggest the importance of sucrose unloading to sink organs; hence vacuolar acid and cytosolic invertases are targets for post-translational Key enzymes including sucrose transporter, invertase, phospho-glucose isomerase, and phosphofructokinase were within 500 kb of significant regional heritability mapping (RHM) segments. regulation toward starch storage and DM accumulation (Tang et al., 2016).
The hexoses cleaved from sucrose are rapidly phosphorylated into hexose monophosphates by hexokinase and fructokinase (Junker, 2004;Ap Rees, 1992;Appeldoorn et al., 1997;Renz et al., 1993) and they proceed to the starch biosynthesis or glycolytic pathways. As shown in the central pathway in Fig. 3, the resulting hexose monophosphates (including glucose-1-phosphate, glucose-6-phosphate and fructose-6-phosphate) are interconverted by the enzymes phosphoglucose mutase and phosphoglucose isomerase (Manes.18G060600) (Junker, 2004). Phosphoglucose isomerase connects the Calvin cycle pathway with the starch biosynthetic pathway in illuminated plant leaves (Bahaji et al., 2015). It also plays a key role in the glycolytic pathway and in the regeneration of glucose-6-phosphate in the oxidative pentose pathway in heterotrophic organs and nonilluminated plant leaves (Bahaji et al., 2015). It is strongly inhibited by light (Heuer et al., 1982) and by an intermediate Calvin cycle molecule, 3-phosphoglycerate (Dietz, 1985), which accumulates in the chloroplast during illumination and allosterically activates AGPase (Kleczkowski, 1999(Kleczkowski, , 2000. The second phosphorylation step in the glycolytic pathway is the phosphorylation of fructose-6-phosphate to fructose-1,6-bisphosphate by phosphofructokinase (Manes.09G077800). Interestingly, transgenic studies overexpressing 6-phosphofructokinase in potato found no changes in the transgenic tuber phenotype compared with the controls but had an increased flux of cytosolic 3-phosphoglycerate that did not affect the amount of starch that accumulated in the tubers (Sweetlove et al., 2001;Burrell et al., 1994). It is noteworthy that our RHM results identified a signal on chromosome 9 in both yellow and white cassava that corresponds to the position of a phosphofructokinase in cassava.
Fructose-bisphosphate aldolase (FDA), a candidate from the Calvin cycle pathway (Manes.04G007900), is known to play a key role in carbohydrate biosynthesis. Changes in FDA activity have marked consequences for photosynthesis, C partitioning, growth, yield, and improved uniformity of solids in potato and other plants (Haake et al., 1998;Barry et al., 2002). Transgenic plants [including potato, maize, rice, canola (Brassica napus L.), and other crops] that expressed the Escherichia coli FDA gene in their chloroplasts had significantly higher root mass, leaf phenotypes with significantly higher starch accumulation, and lower leaf sucrose than control plants expressing the null vector (Barry et al., 2002).

Implications for the Breeding of High DM White Cassava Varieties or High-DM, High-β Carotene Yellow Cassava Varieties
The RHM results presented in this study suggest that DM content is under complex genetic control, particularly in the white cassava population. A network of genes and transcriptional regulons that are at the heart of sugar and starch biosynthesis were positionally associated with significant RHM regions in white and yellow cassava populations. The 'hide a SNP' analysis performed to validate the RHM results indicated that spurious associations caused by linkage may have been avoided in the RHM analysis even when large segments were involved (Fig. 2B). Given the genetic complexity of the cassava DM trait, we suggest that candidate genes, including invertases (neutral and acid) and FDA, may be targeted for gene editing or transgenic techniques to substantiate the role of these genes in DM and starch accumulation in cassava and to provide a clear path for their use in cassava breeding programs.
Dry matter content must work together with FYLD to make cassava production profitable and provide value for farmers and processors. To investigate whether some of the genes and gene families identified in the RHM analysis are also involved in the biological processes that lead to cassava FYLD, we validated their effects on FYLD using the same validation procedures and populations as above. The results showed prediction accuracies of 0.03 (SD = 0.02) for SnRKs on FYLD, 0.02 (SD = 0.02) for 53 likely candidates, 0.006 (SD = 0.03) for 53 unlikely candidates, 0.03 (SD = 0.02) for RHM-region genes, and -0.009 (SD = 0.02)for cassava starch pathway genes. These results suggest no single biological pathway that controls DM and FYLD. This is not surprising, since there is little genetic correlation between DM and FYLD (Kawano et al., 1987). It appears from the negative correlation between carotenoid content in roots and DM content in African cassava germplasm (Esuma et al., 2016;Akinwale et al., 2010) and from the link between bHLH and sugar biosynthesis (Krügel et al., 2012) that yellow flesh color is associated with the accumulation of reducing sugars in edible roots (Eleazu and Eleazu, 2012). This poses a more complex challenge for improving DM in African yellow cassava and shifts attention toward finding recombinant yellow cassava progenies that have high DM. Ceballos et al. (2015) states that the search for the appropriate recombinant is difficult in cassava breeding and advocates for the use of inbred progenitors when breeding for hybrid cassava.
In this paper, we have used candidate gene analysis in an attempt to understand the function of the genes or gene families positionally associated with the RHM hits. We do not make the claim that these candidates are causal genes detected by the RHM hits but rather we have shown, by using prediction accuracies, that these RHM hit loci were positionally associated with the DM trait in cassava ( Fig. 1 and Fig.4A,B) thus resulting in better predictability than if random genes were used as controls. To validate the hypotheses presented in this paper regarding the candidate genes underlying DM accumulation in cassava, and to elucidate the physiological mechanisms involved in the expression of the DM trait in both yellow and white cassava, we recommend the use of genome editing or transgenic technology, and in-depth analysis of sugars and carbohydrates in cassava roots, stems, and leaves. Similar studies in potato have benefited and informed potato breeding, and the same will be true of cassava as new insights become available. of significant regional heritability mapping (RHM) segments. Circos plots of carbohydrate biosynthesis candidate genes or gene families and significant RHM segments shown by paired dotted lines. Points are randomly scattered along the y-axis to avoid overlaps and visualize gene families better. (B) Zoom-in plot of candidate genes and significant RHM segments in a 21 Mb region of chromosome 1. The same genes or gene families as in (A) are shown along with two significant RHM segments. The double line separates candidate genes with random y-axis positions from-log 10 (local false discovery rate) plotting of the significance of RHM segments.

CONCLUSION
By using RHM analysis, we demonstrate the complex genetic architecture of DM content in high-DM white African cassava. Candidate gene analysis revealed the possible roles of SnRKs, vacuolar and neutral invertases, phosphoglucose isomerase, and FDA in the regulation of sugar and starch biosynthesis in cassava. The RHM analysis indicated that inheritance of DM content in the high-DM white cassava population is more polygenic than that in the low-DM yellow cassava population. We examined the utility of models based on the genome-wide candidate genes found in this study with prediction accuracies in a different but related population and found appreciable predictive ability compared with what is obtained when whole-genome markers were used. Transcriptional regulators such as bHLH may be involved in flesh root color and sugar biosynthesis in cassava, as shown in potato. We recommend further studies using genome editing or transgenic technology to better understand these mechanisms and to inform and accelerate breeding efforts for cassava.

Supplemental Information
Supplemental Table S1. Phytozome IDs and other genomic information about genes annotated as members of the SnRK family used for the validation of SnRKs in this study.