Genome-Wide Association Analysis of Mucilage and Hull Content in Flax (Linum usitatissimum L.) Seeds

New flaxseed cultivars differing in seed mucilage content (MC) with low hull content (HC) represent an attractive option to simultaneously target the food and feed markets. Here, a genome-wide association study (GWAS) was conducted for MC and HC in 200 diverse flaxseed accessions genotyped with 1.7 million single nucleotide polymorphism (SNP) markers. The data obtained for MC and HC indicated a broad phenotypic variation and high (~70%) and a moderate (~49%) narrow sense heritability, respectively. MC and HC did not differ statistically between fiber and oil morphotypes, but yellow-seeded accessions had 2.7% less HC than brown-seeded ones. The genome-wide linkage disequilibrium (LD) decayed to r2 = 0.1 at a physical distance of ~100 kb. Seven and four quantitative trait loci (QTL) were identified for MC and HC, respectively. Promising candidate genes identified include Linum usitatissimum orthologs of the Arabidopsis thaliana genes TRANSPARENT TESTA 8, SUBTILISIN-LIKE SERINE PROTEASE, GALACTUROSYL TRANSFERASE-LIKE 5, MUCILAGE-MODIFIED 4, AGAMOUS-LIKE MADS-BOX PROTEIN AGL62, GLYCOSYL HYDROLASE FAMILY 17, and UDP-GLUCOSE FLAVONOL 3-O-GLUCOSYLTRANSFERASE. These genes have been shown to play a role in mucilage synthesis and release, seed coat development and anthocyanin biosynthesis in A. thaliana. The favorable alleles will be useful in flaxseed breeding towards the goal of achieving the ideal MC and HC composition for food and feed by genomic-based breeding.


Introduction
Flaxseed (Linum usitatissimum L.), one of the oldest crops, has been used as human food and animal feed since ancient times [1]. The two main morphotypes of cultivated L. usitatissimum are oil morphotype (flaxseed) and fiber morphotype (fiber flax). Flaxseed plants are shorter, more branched, and larger seeded, and branches cover a greater proportion of the main stem compared to fiber flax. Flaxseed currently enjoys new prospects in the functional food market because of growing consumer interest in food with health benefits [1]. Flaxseed is rich in bioactive compounds, such as α-linolenic acid (omega-3) that have cardiovascular benefits, lignans with anticancer properties, and insoluble and soluble fiber (mucilage) that is capable of lowering cholesterol and insulin [2].
Flaxseed mucilage is a heterogeneous polysaccharide composed of xylose, arabinose, glucose, galactose, rhamnose, and fructose [3] that can be purified into neutral and acidic polymers. Mucilage abounds in the seed coat, where it makes up to 8-10% of the seed weight [4]. Mucilage synthesis is tightly linked to seed coat development [5] and both tissues form the seed hull, a structure representing 37-48% of the seed weight [6,7]. These two fractions, rich in polysaccharides, are components of the flaxseed meal, primarily used as a protein rich livestock and poultry feed [6,8]. Absorption of flaxseed meal s advantageous 31-45% protein content [9] may be hindered by mucilage and cell wall polysaccharides. This is due to the swelling capacity of polysaccharides in the digestive tract of monogastric animals that causes concomitant growth depression and reduced feed efficiency [7,10]. In that context, reduction of mucilage (MC) and hull (HC) contents in flaxseed meal is desirable to achieve improved feeding value for livestock and poultry. Studies of flaxseed mucilage degradation are focused on chemical retting, enzyme retting, and steam explosion [11]. Reduction of the hull content in flaxseed and rapeseed meal has been achieved through dehulling methods [12] and the use of yellow-seeded genotypes [7,13]. Food and feed markets demand flaxseed cultivars differing in mucilage and hull content. It is, therefore, crucial to decipher the genetic factors underlying these complex traits in order to accelerate the development of market-specific flaxseed cultivars.
In the model plant Arabidopsis thaliana, the genes necessary for the synthesis, modification, and release of mucilage, as well as seed coat development, are well understood [5,14].
Genetic variation for MC and HC in flaxseed has been assessed [4,7,[16][17][18] but no quantitative trait loci (QTL) have been reported so far. QTL for Fusarium wilt resistance [19], powdery mildew [20], iodine value, palmitic, linoleic, and linolenic acids [21,22], and seed and flower color [23], were reported. QTL for seed protein, cell wall, straw weight, yield-related traits, and phenological traits, have also been reported using bi-parental mapping and association mapping [22,24,25]. Recently, genome-wide association studies (GWAS) have been conducted for agronomic and seed quality traits using thousands of single nucleotide polymorphism (SNP) loci [26,27]. GWAS mines the natural sequence diversity within a species and captures historical recombination events. It is therefore a suitable approach to discover loci that control complex traits, leading to a higher mapping resolution to facilitate the identification of candidate genes [28]. Thus, the suite of genomic tools available for flaxseed genetic studies [21,22,26,27,[29][30][31][32] make genomic evaluation of MC and HC feasible.
The objective of this research was to identify QTL and candidate genes contributing to mucilage content and hull content that could be capitalized upon to assist in breeding flaxseed cultivars with different mucilage content and with reduced hull content. Improving these traits will increase seed value of this important cash crop.

Phenotypic Evaluation
Two hundred flax accessions from the Canadian flax core collection were planted in two environments. Evaluation of MC and HC showed a normal distribution across the two environments according to the Shapiro-Wilk normality test and normality plots (Table S1, Figure S1). Variance component analysis indicated significant effects of genotype, environment, and genotype × environment interaction, according to the Wald statistic (p < 0.001). The phenotypic variation for MC in Vilcún location 2014 (CAR2014) ranged from 23.52 to 103.57 mg g −1 with an average of 58.67 mg g −1 .
A lower variation was observed for MC in Huichahue location 2015 (HU2015), which ranged from 18.88 to 91.90 mg g −1 with an average of 55.04. mg g −1 HC variation ranged from 35.56 to 48.59% in CAR2014 and from 35.73 to 48.59% in HU2015 ( Figure 1, Table S1). MC and HC were significantly positively correlated in CAR2014 and HU2015 with coefficients of 0.28 and 0.29, respectively. Narrow sense heritability (h 2 ) for MC attained 70.7 and 73.8% in CAR2014 and HU2015, respectively. Lower h 2 of 51.4 and 46.2% for HC at CAR2014 and HU2015 were observed. MC did not differ statistically between flax morphotypes nor seed color classes, according to the Kruskal-Wallis non-parametric test. The average MC was 55.33 and 56.63 mg g −1 for the fiber and oil morphotypes, respectively (p = 0.651) ( Figure S2a). The average MC registered values of 56.63 and 59.22 (p = 0.517) for the brown and yellow seeded classes, correspondingly. The average HC did not differ statistically between flax morphotypes (fiber = 43.41%, oil = 42.79%; p = 0.373). On the other hand, yellow-seeded genotypes averaged 2.66% less HC than brown seeded accessions (p = 3.2 × 10 −5 ) ( Figure S2b).  Figure S2b).

Population Structure and Linkage Disequilibrium
The dendrogram based on 771,914 SNPs and the STRUCTURE plot grouped the 200 individuals into two major clusters arbitrarily assigned as "red" and "blue" (Figure 2a,b). In the K against ΔK plot, a break in the slope was clearly observed at K = 2 ( Figure 2b). The red cluster comprised almost exclusively genotypes belonging to the oil morphotype, while the blue cluster included both flax morphotypes. The coefficient of population differentiation (FST = 0.08) indicated a weak population structure between the two clusters.
The genome-wide linkage disequilibrium (LD) decayed to r 2 = 0.1 at a physical distance of ~100 kb (Figure 2c). Intrachromosomal LD decayed to r 2 = 0.1 at a distance between marker pairs ranging from ~40 kb on chromosome 8 to ~210 kb on chromosome 13 ( Figure S3). A highly significant positive correlation (r = 0.75, p = 0.0012) between marker density and the intrachromosomal LD blocks was observed ( Figure 3). For example, chromosomes 4 and 8 with the smallest number of markers, and chromosomes 6 and 13 with the largest, displayed the fastest and slowest LD decays, respectively (Figures 3 and S3). The fast LD decays observed in this association panel are indicative of its advantageous potential for reducing QTL intervals and fine mapping of candidate genes for MC and HC.

Population Structure and Linkage Disequilibrium
The dendrogram based on 771,914 SNPs and the STRUCTURE plot grouped the 200 individuals into two major clusters arbitrarily assigned as "red" and "blue" (Figure 2a,b). In the K against ∆K plot, a break in the slope was clearly observed at K = 2 ( Figure 2b). The red cluster comprised almost exclusively genotypes belonging to the oil morphotype, while the blue cluster included both flax morphotypes. The coefficient of population differentiation (F ST = 0.08) indicated a weak population structure between the two clusters.
The genome-wide linkage disequilibrium (LD) decayed to r 2 = 0.1 at a physical distance of 100 kb (Figure 2c). Intrachromosomal LD decayed to r 2 = 0.1 at a distance between marker pairs ranging from~40 kb on chromosome 8 to~210 kb on chromosome 13 ( Figure S3). A highly significant positive correlation (r = 0.75, p = 0.0012) between marker density and the intrachromosomal LD blocks was observed ( Figure 3). For example, chromosomes 4 and 8 with the smallest number of markers, and chromosomes 6 and 13 with the largest, displayed the fastest and slowest LD decays, respectively ( Figure 3 and Figure S3). The fast LD decays observed in this association panel are indicative of its advantageous potential for reducing QTL intervals and fine mapping of candidate genes for MC and HC.

Genome-Wide Association Analysis
Three GWA models were tested, including GLM-Q, GLM-PCA, and MLM-K. According to the quantile-qualtile (Q-Q) plot results, the GLM-Q model showed a strong skew toward significance

Genome-Wide Association Analysis
Three GWA models were tested, including GLM-Q, GLM-PCA, and MLM-K. According to the quantile-qualtile (Q-Q) plot results, the GLM-Q model showed a strong skew toward significance

Genome-Wide Association Analysis
Three GWA models were tested, including GLM-Q, GLM-PCA, and MLM-K. According to the quantile-qualtile (Q-Q) plot results, the GLM-Q model showed a strong skew toward significance for every trait ( Figure S4a), indicating that the Q matrix was insufficient to account for population structure and cryptic relatedness. Conversely, the MLM-K, which only used the kinship matrix, led to an overcorrection of these confounding factors, particularly for HC ( Figure S4b). The GLM-PCA was tested with 5 and 10 PCA covariates for HC and MC, accounting for 30.1 and 37.1% of the variation, respectively. Both GLM-5PCA and GLM-10PCA models performed well in controlling the rate of false positives, providing suitable statistical power to identify significant marker-trait associations for MC and HC ( Figure S5). Therefore, the GLM-PCA model was applied for GWA in this study.
GWA analysis identified 12 and 17 significant associations for MC in CAR2014 and HU2015, respectively (p < −log 10 (P) = 6.88), and markers Lu5-3808878, Lu7-13225294, and Lu11-2498303 were significant in both environments (Table 1, Figure S5). Various significant SNP markers fell into the same LD blocks. For example, five other significant markers surrounded the peak SNP Lu5-3808878 ( Figure S5), thus, they were considered the same QTL. Following this criterion, seven QTL were delineated on chromosomes 2, 3, 5, 7, and 11. The peak SNPs of these QTL accounted for 11.8 to 17.3% of phenotypic variation, and the combined three consistent QTL accounted for 43.6% of the MC variation (Table 1). A total of three and four significant associations were detected for HC in CAR2014 and HU2015, respectively (p < −log 10 (P) = 6.88). Markers Lu7-6577527 and Lu13-2803224 were significant in both environments ( Table 1). The four QTL identified on chromosomes 7, 10, 12, and 13 explained between 13.8% and 17.8% of the HC variation. The two consistent QTL Lu7-6577527 and Lu13-2803224, accounted for a combined 33% of the HC variation ( Table 1).
The peak SNPs effect for MC and HC were all significant according to the non-parametric Kruskal-Wallis test (p < 0.05), except for Lu3-26033342 associated with MC (Figure 4a,b and Figure S6). Accessions with a thymine (T) allele at Lu2-22298066 displayed, on average, an increase of 15.3 and 9.4 mg g −1 in MC, compared to accessions with a cytosine (C) allele in CAR2014 and HU2015, respectively (Figure 4a). Similarly, accessions with a "T" allele at Lu3-7398487 had, on average, 6.64 and 8.4 mg g −1 higher MC compared to accessions with a "C" allele in CAR2014 and HU2015, correspondingly. Accessions with a guanine (G) allele at Lu7-13225294 had, on average, 8.56 and 7.71 mg g -1 more mucilage compared to accessions carrying an adenine (A) allele in CAR2014 and HU2015, respectively (Figure 4a). The allelic effect for the other four peak SNPs is illustrated in Figure S6. The allelic effect of peak SNPs for HC revealed that accessions harboring a "C" allele at Lu7-6577527 had, on average, 1.4 and 1.3% less HC compared with "A" allele genotypes in CAR2014 and HU2015, correspondingly (Figure 4b). On average, HC was reduced by 1.4 and 1.3% (Lu7-6577527) to 2.6 and 2.7% (Lu13-2803224) in CAR2014 and HU2015, respectively (Figures 4b and S6).
The combined QTL effect revealed that the MC of accessions harboring none of the favorable QTL alleles averaged 44.6 and 48.9 mg g −1 , compared to 72.1 and 67.6 mg g −1 , for those with five favorable alleles in CAR2014 and HU2015, respectively (Figure 4c). No accession had all seven Accessions with a guanine (G) allele at Lu7-13225294 had, on average, 8.56 and 7.71 mg g −1 more mucilage compared to accessions carrying an adenine (A) allele in CAR2014 and HU2015, respectively (Figure 4a). The allelic effect for the other four peak SNPs is illustrated in Figure S6. The allelic effect of peak SNPs for HC revealed that accessions harboring a "C" allele at Lu7-6577527 had, on average, 1.4 and 1.3% less HC compared with "A" allele genotypes in CAR2014 and HU2015, correspondingly (Figure 4b). On average, HC was reduced by 1.4 and 1.3% (Lu7-6577527) to 2.6 and 2.7% (Lu13-2803224) in CAR2014 and HU2015, respectively (Figure 4b and Figure S6).
The combined QTL effect revealed that the MC of accessions harboring none of the favorable QTL alleles averaged 44.6 and 48.9 mg g −1 , compared to 72.1 and 67.6 mg g −1 , for those with five favorable alleles in CAR2014 and HU2015, respectively (Figure 4c). No accession had all seven favorable QTL alleles. The combined QTL effect for HC indicated that genotypes with none of the favorable QTL alleles averaged 45.5% and 45.3% HC compared to genotypes with four favorable QTL alleles, in which HC averaged 42.7% and 42.9% in CAR2014 and HU2015, respectively (Figure 4d).

Identification of Candidate Genes
The LD blocks harboring the peak SNPs were mined for genes relevant to MC and HC using the L. usitatissimum v.1.0 reference genome. A total of 204 and 118 candidate genes were identified for MC and HC, respectively ( Table 2 and Table S2). Several genes ascribed to carbohydrate metabolism, seed mucilage synthesis, modification, and release, and cell wall synthesis and modification were identified at the MC QTL loci ( Table 2). Five particularly promising candidate genes were identified. The SNP marker Lu3-26033342 was located 58.92 and 49.60 kb from Lus1007101 and Lus10007083 that encode the ortholog of A. thaliana's TRANSPARENT TESTA 8 (TT8) and SUBTILISIN-LIKE SERINE PROTEASE (SBT1.7) ( Figure S5a, Table 2). In another independent QTL on chromosome 3, the SNP marker Lu3-25559600 was located 64.41 and 67.02 kb from Lus10009311 and Lus10009288 that encode the ortholog of A. thaliana's GALACTUROSYL TRANSFERASE-LIKE 5 (GATL5) and MUCILAGE-MODIFIED 4 (MUM4). On chromosome 5, Lu5-3508878 is located 100.78 kb from Lus10008285, an ortholog of another A. thaliana gene implicated in mucilage transcriptional regulation, NAC-REGULATED SEED MORPHOLOGY 1 (NARS1) ( Figure S5a, Table 2). Genes related to embryo, endosperm, and seed coat development, cell wall biogenesis/degradation, anthocyanin biosynthesis, and seed dormancy, were found at QTL loci associated with HC ( Table 2 and Table S2). Among the relevant candidate genes, Lus10035456 encodes the ortholog of A. thaliana's AGAMOUS-LIKE MADS-BOX PROTEIN AGL62 (AGL62) and is located 11.40 kb from the SNP marker Lu7-6577527 ( Figure S5b, Table 2). On chromosome 12, Lu12-5267706 was situated 39.93 kb from Lus10018306 that encodes the ortholog of A. thaliana's GLYCOSYL HYDROLASE FAMILY 17 (GH17). Two other interesting candidate genes, Lus10026902 and Lus10026926, were situated 96.87 and 238.19 k, respectively, from the SNP marker Lu13-2803224. Lus10026902 and Lus10026926 encode the ortholog of A. thaliana's LARIAT DEBRANCHING ENZYME (DBR1) and UDP-GLUCOSE FLAVONOL 3-O-GLUCOSYLTRANSFERASE (UGT79B1), respectively.

Phenotypic Variation of Mucilage and Hull Contents
Flaxseed mucilage and seed hull possess valuable nutritional and rheological attributes [33,34] but are also known to affect animal performance [7]. The presence of mucilage and fiber components (i.e., acid detergent lignin) in flaxseed meal reduces the energy uptake in both monogastric and ruminant animals [35]. Therefore, knowledge about the phenotypic variation and genetic control of seed mucilage content (MC) and hull content (HC) is pivotal to better design breeding strategies aiming to improve the overall food and feed value of flaxseed. The broad phenotypic variation of MC and HC in the association panel and the degree of additivity of the genetic components hint at the potential for improving flaxseed for either high or low MC and reduced HC through marker-assisted selection.
Kaewmanne et al. [4] reported MC ranging from 1.8 to 2.9% in seven Italian flaxseed cultivars, while Oomah et al. [16] found that MC ranged from 3.6 to 8.0% in 109 flaxseed accessions. We found a slightly wider range from 2 to 10% in our diversity panel. Little information exists for HC variation in large collections of flaxseed. In general, HC ranges from 22-27% to 36-48% were reported in mechanically treated and hand-dissected seeds, respectively [7,36], which is much higher than canola at 18.6% and soybean at 16.1% [6]. Reduction of HC can be achieved through the use of yellow-seeded cultivars, known to contain higher oil content and less HC than their brown-seeded counterparts [7,37]. Indeed, the yellow-seeded accessions displayed a lower HC compared to the brown-seeded genotypes. Nevertheless, caution should be exercised in adopting yellow-seeded flaxseed cultivars for reduced HC flaxseed because their susceptibility to natural splitting and mechanical cracking of the seed coat can negatively affect seed quality [38]. Consequently, breeding and seed tests to mechanical damage during harvesting should be conducted together in order to identify the ideal HC that would ensure seed mechanical resistance. All considered, our association panel harbored abundant phenotypic variation for dissecting the genetic landscape of MC and HC.

Population Structure and Linkage Disequilibrium
When the main factors accounting for population subdivision correlate with a trait under study (i.e., geographic distribution and flowering time), then marker-trait associations will undergo a more accentuated inflation of observed p-values as effect of the structure confounding factor [39]. In flaxseed, population structure has been assessed in varying numbers of accessions, where geographic origin and flax morphotype seemed to have been the main factors underlying population subdivisions [40][41][42]. In our association panel, the "red" and "blue" clusters were slightly differentiated (F ST = 0.08), with a weak morphotypic effect on dendrogram topology, possibly due to the small number of fiber types (n = 33) compared to the larger number of oilseed type accessions (n = 153).
Linkage disequilibrium (LD) is the main factor influencing marker density requirement and mapping resolution in GWAS. Mating system and genetic diversity influence LD decay. LD decays more rapidly in outcrossing plant species than in self-pollinated plants [43] and, similarly, in wild relatives and landraces compared to modern cultivars [44]. Here, we observed a rapid LD decay for most of the chromosomes, comparable to some maize commercial elite inbred lines [45] and faster than winter-type Brassica napus (480 to 1283 kb, r 2 = 0.1) [46]. Therefore, the 200 flaxseed accessions of our diversity panel are expected to contain plentiful allelic diversity, as suggested by the generally short LD blocks for the 15 chromosomes, thereby assisting the search for candidate genes through efficient narrowing of the putative QTL regions.

Genome-Wide Association Analysis
Several general (GLM) and mixed (MLM) linear models have been proposed to control both population structure and cryptic relatedness [47][48][49]. In flax, MLM has been the preferred association model for multiple traits [24][25][26]42]. The "red" and "blue" clusters were weakly differentiated, and MC and HC between flax morphotypes was not statistically significant ( Figure S2a,b), in contrast to a report comparing indica and japonica rice types assessed for 34 traits [39]. Hence, the genetic architecture of MC and HC seem to be only weakly correlated with population and family structures, and GLM-PCA was sufficient to control the rate of false positive associations.
The discovery of QTL for agronomic and economically important traits in crops is of great importance for marker-assisted breeding. This is the first report of QTL for MC in flax, likely because this trait has not been a breeding priority in the most important breeding programs of the world [18].
In the present study, GWAS identified seven QTL for MC, and their effects clearly suggest the promise of marker-assisted selection for modifying MC.
Chromosome 3 s multiple MC QTL harbored candidate genes orthologous to Arabidopsis TT8 gene, which is part of a transcription factor complex that, along with GLABRA2 (GL2), regulates MUM4 gene expression [50]. MUM4 is required to produce rhamnose, a key substrate for mucilage biosynthesis [50], and chromosome 3 Lus10009311 is its flax ortholog. In Arabidopsis, GATL5 encodes a glycosyltransferase involved in rhamnogalacturonan I (RG I) backbone synthesis [51]. The presence of a L. usitatissimum ortholog Lus10009311 in a LD block, with a peak SNP for MC, corresponds to the expected role of RG I synthesis. Arabidopsis gene SBT1.7 triggers the activation of cell wall-modifying enzymes necessary for mucilage release upon imbibition [52]. In line with the expected seed coat mucilage dynamics, we identified two orthologous copies of this gene in two independent QTL ( Table 2). Arabidopsis PECTIN METHYLESTERASE INHIBITOR 6 (PMEI6) mutants were defective in seed coat mucilage release [53]. An ortholog of the Arabidopsis gene, PECTIN METHYLESTERASE 36 (PME36), another family member, was located at one of the MC QTL loci identified herein. While PME36 has not been shown to be involved in mucilage release, it might participate indirectly because it exerts a similar role to that of PMEI6 in pectin synthesis and cell wall modification [54].
Oil content is an economically important but genetically complex trait. MC is negatively correlated with oil content, therefore, reducing MC should facilitate increasing oil content. Indeed, reduced accumulation of mucilage accompanied by increased oil content was observed in Arabidopsis MUM4 or GL2 mutants [55]. We observed a significant negative correlation (r = −0.15, p = 0.03) between MC and oil content in the association panel (data not shown). This is perhaps due to increased carbon allocation to the embryo in reduced or no seed coat mucilage synthesis in low MC accessions as proposed in Arabidopsis [55].
Increasing seed oil content and reducing the fiber fraction of the meal have been important goals in oil crop breeding. In B. napus and L. usitatissium, seed coat thickness or HC are negatively correlated with seed oil and protein content, as well as seed color [56][57][58]. QTL for seed coat color to indirectly increase oil content and minimize HC have been identified in B. napus and soybean [37,59,60]. In flax, a pleiotropic QTL controlling yellow seed and white flower color was recently dissected at the molecular level, but its effect on HC has not been addressed [23]. Here, we identified four QTL whose effects reduced HC by 2.6%, on average. Chromosome 7 harbored Lus10035456, which resembles the A. thaliana transcription factor AGL62. AGL62 mutants initiated embryo and endosperm formation, but failed to form a seed coat [61]. Light seed color and low HC are thought to coincide because the biochemical pathways leading to lignin and pigment synthesis share common precursors [59]. In Arabidopsis, the core components of seed coat pigments are proanthocyanidins (PAs) [62]. Chromosome 12 encompassed three candidate genes including the ortholog of Arabidopsis O-GLYCOSYL HYDROLASES FAMILY 17 gene. GH17 is coexpressed with TT12, AHA10, and BAN, that might process glycosylated flavan-3-ol monomers, leading to accumulation of PAs in the seed coat [63]. In black seed soybean, a UDP-GLUCOSE:FLAVONOID 3-O-GLUCOSYLTRANSFERASE (UGT78K1), was isolated from the seed coat, a key enzyme that catalyzes the final step in anthocyanin biosynthesis [64]. Chromosome 13 contained Lus10026926, an ortholog of the A. thaliana UGT79B1, a gene also involved in anthocyanin biosynthesis. Yellow seed color stems from the blocked biosynthesis of PAs that impart the brown color to the seed coat [65]. The flaxseed meal derived from brown-seeded cultivars contains PAs that negatively affect protein digestion [66], hence low PA meal is preferred in animal ration. Additional advantages of modifying the seed color and reducing MC and HC include higher limpidity of the crude oil from the removal of gum-like residues and dark pigments, higher protein content and better feeding value of flaxseed meal for livestock and poultry [7].
Few accessions combined favorable alleles for reduced MC and HC. It should be possible to combine these attributes in a single genotype through the pyramiding of the respective favorable alleles owing to the fact that the significant QTL for both traits did not co-locate in the flax genome.
The development of yellow-seeded cultivars with low HC and either low or high MC for different industrial uses is an opportunity to increase market share and value.

Plant Material, Field Trials, Phenotyping, and Statistical Analyses
A total of 200 L. usitatissimum accessions from the Canadian flax core collection [67] were selected for this study based on their geographic distribution and genetic diversity (Table S3). The 200 genotypes were planted in 2014 and 2015 at the Agriaquaculture Nutritional Genomic Center (CGNA) experimental stations located in Vilcún (CAR2014) and Huichahue (HU2015), La Araucania region, Chile, using a completely randomized design (CRD) with three biological replicates. Genotypes were arranged in rows and columns in order to take into account spatial heterogeneity.
The seed mucilage content (MC) was determined in three biological replicates following the procedure described by Kaewmanee et al. [4] with minor modifications. A total of 2 g of seeds were incubated in 20 mL of water at 100 • C for 15 min in 50 mL Falcon tubes. Next, the tubes were shaken for 30 min at 250 rpm. The soluble extract was recovered by centrifugation at 6132 relative centrifugal force (RCF) for 30 min, and the mucilage fraction was precipitated by incubating in 30 mL of ethanol (95%) overnight at 4 • C. The seeds were recovered, and the extraction procedure was carried out twice more to maximize mucilage recovery. The mucilage pellet was weighed and expressed as milligrams of mucilage per gram of seed (mg g −1 ).
HC was determined in three biological replicates by separating the hull from the embryos using a dissecting needle and tweezers from 50 seeds after imbibition in water for 24 h. Both fractions were dried at 90 • C for 4 h before their dry weights were measured. HC was expressed as (hull dry weight/(hull dry weight + embryo dry weight)) × 100, averaged from 50 seeds.
Variation of phenotypic data was analyzed individually for each environment using a restricted maximum likelihood (REML) analysis. Spatial correction in row and column directions was used with different variance-covariance structures. Spatial models were compared with Akaike information criterion (AIC) and Bayesian information criterion (BIC), and the most appropriate model in each environment was used to obtain a best linear unbiased estimate (BLUEs) for mucilage and hull contents in GenStat v.16 [68]. Descriptive statistics and Shapiro-Wilk normality test were conducted in the R package MVN [69]. Narrow sense heritability (h 2 ) was estimated using variance components from TASSEL v.5.2.31 [70]. Trait h 2 estimates were computed using the equation: h 2 = σ 2 a /σ 2 a + σ 2 e , where σ 2 a is the additive genetic variance and σ 2 e is the residual error variance [70].

Whole Genome Resequencing and SNP Calling
Genotyping by sequencing (GBS) methodology was adopted to genotype the 407 accessions from the Canadian flax core collection. The 407 individuals were grown in pots in a greenhouse with a 16 h light and 8 h dark cycle. Young leaf tissues from single plants were collected for DNA extraction using the DNeasy 96 Plant kit (Qiagen, Mississauga, ON, Canada) according to manufacturer's instructions. Genomic DNA was quantified, sheared, size-selected, and libraries were constructed for each genotype by the Michael Smith Genome Sciences Centre of the BC Cancer Agency, Genome British Columbia (Vancouver, BC, Canada) which also sequenced the libraries generating 100 bp paired-end reads on the Illumina HiSeq 2000 platform (Illumina Inc., San Diego, CA, USA). A total of 26.875 billion 100 bp pair-end reads were generated, corresponding to 6587 MB sequences and 15.5× genome equivalents of the reference genome (~370 MB) [32,71], on average, per individual.
All reads from each individual of the population was aligned to the flax reference sequence (the flax pseudomolecules v2.0) [72] using BWA (v0.6.1) [73] with default parameters. The alignment file for each individual was used as input for SNP discovery using the software package SAMtools [74]. SAMtools converts the sequence alignment files from sequence alignment/map (SAM) to sorted binary alignment/map (BAM) format and call all potential variants (SNP and indels) into the pileup files. Then, all variants were further filtered to get a set of high-quality SNPs. SNPs were filtered by the following criteria: (1) candidate SNP loci must be more than 10 bp away from each other; (2) each candidate SNP must have three mapped reads on the region; and (3) all the singleton SNPs were excluded. All steps of this procedure were implemented in an annotation-based genome-wide SNP discovery (AGSNP) pipeline [75,76]. The coordinates of all SNPs were extracted from the chromosome-based flax pseudomolecules v2.0 [72] [47] was employed with predefined numbers of genetic clustering (K) from 1-5, using 50,000 burn-in iterations, followed by 100,000 MCMC across five independent runs for each K values. The number of clusters (K) was calculated with the Evanno method [77] implemented in the R package POPHELPER v.1.1.10 [78]. A total of 771,914 SNPs, filtered from the 1.7 million SNPs by removing those with a minor allele frequency <0.05 and >10% missing data, were used to produce a dendrogram using the neighbor-joining (NJ) algorithm implemented in TASSEL v.5.2.31 [70]. Genome-wide linkage disequilibrium (LD) and intrachromosomal LD between pairs of SNPs using the 771,914 filtered SNPs were estimated using squared allele frequency correlations (r 2 ) in TASSEL v.5.2.31 [70]. LD values were plotted against physical distance to determine the LD decay using the Hill and Weir [79] function. A cut-off value of r 2 = 0.1 was set to estimate the average LD blocks [41].
GWAS was performed in TASSEL v.5.2.31 [70] using the 771,914 filtered SNPs. Three models were evaluated, including GLM-Q, GLM-PCA, and MLM-K. The Q matrix generated from STRUCTURE was used as a cofactor to adjust for population stratification (GLM-Q). A GLM-PCA was assessed, including up to ten principal component covariates. The ten PCAs were generated in TASSEL v.5.2.31 [70] with 105,038 SNPs (MAF > 0.05 and at least 95% present among the 200 genotypes). For the MLM-K, a kinship matrix was created in TASSEL v.5.2.31 [70] with the set of 105,038 SNPs, and used as covariate to account for cryptic relatedness. A quantile-quantile (Q-Q) plot was displayed using the R package qqman [80] to evaluate the fitness and efficiency of the different models. The final Manhattan plots were also displayed using the qqman package [80]. The Bonferroni correction (0.1/771,914 = −log (P) = 6.88) was used as threshold for the significance of marker-trait associations.
To identify candidate genes associated with significant SNPs, the Jbrowse feature of Phytozome v.12.1 (http://phytozome.jgi.doe.gov/pz/portal.html) was used to examine the L. usitatissimum v.1.0 genome [71] for genes relevant to MC and HC in flaxseed. As mentioned above, a cut-off value of r 2 = 0.1 was set to estimate the average LD block for each chromosome. The defined physical distance was used to pinpoint candidate genes on either side of the most significant SNPs. A plausible candidate gene was defined by the following criteria: (a) the gene had a function known to be related to the trait evaluated based on gene ontology term descriptions in Phytozome; (b) BLASTX searches from the Arabidopsis genome returned orthologous protein sequences with functions associated to the phenotypes of interest.

Conclusions
We performed GWAS using a set of 771,914 SNPs, identifying seven and four QTL for MC and HC, respectively. Above all, chromosome 3 encompassed three QTL harboring promising candidate genes for MC. Three of the QTL associated with HC contained plausible candidate genes related to seed coat and anthocyanin biosynthesis. These favorable QTL alleles will assist the design of market specific flaxseed cultivars with reduced HC while maintaining high MC for food and low MC for feed. The application of the identified SNP markers in molecular-assisted breeding for MC and HC, two complex traits whose phenotyping is labor-intensive and time-consuming, might enable a rapid transfer of favorable alleles into well adapted elite flaxseed cultivars, thus shortening the breeding cycle. Based on our results and previous gene expression studies, we hypothesize that the genetic control of mucilage and hull content in flax might share conserved pathways with Arabidopsis. Further validation of candidate genes, like LuTT8, LuSBT1.7, LuMUM4, and LuAGL62, through gene expression analysis or gene editing, will be necessary to validate the hypothesis mentioned above. Characterization of genes underlying the QTL will expand knowledge of the high complexity of cell wall dynamics involved in seed mucilage and seed coat biosynthesis in flaxseed. Acknowledgments: This work was supported by the Agriaquaculture Nutritional Genomic Center (CGNA), the Programa Regional de Investigación Científica y Tecnológica and the Gobierno Regional de La Araucania, Chile. CGNA acknowledges the collaboration of Agriculture and Agri-Food Canada (AAFC) and the Total Utilization Flax GENomics (TUFGEN) project formerly funded by Genome Canada and other stakeholders of the Canadian flax industry.

Conflicts of Interest:
The authors declare no conflict of interests.