Genome-wide association study of micronaire using a natural population of representative upland cotton (Gossypium hirsutum L.)

Micronaire is a comprehensive index reflecting the fineness and maturity of cotton fiber. Micronaire is one of the important internal quality indicators of the cotton fiber and is closely related to the value of the cotton fiber. Understanding the genetic basis of micronaire is required for the genetic improvement of the trait. However, the genetic architecture of micronaire at the genomic level is unclear. The present genome-wide association study (GWAS) aimed to identify the genetic mechanism of the micronaire trait in 83 representative upland cotton lines grown in multiple environments. GWAS of micronaire used 83 upland cotton accessions assayed by a Cotton 63 K Illumina Infinium single nucleotide polymorphism (SNP) array. A total of 11 quantitative trait loci (QTLs) for micronaire were detected on 10 chromosomes. These 11 QTLs included 27 identified genes with specific expression patterns. A novel QTL, qFM-A12–1, included 12 significant SNPs, and GhFLA9 was identified as a candidate gene based on haplotype block analysis and on strong and direct linkage disequilibrium between the significantly related SNPs and gene. GhFLA9 was expressed at a high level during secondary wall thickening at 20∼25 days post-anthesis. The expression level of GhFLA9 was significantly higher in the low micronaire line (Msco-12) than that in the high micronaire line (Chuangyou-9). This study provides a genetic reference for genetic improvement of cotton fiber micronaire and a foundation for verification of the functions of GhFLA9.


Background
Cotton is an economically important crop and the most important source of the natural fiber for the textile industry (Yu 2018). Gossypium hirsutum L. is characterized by high yield and wide adaptability, accounting for more than 95% of cotton production worldwide. The cotton fiber yield is of primary importance for cotton growers, whereas the textile industry demands high fiber quality (Ali et al. 2018). The important fiber quality properties for spinning primarily include length, strength and micronaire. Micronaire is one of the most important fiber characteristics for international cotton classers and spinners. Micronaire is a comprehensive indicator of fiber fineness and maturity and an important characteristic for the spinning process (He 2005). Highmicronaire fibers are normally coarse, which is undesirable from the point of view of spinning and yarn evenness. Cotton fibers are the longest and fastest-growing cells in known plants. Each cotton fiber is composed of a single cell produced on the surface of an ovule. Fiber development mainly includes four stages: initial development, elongation, secondary wall thickening, and maturity (Pang et al. 2010). Cotton fiber micronaire (FM) is mainly influenced by the formation of the secondary wall of the fiber (Wu et al. 2020). Thickening of the secondary wall of the cotton fiber is a complex physiological process that mainly involves the synthesis and deposition of cellulose, usually starts 16∼19 days after flowering and is jointly regulated by the expression of many genes (Yan 2010). Therefore, the analysis and identification of the candidate genes that regulate FM at the QTL mapping level have important theoretical value for molecular breeding of cotton quality and identification of genetic mechanisms of cotton fiber development.
Linkage mapping is usually used to detect QTLs related to FM in specific mapping populations (Said et al. 2013(Said et al. , 2015a(Said et al. , 2015b. Four QTLs for FM were detected in the A01, A02, and A07 chromosomes using a population of 143 recombinant inbred lines (RILs) (Fan et al. 2018). Twenty-two QTLs for FM were detected using 180 RILs, and 13 of these QTLs were detected in two or more environments (Ali et al. 2018). Additionally, 27 QTLs for FM were detected using BC 3 F 2 , BC 3 F 2:3 , and BC 3 F 2:4 populations, including 11 QTLs that were located near the same marker in various populations or near the linked markers in the same population (Wang et al. 2017a). However, the populations used for linkage mapping are usually characterized by a large positioning interval, limited recombination times, and low genotype variation. The rapid development of genome sequencing resulted in successful applications of genome-wide association studies (GWAS) for genetic dissection of the fiber quality traits, including FM. Resequencing of the markers identified 3 and 533 significant SNPs for FM in the groups of 362 and 419 diverse upland cotton accessions, respectively (Ma et al. 2018b;Wang et al. 2017b). Additionally, 503 upland cotton accessions were individually genotyped using a Cotton 63 K Illumina SNP array, resulting in the identification of 3 stable QTLs associated with the FM trait located on the A05, D05 and D12 chromosomes. Previous studies identified many QTLs for FM mainly distributed on the A03, A07, A12, D03, D08, and D11 chromosomes. Comparison with the orthologs in other organisms demonstrated that various genes, such as GhADF1, GhWLIM1a, and GhXTH, are related to the development of the fiber cell walls in upland cotton (Han et al. 2013;Michailidis et al. 2009;Wang et al. 2009). Fasciclin-like arabinogalactan protein (FLA) is an arabinogalactan protein (AGP) with one or two fasciclin domains involved in plant cell development (Showalter 2001). FLA may play a role in cell elongation and secondary wall maturation. Huang et al. (2008) isolated and identified 19 GhFLAs from cotton, 7 of which were expressed at a high level during fiber development; however, genes regulating FM during thickening of the secondary wall of the fiber were not reported. Furthermore, the study focused only on certain specific genes; however, exploration of the whole genetic architecture of FM with QTL mapping is very important. A diversity panel consisting of 83 upland cotton accessions was genotyped using a Cotton 63 K Illumina Infinium SNP array to investigate genetic variation of FM at a natural population level. Fiber micronaire was measured in five environments. GWAS was performed to identify the SNP loci or QTL regions associated with FM in upland cotton. Candidate genes controlling FM were predicted by haplotype block analysis in the novel and stable QTL regions. These results provide a foundation for FM improvement by marker-assisted breeding.

Plant materials
The present study investigated a diversity panel of 83 representative upland cotton accessions obtained from cotton germplasm collections housed in our laboratory and the low-temperature germplasm gene bank of the Institute of Cotton Research, Chinese Academy of Agricultural Sciences (ICR-CAAS). The detailed information about 83 upland cotton samples has been presented in a previous study from our laboratory (Ma et al. 2018a, b).

Field experiments and phenotyping
All of the 83 upland cotton accessions were planted at Anyang (Ay), Henan, China (36. 06°N, 114.49°E), in 2014 and at Aral (Ale), Xinjiang, China (40.55°N, 81.28°E), in 2016 (designated 16_Ale). These two sites were used as representative cotton production locations in the Yellow River Valley region and northwest inland region, respectively. They were planted at Sanya (Sy), Hainan, China (18.41°N, 109.20°E), another representative cotton production area, in 2016 (designated 16_Sy). The field experiments followed a randomized complete block design with three replications. At Anyang, each accession was grown in a single-row plot with 18∼23 plants, with a plot length of 4 m and a row spacing of 0.38 m; at Aral, each accession was grown in a plot with 30∼40 plants in two rows, with a plant spacing of 0.1 m and a row spacing of 0.45 m. The planting at Sanya had a plant spacing of 0.11 m, a row spacing of 0.38 m, and a plot length of 5 m with a total of 50 plants. The trial management procedures followed standard breeding field practices. A total of 20 naturally opening bolls in the middle of plants were manually harvested from each accession in September at Anyang and Aral. The FM of each harvested sample was evaluated using a highvolume instrument (HVI) 900 (Cotton Fiber Quality Inspection and Testing Center of Ministry of Agriculture and Rural Affairs, Anyang, Henan, China). Statistical analyses, including descriptive statistics and correlation analysis of FM in 83 upland cotton accessions across five environments, were performed using SPSS 24.0 (IBM, New York, USA). The combined broad-sense heritability (H 2 ) of micronaire in various environments was estimated by QTL IciMapping 4.2.0 .

QTL mapping for FM SNP genotyping
Genomic DNA from all accessions was extracted from young leaf tissue using the quick cetylrimethylammonium bromide (CTAB) method (Zhang and Stewart 2000). SNP genotyping of the association panel was performed using a 63 K Illumina Infinium SNP array (Hulse-Kemp et al. 2015). A total of 15 369 SNP markers were identified and used for subsequent analysis. The details were reported previously by Ma et al. (2018a).

Identification of SNPs associated with FM
A previous study analyzed the population structure and linkage disequilibrium (LD) (Ma et al. 2018a, b). The best linear unbiased predictions (BLUPs) for FM across five environments were estimated using R 3.6.3. The BLUP values for the five environments and the phenotypic values for FM in each environment were used for GWAS. The general linear model (GLM) in TASSEL 5.0 was used for GWAS, and the population structure was considered a fixed effect (Ma et al. 2019). The Bonferroni threshold of SNP significance was P < 6.51 × 10 − 5 (P = 1/n, n = the number of SNPs, −log 10 (1/15369) ≈ 4.19) Yang et al. 2014;Liu et al. 2016). Manhattan plots were generated using the "CMplot" package for R (Turner 2014). Stable QTLs were identified in two or more different environments in the present study. The QTLs were named as follows: q + trait abbreviation -chromosome -QTL number ).

Candidate gene identification
Transcriptome sequencing data (https://www.ncbi.nlm. nih.gov/bioproject/PRJNA490626/) from G. barbadense Hai7124 and G. hirsutum TM-1 tissues including 0 days post-anthesis (0 DPA), 1 DPA, 3 DPA, 5 DPA, fiber at 10 days post-anthesis (fiber-10 DPA), fiber-20 DPA and fiber-25 DPA, were acquired from a previous study (Hu et al. 2019). The screening criteria were defined based on the expression levels in the fiber at 20 or 25 DPA, which should be significantly higher than those at 0∼15 DPA. The genes related to FM were subsequently analyzed according to functional gene annotation. R software was used to generate the haplotype blocks based on the stable qFM-A12-1 SNPs, corresponding to the transcriptome database and functional gene annotations, and differentially expressed genes were screened using intervals (Su et al. 2016).
The expression trends of the candidate genes were validated by quantitative reverse transcription-polymerase chain reaction (qRT-PCR) analysis. Total RNA was extracted from the fibers collected from five developmental time points (5, 10, 15, 20, and 25 DPA) of the highmicronaire (Msco-12) and low-micronaire lines (Chuangyou-9) using a FastPure Plant Total RNA Isolation Kit (polysaccharides & polyphenolics-rich) (Vazyme Biotech Co., Ltd., Nanjing, China). And cDNA was synthesized using a reverse transcription kit (HiScript II Q RT Super-Mix for qPCR). ChamQ universal SYBR quantitative fluorescent kit and a quantitative fluorescent PCR instrument (ABI7500, Eppendorf, Germany) were used for qRT-PCR validation. The qRT-PCR conditions were as follows: DNA polymerase activation as the first step at 94°C (30 s); the second step included 40 cycles at 94°C (5 s), 58°C (15 s), and 72°C (12 s); the third step included melting curve analysis; the final step included incubation at 12°C (1 min). Histone3 (AF024716) was used as a housekeeping gene (Tu et al. 2007). The experimental design included 3 technical and 3 biological replicates for each gene, and the 2 −ΔΔC T method was used to calculate the relative gene expression (Livak and Schmittgen 2001).

Phenotypic variation of FM
Significant variation in FM was observed in 83 upland cotton accessions; FM varied from 2.73 to 6.50, with an average of 4.65 (Table 1 and Fig. S1). In the 14_Ay, 15_ Ay, 16_Ay, 16_Ale, and 16_Sy environments, the natural population had average FM values of 5. 00, 4.76, 4.92, 4.32, and 4.23, respectively. The coefficients of variation (CVs) for FM ranged from 9.09% to 14.77% in five environments, indicating that the variation of the cotton FM index was adequate (Table 1). As shown in Fig. 1, FM was normally distributed in five environments. The analysis results indicated that cotton fiber in the Yellow River Valley was relatively thicker, and that in the northwest inland area was relatively finer. Correlation analysis demonstrated a very significantly positive correlation of the FM trait in five environments, and the correlation coefficient varied from 0.4 to 0.7 (Fig. 2). Analysis of variance indicated that the genotypes and environments had significant effects on FM (P < 0.01), and the estimated broad-sense heritability was 86.27% (Table 1,  Table S1). These results suggested that FM in upland cotton is highly heritable; however, the role of environmental factors in fiber development cannot be ignored.
The results indicated that this population is suitable for the GWAS of the cotton FM trait.

GWAS
GWAS for FM was performed using BLUPs across five environments, and individual environments were evaluated based on a GLM considering Q-matrix (GLM (Q)). A threshold of -log 10 (P) > 4.19 was considered to indicate significant SNPs. A total of 18, 37, 67, 7, 21, and 25 SNPs were significantly associated with FM in 14_Ay, 15_Ay, 16_Ay, 16_Ale, 16_Sy, and BLUP, respectively ( Fig. 3; Fig. S2). Analysis of significantly related SNPs identified a total of 55 common SNPs, including 16 SNPs detected in two environments. These 55 significantly related SNPs were distributed on 10 chromosomes (A10, A12, A13, D03, D06, D08, D09, D10, D11, and D12), among which, chromosome A12 has the most SNPs of 12. The QTL interval was calculated as significant SNP position ± LD based on LD specific for each chromosome (Ma et al. 2018a, b). Thus, a total of 11 stable QTLs related to FM were identified. These QTLs explained 18.10%~31.64% of the phenotypic variation in FM. The phenotypic contributions of these 11 markers  in qFM-A12-1 were more than 20%. The results of GWAS were compared with the data of previous association and linkage studies for FM to verify the reliability of 11 identified stable QTLs for FM. The simple sequence repeats (SSR) and SNP markers of QTLs previously identified as related to FM were aligned to known marker sequences of the G. hirsutum TM-1 genome using local basic local alignment search tool (Zhang et al. 2015b). Comparison indicated that 8 stable QTLs for FM were reported in previous studies and 3 new QTLs were detected in the present study ( Table 2).

Identification of the candidate genes in 11 stable QTLs
Combined functional annotation of the genes in stable QTLs and the analysis of the Arabidopsis genes homologous to the cotton fiber transcriptome database were used to identify the candidate genes related to FM (Du et al. 2018). A total of 1 594 genes were present in 11 QTLs according to local BLAST of the upland cotton genome from Zhejiang University (http://ibi.zju.edu.cn/ cotton). The fiber quality of G. barbadense was significantly better than that of G. hirsutum, and significant differences in FM between G. barbadense Hai7124 and G. hirsutum TM-1 were detected. The Hai7124 and TM-1 transcriptome databases were used to determine the differences in the gene expression levels during fiber development (0, 1, 3, 5, 10, 20, and 25 DPA). The fiber expression levels at 20 to 25 DPA were higher than those at 0 to 15 DPA, and the expression levels in the fibers of the two materials were significantly different. A total of 27 genes were initially identified (Table 3, Fig. 4). Annotation analysis of 27 differentially expressed genes versus previously reported genes were found to be related to fiber development, such as GhTUB6 and GhFLA.
Mining of the candidate genes in qFM-A12-1 qFM-A12-1 contained the most significant SNPs, and the phenotypic contribution of this locus was stable and high. However, previous studies have not located any genes related to FM at this locus. Therefore, the present study focused on the analysis of qFM-A12-1 to identify the candidate genes influencing FM. A haplotype block of qFM-A12-1 was constructed to narrow the QTL interval, which was subsequently narrowed to location from i27811Gh to i41399Gh based on direct strong LD between significantly related SNPs and the corresponding genes (Fig. 5a). The i41990Gh marker in the interval was used to genotype upland cotton. The accessions carrying the AA (44) allele had significantly higher micronaire than that in the accessions with the GG (35) allele (Fig. 5b), indicating that this interval may significantly influence FM; Chuangyou-9 and Msco-12 harbor the AA and GG alleles, respectively. GhFLA9 is located between the i27811Gh and i41990Gh markers, and the expression levels of GhFLA9 in the fiber at 20 and 25 DPA were significantly higher than those in Hai7124 and TM-1 at 5 and 10 DPA. The highest expression level was observed at 20 DPA, and the expression levels of GhFLA9 in the fiber in Hai7124 at 20 and 25 DPA were significantly higher than those in TM-1 (Fig. 4). In this study, qRT-PCR was used to determine the expression levels of GhFLA9 at various fiber development stages (5, 10, 15, 20, and 25 DPA) in the high-micronaire and lowmicronaire genotypes. The expression level of GhFLA9 increased during fiber development. Moreover, the expression levels in the fiber in Msco-12 at 20 and 25 DPA were significantly higher than those in Chuangyou-9 (Fig. 6). These data indicated that GhFLA9 may be involved in the regulation of FM and may act as a positive regulator.

Discussion
The quality of the cotton fiber mainly depends on being "long, strong and fine", where "fine" refers to micronaire, which has a direct impact on the processing of cotton fiber and the quality of the derived products. A moderate FM provides for high yarn strength and good spinning quality of cotton fiber. Therefore, the genetic improvement of cotton FM has become an important area of cotton research. In the early stage of the project, a 63 K Illumina Infinium SNP array was used to obtain 15 369 high-quality SNPs in 83 upland cotton accessions. The population was used to perform an association analysis of dynamic fiber length and oil traits, and the candidate genes were identified (Ma et al. 2018a;Ma et al. 2019), so it is also suitable for GWAS of related traits. FM of 83 upland cotton accessions used in the present study was characterized by high phenotypic variation and a significant variation range in 5 environments ( Table 1). The H 2 value of FM obtained in the present study was relatively high (86.27%) and very similar to the H 2 values (76.37%~94.00%) obtained in previous studies (Ma et al. 2020;Sun et al. 2010). These results indicated that the FM trait is mainly influenced by the genotype and is suitable for GWAS. GWAS of FM in 5 environments identified 11 stable QTLs containing 55 significant SNPs, including 4 QTLs located in the At subgenome and 7 QTLs located in the Dt subgenome ( Table 2). The mapped interval was compared with the results of QTL mapping obtained in previous studies of micronaire traits using genetic population linkage analysis or natural population association analysis. Comparison indicated that 8 QTLs identified in the present study were similar to those identified in previous studies (Chen et al. 2018;Diouf et al. 2018;Huang et al. 2017;Jamshed et al. 2016;Jia et al. 2018;Wang et al. 2017a;Wang et al. 2017b;Zhang et al. 2016 ). Huang et al. (2017) used an Illumina Infinium SNP array for GWAS of 503 upland cotton accessions and identified the stable QTL qGhMV-c26-1, which coincides with the relatively small qFM-D12-1 described in the present study. This result indicated that GWAS of FM in 83 upland cotton accessions is credible. The results of GWAS combined with transcriptome and qRT-PCR data analyses identified the genes related to fiber development, including GhTUB6 and GhFLA. GhTUB6 is predominantly expressed during the rapid elongation stage of the cotton fiber cells and synthesis of the primary cell wall (He et al. 2008). The GhFLA gene is involved in cotton fiber development and can promote the expression levels of the genes responsible for the synthesis of the primary cell wall (Huang et al. 2008). New QTLs identified in the present study were stable in multiple environments and provided a reference for cotton molecular marker-assisted selection with targeting micronaire. The stable QTL qFM-A12-1 identified in the present study has not been reported previously. This QTL contains 12 adjacent significant SNPs. Therefore, we focused on the differential expression of the genes located within this interval. The interval was narrowed according to a haplotype block and transcriptome data. The GhFLA9 gene encoding fasciclin-like arabinogalactan protein was identified. This gene is mainly expressed in the secondary wall thickening stage (20∼25 DPA) during fiber development, and the expression was positively correlated with micronaire (Fig. 4). The expression level of this gene is the highest in the Kezi cotton fibers at 10 DPA (Huang et al. 2008), which is different from the expression trend observed in the most recent G. hirsutum TM-1 transcriptome, indicating that the expression of this gene may differ between different materials. Subsequently, we performed a genotyping analysis based on the significant SNP i41990Gh. The results indicated that the genotypes of the high-micronaire line (Chuangyou-9) and the low-micronaire line (Msco-12) had significant differences. Subsequent qRT-PCR analysis of GhFLA9 in the fiber tissues of the two materials at five stages (5, 10, 15, 20, and 25 DPA) indicated that the expression of this gene was consistent with the expression in the whole transcriptome assay. In the secondary wall thickening stage (20∼25 DPA), the expression level in the lowmicronaire material was higher than that in the high micronaire material. Thus, a trend of GhFLA9 expression identified in the present study is relatively reliable.

Conclusions
GWAS of FM was performed in 5 environments, and 11 QTLs related to FM were identified, including 8 QTLs colocalized with QTLs identified in previous studies. Three QTLs were newly identified in the present study. Twenty-seven candidate genes related to FM were identified within 11 stable QTLs, and haplotype block analysis and qRT-PCR assay of the stable QTL qFM-A12-1 indicated that GhFLA9 may positively regulate the development of cotton FM.
Additional file 1: Table S1. Analysis of variance for FM.
Additional file 2: Fig. S1. Box plots of fiber micronaire in upland cotton accessions in different environments. Fig. S2. QQ plots against  6 qRT-PCR analysis of GhFLA9 during fiber development. Expression levels of GhFLA9 are shown at five developmental stages of the fiber. ** indicates significant differences at P = 0.01 genotypic and phenotypic data represent the normal distribution for genotypic and phenotypic data. The lowercase letters a, b, and c represent the QQ plots in Anyang in 2014, 2015 and 2016, respectively; the letters d and e represent the QQ plots in Aral and Sanya in 2016, respectively; and f represents the QQ plots for BLUP.