QTL Mapping of Fiber- and Seed-Related Traits in Chromosome Segment Substitution Lines Derived from Gossypium hirsutum × Gossypium darwinii

A narrow genetic basis limits further improvement of modern Gossypium hirsutum cultivars. The abundant genetic diversity of wild species provides resources to solve this dilemma. In the present study, a chromosome segment substitution line (CSSL) population including 553 individuals was established using G. darwinii accession 5-7 as the donor parent and G. hirsutum cultivar CCRI35 as the recipient parent. After constructing a high-density genetic map with the BC1 population, the genotype and phenotype of the CSSL population were investigated. A total of 235 QTLs, including 104 QTLs for fiber-related traits and 131 QTLs for seed-related traits, were identified from four environments. Among these QTLs, twenty-seven were identified in two or more environments, and twenty-five QTL clusters comprised 114 QTLs. Moreover, we identified three candidate genes for three stable QTLs, including GH_A01G1096 (ARF5) and GH_A10G0141 (PDF2) for lint percentage, and GH_D01G0047 (KCS4) for seed index or oil content. These results pave the way for understanding the molecular regulatory mechanism of fiber and seed development and provide valuable information for marker-assisted genetic improvement in cotton.


Introduction
Cotton, one of the most widely cultivated cash crops in the world, mainly provides natural fiber for the modern textile industry. It is also a significant source of edible oil, feed, and biofuel [1,2]. The wide range of uses of cotton has linked it closely with people's daily lives. Given the reduction of arable land, the rapid development of the textile industry, and the demand for better clothing quality, it is becoming increasingly important to cultivate and popularize cotton varieties with superior performance, including high yield and high fiber and seed quality [3]. However, fiber- and seed-related traits are quantitative traits controlled by multiple genes and are also subject to environmental factors, resulting in slow progress in genetic improvement [4].
The Gossypium genus consists of 52 species, including 45 diploids and 7 allotetraploids [5]. G. hirsutum, a widely cultivated cotton species, produces 95% of the world's cotton [6]. However, the narrow genetic basis of modern upland cotton cultivars, which has resulted from long-term domestication and artificial selection, has limited the development of cotton research and breeding. Exploring new germplasm resources and discovering novel alleles relevant to important agronomic traits has become a continuous pursuit in cotton research [7]. Under long-term natural selection, wild cotton species have acquired excellent characteristics and adaptation mechanisms to resist various adverse factors, making them potential resources for broadening the genetic diversity of G. hirsutum [5]. There are five wild allotetraploid cotton species: G. tomentosum, G. mustelinum, G. darwinii, G. ekmanianum, and G. stephensii [8]. To date, three of them (G. mustelinum, G. darwinii, and G. tomentosum) have been used for genetic breeding and improvement [7,9,10]. G. darwinii, an allotetraploid wild cotton species distributed on the Galapagos Islands [11], is closely related to G. barbadense but quite different from the cultivated G. barbadense and G. hirsutum. Its advantages include fine fiber, drought resistance, and verticillium wilt resistance [12]. It is of great importance to explore favorable alleles from G. darwinii and apply them to upland cotton cultivar improvement.
Because barriers such as interspecific incompatibility, segregation distortion, suppression of recombination, and linkage drag must first be overcome [13-15], it is quite difficult to transfer these favorable traits directly into cultivated cotton by conventional breeding. Chromosome segment substitution lines (CSSLs) are optimal tools for transferring favorable genes into cultivated cotton. After a series of backcrossing and selfing events, CSSLs differ from each other only in the chromosome segments that were substituted by different segments from the donor parent. Except for these comparatively small substituted segments, all other parts of the chromosomes are derived from the recipient parent. The construction of CSSLs provides a foundation for the genetic dissection of important traits and facilitates their utilization in breeding by marker-assisted selection (MAS) [16]. In a previous study, an introgression line (IL) population including 105 lines was obtained by crossing G. darwinii Watt with Ejing1, and 40 QTLs controlling fiber quality were identified [7]. However, the small number of markers and the low introgression percentage limited the application of this IL population in further research and cotton breeding.
In this study, a high-density genetic linkage map was first constructed based on a BC1 population developed from CCRI35 and G. darwinii accession 5-7. A set of 553 advanced-generation CSSLs was obtained. We then genotyped these CSSLs and analyzed the distribution of chromosome fragments introgressed from G. darwinii. QTLs for cotton seed- and fiber-related traits were mapped based on phenotypic values from multiple environments. Genes in the confidence intervals were analyzed to predict the candidate genes of stable QTLs or QTL clusters.

Genetic Map Construction
A total of 6009 primers were first screened for polymorphism between CCRI35 and G. darwinii 5-7. Subsequently, 2178 primer pairs with clear polymorphism were used to genotype the BC1 population of (CCRI35 × G. darwinii 5-7) × CCRI35. Finally, a high-density genetic map including 2309 loci was constructed. This map covered 4002.16 cM with an average distance of 1.73 cM between adjacent markers. The At subgenome contained 1006 loci spanning 2094.06 cM, and the Dt subgenome contained 1303 loci spanning 1908.10 cM. The genetic length of the 26 chromosomes ranged from 104.30 cM (ChrD03) to 251.60 cM (ChrA05) (Figure 1A; Table S1). Moreover, the genetic map showed high collinearity with both the G. hirsutum and G. darwinii reference genomes (Figure 1B).
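The summary figures above can be cross-checked with a few lines of arithmetic. The sketch below, written in Python for illustration, uses only the locus counts and map lengths quoted in the text; taking the average spacing as total map length divided by the number of loci is an assumption about how the 1.73 cM figure was derived.

```python
# Locus counts and genetic lengths quoted in the text for the two subgenomes.
subgenomes = {
    "At": {"loci": 1006, "length_cM": 2094.06},
    "Dt": {"loci": 1303, "length_cM": 1908.10},
}

total_loci = sum(s["loci"] for s in subgenomes.values())
total_length_cM = round(sum(s["length_cM"] for s in subgenomes.values()), 2)

# Assumption: average spacing = total map length / number of loci.
average_spacing_cM = round(total_length_cM / total_loci, 2)

print(total_loci, total_length_cM, average_spacing_cM)
```

Under this assumption, the subgenome totals add up to the whole-map figures reported for the BC1 map.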

Introgressive Segments Analysis of the CSSLs
A chromosome segment substitution line (CSSL) population including 553 individuals was constructed in this study. A total of 551 markers evenly distributed on the 26 chromosomes were then selected to genotype the 553 CSSLs. The coverage of the introgressed segments in the genome and the percentage of genome coverage were investigated. As shown in Figure 2A, the introgressed segments covered all 26 chromosomes. The number of introgressed segments in each line ranged from 2 to 73 (Figure 2C). The genetic length of the chromosome segments introgressed from G. darwinii in each CSSL ranged from 15.77 to 914.47 cM, with an average of 321.13 cM (Figure 2D). The coverage of the introgressed segments ranged from 0.4% to 23.2%, with an average of 10.27%.
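To make the notions of segment number, segment length, and coverage concrete, the following is a minimal sketch in Python of how they could be derived from marker genotypes on one chromosome. The genotype codes (A and H for donor segments, B for the recurrent parent) follow the Figure 2 legend; the marker positions, the genotype calls, and the midpoint rule for segment boundaries are illustrative assumptions and not necessarily what GGT 2.0 computes.

```python
from itertools import groupby

def donor_segments(positions, calls):
    """Return (start_cM, end_cM) for each run of consecutive donor calls.

    positions: marker positions in cM on one chromosome, in ascending order.
    calls: one genotype code per marker; 'A'/'H' = donor, 'B' = recurrent parent.
    Assumption: a segment boundary lies midway between a donor marker and the
    neighbouring recurrent-parent marker; chromosome ends are used as they are.
    """
    segments = []
    index = 0
    for is_donor, run in groupby(calls, key=lambda c: c in ("A", "H")):
        run_length = len(list(run))
        first, last = index, index + run_length - 1
        index += run_length
        if not is_donor:
            continue
        start = positions[first] if first == 0 else (positions[first - 1] + positions[first]) / 2
        end = positions[last] if last == len(positions) - 1 else (positions[last] + positions[last + 1]) / 2
        segments.append((start, end))
    return segments

# Hypothetical chromosome with six markers spaced 10 cM apart.
positions = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
calls = ["B", "A", "A", "B", "B", "H"]

segments = donor_segments(positions, calls)
introgressed_cM = sum(end - start for start, end in segments)
coverage_pct = 100 * introgressed_cM / (positions[-1] - positions[0])
print(len(segments), introgressed_cM, coverage_pct)
```

Summing such per-chromosome values over all 26 chromosomes would give per-line totals comparable in kind to those reported above.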

Characterization of Phenotypic Performance
Due to the specific growth habit of G. darwinii, only the recurrent parent CCRI35 and the CSSL population were planted in the four environments. Descriptive statistics of the phenotypic traits of the CSSL population are presented in Table S1. Within the CSSL population, all traits were distributed continuously, with some fluctuation among environments (Figure 3). The absolute skewness of all traits in the four environments was less than one, indicating that the traits approximately followed a normal distribution. Moreover, ANOVA showed that both environmental and genotypic effects had a significant impact on all examined traits (Table S2). Correlation analysis of all traits was conducted and is visualized in Figure 4. Among the six fiber-related traits, LP and FM showed a strong negative correlation with FL and FS. A significant positive correlation occurred among FL, FS, and FE. Among the seven seed-related traits, SI was significantly positively correlated with OC and OA but negatively correlated with PC and LA. A strong negative correlation existed between OC and PC, OA, and PA, as well as OA and LA. Moreover, highly significant correlations were observed between fiber-related traits and seed-related traits. Six trait pairs (LP and PC/LA, FL and SI, FS and SI/OA, and FE and OA) showed significant positive correlations. Four fiber-related traits showed significant negative correlations with seed-related traits (LP and SI/OC/OA, FL and PA, FS and PA/SA, and FE and PA/SA/LA).
Trait abbreviations: LP, lint percentage (%); SI, seed index (g); FL, fiber length (mm); FS, fiber strength (cN/tex); FU, fiber uniformity (%); FM, fiber micronaire; FE, fiber elongation (%); PC, protein content (%); OC, oil content (%); PA, palmitic acid content (%); SA, stearic acid content (%); OA, oleic acid content (%); LA, linoleic acid content (%).
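The skewness criterion mentioned above (absolute skewness below one) can be illustrated with a short Python sketch. The text does not state which skewness estimator was used, so the moment-based definition below is an assumption, and the lint percentage values are hypothetical.

```python
def moment_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (biased, population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical lint percentage values (%) for a handful of lines.
lint_percentage = [36.1, 37.4, 38.0, 38.2, 38.9, 39.5, 40.3, 41.8]

skew = moment_skewness(lint_percentage)
roughly_symmetric = abs(skew) < 1  # criterion used in the text
print(round(skew, 3), roughly_symmetric)
```

A low absolute skewness only indicates approximate symmetry; it is a convenient screen rather than a formal normality test.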

QTL Mapping for Fiber- and Seed-Related Traits
A total of 104 QTLs, including 27 QTLs for lint percentage and 77 QTLs for five fiber quality traits, were identified with the phenotypes from four environments (Figure 5; Table S3). Of these, 19 QTLs were detected in two or more environments and were considered stable QTLs. Three QTLs (qLPA10.3, qLPD01.1, and qFED11.1) were identified across three environments, and qLPD03.1 was detected across four environments. The favorable alleles of all stable lint percentage QTLs were contributed by CCRI35, whereas the favorable alleles of the stable fiber-quality-related QTLs were derived from G. darwinii 5-7.
Based on the positions of the 235 fiber- and seed-related QTLs, we found 25 QTL clusters made up of 114 QTLs (Figure 5; Table 1). These clusters were distributed on 17 chromosomes, nine in the At subgenome and eight in the Dt subgenome. Eleven of the twenty-five clusters contained five or more QTLs. Both A01-cluster-2 and D11-cluster-1 included seven QTLs. D03-cluster-1 contained nine QTLs, six of which were found in three or more environments. Eleven of the QTL clusters contained at least one stable QTL, and five QTL clusters contained more than two stable QTLs. Four QTL clusters (A02-cluster-1, A09-cluster-1, A12-cluster-1, and D01-cluster-2) contained only seed-related QTLs, while the others were associated with both fiber- and seed-related traits. These QTL clusters, combined with the stable QTLs above, provide potential loci for further functional research and cotton breeding.

Functional Annotation of Candidate Genes in QTL Clusters
According to the annotation of the reference genome [17], the confidence intervals of the 25 QTL clusters contained 10,747 genes. Among these, 6172 genes were expressed during fiber or ovule development in CCRI35 (Table S4). Based on their expression profiles at the fiber and ovule development stages, the 6172 genes were grouped into 16 clusters (Figure 6A). The genes in the 16 clusters showed different expression patterns at different developmental stages.
Seed oil accumulation reportedly occurs mainly at the late stage of ovule development [18]. In Cluster 2, Cluster 8, and Cluster 14, genes expressed at 8 DPA and 18 DPA in ovules were mainly related to seed size (Figure 6A). Cell cycle arrest (GO:0007050) and glycogen biosynthetic process (GO:0005978) were significantly enriched (Figure S3; Table S8). Genes in Cluster 10 and Cluster 16, which showed high expression at the late stage of ovule development, may be associated with seed size and oil accumulation (Figure 6A). GO terms related to seed development and oil accumulation were significantly enriched, such as the xyloglucan metabolic process (GO:0010411), cellular glucan metabolic process (GO:0006073), xyloglucosyl transferase activity (GO:0016762), and lipid binding (GO:0008289) (Figure 6C; Table S9).

Utilization of G. darwinii and Its Related CSSLs
Compared with G. barbadense and G. tomentosum, research on G. darwinii lags behind. G. darwinii possesses a variety of excellent characteristics for adapting to harsh environments, such as fine fiber, drought resistance, and verticillium wilt resistance. In the present study, chromosome segments spanning almost the entire G. darwinii genome were introgressed into G. hirsutum through multiple generations of backcrossing (Figure 2A). Based on this CSSL population, a large number of QTLs controlling fiber- and seed-related traits were identified (Table S3). QTLs associated with drought resistance and verticillium wilt resistance will be analyzed in future research. These QTLs and CSSLs provide a foundation for in-depth research on fiber and seed development. Moreover, some CSSLs possessing favorable alleles for one or more target traits could be directly utilized as superior varieties.

Sources and Effects of Favorable Alleles
Introducing favorable alleles from other species into G. hirsutum cultivars is one of the major strategies in cotton breeding practice. Throughout this process, the identification of favorable alleles is a crucial step. A number of favorable alleles have been reported in G. barbadense, G. mustelinum, and G. tomentosum [9,10,19-23], which are viable resources for improving the fiber quality of G. hirsutum. In this study, a total of 235 QTLs associated with 13 fiber- and seed-related traits were identified. Among these QTLs, the favorable alleles of 102 were from G. darwinii and those of 133 were from CCRI35. Most of the favorable alleles for lint percentage were derived from the G. hirsutum cultivar CCRI35 (22/27), which may be the result of the pursuit of fiber yield during the domestication of cotton. Meanwhile, most of the favorable alleles for fiber quality were from G. darwinii (60/77). It is noteworthy that for the detected stable QTLs related to fiber quality (qFLA01.1, qFLA10.1, qFLA11.1, qFLD05.1, qFSA05.2, qFSA11.1, qFSD10.1, qFSD11.1, qFMA01.1, qFEA01.1, qFEA05.1, qFEA11.2, and qFED11.1), the favorable alleles were all derived from G. darwinii (Table S3). Moreover, G. darwinii also provided the favorable alleles of 37 seed-related QTLs, including stable QTLs (qSIA10.1, qSID01.1, qSID03.2, qSID05.2, and qOCD03.1) associated with seed index and oil content. The abundant favorable alleles of G. darwinii can help to improve fiber quality, seed size, and oil content and are of great research value for the improvement of upland cotton varieties.

Comparison of CSSLs between G. darwinii and G. barbadense
Among the tetraploid cottons, G. darwinii and G. barbadense show the most recent divergence (~0.20 Ma) and are considered to be descendants of a common ancestor [11,24]. The two species are thus closely related but have also diverged to some extent. In our previous study, a CSSL population was established using the G. barbadense cultivar Pima S-7 as the donor parent and the G. hirsutum cultivar CCRI35 as the recipient parent [23]. It shares a common recurrent parent (G. hirsutum cultivar CCRI35) with the population developed in this study. The introgressed segments of both G. barbadense and G. darwinii span almost the whole genome in their respective CSSL populations (Figure 2). Totals of 105 and 104 QTLs were identified in the CSSL populations of G. barbadense and G. darwinii, respectively. Among these QTLs, 34 (~32%), including 9 QTLs for lint percentage and 25 QTLs for fiber quality, were detected in both CSSL populations. By comparison, fewer fiber-related QTLs were shared with the G. tomentosum CSSL population [10]. This may reflect the closer genetic relationship between G. darwinii and G. barbadense than between either of them and other species. Meanwhile, the QTLs specific to each CSSL population indicate the divergence between G. darwinii and G. barbadense. This divergence would provide novel favorable alleles for upland cotton improvement. Moreover, qLPD03.1 was also detected in multiple populations with different male parents, including wild and semi-wild species [6,10,23,25]. This indicates that the allele of qLPD03.1 is an ideal allele for improving fiber yield and may mark a key watershed in the domestication of upland cotton cultivars.

Identification of Candidate Genes Associated with Stable QTLs
According to the physical length of the confidence intervals, three stable QTLs with relatively small intervals were selected for candidate gene identification. qFLA01.1, a member of A01-cluster-2, was positioned at 13.79-34.21 Mb on A01. An auxin response factor associated with auxin signaling (ARF5, GH_A01G1096) was located in this region. There is an SNP in the coding region of GoARF5 between the two parents. It was reported that GhARF5 regulates the expression of GhROP7/GhRAC13 and thereby affects the onset of secondary growth [26]. In A10-cluster-1, two stable QTLs (qLPA10.1 and qSIA10.1) were mapped to a 1.97 Mb interval on A10, where PDF2 (GH_A10G0141), encoding a homeobox-leucine zipper protein, was located. Three nonsynonymous mutations were found between G. hirsutum and G. darwinii. Cotton PDF2 was highly expressed in the ovule epidermis and fiber cells. Knockout of PDF2 significantly decreased the number of fiber initials on 0 DPA ovules [27]. Therefore, GoARF5 and GoPDF2 might be the genes underlying qFLA01.1 and qLPA10.1, respectively, and play a key role in regulating fiber development.
qOCD01.1, detected in three environments and overlapping with qLPD01.1 and qSID01.1 in D01-cluster-1, was anchored at 165,366-1,717,322 bp on D01. This region contains KCS4 (GH_D01G0047), a gene encoding an enzyme involved in very-long-chain fatty acid (VLCFA) synthesis, which is a branch point in the regulation of triacylglycerol synthesis [28]. A single-base deletion in the first exon results in a frameshift and thus an extension of 137 amino acids in G. darwinii 5-7. Therefore, KCS4 might be the gene underlying qOCD01.1, which participates in oil accumulation during cotton seed development.

Plant Materials
The BC1 population for genetic map construction was developed by crossing and backcrossing the donor parent G. darwinii accession 5-7 with the recurrent parent CCRI35 in Chongqing, China, in 2013. G. darwinii accession 5-7 was provided by the Institute of Cotton Research of the Chinese Academy of Agricultural Sciences. CCRI35 is a G. hirsutum cultivar characterized by high yield and disease resistance [29]. After further backcrossing with the recurrent parent CCRI35 and selfing, the CSSL (BC3F2) population, including 553 individuals, was generated in Chongqing, China, in 2017. Subsequently, the generations from BC3F2:3 to BC3F2:6 were planted from 2018 to 2021 in Chongqing, China. The weather information for cotton cultivation from 2018 to 2021 is provided in Table S10.
Statistical analysis and analysis of variance (ANOVA) were performed in Microsoft Excel (Office 2016), and R 4.3.0 was used for correlation analysis and visualization.
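As an illustration of the correlation analysis, the sketch below computes a correlation coefficient between pairs of traits. It is written in Python rather than R, it assumes the Pearson coefficient (the type of coefficient is not stated in the text), and all trait values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical trait values for five lines (not data from this study).
fiber_length = [27.5, 28.1, 28.9, 29.4, 30.2]      # FL, mm
fiber_strength = [28.0, 28.6, 29.9, 30.1, 31.5]    # FS, cN/tex
lint_percentage = [41.2, 40.5, 39.8, 38.9, 38.1]   # LP, %

r_fl_fs = pearson_r(fiber_length, fiber_strength)
r_fl_lp = pearson_r(fiber_length, lint_percentage)
print(round(r_fl_fs, 3), round(r_fl_lp, 3))
```

In this toy data set, FL rises with FS and falls as LP rises, which mirrors the direction of the relationships described for the real population.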

Genetic Map Construction and Collinearity Analysis
The primers for genetic map construction were selected from a high-density interspecific genetic map between G. hirsutum and G. barbadense [30]. The primers were first screened for polymorphisms between CCRI35 and G. darwinii accession 5-7. Polymorphic markers were then used to genotype the BC1 population and to construct the genetic linkage map using JoinMap 4.0 [31].
The physical map was constructed according to the physical positions of the polymorphic markers. The physical positions of all markers were obtained by aligning the primer sequences to the G. hirsutum and G. darwinii reference genomes using BLAST+ 2.15.0 [24]. The Python library JCVI was then used to analyze and illustrate the relationship between the genetic map and the two physical maps [32].

Detection of Introgressive Chromosome Segments
Based on the genetic map constructed in this study, markers evenly distributed on the genetic map were selected to genotype the 553 individuals of the CSSL population. The average interval between adjacent markers was approximately 10 centimorgans (cM). GGT 2.0 software [33] was applied with default parameters to analyze the characteristics of the introgressed chromosome segments (the background recovery rate of the CSSLs and the number and length of introgressed segments).

Identification of Fiber- and Seed-Related QTLs
MapQTL 6.0 was applied to map QTLs using the multiple-QTL model (MQM), with a threshold of LOD ≥ 2.0 [34]. Positive additive effects indicated that G. darwinii contributed the favorable alleles of the QTLs, whereas negative additive effects indicated that CCRI35 contributed the favorable alleles. QTLs were named according to their trait and their order on the chromosome. A region with three or more QTLs was regarded as a QTL cluster.
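The two rules stated above, the sign of the additive effect deciding the source of the favorable allele and three or more QTLs in a region forming a cluster, can be sketched in Python as follows. The QTL names, positions, and effects are hypothetical, and defining a "region" as a set of overlapping confidence intervals on the same chromosome is an assumption, since the text does not specify how regions were delimited.

```python
from dataclasses import dataclass

@dataclass
class QTL:
    name: str
    chromosome: str
    start: float     # start of the confidence interval
    end: float       # end of the confidence interval
    additive: float  # additive effect reported by the mapping software

def favorable_parent(qtl):
    """Positive additive effect -> G. darwinii; negative -> CCRI35."""
    if qtl.additive > 0:
        return "G. darwinii"
    if qtl.additive < 0:
        return "CCRI35"
    return "undetermined"

def find_clusters(qtls, min_size=3):
    """Group QTLs with overlapping intervals; keep groups of >= min_size."""
    by_chromosome = {}
    for qtl in qtls:
        by_chromosome.setdefault(qtl.chromosome, []).append(qtl)
    clusters = []
    for chromosome in sorted(by_chromosome):
        group, group_end = [], None
        for qtl in sorted(by_chromosome[chromosome], key=lambda q: q.start):
            if group and qtl.start <= group_end:
                group.append(qtl)
                group_end = max(group_end, qtl.end)
            else:
                if len(group) >= min_size:
                    clusters.append(group)
                group, group_end = [qtl], qtl.end
        if len(group) >= min_size:
            clusters.append(group)
    return clusters

# Hypothetical QTLs (names and values are placeholders, not study results).
qtls = [
    QTL("qT1", "A01", 10, 20, 0.5),
    QTL("qT2", "A01", 18, 30, -0.2),
    QTL("qT3", "A01", 25, 40, 0.1),
    QTL("qT4", "A01", 60, 70, 0.3),
    QTL("qT5", "D03", 5, 15, -0.4),
]
clusters = find_clusters(qtls)
print([[q.name for q in c] for c in clusters])
print({q.name: favorable_parent(q) for q in qtls})
```

Here the first three QTLs on A01 chain together through overlapping intervals and form one cluster, while the isolated QTLs do not.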

Functional Annotation of Candidate Genes
We took the 99% confidence intervals of stable QTLs or QTL clusters as candidate regions. The genes in the candidate regions were obtained according to the annotations of the G. hirsutum [17] and G. darwinii [24] reference genomes. Genes expressed in fibers or seeds were selected for further analysis based on the RNA-seq data of CCRI35 [10,23]. The expression profiles of the selected genes were analyzed using the R package Mfuzz and visualized using R 4.3.0. Gene Ontology annotations were performed using the Cotton Functional Genomics Database (CottonFGD) (https://cottonfgd.net/analyze/) [35].

Conclusions
In this study, 553 CSSLs carrying one or more segments of G. darwinii were developed. A total of 235 QTLs were identified for fiber- and seed-related traits. Of these, twenty-seven QTLs were detected in two or more environments, and candidate genes for three of them were further identified. The results of this study provide a basis for exploring the molecular mechanism of fiber and seed development and for marker-assisted genetic improvement in cotton.

Figure 1. The interspecific genetic map of the (CCRI35 × G. darwinii 5-7) BC1 population. (A) The genetic map of the (CCRI35 × G. darwinii) BC1 population; (B) collinearity analysis between the genetic map and the two associated physical maps of G. hirsutum and G. darwinii.

Figure 2. Genetic constitution and introgressive segments of the CSSLs. (A) Distribution of introgressed segments in the CSSLs on the 26 chromosomes. A and H represent homozygous and heterozygous chromosome segments from the donor parent G. darwinii 5-7, respectively; B represents homozygous chromosome segments from the recurrent parent CCRI35. (B-D) Genetic background recovery rate and the number and length of the introgressed segments in the CSSL population.

Figure 5. QTLs detected on all chromosomes. I, II, III, and IV indicate that QTLs were detected in one, two, three, and four environments, respectively.

Figure 6. Expression patterns of candidate genes in 16 QTL clusters in CCRI35. (A) Expression profile of candidate genes. The color represents the density of genes in this cluster. Green and red represent low and high density, respectively. (B) GO annotation of candidate genes associated with fiber initiation. (C) GO annotation of candidate genes associated with seed development and oil accumulation.

Table 1. QTL clusters identified in the CSSLs across multiple environments.