Construction of a High-Density Genetic Map via Genome Resequencing and QTL Detection for Key Fiber Traits in Asiatic Cotton

Background: Asiatic cotton (Gossypium arboreum, genome A 2 ) is one of diploid cotton species producing spinnable bers. However, few studies on the genetic mechanism of key ber traits of Asiatic cotton have been reported. Sequencing technology advancement and the release of Asiatic cotton genome made it possible to construct a high-density SNP genetic map and further untapped QTL detection. Results: The Asiatic cotton cultivars SXY No.1 and CSLZ were crossed to develop a recombinant inbred line (RIL) population with 189 lines. Whole genome resequencing technology was employed to construct a high-density genetic map that covered 1980.17 cM with an average distance of 0.61 cM between adjacent markers. Based on ber quality and yield component trait data from three environments, a total of 177 QTL were identied for 8 key ber traits explaining 5.0-37.4% of the phenotypic variance. Besides, 48 stable QTL, including 15 for upper quartile length (UQL), 18 for ber neness (FF), 1 for immature ber content (IFC), 4 for ber neps count (FNC), 3 for lint percentage (LP), 7 for seed index (SI), were detected in more than one environment. Conclusions: Using a RIL population and whole genome resequencing strategy, this study presented a high-density genetic map of G. arboreum and identied 48 stable QTL for 6 key ber traits (UQL, FF, IFC, FNC, LP, SI). Our work laid solid foundation for subsequent ne mapping of QTL for key ber traits and cloning of controlling genes.


Background
Cotton is the most important ber crop in the world. Cultivated species of genus Gossypium include A-genome diploids G. herbaceum (A 1 ) and G. arboreum (A 2 ) and AD-genome allotetraploids G. hirsutum (AD 1 ) and G. barbadense (AD 2 ) which produce spinnable ber [1]. It was believed that the allotetraploids arose from a interspeci c hybridization between an A-genome diploid and D-genome taxon within the last 1-2 million years ago [1,2]. Presently, G. hirsutum is most widely planted with more than 95% worldwide production due to its super ber quality and high yield [3]. However, the diversity of G. hirsutum germplasm base is relatively narrow [4]. In contrast, the cultivation scale of diploid cotton is very limited, mainly in India, Pakistan and China [5], but they are extensively deployed by many cotton breeders for a series of genetic and biotechnological studies owing to diverse agronomic and ber properties.
Asiatic cotton (G. arboreum L.) derived in the Indian subcontinent and has been cultivated for 5000 years [6]. During long-term natural and arti cial selection, Asiatic cotton possesses many favorable traits that are not found in the allotetraploids cultivars. For example, adaptive characteristics like remarkable drought tolerance [7], enhanced resistance against insect pests like thrips [8], bollworms and aphids and diseases such as black arm, root rot and leaves reddening [9], complete immunity to cotton leaf curl virus [10]. Some of G. arboreum cultivars produce bers with high strength and seeds with high oil content and seed index [9]. Fiber related properties are generally polygenic traits. Compared to allotetraploids, diploid cotton may facilitate to study these characters for reducing gene duplication and genetic redundancy [11]. Future efforts could utilize favorable traits found in Asiatic cotton species as donor in introgressive improvement of allotetraploid cotton. And molecular markers are expected to accelerate breeding progress with which traits can be introgressed [12].
ILs. For diploid cottons, these researches in high-density genetic map and QTL mapping relatively delayed. Du et al. [16] resequenced 243 lines of G. arboretum and G. herbaceum and identi ed 98 signi cant peak associations for 11 agronomically traits combined GWAS analysis, but candidate genes for ber quality traits were not reported. Before this study, researchers were all employed traditional molecular markers, such as RFLP, AFLP, TRAP and SSR, to construct interspeci c and intraspeci c genetic map of Asiatic cotton and to map QTL [28][29][30]. These genetic maps cannot meet the demands of accurate QTL mapping due to their large distances between adjacent DNA markers. Therefore, a high-density genetic map within G. arboreum and corresponding QTL detection are currently in demand.
In this study, we developed an Asiatic cotton recombinant inbred line (RIL) population of 189 F 2:6 lines from a cross SXY No.1 ⋅ CSLZ. Whole genome resequencing was applied to genotype RILs, which was then used to construct the high-density genetic map and detect QTL of key ber traits. Our results will facilitate to ne map the QTLs for key ber traits in Asiatic cotton and subsequently to clone the controlling genes.

Fiber trait evaluation
Because there were missing data for some lines across three environments, we nally used trait data of 179 lines for phenotypic analysis and subsequent QTL mapping. Descriptive statistics for all key ber traits i.e. upper quartile length (UQL), ber neness (FF), maturity ratio (MR), immature ber content (IFC), ber neps count (FNC), ber neps mean size (FNMS), lint percentage (LP) and seed index (SI) across three environments were summarized in Additional le 9: gure S1 and Additional le 1: Table S1.
The absolute values of kurtosis and skewness for most traits were <1.0 in three environments, except for the kurtosis values for UQL-2018CQ, FNC-2019CQ, LP-2018CQ and LP-2018HN, suggesting that most traits were normally distributed. Besides, all eight traits of population showed transgressive segregation.
Correlation analysis was conducted among the eight traits (Additional le 2: Table S2). Most traits showed signi cant correlations with other traits except for UQL-FNMS, FF-LP, FNMS-LP. One-way ANOVA showed that most traits had signi cant genetic and environmental effects (p < 0.01) except for genetic effect of FNMS (Additional le 3: Table S3).
Whole-genome sequencing and SNP identi cation Sequencing data were generated from two parents and their RIL population. In total, 3808. 40 Table S5).

Genetic map construction
By genetic linkage analysis, a high-density genetic map containing 3286 bin markers was constructed, which covered 1980.17 cM with an average distance of 0.61 cM between adjacent bin markers ( Fig. 1; Table 1; Additional le 10: Supplement S1). 13 linkage groups were corresponding to 13 chromosomes respectively in this genetic map. The largest chromosome was Chr07, consisting of 392 bins covering 207.11 cM, with an average bin interval was 0.53 cM. The shortest chromosome was Chr02, containing 135 markers spanning 106.5 cM, with an average bin interval of 0.79 cM. Chr09 harbored the largest gap that was 12.62 cM. Besides, 99.41% of the intervals between adjacent bins were less than 5 cM, indicating that bin markers were welldistributed on the genome ( Fig. 1; Table 1).

Segregation distortion analysis
Among 3286 mapped loci, 446 (13.57%) showed segregation distortion (p < 0.05) ( Table 1). The segregation distortion markers (SDMs) were unevenly distributed over the genome, forming 26 segregation distorted regions (SDRs) (Fig. 1) LG linkage group, SDM segregation distorted marker, SDR segregation distorted region Collinearity between the genetic and the physical map In order to assess the quality of the genetic map, collinearity between genetic and physical map was conducted ( Figure 2). Most genetic loci were in accordance with their positions on the reference genome sequence of G. arboreum [16] except that 5 loci of LG08 corresponded to the 5 loci on Chr02 and 60 loci located on chrtig which were not mapped to chromosomes (Additional le 6: Table S6). Furthermore, 1980.17 cM corresponded to 1.46 GB, which covered 99.24% of the genome and all chromosomes showed more than 95% coverage (Additional le 7: Table S7).

QTL mapping
A total of 177 QTL for key ber traits were identi ed in this study (Additional le 8:

Seed index QTL
Twenty-seven seed index QTL were detected on 11 chromosomes, with LOD scores ranging from 2.04 to 7.59, and explaining 5.1-17.7% of the phenotypic variance (Additional le 8:

Discussion
High-density genetic map construction A high-density genetic map is very important for the detection of new QTL, gene mapping and maker assisted selection in cotton. In previous studies, traditional molecular markers, such as RFLP, AFLP, TRAP and SSR, were widely used in most interspeci c and intraspeci c genetic map construction of Asiatic cotton [28][29][30]. However, most of these genetic maps were not enough to provide ne mapping of QTL and identi cation of candidate genes due to low polymorphism rate of traditional markers. Consequently, SNP markers with high abundance and even distribution in genome are more suitable for constructing ultra-dense genetic map. In this study, we developed a high-density genetic map of intraspeci c Asiatic cotton, containing 3286 bin markers with the percentage of < 5 cM gaps more than 99%. Meanwhile, the collinearity between physical and genetic map was good except the ve markers on Chr02 were matched to LG08 (Additional le 6: Table S6). Taken together, the quality this genetic map was good and subsequent QTL mapping of ber traits was reliable.
In fact, an ultra-dense genetic linkage map is tremendously bene cial for assembling large and complex polyploidy genomes [32]. In the present study, we assembled 60 Block markers of 13 chrtigs into genetic map based on Illumina sequencing reads (Additional le 6: Table S6). These previously unanchored chrtigs were mapped on chromosome 02, 04, 05, 06, 07, 08, 11 and 12 respectively, which covered a total of 5.62 Mb. Eventually, the Asiatic cotton genome sequences will be updated, which will bene t the integrity of genome structure and detection of candidate genes for key agronomic traits.

Segregation distortion
Segregation distortion is common in most QTL mapping populations, and is common in cotton [33,34]. Genetic causes of this phenomenon can be hybrid incompatibility and no-functional gamete formation, which related to species, population types and crosses. And sometimes marker types also lead to distortion [35][36][37]. Segregation distortion markers may link to genes or traits of interest [38][39][40]. Hence, ltering out these distortion markers may result in loss of important gene information. The present study included SDMs (p<0.05) for linkage map construction. Besides, we found the proportion of SDMs was relatively low (13.57%) and these markers were mainly derived from SXY No.1 alleles (72.42%), which showed that SXY No.1 plays a signi cant role in segregation distortion. Among 48 stable QTL identi ed in this study, only con dence interval of qUQL01.4 and qIFC13.1 located on segregation distortion regions (Figure 1), suggesting that segregation distortion markers were not linked to the majority of stable QTL.

QTL identi cation
In the present study, there were signi cant differences for ber traits between parents and their progeny population showed transgressive segregation with huge differences, which indicated that both parents had diverse alleles for these ber traits.
Obviously, each of the parents contributed equally for ber quality traits (72: 64 chromosomes. Many QTL belonged to same QTL cluster shared an overlapping region, which suggested that these overlapping regions may contain pleiotropic genes or multiple genes related to different ber traits [41][42][43]. In this study, for example, Chr13cluster-3 contained qFF13.2, qFNC13.1 and qLP13.1, sharing a nearly same con dence interval (Table 3), which indicated candidate genes in this region might control all three traits. Thus, candidate genes of clustered QTL may participate in the complex genetic network of ber development via regulating multiple ber traits [20,44].

Stable and common QTL
Variance analysis in the present study showed that the environment had signi cant effect on eight key ber traits. Consequently, QTL which were detected more than one environment were regarded as stable QTL. Our results indicated that 48 of 177 detected QTL occurred in at least two environments, including 38 QTL for ber quality and 10 QTL for lint yield, most of which (72.9%) were within QTL clusters. These stable QTL and clusters may deserve high priorities for ne mapping and identi cation of candidate genes in the future.
In addition, we compared the stable QTL with QTL listed in the COTTONGEN [45], We employed sequences of anking markers to do blast alignment in upland cotton genome [26], then the physical position of result sequences were found on COTTONGEN.

DNA isolation, library construction and high throughput sequencing
Fresh young leaves of two parents and 189 RILs were collected for total genomic DNA extraction by a modi ed CTAB method [46]. The genomic DNA was randomly cleaved into 200-500 bp fragments by ultrasonic method, then DNA library was constructed by terminal repair, addition of polyA-tails and paired-end adaptors, puri cation and PCR ampli cation of these small DNA fragments. Eventually paired-end sequencing was performed with an Illumina HiSeqTM-2500 platform (Illumina, Inc., San Diego, CA, USA) at the Biomarker Technology Co. Ltd. (Beijing, China). In order to ensure the accuracy of data, raw reads were ltered to get clean reads for subsequent data analysis. The main steps of data ltering are as follows: removing the adaptors; removing reads with >10% unidenti ed nucleotides (N); and removing the low-quality reads with >50% bases of Q phred ≤10.

SNP detection
The clean reads were aligned to the Asian cotton reference genome [16] using BWA-MEM algorithm (version 0.7.15) [47] for genomic variation analysis. The SAMtools software (version 1.6) [48] was used to convert the sequence alignment les (SAM) containing mapping results into binary BAM les, and to sort, merge and count alignment e ciency of BAM les. Subsequently, the SNP detection of parents and RILs was implemented by GATK (Genome Analysis Toolkit) software (version 3.8) [49] including Indel realignment, base recalibration, variant calling and nal SNP ltration. The SNP variation results were displayed in VCF le format and input to SnpEff (SNP effect) software (version 1.9.6) [50] to annotate SNP variation. The parameters of the above software were all default. The polymorphic SNPs which were identi ed between the parents were used to genotype calling. Before genotype calling, unquali ed SNPs were ltered out according to following criteria: the markers of intergenic region; the markers that were homozygous and inconsistent in two patents; the read depth of parental markers was less than 4fold.

Genotyping and bin map construction
As for genotype calling, the missing genotypes were imputed using k-nearest neighbor algorithm [51]. A window size of 15 SNPs, with a step size of 1 SNP, was a sliding unit for chromosome scan. When the aa: bb allele ratio was 11:4 or higher in the sliding window, the segregation pattern was aa. When the aa: bb allele ratio was 4:11 or lower in the sliding window, the segregation pattern was bb. Other cases with intermediate ratios, ab pattern was used for genotype lling and correction. The SNPs whose genotypes were aa ´ bb were employed for genetic map construction.
Recombination breakpoint was assumed when the sliding window moves on the chromosome until genotype changes. The SNP alleles between the recombination breakpoints was designated as a recombination bin. Unquali ed bins were excluded if the length of bins was less than 20 kb, and bins showing signi cant deviation from the 1:1 were also ltered out through chi-square test (p<0.01). Bin markers were used to construct linkage groups with Highmap software [52]. The SMOOTH [53] was applied to correct errors according to genotypes of parental contribution. The Kosambi mapping function was used to estimate genetic map distances [54].
QTL analysis QTL identi cation was performed with Map QTL 6.0 [55]. A threshold of log of odds ratio (LOD) was set as 2.0 [56]. Positive additive effects indicated favorable alleles derived from SXY No.1 whereas negative additive effects indicated favorable alleles derived from CSLZ. QTL identi ed in two or three environments were considered to be potential stable QTL in this study.
MapChart 2.2 [57] was used for presentation of genetic map and QTL. The nomenclature of QTL was: q + trait abbreviation + chromosome number + QTL number. QTL for the same trait across different environments were considered in the same QTL region when their con dence intervals overlapped.
Segregation distorted regions (SDRs): the red labeled part of chromosomes  Sequencing data related to this study has been uploaded to NCBI SRA database, which can be accessed through series of SRA numbers PRJNA649393.

Competing interests
The authors declare that they have no competing interests.  The genetic map and stable QTL for key ber traits in the SXY No.1 x CSLZ RIL population