Construction of a Genetic Map by Resequencing and Speci c Locus Ampli ed Fragment sequencing (SLAF-seq) for QTL ne mapping of plant height in upland cotton (Gossypium hirsutum L.)


 Background: Gossypium hirsutum (upland cotton), the most widely cultivated cotton species in the world, is an important raw material for the textile industry. Using high-throughput sequencing to construct high-density genetic maps can be widely used in quantitative trait locus (QTL) mapping and molecular marker-assisted breeding.Results: In this study, an F2 population was used to construct a genetic map by parental resequencing and progeny SLAF-seq. The F2 population consisted of Gossypium hirsutum L. cultivars: CCRI 49 and 396289. The genetic map contained 4,607 single nucleotide polymorphisms markers, which overlapped a total length of 3,063.4 cM with an average genetic distance of 0.898 cM between adjacent markers. A high-density genetic map was used to map the QTLs of plant height in four environments. 16 QTLs were obtained, which could explain 2.07%–19.04% of the phenotypic variation. A total of 1,028 candidate genes were identified in the confidence interval and were categorized according to function through cluster of orthologous groups analysis, gene ntology analysis, and Kyoto Encyclopedia of Genes and Genomes analysis. Within the QTLs confidence interval, the D05 chromosome(ChrD05) was finely mapped using Mutmap-like strategy, and the reliability was validated by qRT-PCR of 18 candidate genes . Finally, we obtained 14 candidate genes that are most likely to be related to plant height.Conclusions: This study provides a successful application of parental sequencing and progeny SLAF-seq strategy in the genetic map of upland cotton in an F2 population. This study provides theoretical support for molecular marker-assisted breeding and plant height-heterosis in upland cotton and provides a fine mapping scheme for QTLs. In addition, the combination of multiple analytical methods also provided a solution for QTL fine mapping.


Background
Upland cotton (Gossypium hirsutum L.) is the largest cultivated species. It is characterized by high yield and good ber quality, thus is an important raw material in the textile industry and also closely related to people's livelihoods (Dass et al. 2016;Chen et al. 2007;Li et al. 2013). Higher labor costs promote the continuous improvement of the mechanization of cotton; therefore, there is a demand for breeding "mechanized picking cotton". Plant type is an important index to evaluate whether cotton is suitable for mechanical harvesting, and plant height is an important component of plant type. For example, if plants are too short, the spacing between cotton bolls is smaller, affecting the cotton picking rate; if plants are too tall, bolls ripen late and the vigorous vegetative growth is not conducive to cotton bolting (Qi et al. 2017). One way to breed plants with desirable traits is through molecular marker technology, which is a strategy used to construct genetic linkage maps (Luo et al. 2001;Chen and Zhang 2009). Based on the construction of genetic linkage maps, functional genes can be identi ed by locating the quantitative trait loci (QTLs) of the target trait and preparing for molecular polymerization breeding (Zhang et al. 2016c). Through molecular marker-assisted selection (MAS), the genotypes of target traits can be screened directly to improve the breeding e ciency and speed up the breeding process (Guo et al. 2005).
With the progress of high-throughput sequencing technology and the reduction of sequencing cost, the genome sequences of Raymond cotton (Gossypium Raimondii) , Asian cotton (Gossypium arboreum) (Li et al. 2014), and upland cotton (Gossypium hirsutum) (Li et al. 2015;Zhang et al. 2015b) have been published. To identify a large number of polymorphic markers, genotyping and construction of high-density genetic mapping and other detailed sequencing work (Davey et al. 2011) should be conducted. Based on the availability of advanced recombinant inbred lines (RIL) or F 2 populations, genotyping has been used to construct high-density genetic maps (Huang et al. 2009;Xu et al. 2013;Wang et al. 2015b;Mun et al. 2015). Re-seq is a technique that can be used to nd a large number of genetic markers. In addition, digestion-based simpli ed genome sequencing has also been used, including restriction site associated DNA sequencing (RAD-seq) (Baird et al. 2008), genotyping-by-sequencing (GBS-seq) (Elshire et al. 2011), and speci c-locus ampli ed fragment sequencing (SLAF-seq) (Sun et al. 2013). SLAF-seq has the advantage of avoids repetitive genome sequences and highly accurate, low cost, and rapid (Sun et al. 2013), and widely used in the construction of high-density genetic maps and has been applied in many species. Jiang et al. (Jiang et al. 2015) constructed a high-density genetic map of wax gourd (Benincasa hispida(Thunb.) Cogn) by using SLAFseq, the map spanned 2,172.86 cM with 4,607 polymorphic markers and an average distance between adjacent markers for 0.49 cM. Mao et al. (Mao et al. 2015) constructed a high-density genetic map of rice (Oryza ru pogon) by using SLAF-seq, the total length of map was 1537.1 cM with 4740 SLAFs and the average distance between adjacent markers was 0.32 cM. Zhang et al.(Zhang et al. 2016a) constructed a high-density genetic map of soybean (Glycine max (L.) Merrill), the map, spanning 3020.59 cM in length, contained 6159 markers and with an average distance of 0.49 cM between adjacent markers.
In this study, an F 2 population was obtained by crossing cultivars CCRI 49 and 396289. We attempted to use this population to construct high-density genetic maps and re ne the mapping of plant height QTLs by using linkage analysis and correlation analysis. Hoping to nd candidate genes related to plant height. This study will enrich the genetic map of upland cotton, will provide theoretical support for molecular marker-assisted breeding, and will provide a successful application of the parental resequencing and progeny SLAF-seq strategy in an F 2 population of upland cotton. At the same time, the combination of multiple analytical methods will provide a solution for ne QTL mapping.

Methods
Plant material and phenotypic data analysis An F 2 population was obtained from an intra-speci c cross between two cultivars (lines): CCRI 49 and 396289. CCRI 49 is a cultivar with shorter, excellent ber quality, while cultivar 396289 has the characteristics of higher, stress resistance and good adaptability, and 396289 is signi cantly higher than CCRI 49 on plant height. In the summer of 2015, CCRI 49 was crossed with 396289 to get F 1 seeds at Anyang, Henan Province. The F 1 seeds were plant during winter in Sanya, Hainan Province,and selfpollinated to produce F 2 seeds. The F 2 seeds were plant and self-pollinated to get F 2:3 seeds in 2016. 160 F 2:3 families were selected and planted in three environments in 2017. A survey of phenotypic data was conducted in late September at 2016 and 2017. The plant height of the F 2 population was taken as the measured value, while the plant height of the F 2:3 pedigree was taken as the average value of the plant line. In 2016, the F 2 population was planted in Anyang, Henan Province, China. In 2017, an F 2:3 population was planted in three locations: Anyang, Henan Province; Anqing, Anhui Province, China; and Alaer, Xinjiang Autonomous Region, China. All four environments were hereinafter referred to as 16AY, 17AY, 17AQ, 17ALE, respectively. In 16AY, plants were grown in single-row plots, with 0.8-m row distances and 7-m row lengths. In 17AY, plants were grown in with 0.8-m row distances and 3-m row lengths. In 17AQ, plants were grown in 1.2-m row distances and 5-m row lengths. In 17ALE, plants were grown in 0.6-m row distances and 5-m row lengths. One-way analysis of variance (ANOVA) was used to test the signi cance of the height of the two parents. Descriptive statistics of the F 2 population and F 2:3 population were performed using the program Microsoft EXCEL 2010.

DNA Extractions
Young leaves of parents and the F 2 populations were sampled in July 2016 and stored at -80 °C. Genomic DNA was extracted following the CTAB method (Paterson et al. 1993). Agarose gel electrophoresis (1%) was used to check DNA integrity and a Nanodrop 2000 UV-vis spectrophotometer (Thermo Fisher Scienti c, Waltham, MA, USA) was used to measure DNA purity.

Library Construction And High Throughput Sequencing
Paired-end (PE) and paired Solexa libraries were prepared according to the sequencing requirements of the resequencing platform (Illumina, Hayward, CA, USA). Genomic DNA from the CCRI 49 and 396289 cultivars were randomly interrupted. The acquisition and puri cation of DNA fragments, as well as the speci c process of DNA library construction was conducted following the methods described by Zhang et al. (Zhang et al. 2016b). The resulting DNA library was Solexa sequenced using the Illumina PE150 (Illumina) platform at Beijing Biomarker Technologies Co., Ltd. (Beijing, China; http://Biomarker.com.cn/). Low-quality reads (< 30), reads with adaptor sequences, and duplicate reads were ltered out, which was done using the Burrows-Wheeler Alignment (BWA) software (Zhang et al. 2016b;Li and Durbin 2010). SNPs were detected using the GATK software (Xu et al. 2012) toolkit. To ensure the accuracy of SNP detection based on the results of clean reads of the alignment of upland cotton reference genomes (http://mascotton.njau.edu.cn/info/1054/1118.htm), picard was used for duplication, and GATK for local realignment and base recalibration. After a single nucleotide polymorphism (SNP) was detected using GATK, it was ltered and the nal SNP site was obtained. We followed the best practice methods described by the o cial GATK website (https://www.broadinstitute.org/gatk/guide/best-practices.php).
One of the advantages of SLAF sequencing is that it is possible to use the bioinformatics method to mimic the enzymatic cleavage process according to published genomic sequences to obtain a suitable enzymatic cleavage protocol (Sun et al. 2013;Zhang et al. 2015a). For the upland cotton F 2 population, two enzymes (RsaI and HaeIII, NEB, USA) were used to digest the genomic DNA, the length of the digested fragment was 314 bp to 344 bp. The details of the SLAF-seq strategy was described by Sun et al. (Sun et al. 2013) and Zhang et al. (Zhang et al. 2015a). Pair-end sequencing was conducted using the Illumina platform at the Beijing Biomarker Technologies Corporation.

SNP Grouping And Genotyping
Speci c procedures for the identi cation and genotyping of progeny SLAF markers are as follows. In short, the low-quality reads were ltered out (quality score < 30e), and the remaining reads were classi ed into the progeny by diploid barcode sequencing. Each high-quality read was then trimmed at the 5-bp terminal location. Then, 150 bp pair-end clean reads were obtained from the same sample, and the BWA software (Li and Durbin 2010) was used to map the read on the published genome of G. hirsutum . Sequences with greater than 95% similarity were de ned as one SLAF locus (Zhang et al. 2015a). Alleles of each SLAF were de ned using the minor allele frequency (MAF) assessment (Mei et al. 2017). Upland cotton is a tetraploid species; therefore, it can only have a maximum of eight alleles at one locus. If the SLAF group contained more than eight alleles or only one allele, it was considered repetitive or not polymorphic and ltered out. Therefore, SLAFs with 2-8 alleles were identi ed as polymorphic and divided into eight separate modes as following: ab × cd, ef × eg, hk × hk, lm × ll, nn × np, aa × bb, ab × cc, and cc × ab. The F2 population used here was derived from two homozygous inbred lines, so only the SLAF tags that had segregation patterns of aa × bb were used for subsequent analysis.

Genetic Map Construction And Evaluation
Genetic maps were constructed using Highmap mapping software (Liu et al. 2014) and referenced to the complete upland cotton genome ). Highmap mapping software was used to conduct a two-point chain analysis and calculate the reorganization rate and LOD score between two markers. A cluster analysis was performed using the shortest distance method, and all of the markers were divided into multiple linkage groups. Using linkage groups as a unit, the two-step mapping method (Zhang et al. 2015a) based on an optimized high density map was used to obtain the linear arrangement of markers in the linkage group, and the genetic distance between adjacent markers was estimated and nally mapped onto a genetic map. Segregation distortion markers (SDM) are common, and some SDMs (χ2 test, p < 0.001) can be selected for genetic map construction. Genetic map construction followed the methods described by Zhang et al.(Zhang et al. 2015a) and Zhang et al. (Zhang et al. 2016c). In order to construct the genetic map accurately, a haplotype map, heat map, and collinearity were used to assess the quality of the genetic map (Zhang et al. 2016c;Liu et al. 2017a).

QTL Mapping And Mutmap-like Analysis
The R/QTL package (Broman and Sen 2009) was used to identify QTLs for plant height together with the F 2 linkage map with F 2 and F 2:3 phenotypic data from four environmental treatments (16AY, 17AY, 17AQ, and 17ALE). Plant height QTL mapping was conducted using composite interval mapping (CIM) (Zeng 1994). The permutation test (PT) signi cance threshold was set to 1,000 times and the positions were kept in the default mode. QTLs were named as described by Sun et al. (Sun et al. 2012).
According to the standard described by Stuber et al. (Stuber et al. 1987), the QTL mode of action was determined. When the D/A absolute value was less than 0.20, it was an additive effect; 0.21-0.80 was a partially dominant effect, 0.81-1.20 was a dominant effect; and > 1.20 was an overdominance effect.
To further the ner mapping of the QTLs controlling plant height, a Mutmap-like strategy was used to identify SLAF-tagged polymorphisms. Mutmap was exploited on the basis of bulked segregation analysis (BSA) and the contribution of the allele to the mutant phenotype was determined by calculating the parameters (SNP-index) (Takagi et al. 2015). According to the Mutmap-like strategy, sequencing data were grouped and analyzed. On the basis of the segregation of plant height of the F 2:3 population, individuals with extreme plant height were selected from the F 2 population. To get high quality SLAF markers for each individual, these markers were used to build two pools (the tallest pool and shortest pool) according to plant height. By calculating the Δ (SNP-index) value of each marker, the QTLs of the plant height were further speci ed. The details of the Mutmap-like strategy are given in Zhang et al. (Zhang et al. 2015a).

Screening And Annotation Of Candidate Genes
After con rming the identity of the QTLs, the sequence information from the parents in the QTL region was extracted and the open reading frame (ORF) was predicted to obtain the gene information of the candidate region. A bioinformatics analysis was conducted to determine the non-synonymous mutation gene between parents as the nal candidate gene. Using BLASTp (Camacho et al. 2009) and BLAST2GO (A et al. 2005), the data was aligned to the previously published sequence of the upland cotton genome , and annotation using multiple databases was performed on the coding genes in the con dence intervals. Candidate gene products were orthologally classi ed using the Cluster of Orthologous Groups of proteins (COG) database. Using the Gene Ontology consortium (GO) database, candidate genes were enriched and analyzed from the aspect of biological processes, molecular functions, and cellular components. Pathway enrichment analysis of candidate genes was conducted using Kyoto Encyclopedia of Genes and Genomes (KEGG) database.

Real-time Fluorescence Quantitative PCR(qrt-pcr)
Total RNA from the cultivars CCRI49 and 396289 was extracted using an RNAprep Pure Plant Kit (Polysaccharides & Polyphenolics-rich, Tiangen, China). Total RNA was reverse transcribed using PrimerScript™ RT reagent Kit (Perfect Real Time, TaKaRa, Japan) to obtain cDNA for qRT-PCR. Real-time PCR was performed using an UltraSYBR Mixture (Low ROX, CWbio, China) according to the manufacturer's protocols. Speci c primers for these candidate genes were designed using the Primer-BLAST NCBI online tool (Table S1). Real-time uorescence quantitative PCR (qRT-PCR) was performed on an CFX96TM PCR instrument (Bio-Rad, USA). Three biological and technical replicates were performed in the qRT-PCR test to validate the results.
Relative gene expression levels were quanti ed by the 2 −ΔΔCt method (KJ and TD 2001).

Statistics of plant height of parents and offspring
Results from the ANOVA showed that the P-value was 0.0062, indicating that there was a signi cant difference in plant height among the parents. The descriptive statistics of the F 2 population and F 2:3 pedigree and their parents in the four tested environments are shown in Table 1. The skewness of plant height in all four environments was < 1, indicating a nearly normal distribution of the phenotype.  Table 2). The reads from the 160 offspring ranged from 2.71 M to 9.81 M, and the average read was 4.94 M (Table S2). The average sequencing depths from CCRI49, 396289, and the offspring were 30.10×, 27.54×, and 16.93×, respectively ( Table 2). The number of SNPs obtained from CCRI49, 396289, and the offspring were 1,619,902, 1,613,499, and 105,168, respectively. The developed parents and offspring SNPs were compared, the SNPs that had no polymorphism in the progeny and the SNPs that lack more and show partial segregation in the population were ltered out, with the remaining 2,207,531 SNPs used for genotyping. According to the gene coding rules (Mei et al. 2017), 1,103,837 of the 2,207,531 SNPs were successfully coded and divided into ve separation modes (ef × eg, lm × ll, nn × np, aa × bb, and hk × hk) (Fig. 1). Since the parents (CCRI49 and 396289) were homozygous inbred lines, only 620,378 SNPs (aa × bb) were used to construct the genetic map.

Construction Of The Genetic Map With SNPs
In order to ensure the quality of the genetic map and make the markers in the linkage group as uniform as possible, the polymorphic SNP markers were ltered out according to the following rules: (1) the markers with a depth of less than 4 × were omitted; (2) the selected SNPs cover at least 50% of offspring individuals; (3) segregation distortion markers (P < 0.001) were omitted. Therefore, a total of 5,182 SNPs were used for mapping. These SNPs were divided into 26 linkage groups by comparison with the reference genome , and each chromosome was a linkage group. The admixture LOD(MLOD) values between every two markers were calculated. The markers with MLOD values lower than three were ltered out, and 4,607 upper mapping SNPs were obtained (Table S3). The upper mapped rate was 88.90%. A genetic map with a total distance of 3,063.4 cM was obtained, and the average genetic distance was 0.898 cM between adjacent markers (Fig. 2, Table 3). The number of markers on each chromosome ranged from 22 (ChrA06) to 650 (ChrD06), the genetic distance ranged from 25.31 cM (ChrD05) to 192.16 cM (ChrA13), and the average genetic distance ranged from 0.18 cM (ChrD06) to 1.66 cM (ChrA06). The largest gap on the map was 18.85 cM, located on ChrA09. The proportion of the gap was < 5 cM, which was more than 90%, except the gap in ChrA06 (80.95%), while ChrD05 and ChrD11 had no gap more than 5 cM (Table 3).

Quality Assessment Of The Genetic Map
Segregation distortion markers are common and can affect map construction and QTL mapping (Zhang et al. 2016c). Among the 4,607 mapped SNPs, ve of them were segregation distortion markers (P < 0.001). The segregation distortion markers are located on ChrA09 and ChrA10, and the numbers of the two chromosomes were three and two, respectively (Table 3).
A haplotype map can be used to display recombination events and missing data for each individual (Liu et al. 2017a). In this study, haplotype maps were obtained from each of the 160 offspring using 4,607 SNPs (Fig. S1). From these haplotype maps, the percent of double crossover sites was less than 0.30%, of which the maximum was only 0.28% at ChrD11, and for ChrA03, ChrA06, and ChrD08 there were no double crossover sites. There was no missing data from the 26 chromosomes (Table 3). This indicates that the source of the larger segments of each individual in the progeny is consistent and the genetic map is of high quality.
reorganization analysis between the markers (Liu et al. 2017a). The heat map produced by this study illustrates that the linkage between the adjacent markers on each linkage group is very strong, and with the increase of distance, the linkage between the markers gradually weakened (Fig. S2). This indicates that the sequence of the markers is correct.
Collinearity analysis was performed by locating the genomic position and genetic map of the mapped markers. Spearman correlation coe cients of each linkage group and physical map were calculated and this data is shown in Table 3. The closer the Spearman coe cient is to 1, the better the collinearity between the map and the physical map (Table 3 and Fig. 3). Most of the linkage groups are consistent with the genome, indicating that the collinearity is good and the calculation accuracy of the genetic recombination rate is high. progeny. The genetic distance of the 16 QTLs ranged from 0 to 3.13 cM, while the physical distance of the alignment to the reference genome ranged from 0 to 7.72 Mb. The 16 QTLs explained 2.07-19.04% of the phenotypic variation. Of the 16 QTLs, three QTLs, qPH-D05-1, qPH-D05-2, and qPH-D06-1, explained more than 10% of the phenotypic variation. Their LOD scores were all greater than ve, 7.50, 5.37 and 5.13, respectively. Since the additive effects of qPH-D01-1, qPH-A01-1, and qPH-D09-1 are negative, they have a negative additive effect, and their favorable alleles are derived from the male parent, CCRI49, which reduced plant height. The other QTLs had positive additive effects, with favorable alleles from the female parent, cultivar 396289, which had the effect of increasing plant height. qPH-A03-1, qPH-D01-1, qPH-A01-1, qPH-D04-1, qPH-D06-1, and qPH-D09-1 all had a negative dominant effect, and all other QTLs had a positive effect. The negative dominance effect of qPH-D01-1 was the highest, at -7.62, while the positive dominance effect of qPH-D07-1 was 4.71. The effect of the QTL was judged according to the ratio of dominant effect to additive effect: qPH-D04-1 had a negative super-dominant effect, qPH-A03-1 and qPH-D06-1 had a negative partial dominant effect, qPH-D05-1 and qPH-D05-2 had a partially positive dominant effect, qPH-A13-4 and qPH-A13-5 had a positive dominant effect, while the rest of the QTLs had a positive super-dominant effect ( Table 4). Env Envirment, ADD Additive effect, DOM Dominance effect, S(D/A) Degree of dominance, PV Phenotypic variation Table 4 Details of QTLs affecting the plant height of upland cotton.
To further re ne the map of QTLs controlling plant height, a Mutmap-like strategy was used to identify SLAF-tagged polymorphisms. Mutmap was developed on the basis of bulked separation analysis (BSA) to determine the contribution of alleles to the tall mutant by calculating parameters (SNP-index) (Takagi et al. 2015). Here, we considered plant height a mutant trait by calculating the Δ (SLAF-index) for each SLAF marker on the genome. The gure shows the relationship between Δ (SLAF index) and the SLAF position (Fig. 4). Two signi cant peaks were screened while the Δ (SLAF index) threshold was set to 0.6, and the QTL was located on ChrA07 and ChrD05, respectively (Table S4). A signi cant peak was observed on ChrD05 with six markers in the region (marker 561423, marker 561425, marker 562264, marker 562265, marker 562458, and marker 562459). This was considered a tight linkage marker for plant height, ranging from 0.312 cM to 0.312 cM on ChrD05 (Fig. 5A). These results strongly suggest that the major gene controlling plant height is on ChrD05. Based on the QTL mapping results, we narrowed the QTL controlling plant height to a fragment of essentially 0, and its physical distance on the genome was shortened from 7.72 Mb to 0.72 Mb. In addition, using the Mutmap-like strategy, the region not mapped by the QTL was also obtained. On ChrA07, the genome has a physical distance of 0.1 Mb in this region.

Functional Annotation Of Candidate Genes
In total, 1,028 candidate genes were obtained in the con dence intervals of the QTLs for plant height. The number of candidate genes in the QTL con dence intervals ranged from 1 (qPH-D06-1) to 839 (qPH-D05-1), except for the seven QTLs (qPH-D01-1, qPH-D01-2, qPH-D07-1, qPH-D04-1, qPH-A01-1, and qPH-A13-4) that have no candidate genes in the con dence intervals. We annotated 1,012 out of 1,028 candidate genes. In the COG classi cation, a total of 365 genes were annotated, of which 105 genes were involved in general function prediction only (R), 67 genes were involved in transcription (K), 62 genes were involved in replication, recombination and repair (L), and 62 genes were involved in signal transduction mechanisms (T). Among these genes, 45 genes have four functions of RTKL at the same time, and seven genes have four functions of GEPR (Table S5). In addition, no genes involved in cell motility (N), extracellular structures (W), and nuclear structures were found (Y) (Fig. 6A). In the GO analysis, a total of 628 genes were annotated; 1,270 genes were identi ed as cellular components, 765 genes were identi ed as molecular functions, 1,739 genes were identi ed as biological processes, and some had multiple functions and were included into two or three categories. In the cellular component classi cation, there are 285 genes related to the cell, 271 genes related to cell parts, and 203 genes related to organelles. In the molecular function classi cation, 317 genes are related to catalytic activity, 287 genes are related to binding, and 71 genes are related to transporter activity. In the biological processes classi cation, 421 genes are related to metabolic processes, 377 genes are related to cellular processes, and 288 genes are related to singleorganism processes (Fig. 6B, Table S6). In the KEGG analysis, 214 genes were identi ed in 97 pathways. Among them, 14 genes were related to plant hormone signal transduction pathways, 12 genes were related to protein processing in the endoplasmic reticulum pathway, and 12 genes were related to ribosome pathways (Table S7).
A total of 128 candidate genes were obtained by Mutmap-like correlation analysis, including 122 on Chr.D05 and six on Chr.A07 (Table S4, Table S8). 120 of 122 candidate genes were annotated. In comparison with the mapping results of the QTLs of the genetic map, the results were mapped to 18 candidate genes that may be related to plant height at 0.312 cM on ChrD05 (Fig. 5B). Among the six candidate genes obtained from ChrA07, three candidate genes (Gh_A07G1465, Gh_A07G1466, and Gh_A07G1468) were annotated in the COG classi cation, which were respectively assigned to coenzyme transport and metabolism [H], intracellular tra cking, and secretion. Among the three categories of GO classi cation, ve candidate genes (Gh_A07G1464, Gh_A07G1465, Gh_A07G1466, Gh_A07G1467, and Gh_A07G1468) were annotated in the GO classi cation, of which Gh_A07G1467 was annotated as having cellular component functions. Among the three categories of vesicular transport [U] and general function prediction only [R] Gh_A07G1464 and Gh_A07G1466 are mainly classi ed into two groups: cell component and biological process; Gh_A07G1465 and Gh_A07G1468 have three kinds of cell components, biological processes, and molecular functions. In the KEGG pathway analysis, Gh_A07G1465 is involved in the pathway of pantothenate and CoA biosynthesis and beta-alanine metabolism, Gh_A07G1466 is involved in SNARE interactions in the vesicular transport pathway, and Gh_A07G1468 is involved in the development of eukaryotes.
3.7 Identi cation of candidate genes related to plant height According to the results of QTL ne mapping, 18 candidate genes were subjected to qRT-PCR (Fig. 7). The results of qRT-PCR analysis showed that the expression level of candidate genes Gh-D05G1321 and Gh-D05G1335 in cultivar 396289 was signi cantly lower than their expression in cultivar CCRI49. However, the expression levels of Gh-D05G1315, Gh-D05G1334, Gh-D05G1339, and Gh-D05G1342 in cultivar 396289 and CCRI49 were not signi cantly different. The expression levels of the 12 remaining candidate genes in cultivar 396289 were higher than their expression levels in CCRI49, especially Gh-D05G1297, Gh-D05G1310, and Gh-D05G1351, where the expression level in 396289 was more than twice that of CCRI49.

Discussion
Characteristics of Re-seq and SLAF-seq strategy for large-scale marker development With the vast amount of plant genomic data published, whole genome resequencing and simpli ed genome sequencing techniques have been applied widely in different species (Causse et al. 2013;Takagi et al. 2015;Liang et al. 2016). Sequence variation of SNPs is commonly used in plant genetic analyss due to its richness and universality in the genome (Wang et al. 2015b). High-throughput next-generation sequencing (NGS) technology enables rapid and cost-effective access to massive amounts of SNP markers. A large number of SNPs can be detected comprehensively and accurately using genome-wide sequencing technology, but the sequencing cost is higher. Simpli ed genome sequencing can reduce the complexity of the genome by using the enzymatic capture technique obtaining the target data and can greatly reduce the experimental cost, but the number of labeled SNPs obtained is far fewer than re-sequencing techniques. Compared with traditional simpli ed genome sequencing, SLAF-seq has the ability to develop personalized enzyme digestion schemes based on different reference genomes, and simultaneously exploits dual-index sequencing to develop high-quality markers and dynamic suspicious markers to obtain the ltering process (Zhang et al. 2016c;Sun et al. 2013;Liu et al. 2014). Zhang et al. (Zhang et al. 2016c) developed the rst high-density genetic map of upland cotton and sequenced the RIL population using SLAF-seq, obtaining 443.65 Mb of clean reads. Qi et al. [6] used the GBS-seq method to obtain a total of 908 Mb of clean reads for an F 2 population of upland cotton. In this study, a total of 1,448 Mb clean reads were obtained for an F 2 population of upland cotton using a combination of resequencing and SLAF sequencing, demonstrating the robust data development capabilities of this strategy. This economical and viable method of obtaining mass data is the basis for developing a large number of polymorphic markers, ensuring the accuracy of genetic map construction. This is also the rst successful application of this strategy in an F 2 population of upland cotton.

Characteristics Of Genetic Map Construction
Compared with traditional RFLP, RAPD, AFLP, and SSR markers, more SNP polymorphism markers can be obtained by using high-throughput sequencing. Sun et al. (Sun et al. 2012) constructed a genetic map of upland cotton using SSR markers, including 34 linkage groups with an average distance of 3.69 cM and a genetic linkage map of 700.9 cM. Because of the narrowed genetic background about G. hirsutum (Rungis et al. 2005), SSR markers are still commonly used to construct population maps. However, the number of markers obtained by SSR markers is still not ideal. Chen et al. (Chen et al. 2015) detected 2,763 markers in 26 linkage groups (chromosomes) in an F 2 population (G. hirsutum × G. darwinii), covering the genome length of 4,176.7 cM and the average distance between genes was 1.5 cM. In the present study, 4,607 upland cotton genotypes were obtained using high-throughput sequencing. The total genetic distance was 3,063 cM, with an average distance of 0.898 cM. Compared with the previously produced genetic maps Sun et al. 2012), our study obtained a larger number of markers and the genetic distance of the genetic map was ner. Producing genetic maps from high-throughput sequencing data is superior to other techniques.

QTL Mapping For Plant Height
Previous studies on QTL mapping of upland cotton have focused on ber quality traits (Sun et al. 2012;Zhang et al. 2009;Tang et al. 2015;Nie et al. 2016) and yield traits (Wang et al. 2015a;Cong et al. 2016;Liu et al. 2017b), with little focus on morphological traits. Plant height is an important morphological trait that effects the economic output of cotton as well as the mechanization of cotton harvesting. Eighty-nine QTLs related to the plant height of upland cotton can be found in the QTL database (http://www2.cottonqtldb.org:8081/index), and 75 QTLs were obtained after removing 14 repetitive QTLs. These QTLs are distributed on 26 chromosomes except Chr04, Chr12, and Chr18. The LOD values of these QTLs ranged from 1.41 to 18.61, accounting for 0.05-73.9% of the phenotypic variation. The con dence intervals of four of the QTLs could not be obtained, but the remaining QTLs had con dence intervals of 0-48 cM. In addition to the QTL database, Qi et al. (Qi et al. 2017) detected seven QTLs related to plants in a single environment using an F 2:3 family of upland cotton; one QTLs was located on Chr05, and the others were located on Chr25. These QTLs had a con dence interval of 0.7-15 cM and explained 3.49-21.51% of the phenotypic variation. Li et al. (Li et al. 2017) detected ve QTLs related to plant height in a single environment in an F 2 population of upland cotton. Those QTLs were distributed on four chromosomes, ChrA01, ChrA08, ChrD06, and ChrD08, which can explain 4.92-10.75% of the phenotypic variation. In the present study, 16 QTLs were detected in a total of four environments from F 2 and F 2:3 populations, explaining 2.07-19.04% of the phenotypic variation. In addition, QTLs related to plant height were isolated to ChrD05. This result lled gaps in the QTL database and added to the work previously published by Qi et al.(Qi et al. 2017) and Li et al.(Li et al. 2017). The QTLs in this study have a con dence interval of 0-3.13 cM, which greatly reduces the con dence intervals of QTLs compared with previous studies and is conducive to follow-on functional veri cation and molecular marker-assisted breeding. Because of heterosis studies are rarely involved in the construction of genetic maps and QTL mapping using high-throughput sequencing, this study provided theoretical support creatively for the study of heterosis by studying the additive effect, dominant effect, and degree of dominance of QTLs. In addition, using the Mutmap-like strategy, plant height-related fragments were obtained at ChrA07 and ChrD05. Through the combination of QTL mapping with a genetic map, we obtained a ne mapping of ChrD05 and identi ed nine candidate genes related to plant height. Compared with previous studies (Qi et al. 2017;Li et al. 2017;Zhang et al. 2016c), there are some innovations in the analysis methods, which provide a case for the ne mapping of QTLs.
Functional Annotation Analysis Of Candidate Genes 714 candidate genes were annotated using at least one of the three databases (COG, GO, and KEGG), and some genes may be related to plant height. In Arabidopsis (Wang and Li 2008) and rice (Sun et al. 2010), the biosynthesis and signaling pathways of some hormones (e.g., auxin and brassinolide) regulate plant height. In this study, the COG analysis found more genes involved in general function prediction only [R], transcription [K], replication, recombination and repair [L], signal transduction mechanisms [T], and posttranslational modi cation, protein turnover, and chaperones [O], which are related to hormone biosynthesis and signal transduction pathways, and may affect the height of cotton. Qi et al. (Qi et al. 2017) found that cytochrome P450 is involved in plant height, and in this study we found that 11 genes (Gh_D05G0646, Gh_D05G0686, Gh_D05G0830, Gh_D05G0879, Gh_D05G0951, Gh_D05G1038, Gh_D05G1098, Gh_D05G1281, Gh_A13G0977, Gh_A13G0992, and Gh_D09G0920) were associated with cytochrome P450. In the GO analysis, more genes were involved in auxin and brassinosteroid related processes, such as Gh_D05G0636, Gh_D05G0661, Gh_D05G1056, and Gh_D05G1254, which are involved in the response to auxin (GO: 0009733). Speci cally, Gh_D05G0636 is involved in the auxin metabolic process (GO: 0009850), Gh_D05G1123, and Gh_D05G1357 are associated with auxin homeostasis (GO: 0010252), Gh_D05G0606 is associated with basipetal auxin transport (GO: 0010540), Gh_D05G0837, Gh_D05G0755, Gh_D05G1216, and Gh_D05G0765 are related to the auxin-activated signaling pathway (GO: 0009734), and Gh_D09G0919 is related to auxin polar transport (GO: 0009926). Gh_D05G1281 and Gh_D05G0615 participated in the process of brassinosteroid biosynthetic processes (GO: 0016132) and Gh_D05G1055 is involved in response to brassinosteroids (GO: 0009741). In addition, Silvia et al. (Fornalé et al. 2010) found that lignin impacts the composition of cell polysaccharides and the composition of the cell wall, thereby affecting the height of maize. In the present study, the lignin biosynthetic process (GO: 0009809) and the lignin catabolic process (GO: 0046274) were Gh_D05G1123 and Gh_D05G0888, respectively. These genes may be related to the plant height of cotton. In the KEGG pathway analysis, the three pathways that contain the most genes are hormonal signal transduction, the processing of proteins in the endoplasmic reticulum, and ribosomal pathways. The results of this study are similar to the results described in Zhang et al. (Zhang et al. 2016c), which suggested that ribosomes are related to protein synthesis and plant hormones, which can in turn affect plant height.
The Mutmap-like correlation analysis was used to ne map the QTLs of the candidate region on ChrD05, and 18 candidate genes related to plant height were obtained. The results of qRT-PCR showed that there was no signi cant difference in the relative expression level of four genes between the two parents, and the relative expression levels of 14 candidate genes were signi cantly different between the two parents. In future studies, these 14 genes can be used for functional veri cation.

De ciencies In The Genetic Map
The strategy of parental re-sequencing can help nd more polymorphic sites and contribute to the construction of high-density genetic maps. However, according to the results of genetic map construction in this study, chromosomes A06, D03, D04, and D05 were found to be short in length. By examining the physical length of each chromosome in the G. hirsutum genome ) and counting the SNP markers of each chromosome during development of the marker [Table S9], we found that: 1) the genome length of chromosomes D03 and D04 was shorter than the other chromosomes; 2) parents had fewer polymorphic SNPs, polymorphisms, and homozygous SNPs in the A06, D03, and D04 chromosomes. Therefore, the reason for the short A06, D03, and D04 chromosomes may be related to the reference genome or may be more related to the parental conserved regions on these chromosomes. Qi et al. (Qi et al. 2017) also found that ChrD05 is short, which may be related to the mapping population used, and requires further investigation.

Conclusions
In this study, an F 2 population of upland cotton was used to construct a high-density genetic map by parental re-sequencing and progeny SLAF sequencing. Finally, 4,607 of upmapping markers were obtained with a total genetic distance of 3,063.4 cM. The average genetic distance of markers was 0.898 cM. Sixteen QTLs related to plant height were obtained from upland cotton grown in four environments, and the QTLs were genetically analyzed and functional annotations were identi ed for candidate genes within con dence intervals. In addition, the ne mapping of QTLs on ChrD05 by the Mutmap-like strategy greatly reduced the genetic distance and physical distance of the QTLs. This study not only enriched the genetic map of upland cotton, which provided theoretical support for molecular marker-assisted breeding, but also was a successful application of the parental resequencing and progeny SLAF-seq strategy in the F 2 population of upland cotton. At the same time, the combination of multiple analytical methods provided a solution for the ne mapping of QTLs. A total of 18 candidate genes were obtained through ne mapping, and we obtained 14 candidate genes that may be related to plant height.

Declarations
Availability of data and materials Review of data is available and there are no competing interests. All datapresented within is the corresponding authors'data and is available uponrequest.
Ethics approval and consent to participate Not applicable.

Consent for publication
Not applicable.

Competing interests
The authors have declared that no competing interests exist

Funding
The special academic-level coordinating work task of the basic scienti c research business expenses of the Chinese Academy of Agricultural Sciences "CCIA high-quality cotton industry chain key technology research and demonstration" (Y2020LM01); Chinese Academy of Agricultural Sciences Science and Technology Innovation Project Agricultural Science and Technology go out collaborative innovation and integration demonstration research task "Central Asia Screening of high-quality cotton varieties and integrated R&D and application of supporting cultivation techniques" (CAAS-XTCX2018020-4); "Belt and Road" agricultural resources and technology cooperation project of the Ministry of Agriculture and Rural Affairs (125161016000180005) Authors' contributions HQ conceived and designed the experiments. QW performed the experiments and analyzed the data. YG and WN contributed reagents and materials. QW wrote the article. All the authors read and approved the nal manuscript.

Figure 1
The number of SNPs in eight segregation patterns. The x-axis indicates eight segregation patterns grouped by the genotype encoding rule, and the y-axis represents the number of SNPs.

Figure 2
The genetic map constructed from the obtained SNPs. The black bar indicates a marker. The x-axis indicates linkage group (LG) and the y-axis represents genetic distance (cM as the unit).

Figure 4
Mutmap-like analysis of plant height in upland cotton. The x-axis represents the position of 26 chromosomes, whereas the yaxis shows the value of Δ (SLAF-index). The dashed line represents the threshold value.

Figure 6
Annotation of the candidate genes in the con dence intervals of QTLs. (A) The annotation of candidate genes in the con dence intervals of QTLs through COG analysis. (B) The annotation of candidate genes in the con dence intervals of QTLs through GO analysis.