Whole-Genome Sequencing of 117 Chromosome Segment Substitution Lines for Genetic Analyses of Complex Traits in Rice

Rice is one of the most important food crops in Asia. Genetic analyses of complex traits and molecular breeding studies in rice greatly rely on the construction of various genetic populations. Chromosome segment substitution lines (CSSLs) serve as a powerful genetic population for quantitative trait locus (QTL) mapping in rice. Moreover, CSSLs containing target genomic regions can be used as improved varieties in rice breeding. In this study, we developed a set of CSSLs consisting of 117 lines derived from the recipient ‘Huanghuazhan’ (HHZ) and the donor ‘Basmati Surkb 89–15’ (BAS). The 117 lines were extensively genotyped by whole-genome resequencing, and a high-density genotype map was constructed for the CSSL population. The 117 CSSLs covered 99.78% of the BAS genome. Each line contained a single segment, and the average segment length was 6.02 Mb. Using the CSSL population, we investigated three agronomic traits in Shanghai and Hangzhou, China, and a total of 25 QTLs were detected in both environments. Among those QTLs, we found that RFT1 was the causal gene for heading date variance between HHZ and BAS. RFT1 from BAS was found to contain a loss-of-function allele based on yeast two-hybrid assay, and its causal variation was a P to S change in the 94th amino acid of the RFT1 protein. The combination of high-throughput genotyping and marker-assisted selection (MAS) is a highly efficient way to construct CSSLs in rice, and extensively genotyped CSSLs will be a powerful tool for the genetic mapping of agronomic traits and molecular breeding for target QTLs/genes.


Background
Rice is an important food crop, and its high and stable yield is related to global food security. Many agronomic traits in rice, such as heading date, tiller number, plant height and disease resistance, are related to rice production, and these complex traits are controlled by many QTLs (Glazier et al. 2002). Five QTLs for heading date, namely, Hd1, Hd2, Hd3, Hd4 and Hd5, were found in an F2 population derived from a Nipponbare and Kasalath cross (Yano et al. 1997). Through fine mapping, the rice Hd1 gene (homolog of CONSTANS in Arabidopsis) was finally cloned and validated (Yano et al. 2000). In addition to heading date, QTL mapping and cloning have also been applied to traits controlling plant architecture and stress resistance. For example, the PROG1 gene, which is related to tiller angle and the number of tillers of rice (Jin et al. 2008; Tan et al. 2008), and Thermotolerance 1 (TT1) for thermotolerance, have been mapped. Therefore, QTL mapping and gene cloning are accurate and effective methods to study functional genes (Price 2006). To date, more than 225 QTLs have been cloned and functionally validated in rice (Salvi and Tuberosa 2005). The mapping populations commonly used for QTL mapping include F2, F2:3, recombinant inbred lines (RILs), doubled haploid (DH), CSSLs, and others. As a stable mapping population, CSSLs have been widely used in QTL mapping and gene cloning. After the first application of CSSLs in tomato Zamir 1994, 1995), this technique was immediately applied to rice research (Doi et al. 1997). In general, the development of CSSLs requires MAS to determine the genotype of the population and perform backcross breeding. Ideally, each CSSL has a single, small chromosome fragment from the donor, and all donor fragments collectively cover the entire genome of the donor (Balakrishnan et al. 2019).
However, to obtain a perfect set of CSSLs, high-density molecular markers are needed to identify the size of the introgressed fragment, but the PCR analysis of molecular markers often greatly increases the workload. High-throughput genotyping methods based on next-generation sequencing technology can be used to draw high-resolution physical maps quickly, which can replace marker-based genotyping approaches and save many hours of laborious work (Huang et al. 2009). Recently, many high-precision CSSLs have been constructed by using high-throughput genotyping technology (Zhu et al. 2015; Xu et al. 2010; Jiang et al. 2017). These high-quality CSSLs are helpful for analyzing traits and cloning candidate genes.

Flowering is the hallmark of the transition from vegetative growth to reproductive growth (Arteca 1996). In rice, flowering time (called heading date) is directly related to yield. For example, Ghd7 (Xue et al. 2008), Ghd7.1/DTH7/OsPRR37 (Yan et al. 2013; Gao et al. 2014; Koo et al. 2013) and Ghd8/DTH8 (Yan et al. 2011; Wei et al. 2010) simultaneously control three traits: grain yield, plant height, and heading date. In particular, florigen, a key protein encoded by the FLOWERING LOCUS T (FT) gene, is directly related to flowering in plants; it is produced in the phloem of leaves and transferred to the shoot apical meristem (SAM) to induce flowering (Tsuji 2017; Turck et al. 2008; Tamaki et al. 2007). In rice, Hd3a and RFT1 are orthologs of the A. thaliana florigen FT, with high sequence similarity (Komiya et al. 2008; Kojima et al. 2002). Previous studies have found that 14-3-3 proteins, which act as florigen receptors, mediate the interaction of Hd3a and the transcription factor OsFD1 to form a ternary "florigen activation complex (FAC)" that activates the expression of the downstream genes OsMADS14 and OsMADS15 to induce rice heading (Taoka et al. 2011; Tamaki et al. 2015). Interestingly, RFT1 also interacts with the 14-3-3 protein, and nonfunctional RFT1 with the E105K mutation fails to interact with the 14-3-3 protein (Zhao et al. 2015). However, it is unclear whether other mutated sites in RFT1 can affect its interaction with the 14-3-3 protein.
Here, we constructed a set of CSSLs derived from the indica cultivar 'Huanghuazhan' (HHZ, a high-quality rice variety widely cultivated in China) and 'Basmati Surkb 89-15' (BAS, an aromatic rice variety from Pakistan). The variety HHZ was used as the recipient parent, and BAS was used as the donor parent. A total of 117 CSSLs were constructed by a combination of MAS and high-throughput genotyping based on whole-genome sequencing. QTLs for heading date, plant height and panicle length were analyzed using the CSSLs, and the biological function of RFT1 in BAS, which contained a P94S mutation, was verified.

Development of the CSSLs
The development process of the CSSLs is shown in Fig. 1. F1 plants were obtained from a cross between HHZ and BAS. The F1 plants were backcrossed once with HHZ to produce the BC1F1 generation. A total of 184 plants screened from the BC1F1 population were backcrossed to produce the BC2F1 generation. Then, 79 plants were backcrossed to produce the BC3F1 generation. Furthermore, 57 plants were screened from the BC3F1 population and backcrossed to create the BC4F1 generation, and 64 plants were screened from the BC4F1 population and backcrossed to create the BC5F1 generation. In each generation, plants that had heterozygous genotypes on one chromosome and the remaining genetic background homozygous for HHZ genotypes were chosen. In addition, the heterozygous fragments of the selected plants collectively covered whole chromosomes. The genotypes of the BC1F1 to BC4F1 plants were determined by whole-genome resequencing. The genotypes of BC5F1 plants were determined by MAS. A total of 107 plants, including 19 BC3F1 plants, 21 BC4F1 plants and 67 BC5F1 plants with a heterozygous substituted segment, were self-pollinated to produce BC3F2, BC4F2 and BC5F2 populations, respectively. Thirty-three plants with small segment substitutions (approximately 5 Mb), including 19 BC3F2 plants and 14 BC4F2 plants selected by MAS, were self-pollinated to obtain 33 CSSLs. Then, 7 BC4F1 plants and 67 BC5F1 plants were self-pollinated to obtain 84 CSSLs, and the 84 CSSLs were subjected to another round of high-throughput genotyping by whole-genome resequencing. Finally, a linkage map was constructed for the 117 CSSLs.
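The selection rule applied in each backcross generation (heterozygous on one chromosome, HHZ-homozygous everywhere else) can be sketched as follows. This is only an illustration under stated assumptions, not the pipeline used in the study: the genotype codes ("A" for homozygous HHZ, "H" for heterozygous), the `Segment` record and the `is_candidate` function are hypothetical names, and the coordinates are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    chrom: int        # chromosome number (1-12)
    start_mb: float   # segment start in Mb
    end_mb: float     # segment end in Mb
    genotype: str     # assumed codes: "A" = homozygous HHZ, "H" = heterozygous


def is_candidate(segments):
    """Return True if heterozygous segments sit on exactly one chromosome
    and every other segment is homozygous for the recipient (HHZ)."""
    if any(s.genotype not in ("A", "H") for s in segments):
        return False  # unexpected genotype code for a backcross F1 plant
    het_chroms = {s.chrom for s in segments if s.genotype == "H"}
    return len(het_chroms) == 1


# Made-up plants: the first carries one heterozygous segment on chromosome 6,
# the second carries heterozygous segments on two chromosomes.
plant_kept = [Segment(1, 0.0, 43.0, "A"), Segment(6, 0.0, 3.2, "H"), Segment(6, 3.2, 31.0, "A")]
plant_dropped = [Segment(1, 0.0, 5.0, "H"), Segment(6, 0.0, 3.2, "H")]
print(is_candidate(plant_kept), is_candidate(plant_dropped))
```

A plant passing this filter would then be backcrossed again or selfed, depending on the generation.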
In addition to whole-genome sequencing, we also developed a set of PCR-based markers for future genotyping and gene pyramiding. Based on the comparison of the HHZ and BAS genome assemblies (data not shown), we developed 396 InDel (insertion-deletion) markers for the construction of CSSLs that were evenly distributed on the 12 rice chromosomes (Additional file 1: Fig. S1). The average interval between two adjacent markers on the physical map was 0.94 Mb (Additional file 2: Table S1). The primer sequence information for the markers used in this study is shown in Additional file 3: Table S2. Both PCR genotyping and whole-genome sequencing were applied to BC5F1 plants. The genotyping results from the 396 markers were consistent with those from whole-genome resequencing.

Characteristics of the CSSLs
An accurate physical map of the 117 CSSLs was constructed according to SEG-Map with Os-Nipponbare-Reference-IRGSP-1.0 (Sakai et al. 2013) as the reference genome (Fig. 2).
The set of CSSLs contains 117 homozygous segments, and each line contains only one substituted segment. The average number of substituted segments was approximately 10 for each chromosome, ranging from 5 on chromosome 12 to 18 on chromosome 4 (Table 1). Analysis of the length of the substituted segments showed that the total length of the substituted segments in the population was 704.6 Mb, which is 1.89 times the total length of the rice genome; on average, each line carried 6.02 Mb of substituted material. The coverage rate of the substituted segments with redundancy removed was 99.78% of the BAS genome in the CSSL set. Except for chromosome 11 and chromosome 1, which had 98.61% and 99.07% coverage rates, respectively, all of the other 10 chromosomes were fully covered. The size of the segments ranged from 0 to 24 Mb (Fig. 3). Among those segments, the smallest was 0.1 Mb, on chromosome 11, and the largest was 23.5 Mb, on chromosome 3. Additionally, 71.79% of the segments were shorter than 7 Mb, while 15.38% were longer than 10 Mb. In particular, 10 CSSLs had a substituted segment of less than 1 Mb (Fig. 3).
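The summary statistics above (total substituted length, mean length per line, redundancy relative to the genome, and coverage with redundancy removed) can all be derived from the segment coordinates. The following is a minimal sketch with made-up coordinates; `merge_intervals` and `coverage_stats` are hypothetical helper names, not functions of SEG-Map or any other tool used in the study.

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals on one chromosome."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def coverage_stats(segments_by_chrom, chrom_lengths):
    """segments_by_chrom: {chrom: [(start_mb, end_mb), ...]}, one tuple per line."""
    all_segments = [seg for segs in segments_by_chrom.values() for seg in segs]
    total = sum(end - start for start, end in all_segments)
    covered = sum(
        end - start
        for segs in segments_by_chrom.values()
        for start, end in merge_intervals(segs)
    )
    genome = sum(chrom_lengths.values())
    return {
        "total_mb": total,                      # summed segment length
        "mean_mb": total / len(all_segments),   # average per line
        "redundancy": total / genome,           # total length / genome length
        "coverage_pct": 100 * covered / genome, # redundancy removed
    }


# Toy data (made up): two overlapping segments on chromosome 1, one on chromosome 2.
toy_segments = {1: [(0.0, 6.0), (4.0, 10.0)], 2: [(0.0, 5.0)]}
toy_lengths = {1: 10.0, 2: 10.0}
stats = coverage_stats(toy_segments, toy_lengths)
print(stats)
```

The reported figures are consistent with each other under this arithmetic: 704.6 Mb over 117 lines gives about 6.02 Mb per line, and 704.6 Mb divided by 1.89 implies a genome length of roughly 373 Mb.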

QTL Analysis Using the CSSLs
Since self-pollinated CS004 plants failed to produce seeds, heading date (HD), plant height (PH) and panicle length (PL) were investigated for 116 CSSLs and their parents in Shanghai and Hangzhou, China. The phenotypic values of the three traits had a normal or skewed distribution in both environments (Fig. 4). The average values for the CSSLs were close to those of HHZ, which was consistent with the genetic background of the CSSLs. Descriptive statistics are listed in Table 2. QTL IciMapping was used to analyze the QTLs for the specified agronomic traits in both Shanghai and Hangzhou (Table 3). A total of 25 QTLs were detected for those three traits and were distributed on 9 chromosomes, while no QTLs were found on chromosomes 2, 5, and 9 (Additional file 4: Fig. S2). Among the 25 QTLs, 9 were detected from the Shanghai data, and 16 were detected from the Hangzhou data; 4 significant QTLs (qHD6-1, qHD8-1, qHD10-1, and qPH1-1) were identified at both sites. Some QTLs were located in the same or adjacent chromosomal regions.

Plant Height
Three QTLs associated with PH were detected on chromosomes 1, 6 and 8. The phenotypic variance explained by individual QTLs varied from 3.6% to 71.7%. The QTL qPH1-1, for which the BAS genotype increased plant height, explained 71.7% of the phenotypic variation in Shanghai and was located in the region from 35.7 to

Panicle Length
Three QTLs associated with PL were detected on chromosomes 3 and 10. The phenotypic variance explained by individual QTLs varied from 9.8% to 14.0%. The BAS alleles of all three QTLs had negative effects on panicle length. However, no QTL was shared between the two sites.

Verification of the Biological Function of qHD6-1
QTL analysis was carried out for heading date in this set of CSSLs, and qHD6-1 found on chromosome 6 had a noticeable effect on heading date. The interval of qHD6-1 could be narrowed down to a region of 2.0 Mb, which was located from 1.0 Mb to 3.0 Mb on chromosome 6 (Fig. 5). Considering that qHD6-1 is a major locus underlying heading date, we analyzed the candidate genes related to heading date in this region and preliminarily identified rice FT genes (Hd3a and RFT1) that may play a role in this locus. Sequence variation analysis of the coding region and 5 kb promoter of Hd3a and RFT1 showed that there was a 1 bp deletion in the Hd3a promoter of BAS (Additional file 6: Fig. S4), and in RFT1, there was a nonsynonymous mutation in exon 3 (Fig. 6a). Compared with HHZ, RFT1-BAS has a unique amino acid substitution from Pro (P) to Ser (S) at position 94. Then, we analyzed the transcriptome data of 30-day-old seedling leaves and found that there was no difference in the expression of Hd3a between the two parents (data not shown). In a recent study, we collected quantitative trait gene (QTG) alleles of known QTLs and confirmed that the BAS allele of Hd3a did not belong to the known QTG alleles (Wei et al. 2021). According to previous studies, the Hd3a promoter types of BAS and HHZ did not cause differences in the expression of Hd3a (Takahashi et al. 2009). In addition, RFT1 plays a major role in inducing rice flowering under long-day (LD) conditions. Under these conditions, the heading date of RFT1 RNAi plants was delayed by approximately 100 days compared with that of the wild type, whereas Hd3a RNAi plants flowered at essentially the same time as wild-type plants (Komiya et al. 2009). Recently, Hd3a and RFT1 were targeted by CRISPR/Cas9-mediated mutagenesis. Under LD conditions, the heading date of rft1 mutants was significantly delayed compared with that of wild-type plants (Song et al. 2017), while hd3a mutants did not display late-flowering phenotypes under those conditions (Song et al. 2017). In summary, we inferred that the candidate gene at qHD6-1 was more likely RFT1 than Hd3a.
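In a population where each line carries a single donor segment, a QTL interval such as the one described above is commonly narrowed by substitution mapping: the locus should lie in the region shared by all lines that show the effect and outside the segments of lines that do not. The source does not spell out the exact procedure, so the sketch below is an assumption for illustration only; the function names and the coordinates are made up.

```python
def subtract(interval, blocker):
    """Remove `blocker` from `interval` and return the remaining pieces."""
    (start, end), (b_start, b_end) = interval, blocker
    if b_end <= start or b_start >= end:
        return [(start, end)]  # no overlap
    pieces = []
    if b_start > start:
        pieces.append((start, b_start))
    if b_end < end:
        pieces.append((b_end, end))
    return pieces


def substitution_map(positive, negative):
    """positive: segments (Mb) of lines showing the effect;
    negative: segments of lines on the same chromosome without the effect."""
    start = max(s for s, _ in positive)
    end = min(e for _, e in positive)
    if start >= end:
        return []  # the positive lines share no common region
    candidates = [(start, end)]
    for blocker in negative:
        candidates = [piece for c in candidates for piece in subtract(c, blocker)]
    return candidates


# Made-up example: two late-heading lines overlap at 12-18 Mb,
# and a line with normal heading excludes everything from 16 Mb onward.
print(substitution_map([(10.0, 18.0), (12.0, 25.0)], [(16.0, 30.0)]))
```

Real data would also need a statistical test to decide which lines count as showing the effect, which this sketch leaves out.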
To verify the functionality of RFT1-BAS, we performed yeast two-hybrid assays to test the interaction between the proteins encoded by the different RFT1 alleles (HHZ and BAS) and 14-3-3 family proteins (Gf14a, Gf14b, Gf14c, Gf14d, Gf14e, and Gf14f). We found that RFT1-HHZ could interact with all the isoforms of GF14, while none of the GF14s interacted with RFT1-BAS (Fig. 6b). This result suggested that the P94S mutation in RFT1-BAS prevented the interaction with the 14-3-3 protein. Therefore, we proposed that rft1-BAS is a nonfunctional allele caused by a coding SNP that leads to the P94S substitution.

Discussion
CSSLs are an excellent population for QTL mapping and gene cloning. Currently, a number of CSSL populations with indica and japonica as parents have been successfully constructed (Kubo et al. 2002; Xi et al. 2006; Zhu et al. 2009). In addition, some groups have constructed CSSLs using cultivated rice as recurrent parents and wild rice as donors (Ma et al. 2019; Yuan et al. 2020; Xi et al. 2006). Based on these CSSL populations, many important functional genes have been cloned. A total of 153 single segment substitution lines (SSSLs) were constructed by crossing Basmati385 (donor parent) and HJX74 (recurrent parent), and OsSPL16, which controls grain size, grain shape and rice quality, was cloned from this population (Wang et al. 2012a; Zhang et al. 2004; Xi et al. 2006). PROG1, which controls the prostrate growth habit of common wild rice, was originally mapped from a set of CSSLs derived from Teqing as the recurrent parent and wild rice (O. rufipogon) as the donor parent (Hao et al. 2006; Jin et al. 2008). In this study, we mapped a number of novel QTLs using 116 CSSLs that will be used for gene cloning in future studies. For example, qHD4-2 for heading date was mapped between 18.5 and 22.0 Mb on chromosome 4; genes related to heading date have not been reported in this region. Therefore, this set of CSSLs provides excellent material for QTL mapping and cloning.
Fig. 6 a The nucleotide sequences and amino acid sequence variation sites of RFT1 in HHZ and BAS compared with Nipponbare. b The protein interactions were tested by yeast two-hybrid assays. RFT1-HHZ interacted with six members of the 14-3-3 protein family, but RFT1-BAS did not interact with any. The interactions are indicated by blue-colored yeast colonies on SD/−Ade/−His/−Leu/−Trp/+X-α-Gal/+aureobasidin A (-AHLT) media. SD/-Leu/-Trp (-LT)
CSSLs contain one substituted chromosomal segment from the donor parent, so they can be used as near-isogenic lines (NILs) by themselves or can be developed into higher-resolution NILs by crossing with the recurrent parent again (Zamir 2001). NILs must be constructed when cloning genes using the traditional QTL cloning method. To genotype a CSSL population, molecular markers that are inexpensive and easy to use must be adopted. Currently, a variety of molecular marker systems have been established. However, in the process of constructing CSSLs by the MAS method, the size of the substituted fragment cannot be accurately determined; therefore, deviations may occur in QTL detection (Paterson et al. 1990). High-throughput genotyping by whole-genome resequencing can accurately determine recombination breakpoints (Huang et al. 2009) and has been used for physical mapping of RIL, F2 and CSSL populations (Huang et al. 2016; Xu et al. 2010).
The stable production of rice is directly related to global food security. Therefore, breeding varieties with high yield, strong stress resistance (biotic and abiotic stresses) and superior quality should be a top priority for breeders (Zhang 2007). However, using traditional breeding methods to improve multiple crop traits simultaneously is difficult (Schaart et al. 2016). Recently, the concept of rational design breeding was proposed, and valuable genes from different rice varieties were pyramided to simultaneously improve multiple traits in Teqing in a short time (Zeng et al. 2017). In previous studies, according to RiceNavi, we pyramided Badh2 (Chen et al. 2008), TAC1 (Yu et al. 2007) and OsSOC1 (Lee et al. 2004) from BAS, and the new line showed improved grain fragrance, heading date, plant type and yield compared with HHZ (Wei et al. 2021). The 117 CSSLs created here can be applied to the innovation of germplasm resources. Moreover, we also designed 396 InDel markers based on HHZ and BAS genome sequences; these markers can be applied to gene pyramiding of different chromosome segments.
Even though each CSSL contains only one segment, the QTL mapping interval is still large. However, quantitative trait nucleotide (QTN) variation information of some major loci in rice has been extensively studied (Wei et al. 2021), and QTNs can be located in many different populations. Therefore, even if some QTLs are located in a large interval, the main causal genes contained in this interval can be analyzed based on previous research results and existing biological techniques. For instance, the BAS allele at locus qPH1-1, which can increase plant height, was located at 35.7-40.2 Mb on chromosome 1, where SD1-BAS was confirmed as a wild-type allele. Furthermore, the rice FT genes are candidate genes for qHD6-1. By combining different analyses and experimental methods, we concluded that the candidate gene of qHD6-1 is most likely RFT1.

Conclusions
In summary, we successfully developed a set of CSSLs by combining MAS and high-throughput genotyping based on whole-genome resequencing. These CSSLs can be used for QTL mapping, cloning and molecular breeding. Using this set of CSSLs, we not only detected cloned QTLs but also found some novel QTLs. Among them, a new nonfunctional rft1-BAS allele was verified based on yeast two-hybrid experiments.

Plant Materials
Huanghuazhan (HHZ), an indica cultivar, is often used as a restorer line in three-line hybrid rice seed production. The aromatic rice Basmati Surkb 89-15 (BAS) is native to Pakistan. The variety HHZ was used as the recipient parent, and BAS was used as the donor parent. The two parental lines were derived from the China National Rice Research Institute. All materials used in the process of population construction were grown in the summer in Shanghai (121°42′ E, 30°97′ N) and in the winter in Sanya (109°19′ E, 18°38′ N), China.

DNA Extraction and Molecular Analysis
The TPS method was used for the extraction of genomic DNA from fresh leaves of each individual. DNA amplification was performed by PCR with the following protocol: predenaturation at 94 °C for 5 min; 36 cycles of denaturation at 94 °C for 30 s, annealing at 53-58 °C for 30 s, and extension at 72 °C for 30 s; and a final extension at 72 °C for 5 min. The reactions were carried out in 96-well PCR plates in 25 μl volumes containing 50-100 ng of template DNA, 0.2 μmol/L of each primer and 12.5 μl of 2 × EasyTaq PCR SuperMix (TransGen Biotech Inc., China). Electrophoresis of the amplification products was carried out on 4% agarose gels and photographed using a Tanon 1600 Automatic Digital Gel Imaging Analysis System (Tanon Inc., China).

High-Throughput Genotyping by Whole-Genome Resequencing
The genomic DNA from individuals from each generation used for sequencing was extracted from young leaves using magnetic beads (no. 500 T, NanoMagBio S-96, China). The Tn5 transposition system was used for DNA library construction. DNA libraries were sequenced with Illumina HiSeq X Ten or NovaSeq6000 using PE150 flow cells according to standard procedures and generated 150 bp paired-end reads with an average 500 bp insert size for subsequent genotyping analyses. Approximately 0.4 × coverage sequence reads were generated for each line. Genotyping was performed using SEG-Map software.
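To give a sense of scale for the approximately 0.4× coverage per line, the short calculation below estimates the number of 150 bp reads this implies and the fraction of bases expected to be sequenced at least once. It is a back-of-the-envelope sketch: the genome size of about 373 Mb is an assumption (consistent with 704.6 Mb being 1.89 times the genome length), the Poisson model is the usual idealization of uniformly random reads, and the function names are hypothetical.

```python
import math

GENOME_BP = 373_000_000  # assumption: approximate size of the rice reference genome
READ_LEN_BP = 150        # paired-end 150 bp reads, as stated in the text


def reads_for_depth(depth, genome_bp, read_len_bp):
    """Number of single reads needed to reach a given mean depth."""
    return depth * genome_bp / read_len_bp


def expected_fraction_covered(depth):
    """Expected fraction of bases covered at least once (Poisson model)."""
    return 1.0 - math.exp(-depth)


reads = reads_for_depth(0.4, GENOME_BP, READ_LEN_BP)
print(f"reads per line: {reads:,.0f} (about {reads / 2:,.0f} read pairs)")
print(f"fraction of bases covered at least once: {expected_fraction_covered(0.4):.2f}")
```

Under these assumptions, each line needs roughly one million reads, and only about a third of the genome is covered at least once in any single line, so genotypes have to be inferred from SNPs pooled over genomic windows rather than called base by base.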

Field Experimental Design and Phenotypic Assessment for CSSLs
The two parents and 116 CSSLs were planted in Shanghai (121°42′ E, 30°97′ N) and Hangzhou (119°93′ E, 30°08′ N) in the summer of 2021. Thirty-day-old seedlings of each line were transplanted into a seven-row plot with seven plants per row and 25 × 30 cm spacing. Field management followed local regulations. The middle five plants in each row were used as samples for phenotypic measurement.
Heading date was defined as the time from sowing to emergence of the first inflorescences above the flag leaf sheath. Plant height, panicle length, and tiller number were measured 20 days after heading. The distance from the ground to the top of the first panicle was measured as the full height of the plant. Panicle length was measured as the plant height minus the distance from the ground to the neck-panicle node. Thirty replications were performed for each trait in Shanghai, and two replications were performed in Hangzhou.

QTL Analysis for Three Agronomic Traits Based on CSSLs
Based on the physical map, each line was converted into a skeleton bin map with 3723 bins. Using the bin map of the 116 CSSLs and the phenotypic data, QTL analysis was performed with QTL IciMapping V4.2.53 software (https://www.isbreeding.net/). WinQTLCart (Wang et al. 2012b) was used to analyze the heading date in the BC3F1 population, and the relevant parameters were set according to the user manual.
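One simple way to turn a physical map of single-segment lines into a bin map is to cut each chromosome at every segment breakpoint and then score each line per bin. The sketch below illustrates that idea only; the exact rule used to define the 3723 skeleton bins is not given in the text, and the function names, line names and coordinates here are hypothetical.

```python
def build_bins(segments, chrom_len):
    """segments: {line_name: (start_mb, end_mb)} donor segments on one chromosome.
    Returns consecutive (start, end) bins delimited by all breakpoints."""
    points = {0.0, chrom_len}
    for start, end in segments.values():
        points.update((start, end))
    edges = sorted(points)
    return list(zip(edges[:-1], edges[1:]))


def genotype_matrix(segments, bins):
    """Score each line per bin: 1 = donor (BAS) segment, 0 = recipient (HHZ)."""
    return {
        line: [1 if start <= b_start and b_end <= end else 0 for b_start, b_end in bins]
        for line, (start, end) in segments.items()
    }


# Made-up example: two lines with overlapping segments on a 10 Mb chromosome.
toy = {"line_A": (0.0, 4.0), "line_B": (3.0, 7.0)}
bins = build_bins(toy, 10.0)
matrix = genotype_matrix(toy, bins)
print(bins)
print(matrix)
```

A 0/1 matrix of this kind, paired with the trait values of each line, is the general shape of input that bin-based QTL analysis works from.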

RNA Extraction and Yeast Two-Hybrid (Y2H) Assay
Total RNA was extracted using TRIzol reagent (Invitrogen Inc.) following the manufacturer's instructions. First-strand cDNA was synthesized using reverse transcriptase (Takara Bio Inc.). The vectors and yeast strains used in the yeast two-hybrid assays were from Clontech (Beijing, China).