A high-density genetic map of Schima superba based on its chromosomal characteristics

Schima superba (Theaceae) is a popular woody tree in China. The obscure chromosomal characters of this species are a limitation in the development of high-density genetic linkage maps, which are valuable resources for molecular breeding and functional genomics. We determined the chromosome number and the karyotype of S. superba as 2n = 36 = 36 m, which is consistent with the tribe Schimeae (n = 18). A high-density genetic map was constructed using genotyping by sequencing (GBS). A F1 full-sib with 116 individuals and their parents (LC31 × JO32) were sequenced on the Illumina HiSeq™ platform. Overall, 343.3 Gb of raw data containing 1,191,933,474 paired-end reads were generated. Based on this, 99,966 polymorphic SNP markers were developed from the parents, and 2209 markers were mapped onto the integrated genetic linkage map after data filtering and SNP genotyping. The map spanned 2076.24 cM and was distributed among 18 linkage groups. The average marker interval was 0.94 cM. A total of 168 quantitative trait loci (QTLs) for 14 growth traits were identified. The chromosome number and karyotype of S. superba was 2n = 36 = 36 m and a linkage map with 2209 SNP markers was constructed to identify QTLs for growth traits. Our study provides a basis for molecular-assisted breeding and genomic studies, which will contribute towards the future research and genetic improvement of S. superba.


Background
Schima superba, based on its original description, is formally placed in the tribe Schimeae (≡ Gordonieae) in Theaceae [1][2][3]. Theaceae is a family of subtropical and tropical trees in Asia containing approximately 17-20 genera and 500 species [1]. The genus Schima is closely related to the genera Franklinia and Gordonia and has approximately 20 species. Schima is an economically and ecologically important genus and is mainly distributed in southern China and the adjacent parts of East Asia, with 13 species (six endemic) present in China [1]. Schima superba is a typical large tree and dominant element in the subtropical evergreen broad-leaved forests of southern China. This species is valued commercially for its timber, and the wood is used for furniture and in construction. Additionally, these trees are used as fire breaks and thus help protect forests from fires [4][5][6].
Schima is distinct from other genera within Theaceae with regards to its chromosome number. The family Theaceae comprises three major tribes: Theeae, Schimeae, and Stewartieae. The tribe Stewartieae was the earliest to show differentiation at nearly 48 mya and has a chromosome number of n = 17. Theeae (n = 15) and Schimeae have a closer relationship, but their chromosome numbers are different. Previous studies indicate that all members of the tribe Theeae have the dominant basic chromosome number of n = 15 [3,7,8]. A chromosome number of 17 (n = 17) is the base number within the entire Stewartiae tribe [9]. In the tribe Schimeae, the basic chromosome numbers are either n = 15 or n = 18, and the count of n = 18 is more widely recognized in Schima than the original count number of n = 15 [2,[10][11][12][13]. The chromosome number of S. superba within Schima is still unclear, which limits studies on this species.
Due to the long (up to 30 years) breeding cycle of woody trees, there is an urgent need to improve upon traditional breeding methods [14]. Marker-assisted selection (MAS) is a useful tool for reducing the breeding cycle and has been used for many decades [15][16][17][18]. The development of a saturated genetic linkage map using molecular markers with high genome coverage is a prerequisite for the application of molecular plant breeding [19]. A molecular genetic linkage map is required as a fundamental resource for MAS; however, no high-density genetic maps have been constructed for S. superba. Therefore, an advanced breeding strategy and genetic maps are necessary for the identification of genes associated with important genotypes in S. superba. Genotyping by sequencing (GBS) has been used for the rapid development of thousands of segregating single nucleotide polymorphism (SNP) markers in large mapping populations at a low cost [20]. GBS has been widely used in high-density genetic linkage map construction [18,21,22] and quantitative trait locus (QTL) mapping in several plant species [23][24][25][26].
In the present study, we report the chromosome number of S. superba and describe its karyotype. We further performed hybridization between the individuals "LC31" and "JO32". Based on the F1 full-sib, a high-density genetic map was constructed using the GBS approach. Phenotypic traits related to height, stem base diameter, growth rate, crown width, branching characters, and other growth-related parameters were mapped on this linkage map, and the associated SNPs were identified.

Chromosome data
The mapping population consisted of 116 full-sibs derived from two selected trees from natural S. superba forests: LC31 (female parent, 50-years-old) and JO32 (male parent, 37-years-old). The parents exhibited differences in growth rate, woody yield, and wood quality. The hybridization was performed in 2013. A total of 116 full-sibs were harvested and planted on the forest farm of Longquan in 2014 and were used for the genetic map construction.
The mitotic metaphase indicated a chromosome number of 2n = 36 for S. superba (Fig. 1a). The absolute  Table 1). The chromosomes were determined to be 36 median ( Fig. 1b; Table 1), which corresponded to a 3B karyotype symmetry. A secondary constriction was not observed. The arm ratios of most of the chromosomes were lower than 1.5 except one that was between 1.5 and 2.0 ( Table 1) indicating low intra-chromosomal asymmetry.

Sequencing data quality assessment
The number of clean reads obtained from the female parent, male parent, and progeny were 37,926,886, 50,742,572, and 9,981,084-45,923,854 with an average of 20,550,577 clean reads. The 2.96 Gb of individual raw data were generated using a Hi-Seq platform for the parents and 116 progenies, yielding 343.3 Gb of high-quality sequences with an average Q20 ratio of 94.87%, a Q30 ratio of 87.83%, and a GC content of 35.17%. The average MseI enzyme capture rate was 98.63%, validating the quality of the enzyme digest. No abnormal SNP calls were found, validating the genotyping accuracy.
In the absence of a suitable reference genome, a 137 Mb tag clustering by the male parent 'JO32' was used as the reference genome. A summary of the parental reads and reference genome alignment is shown in Table 2. For each progeny, the clean reads ranged from 9,981,084 to 45,923,854 bp, with an average of 20,550,577 bp (Additional file 1: Table S1). Moreover, the average mapping rate of the 116 full-sibs was 89.17%.

GBS-based SNP identification
We used the GATK (type UnifiedGenotyper) software to determine 298,332 and 190,522 SNPs in the female and male parents, respectively. For F1 individuals, an average of 276,582 SNPs was discovered for an individual progeny. The heterozygosis SNP rate of the females and males was 72.04 and 98.86%, respectively. The progeny SNP results are provided in the Additional file 2: Table S2 File. To avoid false positive SNPs, the base number of the parent SNP was set as ≥4 and the base number of the progeny was ≥2.
A Bayesian model was used to detect 99,966 polymorphic loci (Fig. 2), which could be classified into eight segregation types according to the CP model in JoinMap 4.0. Among these, three major patterns including hk × hk, nn × np, and lm × ll accounted for nearly 94.92% of the loci, while the other five patterns, ab× cd, ab × cc, cc × ab, ef × eg, and aa × bb, accounted for only 5.08%. Thus, only segregation types hk × hk, nn × np, and lm × ll were selected for genotyping in F1 individuals, resulting in a total number of 94,883 polymorphic loci.

High-density genetic map development
A total of 28,856 markers were ultimately obtained after filtering the markers with complete coverage during genotyping. The available markers were filtered for < 65% integrity using a Chi-squared test with a threshold of P < 0.001. Ultimately, 519 markers with hk × hk, 639 markers with lm × ll, and 1051 markers with nn × np segregation types were used for map construction. Following data preparation, 1569 markers with types nn × np and hk × hk and 1137 markers with types lm × ll and hk × hk were used for the construction of male and female maps, respectively. On the male map, 1569 markers were placed into 18 LGs, and the genetic length was 1583.97 cM with an average marker interval distance of 1.01 cM (Additional file 3: Table S3). On the female map, 1137 markers were placed into 18 linkage groups (LGs), and the genetic length was 1459.19 cM with an average marker interval of 1.28 cM (Additional file 4: Table S4). The heat maps reflect the linkage relationships between the markers in a single linkage group (Additional file 5: Figure S1).
The two parent maps were then merged, and the integrated map spanned 2076.24 cM with 2209 markers placed into 18 LGs (Fig. 3). Among the 18 LGs, LG06 was the largest group with a genetic distance of 173.97 cM and 231 markers.
LG18 was the smallest group with  (Table 3). Between the markers, 2161 gaps (97.83%) were less than 5 cM, 40 gaps were between 5 and 10 cM, seven gaps were between 10 and 20 cM, and only one gap was larger than 20 cM, which was located on LG13.

QTL mapping of growth traits
QTLs were mapped using the phenotypic data (

Discussion
Morphological and karyotype analysis of S. superba indicated that it has a diploid chromosome number (n = 18) with three large, five medium-large, seven medium-small, and three small sized chromosomes with a median centromere (Fig. 1, Table 1). Gordonia and Schima are the two biggest genera in the tribe Schimeae (≡Gordonieae). Gordonia has the highest species count with n = 15 [2,10,[27][28][29], and only one North American species (G. lasianthus) has a different chromosome number of n = 18 [30]. Yang (2004) indicated that Gordonia should be further classified into two genera: one being the Chinese Gordonia species with n = 15, which should be classified into Polyspora in tribe Theeae, whereas the other North American Gordonia (G. lasianthus) with n = 18 should form the monotypic genus Gordonia s.str [2]. Our results confirmed that the species chromosomal number in Schima is n = 18. Previous studies showed that n = 18 for most species in this genus [28,29,31,32], and only Schima wallichii has n = 15 [33] or n = 18 [2,34]. Bloembergen and Keng regarded the genus as monotypic and subdivided S. wallichii into nine or more geographically separated species [35,36]. Our results corroborated the study of Wang (2006) where the base chromosome numbers in tribes Theeae, Schimeae (≡Gordonieae), and Stewartieae in Theaceae are n = 15, n = 18, and n = 17, respectively [3].
The construction of genetic linkage maps and the identification of growth trait-related QTLs will facilitate future genetic and breeding studies in S. superba. However, the absence of a genetic map has prevented the use of QTLs from this species in breeding programs. GBS is a rapid, efficient, and cost-effective strategy for SNP development, genetic linkage map construction, marker-based complex trait selection, and draft genome assembly in many species with or without reference genomes [18,20,[37][38][39][40]. We generated 343.3 Gb of raw sequences from which 99,966 SNPs and 94,883 (94.9%) polymorphic SNPs were detected from the two parents and the 116 full sibs. The integrated genetic linkage map was divided into 18 LGs and comprised 2209 SNP markers, which spanned 2076.24 cM with an average marker interval of 0.94 cM. Accordingly, the high-quality SNP-based map will provide a basis for MAS and genomic studies, which should contribute to the genetic improvement of S. superba.
Plant morphological traits influence cultivation area, cultivation patterns, yields, and planting efficiency in forests. Woody plant traits such as higher individual height, thicker stem base diameter, fewer branches or bifurcate trunks, fast growth rate, higher wood density, wood durability, lower wood shrinkage and fissure, and greater wood strength are favored by forest geneticists and breeders [14,41]. We have a good understanding of the effect of individual QTLs on phenotypes and their position in the genome [42]. However, QTLs discovered in Pinus radiata experimental populations explain only 0.78-3.8% of the variation in juvenile wood density and diameter [43]. We developed 168 QTLs from 14 growth traits, which varied from 11.2-22.8% (Table 4). Plant growth and branching pattern traits are complex dynamic traits that are regulated by the interactions of many genes that may behave differently during different growth stages. Some chromosome segments may be associated with different traits at different growth periods, and QTLs can be detected throughout various stages of plant development. However, certain QTLs are conditional and are found only at specific growth stages [44,45]. For example, the QTL affecting height located in the interval from 45.95-143.94 cM on LG13 and the QTL affecting SBD located in the interval from 21.21-94.78 cM on LG17 appeared to be unconditional and were detected in the measurements from years one and three (Additional file 6: Table S5). However, the minor QTL of height located at 92.17 cM at LG2 was found only at the one-year stage, and the QTL of height located at 69.55 cM at LG10 was found only at the three-year stage (Additional file 5: Table S5). Ten out of the 168 QTLs with LOD scores of  (Table 5).

Conclusion
We further examined the association between the actual segregation of the SNP markers closest to the QTL peaks and the traits of interest in the mapping population. The relationship between the genotypes of the linked markers and the average phenotypic values are displayed in Table 5. For SsSNPLG13lm3242 and SsSNPLG13np4091, the individuals harboring the homozygous AA genotype exhibited significantly lower HGR.
The "G" allele, in this case, was strongly correlated with increased vertical height. However, the presence of "G" in the marker SsSNPLG4np2901 was associated with increased stem base diameter (SBD). Other markers, including SsSNPLG4np2901, SsSNPLG4np2206, and SsSNPLG10np8947, showed a "co-dominant" effect, where the average phenotypes of the heterozygous "TG," "TA", and "GA" individuals were higher than those of the homozygous groups. The genotypes of all of these SNPs for these QTLs in the full-sib were much more similar to the father line (JO32) and showed higher phenotype trait values (Table 5).

Morphological and karyotype characters
Seeds and seedlings of S. superba were collected from the Longquan region of Zhejiang province, China. The seeds were germinated in Petri dishes in a growth chamber, and the seedlings were cultivated in the nursery of the Longquan Forestry Academy. Roots (1 cm) were removed from the seedlings or germinating seeds and pretreated with 8-hydroxy quinoline for 4 h at 4°C, fixed for 24 h in Carnoy's fluid (absolute alcohol: glacial acetic acid =3:1) at 4°C, washed with 70% alcohol (v/v), washed with distilled water, macerated in 1 M hydrochloric acid at room temperature for 10 min and at 60°C for 20 min, and then macerated in distilled water for 1 h. The root tips were then removed, compressed, and stained with carbol Fuchs. The cytological classification of the resting and mitotic prophase was performed as described by Tanaka [46]; the classification of karyotype symmetry was according to Stebbins [47]; and the use of symbols for the description of the chromosomes was according to Levan [48].

Experimental population and phenotypic measurements
The mapping population consisted of 116 full-sibs derived from two select trees from natural S. superba forests: LC31 (female parent) and JO32 (male parent). The parents exhibited obvious differences in several phenotypes, such as growth rate, woody yield, and quality.
They were twig-grafted in 2009 and kept in Longquan, Zhejiang, China (latitude: 28°03'N, longitude: 119°06′E, mean altitude: 200-300 m). The mean annual air temperature was 17.6°C, and the rainfall was 1664.8-1706.2 mm. The hybridization was performed in 2013. A total of 116 full-sibs were harvested and planted on the forest farm of Longquan in 2014 and were used for the genetic map construction.
In November 2015, H1, SBD1, LL, and LW of one-year-old seedlings were measured, and in November of 2017, H3, SBD3, HGR, SBDGR, CW, PSN, MBA, MBD, BT, and HBT of three-year-old seedlings were measured. All measurements were used for the QTL analysis.

DNA extraction
Young leaf samples were individually collected from the two parents and 116 full-sibs for DNA extraction. All samples were immediately frozen in liquid nitrogen and preserved at − 80°C until extraction. The genomic DNA was extracted using a Plant Genomic DNA Isolation kit (Dingguo, Beijing, China) following the manufacturer's instructions. The DNA purity and concentration were determined using a NanoDrop 1000 (Thermo Fisher Scientific, USA) and assessed on 1% agarose gels.

GBS protocol and library construction
A GBS strategy (Novogene, Beijing, China) was used to develop the SNP markers. First, we performed a GBS pre-design for restriction enzyme selection. The GBS library was constructed by digesting the genomic DNA with a Mse I, EcoR I, and Hae III enzyme combination with subsequent ligation to barcodes, after which each sample was amplified in the multiplex PCR. The desired fragments were selected for library construction.
Next, we performed a standard analysis of the raw data. The Illumina HiSeq™ sequencing platform (Illumina, San Diego, CA, USA) was used for paired-end (PE) 150 sequencing. Then, based on the analysis of the original data, we conducted advanced analyses, and DNA library assembly was followed by HiSeq sequencing with the removal of reads with adapters, low-quality base calls, or uncalled bases.
Finally, in the progeny GBS-Seq analysis, we analyzed the number of reads cut by MseI at both ends of each screened read, and the reads that did not contain these restriction sites were discarded. The specific reads and the ratio of the total number of reads to the number of enzyme-captured reads were also recorded. Then, Burrows-Wheeler Aligner (BWA) software [49] was used to align the clean reads against the reference genome (settings: mem -t 4 -k 32 -M -R). The reference genome was selected using a large amount of data from the male parent and was clustered and built to obtain a consistent sequence. The reads of the male parents, allowing up to six base mismatches, were clustered using the Stacks software [50,51] and used to select the groups that contained read support numbers up to 3. These were clustered to obtain the final reference sequence. The alignment files were converted to bam files using SAMtools software [52]. If multiple read pairs had identical external coordinates, only the pair with the highest mapping quality was retained.

SNP identification and genotyping
SNP calling was performed for parents and progenies using SAMtools software [49]. Then, a Perl script was used to filter the SNPs that had more than two genotypes. Polymorphic markers between the two parents were detected and classified into eight segregation patterns (ab × cd, ab × cc, cc × ab, ef × eg, hk × hk, nn × np, lm × ll, and aa × bb) according to the CP model in Join-Map 4.0 software [53]. After the parental markers were developed, the 116 progeny lines were genotyped for the loci at which the parents differed.

Linkage map construction
Markers indicating significantly distorted segregation (P < 0.001), integrity (> 65%), or containing abnormal bases were filtered by JoinMap 4.0 (JoinMap® 4.0: Software for the calculation of genetic linkage maps in experimental populations). The segregation patterns hk × hk and nn × np were used for the construction of the male parent map, while the patterns lm × ll and hk × hk were used for the female parent map using JoinMap 4.0. The regression algorithm, three times circulation sequence, and Kosambi mapping functions were used in marker distance calculation [54]. The LOD value was set to 2.0-10. The integrated map for both the male and female parents was computed using the combined group for map integration function in MergeMap software [55]. A Perl script SVG was used to visualize the exported maps, and heat maps were constructed to evaluate the maps.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Author details