Evaluation of a Sugarcane (Saccharum spp.) Hybrid F1 Population Phenotypic Diversity and Construction of a Rapid Sucrose Yield Estimation Model for Breeding

Sugarcane is the major sugar-producing crop worldwide, and hybrid F1 populations are the primary populations used in breeding. Because of the complexity of the sugarcane genome and the quantitative nature of sucrose yield, phenotypic selection is still the most commonly used approach in breeding for high sucrose yield. In this study, a hybrid F1 population containing 135 hybrids was constructed and evaluated for 11 traits (sucrose yield (SY) and its related traits) in a randomized complete-block design during two consecutive growing seasons. The results revealed that all the traits exhibited distinct variation, with the coefficient of variation (CV) ranging from 0.09 to 0.35, the Shannon-Wiener diversity index (H′) ranging between 2.64 and 2.98, and the broad-sense heritability ranging from 0.75 to 0.84. Correlation analysis revealed complex correlations between the traits, with 30 trait pairs being significantly correlated. Eight traits, including stalk number (SN), stalk diameter (SD), internode length (IL), stalk height (SH), stalk weight (SW), Brix (B), sucrose content (SC), and yield (Y), were significantly positively correlated with sucrose yield (SY). Cluster analysis based on the 11 traits divided the 135 F1 hybrids into three groups, with 55 hybrids in Group I, 69 hybrids in Group II, and 11 hybrids in Group III. The principal component analysis indicated that the eigenvalues of the first four principal components were greater than 1 and the cumulative contribution rate reached 80.93%. Based on the comprehensive principal component values of all samples, 24 F1 genotypes had greater values than the high-yielding parent 'ROC22' and were selected for the next breeding stage. A rapid sucrose yield estimation equation was established using four easily measured sucrose yield-related traits through multivariable linear stepwise regression. The model was subsequently confirmed using 26 sugarcane cultivars and 24 F1 hybrids.
This study concludes that the sugarcane F1 population holds great genetic diversity in sucrose yield-related traits. The sucrose yield estimation model, $y_{SY} = 2.01x_{SN} + 8.32x_{SD} + 0.79x_{B} + 3.44x_{SH} - 47.64$, can aid the breeding of sugarcane varieties with high sucrose yield.


Introduction
Sugarcane is the most important sugar-producing crop worldwide. It accounts for 85-90% of sugar production in China and is mainly grown in Guangxi, Yunnan, and Guangdong provinces [1,2]. By 2020, the sugarcane harvest area was about $1.35 \times 10^{6}$ hectares in China,

Construction of Hybrid F1 Population
Primer combinations were used to genotype the parental lines and identify the hybrid plants. In this study, 3 of 20 SSR markers tested (sh020061, sh060101, and sh090229) [24] were found to amplify differentially between 'Yacheng 9446' and 'ROC22'. These primer combinations were used to genotype 209 seedlings based on the male parent-specific bands, and 171 true F1 hybrids were identified. Finally, after removing 36 seedlings that died of disease, we constructed an F1 population with 135 progenies, and this population was used for the genetic diversity analysis.

Phenotype Variation and Diversity of the Hybrid F1 Population
The trait names, abbreviations, units, and descriptive statistics of the 11 traits are shown in Table 1. The coefficient of variation (CV) of the 11 traits ranged from 0.09 to 0.35. The tillering ratio (TR), effective tillering ratio (ETR), yield (Y), and sucrose yield (SY) showed a relatively high (greater than 0.30) coefficient of variation, with the highest CV being 0.43 for effective tillering ratio (ETR), indicating that these traits contained rich variation in the hybrid F1 population. On the other hand, stalk diameter (SD), internode length (IL), Brix (B), and sucrose content (SC) exhibited a relatively small coefficient of variation (CV < 0.15), with Brix (B) having the minimum coefficient of variation (0.09), indicating that these traits were relatively stable in the population. The skewness and the kurtosis ranged from −0.09 to 1.86 and from −0.20 to 5.57, respectively. The Shannon-Wiener diversity index (H′) of the traits evaluated ranged between 2.64 and 2.98, indicating abundant phenotypic diversity. The broad-sense heritability of the traits ranged from 0.75 to 0.84. The high heritability indicated that genetic factors played a predominant role in determining trait variation. According to the variance analysis, the environment also significantly influenced the traits (p < 0.001). The trait frequency distribution was partial or bimodal, as shown in Figure 1, suggestive of quantitative traits controlled by multiple genes.

Figure 1. TR, ETR, SN, SD, IL, SH, B, SW, SC, Y, and SY are the abbreviations of tillering ratio, effective tillering ratio, stalk number, stalk diameter, internode length, stalk height, Brix, stalk weight, sucrose content, yield, and sucrose yield, respectively. For each trait, the frequency distribution (bottom), distribution fitting curve (diagonal), boxplot (right), scatter plot (below diagonal), and correlation coefficient between pairs of traits are shown. *, **, and *** represent significance at p < 0.05, p < 0.01, and p < 0.001, respectively.

Correlation Analysis of the Eleven Traits
Correlation analysis revealed that 30 pairs of the 11 evaluated traits were significantly correlated. Among them, 17 pairs exhibited significant correlation at p < 0.001, 7 pairs were significant at p < 0.01, and 6 pairs were significant at p < 0.05 (Figure 1). Moreover, a positive correlation was observed in 24 pairs of traits, while 6 pairs of traits exhibited a negative correlation. For example, the effective tillering ratio (ETR) exhibited a positive correlation with tillering ratio (TR), while stalk diameter (SD) was negatively correlated with stalk number (SN) but showed a significant positive relationship with stalk weight (SW). Yield (Y) and sucrose yield (SY) exhibited the highest positive correlation coefficient of 0.93, and both were significantly correlated with stalk number (SN), stalk diameter (SD), internode length (IL), stalk height (SH), and stalk weight (SW). Sucrose yield (SY) was also significantly correlated with Brix (B) and sucrose content (SC). Finally, internode length (IL) was significantly correlated with stalk height (SH), Brix (B), sucrose content (SC), yield (Y), and sucrose yield (SY).

Cluster Analysis
The Ward method was used to generate a cluster tree of the 135 F1 genotypes based on the 11 traits. The 135 individuals were clustered into 3 groups, with 55 hybrids in Group I, 69 hybrids in Group II, and 11 hybrids in Group III (Figure 2). The main characteristics of Group I were as follows: higher stalk height (SH), wider stalk diameter (SD), longer internode length (IL), greater stalk weight (SW), and higher yield (Y) and sucrose yield (SY). Group II had a higher effective tillering ratio (ETR), a greater stalk number (SN), and more sugar (Brix (B) and sucrose content (SC)). The main characteristics of Group III were a higher tillering ratio (TR) but a lower effective tillering ratio (ETR) and stalk number (SN), shorter stalks (SH), and relatively low yield (Y) and sucrose yield (SY).

Principal Component Analysis (PCA)
PCA is a dimensionality-reduction method used to reduce the complexity of large data sets and increase interpretability while minimizing information loss [25]. In this study, the eigenvalues of the first four principal components (PCs) were greater than 1, and the cumulative proportion of the first four PCs was 80.93%, indicating that they represent most of the information in the 11 traits (Table 2). PC1 had the largest variance proportion, 30.11%. The trait vectors with the higher values were stalk number (SN), Brix (B), sucrose content (SC), and sucrose yield (SY), indicating that PC1 mainly reflected information about these traits. PC2 had the second largest variance proportion, 26.19%. The trait vectors with the higher values were stalk diameter (SD) and stalk weight (SW), indicating that this PC was mainly affected by stalk diameter and weight. The variance proportion of PC3 was 12.97%, and the higher vectors were Brix (B) and sucrose content (SC). Thus, PC3 was mainly affected by traits related to sucrose content. The variance proportion of PC4 was 11.66%, and the vectors with the higher values were internode length (IL) and stalk height (SH). The PCA plot of the F1 population, drawn using PC1 and PC2, also showed a clear group separation consistent with the cluster tree (Figure 3). The comprehensive evaluation showed that the PC values (F) of all samples were distributed between 12.29 and 46.78, with a median of 29.62. There were 24 F1 genotypes with F values greater than that of the high-yield parent 'ROC22' (35.47), corresponding to 17.8% of the F1 population, with 13 genotypes in Group I and 11 in Group II (Table 3).

Figure 2. Cluster tree of the sugarcane F1 population based on the 11 phenotypic traits. The cluster tree was generated using the standardized trait data through the Ward method. The heatmap of traits was built using the standardized data. TR, ETR, SN, SD, IL, SH, B, SW, SC, Y, and SY are the abbreviations of tillering ratio, effective tillering ratio, stalk number, stalk diameter, internode length, stalk height, Brix, stalk weight, sucrose content, yield, and sucrose yield, respectively. The 135 F1 genotypes were divided into 3 groups (Group I-Group III), and the groups are indicated by green, blue, and red lines, respectively.

Rapid Sucrose Yield Estimation Model Construction and Verification
To establish the mathematical relationship between sucrose yield and its related traits, sucrose yield (SY) was used as the dependent variable, and its significantly correlated traits (p < 0.001), stalk height (SH), stalk diameter (SD), stalk number (SN), internode length (IL), stalk weight (SW), Brix (B), and sucrose content (SC), were used as independent variables. Linear regression analysis indicated that sucrose yield (SY) was related to stalk number (SN) (p < 0.001), Brix (B) (or sucrose content (SC), p < 0.001), and stalk weight (SW) (p < 0.05) (Table S1). Subsequently, multivariable linear stepwise regression was performed, which indicated that sucrose yield (SY) was significantly related to the three abovementioned traits according to the equation $y_{SY} = 1.97x_{SN} + 0.80x_{B} + 6.38x_{SW} - 34.03$, with an adjusted $R^2$ of 0.9328 (Table S2). To rely only on traits that are convenient to measure directly in the field, the equation was rebuilt without stalk weight (SW). Linear regression analysis indicated that stalk height (SH), stalk diameter (SD), stalk number (SN), and Brix (B) were significantly related to sucrose yield (SY) (p < 0.001) (Table 4), and multivariable linear stepwise regression gave the same result with an adjusted $R^2$ of 0.9301 (Table 5). The sucrose yield (SY) regression equation was $y_{SY} = 2.01x_{SN} + 8.32x_{SD} + 0.79x_{B} + 3.44x_{SH} - 47.64$.

To validate the equation in genotypes with different genetic backgrounds, the phenotype data from 26 sugarcane cultivars were tested by comparing the estimated and measured sucrose yield. The equation was used to estimate the sucrose yield of 14 cultivars in 2018-2020 and 12 cultivars in 2019-2021 using the measured data of stalk height (SH), stalk diameter (SD), stalk number (SN), and Brix (B). The estimated sucrose yield exhibited an extremely significant correlation with the measured sucrose yield. The correlation coefficients of the 14 cultivars were 0.9440, 0.9746, and 0.9594 from 2018 to 2020, respectively. Moreover, the correlation coefficients of the 12 cultivars were 0.9806, 0.8772, and 0.8454, respectively.

The one-way ANOVA indicated that neither the variances nor the mean values of the estimated and measured sucrose yield differed significantly at the 0.05 level (Figure 4, Tables S3 and S4). In addition, the measured mean sucrose yield (SY) of the 26 cultivars over 3 years was compared with the estimated mean sucrose yield (SY). The mean yields were consistent except for the cultivar Dezhe 12-88 (Figure 5). The mean yields of Dezhe 12-88 were significantly different (p < 0.05), which may be due to many outliers recorded for this cultivar that affected the prediction. Further, the sucrose yield (SY) of the 24 F1 genotypes (with greater F values than the parent 'ROC22') was estimated using the seedling data, and the estimated sucrose yield distribution trends showed good consistency with the measured mean yield of 2020 and 2021, with a correlation coefficient of 0.9783 (Figure 6). All the results above indicated that the equation could be used as a rapid sucrose yield estimation and selection model and may provide a valuable reference during sugarcane breeding.

Figure 4. "ab" represents significant differences at p < 0.05, and "aa" represents no significant differences at p < 0.05.

Figure 5. Comparison of the measured and estimated mean sucrose yield of 26 sugarcane cultivars in three crop seasons, including the plant cane, the first ratoon cane, and the second ratoon cane. Mean-E and Mean-M indicate the estimated and measured mean sucrose yield. The sucrose yield of 14 cultivars was measured from 2018 to 2020 in (A), and the sucrose yield of the other 12 cultivars was measured from 2019 to 2021 in (B). "ab" represents significant differences at p < 0.05, and "aa" represents no significant differences at p < 0.05.
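The four-trait equation reported in this section can be applied directly to field measurements. The following is a minimal sketch in Python, not the authors' code: the function name and argument names are hypothetical, and the inputs are assumed to be in the same units as the trait data in Table 1, which is not reproduced here.

```python
# Coefficients and intercept of the four-trait equation reported in the text:
# y_SY = 2.01*x_SN + 8.32*x_SD + 0.79*x_B + 3.44*x_SH - 47.64
COEFFICIENTS = {"SN": 2.01, "SD": 8.32, "B": 0.79, "SH": 3.44}
INTERCEPT = -47.64


def estimate_sucrose_yield(sn: float, sd: float, brix: float, sh: float) -> float:
    """Estimate sucrose yield (SY) from stalk number, stalk diameter, Brix, and stalk height.

    Assumption: every input uses the same unit as the data the equation
    was fitted on (Table 1 of the article).
    """
    return (
        COEFFICIENTS["SN"] * sn
        + COEFFICIENTS["SD"] * sd
        + COEFFICIENTS["B"] * brix
        + COEFFICIENTS["SH"] * sh
        + INTERCEPT
    )
```

Because the equation is linear, a one-unit increase in any trait changes the estimate by that trait's coefficient, which makes the relative leverage of each measured trait easy to read off.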

Discussion
The combination of the genotype and environmental interactions determines the phenotype. The phenotype is an important manifestation of genetic variation and can directly indicate functional gene diversity [22,26]. In sugarcane, many molecular markers for assessing genetic diversity have been implemented, such as SSRs [27], RAPDs [28], and SNPs [29], and they facilitate determining the degree of genetic variability among conventional progenitor species, commercial cultivars, and exotic germplasm [30,31]. However, no effective linkage relationship has been established between agronomic traits and molecular markers of sugarcane due to the complexity of the sugarcane genome and the laborious phenotypic measurements at the population level [5,16,31,32]. Thus, assessing the phenotypic and genetic variation among sugarcane germplasm collections or hybrids is still indispensable in providing important information for sugarcane breeding.
In this study, 11 traits in 135 F1 hybrids were evaluated and statistically analyzed. The Shannon-Wiener index of the traits ranged from 2.64 to 2.98, with an average value of 2.89, indicating that trait diversity was high after the cross of 'Yacheng 9446' and 'ROC22' and that the cross may generate progeny that are phenotypically overdominant to the parents. The coefficient of variation could reflect a trait's degree of dispersion to a certain extent [33,34]. Usually, when the variation coefficient is greater than 10%, the trait is considered to be diversified among the population [22]. Ten of the eleven traits (all except Brix (B)) were diversified, with a coefficient of variation greater than 10%. Notably, the coefficient of variation of four traits (tillering ratio (TR), effective tillering ratio (ETR), yield (Y), and sucrose yield (SY)) was greater than 30%. Thus, abundant diversity was observed in the traits among the F1 population, implying that the parental lines of the F1 population had good breeding potential for agronomic traits. The broad-sense heritability of the 11 traits ranged from 0.75 to 0.84, indicating that they were controlled mainly by genetic factors, consistent with previous research using F1 populations and a core sugarcane diversity panel of WCSRG [5,14,15,18]. For example, the broad-sense heritability of stalk number (SN), stalk diameter (SD), and internode length (IL) was 0.88, 0.91, and 0.68, respectively, in the core sugarcane diversity panel of WCSRG [5]. Moreover, in the 'R570' self-crossing F1 population, the broad-sense heritability of stalk number (SN), stalk diameter (SD), stalk height (SH), and Brix (B) was 0.81, 0.91, 0.83, and 0.81, respectively [14]. However, the heritability of Brix (B), stalk height (SH), stalk number (SN), and yield (Y) in the Brazilian Panel of Sugarcane Genotypes (a mini core collection) was much lower, at 0.57, 0.57, 0.66, and 0.49, respectively [12], which may be affected by genetic differences as well as field identification.
Phenotype clustering can directly reflect the genetic background and be effectively used for a preliminary genetic resource assessment [35]. The cluster analysis divided the 135 F1 hybrids into three groups: (1) higher stalk height (SH), wider stalk diameter (SD), longer internode length (IL), and higher stalk weight (SW), yield (Y), and sucrose yield (SY); (2) higher effective tillering ratio (ETR), greater stalk number (SN), and higher sugar (B and SC); (3) higher tillering ratio (TR). This analysis can aid the selection of elite individual F1 hybrids and intermediate materials to support sugarcane breeding programs. Principal component analysis is a comprehensive evaluation method used in the phenotypic evaluation and classification of rice [36], cotton [37], peanut [38], and other crops. In this study, the cumulative variance explained by the first four principal components was 80.93%, covering a high proportion of the phenotypic variance. The PCA plot was consistent with the cluster tree, similarly dividing the F1 population into three groups. By calculating the comprehensive PC values (F) of the F1 hybrids and their high-yield parent 'ROC22', 24 hybrids with greater F values were selected to proceed to the next breeding stage or be used as intermediate materials.
The complexity of the sugarcane genome and the quantitative nature of sugar- and yield-related traits make it challenging to achieve higher breeding gains in this crop [12,18]. Correlation analysis showed that stalk number (SN), stalk diameter (SD), internode length (IL), stalk height (SH), and stalk weight (SW) were significantly correlated with yield. These traits, along with Brix (B), sucrose content (SC), and yield (Y), were also significantly correlated with sucrose yield (SY). The relationship between yield (Y) and stalk number (SN), stalk height (SH), and stalk weight (SW) has also been reported by Barreto et al. [5] and Yang et al. [14]. In addition, genetic evidence showed that both yield and sucrose were governed by many quantitative trait loci (QTL) or genomic regions [14,39]. These studies also revealed the complexity of breeding sugarcane for high yield and high sucrose yield.
Many efforts have been made to accelerate crop genetic improvement and yield prediction. In the main food crops (rice, maize, and wheat), genomic selection and phenomic selection have been widely used in breeding [40,41]; these approaches require a high-precision genome, high-throughput phenome acquisition methods, and high-throughput data analysis methods. Cultivated sugarcane still lacks a complete genome map and a phenotypic analysis model for single breeding lines [39]. Therefore, it is important to establish a model relating sucrose yield to sucrose yield-related traits for assessing promising single hybrid genotypes at the seedling stage. Using multivariate linear stepwise regression analysis, Liu et al. constructed a yield prediction model for sugarcane in Yunnan province based on plant conditions and meteorological conditions with accuracy rates of 81.1%, 89.3%, 67.8%, 85.3%, and 73.7%, respectively [42]. Wang et al. constructed a yield model using the grain weight of a single sunflower plate, hundred-grain weight, seed setting rate, plant height, grain width, and disc diameter as the main variables, and the model could explain 86.9% of the yield. These studies revealed that multivariate linear stepwise regression is an effective method to construct models [43]. In this study, the traits significantly correlated with sucrose yield were selected to construct the regression equation, resulting in a high adjusted $R^2$ of 0.9301. Brix (B) was selected in the model rather than sucrose content (SC) due to the convenience of phenotyping. Then, the equation was evaluated by comparing the estimated and measured sucrose yield of 26 new sugarcane cultivars with different genetic backgrounds and suited to different ecological regions in China, and by examining the sucrose yield distribution trends of 24 F1 genotypes.
The model could estimate the sugar yield using the stalk height (SH), stalk diameter (SD), stalk number (SN), and Brix (B) values; it could be used at the seedling stage and could be further verified in other stages of sugarcane breeding. In brief, the model can provide a valuable reference for reducing the difficulty of sugar yield evaluation, aiding the selection of elite F1 lines, and reducing time and cost constraints during sugarcane breeding.

Plant Material
The F1 population was derived from two elite sugarcane resources, 'Yacheng 9446' and 'ROC22', which were planted and crossbred at the Hainan sugarcane breeding station (located at Yacheng, Sanya, Hainan, China) in 2018. In 2019, the F1 seeds were grown into seedlings and planted with a row spacing of 1.1 m and a plant spacing of 0.33 m in the experimental field of the South Subtropical Crops Research Institute (located at 110°28′54″ E and 21°16′27″ N, Mazhang, Zhanjiang, Guangdong, China). The heterozygosity of the hybrid F1 plants was confirmed using three previously developed SSR markers that are differentially amplified between the parents (Table S5) [24]. DNA of the F1 plants was extracted from the young leaves using the CTAB method [27]. PCR reactions were conducted following the procedures of Parthiban et al. [27]. The hybrids were identified according to the male parent-specific amplified band: a plant was considered a true hybrid when each of the primers detected the male parent-specific band, and a false hybrid when none of the three markers detected it. Then, the hybrid F1 population was planted in the field using two-bud stalk segments and used for phenotyping in 2020 and 2021. The 26 sugarcane varieties were used as validation materials, and their information is listed in Table S6.

Field Experiment Design
The experimental design was a randomized complete-block design with three replicates. Each F1 hybrid was grown in three rows with a row length of 5.0 m and a row spacing of 1.1 m in each replicate. Twenty-five two-bud stalk segments were planted per row in 2020 and 2021. Sugarcane cultivation management followed conventional commercial production practice in the region, as described by Tang et al. [44]. The 26 sugarcane varieties were planted in the experimental field under the same field experiment design; 14 cultivars were grown from 2018 to 2020, newly planted in the first year and grown as ratoon cane in the two following years, and the other 12 were grown from 2019 to 2021. All the sugarcane materials were planted in January and harvested in December.

Phenotypic Traits Identification
Eleven traits, including tillering ratio (TR), effective tillering ratio (ETR), stalk number (SN), stalk diameter (SD), internode length (IL), stalk height (SH), Brix (B), stalk weight (SW), sucrose content (SC), sugarcane yield (Y), and sucrose yield (SY) of the F1 population, were measured according to the instructions for sugarcane germplasm resources by Cai et al. [45]. In brief, the tillering ratio (TR) was the ratio of tillering number to main bud seedling number, and the effective tillering ratio (ETR) was the ratio of tillering stalk number to tillering number. Main bud seedling number, tillering number, and tillering stalk number were counted per row during the seedling stage, tillering stage, and harvest stage, respectively. Main bud seedling number and tillering number were counted every 10 days until there were no new gains. Stalk number (SN), stalk diameter (SD), internode length (IL), stalk height (SH), stalk weight (SW), and Brix (B) were measured from 10 random stalks per row 1 week before harvest in December of 2020 and 2021. Stalk diameter (SD) and internode length (IL) were measured at half-height of the stalk using a vernier caliper and meter ruler, respectively. Stalk height was measured from bottom to top using a flexible rule. Stalk weight (SW) was the mean of 10 stalks weighed per row using an electronic platform scale. Brix (B) was measured using a hand refractometer on the juice taken at half-height of the stalk. Sucrose content (SC) was determined by HPLC on a mixed sample per row, according to Chen et al. [46]. Yield (Y) was calculated as the stalk number (SN) per unit multiplied by stalk weight (SW). Sucrose yield (SY) was calculated as the yield (Y) multiplied by the sucrose content (SC). The stalk number (SN), stalk diameter (SD), stalk height (SH), Brix (B), sucrose content (SC), and sucrose yield (SY) of the 26 sugarcane varieties were measured using the same methods.
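The four derived traits described above (TR, ETR, Y, and SY) are simple ratios and products of the field counts and measurements. The sketch below illustrates the arithmetic in Python; the function and argument names are hypothetical, and sucrose content is assumed to be passed as a fraction (for example 0.14 rather than 14%), which the text does not specify.

```python
def derived_traits(
    main_bud_seedlings: float,
    tillers: float,
    tillering_stalks: float,
    stalk_number: float,
    stalk_weight: float,
    sucrose_content: float,
) -> dict:
    """Compute TR, ETR, Y, and SY from per-row counts and measurements.

    Assumption: sucrose_content is a fraction, so SY has the same unit as Y.
    """
    tr = tillers / main_bud_seedlings    # tillering ratio
    etr = tillering_stalks / tillers     # effective tillering ratio
    y = stalk_number * stalk_weight      # yield
    sy = y * sucrose_content             # sucrose yield
    return {"TR": tr, "ETR": etr, "Y": y, "SY": sy}
```

For example, a row with 20 main bud seedlings, 30 tillers, and 15 tillering stalks gives TR = 1.5 and ETR = 0.5.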

Phenotypic Diversity and Statistical Analysis
Data statistical analyses. The trait data obtained were sorted using Microsoft Office Excel. The descriptive statistics of each trait, including the overall average ($\bar{x}$), standard deviation (σ), range, coefficient of variation, skewness, and kurtosis, were calculated using the software Origin 2019b. Data distribution and correlation were analyzed and visualized with the R package ggpairs [47]. The broad-sense heritability was calculated according to Hallauer and Miranda [48] as follows:

$$H^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_{ge}^2/n + \sigma_e^2/(nr)}$$

where $\sigma_g^2$ represents the genetic variance, $\sigma_{ge}^2$ represents the interaction variance of the genotype with the environment, $\sigma_e^2$ represents the variance of residual error, n represents the number of environments, and r represents the number of replications.

Data standardization. The trait data were standardized using the fuzzy membership function method with the following calculation formula:

$$X_{(ij)} = \frac{X_{ij} - X_{j\min}}{X_{j\max} - X_{j\min}}$$

where $X_{(ij)}$ represents the membership function value of F1 genotype i in trait j, $X_{ij}$ is the obtained value of F1 genotype i in trait j, and $X_{j\min}$ and $X_{j\max}$ represent the minimum and maximum values of the F1 population in trait j.

Phenotypic diversity. The degree of phenotypic diversity was expressed by the Shannon-Wiener index [49], with the following calculation formula:

$$H' = -\sum_{i=1}^{n} P_i \log P_i$$

where H′ represents the diversity index, n represents the total number of classes, and $P_i$ is the effective percentage of the material distribution frequency in the i-th class of the trait. The classes were set from the first level (< −2σ) to the tenth level (≥ 2σ), and every 0.5σ corresponded to one level.
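As an illustration of the heritability calculation, the sketch below assumes the commonly used entry-mean form H² = σ²g / (σ²g + σ²ge/n + σ²e/(nr)), which matches the variance components defined above. The function name is hypothetical, and the variance components are assumed to have been estimated beforehand from an analysis of variance.

```python
def broad_sense_heritability(
    var_g: float, var_ge: float, var_e: float, n_env: int, n_rep: int
) -> float:
    """Broad-sense heritability on an entry-mean basis.

    var_g  : genetic variance
    var_ge : genotype-by-environment interaction variance
    var_e  : residual error variance
    n_env  : number of environments (n)
    n_rep  : number of replications (r)
    """
    if n_env <= 0 or n_rep <= 0:
        raise ValueError("n_env and n_rep must be positive")
    phenotypic = var_g + var_ge / n_env + var_e / (n_env * n_rep)
    return var_g / phenotypic
```

With two seasons and three replicates, as in this field design, the residual variance is divided by 6, so heritability on a mean basis rises quickly as replication increases.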
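The standardization and diversity calculations described above can be sketched as follows. This is an illustrative Python version under stated assumptions, not the authors' implementation: the function names are hypothetical, the population standard deviation is used for σ, and because the text does not state the base of the logarithm, natural log is the default and the base is exposed as a parameter.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def membership(values):
    """Fuzzy membership (min-max) standardization of one trait to [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def class_index(z):
    """Map a z-score to one of ten classes (0..9) in steps of 0.5 sigma.

    Class 0 is z < -2, class 9 is z >= 2, and each class in between
    spans half a standard deviation.
    """
    idx = math.floor((z + 2.0) / 0.5) + 1
    return max(0, min(9, idx))


def shannon_index(counts, base=math.e):
    """Shannon-Wiener index H' = -sum(P_i * log(P_i)) over non-empty classes."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p, base)
    return h


def trait_diversity(values, base=math.e):
    """Classify trait values by z-score and return the Shannon-Wiener index."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return 0.0  # all values identical: a single class, no diversity
    counts = Counter(class_index((v - mu) / sigma) for v in values)
    return shannon_index(list(counts.values()), base)
```

The index is largest when the genotypes are spread evenly across the classes and falls to zero when all genotypes fall into one class.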
Cluster analysis. Ward's method was used to generate the cluster tree of the 135 F1 genotypes based on the 11 traits, following the method of [24].
Principal component analysis. Principal component analysis (PCA) was carried out using the prcomp function in R. The 11 traits of the F1 population and the parental line 'ROC22' were normalized to form a correlation matrix. The correlation matrix was used to determine the eigenvalues and relative contribution rates, and the factor scores of the principal components (PCs) of each genotype were calculated. All the F1 genotypes were comprehensively evaluated according to the eigenvector matrix and phenotypic data. The formulas were as follows: … 14i SY;

$$F_4 = -0.23\,i_{TR} - 0.02\,i_{ETR} + 0.19\,i_{SN} + 0.24\,i_{SD} - 0.64\,i_{IL} - 0.58\,i_{SH} + 0.12\,i_{B} - 0.07\,i_{SW} + 0.12\,i_{SC} + 0.17\,i_{Y} + 0.19\,i_{SY};$$

$$F = 0.37F_1 + 0.32F_2 + 0.16F_3 + 0.14F_4;$$

where i represents the F1 genotype, and TR, ETR, SN, SD, IL, SH, B, SW, SC, Y, and SY are the trait abbreviations. The 2D-PCA plot was drawn using the R package ggplot2 [50].
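The comprehensive evaluation above combines the principal component scores into one value per genotype. The sketch below applies the F4 coefficients and the weights 0.37, 0.32, 0.16, and 0.14 given in this section. It is an illustration under stated assumptions: the function names are hypothetical, the trait inputs are assumed to be the normalized values of one genotype, and the scores F1 to F3 must be supplied by the caller because their coefficients are not available here.

```python
# F4 coefficients as given in the text.
F4_LOADINGS = {
    "TR": -0.23, "ETR": -0.02, "SN": 0.19, "SD": 0.24, "IL": -0.64,
    "SH": -0.58, "B": 0.12, "SW": -0.07, "SC": 0.12, "Y": 0.17, "SY": 0.19,
}

# Weights of F1..F4 in the comprehensive score F, as given in the text.
PC_WEIGHTS = (0.37, 0.32, 0.16, 0.14)


def pc_score(normalized_traits, loadings):
    """Linear combination of one genotype's normalized trait values."""
    return sum(coef * normalized_traits[trait] for trait, coef in loadings.items())


def comprehensive_score(f1, f2, f3, f4):
    """Weighted sum F = 0.37*F1 + 0.32*F2 + 0.16*F3 + 0.14*F4."""
    return sum(w * f for w, f in zip(PC_WEIGHTS, (f1, f2, f3, f4)))


if __name__ == "__main__":
    # Made-up normalized trait values for a single genotype.
    genotype = {trait: 0.5 for trait in F4_LOADINGS}
    f4 = pc_score(genotype, F4_LOADINGS)
    # f1, f2, f3 are placeholders; their loadings are not reproduced here.
    print(comprehensive_score(f1=0.0, f2=0.0, f3=0.0, f4=f4))
```

Genotypes can then be ranked by F and compared with the score of 'ROC22' computed the same way.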

Rapid Sucrose Yield Estimation Model Construction and Validation
Sucrose yield-related traits were selected based on the correlations among the 11 traits. Traits that were highly significantly correlated with yield or sucrose yield (p < 0.001) were used as independent variables, and sucrose yield was used as the dependent variable. Multivariate linear stepwise regression was performed using the R package MASS to identify the traits that contribute to sucrose yield. The sucrose yield estimation equation was then established using the traits that are easily quantified in the field. The equation was validated using phenotypic data from 26 sugarcane varieties and from the 24 F1 genotypes with greater sucrose yield potential than the high-yield parent 'ROC22'.
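For reference, the estimation equation reported in the Conclusions, y_SY = 2.01x_SN + 8.32x_SD + 0.79x_B + 3.44x_SH − 47.64, can be applied as in the sketch below. The function names are hypothetical, and the inputs are assumed to be in the same units used for the field measurements in this study, which this section does not list. The relative-error helper is only one simple way to compare estimated and measured values and is not the validation procedure used by the authors.

```python
def estimate_sucrose_yield(sn, sd, brix, sh):
    """Estimate sucrose yield (SY) from four field-measured traits.

    sn   : stalk number (SN)
    sd   : stalk diameter (SD)
    brix : Brix (B)
    sh   : stalk height (SH)
    Units are ASSUMED to match those used when the equation was fitted.
    """
    return 2.01 * sn + 8.32 * sd + 0.79 * brix + 3.44 * sh - 47.64


def relative_error(estimated, measured):
    """Absolute difference between estimate and measurement, relative to the measurement."""
    return abs(estimated - measured) / abs(measured)


if __name__ == "__main__":
    # Made-up trait values for illustration only.
    sy_hat = estimate_sucrose_yield(sn=8.0, sd=2.8, brix=21.0, sh=3.0)
    print(sy_hat)
```

Because the coefficients were fitted to this population, estimates for material measured in other units or under very different conditions should be treated with caution.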

Conclusions
The sugarcane hybrid F1 population, derived from 'ROC22' and 'Yacheng 9446', exhibited distinct variation and abundant phenotypic diversity. The broad-sense heritability and frequency distributions of the traits indicated that the 11 traits were largely under genetic control and governed by multiple genes. Correlation analysis revealed eight traits significantly associated with sucrose yield (SY), suggesting the genetic complexity underlying high sucrose yield in sugarcane. Based on the principal component values of all samples, 24 F1 genotypes had greater values than the high-yield parent 'ROC22' and were selected to proceed to the next breeding stage. The rapid sucrose yield estimation equation, $y_{SY} = 2.01x_{SN} + 8.32x_{SD} + 0.79x_{B} + 3.44x_{SH} - 47.64$, was established and validated and could be implemented for sucrose yield assessment in sugarcane breeding.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/plants12030647/s1, Table S1: Linear-regression analysis between sucrose yield and seven related traits; Table S2: Multivariate linear stepwise regression between sucrose yield and three related traits; Table S3: One-way ANOVA of the estimated and measured sucrose yield of the 14 cultivars from 2018 to 2020; Table S4: One-way ANOVA of the estimated and measured sucrose yield of the 12 cultivars from 2019 to 2021; Table S5: Differential amplification SSR marker information; Table S6: Information on cultivars used as verification materials.