Comparative Analysis of Two Sugarcane Ancestors Saccharum officinarum and S. spontaneum Based on Complete Chloroplast Genome Sequences and Photosynthetic Ability under Cold Stress

The complex genomes of polyploid Saccharum have hindered progress in sugarcane improvement, whereas its chloroplast genomes are much smaller and simpler. The chloroplast (cp) is a vital organelle and the site of photosynthesis in plants, and it has also evolved other functions, such as tolerance to environmental stresses. In this study, the cp genomes of two sugarcane ancestors, Saccharum officinarum and S. spontaneum, were sequenced and compared, and the photosynthetic ability of the two species was measured. The cp genome of S. officinarum is 141,187 bp long, 6 bp longer than that of S. spontaneum; both have the same GC content (38.44%) and the same number of annotated genes (134), 13 of which contain introns. Both genomes show the typical quadripartite structure of LSC, SSC, IRb and IRa. In S. officinarum, the LSC is 18 bp longer and each IR is 6 bp shorter than in S. spontaneum (83,047 bp and 22,795 bp, respectively), while the SSC has the same size (12,544 bp). Five genes show contraction or expansion at the IR junctions, but only one gene, ndhF, shows an expansion (29 bp) at the IRb/SSC border. Nucleotide diversity (Pi) from sliding-window analysis showed that the single-copy and noncoding regions are more divergent than the IR and coding regions. Variation hotspots were detected at trnG-trnM, psbM-petN, trnR-rps14, ndhC-trnV and petA-psbJ in the LSC and at trnL-ccsA in the SSC, with petA-psbJ showing the highest value (0.01500). Genetic distances of 65 protein-coding genes between the two species range from 0.00000 to 0.00288, and the selective pressure analysis indicated that only petB was under positive selection, while more genes, including rpoC2, rps3, ccsA, ndhA, ndhA, psbI, atpH and psaC, were under purifying or very strong purifying selection. S. spontaneum has more codons than S. officinarum; both species show obvious codon preference and share the codons of highest (AUG) and lowest (AUA) frequency.
The most abundant amino acid is leucine in both S. officinarum and S. spontaneum, with 2175 (10.88% of the total) and 2228 (10.90%) codons, respectively, and the least abundant is cysteine, with only 221 (1.105%) and 224 (1.096%) codons, respectively. Protein collinearity analysis showed high collinearity between the two cp genomes despite several divergent sites, and simple sequence repeats (SSRs) were also identified. In addition, to compare cold tolerance and explore the extended functions of the chloroplast under this environmental stress, the relative chlorophyll content (SPAD) and chlorophyll fluorescence (Fv/Fm) were measured. SPAD was significantly higher in S. spontaneum than in S. officinarum under control conditions, during exposure to low temperature and during recovery, and Fv/Fm was also significantly higher in S. spontaneum under low temperature. These results suggest that S. spontaneum has much stronger photosynthetic ability and cold tolerance. Our findings lay a foundation for investigating the biological mechanisms of the chloroplasts of the two sugarcane ancestors, provide reliable molecular resources for phylogenetic and evolutionary studies, and should benefit the genetic improvement of photosynthetic ability and cold resistance in modern sugarcane.

produced in the scaffold extension step during de novo assembly, were corrected manually. Errors involving heterogeneous indels (insertions/deletions) caused by homopolymeric repeats in the genome were also corrected. The quality of the S. officinarum and S. spontaneum cp genome assemblies was tested in three steps: (1) using the assembled genome as the reference to compute statistics such as genome coverage and insert fragment size; (2) aligning the genome with the reference sequence (NCBI accession: LS975131.1) to check conservation and rearrangement; and (3) aligning with the reference to obtain structural information. The final complete cp genome sequences of S. officinarum and S. spontaneum will be deposited in GenBank under accession numbers MN204507 and MN204508, respectively, and will be released on January 26, 2020.
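As a minimal illustration of step (1), the sketch below summarises per-base read depth into a mean depth and the fraction of positions covered. It assumes depth records in the three-column format written by `samtools depth -a` (contig, 1-based position, depth); the function name, the `min_depth` threshold and the toy records are hypothetical and are not taken from the authors' pipeline.

```python
from statistics import mean


def coverage_summary(depth_lines, min_depth=1):
    """Return (mean depth, breadth of coverage) from depth records.

    Each record is assumed to look like `samtools depth -a` output:
    contig <TAB> 1-based position <TAB> depth.
    Breadth is the fraction of positions with depth >= min_depth.
    """
    depths = []
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        _contig, _pos, depth = line.split("\t")
        depths.append(int(depth))
    if not depths:
        raise ValueError("no depth records")
    covered = sum(1 for d in depths if d >= min_depth)
    return mean(depths), covered / len(depths)


# Toy records (not data from this study); position 3 has no reads.
toy = ["cp\t1\t120", "cp\t2\t118", "cp\t3\t0", "cp\t4\t240"]
avg, breadth = coverage_summary(toy)
print(f"mean depth {avg:.1f}, breadth {breadth:.2f}")
```

Keeping zero-depth positions (the `-a` behaviour) matters here: without them, uncovered bases would silently drop out and breadth would always look complete.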

Chloroplast Genomic Structure and Gene Diversification
The cp genome of S. officinarum is 141,187 bp and that of S. spontaneum is 141,181 bp, and their GC content is identical (38.44%). Coverage maps of the assembled cp genomes are shown in Figure 1; they distinguish coding genes (CDS), tRNAs and rRNAs and display the coverage depth and GC content along the genome. Gene maps of the complete circular cp genomes of S. officinarum (A) and S. spontaneum (B) are shown in Figure 2. Both genomes have the typical quadripartite structure: a pair of inverted repeats (IRa and IRb) in opposite orientation, separated by a large and a small single-copy region. In S. officinarum, the large single-copy (LSC) region is 83,065 bp (positions 1 to 83,065), the small single-copy (SSC) region is 12,544 bp (positions 105,855 to 118,398), and the two inverted repeat (IR) regions are each 22,789 bp, with IRb at positions 83,066 to 105,854 and IRa at positions 118,399 to 141,187. The LSC is thus 18 bp longer and each IR 6 bp shorter than in S. spontaneum (83,047 bp and 22,795 bp, respectively). Gene annotation identified 134 genes in the cp genomes of both S. officinarum and S. spontaneum, one fewer than in the cp genome of the Saccharum hybrid cultivar RB867515 (135) [29]. The two species have the same numbers of protein-coding genes (CDS; 88), tRNA genes (38) and rRNA genes (8), with 40 duplicated genes in the IR regions, and both lack one tRNA gene compared with the modern cultivar RB867515 (39) [29]. The numbers of unique CDS, tRNA and rRNA genes are also the same in both species.
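The region lengths reported above can be cross-checked with simple arithmetic: a quadripartite cp genome equals LSC + SSC + 2 × IR, and laying the regions out in the order LSC, IRb, SSC, IRa gives their coordinates. The short Python sketch below uses only the numbers stated in this section; the helper names are illustrative.

```python
# Region lengths (bp) as reported in this section.
REGIONS = {
    "S. officinarum": {"LSC": 83_065, "IR": 22_789, "SSC": 12_544},
    "S. spontaneum": {"LSC": 83_047, "IR": 22_795, "SSC": 12_544},
}


def total_length(r):
    """Quadripartite genome: LSC + SSC + two copies of the IR."""
    return r["LSC"] + r["SSC"] + 2 * r["IR"]


def region_coords(r):
    """1-based inclusive coordinates in the order LSC, IRb, SSC, IRa."""
    coords, start = {}, 1
    for name, length in (("LSC", r["LSC"]), ("IRb", r["IR"]),
                         ("SSC", r["SSC"]), ("IRa", r["IR"])):
        coords[name] = (start, start + length - 1)
        start += length
    return coords


for species, r in REGIONS.items():
    print(species, total_length(r), region_coords(r))
```

For S. officinarum this reproduces the coordinates given above, and the two totals differ by 6 bp (141,187 vs. 141,181): the 18 bp gained in the LSC is offset by 2 × 6 bp lost across the two IRs.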
Among the annotated genes, 12 contain introns in both species: only ycf3 has two introns, and the other 11 genes (atpF, ndhA, ndhB, rpl2, rps16, trnL-UUU, trnS-CGA, trnL-UAA, trnA-UGC, trnV-UAC and trnT-CGU) have one intron each. The rps12 gene also contains two introns but is trans-spliced; because the start sites of both introns lie in the IR region, they were not counted in this study, although such introns have been counted in other studies, for example in Mikania [7]. All 134 genes contain exons: most (116) have a single exon, 14 have two, and only three (ycf3 and the two copies of rps12) have three. In the IR regions, 20 genes have two or more copies: ndhB, rpl2, rpl23, rps12, rps15, rps19, rps7, rrn16, rrn23, rrn4.5, rrn5, tRNA-ACG, tRNA-CAA, tRNA-CAU, tRNA-CAU, tRNA-CGU, tRNA-GAC, tRNA-GUG, tRNA-GUU, tRNA-UGC, ycf1 and ycf2; rpl2 has three copies and tRNA-CAU four. These features, including the numbers of exons and introns, are the same in the two species. The positions of most exons and introns differ between the genomes, but the lengths of the vast majority are identical, with only two introns differing in length, both located in ycf3: 780 bp and 736 bp in S. officinarum versus 754 bp and 737 bp in S. spontaneum. In addition, the intron in tRNA-UUU of S. officinarum is 2475 bp, 1 bp shorter than in S. spontaneum. Both species carry four types of hypothetical or putative chloroplast genes, comprising five genes in the IR region (two ycf1, two ycf2 and one ycf3) and one gene, psbJ, in another region. In addition, rpl2 is annotated as a gene of undefined function in both species.
Details are presented in supplementary Table S1 (Saccharum_officinarum.exon.intron.stat) and Table S2 (Saccharum_spontaneum.exon.intron.stat). The 88 protein-coding genes can be divided into four categories: (1) self-replication genes; (2) photosynthesis genes, including those for photosystems I and II, adenosine triphosphate (ATP) synthase, the cytochrome b6/f complex and other biosynthesis genes, such as cytochrome-related genes; (3) other genes, including those related to biosynthesis; and (4) protein-coding genes of unknown function. The gene contents of the cp genomes of the two sugarcane ancestors, including gene families, functions and gene names, are given in Table 1.
Figure 1 legend (partial): coding genes, tRNAs (purple boxes) and rRNAs (orange boxes) are marked; the green ring shows coverage depth, which is generally about twice as high in the inverted repeat regions as elsewhere; the inner circle shows GC content (green line, > 50%; blue line, < 50%).
Table 1 notes: * two gene copies in the IRs; # trans-spliced gene; 1 gene containing a single intron; 2 gene containing two introns.

Collinearity Analysis
Collinearity analysis with Mauve revealed highly conserved genome structures in the two sugarcane ancestors, with high collinearity between S. officinarum var. Badila and S. spontaneum var. Yunnan 83-184 across all gene types. Nevertheless, some sites in the cp genomes carry insertions or deletions (Figure 3). For example, one CDS gene near position 20,000 carries a deletion in S. spontaneum, whereas two genes between positions 50,000 and 55,000 carry insertions in S. spontaneum. Interestingly, two CDS genes on the positive strand between positions 65,000 and 70,000 show opposite size changes in the two species, one carrying a deletion and the other an insertion.
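Mauve itself identifies locally collinear blocks across whole genomes; the sketch below only illustrates the simpler idea of listing insertions and deletions once two sequences are already aligned, with '-' marking gaps. The function name, the coordinate convention and the toy alignment are assumptions for illustration, not Mauve output.

```python
def find_indels(ref_aln, qry_aln, gap="-"):
    """List indel events in a pairwise alignment as (kind, ref_pos, length).

    ref_pos is the 1-based reference coordinate of the last reference base
    before the event (0 if the event precedes the first reference base).
    'deletion'  = bases present in the reference but missing in the query.
    'insertion' = bases present in the query but missing in the reference.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    events, ref_pos, i, n = [], 0, 0, len(ref_aln)
    while i < n:
        r_gap, q_gap = ref_aln[i] == gap, qry_aln[i] == gap
        if q_gap and not r_gap:  # start of a deletion in the query
            j = i
            while j < n and qry_aln[j] == gap and ref_aln[j] != gap:
                j += 1
            events.append(("deletion", ref_pos, j - i))
            ref_pos += j - i
            i = j
        elif r_gap and not q_gap:  # start of an insertion in the query
            j = i
            while j < n and ref_aln[j] == gap and qry_aln[j] != gap:
                j += 1
            events.append(("insertion", ref_pos, j - i))
            i = j
        else:  # aligned column (match/mismatch) or gap in both
            if not r_gap:
                ref_pos += 1
            i += 1
    return events


# Toy alignment (hypothetical): 2 bp inserted after reference base 4,
# 2 bp deleted after reference base 6.
print(find_indels("ATGC--TTAGGC", "ATGCAATT--GC"))
```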

Analysis of Highly Variable Regions and Base Substitutions in CDS Genes
Despite the extreme similarity in structure and organization of the cp genomes of the two sugarcane ancestors, S. officinarum and S. spontaneum, divergence may still exist in noncoding regions, especially in intergenic spacers (IGS). We therefore assessed the level of divergence through nucleotide variability (Pi). Pi values from sliding-window analysis of the aligned cp genomes showed where variation occurs. Pi ranged from 0.000 to 0.01500, and regions with Pi > 0.006 were marked among a total of 129 divergent genes (Figure 4). By this criterion, six highly variable sites were detected between the two species: five in the LSC region (trnG-trnM, psbM-petN, trnR-rps14, ndhC-trnV and petA-psbJ) and one in the SSC region (trnL-ccsA), while no site in the IR region met the criterion, indicating that the IRs are highly conserved in both genomes. In addition, the overall Pi among the divergent genes is 0.00167, and the petA-psbJ region in the LSC showed the highest value (0.01500).
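A minimal sketch of sliding-window nucleotide diversity, assuming a gapped multiple alignment given as equal-length strings. The paper does not state its window length or step, so window=600 and step=200 below are placeholder assumptions, as is the choice to skip columns containing gaps or ambiguous bases. With only two sequences, Pi in a window reduces to the proportion of differing sites.

```python
from itertools import combinations

VALID = set("ACGT")


def window_pi(seqs):
    """Average proportion of differing sites over all sequence pairs.

    Columns with a gap or non-ACGT character in any sequence are skipped
    (an assumed 'complete deletion' rule).
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    cols = [c for c in zip(*seqs) if all(b in VALID for b in c)]
    if not cols:
        return 0.0
    pairs = list(combinations(range(len(seqs)), 2))
    diffs = sum(col[a] != col[b] for col in cols for a, b in pairs)
    return diffs / (len(pairs) * len(cols))


def sliding_pi(seqs, window=600, step=200):
    """Yield (approximate window midpoint, Pi) along an alignment."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    for start in range(0, max(length - window, 0) + 1, step):
        chunk = [s[start:start + window].upper() for s in seqs]
        yield start + window // 2, window_pi(chunk)


# Toy two-sequence alignment with one difference at column 5.
aln = ["ACGTACGTAC", "ACGTTCGTAC"]
print(list(sliding_pi(aln, window=5, step=5)))
```

Windows exceeding the 0.006 cut-off used above could then be picked out with, for example, `[w for w in sliding_pi(aln) if w[1] > 0.006]`.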

A total of 65 protein-coding genes shared by the two sugarcane ancestors were used to estimate genetic distances, which ranged from 0.00000 to 0.00440, with an average of 0.00058 (Figure 5). Twenty-three genes showed greater divergence than this average. Among them, petD in the LSC showed the highest divergence (0.00440), followed by petB (0.00288), also in the LSC, and ccsA (0.00225) in the SSC. A total of 32 genes had a genetic distance of zero: 21 in the LSC, five each in IRa and IRb, and one (ndhE) in the SSC.
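The section does not name the substitution model behind the per-gene genetic distances. Purely as an illustration, the sketch below assumes the Kimura two-parameter (K2P) model, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of transitions and transversions, and adds a helper that mirrors the summary above (mean distance, genes above the mean, genes at zero). Function names and the toy inputs are hypothetical.

```python
import math

PURINES, PYRIMIDINES = set("AG"), set("CT")


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences.

    Columns with gaps or non-ACGT characters are ignored.
    """
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    if transitions == transversions == 0:
        return 0.0
    p, q = transitions / sites, transversions / sites
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("too divergent for the K2P correction")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def summarize(distances):
    """Mean distance, genes above the mean, and genes with zero distance."""
    avg = sum(distances.values()) / len(distances)
    above = sorted(g for g, d in distances.items() if d > avg)
    zero = sorted(g for g, d in distances.items() if d == 0)
    return avg, above, zero


# Toy sequences (hypothetical): one A<->G transition in ten sites.
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 5))
```

At the very low divergence seen between these two plastomes, K2P and the uncorrected p-distance give almost identical values, so the choice of model mainly matters for more distant comparisons.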
To estimate the evolutionary pressure on individual protein-coding genes, we analysed nonsynonymous (Ka) and synonymous (Ks) substitution rates. A synonymous substitution changes a nucleotide without changing the encoded amino acid, whereas a nonsynonymous substitution changes the amino acid. Among the 65 shared protein-coding genes, Ka ranged from 0 to 0.0026 (mean 0.0001) and Ks from 0 to 0.00345 (mean 0.00023). Ka/Ks ratios were calculated to assess the selective pressure on these genes in the two species (Figure 5) and interpreted with the usual criteria: neutral evolution (Ka/Ks = 1), positive selection (Ka/Ks > 1) and purifying selection (Ka/Ks < 1); synonymous substitutions are generally assumed to be nearly neutral, whereas nonsynonymous substitutions are exposed to natural selection. By these criteria, and relative to S. officinarum, rpoC2 and rps3 in the LSC and ccsA, ndhA and ndhA in the SSC were under very strong purifying selection in S. spontaneum because all of their Ka/Ks values were zero, while only petB in the LSC was under positive selection. These results suggest that the cp genomes of the two sugarcane ancestors have been shaped by different environmental pressures during evolution, which may account for the differences between them. Other genes with Ka/Ks values of zero include psbI, atpH and psaC, indicating that they too are under very strong purifying selection.
To obtain detailed information on base substitutions in CDS genes, single nucleotide polymorphism (SNP) loci were identified in the cp genomes. The alignment showed that the sequences of the two sugarcane ancestors are highly conserved: only 24 SNPs were identified in 16 of the 88 CDS genes of S. spontaneum when compared with S. officinarum (Table 2). These genes are rpoC1, rpoC2, atpA, ycf3, atpB, psbE, rpoA, rpl14, rps3, ndhB, ndhF, ccsA, ndhD, ndhA, ndhH and ndhB, with ndhB counted for each of its two copies. Four genes (atpB, ndhB, ndhF and ccsA) carry more than one SNP, and ccsA carries five. Most of the 24 SNPs (16) are nonsynonymous substitutions, in which a single-base change alters the amino acid, and only eight are synonymous, leaving the amino acid unchanged. The codons and amino acids corresponding to these SNPs in S. officinarum and S. spontaneum are listed in Table 2. They comprise 10 transversions (Tv) and 10 transitions (Ts), the latter including six A/G and four C/T transitions, giving a Tv:Ts ratio of 1:1; in addition, four codons contain both a transversion and a transition.
The protein-coding genes comprise a total of 19,994 codons in the cp genome of S. officinarum and 20,436 codons in S. spontaneum, suggesting a larger coding capacity in the wild species S. spontaneum, while both encode the same number (21) of amino acid types. The amino acids, codons, their counts and relative synonymous codon usage (RSCU) values are given in Table 3. Leucine is the most abundant amino acid, with 2175 (10.88% of the total) and 2228 (10.90%) codons in S. officinarum and S. spontaneum, respectively, followed by isoleucine, with 1639 and 1671 codons. Cysteine is the least abundant, with only 221 (1.105%) and 224 (1.096%) codons, respectively. Overall, codon usage is similar between the two species, while within each species synonymous codons are used unequally. The most preferred codon is AUG, encoding methionine (Met), with an RSCU of 2.961 in both species, followed by UUA, encoding leucine, with RSCU values of 1.992 and 1.966 in S. officinarum and S. spontaneum, respectively; GCU, UAU, ACU, UCU, AGA, UUA, GAU, GGA and CCU form the third tier. Conversely, the least used codon is AUA (RSCU 0.013), followed by GUG (0.026), in both species. Interestingly, both AUA and GUG were assigned to Met.
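RSCU compares how often a codon is used with the count expected if all synonymous codons of its amino acid were used equally: RSCU = observed count / (total count for the amino acid / number of synonymous codons). The sketch below assumes codon counts already grouped by amino acid; the grouping and the toy counts are illustrative, not the Table 3 data.

```python
def rscu(counts_by_aa):
    """Relative synonymous codon usage for every codon.

    counts_by_aa maps amino acid -> {codon: observed count}. RSCU is the
    observed count divided by the mean count of that amino acid's codons,
    so 1.0 means no preference. Families with no observations are skipped.
    """
    values = {}
    for counts in counts_by_aa.values():
        total = sum(counts.values())
        if total == 0:
            continue
        expected = total / len(counts)  # count if usage were uniform
        for codon, observed in counts.items():
            values[codon] = observed / expected
    return values


# Toy counts (hypothetical, not Table 3).
toy = {"Phe": {"UUU": 30, "UUC": 10}, "Trp": {"UGG": 5}}
print(rscu(toy))  # {'UUU': 1.5, 'UUC': 0.5, 'UGG': 1.0}
```

RSCU is a unitless ratio: values above 1 mark preferred codons and values below 1 avoided ones. Within each synonymous family the values sum to the number of codons, which is consistent with the three Met values reported above (2.961 + 0.026 + 0.013 = 3.000).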

Analysis of IR Junctions
The IR region is considered relatively conserved, and the plant cp genome has four IR boundaries: IRa/LSC, IRa/SSC, IRb/LSC and IRb/SSC. Contraction and expansion at these borders are common and important in evolution and are the main cause of length variation among angiosperm cp genomes [43,44]. In the current study, the IR boundaries of the two sugarcane ancestor species were compared in detail (Figure 6). Several genes showed contraction at the boundaries: (1) at the IRb/LSC border, rpl22 and rps19 were contracted by 58 bp and 35 bp and located in the LSC and IRb regions, respectively; (2) at the IRa/LSC border, the other copy of rps19, located in the IRa region, was contracted by 36 bp, and psbA, located in the LSC region, by 90 bp; and (3) at the IRa/SSC border, rps15 was contracted by 153 bp. Conversely, ndhF at the IRb/SSC border showed expansion: it is located in the SSC region but extends 29 bp into the IRb region. The ndhH gene at the IRa/SSC border showed neither contraction nor expansion. In short, several genes show contraction or expansion, but there are no differences between S. officinarum and S. spontaneum. In addition, ycf1, traditionally regarded as a hypothetical gene, has recently been reported to be essential for plant viability in Arabidopsis [45]. In this study, both S. officinarum and S. spontaneum carry two ycf1 genes in their cp genomes: one at the IRb/SSC boundary, contracted and located in the IRb region, and the other at the IRa/SSC border, located in the IRa region.
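Contraction and expansion at the IR junctions come down to interval arithmetic on gene and region coordinates. The sketch below counts how many bases of a gene fall in each region, using the S. officinarum region coordinates given earlier in this paper; the gene coordinates in the example are hypothetical, chosen only to reproduce a 29 bp extension into IRb like that described for ndhF. Genes that wrap around the origin of the circular genome are not handled.

```python
# S. officinarum region coordinates (1-based, inclusive) from this paper.
REGIONS = {
    "LSC": (1, 83_065),
    "IRb": (83_066, 105_854),
    "SSC": (105_855, 118_398),
    "IRa": (118_399, 141_187),
}


def overlap_bp(a, b):
    """Number of bases shared by two 1-based inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def junction_report(gene, regions=REGIONS):
    """Bases of a gene (start, end) lying in each region it touches."""
    report = {}
    for name, interval in regions.items():
        bp = overlap_bp(gene, interval)
        if bp:
            report[name] = bp
    return report


# Hypothetical gene lying mostly in the SSC but starting 29 bp inside IRb.
print(junction_report((105_826, 108_000)))  # {'IRb': 29, 'SSC': 2146}
```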

Repeat Structure Analysis
The number of simple sequence repeats (SSRs) detected depends on the search parameters, and stricter parameters yield fewer SSRs. We therefore applied three different parameter sets for the SSR search, with the following results. With the strictest set (unit size / minimum repeat number: 1/10, i.e., mononucleotides ≥ 10 nt; 2/6, i.e., dinucleotides ≥ 6 repeats; 3/5; 4/5; 5/5; and 6/5), only 30 SSRs were identified in the cp genome of S. officinarum. All are mononucleotide repeats, the longest being 14 nt, and one sequence contains two SSRs in the compound form '(T)10ctctccta(T)10' (28 bp; Table 4). S. spontaneum differed somewhat: two more SSRs (32 in total) were identified, including two compound SSRs, '(A)10ggaactatgattcatactcactatttagacctcgcaaccagactg(A)10' (65 bp) and '(T)10ctctccta(T)10' (28 bp) (Table 4). The T repeat unit was the most abundant in both species, with 20 in S. officinarum and 22 in S. spontaneum. Considering sequence complementarity, A/T was the most frequent repeat class, with 29 and 31 SSRs in S. officinarum and S. spontaneum, respectively, similar to a previous report [46], while each species has only one G repeat.
Using the parameters applied to sugarcane by Melotto-Passarin et al. (2011), with modifications, 190 SSRs were identified in S. spontaneum. Mononucleotide repeats were the most abundant (128), followed by tri- (47), tetra- (9), di- (5) and pentanucleotide (1) repeats; only one sequence contained more than one SSR (2), and three SSRs occurred in compound form: (TATAA)3ttaat(ATA)3, (A)9t(A)8 and (T)10ctctccta(T)10 (supplementary Tables S3 and S5). S. officinarum was similar overall but differed in detail: one fewer SSR was found (189), with mononucleotide repeats again the most abundant (126), followed by trinucleotide repeats (47); one sequence contained more than one SSR (2), and three SSRs occurred in compound form, only one of which differed in size, namely (A)9t(A)11 (21 bp), although the positions of most SSRs differed (supplementary Tables S4 and S5). The SSR type, sequence, size, and start and end positions of these SSRs in the cp genomes are given in supplementary Table S5. With the less strict third parameter set, a total of 477 SSRs were identified in S. officinarum, with only one sequence containing more than one SSR and four SSRs in compound form, whereas S. spontaneum had one fewer SSR (476) and five compound SSRs. Under this definition, pentanucleotide repeats were the most abundant, followed by hexanucleotide repeats, in both species.
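A simplified SSR search can be written with regular expressions, using the strictest thresholds listed above (1/10, 2/6, 3/5, 4/5, 5/5, 6/5). The sketch below reports perfect repeats only, skips motifs that are themselves repeats of a shorter unit (so an A run is not also reported as (AA)n), and groups SSRs separated by at most a chosen gap into compound SSRs. The `max_gap` value and the function names are assumptions; dedicated tools such as MISA apply additional rules.

```python
import re

# Unit size -> minimum number of repeats (the strictest set described above).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """False if the motif is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Perfect SSRs as sorted (start, end, motif, copies), 1-based inclusive."""
    seq = seq.upper()
    hits = []
    for k, n in min_repeats.items():
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if not is_primitive(motif):
                continue  # e.g. (AA)n is already reported as an A run
            copies = (m.end() - m.start()) // k
            hits.append((m.start() + 1, m.end(), motif, copies))
    return sorted(hits)


def group_compound(hits, max_gap):
    """Group SSRs whose spacing is <= max_gap bp (compound SSRs)."""
    groups = []
    for hit in hits:
        if groups and hit[0] - groups[-1][-1][1] - 1 <= max_gap:
            groups[-1].append(hit)
        else:
            groups.append([hit])
    return groups


# Toy sequence embedding the compound SSR (T)10ctctccta(T)10 reported above;
# the flanking bases are arbitrary.
seq = "GG" + "T" * 10 + "CTCTCCTA" + "T" * 10 + "GC"
ssrs = find_ssrs(seq)
print(ssrs)
print(group_compound(ssrs, max_gap=10))
```

The two T runs are 8 bp apart, so whether they count as one compound SSR depends entirely on the chosen gap threshold, which is one reason different parameter sets give different SSR totals.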

Photosynthetic Ability Analysis
The relative chlorophyll content (SPAD) and the chlorophyll fluorescence parameter Fv/Fm reflect the photosynthetic ability of plants. In S. officinarum, the average SPAD was 29.25 before cold stress, 28.02 and 27.73 after 3 and 7 days of cold stress, and 23.37 after 10 days of recovery following transfer from the incubator to the field; in S. spontaneum, the corresponding values were 45.50, 41.42, 40.42 and 39.48. SPAD values were significantly higher in S. spontaneum than in S. officinarum under control conditions, at the early (3 days) and late (7 days) stages of cold stress, and during recovery after the stress was removed. In S. spontaneum, SPAD values did not differ noticeably between 3 and 7 days of cold stress or during recovery, although all were clearly lower than under control conditions, suggesting a quick response to the environmental stress. The situation was different in S. officinarum: SPAD did not decrease noticeably under cold stress relative to the control, but decreased significantly during recovery compared with both the control and the cold-stress measurements, suggesting a slow response to the stress and impaired recovery. These results indicate that the wild species S. spontaneum has a stronger photosynthetic capacity and is more tolerant of cold stress than S. officinarum. For chlorophyll fluorescence, the maximal photochemical efficiency of the two sugarcane ancestors was estimated (Table 5). The average Fv/Fm, i.e., the ratio of variable fluorescence (Fv = Fm − Fo, where Fo is the minimal fluorescence) to maximal fluorescence (Fm), was 0.364 ± 0.152 in S. spontaneum after 7 days of cold exposure, significantly higher than in S. officinarum (0.194 ± 0.096). After the cold stress was relieved and the plants were grown under control conditions for 10 days, chlorophyll fluorescence was almost undetectable in the treated plants, as most investigated leaves (5/6) had an Fv/Fm value of zero; this may have been caused by excessively low temperature, resulting in impaired recovery even after the stress was removed. This was the case in both species. Surprisingly, however, several new tillers appeared in S. spontaneum after two additional weeks under control conditions, whereas none appeared in S. officinarum (Figure S2), again suggesting that S. spontaneum has stronger tolerance to cold stress and a stronger growth compensation ability than S. officinarum.
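The two derived quantities in this section can be made explicit. Fv/Fm is conventionally computed as (Fm - Fo)/Fm, where Fo and Fm are the minimal and maximal fluorescence of a dark-adapted leaf; the sketch also expresses each mean SPAD value reported above as a percentage change from the control. It works only with the reported means, so it says nothing about statistical significance; the function names are illustrative.

```python
def fv_fm(fo, fm):
    """Maximum quantum efficiency of PSII: Fv/Fm = (Fm - Fo) / Fm."""
    if fm <= 0:
        raise ValueError("Fm must be positive")
    return (fm - fo) / fm


def percent_change(baseline, value):
    """Relative change (%) of value with respect to baseline."""
    return 100.0 * (value - baseline) / baseline


# Mean SPAD values reported above:
# control, 3 d cold, 7 d cold, 10 d recovery.
SPAD = {
    "S. officinarum": [29.25, 28.02, 27.73, 23.37],
    "S. spontaneum": [45.50, 41.42, 40.42, 39.48],
}

for species, values in SPAD.items():
    control = values[0]
    print(species, [round(percent_change(control, v), 1) for v in values[1:]])
```

On these means, SPAD in S. spontaneum falls by about 9.0% after 3 days of cold and 13.2% after recovery, whereas S. officinarum falls by about 4.2% and 20.1%, which matches the contrast between a quick initial response and impaired recovery described above.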

Discussion
This study provides new data on two ancestors of modern sugarcane, S. officinarum (2n = 80, 8×) and S. spontaneum (2n = 80, 10×), and is the first to present a detailed comparison of the complete cp genomes of these two crucial species. Because the two accessions have the same chromosome number but slightly different ploidy, their use may reduce confounding effects of chromosome number when investigating the extended role of the chloroplast in cold tolerance, since chromosome numbers vary greatly in S. officinarum and S. spontaneum [24], even though the chloroplast genome itself is separate from the nuclear chromosomes. In addition, S. spontaneum accession Yunnan 83-184 has been widely used in basic hybridization programs to obtain innovative breeding materials and cross parents owing to its strong cold tolerance, vigor and excellent resistance to sugarcane diseases. Badila, although widely cultivated in China as a chewing cane, is one of the few original clones of S. officinarum; it has been successfully induced to flower within the last ten years and is now widely used in basic hybridization. This study is also the first to present data on the photosynthetic capacity of these accessions, offering an opportunity to better understand the relationships between genes and physiological traits. Moreover, the detailed CDS sequences provide a basis for investigating chloroplast gene expression under environmental stresses, and the SSRs and SNP loci can be used to develop chloroplast markers for tracking genetic background when these species are used in sugarcane breeding. Because only a limited number of accessions of the two ancestor species have been used in modern sugarcane improvement, the polymorphic sites may also serve as a basis for developing molecular markers associated with traits of interest, such as resistance, by detecting the SNP loci in populations derived from crosses between S. officinarum and S. spontaneum.
Comparison of the cp genomes of six important sugarcane hybrids (NCo310, SP80-3280, Q155, RB867515, Q165 and RB72454) and the two ancestral species S. officinarum (Badila) and S. spontaneum (Yunnan 83-184) revealed highly conserved genome structures, similar to those of other plant species in different genera [2,3,29,47] or different families [7,11,27,48,49], although genome length differs among accessions. Based on previous reports [6,27-29], GenBank records (accessions NC_035224.1, LS975131.1, LN849912.1, LN849914.1 and LN896359.1) and the two genomes sequenced in this study, the complete cp genome sequences of these 11 accessions, comprising the six hybrids above and five ancestral accessions, range from 141,151 bp to 141,348 bp (Table 6). The SSC region has exactly the same length in all accessions except SP80-3280, which was sequenced early [28] and has two additional base pairs. Length variation is concentrated in the LSC, which ranges from 83,017 bp to 83,226 bp. The greatest variation occurs within S. spontaneum (83,047 to 83,226 bp), which may reflect the extensive variation in ploidy (2n = 4× to 12×) and chromosome number (2n = 40-128) in this species, even though the chloroplast itself contains no chromosomes. It is followed by the other ancestor, S. officinarum, with 83,065 bp for Badila and 83,042 bp for IJ76-514. By contrast, most modern sugarcane varieties share the same LSC length, except RB72454, which is 30 bp shorter. Besides length, gene content also differs among cp genomes. The cp genomes of Badila and Yunnan 83-184 contain 38 tRNA genes, whereas that of RB867515 contains 39 [29]. The difference is larger still relative to the early report of the sugarcane cp genome, in which 116 genes, including 82 CDS, four rRNA and 30 tRNA genes, were identified in the hybrid SP80-3280 [28]; this may be due to differences in sequencing technique.
The numbers of identified and duplicated genes in cp genomes differ more among genera and families. Moreover, although larger cp genomes usually contain more genes, the relationship between genome size and gene number is not always positive. In this study, the two sugarcane ancestors in the genus Saccharum have 134 identified genes and 20 duplicated genes, with genome sizes of 141,187 bp and 141,181 bp, whereas Triticum turgidum subsp. durum, from the same family (Gramineae), has 135 genes in a shorter genome (135,948 bp) [15]. This phenomenon is also observed between T. timopheevii cultivar TA944 (KJ614409), with a genome of 136,124 bp but only 85 annotated genes, and T. timopheevii cultivar Tim01 (NC_024764), with 136,157 bp and 125 annotated genes [15]. A different situation is found when Hordeum vulgare and Sorghum bicolor are each compared with sugarcane: both have fewer annotated genes (131) and smaller genomes (136,462 bp and 140,754 bp, respectively) [50], suggesting a positive relationship between cp genome length and gene number. A similar pattern was also found in Ipomoea L. (Convolvulaceae) [51] and between Triticum turgidum subsp. durum (KM352501) and T. urartu (NC_021762) [15].
Greater divergence was observed in base substitution events. In this study, we found 24 SNP loci in S. spontaneum relative to S. officinarum. This differs from the findings of Vidigal et al. [29], in which the sugarcane hybrid RB867515 was identical to Q155 and differed from NCo310 by only four SNPs and one indel, and from SP80-3280 by six SNPs and two indels. Although the base substitution rate between S. spontaneum and S. officinarum (0.017%) is low at the intrageneric level, it is higher than that observed by Vidigal et al. [29]. It is, however, much lower than the 5940, 6260 and 5992 substitutions between Fagopyrum luojishanense and F. dibotrys, F. esculentum and F. tataricum, respectively [46], the 591 (0.38%) between Solanum bulbocastanum and S. tuberosum [48], the 235 (0.15%) between Mikania micrantha and M. cordata [7], and the 231 (0.15%) between Machilus balansae and M. yunnanensis [47]. By comparison, SNPs were found in 72 of 79 CDS genes among seven Panax species [52], the exceptions being psaJ, psbN, rpl23, psbF, psbL, rps18 and rps7. In terms of SNP density, the IR regions showed lower density than the LSC and SSC regions, as also observed in Panax [52], Solanum [48] and Machilus [47], implying that this is a common phenomenon. Among the 24 SNP loci, most (16, or 66.67%) are nonsynonymous substitutions, a high proportion. Meanwhile, the 1:1 ratio of transversions (Tv) to transitions (Ts) between S. spontaneum and S. officinarum suggests that substitutions occur without bias, whereas a bias in favor of transversions (Tv:Ts = 1:0.74) was observed between M. micrantha and M. cordata [7].
The base substitutions identified in the current study may help to clarify the phylogeny and population genetics of Saccharum and to identify true hybrids in sugarcane genetic improvement.
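The substitution rate quoted above is the number of SNPs divided by the genome length (24 / 141,187 ≈ 0.017%), and the Tv:Ts ratio comes from classifying each SNP as a transition (purine to purine, or pyrimidine to pyrimidine) or a transversion. The Python sketch below illustrates both calculations under that assumption; the function names and the example SNP list are hypothetical and do not reproduce the pipeline used in this study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """True for A<->G or C<->T changes; False for any transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    return {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES


def ts_tv_counts(snps):
    """Return (transitions, transversions) for an iterable of (ref, alt) pairs."""
    snps = list(snps)
    ts = sum(is_transition(ref, alt) for ref, alt in snps)
    return ts, len(snps) - ts


def substitution_rate_percent(n_snps: int, genome_length: int) -> float:
    """SNPs per site, expressed as a percentage (hypothetical helper)."""
    return 100.0 * n_snps / genome_length


if __name__ == "__main__":
    # 24 SNPs over the 141,187 bp S. officinarum genome
    print(f"{substitution_rate_percent(24, 141_187):.3f}%")
    # Hypothetical SNPs, not data from this study
    example = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]
    print(ts_tv_counts(example))
```

With equal transition and transversion counts, as in the hypothetical list, the Tv:Ts ratio is 1:1.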
Codons play an important role in the transmission of genetic information. The total number of codons in protein-coding genes differs among species. There are only 19,994 codons in S. officinarum var. Badila and 20,436 in S. spontaneum Yunnan 83-184, fewer than in M. micrantha (26,417) and M. cordata (26,414) [7]. However, the sugarcane genomes encode more CDS genes (88) than M. micrantha and M. cordata (80), implying a stronger coding capacity. The most and least abundant amino acids are the same in the two sugarcane ancestors (Gramineae), namely leucine and cysteine, respectively, and their usage frequencies are also highly similar. This similarity extends across families: in Mikania (Asteraceae), the most abundant amino acid is leucine (10.7%) and the least abundant is cysteine (1.12% and 1.13%) [7], compared with 10.88% and 10.90% for leucine and 1.105% and 1.096% for cysteine in S. officinarum and S. spontaneum, respectively, in the current study. The situation is similar for a different genus in the same family, such as Miscanthus floridulus (accession LN869215.1) in Gramineae [30]. In addition, codon usage preference is common in plant species. In this study, obvious codon usage preferences were observed in both S. officinarum and S. spontaneum: the most preferred codon is AUG, followed by UUA, and then GCU, UAU, ACU, UCU, AGA, UUA, GAU, GGA and CCU, with similar preferences in both species. Obvious codon usage preferences were also reported in previous studies of Pyrus [53] and F. dibotrys [44]. However, the preferred codons differ among species. For example, the most preferred codon is AUG in our study, whereas the frequently used codons in Pyrus are ATT, AAA, GAA, AAT and TTT, with ATT the most preferred [53], indicating that species have divergent codon preferences that formed during long-term evolution.
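Codon preference of this kind is usually quantified as relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally, so RSCU > 1 marks a preferred codon. The following is a minimal Python sketch, assuming the standard genetic code (plastid translation table 11 has the same sense-codon assignments) and treating the three stop codons as one synonymous group; the function names are hypothetical and this is not the software used in the study.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def count_codons(cds_sequences):
    """Count in-frame codons across CDS sequences (RNA 'U' is read as 'T')."""
    counts = Counter()
    for seq in cds_sequences:
        seq = seq.upper().replace("U", "T")
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skip codons containing ambiguous bases
                counts[codon] += 1
    return counts


def rscu(counts):
    """RSCU = observed count / (amino-acid total / number of synonymous codons)."""
    synonyms = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        synonyms[aa].append(codon)
    values = {}
    for codons in synonyms.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent, so RSCU is undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


if __name__ == "__main__":
    # Hypothetical CDS fragment, not a sequence from this study.
    values = rscu(count_codons(["ATGTTATTATTGTAA"]))
    for codon in ("ATG", "TTA", "TTG", "CTT"):
        print(codon, round(values[codon], 2))
```

In the toy fragment, leucine is encoded twice by TTA and once by TTG out of six synonymous codons, so TTA gets RSCU 4.0 and TTG 2.0. A single-codon amino acid such as methionine (AUG) always has RSCU 1.0.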
Repeat sequences are associated with rearrangement and recombination of plastid genomes, including chloroplast genomes. SSRs in cp genomes are potentially useful markers for population genetics because they commonly show high variation within the same species [54]. In our study, a large number of repeats were detected in the cp genomes of the sugarcane ancestors, and most were located in intergenic regions, similar to species of Fagopyrum [44]. Most SSRs are of the perfect type, with only one or two compound SSRs under any of the three parameter sets used in this study, which is similar to M. micrantha and M. cordata [7]. However, the search parameters determine how many SSRs are detected, and the three parameter sets gave markedly different results. Only 30 SSRs with mononucleotide repeats were detected with thresholds of 1/10 (mononucleotides ≥ 10 nt), 2/6 (dinucleotides ≥ 6 repeats), 3/5, 4/5, 5/5 and 6/5, compared with more than 470 SSRs under less stringent parameters. Thus, the search parameters are the most important factor in SSR investigation and should be adjusted according to the objective of the study.
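To make the dependence on search parameters concrete, the sketch below scans a sequence for perfect SSRs using a minimum repeat number per motif length, mirroring the stringent thresholds quoted above (1/10, 2/6, 3/5, 4/5, 5/5, 6/5). It is a simplified, MISA-like illustration written for this discussion, not the tool used in the study. It reports only perfect repeats, ignores compound and interrupted SSRs, and discards motifs that are themselves repeats of a shorter unit (for example "AA" as a dinucleotide). The function names and the test sequence are hypothetical.

```python
import re

# Minimum repeat number per motif length (stringent set quoted in the text).
STRINGENT = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """True if the motif is not a repeat of a shorter unit ('ATAT' is not)."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_perfect_ssrs(seq: str, min_repeats: dict = STRINGENT):
    """Return sorted (start, end, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-mer followed by at least n - 1 further copies of itself.
        pattern = re.compile(rf"([ACGT]{{{k}}})\1{{{n - 1},}}")
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // k))
    return sorted(hits)


if __name__ == "__main__":
    seq = "GC" + "A" * 12 + "GC" + "AT" * 6 + "GG"  # hypothetical sequence
    print(find_perfect_ssrs(seq))
    # Raising the mononucleotide threshold from 10 to 13 drops the A run.
    print(find_perfect_ssrs(seq, {1: 13, 2: 6}))
```

Changing a single threshold changes the count, which is the parameter sensitivity described above between the stringent and less stringent settings.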
In addition to its crucial role in photosynthesis, the chloroplast has also evolved extended functions, such as tolerance to low-temperature stress. When plants are exposed to cold stress, chlorophyll biosynthesis in the chloroplast is impaired, which is attributed to down-regulation of gene expression, protein abundance and enzyme activities [32,55-57]. The reduction in photosynthesis caused by low-temperature stress results from declines in Fv/Fm and PSII activity, inhibition of electron transport, and the consequent decline in photophosphorylation and CO2 assimilation [58-63]. In the current study, SPAD values in S. spontaneum were always significantly higher than in S. officinarum, whether the plants were grown at the control temperature, exposed to low temperature or recovering, and the same was true for Fv/Fm under low temperature. In addition, during recovery, several new strong tillers appeared in S. spontaneum but none in S. officinarum, indicating a strong growth-compensation ability in S. spontaneum, although Fv/Fm could not be detected in either species, possibly because of excessive low-temperature stress. However, compared with the Fv/Fm values obtained for sugarcane under control field conditions, the values observed in both Badila and Yunnan 83-184 are quite low [64]; for Badila, the Fv/Fm value under cold stress in this study is only 27% of its value under control field conditions [64]. These results suggest that S. spontaneum Yunnan 83-184 is more tolerant of cold stress than S. officinarum Badila, and that the extended function of the chloroplast in low-temperature stress also exists in sugarcane. The stronger low-temperature tolerance of S. spontaneum may also be reflected in a quicker response to cold stress than in S. officinarum, resulting in less damage.
Nevertheless, low temperature still has a serious impact on the chloroplasts of both S. spontaneum and S. officinarum, causing an obvious decrease in Fv/Fm but no obvious decrease in SPAD during cold stress.

Plant Material, Sample Collection and DNA Preparation
Considering the complexity and diversity of ploidy and chromosome number in Saccharum officinarum and S. spontaneum, accessions of these two ancestral species used for chloroplast genome analysis, and especially for cold-tolerance comparisons, should have the same, or at least similar, ploidy and chromosome number, because both factors strongly influence phenotypes, including biomass. Thus, two accessions, Badila and Yunnan 83-184, were selected from these species. Young fresh leaves of Saccharum officinarum var. Badila (chromosome number 2n = 80, ploidy 8×) and S. spontaneum Yunnan 83-184 (2n = 80, 10×) were collected from the National Sugarcane Germplasm Resource Garden in Kaiyuan, Yunnan, China. Yunnan 83-184 is an important wild accession of S. spontaneum with good drought, cold and salt tolerance and strong growth vigor, and has been widely used in germplasm innovation. Badila is one of the few original accessions of S. officinarum and has long been popular in China as a chewing cane cultivar. Total leaf DNA was extracted with a modified cetyl-trimethylammonium bromide (CTAB)-based method [65]. The purified DNA of suitable quality was then stored at −80 °C until use.

DNA Sequencing and Genome Assembly
The purified DNA was used to construct chloroplast DNA libraries, and raw sequence reads were generated on an Illumina HiSeq 2500. Because the sequenced DNA is a mixture of nuclear and organellar DNA, cpDNA sequences had to be separated from the raw reads. After removal of adaptors and low-quality reads (Q ≤ 30), the clean reads were assembled with SPAdes (version 3.13.1) (cab.spbu.ru/software/spades/v) to obtain seed sequences; the seeds were then iteratively extended with k-mer sizes of 55, 87 and 121, and the contigs were scaffolded with SSPACE. Gaps were filled with GapFiller v2.1.1 (https://jaist.dl.sourceforge.net/project/gapfiller/v2.1.1/gapfiller-2.1.1.tar.gz). Before assembly, chloroplast DNA (cpDNA) reads were extracted from the paired-end reads with Bowtie2 version 2.2.4 (Ben Langmead; langmea@cs.jhu.edu, www.cs.jhu.edu/~langmea) by alignment to a database of known plant chloroplast genomes downloaded from NCBI and compiled by the Genepeer Biotechnology Company (Nanjing, China); these reads were then used to assemble the cp genomes. Assembled contigs were searched with BLAST against the existing complete cp sequence of S. spontaneum IJ76-287 (accession number LS975131.1), which was also used as the reference cp genome for assembling the complete circular cp genome sequence. Gaps between contigs were filled as follows: raw reads were mapped with BLAST onto both ends of the assembled contigs (step one), and overlapping reads were then joined to scaffold and elongate the contigs (step two). These two steps were repeated until all gaps between contigs were filled. Clean reads were assembled according to the chloroplast genome sequences of the reference species.
Gene annotation, RSCU analysis, cpSSR analysis, statistical analysis of K2P distances, Ka and Ks, and collinearity analysis of the two sequenced species were conducted based on the assembled cp sequences.
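Of the analyses listed, the Kimura two-parameter (K2P) distance corrects the raw proportion of differing sites for multiple substitutions by treating transitions (proportion P) and transversions (proportion Q) separately: d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). The Python sketch below implements this standard formula for two aligned sequences, skipping gaps and ambiguous bases at each site. The function name is hypothetical, and the sketch is not the software used in the study.

```python
import math

PURINES = set("AG")
PYRIMIDINES = set("CT")


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases at this site
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("distance undefined: sequences too divergent")
    # Same as -1/2 ln(w1) - 1/4 ln(w2), written to avoid returning -0.0
    return 0.5 * math.log(1 / w1) + 0.25 * math.log(1 / w2)


if __name__ == "__main__":
    # Hypothetical aligned fragments: one transition in 100 sites
    print(k2p_distance("A" * 100, "G" + "A" * 99))
```

For one transition in 100 sites the K2P distance is slightly above the raw p-distance of 0.01, reflecting the correction for unobserved multiple hits.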

Photosynthetic Parameter Measurement and Statistical Analysis
Single-bud stalk segments of the S. officinarum and S. spontaneum accessions were planted in pots 50 cm in diameter and 50 cm high. Six-month-old plants with seven fully expanded leaves were used to investigate chlorophyll content, chlorophyll fluorescence parameters and the effects of cold stress on both. Six biological replicates were used, each consisting of one pot containing two plants. Measurements were taken on the middle part of the +1 leaf of one plant per pot. Cold stress was applied in a low-temperature light incubator (LGX-1200D-LED, PRANDT Instrument Co., Ltd., Hangzhou, China) at 4 °C with a 12 h light/12 h dark photoperiod and a constant light intensity of 25,000 lx; after 7 days, the pots were moved back to the field. Relative chlorophyll content was measured on six plants with a SPAD-502 Plus (Konica Minolta Sensing, Inc., Osaka, Japan) as previously described [70]. Chlorophyll fluorescence parameters were measured after 2 h of dark adaptation with an IMAGING-PAM fluorometer (Walz, Effeltrich, Germany) on the same six plants and the same leaves used for the SPAD measurements, following the method described by Su et al. [71]. Relative chlorophyll content was measured before cold stress, after 3 and 7 days at low temperature, and 10 days after the plants were returned to the field for recovery. The significance of differences in SPAD and Fv/Fm was tested with Duncan's multiple range test at p < 0.05 using SPSS 19.0 (SPSS IBM, Somers, NY, USA).
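Fv/Fm, the maximum quantum yield of PSII, is derived from two readings on a dark-adapted leaf: the minimal fluorescence F0 and the maximal fluorescence Fm after a saturating pulse, with Fv = Fm - F0. The IMAGING-PAM software reports this value directly. The Python sketch below only illustrates the arithmetic behind it, averaged over replicates, and behind percentage-of-control comparisons such as the 27% figure for Badila discussed above. The function names and fluorescence readings are hypothetical, and Duncan's multiple range test itself is not reproduced.

```python
from statistics import mean


def fv_fm(f0: float, fm: float) -> float:
    """Maximum PSII quantum yield, Fv/Fm = (Fm - F0) / Fm, for one leaf."""
    if fm <= 0 or not 0 <= f0 <= fm:
        raise ValueError("expected Fm > 0 and 0 <= F0 <= Fm")
    return (fm - f0) / fm


def mean_fv_fm(readings):
    """Average Fv/Fm over biological replicates given as (F0, Fm) pairs."""
    return mean(fv_fm(f0, fm) for f0, fm in readings)


def percent_of_control(treated: float, control: float) -> float:
    """Express a treated value as a percentage of its control value."""
    if control == 0:
        raise ValueError("control value must be non-zero")
    return 100.0 * treated / control


if __name__ == "__main__":
    # Hypothetical (F0, Fm) readings for six replicates; not data from this study.
    control = [(200, 800), (210, 820), (190, 790), (205, 810), (195, 800), (200, 805)]
    cold = [(300, 480), (310, 470), (290, 490), (305, 475), (295, 485), (300, 480)]
    c, s = mean_fv_fm(control), mean_fv_fm(cold)
    print(f"control {c:.3f}, cold {s:.3f}, cold = {percent_of_control(s, c):.1f}% of control")
```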

Conclusion
The complete cp genome of S. officinarum, which has a quadripartite structure, is 141,187 bp long, only 6 bp longer than that of S. spontaneum. Its LSC is 18 bp longer and each IR 6 bp shorter than those of S. spontaneum, while the GC content is the same (38.44%). The two species have the same numbers of annotated genes, duplicated genes and introns (not counting the trans-spliced gene rps12), indicating that their cp genome sequences are highly conserved. However, each species has its own unique genes, and most (16/24) of the SNPs detected in S. spontaneum relative to S. officinarum are nonsynonymous substitutions, four of them in ccsA. In addition, a 1:1 ratio of Tv (10) to Ts (10) was observed, and four codons contain both types of substitution. IR junction analysis indicates that the two ancestors have the same number (5) of genes showing contraction, and only ndhF, at the IRb/SSC border, shows expansion. S. spontaneum has more codons than S. officinarum, and codon preferences differ within each species; both show obvious codon preference, and the codons with the highest and lowest frequencies, as well as the most and least abundant amino acids, are the same in both. A set of SSRs with different repeat types was identified, which could be used to infer population genetic structure. The wild species S. spontaneum exhibits much stronger photosynthetic ability and cold tolerance than S. officinarum, as reflected by significantly higher SPAD values under control conditions, under cold stress and during recovery, together with obviously higher Fv/Fm values under low temperature. This is also indicated by the emergence of new tillers and implied by a quick response to low temperature.
This study adds to our knowledge of the two sugarcane ancestors, highlights the differences in their cp genomes, photosynthetic ability and cold tolerance, and provides clues for the genetic improvement of photosynthetic ability and cold resistance in sugarcane.