Comparisons of Chloroplast Genome Mutations among 13 Samples of Oil-Tea Camellia from South China

The differences in cpDNA SNPs and InDels of 13 samples from single trees of different species or populations of oil-tea camellia in South China were examined in this study, and phylogenetic trees were reconstructed based on CDSs and non-CDSs of cpDNAs to research the evolutionary relationships among all samples. The SNPs of all samples included all kinds of substitutions, and the frequency of the transition from AT to GC was highest; meanwhile, the frequencies of all kinds of transversions differed among the samples, and the SNPs exhibited polymorphism. The SNPs were distributed in all the different functional regions of cpDNAs, and approximately half of all SNPs in exons led to missense mutations and the gain or loss of termination codons. There were no InDels in the exons of any cpDNA samples, except those retrieved from Camellia gigantocarpa, although this InDel did not lead to a frame shift. The InDels of all cpDNA samples were unevenly distributed in the intergenic region and upstream and downstream of genes. The genes, regions of the same gene, sites and mutation types in the same region related to the distributions of SNPs, and InDels were inconsistent among samples. The 13 samples were divided into 2 clades and 7 or 6 subclades, and the samples of species from the same sections of the Camellia genus did not belong to the same subclades. Meanwhile, the genetic relationship between the samples of Camellia vietnamensis and the undetermined species from Hainan Province or the population of C. gauchowensis in Xuwen was closer than that between C. vietnamensis and the population of C. gauchowensis in Luchuan, and the genetic relationship among C. osmantha, C. vietnamensis and C. gauchowensis was very close. In sum, SNPs and InDels in the different cpDNAs resulted in variable phenotypes among the different species or populations, and they could be developed into molecular markers for studies on species and population identification and phylogenetic relationships. The conclusion from the identification of undetermined species from Hainan Province and the phylogenetic relationships among 13 oil-tea camellia samples based on cpCDS and cpnon-CDS sequences were the same as those from the former report.


Introduction
Oil-tea camellia is a special oil tree belonging to the Camellia genus whose seeds contain rich oil and that was first planted in China long ago. This group of plants includes approximately 100 species, and there are 30 common species among them, such as C. oleifera, C. meiocarpa and C. vietnamensis [1]. Camellia seed oil contains abundant tea polyphenols and tea saponin, whereas olive oil does not have both components. Olive oil contains harmful components such as cholesterol and sinapic acid but camellia seed oil does not. Therefore, camellia seed oil has better nutritional and healthcare value, and the oil-tea camellia industry is expected to develop broadly [2]. Oil-tea camellia plants strongly resist many kinds of stress and widely adapt to many kinds of environments, showing extremely strong resistance to typhoons [3], so they can be planted in low-yielding or abandoned land and produce high yields, resulting in better ecological and economic benefits.

SNP and InDel Analysis
The chloroplast genome of C. oleifera mentioned above was used as the reference genome, and the reads were mapped back to the reference genome with the BWA software. Then, the reads from PCR duplication were deleted with the Picard Tools software. According to the mapping results and comprehensively considering factors such as data features, sequencing quality and experiment operation, the probability of every possible genotype was calculated based on the actual data with the software GATK UnifiedGenotyper. The genotype with the highest probability was selected and regarded as the genotype at a specific site in the sequencing unit. Then, a quality value was assigned to reflect the accuracy of the gene, and the sequence identity was determined. According to the sequence identity, the polymorphic sites in the reference sequence were screened and filtered. The main steps were as follows: the SAM file was converted into a BAM file, the BAM file was sequenced, PCR duplicates were identified, reads of PCR duplicates were deleted, reads whose mapping Q values were no more than 10 were filtered and the indexes were calculated. The realignments of the sequences near the InDels were mapped, and then the variations with high accuracy were gained by filtering the results of variant calling based on SNPs and InDels GATK. SNP and small InDel annotation was carried out according to the off-annotation information of the reference genome with the Hannover procedure. With regard to the SNP and small InDel sites in the CDS region, their influences on the translation of the protein were identified.

Construction of a cpDNA Phylogenetic Map
Sixty-five complete cpDNA sequences were obtained according to the method in a previous report [5], and 10 sample sequences were added for analysis. CDSs and non-CDSs of the 75 full cpDNA sequences were extracted for phylogenetic analysis. The phylogenetic tree was constructed according to the method in the same former report [5]. Every group was analysed using the software MAFFT (default parameters), and sequence pruning was performed using Gblocks (with the following parameters: -t, d, -b5, h). The phylogenetic tree was constructed using the software IQTREE and the outgroup was set as Hartia laotica (NC_041509.1).

Identification and Preference Analysis of SNP Mutation Types
The cpDNA genome of C. oleifera (HD07) was used as the reference sequence, and the frequency distribution of the different SNPs from the different samples is shown in Figure 1. All samples had all the kinds of base substitution mutations, and the sites of every kind of substitution mutation were different from each other among the samples. However, transition mutations from AT to CG were the most common and transversion mutations from CG to GC and from TA to AT were the rarest and second rarest, ranking 6th and 5th, respectively. The other kinds of substitution mutations, such as transitions between CG and AT or transversions between TA and GC, ranked in the middle, and their rankings changed among samples. In sum, the frequency of transition mutations was higher than that of transversion mutations, and the ratio of the former to the latter was less than 2:1, so the substitution mutations of oil-tea camellia are biased towards transition mutations.
Meanwhile, the substitution mutations appeared to be unique and polymorphic among the oil-tea camellia species or populations. of substitution mutation were different from each other among the samples. However, transition mutations from AT to CG were the most common and transversion mutations from CG to GC and from TA to AT were the rarest and second rarest, ranking 6th and 5th, respectively. The other kinds of substitution mutations, such as transitions between CG and AT or transversions between TA and GC, ranked in the middle, and their rankings changed among samples. In sum, the frequency of transition mutations was higher than that of transversion mutations, and the ratio of the former to the latter was less than 2:1, so the substitution mutations of oil-tea camellia are biased towards transition mutations. Meanwhile, the substitution mutations appeared to be unique and polymorphic among the oil-tea camellia species or populations.

SNP Genotyping
The identification and statistical results of SNP genotyping are shown in Table 1. The total homozygous and heterozygous SNP sites of the different samples differed from each other, which meant that either the polymorphisms existed in the numbers of different genotypes of cpSNPs among the different species or they existed in the different populations of the same species. In other words, every sample had unique cpSNP sites. All samples showed more homozygous SNP sites than heterozygous SNP sites, which indicated that the variation in cpSNPs was relatively stable. Generally, the more total SNPs there were, the greater the sequence difference from C. oleifera (HD07), so the similarity to C. oleifera (HD07) exhibited the following order: C. polyodonta (HD04), C. meocarpa (HD05), C. gauchowensis from Gaozhou (HD01), C. gigantocarpa (HD03), C. vietnamensis (HD09), C. gauchowensis from Luchuan (HD02), C. gauchowensis from Xuwen (HD13) and the undetermined species from Hainan Province (HD10-HD12), C. osmantha (HD08) and C. semiserrata (HD06). Generally, the more heterozygous SNP sites there were, the greater the difference in SNP complexity from C. oleifera (HD07), such that complexity exhibited the following order: C. semiserrata (HD06), C. osmantha (HD08), C. gauchowensis from Xuwen (HD13) and the undetermined species from Hainan Province (HD10-HD12), C. gauchowensis from Luchuan (HD02), C. vietnamensis (HD09), C. gigantocarpa (HD03), C. gauchowensis from Gaozhou (HD01), C. meocarpa (HD05) and C. polyodonta (HD04). In sum, all the above results indicated that cpSNPs could be used as molecular markers to identify the different species and populations of oil-tea camellia.

SNP Annotation
The cpSNP annotations of 13 samples are shown in Table 2. The specific sequence regions and the site numbers of SNP locations are shown. Although they are not listed individually in this paper, this cpSNP location information would be essential to guide further research on the regulation of gene expression, molecular breeding and cpDNA genetic engineering.

40
Total 233 Note: The terms "upstream" and "downstream" are defined as 1 kb away from the transcription start site and transcription end site, respectively, taking into account the strand of the mRNA. If a variant is located in both downstream and upstream regions (possibly for 2 different genes), then "upstream; downstream" will be printed as the output. The intergenic space is >1 kb away from any gene of both. Gene names shown in the bold font and green, blue and red indicate a nonsynonymous mutation, stop-gain mutation and stoploss mutation, respectively. The chloroplast genome of C. oleifera was used as the reference genome. The same as below.
The SNPs in the different functional regions of cpDNAs were distributed unevenly. The sequences of the functional regions, according to the numbers of genes containing cpSNPs, were as follows: exons, upstream and downstream overlapping regions, upstream, downstream, introns and intergenic spaces. However, the kinds and numbers of specific related genes in the different regions led to different magnitudes of differences among the samples.
The samples could be divided into five groups on the basis of the number of cp-SNPs in the sequences of functional regions. The first group included the samples of the undetermined species from Hainan Province (HD10-HD12), the population of C. gauchowensis from Xuwen (HD13) and C. osmantha (HD08), with the sequence types showing the following order: exons, intergenic spaces, downstream, upstream, downstream and upstream overlapping regions and introns. The second group included C. vietnamensis (HD09) and the populations of C. gauchowensis from Gaozhou and Luchuan (HD01 and HD02, respectively), and the sequences showed the following order: intergenic spaces, downstream, exons, upstream, downstream and upstream overlapping regions and introns. The third group included C. osmantha (HD08), and the order was intergenic spaces, upstream, exons, downstream, downstream and upstream overlapping regions and introns. The fourth group included samples other than those of C. semiserrata (HD06), with the retrieved order being exons, intergenic spaces, downstream, downstream and upstream overlapping regions, upstream and introns. The last group included C. semiserrata (HD06), and the order retrieved for the sample was exons, intergenic spaces, downstream and upstream overlapping regions, downstream, upstream and introns. The samples of each group had similar sequence type orders based on the numbers of SNPs, but the differences in the numbers of SNPs among the specific functional regions varied in magnitude among the samples.
The genes and the sites of the cpSNPs in exons differed among samples, and these different genes and sites led to missense mutations and the gain and loss of termination codons. Additionally, all samples, except those of C. semiserrata (HD06), the undetermined species (HD10-HD12) and the Xuwen population of C. gauchowensis, which showed loss of the termination codon in the petL gene, had no mutations leading to loss of the termination codon but did show missense mutations and gain of the termination codon. All samples had 1-2 gains of termination codons, and the different genes and sites showing missense mutations and gain of termination codons led to variation in the magnitudes of differences, and the majority of these mutations were synonymous mutations. All of these results are summarized in Table 3. Table 3. The results of the annotations of SNPs in exons. HD01  HD02  HD03  HD04  HD05  HD06  HD08  HD09  HD10   Total  28  34  56  40  47  80  40  29  66   Nonsynonymous SNV  11  12  27  18  21  32  16  11  31  Stopgain SNV  1  2  2  2  1  2  1  1  2  Stopiloss SNV  -----1  --1  Synonymous SNV  16  20  27  20  25  45  23  17  32 In general, the SNPs of the different samples had some common characteristics, but they still showed differences among the samples.

SNP Comparisons among Samples
The samples were divided into different groups according to their evolutionary relationships, geographical relationships, and the characteristics of differences between important traits. Then, the samples were compared within each group, and the results are shown in Table 4. Table 4. The distributions of the same SNPs in the genes of the different sample groups.

HD03 HD05
Downstream psbA(1), psbI (12), atpA (1)  14 Exon matk (1), atpF (1), rps2(1), ycf4(1), petL (6), petG (2), (4), ndhJ (7), psbF (1), (1) 18 Intron rps16 (2), atpF (1) A total of 51 SNPs were consistent among the samples of the three populations of C. gauchowensis (HD01, HD02 and HD13), and nine in exons included a missense mutation of the rbcL gene and a gain of the termination codon of the accD gene. Seventy-six SNPs were consistent among the samples of the three populations of C. gauchouwensis (HD01, HD02 and HD13), C. vietnamensis (HD09) and the undetermined species from Hainan Province (HD10-HD12), and twenty-three of those distributed among exons included the missense mutation of the rbcL gene and a gain of the termination codon of the accD gene. The results of further comparison between the two groups indicated that all of the SNPs in the former were also present in the latter. The results of the analysis combined with Table 2 revealed that each sample of the different populations had unique SNPs beyond those displayed in Table 3. The SNPs in the intergenic space of the rpl20 gene differed significantly among the populations, and the samples of the undetermined species from Hainan Province (HD10-HD12) and the population of C. gauchowensis from Xuwen (HD13) had more SNPs than those of the populations of C. gauchowensis from Gaozhou and Luchuan (HD01 and HD02). The first two had more regions of unique SNPs, especially the exons of the genes psbC, psbJ, psbL, psbF, psbE, and petL, which all exhibited unique missense mutations.
3.4.2. Comparison among C. osmantha, C. gauchowensis and C. vietnamensis C. osmantha, C. gauchowensis and C. vietnamensis were distributed in the same production area, and the differences in the SNPs among them are worth noting. There were 48 SNPs consistent with each other, and they included the same missense mutations of the rbcL gene and the same gain of the termination codon of the accD gene. The SNPs in the intergenic spaces were completely different, and there were few SNPs in the other functional regions of genes. Forty-five SNPs were identified in the comparisons of the samples of C. gauchouwensis, C. vietnamensis and the undetermined species from Hainan Province, so there were three different SNPs upstream of the atpB gene and the overlapping regions of the petG and petL genes and psaJ and petG genes, respectively. The results of the analysis combined with Table 2 revealed that the genetic relationship between C. osmantha (HD08) and the population of C. gauchowensis from Gaozhou (HD01) was farther than that between C. vietnamensis (HD09) and the population of C. gauchowensis from Gaozhou (HD01), and the genetic relationship between the population of C. gauchowensis from Gaozhou (HD01) and C. osmantha (HD08) was similar to that between C. osmantha (HD08) and C. vietnamensis (HD09).
3.4.3. Comparisons among C. gauchowensis, C. vietnamensis, Undetermined Species from Hainan Province and C. osmantha Twenty-one SNPs were consistent among the samples of the three populations of C. gauchowensis (HD01, HD02 and HD13), C. vietnamensis (HD09), the undetermined species from Hainan Province (HD10-HD12) and C. osmantha (HD08). In addition, these SNPs were also observed in the comparisons of samples from C. osmantha, C. gauchowensis, and C. vietnamensis and those from C. gauchouwensis, C. vietnamensis and the undetermined species from Hainan Province. These comparisons also revealed the same missense mutation of the rbcL gene and gain of the termination codon of the accD gene. There were no identical SNPs in the intergenic spaces, and there were few identical SNPs in the other functional regions of genes, so the cpSNPs of C. osmantha were different from those of the undetermined species from Hainan Province.

Comparison between C. giganticarpa and C. meiocarpa
The fruits of C. giganticarpa (HD03) and C. meiocarpa (HD05) are the largest and smallest among the afforestation species, respectively, examined in this paper, so it was worth comparing their SNPs. There were 38 consistent SNPs, no identical SNPs in the introns and far fewer identical SNPs in the other functional regions of genes than in the comparisons of the samples from the three populations of C. gauchowensis (HD01, HD02 and HD13) and the three populations of C. gauchowensis (HD01, HD02 and HD13), C. vietnamensis (HD09), C. osmantha (HD08) and the undetermined species from Hainan Province, respectively. The same missense mutations were observed in the exons of the matk, atpF and psbF genes, and the same gain of the termination codon of the psbF gene were observed. Additionally, there were numerous missense mutations that differed from each other, revealing large differences in the cpSNPs. Moreover, the SNPs in the intergenic space of the rpl20 gene were different.

Comparison between C. polyodonta and C. semiserrata
The flowers of C. polyodonta (HD04) and C. semiserrata (HD06) are both red, so it was worth comparing the SNPs of these two species. There were 27 shared SNPs, fewer than in the comparisons mentioned above, but there were more identical SNPs in the exons (18), more than in the comparisons of the samples from the three populations of C. gauchowensis (HD01, HD02 and HD13). Moreover, the same missense mutations of the genes rpoB, ndhJ and psbF and gains of the termination codons of the genes ndhJ and psbF were observed. There were no identical SNPs in the intergenic spaces, and there were fewer identical SNPs in the other functional regions of genes.

Integrative Comparisons
The results of the comparisons of SNPs among the samples of C. giganticarpa (HD03), C. polyodonta (HD04), C. meiocarpa (HD05) and C. semiserrata (HD06) revealed only 3 shared SNPs, namely, 2 in the intron of the rps16 gene and 1 in the overlapping downstream region of the accD gene and upstream region of the psaJ gene. Yet, there were no identical SNPs in the other functional regions of the genes. The results of the comparisons among all 13 samples revealed no identical SNPs.
In general, the similarity among the samples of C. vietnamensis (HD09), the undetermined species from Hainan Province (HD10-HD12) and the populations of C. gauchowensis from Gaozhou and Xuwen (HD01 and HD13) was higher than that among the samples of the three populations of C. gauchowensis (HD01, HD02 and HD13), which indicated that C. vietnamensis (HD09), C. gauchowensis (HD01, HD02 and HD13) and the undetermined species from Hainan Province (HD10-HD12) might belong to different populations of the same species. The similarity of the SNPs among the different populations of the same species was higher than that among the different species, the difference among the SNPs in the intergenic spaces was higher than that in the regions within genes, the abundance and differences of the SNPs in the exons were higher than those in the introns and there were no identical SNPs among all samples.

Identification of the Kinds of InDels and Calculation of Statistics at the InDel Sites
The results for InDels of all samples are shown in Table 5. The numbers of total deletion and insertion sites of the different samples differed from each other, meaning that polymorphisms existed in the numbers of the different types of InDels among the species or populations of the same species. In other words, every sample had unique InDel sites. The ratios of the number of deletions to the number of insertions were all greater than one, suggesting that the frequency of deletions was greater than that of insertions. The ratios of the number of deletions to the number of insertions of the samples from the populations of C. gauchowensis from Gaozhou and Xuwen (HD01 and HD13), C. vietnamensis (HD09) and the undetermined species (HD10-HD12) were almost equivalent, and the ratios of the number of deletions to the number of insertions of the samples from C. osmantha (HD08) and C. semiserrata (HD06) were slightly higher than those from the populations of C. gauchowensis from Gaozhou and Xuwen (HD01 and HD13), C. vietnamensis (HD09) and the undetermined species from Hainan Province (HD10-HD12). However, those of the other three species (HD03, HD04 and HD05) and the population of C. gauchowensis from Luchuan (HD02) were much higher. Therefore, the cpDNA InDels tended to be deletions, and the relative abundance of deletions depended on the species or population of oil-tea camellia. In summary, all the above results indicated that cpDNA InDels could be used as molecular markers to identify different species and populations of oil-tea camellia.

InDel Annotation
The cpDNA InDel annotations of 13 samples are shown in Table 6. The specific sequence regions and the site numbers of InDels were mainly distributed in the downstream, upstream and intergenic regions, whereas fewer were distributed in the introns, and almost none were distributed in the exons. Although the InDels are not introduced one by one in this paper, their location information would be essential for further research on the regulation of gene expression, molecular breeding and cpDNA-based genetic engineering. Table 6. The distributions of the InDels in the genes.

Exon Regions
The rpoC2 exons of all samples, except the sample of C. gigantocarpa (HD03), had a sixbase TTAAAA deletion. The InDel in the exon of the rpoC2 gene of C. gigantocarpa (HD03) did not lead to a frameshift, but it might lead to the loss of the codon of phenylalanine and the loss of the termination codon. This meant that the InDels in the exon would be eliminated by mortality induction; yet, a few InDels not causing a frameshift, and thus having no effect on the phenotype, could occur occasionally.

Intron Regions
There were no InDels in the introns of the samples retrieved from C. gigantocarpa (HD03), C. polyodonta (HD04) and C. meiocarpa (HD05), but the introns of the rps16 and ycf3 genes contained InDels; the intron of the ycf3 gene contained one insertion, i.e., T; and that of the rps16 gene in the other samples, except those from the population of C. gauchowensis from Xuwen (HD13) and the undetermined species from Hainan Province (HD10-HD12), contained two deletions, i.e., CTTTTTC, and one InDel, i.e., TTTTCC. However, the samples from the population of C. gauchowensis from Xuwen (HD13) and the undetermined species from Hainan Province (HD10-HD12) contained only one deletion, i.e., CTTTTTC. As a result, there were fewer InDels in the introns of different oil-tea camellia species or different populations of the same oil-tea camellia species, but there were some differences in the numbers and kinds of InDels in the introns among the different species and among the populations of the same species.

Upstream, Downstream and Intergenic Regions
The InDels were mainly distributed in the upstream, downstream and intergenic regions of some genes in all samples, and the sizes of the InDels of all samples ranged from 1 to 14 bps. Sizes from 1 to 7 were common; sizes of 11, 12 or 14 bps were uncommon; and sizes of 9, 10 and 13 bps were rare. Importantly, there were two insertions (AACTTTAAATTGAA and AATTGAAAACTTTC) made up of 14 bases in the downstream sequence of the gene matk and one insertion (ATTCCAGTAAAATG) made up of 14 bases in the overlapping regions upstream of the gene petA and downstream of the gene cemA. There was one deletion (TTTATTCAATCA) made up of 12 bps downstream of the gene rps16, and one insertion (ATATAAATAAA) of 11 bps and one deletion (TATGGTAATCCA) of 12 bps in the gene rpl20. Additionally, the intergenic spaces of the rpl20 gene of all samples contained many insertions and deletions of different sizes. Although the genes, the functional regions of the genes in which the InDels were distributed and the number of InDels were not consistent among all samples, there were differences among species or populations. The similarities of the InDels in the samples from the three populations of C. gauchowensis (HD01, HD02 and HD13), C. vietnamensis (HD09) and the undetermined species from Hainan Province (HD10-HD12) were the highest, and the similarities of the samples mentioned above and C. osmantha (HD08) were also high. Table 6 shows only the genes, the regions of the genes containing InDels and the numbers of InDels; there were also differences in the sites, type of mutation (insertion or deletion), size and base composition, among other features. For example, there was one deletion (AAATAGAAATGAAATTTTTTT TATTTTATTAATAA) of 35 bps upstream of gene rps16 in the sample of C. meiocarpa (HD05) and one insertion (TCTAAATAGAATTAGTATT) of 19 bps in the overlapping region upstream of the gene ycf3 and downstream of the gene rps4 in the sample of C. polyodonta (HD04). Only one of all samples contained both InDels, so both might be specific to the two species.

Example Analysis of InDels
The results placing InDels in the intergenic space of the rpl20 gene are shown as an example in Table 7. All samples contained the deletion TATAATA composed of 7 bp. The samples from the populations of C. gauchowensis from Gaozhou (HD01) and Xuwen (HD13), the undetermined species from Hainan Province (HD10-HD12) and C. vietnamensis (HD09) had the same InDels, and the AATAT and T deletions were the most common InDels among them; yet, the samples from C. gauchowensis from Luchuan (HD02) and C. osmantha (HD08) had more InDels than the four mentioned above, such as the insertion of AT and the deletion of TCATGAT in HD02 and the two insertions of T and the deletion of TATGGTAATCCA in HD08. Moreover, the additional InDels in HD02 and HD08 distinguished the two; the additional InDels of HD02 could separate HD02 from the populations of C. gauchowensis, and the additional InDels of HD08 could separate HD08 from C. gauchowensis and C. vietnamensis. However, each of the samples from HD02 to HD08 had a unique combination of the different InDels, such as the insertion of AT and the deletion of TCATGAT in HD02; the insertion of A and the deletion of A in HD03; the insertion of AATAG and A and the deletions of CTTAC and TCTTTT in HD05; the insertions of AT and T and the deletion of TATGGTAATCCA in HD06; and the two insertions of T and the deletion of TATGGTAATCCA in HD08. In summary, the insertion of A in HD03 and the insertion of AATAG and the deletions of CTTAC and TCTTTT in HD05, were the most important because they were the unique InDels of each sample, and the deletion of TATGGTAATCCA in HD06 and HD08 was more important because it did not exist in the other samples and could separate HD06 and HD08 from each other and the other samples when used in combination with the other unique InDels. Additionally, the deletion of AATAT and the two deletions of T were more important InDels because they could be used to distinguish C. gaochowensis, C. vietnamensis, C. semiserrata and C. osmantha from the other species.  The other InDels in the cpDNAs of all samples are not introduced individually here, but the characteristics of the differences in the InDels could be summarised based on results such as those presented above. The cpInDels were mainly distributed in the non-CDSs of cpDNAs, and the genes, functional regions, number, sites, type (insertion or deletion), size and base construction of the cpDNA InDels changed depending on the oil-tea camellia species or population. In general, the cpInDels of different samples shared some common characteristics but they still exhibited polymorphism.

Phylogenetic Inference
H. laotica was taken as the outgroup, and cpDNAs of all samples and seven other Camellia species were applied for phylogenetic reconstruction by maximum likelihood (ML) based on the CDSs. The results are shown in Figure 2. It divided all samples and the sequences from the NCBI into 2 clades; one included 2 subclades, and the other included 5 subclades. The subclade of C. granthamiana included samples of C. vietnamensis (HD09), C. gauchowensis (HD01, HD02 and HD13), the undetermined species from Hainan Province (HD10-HD12), C. osmantha (HD08) and C. granthamiana. The subclade of C. azalea included the samples of C. semiserrata (HD06) and C. azalea. The sample of C. gigantocarpa (HD03) formed an independent subclade. The subclade of C. japonica or C. chekiangoleosa included the samples of C. polyodonta (HD04), C. japonica and C. chekiangoleosa. The subclade of C. oleifera included the samples of C. oleifera (HD07) and C. japonica (the serial number from the NCBI differed from the one in the above subclade). The subclade of C. sasanque included the samples of C. sasanque and C. meiocarpa (HD05). C. crapaelliana formed an independent subclade. The samples of C. gaochauensis and C. vietnamensis are notable in that they showed the cluster nodes of the samples from the population of C. gauchowensis from Gaozhou (HD01) and C. vietnamensis (HD09) in the outermost position. Then, the cluster nodes of these two and the samples of the undetermined species from Hainan Province (HD10-HD12) or the population of C. gauchowensis from Xuwen (HD13) were located in the second outermost position, with the next node associated with the samples from the population of C. gauchowensis from Luchuan (HD02) and C. osmantha (HD08). Therefore, the phylogenetic relationships among C. vietnamensis, C. gaochauensis and the undetermined species from Hainan Province were closer than that between the populations of C. gauchowensis from Gaozhou and Luchuan, meaning that C. vietnamensis, C. gauchowen-sis and the undetermined species from Hainan Province could be merged into the same species, with C. osmantha very closely associated with them. Through further analysis involving combining the above results, the different SNPs from the different samples were used as molecular markers for the identification of species, populations and phylogenetic relationships based on CDSs because there were more SNPs and almost no InDels in exons of cpDNAs.
gauchowensis from Gaozhou (HD01) and C. vietnamensis (HD09) in the outermost position. Then, the cluster nodes of these two and the samples of the undetermined species from Hainan Province (HD10-HD12) or the population of C. gauchowensis from Xuwen (HD13) were located in the second outermost position, with the next node associated with the samples from the population of C. gauchowensis from Luchuan (HD02) and C. osmantha (HD08). Therefore, the phylogenetic relationships among C. vietnamensis, C. gaochauensis and the undetermined species from Hainan Province were closer than that between the populations of C. gauchowensis from Gaozhou and Luchuan, meaning that C. vietnamensis, C. gauchowensis and the undetermined species from Hainan Province could be merged into the same species, with C. osmantha very closely associated with them. Through further analysis involving combining the above results, the different SNPs from the different samples were used as molecular markers for the identification of species, populations and phylogenetic relationships based on CDSs because there were more SNPs and almost no InDels in exons of cpDNAs.  H. laotica was taken as the outgroup, and the cpDNAs of all samples and the seven other Camellia species were applied for phylogenetic reconstruction via ML methods based on the non-CDSs. The results are shown in Figure 3. The results were similar to those based on CDSs, and generally showed the same phylogenetic relationships. The difference from the phylogenetic trees based on CDSs was that the sample of C. gigantocarpa (HD03) was merged with the subclade of C. crapaelliana rather than an independent subclade, so the second clade included 4 subclades instead of 5 clades, making the phylogenetic relationship between the sample of C. gigantocarpa and the sequence of C. crapaelliana closer. In regard to the subclade of C. granthamiana, the phylogenetic trees were extremely similar to those based on CDSs, proving that C. vietnamensis, C. gaochauensis and the undetermined species from Hainan Province belonged to the same species and that C. osmantha was closer than other species to them in terms of genetics. Through further analysis involving combining the above results, the SNPs and InDels from the different samples could be used as molecular markers for the identification of species, populations and phylogenetic relationships based on non-CDSs because there were some SNPs and InDels in the intergenic, upstream, downstream and intron regions of cpDNAs, and these SNPs and InDels were polymorphic among species and populations. determined species from Hainan Province belonged to the same species and that C. osmantha was closer than other species to them in terms of genetics. Through further analysis involving combining the above results, the SNPs and InDels from the different samples could be used as molecular markers for the identification of species, populations and phylogenetic relationships based on non-CDSs because there were some SNPs and InDels in the intergenic, upstream, downstream and intron regions of cpDNAs, and these SNPs and InDels were polymorphic among species and populations.

Primary Analysis of the Genetic Mechanism Underlying the Phylogenetic Relationships within the Camellia Genus
As shown in this paper, some exons in the cpDNA of all samples contained SNPs, and the SNPs appeared to be polymorphic among the different samples. There were inconsistent synonymous mutations, missense mutations and stop-gain mutations among the different samples, and a few samples, such as that of C. semiserrata (HD06), the unde-

Primary Analysis of the Genetic Mechanism Underlying the Phylogenetic Relationships within the Camellia Genus
As shown in this paper, some exons in the cpDNA of all samples contained SNPs, and the SNPs appeared to be polymorphic among the different samples. There were inconsistent synonymous mutations, missense mutations and stop-gain mutations among the different samples, and a few samples, such as that of C. semiserrata (HD06), the undetermined species from Hainan Province (HD10-HD12) and the population of C. gauchowensis (HD13), even contained one stop-loss mutation. Because the SNPs in the exons or the regions for gene expression regulation of all samples could lead to variation in the expression of some traits, the phenotypes varied among the different populations of C. gauchowensis and the different oil-tea camellia species. This phenomenon has been reported for nuclear genomes and chloroplast genomes. Synonymous codon usage bias results from synonymous SNPs [15], and the phenotype varies because of differences in gene expression depending on usage frequency and expression efficiency of the different synonymous codons [16]. The Brassica pekinensis mutation cer1, associated with the absence of wax, is a transition SNP of CT in the fourth exon of the BraA09g066480.3C gene and it leads to a missense mutation. Cabbage without wax resulted, so the phenotype changed greatly [17]. Two SNPs in the granulebound starch synthase I gene of Oryza sativa resulted in great changes in the structure and the function of the protein, causing the rice quality to change significantly [18]. SNPs in the tetrahydrocannabinolic acid (THCA) synthase gene could influence the strength of toxicity of Cannabis sativa, which allowed hemp products to be divided into those with strong toxicity and weak toxicity based on the difference in SNPs and to be distinguished from other plants' products [19]. In sum, cpSNPs had certain effects on the evolution of the different populations of C. gaochauensis and the different oil-tea camellia species.
According to the results in this paper, all samples, except for the sample of C. gigantocarpa (HD03), had no InDels in the exons of the cpDNAs, and C. gigantocarpa (HD03) had one deletion of 6 bp without a frameshift mutation. This InDel led to the loss of the amino acid and the gain of a termination codon, so it might have significant effects on the phenotype, making C. gigantocarpa different from the other plants of the Camellia genus. This result indicated that the InDels in the exons of cpDNA were responsible for variation among the populations and species if they led to missense mutations or nonsense mutations. However, cases of InDels in exons contributing to the phylogenetic relationships of organisms have not been reported, possibly because InDels cause frameshift mutations and result in serious harm and even mortality. The results in this paper showed that the InDels of the different samples were distributed in the regions of gene expression regulation, including the upstream, downstream and intergenic regions, even in the introns. Given that the mutation in the upstream promoter of the granule-bound starch synthase I gene of Solanum tuberosum could influence the expression of the gene [20], the InDels in the non-CDSs of cpDNA uncovered in this paper may also influence the phenotype by affecting gene expression, causing them to have certain effects on the evolution of Camellia plants.

Application Value Analysis of SNPs and InDels in the cpDNAs of Oil-Tea Camellia
The results in this paper reveal the following main points. The SNPs and InDels of cpDNAs were mainly distributed in the intergenic regions and the sequences upstream and downstream of genes and minorly in the introns, and the SNPs were still mainly distributed in the exons. Meanwhile, the CDSs of the samples contained different SNPs, and the non-CDSs of the samples contained different SNPs and InDels, so the cpSNPs and cpInDels could be used for the identification of oil-tea camellia species and their phylogenetic relationships, and they could also be developed into molecular markers.
The phylogenetic relationships of some plants, including 45 species of the Taxodiaceae family [11], 20 species of 20 genera of the Schisandraceae family [21], 63 species of the Populus genus [22] and 42 species of the Camellia genus [23], have been researched, and hypothetical results have been obtained. The phylogenetic trees of the different oil-tea camellia samples were constructed based on the CDSs and non-CDSs of the chloroplast genome in this paper, and the different species of oil-tea camellia and different populations of C. gaochauensis were clearly distinguished from each other. Furthermore, the phylogenetic relationships of all samples were illustrated, and the phylogenetic trees were consistent with those previously constructed based on full cpDNAs by our research group [5]. The following main conclusions were reached once again. C. gauchowensis and C. vietnamensis were merged into the same species, and the undetermined species from Hainan Province was C. vietnamensis. Additionally, verifying that C. osmantha is an independent species will require additional genetic evidence [24,25]. Moreover, because the samples of species from the same sections of the Camellia genus did not belong to the same subclades, divisions of oil-tea camellia species into different sections based on morphological characteristics might be unreasonable [26]. All these results proved that the cpSNPs and cpInDels could be used for the identification of species, populations and phylogenetic relationships in this genus.
Studies analysing the phylogenetic relationships among species or cultivars, constructing genetic maps and identifying the types of germplasm based on SNP or InDel molecular markers from the full chloroplast genomes or nuclear genomes of other plants, such as avocado (Persea americana), apple (Malus pumila), grape (Vitis vinifera), peach (Prunus persica) and yellow lupin (Lupinus luteus), have had desirable outcomes [27][28][29][30][31]. Therefore, the cpSNPs and cpInDels screened in this paper could be developed into molecular markers for the identification of species and populations, the analysis of genetic diversity, the construction of matrilineages and for research on phylogenetic relationships.

Conclusions
There were SNPs and InDels in the cpDNAs of all samples. They showed a preference for transition and deletion mutations, respectively. The cpSNPs in the CDSs caused synony-mous mutations, missense mutations and gain or loss of the termination codon, and the cpSNPs and cpInDels in the non-CDSs caused variations in gene expression. Both cause the phenotypes of different oil-tea camellia species or different populations of C. gaochauensis to differ through mutations in cpDNA genes, so both have certain effects on the evolution of oil-tea camellia species or populations. The cpSNPs and cpInDels were unique to species, and can be used for research on the detection and identification of genetic diversity, species and phylogenetic relationships. These differences can also be developed into molecular markers. The phylogenetic trees based on the CDSs and non-CDSs of cpDNAs indicate again that the undetermined species from Hainan Province is C. vietnamensis, and that C. gauchowenesis and C. vietnamensis must be merged into the same species. Additionally, the genetic relationship among C. osmantha, C. gauchowenesis and C. vietnamensis is much closer than the others, and determining whether C. osmantha is an independent species needs more genetic evidence. Meanwhile, the sectional division of the Camellia genus based on morphological characteristics may need to be readjusted on the basis of differences in cpDNAs.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy.

Conflicts of Interest:
The authors declare no conflict of interest.