Phylogenetic Analysis and Substitution Rate Estimation of Colonial Volvocine Algae Based on Mitochondrial Genomes

We sequenced the mitochondrial genome of six colonial volvocine algae, namely: Pandorina morum, Pandorina colemaniae, Volvulina compacta, Colemanosphaera angeleri, Colemanosphaera charkowiensi, and Yamagishiella unicocca. Previous studies have typically reconstructed the phylogenetic relationship between colonial volvocine algae based on chloroplast or nuclear genes. Here, we explore the validity of phylogenetic analysis based on mitochondrial protein-coding genes. We found phylogenetic incongruence of the genera Yamagishiella and Colemanosphaera. In Yamagishiella, the stochastic error and linkage group formed by the mitochondrial protein-coding genes prevent phylogenetic analyses from reflecting the true relationship. In Colemanosphaera, a different reconstruction approach revealed a different phylogenetic relationship. This incongruence may be because of the influence of biological factors, such as incomplete lineage sorting or horizontal gene transfer. We also analyzed the substitution rates in the mitochondrial and chloroplast genomes between colonial volvocine algae. Our results showed that all volvocine species showed significantly higher substitution rates for the mitochondrial genome compared with the chloroplast genome. The nonsynonymous substitution (dN)/synonymous substitution (dS) ratio is similar in the genomes of both organelles in most volvocine species, suggesting that the two counterparts are under a similar selection pressure. We also identified a few chloroplast protein-coding genes that showed high dN/dS ratios in some species, resulting in a significant dN/dS ratio difference between the mitochondrial and chloroplast genomes.


Introduction
With the application of next-generation sequencing technology, an increasing number of mitochondrial genomes (mtDNA) have been sequenced in recent years, but only a small number of mtDNA genomes are available in colonial volvocine algae (Chlorophyta, Chlamydomonadales). For colonial volvocine algae, the first mtDNA genome sequence was reported in 1989. Moore and Coleman [1] constructed a physical restriction map of mtDNA from the species Pandorina morum. This species has a linear 20 kb genome, but the sequence of this mtDNA is not available online. In 2010, the second mtDNA of colonial volvocine algae was sequenced. Smith and Lee [2] reported a circular mtDNA genome of Volvox carteri, and proposed a mutational-hazard hypothesis. In 2013, Hamaji et al. [3] presented the complete mtDNA of the species Gonium pectoral. They, compared the circular mtDNA of Gonium pectorale with Volvox carteri and other unicellular species. Their comparison raised questions about the origin of linear mtDNA in Volvocales.
Meanwhile, Smith et al. [4] sequenced the mtDNA of Pleodorina starrii and explored the relationship between organelle genome complexity and organism size. The mtDNA genome of Tetrabaena socialis was the next to be sequenced. Featherston et al. [5] showed a circular-mapping mtDNA architecture at the origin of the colonial volvocine algae. These different mtDNA architectures prompted the community's interest in the origin and evolution of mtDNA conformation in colonial volvocine algae. To answer this question, Hamaji et al. [6] sequenced the mtDNA of Yamagishiella unicocca and Eudorina sp. Analysing all of the available mtDNA of colonial volvocine algae and several unicellular species at that time, they proposed at least three separate shifts in mtDNA architecture in the Reinhardtinia clade. Finally, in 2019, we presented two mtDNA genomes in the genus Eudorina [7]. We analysed the mtDNA organization and offered information to discover the composition of mtDNA in colonial volvocine algae. To the best of our knowledge, these mtDNA genomes are only available in colonial volvocine algae.
We have presented several chloroplast genomes (ptDNA) of colonial volvocine algae in our previous studies [7,25]. Together with the mtDNA sequenced in this study, we are able to compare the substitution rates between their mtDNA and ptDNA. mtDNA and ptDNA are very similar in many respects. Both genomes are housed in energy-producing organelles, are highly reduced, and are dependent on nuclear-encoded and organelle-targeted proteins [26]. Although mtDNA and ptDNA have many similar characteristics, they showed different substitution rates in different groups of species, such as seed plants and algae [27][28][29][30]. The different organelle substitution rates may be associated with the nature and accuracy of organelle-targeted DNA maintenance machineries [31], and the number of genes that an organelle possesses [26]. In this study, we compared the substitution rates between mtDNA and ptDNA among colonial volvocine algae, so as to offer more information about the substitution rates in organelle genomes.

Sequencing, Assembly, and Annotation
All of the strains were sequenced and assembled as described previously [7,25]. mtDNA was initially annotated using the MITOchondrial genome annotation Server (MITOS) [32]. Protein-coding and ribosomal RNA (rRNA) genes were further updated using Basic Local Alignment Search Tool (BLAST+) [33], with genes from the available colonial volvocine mtDNA. The transfer RNA (tRNA) genes were redetected using tRNAscan-SE [34]. All of the mitochondrial genome sequences were submitted to GenBank; the accession numbers are listed in Table 1. The synteny comparison was visualized using progressiveMauve [35].

Phylogenetic Analysis
The nucleotide sequences of the protein-coding genes were used for the phylogenetic analysis. Each gene was aligned using MAFFT v7.394 (a multiple sequence alignment program) [36], and the alignments of all of the genes were converted into a codon alignment using TranslatorX [37]. The ambiguously aligned regions were excluded using trimAl v1.2 [38], with the option -gt = 1 (gapthreshold). The third codon positions of each gene were removed because of substitution saturation.
We used supermatrix and coalescent-based analyses to construct the phylogenetic tree. For the supermatrix analysis, Phyutility [39] was used to concatenate all of the genes. The first and second codons of each gene were considered as different partitions. The selections of the evolutionary model and the partition of each gene were performed using PartitionFinder 2 [40]. The phylogenies were inferred using the maximum likelihood (ML) and Bayesian inference (BI) methods. The ML analyses were performed using Randomized Axelerated Maximum Likelihood program (RAxML) v8.2.10 [41]. A rapid bootstrap analysis with 1000 replicates of the dataset for ML was performed to estimate the statistical reliability. A Bayesian analysis was performed with MrBayes v3.2.6 [42]. A Markov chain Monte Carlo analysis was performed with four Markov chains (three heated and one cold) for 1,000,000 generations, with trees sampled every 1000 generations. Each time the diagnostics were calculated, a fixed number of samples (burnin = 1000) were discarded from the beginning of the chain. A stationary distribution was assumed when the average standard deviation of the split frequencies was less than 0.01. For the coalescent-based analyses, an ML analysis of each gene was performed in RAxML v8.2.10 [41] by conducting 1000 rapid bootstrap replicates with the General Time Reversible model with estimate of proportion of invariable sites and rate heterogeneity (GTRGAMMAI model). Then, the resulting best trees were used to infer the coalescence-based species tree phylogeny with Accurate Species TRee ALgorithm program (ASTRAL) v5.6.3 [43].
To examine the systematic errors, we used PAUP* version 4.0a166 for the minimum evolution and LogDet distances from the data, and coded both as standard nucleotides (NT-coding) and as purines and pyrimidines (A&G→R, C&T→Y, and RY-coding) [44]. We used MrBayes v3.2.6 as the covarion model for the data, as the other options were the same as the supermatrix approach.

Substitution Rate Estimation
After each gene was aligned and trimmed as performed in the phylogenetic analysis, synonymous and nonsynonymous substitutions were measured using the CODEML program of the phylogenetic analysis by maximum likelihood (PAML) v4.9 [45]. The ML model was used with the following options: runmode = −2 and CodonFreq = 2. Genes with synonymous substitution (dS) values greater than five were discarded from further analysis. Synonymous and nonsynonymous substitution values were averaged for all of the pairwise comparisons of each gene. Statistical analyses were implemented in R (http://www.R-project.org).

Mitochondrial Genome of Colonial Volvocine Algae
The general features of all of the available mtDNA to date among the colonial volvocine algae are listed in Table 1. The size of mtDNA ranged from 15 to 35 kb. Volvox carteri is the most complex organism and possesses the largest mtDNA size. However, in all colonial species, there was no correlation between organism complexity and mtDNA size. The GC content in all of the mtDNA in the colonial volvocine species was less than 45%. The GC content ranged from 33% to 45%. With the exception of Pandorina morum, sequenced in this study, which has five scaffolds, the other species have complete mtDNA or only one scaffold. The number of protein coding genes ranges from 7 to 12. Only seven genes were found in all of the species, namely: cob, cox1, nad1, nad2, nad4, nad5, and nad6. These genes were used for our downstream analysis. The number of rRNA genes ranged from 3 to 12. The rRNA genes were composed of two types of genes, namely: rrnL and rrnS. Normally, rrnL contains eight fragments, and rrnS contains four fragments. We noticed that Volvox carteri has 13 rRNA genes, because it has two copies of rrnL fragment 8. For the tRNA genes, most species have three tRNA genes, namely: trnM, trnW, and trnQ. Only Gonium pectorale has one additional copy of trnM. Volvulina compacta has one extra trnE gene. We only annotated one tRNA gene for Pandorina morum. The mtDNA synteny analysis is shown in Figure 1. All of the mtDNA of the colonial volvocine algae currently available were used for the analysis (except Pandorina morum, which has five scaffolds), with 13 mtDNA in total (12 species from nine genera). For Tetrabaena socialis (belonging to the family Tetrabaenaceae), Gonium pectorale (belonging to the family Goniaceae), and Pandorina colemaniae (belonging to the family Volvocaceae), considerable amounts of rearrangement and inversion have occurred. These three species belong to different families; this result is consistent with our previous study [7]. For the different genera in the family Volvocaceae (except for Tetrabaena socialis and Gonium pectorale, all of the other species we used belong to the family Volvocaceae), most of the genera showed a highly conserved synteny relationship, with a few exceptions (among Pandorina colemaniae, Volvulina compacta, and Colemanosphaera charkowiensis; between Pleodorina starrii and Volvox carteri). Species belonging to the same genera or the same species did not show any rearrangement or reversal.
Overall, the mtDNA structure in colonial volvocine algae showed a high variation among the different families, little variation among different genera from the same family, and it is highly conserved for species in the same genera or for different strains belonging to the same species.

Phylogenetic Analysis
For the supermatrix analysis, the phylogenetic trees inferred from the Bayesian and maximum likelihood analyses are the same. The ML tree was used to represent the results (Figure 2A). For the coalescent-based analysis, the resulting phylogenetic tree is presented in Figure 2B. Overall, the mtDNA structure in colonial volvocine algae showed a high variation among the different families, little variation among different genera from the same family, and it is highly conserved for species in the same genera or for different strains belonging to the same species.

Phylogenetic Analysis
For the supermatrix analysis, the phylogenetic trees inferred from the Bayesian and maximum likelihood analyses are the same. The ML tree was used to represent the results (Figure 2A). For the coalescent-based analysis, the resulting phylogenetic tree is presented in Figure 2B.

Phylogenetic Analysis
For the supermatrix analysis, the phylogenetic trees inferred from the Bayesian and maximum likelihood analyses are the same. The ML tree was used to represent the results (Figure 2A). For the coalescent-based analysis, the resulting phylogenetic tree is presented in Figure 2B.  When we compared the supermatrix and coalescent-based analysis results, the two methods showed different topologies among species within the clade Eudorina/Pleodorina/Volvox. Notably, the phylogenetic position of the genus Colemanosphaera, based on the coalescent-based analysis, is consistent with Nozaki et al. [23], but this phylogenetic relationship is not supported by our supermatrix analysis. Our results showed that Yamagishiella unicocca formed a clade. This clade is sister to the clade formed by all of the other Volvocaceae species with high support values. This phylogenetic relationship is different from all previous studies [20][21][22][23]. In phylogenomics, systematic errors become stronger when multiple genes are combined into a supermatrix, thus resulting in an incorrect phylogenetic relationship with a high support value [47]. A systematic error is mainly due to a compositional signal, rate signal, and heterotachous signal [48]. The LogDet distance takes irregular A, C, G, and T compositions into consideration [49]. The RY-coding strategy can discard fast-evolving transitions and make phylogenetic reconstruction less susceptible to multiple hits between lineages [44,50]. The covarion model allows for the rate at a site to change over its evolutionary history [51]. Here, we used LogDet distances, the RY-coding strategy, and covarion model to exam the systematic error. The results are shown in Figure 3. For the genera Colemanosphaera and Yamagishiella, all of our methods showed the same phylogenetic relationship with the supermatrix approach. Thus, the systematic error was not severe enough to affect the phylogenetic results of these two genera. However, the three methods showed different phylogenetic relationships in clade Eudorina/Pleodorina/Volvox and clade Pandorina/Volvulina. Thus, our phylogenetic analyses were generally affected by the systematic errors.
showed different topologies among species within the clade Eudorina/Pleodorina/Volvox. Notably, the phylogenetic position of the genus Colemanosphaera, based on the coalescent-based analysis, is consistent with Nozaki et al. [23], but this phylogenetic relationship is not supported by our supermatrix analysis. Our results showed that Yamagishiella unicocca formed a clade. This clade is sister to the clade formed by all of the other Volvocaceae species with high support values. This phylogenetic relationship is different from all previous studies [20][21][22][23]. In phylogenomics, systematic errors become stronger when multiple genes are combined into a supermatrix, thus resulting in an incorrect phylogenetic relationship with a high support value [47]. A systematic error is mainly due to a compositional signal, rate signal, and heterotachous signal [48]. The LogDet distance takes irregular A, C, G, and T compositions into consideration [49]. The RY-coding strategy can discard fast-evolving transitions and make phylogenetic reconstruction less susceptible to multiple hits between lineages [44,50]. The covarion model allows for the rate at a site to change over its evolutionary history [51]. Here, we used LogDet distances, the RY-coding strategy, and covarion model to exam the systematic error. The results are shown in Figure 3. For the genera Colemanosphaera and Yamagishiella, all of our methods showed the same phylogenetic relationship with the supermatrix approach. Thus, the systematic error was not severe enough to affect the phylogenetic results of these two genera. However, the three methods showed different phylogenetic relationships in clade Eudorina/Pleodorina/Volvox and clade Pandorina/Volvulina. Thus, our phylogenetic analyses were generally affected by the systematic errors.

Substitution Rate Estimation
In this study, we used the ML method implemented in PAML to calculate the synonymous and nonsynonymous substitution rate. This method is the most accurate method currently available for measuring substitution rates [45,52]. Substantial differences in synonymous substitutions were not noted among colonial species ( Table 2). The average dS values for all of the species varied from 1.97 to 2.85 for mitochondrial genes, and from 0.78 to 1.31 for chloroplast genes. All of the species showed higher average dS values in the mtDNA. Among them, Pandorina morum showed the biggest difference, with the dS values of the mtDNA genes being 3.5-fold times higher than for the ptDNA genes. The nonsynonymous substitutions for different species within the same genome were very similar ( Table 2). The average nonsynonymous substitution (dN) values for all of the species varied from 0.04 to 0.05 for the mitochondrial genes, and from 0.02 to 0.03 for the chloroplast genes. When we compared the dN values between the mitochondrial genes and chloroplast genes, we also found that all of the species showed higher average dN values in mtDNA. Specifically, the values for the mtDNA genes were 1.33-to 2.67-fold higher than the ptDNA genes. A different pattern was observed for the dN/dS ratio (Table 2). For instance, only three species (Chlamydomonas reinhardtii, Gonium

Substitution Rate Estimation
In this study, we used the ML method implemented in PAML to calculate the synonymous and nonsynonymous substitution rate. This method is the most accurate method currently available for measuring substitution rates [45,52]. Substantial differences in synonymous substitutions were not noted among colonial species ( Table 2). The average dS values for all of the species varied from 1.97 to 2.85 for mitochondrial genes, and from 0.78 to 1.31 for chloroplast genes. All of the species showed higher average dS values in the mtDNA. Among them, Pandorina morum showed the biggest difference, with the dS values of the mtDNA genes being 3.5-fold times higher than for the ptDNA genes. The nonsynonymous substitutions for different species within the same genome were very similar ( Table 2). The average nonsynonymous substitution (dN) values for all of the species varied from 0.04 to 0.05 for the mitochondrial genes, and from 0.02 to 0.03 for the chloroplast genes. When we compared the dN values between the mitochondrial genes and chloroplast genes, we also found that all of the species showed higher average dN values in mtDNA. Specifically, the values for the mtDNA genes were 1.33-to 2.67-fold higher than the ptDNA genes. A different pattern was observed for the dN/dS ratio (Table 2). For instance, only three species (Chlamydomonas reinhardtii, Gonium pectorale, and Pandorina colemaniae) showed higher dN/dS ratios in mtDNA. All of the other species had lower dN/dS ratios in mtDNA. However, the dN/dS ratios between the different genomes were very similar, and did not vary by more than two-fold.
We compared the substitution rates and dN/dS ratios between mtDNA and ptDNA in all of the species using the Wilcoxon rank sum test (Figure 4). All of the species showed significantly higher dN and dS values for mtDNA compared with ptDNA (p < 0.001). Regarding the dN/dS ratio, only five species (Chlamydomonas reinhardtii, Gonium pectorale, Pandorina colemaniae, Eudorina elegans, and Volvox carteri) showed significant differences between mtDNA and ptDNA.

Phylogenetic Analysis
In this study, we found phylogenetic incongruence between supermatrix and coalescent-based analyses. We also found phylogenetic incongruence between our results and previous studies. Thus, a portion of our phylogenetic results could not reflect the true evolutionary relationship between colonial volvocine algae. There are three main reasons for phylogenetic incongruence, namely: systematic error, stochastic error, and biological factors.
The reconstruction method cannot completely account for the properties of the data that would lead to systematic errors [53]. However, the LogDet distances, RY-coding strategy, and covarion model showed that a systematic error was not severe enough to affect the phylogenetic results of the genus Yamagishiella in our study. The stochastic error in the phylogenetic construction was caused by the limited length of the sequence used in the inference [53]. As we only discussed the validity of phylogenetic construction based on mtDNA protein coding genes in this study, and the total length of our concatenated dataset was only 5382 nt, our results may thus be affected by a stochastic error. Moreover, phylogenetic analysis based on unlinked loci is an appropriate way to reduce phylogenetic incongruence [54], but all of the genes we used are located in the mitochondrial genome, which does not combine with other genomes thus forming a single linkage group. Thus, the phylogenetic results of our study should only be considered merely as gene trees [55]. Here, we suspect that the stochastic error and linkage group we used may be partly responsible for the phylogenetic incongruence of the genus Yamagishiella between our study and previous studies.
The supermatrix and coalescent-based analyses showed phylogenetic incongruence in the genus Colemanosphaera. Our coalescent-based analysis for the genus Colemanosphaera is consistent with Nozaki et al. [23], but this phylogenetic relationship is not supported by the supermatrix approach. This incongruence may result from the different reconstruction approach. The supermatrix approach is the most accurate species tree reconstruction approach when incomplete lineage sorting (ILS) and horizontal gene transfer (HGT) are low [56][57][58], but this approach is less robust than coalescent-based approaches under high levels of ILS and HGT [59][60][61]. Here, we suspect that the mtDNA protein-coding genes may not reflect the true species tree as a result of biological factors (i.e., ILS and HGT).
In this study, we also found phylogenetic incongruence in the clades Eudorina/Pleodorina/Volvox and Pandorina/Volvulina. The genera in these two clades are all polyphyletic [62,63], and we only used a limited number of species. Thus, their incongruence will be assessed in future studies with more related species.

Substitution Rate Estimation
Drouin et al. [28] investigated the substitution rates in seed plants, and found that all seed plants showed higher substitution rates for ptDNA compared with mtDNA. However, Smith [26] analyzed the substitution rates in diverse lineages, and suggested that the opposite is true in algae. The results of our study are consistent with the results of Smith [26]. We also found that mtDNA showed a higher substitution rate than ptDNA. Two explanations were put forward to illustrate this phenomenon, as follows: the fidelity and efficacy of mitochondrial maintenance machineries are more variable than those of ptDNA, and fewer mtDNA genes allow for larger fluctuations in the mutation rate [26].
We also analyzed the dN/dS ratio. Most species showed no significant difference between mtDNA and ptDNA, indicating that protein-coding genes in the mitochondrial and chloroplast genomes are under similar selection pressure. Five species showed significant differences (p < 0.05) between mtDNA and ptDNA, but these species have similar average values, median values, and quartiles for the two counterparts (Table 2 and Figure 4). We noticed that ptDNA has more outliers (higher than upper quartiles) than mtDNA ( Figure 4). In these five species, we suspect that the limited number of ptDNA genes with high dN/dS ratios may be the main factor causing a significant difference between mtDNA and ptDNA.

Conclusions
In this study, we present six newly sequenced mitochondrial genomes of colonial volvocine algae. Our phylogenetic analysis showed that mitochondrial protein-coding genes are not suitable for reconstructing the phylogenetic relationship among colonial volvocine algae, given that the phylogenetic result may be affected by systematic errors, stochastic errors, and biological factors. We also analyzed the substitution rates of ptDNA and mtDNA among colonial volvocine algae. The results suggest that all species have higher mtDNA substitution rates compared with ptDNA. The dN/dS ratio is similar between ptDNA and mtDNA in most colonial volvocine species, indicating a similar selection pressure in the two counterparts. However, few ptDNA protein-coding genes have high dN/dS ratios, resulting in a significant difference between ptDNA and mtDNA in five colonial volvocine species.