Chloroplast genomes of five Oedogonium species: genome structure, phylogenetic analysis and adaptive evolution

The order Oedogoniales within the single family Oedogoniaceae comprised of three genera, Oedogonium, Oedocladium, and Bulbochaete based on traditional morphological criteria. While several molecular phylogenetic studies have suggested that both Oedogonium and Oedocladium may not be monophyletic, broader taxon sampling and large amounts of molecular data acquisition could help to resolve the phylogeny and evolutionary problems of this order. This study determined five chloroplast (cp) genomes of Oedogonium species and aimed to provide further information on cp genome for a better understanding of the phylogenetic and evolutionary relationships of the order Oedogoniales. The five Oedogonium cp genomes showed typical quadripartite and circular structures, and were relatively conserved in their structure, gene synteny, and inverted repeats boundaries in general, except for small variation in genome sizes, AT contents, introns, and repeats. Phylogenetic analyses based on 54 cp protein-coding genes examined by maximum likelihood and Bayesian analyses using amino acid and nucleotide datasets indicated that both Oedocladium and Oedogonium are polyphyletic groups. A positively selected gene (psbA) was identified in the two Oedocladium species and the terrestrial Oedogonium species, indicating that terrestrial Oedogoniales taxa may have undergone adaptive evolution to adjust to the difference in light intensity between aquatic and terrestrial habitats. Our results enrich the data on cp genomes of the genus Oedogonium. The availability of these cp genomes can help in understanding the cp genome characteristics and resolve phylogenetic and evolutionary relationships of the order Oedogoniales.

Chloroplast (cp) genomes have been found to be ideal for phylogenetic analysis and molecular evolution studies owing to advantages such as low evolution rate and maternal inheritance [21][22][23][24][25], and plastome has been increasingly used for phylogenetic and evolutionary studies of green algae. For example, Claude Lemieux et al. [26] conducted cp phylogenetic analysis based on the cp genes of 61 chlorophytes and revealed that Trebouxiophyceae is not monophyletic. Zhang et al. [27] demonstrated the adaptive mechanism of sea-ice environment by analyzing the molecular evolution of an Antarctic sea ice alga Chlamydomonas sp. based on cp protein-coding genes. However, only four cp genomes of Oedogoniales are currently available in public databases [28,29], restricting the phylogenetic analysis and molecular evolution studies based on cp genomes of this group.
Nucleotide substitution rates are often used as the criterion to reflect selection pressure. While nonsynonymous substitution rates (dN) can cause amino acid change, synonymous substitution rates (dS) do not cause amino acid change. The dN/dS ratio is the measure of natural selection acting on the protein. According to Yang [30], dN/dS < 1 denotes negative purifying selection, dN/dS = 1 signifies neutral evolution, and dN/dS > 1 indicates positive selection [31]. As most of the plastid protein-coding genes undergo negative or purifying selection to maintain their function, they are conserved and have a low dN/dS ratio. However, some genes might undergo positive selection in response to environmental changes, consequently presenting relatively high dN/dS ratio [32][33][34].
In this study, the cp genomes of five Oedogonium species, were sequenced and an in-depth analysis of these genomes, including comparative analysis with previously reported Oedocladium and Oedogonium cp genomes, was performed. Furthermore, phylogenetic analysis and evolutionary study of the order Oedogoniales were conducted based on cp protein-coding genes and a positively selected gene was identified in Oedocladium species. The results of this study could be useful to understand the phylogenetic and evolutionary relationships of Oedogoniales.

Introns content and insertion sites
The introns content and insertion sites of the nine Oedogoniales cp genomes are listed in Table 2 and Supplementary Tables S1 and S2. The nine cp genomes significantly differed with respect to the introns content.
Oe. sp. (strain FACHB-3311) has the maximum introns content with 26 group I introns and 11 group II introns. When compared with the other six Oedogonium cp genomes, multiple intron losses were observed in the cp genome of Oe. crispum (strain FACHB-3310), with four group I introns in trnL(uaa), psbC, atpA, and psbD, respectively, and four group II introns in psbI, petD, psaC, and psaB, respectively. Besides, similar to O. prescottii, Oe. crispum also exhibited introns losses in psbA. Oe. sp.
(strain FACHB-3311) presented two additional group II introns in chlB and chlL, introns were first observed in them. All the nine cp genomes included group I introns in trnL(uaa), which is common across all algal lineages and is considered to originate from the common ancestor of cp [35]. The nine cp genomes showed a certain variation in insertion sites. The common group I introns in trnL(uaa) and group II introns in petB, psaC, and psbI (only strain FACHB-3311 lost the intron in psbI) showed the same insertion sites. With regard to the other genes with introns, the insertion sites in different species showed similarities and variations. For instance, in psbA,

Synteny analysis and average nucleotide identity analysis
ProgressiveMauve was used to analyze the Oedogoniales cp genomes synteny, with Oe. cardiacum used as a reference to compare gene order among the cp genomes ( Fig. 3). More than 19 locally collinear blocks (LCBs) were identified in the cp genomes of the nine species of Oedogoniales, including six taxa from Oedogonium and three taxa from Oedocladium. The nine cp genomes showed high degree of syntenic conservation on the whole, with Oe. capilliforme exhibiting high similarity to Oe. cardiacum, and Oe. dentireticulatum resembling Oe. sp. (strain FACHB-3311). However, some rearrangements and inversions were still observed among certain short LCBs mainly owing to the inversion or loss of introns. The genes order and number were almost identical except for that an inversion between trnE(uuc) and petL with a length of less than 3 kb and including the genes petD and trnR(ucg) was detected in O. carolinianum (MT364369) and O. carolinianum (NC_031510). The average nucleotide identity (ANI) of the nine species of Oedogoniales was calculated using FastANI (Supplementary Fig. S6). Oe. crispum showed high ANI with Oe. dentireticulatum and Oe. sp. (strain FACHB-3311) (90.64 and 90.56%, respectively), Oe. dentireticulatum was similar to Oe. sp. (strain FACHB-3311) with 92.57% ANI, and Oe. capilliforme was similar to Oe. cardiacum with 97.03% ANI.

IR expansion and contraction
The IR boundary regions of the nine species of Oedogoniales were compared as illustrated in Fig. 4. Oe. cardiacum and Oe. capilliforme (strain FACHB-3312) showed larger IRs reaching 35,000 bp, whereas Oe. crispum and O. prescottii exhibited smaller IRs reaching 13,284 and 12,808 bp, respectively. The IRs of all the nine cp genomes contained the same four protein-coding genes, three tRNAs, and three rRNAs. However, in Oe. crispum and Oe. sp. (strain FACHB-3313), an additional Accession number The coding proportion only includes all annotated protein-, rRNA-, and tRNA-coding regions; b Gene and CDS numbers do not include ORF genes; c The plusminus sign means number of genes in plus strain (left side of slash) or minus strain (rightside of slash). d Non-overlapping repeat elements were mapped on each genome with RepeatMasker using as input sequences the repeats ≥30 bp identified with Vmatch trnR(ccg) was observed between psbA and rbcL; Oe. cardiacum included two additional protein-coding genes (int and dpoB) and one tRNA (trnR(ccu)); and the IRa of four cp genomes included parts of the 5′-end of ccsA (390 bp in Oe. cardiacum, Oe. capilliforme, and O. prescottii and 383 bp in Oe. crispum).
The nine Oedogoniales cp genomes showed high conservation at four regional boundaries, with little variation. The LSC/IRb junctions (JLBs) in the cp genomes of Oe. cardiacum, Oe. capilliforme, O. prescottii, and Oe. sp. (strain FACHB-3311) were located in trnR(ucu); as a result, 2 bp of the 3′-end of this gene were a part of the IR region. In Oe. sp. (strain FACHB-3311), the IR region contained 6 bp of the 3′-end of trnR(ucu), and in the other five cp genomes, the LSC/IRb boundaries occurred between trnR(ccu) and psbA. The IRb/SSC boundaries in all the nine cp genomes occurred between trnL(caa) and psaA, and the SSC/IRa junctions were located in rpoA. The IRa/LSC junctions (JSAs) of the two O. carolinianum cp genomes occurred between psbA and ccsA, while those of the other seven genomes were located in ccsA, with 390 bp of the 5′-end of this gene being a part of the IR region in Oe. cardiacum, Oe. capilliforme, and O. prescottii, and 383, 388, 608, and 389 bp of the 5′-end of this gene being a part of the IR region in Oe. crispum, strain FACHB-3313, strain FACHB-3311, and Oe. dentireticulatum, respectively.

Phylogenetic analysis and adaptive evolution analysis
Phylogenetic assays based on 54 cp protein-coding genes were conducted using maximum likelihood (ML) and Bayesian analyses with amino acid and nucleotide datasets, which generated two kinds of phylogenetic trees showing the same results (Figs. 5 and 6). Phylogenetic trees based on amino acid and nucleotide datasets both indicated that the nine species of Oedogoniales clustered into three clades Oe. sp. (MW250873) formed a separate clade with absolute high support value, the two O. carolinianum clustered together and formed another clade, and the other six Oedogoniales formed the third clade. With regard to the third clade, the two datasets showed a little difference in the location of O. prescottii. Based     Fig. S7). The phylogenetic tree showed that species of Bulbochaete was separated with Oedogonium and Oedocladium with absolutely high support value, and the species of Oedocladium formed two branches separated by two species of Oedogonium, the five newly included Oedogonium species separated with each other distributed in the other small clades. All these results indicated that both Oedocladium and Oedogonium are polyphyletic, which is in accordance with that reported in a previous study [36]. Based on the ML method of 54 chloroplast proteincoding genes, the value of dN and dS were compared between terrestrial and aquatic species of Oedogoniales. (Supplementary Table S3). No genes showed significantly different between the two group of algae at the levels of dN and dS. The ML method is a pairwise approach to estimate the dN/dS ratio, a dN/dS ratio may indicate in one or both species, and some specific sites under positive selection may remain undetected [37]. Positive selection analysis was performed based on branch-site model, and the null and alternative models were compared. The null model considered that the foreground branch only has dN/dS = 1, and the alternative model assumed that sites on the foreground branch have dN/dS > 1 (positive selection). When the two Oedocladium species and MW250875 were labelled as the foreground branch, the FDR-adjusted P value of psbA was less than 0.05 (Table 3). Based on Bayes empirical Bayes (BEB) assay, it indicated that psbA may possibly contain sites under positive selection, with 291SER showing posterior probability higher than 99%. However, owing to the lack of related functional sites information on closely related species such as Chlamydomonas reinhardtii and Stigeoclonium helveticum in UniProt, the positively selected sites of psbA require further investigation.

Discussion
In this study, we investigated five Oedogonium isolates from China, of which strains FACHB-3309, FACHB-3310, and FACHB-3312 were identified as Oe. dentireticulatum, Oe. crispum, and Oe. capilliforme, respectively.  Strains FACHB-3311 and FACHB-3313 were considered to belong to the genus Oedogonium owing to their unbranched filaments; however, they could not be identified at species level owing to their lack of entire sexual characters.
Comparative analyses of the nine Oedogoniales cp genomes showed highly conserved structures and gene numbers. The cp genomes of the newly sequenced five Oedogonium species were found to share the same structure as the previously reported Oedogoniales cp genomes, and the structures of the tetrad were not altered, but were different from the other two orders in the OCC clade (the IR is obliterated in the reported cp genomes in Chaetophorales and Floydiella of Chaetopeltidales). It has been indicated IR loss may be a synapomorphy marking the common ancestry of Chaetophorales and Chaetopeltidales [38]. The total length of these cp genomes was observed to vary within a relatively large range, extending from 146,367 bp (Oe. crispum) to 204,438 bp (O. carolinianum), which may be the result of contraction and expansion of IR regions and proportion of non-coding sequences, such as the introns content. Furthermore, the nine cp genomes showed highly conserved protein-coding genes and rRNAs number; however, they presented a slight difference in the tRNAs content. With regard to the introns content, the nine cp genomes exhibited relative variation, and the number of group I introns significantly differed, mainly owing to the diversity in the introns in psbA. In particular, introns (group II) were observed for the first time in chlB and chlL in Oe. dentireticulatum. All the nine cp genomes retained the group I introns in trnL(uaa) and group II introns in petD and psaC, and shared the same insertion sites. With regard to the other genes with introns, the insertion sites of different species showed similarities and variations.
Synteny analyses revealed a relatively high degree of syntenic conservation among the nine cp genomes, and only one inversed segment was detected in O. carolinianum FACHB-2453 and O. carolinianum UTEX LB 1686. The other variations were mainly owing to the introns, and no structural variation was observed in the six Oedogonium species. The results of FastANI also supported the findings of synteny analyses, indicating that Oe. capilliforme had high similarity with Oe. cardiacum, and Oe. dentireticulatum resembled strain FACHB-3311.
IR regions are the most conserved regions in the cp genomes. Frequent expansions and contractions at the junctions of SSC and LSC with IRs illustrate the relationships among the taxa and have been recognized as evolutionary signals [39][40][41][42][43]. The nine species of Oedogoniales examined in the present study showed only a few variations at the junctions. When compared with the two O. carolinianum, O. prescottii showed higher similarities to the five Oedogonium species, and the five Oedogonium species were similar to each other. The IR regions of O. prescottii and Oe. crispum presented a contraction, when compared with those of the other Oedogoniales taxa, and the cp genomes of both O. prescottii and Oe. crispum exhibited the shortest length. Previous studies have indicated that IR expansion and contraction frequently result in variations in genome size, which can be applied to phylogenetics and genome evolution analyses [40,41,44], and gene conversion during speciation is considered to be responsible for small IR expansions or contractions [40,41,[45][46][47].
Phylogenetic studies based on 54 cp protein-coding genes assayed using ML and Bayesian analyses with amino acid and nucleotide datasets and 18S rDNA all showed that Oedocladium and Oedogonium are polyphyletic, which is in accordance with that reported previously. However, the support value based on nucleotide dataset and 18S rDNA was not very high at the basal node, probably owing to the lack of sufficient representative taxa for this group as well as different evolutionary rates of the amino acid sequence and nucleotide data. Previous studies have proposed that larger sample sizes can substantially improve the phylogenetic results [48].
Positively selected genes are known to play a key role in adaptation to different environments and speciation [49][50][51][52][53], and it is necessary to understand the adaptive evolutionary history of Oedocladium species. The results of the present study showed that 291SER of psbA may be under positive selection with posterior probability higher than 99%. The genus Oedocladium (terrestrial) is presumed to have partly originated from Oedogonium species, which grow on moist soil surface and present underground filaments with slightly unbranched rhizoids [9]. The psbA encodes the photosystem II reaction center protein D1, which is one of the two reaction center proteins of photosystem II. Photosystem II is the first link in the chain of photosynthesis, and captures photons and uses the energy to extract electrons from water molecules [54].It has been reported that the genes in the cp genome (including psbA) of Curcuma sp. show adaptive evolution to adapt to the changes in light conditions [55], and that the green alga Chlamydomonas sp. ICE-L underwent adaptive evolution to adapt to extreme polar environment [27]. We speculate that the Oedocladium species and terrestrial Oedogonium species could have partly originated from the aquatic Oedogonium species, and might have undergone adaptive evolution during this process to adapt to the difference in light intensity between aquatic and terrestrial habitats. Nevertheless, more genomic data, especially for terrestrial species, may help to verify these hypotheses and further understand the phylogenetic and evolutionary relationships of the order Oedogoniales.

Conclusion
The present study determined the cp genomes of five Oedogonium speciesand revealed that the overall structure and gene contents of the Oedogoniales cp genomes were relatively conserved, except for some variations in genome sizes, AT contents, introns, and repeats. Phylogenetic analysis based on 54 cp protein-coding genes and 18S rDNA genes all indicated that both Oedogonium and Oedocladium are polyphyletic. The positively selected gene in the two Oedocladium species was identified, and the terrestrial Oedogonium species were speculated to have undergone adaptive evolution to adapt to the difference in light intensity between aquatic and terrestrial habitats. These findings not only strengthen our understanding of Oedogoniales cp genomes, but also help us to comprehend the phylogenetic and evolutionary relationships of the order Oedogoniales.

Methods
Sampling, culture conditions, DNA extraction, and morphological observation The strains described in this study were isolated from water or soil samples, and have been deposited to the  [56]. An Olympus BX53 (Tokyo, Japan) light microscope equipped with an Olympus DP80 digital camera and CellSens standard image analysis software (Tokyo, Japan) were used for morphological examination. The characteristics of the five species were summarized in Table 1.
Library preparation, sequencing, genome assembly, and annotation A NEB Next Ultra DNA Library Prep Kit for Illumina (New England Biolabs, Ipswich, MA, USA) was used for preparing sequencing libraries, which were sequenced on an Illumina NovaSeq 6000 platform by a commercial provider (Novogene, Beijing, China). The methods of genome assembly and annotation have been described elsewhere [36,57]. The raw data were trimmed using SOAPnuke software [58] to remove the low-quality and the adapter sequences (the reads of the five species were with a mean length about 150 bp) and then assembled using SPAdes [59]. The resulting assembly contigs were considered to have originated from the cp genome if the (1) BLAST searches in publicly available cp genomes returned Chlorophyta species with significant e-values (1e-5); (2) GC content of the contigs was less than 45% (the GC content of previously sequenced green algal cp genomes is typically less than 45%); and (3) sequencing depth was more than 100-fold coverage. Subsequently, trimmed clean reads were aligned to the resulting assembly contigs using BWA-MEM [60]. If the reads mapped to two contigs, the order of the contigs was determined and one sequence was produced, which was confirmed by Sanger dideoxy sequencing. The cp genomes were initially annotated using GeSeq [61] with the reported Oedogoniales cp genomes as references. Protein-coding and ribosomal RNA genes were further polished using Blast with genes from the available Oedogoniales cp genomes. The tRNA genes were identified using tRNAscan-SE [62]. BLAST was used to refine the annotation results. Intron boundaries were determined by comparing intron-containing genes with homologs without introns, and intron subgroup affiliation was determined by modelling intron secondary structures [63,64] using RNAweasel tool [65]. Forward and palindromic repeats larger than 30 bp were searched using Vmatch software (http://www.vmatch.de/) with the options -f -p -l -h -allmax and masked in the genome sequence by RepeatMasker (http://repeatmasker.org) running under the NCBI/RMBLAST (2.9.0+) search engine (http://blast.advbiocomp.com). The annotated sequences have been deposited to the NCBI GenBank database under the accession numbers MW250871-MW250875 (corresponding to strains FACHB-3309-FACHB-3313, respectively). Genome maps were generated using OrganellarGenomeDRAW [66].

Evolutionary analysis
The CODEML program of PAML v4.9 [30] with the ML model (runmode = − 2, CodonFreq = 2) was used to measure the values of dS and dN, the analysis was based on 54 chloroplast protein-coding genes. Comparisons of the evolutionary rates were conducted using the two-tailed Wilcoxon rank sum test. The multiple testing was corrected by applying the false discovery rate method (FDR) [79].Branch-site model was utilized to find genes that possibly underwent positive selection. The improved branch-site model (model = 2, Nsites = 2) was used to detect signatures of positive selection on individual codons in a specific branch [80]. The three Oedocladium species and the terrestrial Oedogonium sp. (strain FACHB-3313) were set as the foreground branch. The null model assumed that no positive selection occurred on the foreground branch (fix_omega = 1, omega = 1), and the alternative model assumed that sites on the foreground branch were under positive selection (fix_ omega = 0, omega = 2). LRT were used to test model fit and Chi-square test was applied for examining the LG rps11, psbL, rpl14 GTR + G rps2, rps8, rps9, rpl20, rpl23, rps18

MTZOA ycf12
Xiong et al. BMC Genomics (2021) 22:707 P values. A correction was performed for multiple testing using an FDR criterion, and BEB method was employed to statistically identify sites under positive selection. Genes with an FDR-adjusted P < 0.05 were considered as putatively selected.