The Rubisco small subunits in the green algal genus Chloromonas provide insights into evolutionary loss of the eukaryotic carbon-concentrating organelle, the pyrenoid

Pyrenoids are protein microcompartments composed mainly of Rubisco that are localized in the chloroplasts of many photosynthetic organisms. Pyrenoids contribute to the CO2-concentrating mechanism. This organelle has been lost many times during algal/plant evolution, including with the origin of land plants. The molecular basis of the evolutionary loss of pyrenoids is a major topic in evolutionary biology. Recently, it was hypothesized that pyrenoid formation is controlled by the hydrophobicity of the two helices on the surface of the Rubisco small subunit (RBCS), but the relationship between hydrophobicity and pyrenoid loss during the evolution of closely related algal/plant lineages has not been examined. Here, we focused on, the Reticulata group of the unicellular green algal genus Chloromonas, within which pyrenoids are present in some species, although they are absent in the closely related species. Based on de novo transcriptome analysis and Sanger sequencing of cloned reverse transcription-polymerase chain reaction products, rbcS sequences were determined from 11 strains of two pyrenoid-lacking and three pyrenoid-containing species of the Reticulata group. We found that the hydrophobicity of the RBCS helices was roughly correlated with the presence or absence of pyrenoids within the Reticulata group and that a decrease in the hydrophobicity of the RBCS helices may have primarily caused pyrenoid loss during the evolution of this group. Although we suggest that the observed correlation may only exist for the Reticulata group, this is still an interesting study that provides novel insight into a potential mechanism determining initial evolutionary steps of gain and loss of the pyrenoid.

been lost many times independently during the evolution of photosynthetic organisms [4,6,7], including during the origin of land plants [7]. Although recent papers have demonstrated that some Rubisco-binding proteins directly regulate pyrenoid morphology [8,9], the molecular basis of the evolutionary loss of pyrenoids is largely unknown.
Based on transformational experiments of the unicellular pyrenoid-containing green alga Chlamydomonas reinhardtii, Meyer et al. found that absence of two C. reinhardtii helices of the Rubisco small subunit (RBCS) results in no pyrenoid formation even though the spinach RBCS helices are present in the Rubisco [10]. They suggested that hydrophobic interactions between Rubisco molecules within the pyrenoid via two helices on the surface of RBCS are correlated with pyrenoid formation [10]. A substantial amount of work has subsequently highlighted that the interaction of the linker protein "Essential Pyrenoid Component 1 (EPYC1)" with RBCS, specifically the two helices, is critical for pyrenoid formation [8,11,12]. Recently, Goudet et al. [13] examined RBCS sequences across chlorophyte and streptophyte green algae and concluded that specific residues in the RBCS helices [10] were not sufficient to explain the pyrenoid occurrence across the entire green algal phylum. However, the relationship between the hydrophobicity of the two RBCS helices and presence or absence of pyrenoids within a closely related lineage has not been studied.
The unicellular green algal genus Chloromonas belongs to Chloromonadinia, a strongly supported primary clade [14] in the order Chlamydomonadales (= Volvocales), and includes both pyrenoid-containing and -lacking species [6,15,16]. Within this genus, the Reticulata group is a small clade that includes at least three pyrenoid-containing species and two pyrenoidlacking species; based on the phylogenetic tree constructed and distribution of presence or absence of pyrenoids in operational taxonomic units, it has been suggested that pyrenoids have been lost twice during the evolution of this group [17]. The CO 2 -concentrating mechanism, chloroplast ultrastructure, and Rubisco distribution in the chloroplast have been studied extensively in several strains and species belonging to the Reticulata group [6,18]. Many amino acid substitutions were found in the Rubisco large subunit (RBCL) in the Reticulata group or the genus Chloromonas and a possible relationship between RBCL substitutions and loss of pyrenoids was suggested [15]. However, the nuclear rbcS genes in this group have not been studied.
Here, we determined the rbcS sequences of 11 strains of five species in the Reticulata group [17] using de novo transcriptome analysis and Sanger sequencing of cloned reverse transcription-polymerase chain reaction (RT-PCR) products. We found that the hydrophobicity of the RBCS helices was correlated with the presence or absence of the pyrenoid within the Reticulata group.

Phylogeny of 11 Chloromonas strains of the Reticulata group
The sister relationship between C. chlorococcoides and C. reticulata was more robustly resolved [with 1.00 posterior probability in Bayesian inference (BI) and 96-100% bootstrap values [19] in three other phylogenetic methods] than in the previous study [17] (Fig. 1). Although the bootstrap value by maximum likelihood (ML) method was low (52%), C. rosae was sister to the clade composed of C. chlorococcoides and C. reticulata [with 0.99 posterior probability in BI and 83-98% bootstrap values in maximum parsimony (MP) and neighbor-joining (NJ) methods]. It was suggested that pyrenoids have been lost twice during the evolution of this group (Fig. 1).
Obvious contradictions of the single markers in phylogenetic positions of C. rosae, C. difformis and C. typhlos were recognized, but only supported with low statistical supports in 28S ribosomal DNA-and P700 chlorophyll a-apoprotein A1 gene (psaA)-based trees (Additional file 1: Fig. S1). Only ML analysis (with 58% bootstrap value) resolved sister relationship between C. difformis and C. typhlos in the 28S ribosomal DNA tree. The psaA tree suggested sister relationship between C. difformis and the other four Chloromonas species with 0.95 posterior probability in Bayesian inference and 57% bootstrap value in NJ method (Additional file 1: Fig. S1). In contrast, the phylogenetic positions of C. rosae, C. difformis and C. typhlos supported with high bootstrap values (83-98%) in NJ and MP analyses of the concatenated data set ( Fig. 1) were also resolved in the ITS-2 tree with 90-96% bootstrap values in NJ and MP methods (Additional file 1: Fig. S1). In addition, the previous phylogenetic analysis based on the combined data set lacking ITS-2 sequences resolves the most basal position of C. typhlos with 51-71% bootstrap values in ML, NJ and MP calculations, but does not demonstrate 50% or more bootstrap values or 0.95 or more posterior probability in the four phylogenetic methods for supporting the sister relationship between C. rosae and C. difformis shown in the tree topology [17]. Thus, the ITS-2 sequence information is considered to contribute largely to the resolution of the basal phylogeny within the Reticulata group in the tree based on the present combined data set (Fig. 1).

Paralogs of rbcS in the Reticulata group
Multiple rbcS sequences were detected in the de novo transcriptome assembly or cloned RT-PCR-products of all strains of the Reticulata group (Additional file 2: Table S1). In order to resolve the diversity and phylogeny of the rbcS paralogs in the Reticulata group, phylogenetic analyses were carried out. The rbcS phylogenetic tree did not resolve basal relationships of the genes in the Reticulata group (Fig. 2). However, six or three homologs from C. typhlos (two strains) or C. difformis (one strain), respectively, constituted a monophyletic group (Fig. 2). Three paralogs from a single strain (SAG 15.82) of C. chlorococcoides belonged to a clade composed of only strains of the same species. Six of 10 C. reticulata rbcS sequences formed a clade which included three paralogs from a single strain (SAG 32.86). Therefore, the rbcS genes might have been duplicated after the origin of each of the four Chloromonas species. Alternatively, gene conversion [20] could be considered to explain the apparent monophyly of the paralogs from the same species within the Reticulata group. Interestingly, two paralogs of C. rosae (one strain) were separated from each other, and each was sister to C. reticulata homolog(s) (Fig. 2). Because of presence of multiple rbcS paralogs in all five species and the discrepancy in the phylogeny of C. rosae and C. reticulata between the rbcS phylogeny and the previous species trees [16,17], rbcS sequences were not used in the present study for phylogeny of species within the Reticulata group (Fig. 1). Table 1 shows the calculated hydrophobicity of RBCS helices A and B from the chlamydomonadalean species examined in this study. Figure 3 compares the hydrophobicity between pyrenoid-containing and -lacking species of Chlamydomonadales.
Comparing the hydrophobicity of the RBCS helices from various lineages within Chlamydomonadales showed that the hydrophobicity of the RBCS helices does not necessarily correspond to the presence or absence of a pyrenoid (Fig. 3b). The RBCS helices of pyrenoid-lacking snow Chloromonas species had relatively high hydrophobicities, whereas the values for Phacotinia, Radicarteria, Hafniomonas and Crucicarteria with pyrenoids in the chloroplast were low (Fig. 3b). Very recently, other protein factors were reported to contribute to pyrenoid formation in the Chlamydomonas reinhardtii chloroplast [8,9,11,12]. Consequently, the hydrophobicity of the RBCS helices is not the only factor that controls pyrenoid formation. All snow species of Chloromonas lack pyrenoids and constitute a relatively large, pyrenoid-lacking lineage, the subclade 2 of clade A [26] or snow algae clade  (Table 1). Numbers at the branches indicate bootstrap values (50% or more). For details, see Methods of the main text  [21,27], which is phylogenetically separated from the Reticulata group of Chloromonas. Thus, the common ancestral species of the extant snow Chloromonas species might have lacked pyrenoids in the chloroplast and diverged in the distant past; many changes in pyrenoid formation factors would have accumulated independently from the Reticulata group after divergence. Therefore, it is difficult to discuss pyrenoid presence/absence only in terms of the RBCS protein among the lineages within Chlamydomonadales. However, the evolutionary loss of pyrenoids within the Reticulata group is recognized as recent (Fig. 1),  [48] e Formerly identified as Chlamydomonas incerta [49] f Formerly identified as Chloromonas perforata [50] g Not identified as strongly supported primary clade because of its uncertain phylogenetic position [14] and the hydrophobicity of the RBCS helices differed significantly between pyrenoid-containing and -lacking species (Fig. 3). Therefore, during the initial stage of pyrenoid loss in the Reticulata group, changes in the hydrophobicity of the RBCS helices might have directly caused the disappearance of pyrenoids from the chloroplast.

Comparison of sister species with and without pyrenoids
As discussed above, various molecular factors can be considered regarding the accumulation of Rubisco proteins to form pyrenoids [8][9][10][11][12]. Thus, comparison between closely related species with and without pyrenoids should be helpful to resolve critical factor causing pyrenoid loss/gain during speciation between these two species. Among sister species of the Reticulata group,  Chloromonas chlorococcoides has and C. reticulata does not have pyrenoids (Fig. 1). To investigate differences in the amino acid sequences of the RBCS helices that may cause the difference in pyrenoid formation between these two species, the RBCS helices from these two species were compared: 11 sequences from four strains of C. chlorococcoides and 10 sequences from three strains of C. reticulata (Fig. 4). Within the 21 amino acid positions of helices A and B that were used to calculate hydrophobicity, one position differed markedly in amino acid hydrophobicity between these two species: the first position of helix B, corresponding to the 131 amino acid position in RBCS from Chlamydomonas reinhardtii [28]. The amino acid at this position in all of the C. chlorococcoides RBCS sequences was alanine (amino acid hydrophobicity = 1.8 [29]), while that in all of the C. reticulata RBCS sequences was proline (amino acid hydrophobicity = − 1.6 [29]) (Fig. 4). Therefore, a mutation of this codon (− 3.4 difference in amino acid hydrophobicity) might have significantly contributed to the loss of pyrenoids during the divergence of the ancestor of the pyrenoid-lacking species C. reticulata from a common ancestral species that may have possessed pyrenoids. However, we consider that the loss of pyrenoids may be based on the total hydrophobicity of 21 amino acids of helices A and B of RBCS within the Reticulata group (Fig. 3a). Thus, hydrophobicity of the other 20 amino acid positions may also contributes to presence or absence of pyrenoids in the Reticulata group. Although the pyrenoid-containing species C. typhlos has proline in the 131 amino acid position of two of six RBCS proteins (Fig. 4), total amino acid hydrophobicity is relatively high (Fig. 3a).

Discussion
In the present paper, we resolved possible correlation between the hydrophobicity of RBCS helices and presence/absence of pyrenoids in the Reticulata group of Chloromonas. This is possibly due to the unique fact that the Reticulata group shows presence and absence of pyrenoids within closely related species or even between sister species (Figs. 1 and 3). We also found that the hydrophobicity of the RBCS helices does not necessarily correspond to the presence or absence of a pyrenoid among the large lineages (strongly supported primary clades [14]) within Chlamydomonadales (Fig. 3b). It is thus clearly difficult to resolve the correlation between the RBCS amino acid sequences and presence/absence of pyrenoids across major lineages of chlorophytes and streptophytes [13]. The environmental conditions play an important role in pyrenoid presence/absence in some species (e.g. Volvulina steinii [30]). However, the Reticulata group of Chloromonas does not show variability in presence or absence of pyrenoids within a species when cultured under usual light/dark conditions [6,[16][17][18]. Thus, presence or absence of the pyrenoids in the Reticulata group is not directly affected by cultural or environmental conditions, but it is genetically determined. The present study clearly demonstrated part of such genetic differences in rbcS genes between pyrenoid-containing and -lacking species.

Conclusion
Recent extensive studies demonstrated that various molecular factors are possible to contribute to interaction between Rubisco and other proteins to form pyrenoids [8][9][10][11][12]. Loss of pyrenoids might have occurred many times in the distant past independently during the evolution of photosynthetic eukaryotes. Thus, it seems difficult to discuss the critical factor that might have directly caused the initial evolution of pyrenoid loss. Based on the use of the Reticulata group of Chloromonas, however, we here suggested that the hydrophobicity of the helices A and B of RBCS is a possible factor that might have caused the initial loss of pyrenoid during speciation between the pyrenoid-containing and -lacking species (Fig. 3a).  (Table 1). Dot means that amino acid in the position is the same as that in the top sequence. Eliminated amino acids by less than 15% exposure ratio are grayed-out. The first position of helix B was alanine in all of the RBCS sequences from Chloromonas chlorococcoides, while proline in that from C. reticulata (surrounded by the red solid-line frames) Although the decrease in such hydrophobicity may be the major factor for evolutionary loss of pyrenoids in the Reticulata group, presence or absence of the EPYC-1-like protein is totally unknown in this group. RBCL amino acid substitutions may be related to the presence or absence of pyrenoids in Chloromonas [15]. Transformation protocols have not been established in Chloromonas. Further molecular genetic studies are needed to resolve actual molecular bases for evolutionary loss of pyrenoids in the Reticulata group.

Phylogenetic analyses of 11 Chloromonas strains of the Reticulata group
Molecular phylogenetic analyses were conducted based on Makino et al. [17] with additional sequences of nuclear internal transcribed spacer 2 (ITS-2) (Additional file 2: Table S1), using MrBayes 3.2.7 [31] for BI, RAxML-NG 0.9 [32] for ML method, and PAUP* 4.0b10 [33] for MP and NJ analyses. The combined 7109-bp data matrix for nuclear 18S and 28S ribosomal DNA, ATP synthase β-subunit (atpB), and P700 chlorophyll a-apoprotein A2 (psaB) and psaA genes, and ITS-2 sequences from the 14 operational taxonomic units (11 Chloromonas strains of the Reticulata group and three outgroup species) was analyzed and available from TreeBASE (https ://www.treeb ase.org/treeb aseweb/home.html; study ID: S26516). For concatenating the data matrices, our previous studies showed that robust discrepancies in phylogenetic relationships within the Chloromonadinia clade were not detected among 18S rDNA, atpB and psaB-based trees [22,27]. We also here confirmed that there are no robust discrepancies (supported > 60% bootstrap values) among 28S rDNA, psaA and ITS-2based trees in the Reticulata group (Additional file 1: Fig.  S1). The outgroup species were selected according to the previous phylogenetic analyses [16,17,27]. The appropriate substitution models for partitioned analyses in BI and ML method were selected by the Bayesian information criterion in Modeltest-NG v0.1.6 [34] with "-T mrbayes" option. The applied substitution models were as follows: K80+I for 18S ribosomal DNA, K80+G4 for 28S ribosomal DNA and ITS-2, HKY+I for the 1st codon position of atpB, psaA and psaB, F81+I+G4 for the 2nd codon position of atpB, psaA and psaB, and GTR+G4 for the 3rd codon position of atpB, psaA and psaB. BI was performed as in the previous study [17] with 1,000,000 generations of Markov chain Monte Carlo iterations and discarding the first 25% as burn-in. In each analysis, the average standard deviation of split frequencies was below 0.01, indicating convergence. For ML analysis, 10 randomized parsimony starting trees were generated. MP analysis was carried out based on random additions of 10 replicates from a heuristic search using the tree-bisection-reconnection branch-swapping algorithm. For NJ analysis, GTR+I+G model was selected by the Bayesian information criterion in jModelTest 2.1 [35]. Bootstrap values [19] based on 1000 replications were calculated in ML, MP and NJ analyses.

RNA extraction, library construction and sequencing
Total RNA from 11 Chloromonas strains of the Reticulata group and two snow species of Chloromonas (Additional file 3: Table S2) was isolated with TRIzol Reagent (Thermo Fischer Scientific, Carlsbad, CA, USA), as described by Featherston et al. [38] using cultures grown during the light photoperiod. The RNA was then treated using a TURBO DNA-free Kit (Thermo Fischer Scientific) to exclude genome DNA contamination and measured using an Agilent 4200 TapeStation (Agilent Technologies, Santa Clara, CA, USA For the Carteria cerasiformis strains (Additional file 3: Table S2), total RNA was extracted from freeze-stocked cells using an RNeasy Mini Kit (QIAGEN, Venlo, the Netherlands) according to the manufacturer's instructions. To synthesize the library, 1 μg of total RNA was treated as follows. The ribosomal RNA was removed using a Ribo-Zero rRNA Removal Kit (Plant Leaf ) (Illumina) as per the manufacturer's protocol. Sequencing libraries were prepared using the NEBNext mRNA Library Prep Kit for Illumina (New England Biolabs) with the following modifications. First-strand synthesis was performed without fragmenting the mRNA. After second-strand synthesis, double-stranded cDNA was fragmented to an average length of 500 bp using a Covaris S2 sonication system (Covaris, Woburn, CA, USA). Paired-end sequencing (300 bp × 2) was then conducted using an Illumina MiSeq with a MiSeq Reagent Kit v3 (600 cycles).

De novo assembly
The number of paired-end reads sequenced using Illumina MiSeq was shown in Additional file 3: Table S2. For the Chloromonas species, sequence adapters and low-quality bases in the MiSeq reads were removed using Trimmomatic V0.38 [39]. Searching from both read ends, any base that had a quality value lower than 3 was removed. Sliding window trimming was performed with a 4-base window, and bases with quality scores under 15 were cut.
The processed reads of the Chloromonas strains were assembled de novo into contigs using Trinity V2.8.5 [40] (Additional file 3: Table S2). Using tblastn [41], the transcriptome libraries were searched for rbcS cDNA sequences, with the RBCS amino acid sequence of Chlamydomonas reinhardtii [28] serving as the search database. From the cDNA library, multiple contigs that contained the almost complete coding sequence (CDS) of rbcS were acquired (Additional file 2: Table S2).
The transcriptome reads of the two Carteria cerasiformis strains were filtered using CLC Genomics Workbench ver. 9.5 (QIAGEN) with following parameters: Phred quality score > 30; ambiguous nucleotides = 0; and removal of truncated reads less than 50 nucleotides in length. The filtered reads were assembled de novo using the CLC Genomics Workbench with the following parameters: automatic word size and bubble size; minimum contig length, 300 bp; correction of contig sequence by reads mapping; mismatch cost = 2; indel cost = 3; length fraction = 0.7; and similarity fraction = 0.9. Using the assembled contigs, the rbcS cDNA sequences were searched as described above (Additional file 3: Table S2).

Cloning and sequencing
Because almost all strains of the Reticulata group possessed several rbcS paralogs (including partial CDS) in the de novo assembly, the cDNA sequences of the rbcS genes were verified by Sanger sequencing of the RT-PCR products. The cDNA was reverse-transcribed from the total RNA used for paired-end sequencing by Illumina MiSeq as described above, with Superscript III Reverse Transcriptase (Thermo Fischer Scientific). Approximately full-length paralog sequences of rbcS (covering CDS of helices A and B) were amplified by PCR with KOD FX Neo (Toyobo, Osaka, Japan) using newly designed primers based on our transcriptome data (Additional file 4: Table S3). The PCR products were cloned for sequencing using a Zero Blunt TOPO PCR Cloning Kit (pCR-Blunt II-TOPO Vector, Thermo Fischer Scientific) and TOPO TA Cloning Kit (pCR 4-TOPO Vector, Thermo Fischer Scientific). Then, 307 base pairs of the clones corresponding to positions 156-465 of the Chlamydomonas reinhardtii rbcS CDS (accession number XM_001702357) (with one amino acid deletion) were determined using a BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems, Foster City, CA) and ABI Prism 3130 Genetic Analyzer (Applied Biosystems). To eliminate sequencing errors from the analysis, only identical sequences detected in at least two clones were used to calculate the hydrophobicity and phylogeny (Additional file 3: Table S2). The results by the Sanger sequencing partially conflicted with those of the assembled Illumina data, possibly due to the chimeric rbcS CDS resulting from the assembly of short similar sequences. Thus, only sequences of cloned rbcS were used in the present analyses (Additional file 3: Table S2).

Phylogenetic analysis of rbcS paralogs from 11 Chloromonas strains of the Reticulata group
The 307 bp of rbcS cDNA from the 11 strains of the Reticulata group and the two snow species (outgroup) of the genus Chloromonas (Additional file 3: Table S2) were aligned with MAFFT V7.429 [42], and the phylogeny was analyzed with MEGA X [43]. ML analysis with 1000 bootstrap replications [19] was performed based on T92+G+I model selected by the Bayesian information criterion in MEGA X. The alignment is available at Tree-BASE (https ://www.treeb ase.org/treeb ase-web/home. html; study ID: S26516).

Calculating the hydrophobicity of RBCS helices A and B
Following a previous study [10], 27 amino acids corresponding to positions 68-80 and 131-144 in the Chlamydomonas reinhardtii RBCS protein (including transit peptide; accession number XP_001702409) were regarded as helices A and B, respectively. The hydrophobicity of the two helices for each RBCS paralog was calculated as the sum of the hydrophobicity of the amino acid residues forming the helices, excluding embedded amino acid residues (i.e., not exposed to the surface). The amino acid positions corresponding to the embedded residues were investigated using the GetArea [44] (http:// curie .utmb.edu/getar ea.html); the three-dimensional (3D) structure Chlamydomonas reinhardtii RBCS [45] served as input. The exposure ratio was used to evaluate how "embedded" an amino acid was. The exposure ratio is the ratio of the side-chain surface area to the "random coil" value, i.e., the average solvent-accessible surface area of amino acid X in the tripeptide Gly-X-Gly in a set of 30 random conformations. Previous research regarded an amino acid with an exposure ratio below 15% as "embedded" [46]; hence, these residues were eliminated from the calculation of hydrophobicity of the RBCS helices. Consequently, we ignored six amino acids corresponding to positions 71, 75, 78, 134, 138, and 141 in the Chlamydomonas reinhardtii RBCS protein (accession number XP_001702409) within the 27 amino acids constituting the RBCS A and B helices (Additional file 5: Fig. S2). The Kyte-Doolittle hydrophobicity scale [29] was used for the hydrophobicity analysis.

RBCS sequences of other Chlamydomonadales species
The RBCS sequences of other species in nine strongly supported primary clades (i.e., Chlorogonia, Dunaliellinia, Hafniomonas, Moewusinia, Oogamochlamydinia, Phacotinia, Radicarteria, Reinhardtinia, and Stephanosphaerinia) and an "uncertain phylogenetic group" (Spermatozopsis similis) of Chlamydomonadales [14,24] were extracted from published databases ( Table 1). The sequences were manually checked, and those which were too short or possible cross-contamination were removed. The hydrophobicity of the RBCS helices was calculated as described above.