Insights into the Structure of Rubisco from Dinoflagellates-In Silico Studies

Ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) is one of the best studied enzymes. It is crucial for photosynthesis, and thus for the productivity of the entire biosphere. There are four isoforms of this enzyme, differing in amino acid sequence composition and quaternary structure. However, there is still a group of organisms, the dinoflagellates (single-cell eukaryotes), that are confirmed to possess Rubisco but for which neither a successful purification of the enzyme nor, consequently, a crystal structure has been reported to date. Here, we use in silico tools to generate a possible structure of Rubisco from a dinoflagellate representative, Symbiodinium sp. We selected two templates: Rubisco from Rhodospirillum rubrum and from Rhodopseudomonas palustris. Both enzymes are so-called form II Rubiscos, but the first is exclusively a homodimer, while the second forms homohexamers. The obtained models show no differences in the amino acids crucial for Rubisco activity. Variation was found at two closely located inserts in the C-terminal domain, one of which extends a helix while the other forms a loop. These inserts most probably do not play a direct role in the enzyme's activity, but may be responsible for interaction with an unknown protein partner, possibly a regulator or a chaperone. Analysis of the possible oligomerization interface indicated that Symbiodinium sp. Rubisco most likely forms a trimer of homodimers, not just a homodimer. This hypothesis was supported by calculation of binding energies. Additionally, we found that the protein under study is significantly richer in cysteine residues, which may be the cause of its activity loss shortly after cell lysis. Furthermore, we evaluated the influence of the loop insert, identified exclusively in the Symbiodinium sp. protein, on the functionality of the recombinantly expressed R. rubrum Rubisco. All these findings shed new light on dinoflagellate Rubisco and may help in obtaining a native, active enzyme in the future.


Introduction
Ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) is an enzyme employed by plants, algae, cyanobacteria and other autotrophic organisms to incorporate CO2 into organic compounds; it is thus one of the key photosynthetic enzymes. Rubisco catalyses a carboxylation reaction, in which it assimilates CO2, and an oxygenation reaction, in which it oxidizes the substrate. In both reactions, the substrate is ribulose-1,5-bisphosphate (RuBP). Because Rubisco's carboxylation efficiency is low, and because it also catalyses the unfavourable reaction of photorespiration, it is considered a limiting factor of photosynthesis. Consequently, Rubisco is an obvious target for increasing agricultural production efficiency, and it is one of the best studied enzymes for this application [1]. Rubisco consists of at least two catalytic large subunits (RbcL) and, in some cases, additional regulatory small subunits. To reach catalytic competence, a lysine in the active site of Rubisco must first be carboxylated by a non-substrate CO2 molecule, followed by the binding of a Mg2+ ion. This process is called carbamylation and serves to position the substrate RuBP for an efficient electrophilic attack by the second CO2 molecule, which will be fixed in the Calvin-Benson-Bassham (CBB) cycle upon RuBP binding. The active site is [...] Rubisco as templates. Then, we show similarities and differences, which we use to build an explanation for the unusual features of dinoflagellate Rubisco. In a basic experiment, we also show that one of the identified elements (a loop-forming insert, exclusive to dinoflagellates) may influence Rubisco solubility.

Homologues of Form II Rubisco from Rhodospirillum Rubrum among Dinoflagellates
To find the best sequence for further modelling, we used the blastP tool to search for homologues of the template R. rubrum Rubisco among dinoflagellates, with the organism parameter set to Dinoflagellates (taxid: 2864). As mentioned already, this protein is broadly accepted as a model form II Rubisco. The highest scoring entries are listed in Table 1. Because of the high sequence similarity among dinoflagellates, only the top four hits are listed. Symbiodinium microadriaticum appears in the table because it is the name of the database entry; in this text, however, we simply use Symbiodinium sp., following the convention accepted in most papers on dinoflagellates.
Heterocapsa triquetra showed the highest amino acid sequence similarity to the R. rubrum sequence, as described by Query cover (97%; how much of the query sequence is covered by the target sequence), E value (0.0; the expected number of matches arising by chance in a database of that size, so the lower the E value, the more significant the match) and percent identity (67.67%; the percentage of identical amino acids at the same positions of the sequences) [18]. The best studied dinoflagellate Rubisco is the one from Symbiodinium sp., which had the second-highest score. It differs from the first hit by less than 2 percentage points in percent identity. Thus, we chose Symbiodinium sp. as the case for further investigation in this paper.
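To make the two alignment statistics above concrete, the following minimal Python sketch computes percent identity and query cover for a single gapped pairwise alignment. It is an illustration only, not the blastP implementation: the function name `alignment_stats` and the toy sequences are hypothetical, and percent identity is assumed to be counted over all alignment columns, gaps included.

```python
def alignment_stats(query_aln: str, subject_aln: str, query_len: int) -> dict:
    """Percent identity and query cover for one gapped pairwise alignment.

    query_aln / subject_aln: aligned strings of equal length, '-' marks a gap.
    query_len: length of the full, ungapped query sequence.
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned strings must have equal length")
    columns = len(query_aln)
    # Identical residues in the same column (a gap never counts as a match).
    identical = sum(
        1 for q, s in zip(query_aln, subject_aln) if q == s and q != "-"
    )
    # Query residues that take part in the alignment.
    query_residues = sum(1 for q in query_aln if q != "-")
    return {
        "percent_identity": 100.0 * identical / columns,
        "query_cover": 100.0 * query_residues / query_len,
    }


# Toy example (hypothetical sequences, not Rubisco data).
stats = alignment_stats("MKT-LLV", "MRTALLV", query_len=8)
print(f"identity = {stats['percent_identity']:.2f}%")
print(f"query cover = {stats['query_cover']:.1f}%")
```

In the toy example, five of the seven alignment columns are identical and six of the eight query residues are aligned.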

Analysis of the Amino Acid Sequence of Dinoflagellate Rubiscos
To compare the primary structure of dinoflagellate Rubisco, we aligned the sequences of the Rubiscos listed in Table 1 to the R. rubrum template using Clustal Omega [14]. This comparison showed differences that might be crucial for further investigation of the eukaryotic form II Rubisco (Figure 1A).
First of all, in our alignment dinoflagellate Rubiscos do not start with a methionine residue (as in R. rubrum), but with a leucine. The lack of an initiation codon suggests that there might be a transit peptide encoded at the beginning of the rbcA locus, which encodes rbcL. Rubiscos from dinoflagellates are encoded in the nucleus, and therefore need to be transported into the chloroplasts. It was previously shown that there is an upstream sequence in the rbcA mRNA, with a pattern of conserved residues analogous to the precursor polyprotein of the Rubisco small subunit in Euglena [6]. Aranda and co-workers sequenced and analysed parts of the dinoflagellate genomes and transcriptomes, and identified this upstream sequence of the rbcA locus [8]. The second reason for the lack of methionine is that the protein is encoded as a precursor polyprotein. This means that the first product of translation is a longer peptide, bearing a transit peptide and two or more proteins separated by spacers. Such pre-polyproteins also occur in the Euglena proteome, where, for example, light-harvesting complexes exist in this form and are separated by a decapeptide spacer [10]. As mentioned previously, more than 67% of the amino acid sequence is identical in the aligned proteins. Most of the differences are equally distributed along the compared sequences. The charge distribution is similar; the isoelectric point of Symbiodinium Rubisco is slightly higher than that of the R. rubrum enzyme (5.72 vs. 5.60). This results from one additional negatively charged and one fewer positively charged amino acid in the Symbiodinium sp. sequence. More notable might be the higher number of cysteine residues in the dinoflagellate Rubisco. In the Symbiodinium sp. sequence, there are 9 such residues, which is almost twice their number (5) in R. rubrum. Notably, only two cysteine residues are conserved between R. rubrum and dinoflagellate Rubiscos (Cys59 and Cys180).
Cysteine residues, although not directly involved in Rubisco activity, are known to be responsible for its redox regulation and conformational changes [3,19]. The importance of cysteine residues was also demonstrated for Arabidopsis thaliana Rubisco; after oxidative inactivation, the enzyme was reactivated by redox treatment [20]. On this basis, we may hypothesise that the higher content of Cys residues is responsible for the possible oxygen-dependent inactivation of Symbiodinium sp. Rubisco upon isolation.
The most significant differences between dinoflagellate and R. rubrum Rubiscos are the two insertions present in the dinoflagellate RbcL amino acid sequence (Figure 1A, red rectangles). The first insertion contains three negatively charged amino acids at position 413, and the second insertion is made up of eight amino acids at position 425. Both inserts may be treated as one longer, dinoflagellate-specific motif. The possible role of these inserts will be discussed further on the basis of the constructed models.
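The composition comparison described above (cysteine content and the balance of charged residues) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `composition_summary` and `composition_difference` are hypothetical, the two input strings are toy placeholders rather than the actual RbcL sequences, and only D/E and K/R are counted as charged residues (histidine is ignored).

```python
from collections import Counter

NEGATIVE = "DE"  # aspartate, glutamate
POSITIVE = "KR"  # lysine, arginine (histidine deliberately left out)


def composition_summary(seq: str) -> dict:
    """Count cysteines and charged residues in a one-letter protein sequence."""
    counts = Counter(seq.upper())
    return {
        "length": len(seq),
        "cysteines": counts["C"],
        "negative": sum(counts[aa] for aa in NEGATIVE),
        "positive": sum(counts[aa] for aa in POSITIVE),
    }


def composition_difference(seq_a: str, seq_b: str) -> dict:
    """Per-category difference (seq_a minus seq_b) of the residue counts."""
    a, b = composition_summary(seq_a), composition_summary(seq_b)
    return {key: a[key] - b[key] for key in ("cysteines", "negative", "positive")}


# Toy placeholders; substitute the real sequences retrieved from the database.
toy_dinoflagellate = "LCCDEEKACG"
toy_bacterial = "MCDEKKAG"
print(composition_summary(toy_dinoflagellate))
print(composition_difference(toy_dinoflagellate, toy_bacterial))
```

Run on the real sequences, such a summary would be expected to reproduce the counts quoted in the text (9 versus 5 cysteine residues).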

Instability of the Enzyme
High instability of Rubisco from dinoflagellates is the main barrier to a better understanding of this enzyme's function. We mentioned earlier that the enzyme's instability is most likely due to conformational issues. It is highly possible that chaperone proteins are needed for the folding and assembly of the holoenzyme. Such a hypothesis was proposed previously, based on the fact that neither precipitation of the protein nor proteolysis was observed upon cell lysis [10]. Here, we used the XtalPred tool to test whether the instability comes from the enzyme's disordered regions [21,22]. XtalPred also allows a calculation of crystallization probability based on the instability index and coil regions. Interestingly, Rubisco from Symbiodinium sp. belongs to the low (favourable) crystallization classes, meaning that crystallization of this protein might be successful (Table 2). Its instability index is lower than 40, which predicts the protein to be stable. The EP class (Expert Pool class, score from 1 (best) to 5 (worst)) is a prediction made by combining individual crystallization probabilities calculated for eight protein features into a single crystallization score. Based on this score, the protein is assigned to one of the five crystallization classes. RP (Random Forest Classifier, score from 1 (best) to 10 (worst)) has been extended with other protein features, for example surface ruggedness, hydrophobicity and side-chain entropy of surface residues; based on this score, the protein is assigned to one of the eleven crystallization classes.

A Template Selection for Structure Modelling of Symbiodinium sp. Rubisco Using the SWISS-MODEL Software
There are crystal structures of all known forms of Rubisco, including RLPs. The first ever crystal structure of a form II Rubisco was the one from Rhodospirillum rubrum [23]. Dinoflagellate Rubisco, as mentioned earlier, is also considered to be a form II Rubisco, based on its sequence homology. However, there is no crystal or NMR structure of this enzyme, owing to its high instability. Thus, all that can be done at present is structure modelling based on homology to known structures. A convenient tool for model prediction is SWISS-MODEL, a fully automated protein structure homology-modelling server [24,25]. As discussed already, we chose the Symbiodinium sp. Rubisco sequence for structure modelling. The first step consisted of identifying a proper template on which the homology model was to be built. The algorithms collect templates and list them together with relevant structural information that can be readily used to rank the templates and select the best one according to user-defined criteria. After manual inspection of the obtained results list, we chose the obvious hit of R. rubrum for further work. Surprisingly, among the suggested templates, the Rubisco from Rhodopseudomonas palustris was shown to be the one with the highest similarity and best energetic parameters, and so we also included it as a template. This Rubisco is a unique hexamer with three pairs of catalytic large subunit homodimers around a central 3-fold symmetry axis [26]. These facts also allowed us to hypothesize that the previous dogma of dinoflagellate Rubisco being a dimer, and not a higher quaternary structure, may not be true. A dimer was postulated on the basis of studies carried out in the 1990s and at the beginning of the 21st century, and this view has not been revised until now. Based on the GMQE score (Global Model Quality Estimation), Rubisco from R. palustris is an even more suitable template than the one from R. rubrum (compare Table 3).
To verify whether Symbiodinium sp. Rubisco is a dimer or a hexamer, we built two models based on two different templates of the Rubisco protein, one from R. rubrum (L2 homodimer) and one from R. palustris (hexamer). The differences in the parameters of the two models are significant and show that the model built on the Rubisco from R. palustris is energetically more favourable. The model-template alignment and the structures are coloured according to QMEAN model quality (Figure 2). This allows the visualization of regions of a model that are either well or poorly modelled. Local quality is presented in blue and red, where blue indicates high quality of the modelled region and red indicates poor accuracy. This value also represents the "degree of nativeness" of the structural features observed in the model. QMEAN Z-scores around zero indicate good agreement between the model and experimental structures of similar size. Scores of −4.0 or below represent models of low quality. In our case, the model based on the R. palustris Rubisco has a QMEAN score of −1.13, and the model based on the R. rubrum Rubisco a QMEAN score of −4.37, meaning that the first one is structurally closest to the original structure from R. palustris and has the highest quality. The accuracy of the models may support the hypothesis stated earlier that dinoflagellate Rubisco is hexameric, in opposition to the previously suggested L2-type homodimer.
There is only one poor-quality region in our modelled Rubisco; this is the insert region with the peptide FGNISLSD. This insertion is conserved only among dinoflagellate Rubiscos, thus there was no template available to model this fragment.

Structure of the Active Site in a Modelled Rubisco from Symbiodinium sp.
There are two X-ray structures of form II Rubisco from R. palustris in the PDB database: a structure of an activated CABP-bound form II Rubisco (4FL1) and a structure of an activated apoenzyme with two sulphate ions bound (4FL2). For this project, we chose the 4FL1 structure because, as mentioned earlier, it has better parameters in terms of model building. This structure also contains CABP in the active centre. CABP is a naturally occurring sugar phosphate and a tight binding Rubisco inhibitor, causing the active site of the carbamylated or decarbamylated enzyme to adopt a closed conformation [2,27]. Thus, the model we built represented an activated, closed conformation (Figure 3C,D). On the other hand, based on R. rubrum Rubisco we built a model representing an activated, open conformation with a substrate, RuBP, bound in the active site (Figure 3A,B). Comparison of all residues in the active sites of Rubisco from R. palustris, R. rubrum and our modelled structures of Symbiodinium sp. Rubisco showed that there are no significant differences except the open/closed conformation. All active-site amino acid residues conserved among all forms of Rubisco were found in the same positions (see Supplementary Information, Figure S1).

Analysis of a Possible Role of Insertions in the 413 and 425 Positions
The 413 insert consists of three amino acids (G/D, E, E) and extends a helix by one turn, while the 8-amino acid 425 insert in our model is a loop exposed to water (Figure 4). The helix and the loop are in the C domain of the large subunit. The carboxyl terminus of Rubisco is the centre of catalysis and has a unique conformation when the enzyme is activated and when it is bound to CABP [2]. However, the location of the discussed inserts excludes direct involvement in the catalytic activity of the enzyme, although it does not exclude involvement in the regulation of its activity (Figure 2B,D and Figure 4B,C). The motif does not seem to be involved in the dimerization interface between Rubisco monomers, nor in the oligomerization interface of higher-order oligomers. However, it is highly conserved among dinoflagellate Rubiscos, suggesting that it plays an important role in these species.
To gain a better insight into a possible function of the 425 insert (the 413 insert is too short for such a procedure), we performed a search using the blastP tool with the Symbiodinium sp. insert as a query, to find any homologous sequences. The search resulted in only five hits of homologous peptides (excluding the obvious homology with dinoflagellate Rubisco), which are listed in Table 4. Among the peptides found there are two eukaryotic ones: a cubilin homologue from Drosophila willistoni and a heat shock protein from Fasciola hepatica, as well as prokaryotic ones from Proteobacteria and Gemmatimonadetes. All of the peptides found (except for a hypothetical protein of unknown function from a Proteobacteria bacterium) are chaperone proteins that contribute to cellular response and ion uptake. This indicates a possible role of the insert in an interaction with an unidentified protein partner, which may therefore be responsible for the stabilization of the Rubisco enzyme in vivo. On this basis, we may also postulate that the short, negatively charged 413 insert is an additional patch for binding the putative interaction partner.

Oligomerization Interface Analysis
The basic Rubisco functional unit is a homodimer. However, in many cases, such dimers may form higher-level oligomers, which help to pack more molecules in the available space, increasing net CO2 assimilation. Formation of an octamer is important for higher plant Rubiscos (form I), as well as the recently described form I' (lacking the small subunit) from Anaerolineales [28]. Interestingly, some residues with potential to improve CO2 fixation were identified in the oligomerization interface of Thermosynechococcus elongatus Rubisco [29]. In all cases, the oligomerization interface consists of hydrogen bonds and salt bridges.
Until now, Symbiodinium sp. Rubisco and other dinoflagellate Rubiscos were thought to be simply homodimeric. This conclusion was drawn from their homology to the R. rubrum enzyme, which functions exclusively as a dimer. However, our study indicated a high homology of dinoflagellate Rubisco to the enzyme of R. palustris, recently shown to be a hexamer [30]. An indication that Symbiodinium Rubisco may form a hexamer comes from an analysis of the probable oligomerization interface. In Figure 5, we compared the molecular surfaces of Rubisco from R. rubrum, R. palustris and the two models obtained for Symbiodinium sp. We found that Rubisco from R. rubrum forms only a dimer, since its outer surface is mostly acidic, with a small amount of basic and hydrophobic patches. For the R. palustris enzyme, there is a clearly marked patch of basic and hydrophobic residues. The basic residues may easily form salt bridges with acidic ones on the next dimer, while the hydrophobic strip may help to stabilize the binding if matched to a similar one on the partner molecule. Very similar patches are found on the Symbiodinium sp. Rubisco surface, regardless of the template used for modelling. This finding strengthens the possibility of the enzyme's hexamerization. For additional verification, we calculated the theoretical energies of complex formation using the FoldX suite [31].
First, we identified the putative "between dimers" interfaces. These are created by interactions between monomers B and C (58 residues), D and E (57 residues), and F and A (57 residues). The FoldX output provides a detailed breakdown of the energy terms for each complex in the analysed structure; it also includes the internal dimer interface. In Table S1, we summarized the binding energy of these two types of interfaces in the template structures (R. rubrum, R. palustris), as well as in models of Symbiodinium sp. and Δloop mutants of Symbiodinium sp. (lacking the 425 loop insert) and R. rubrum (with the same loop added). For comparison, we also included a model created with the R. rubrum RbcL sequence on the R. palustris structural template. We also listed the electrostatic component of the binding energy, as we hypothesized that this might drive the interface formation. In the first attempt, we found that the energies were affected by a high contribution of the van der Waals clashes component; to avoid such artifacts, we attempted structure optimization in FoldX prior to the energy calculation.
Of note first is that the dimer binding energy of Symbiodinium sp. RbcL was significantly lower (i.e., binding is tighter) when the modelling template was R. palustris than with the R. rubrum protein structure (−48.93 kcal/mol vs. −34.07 kcal/mol). This again indicates that the R. palustris RbcL structure was the better template choice for Symbiodinium sp. RbcL. Interestingly, this computational experiment also suggests that the 425 insert does not influence the dimer stability of Symbiodinium sp. RbcL, but its introduction slightly destabilizes the R. rubrum protein.
The binding energy of the interface between dimers (responsible for RbcL hexamer formation) is generally smaller in magnitude than the energy of dimer binding. For the X-ray confirmed hexamer, RbcL of R. palustris, it is on average −5.86 kcal/mol (particular values are listed in Table S1). The electrostatic component of the binding energy is −4.16 kcal/mol. In contrast, the R. rubrum protein, an X-ray confirmed dimer, has a dimer-dimer binding energy of 9.41 kcal/mol (−0.50 kcal/mol for the electrostatic component). A positive binding energy of this magnitude indicates that binding is thermodynamically unfavourable. The calculation for Symbiodinium sp. gave a negative dimer-dimer binding energy (−1.15 kcal/mol), although higher than that of the R. palustris protein (−5.86 kcal/mol). In fact, the value for Symbiodinium sp. may be even lower, as for one of the interfaces (A to B) the optimization did not eliminate all the clashes.
Intriguingly, the electrostatic component equalled −3.47 kcal/mol, which is much closer to R. palustris than to R. rubrum. We may then hypothesize that oligomerization into a hexamer is indeed thermodynamically favourable and is driven by electrostatics.
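The comparison made in the two paragraphs above can be restated as a short Python sketch. It uses only the dimer-dimer interface energies quoted in the text; the dictionary layout and the helper names `is_favourable` and `closest_reference` are hypothetical and are not part of FoldX or its output format.

```python
# Dimer-dimer interface energies quoted in the text, in kcal/mol.
INTERFACE_ENERGIES = {
    "R. palustris": {"total": -5.86, "electrostatic": -4.16},
    "R. rubrum": {"total": 9.41, "electrostatic": -0.50},
    "Symbiodinium sp.": {"total": -1.15, "electrostatic": -3.47},
}


def is_favourable(binding_energy: float) -> bool:
    """A negative binding energy is read as thermodynamically favourable."""
    return binding_energy < 0.0


def closest_reference(name: str, component: str, references: list) -> str:
    """Return the reference protein whose energy component is nearest to `name`."""
    target = INTERFACE_ENERGIES[name][component]
    return min(
        references,
        key=lambda ref: abs(INTERFACE_ENERGIES[ref][component] - target),
    )


for protein, energies in INTERFACE_ENERGIES.items():
    verdict = "favourable" if is_favourable(energies["total"]) else "unfavourable"
    print(f"{protein}: {energies['total']:+.2f} kcal/mol ({verdict})")

print(closest_reference("Symbiodinium sp.", "electrostatic",
                        ["R. palustris", "R. rubrum"]))
```

With these values, the electrostatic component of the Symbiodinium sp. model differs by 0.69 kcal/mol from R. palustris and by 2.97 kcal/mol from R. rubrum, which is the basis for the statement that it is much closer to the hexameric enzyme.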

The Loop of the RbcL from Dinoflagellate Has Measurable Impact on the Enzyme's Solubility
The newly identified insert 425, which appeared as a loop in the modelled structure, shows poor quality in terms of energy accuracy. We decided to investigate whether this insert has an impact on the solubility of RbcL. For this purpose, we designed two mutants: the first with the loop removed from the dinoflagellate RbcL sequence, and the second with the same loop inserted into RbcL from R. rubrum (Figure 6). Then, we assessed the expression and solubility of these RbcL proteins.
Previous studies on RbcL from dinoflagellates suggested that this protein is not expressed in E. coli cells due to its high instability [5,10]. Surprisingly, the Symbiodinium sp. Rubisco turned out to be expressed in our E. coli system. Figure 7 shows the expression and solubility studies for all four proteins. At first sight, there was no significant difference in the amount of soluble protein in the cell lysate of E. coli expressing Symbiodinium sp. RbcL compared to E. coli expressing R. rubrum RbcL (Figure 7A,B). Unfortunately, antibodies against form II RbcL do not react with the E. coli expressed proteins, for either R. rubrum or dinoflagellate RbcL, under denaturing conditions (after SDS-PAGE analysis), so we could not clearly identify and quantify the RbcL bands. Therefore, we turned to Western blotting of a native PAGE gel, which allowed proper detection (Figure 7B). As a molecule's native PAGE mobility is not only mass-dependent, and as we detected multiple bands, we ran a second-direction electrophoresis (black arrows in Figure 7C show which bands were chosen for it). The molecular masses of Rubiscos from both R. rubrum and Symbiodinium sp. are expected to be around 51 kDa (as calculated from amino acid composition), and the second-direction electrophoresis produced a band at this level (indicated by the blue arrow in Figure 7C). There is no difference in the amount of protein at the 51 kDa level between R. rubrum WT and R. rubrum Δloop. The lower band of the native PAGE did not produce a band at the 51 kDa level after analysis by second-direction electrophoresis (data not shown), and most probably represents an incompletely expressed RbcL peptide or its degradation product.
ImageJ densitometry analysis indicated a lower amount of RbcL protein for Symbiodinium sp. RbcL with the loop removed, compared to the WT version of the protein. On the other hand, the loop insertion did not change the solubility of the R. rubrum Rubisco. These results suggest that the loop is indispensable for the solubility of Symbiodinium sp. RbcL, but has no positive impact on an already well-soluble protein.

Sequence Analysis
The sequence of the RbcL from Rhodospirillum rubrum [UniProt accession number: P04718], the best studied model of Rubisco form II, was used to search for Rubiscos among dinoflagellates using the blastp tool with default parameters (Organism: dinoflagellates; taxid:2864) [18]. Next, the RbcL sequences found for several dinoflagellates were aligned using the Clustal Omega online tool [32]. The resulting RbcL sequences were then aligned to compare form II Rubisco from eukaryotic dinoflagellates and prokaryotes. Finally, we chose the Symbiodinium sp. sequence for structure modelling, due to its high level of homology to R. rubrum, but also because it has the richest set of available literature data amongst dinoflagellate Rubiscos.
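As an illustration of how two aligned sequences can be compared, the sketch below computes percent identity from a pair of already aligned, gapped sequences. It is not the procedure used by blastp or Clustal Omega; the function name, the choice to skip gapped columns in the denominator (one of several conventions in use), and the toy sequences are assumptions made for this example only.

```python
# Minimal sketch: percent identity between two rows of a pairwise alignment.
# Assumption: columns containing a gap ("-") in either row are ignored.

def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return the percentage of identical residues over ungapped columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0  # nothing to compare
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical toy fragments, not real RbcL sequences.
    print(percent_identity("MKV-LA", "MRVGLA"))
```

Because the denominator convention changes the result, any identity value should be reported together with the convention used.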

Crystallization Prediction
To verify whether crystallization of the Symbiodinium sp. Rubisco would be feasible once the enzyme is purified, we employed the XtalPred tool for crystallization prediction [21,26,33,34].

Model of the Structure of Rubisco from Symbiodinium sp.
The structure of the Symbiodinium sp. Rubisco was predicted by homology modelling using the SWISS-MODEL tool [24]. The online server was used for all steps of the modelling. Templates selected by the tool from the Protein Data Bank (PDB) were manually inspected, and the two templates with the highest homology to Symbiodinium sp. Rubisco were used for modelling.

Computation of Chemical and Physical Parameters
Amino acid sequences of the Rubiscos from Symbiodinium sp., R. rubrum and R. palustris were analysed to compare their chemical and physical parameters, such as isoelectric point, instability index and aliphatic index, using the online tool ProtParam [35]. Energies were calculated with the FoldX suite 3 [31]. Densitometry analysis was done with ImageJ [36].
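To make two of the sequence-derived parameters concrete, the sketch below computes the aliphatic index with the commonly cited formula of Ikai, AI = X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent of each residue, and counts cysteine residues. It is an independent illustration and not the ProtParam implementation; the function names and the toy peptide are assumptions made for this example.

```python
# Minimal sketch of two sequence-derived parameters (pure Python).
# The aliphatic index follows Ikai's formula; this is not ProtParam code.

def mole_percent(sequence: str, residue: str) -> float:
    """Mole percent (0-100) of one residue type in a one-letter sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * seq.count(residue) / len(seq)


def aliphatic_index(sequence: str) -> float:
    """Relative volume occupied by aliphatic side chains (Ala, Val, Ile, Leu)."""
    return (
        mole_percent(sequence, "A")
        + 2.9 * mole_percent(sequence, "V")
        + 3.9 * (mole_percent(sequence, "I") + mole_percent(sequence, "L"))
    )


def cysteine_count(sequence: str) -> int:
    """Number of cysteine residues in the sequence."""
    return sequence.upper().count("C")


if __name__ == "__main__":
    toy = "MACVILKC"  # hypothetical toy peptide, not a Rubisco sequence
    print(round(aliphatic_index(toy), 1), cysteine_count(toy))
```

Applied to full-length RbcL sequences, the same two functions would allow a side-by-side comparison of aliphatic index and cysteine content between species.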

Construction of Expression Vectors pUC18RbcLrubrumLoop, pUC18RbcLdinoLoop
pUC18 expression vectors carrying the wild-type codon-optimized RbcL gene coding sequence of R. rubrum (GenBank: CAA25080) or S. microadriaticum (GenBank: OLP96161) were ordered from Genomed S.A., Warsaw, Poland. The latter lacked its chloroplast signal peptide coding sequence. The R. rubrum ∆loop and the S. microadriaticum ∆loop mutants were generated by PCR-based site-directed mutagenesis of the expression vector (loop nucleotide sequence insertion and deletion, respectively), followed by phosphorylation of the PCR product with T4 polynucleotide kinase (PNK) (Thermo Fisher Scientific, Waltham, MA, USA) and subsequent ligation with T4 ligase (Thermo Fisher Scientific, Waltham, MA, USA). DNA sequences of all the resulting constructs used in this study were confirmed by sequencing (Genomed S.A., Warsaw, Poland). Primers for the PCR reaction are listed in Table 5 (expression vector complement primer sequences are capitalized). Plasmids pUC18RbcLrubrumLoop and pUC18RbcLdinoLoop, and the plasmids with wild-type Rubisco, pUC18RbcLRubrum and pUC18RbcLDino, were transformed into the BL21 E. coli strain. Transformed cells were selected on LB-Amp medium (containing 100 µg mL⁻¹ of ampicillin). Single colonies were grown in 2 mL of LB-Amp liquid medium overnight at 37 °C, and 0.1 mL was used to inoculate 100 mL of LB-Amp liquid medium. The cultures were grown at 37 °C to an absorbance at 600 nm of 0.25 before induction with 1 mM IPTG overnight at 30 °C.

SDS-PAGE, Native-PAGE, Immunoblot Analysis, Protein Quantitation
Proteins were isolated from cells and separated on a Native-PAGE TGX 7.5% gel. Proteins were next blotted onto a PVDF membrane [37] and immunoprobed with anti-RbcL form II antibodies Agrisera® AS15 2955 (Gentaur Molecular Products BVBA, Kampenhout, Belgium). Bands chosen after immunoblot analysis were next cut from the gel and used for second-direction electrophoresis in denaturing conditions (8% SDS-PAGE) [38]. For all PAGE analyses, the same amount of protein was used. Protein concentration was assayed using ROTI®Nanoquant (Carl Roth GmbH, Karlsruhe, Germany), a modification of the Bradford method [39].

Chemicals
All chemicals used were of analytical grade. Unless stated otherwise, they were purchased from Carl Roth GmbH (Karlsruhe, Germany).

Conclusions
To conclude, we built a structural model of dinoflagellate Rubisco based on known form II homologs of this enzyme. Dinoflagellates, as mentioned, belong to the Eukaryota, but their Rubisco, classified as type II, is nuclearly encoded in three repeats, unlike other known eukaryotic Rubiscos, which are of type I. This feature may reflect the evolutionary history of the Rubisco enzyme, as dinoflagellate Rubisco shows characteristics of both eukaryotic and prokaryotic organisms. It should be kept in mind that this is an in silico study without crystallographic confirmation; however, it provides several indications that may help in further studies. First, we confirmed that the catalytic site of the enzyme is conserved, and therefore does not explain the differences noted between dinoflagellate Rubiscos and their homologs from other organisms. Therefore, the experimentally observed loss of activity of the isolated dinoflagellate enzyme must be linked to other structural features of the protein.
We found that Rubisco from Symbiodinium sp. has twice as many cysteine residues as the Rubisco from R. rubrum. We postulate that the higher number of cysteines, which are known to be responsible for redox regulation, might be the cause of the high instability of dinoflagellate Rubisco. This observation suggests that the isolation of an active enzyme from a natural source may need additional optimization of redox conditions; expression of the active enzyme in a heterologous system may also require overcoming folding limitations.
Our analysis indicated that the dinoflagellate Rubisco is most likely a hexamer (a trimer of dimers) rather than, as previously suggested, an L2-type enzyme. The indicated hexamer has a more complex structure than a simple dimer. This knowledge might help to obtain a stable purified enzyme, mostly by including chaperone proteins in the process, aiding in the formation of a higher oligomer. We may hypothesize that these might be, at least in part, chaperones similar to those of higher plants; however, this needs further experimental confirmation.
We also show that dinoflagellate Rubiscos contain a novel motif, consisting of a helix extension and a loop. The location of this motif excludes its direct involvement in the catalytic reaction, suggesting rather a role in interaction with an unknown protein partner of possible regulatory function. As a proof of concept, we expressed the Symbiodinium sp. RbcL without the loop, and found the protein solubility to be significantly lower. This loop may therefore be important for interactions with other proteins, such as a possible unknown regulatory protein as well as chaperones. Again, this makes the dinoflagellate enzyme more similar to the eukaryotic Rubisco, due to the similar need for a series of chaperone proteins in order to assemble into an active enzyme. All these findings bring us closer to explaining the surprising features of dinoflagellate Rubisco. A full understanding of Rubisco characteristics will make it possible to re-engineer the enzyme for a higher yield of CO2 assimilation, which may result in higher crop yields and an overall improvement in the biosphere CO2 level.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/ijms22168524/s1, Figure S1: "Comparison of amino acid positions in active site regions of Rubisco from R. palustris, R. rubrum and Symbiodinium sp.", Table S1: "The comparison of the binding energies and its electrostatic component, calculated with FoldX suite for studied protein structures".