The atypical subunit composition of respiratory complexes I and IV is associated with original extra structural domains in Euglena gracilis

In mitochondrial oxidative phosphorylation, electron transfer from NADH or succinate to oxygen by a series of large protein complexes in the inner mitochondrial membrane (complexes I–IV) is coupled to the generation of an electrochemical proton gradient, the energy of which is used by complex V to generate ATP. In Euglena gracilis, a non-parasitic secondary green alga related to trypanosomes, these respiratory complexes total more than 40 Euglenozoa-specific subunits along with about 50 classical subunits described in other eukaryotes. In the present study, the Euglena proton-pumping complexes I, III, and IV were purified from isolated mitochondria by a two-step liquid chromatography approach. Their atypical subunit composition was further resolved and confirmed using a three-step PAGE analysis coupled to mass spectrometry identification of peptides. The purified complexes were also observed by electron microscopy followed by single-particle analysis. Although the overall structures of the three oxidases are similar to those of canonical enzymes (e.g. from mammals), additional atypical domains were observed in complexes I and IV: an extra domain located at the tip of the peripheral arm of complex I and a “helmet-like” domain on top of the cytochrome c binding region in complex IV.


Results and Discussion
Purification of the mitochondrial complexes I, III and IV from E. gracilis. In a previous study we reported that the respiratory complexes of E. gracilis comprise many subunits that were also described in trypanosomatid respiratory complexes and, conversely, lack many subunits that are conserved among mammals and fungi 27. To confirm this atypical subunit composition, we further purified complexes I, III, and IV and characterized their subunit composition. To this end, a purification protocol involving two chromatographic steps was developed (see Methods section for further details) to obtain enriched fractions of each of the three complexes. The fraction containing complex I was almost pure, as judged by the main 1.4 MDa complex observed after BN-PAGE analysis, while the fractions enriched in complex III (500 kDa) or complex IV (460 kDa) were slightly contaminated (Fig. 1 and Suppl. Information Fig. S1). Note that the molecular masses estimated here differ slightly from the values previously reported 27.
The enriched complexes were applied to a 1D-BN/2D-glycine-SDS/3D-tricine-SDS gel system. The 3D gels thus obtained for each complex are shown in Figs 2-4. Compared to 2D BN-SDS PAGE analysis, the 3D gels offer a better resolution of the polypeptide components. The Coomassie-stained spots were excised from the gel and analyzed by tandem mass spectrometry (MS/MS). All the obtained sequences (Suppl. Information) were then analyzed in silico to search for possible homologs in protein databases (NRPS, TriTrypDB). The predicted physico-chemical properties of each subunit, such as hydrophobicity (GRAVY) and isoelectric point (IP), and predicted structural features, such as transmembrane helices (TMH) and conserved domains (CD), were also determined by in silico approaches (Tables 1-3 and Suppl. Information). Finally, to investigate how the subunit composition of these complexes affects their overall structure, electron microscopy in combination with single-particle analysis was performed. A total of ca. 2500-3000 raw images of each complex were recorded by electron microscopy and analysed. The particle projections from each complex were classified into several groups, from which the 8 best class averages were selected per complex (an average of 3000-4000 particles per class) (Figs 5-7). The main findings for each complex are described and discussed in the following sections.
The atypical subunits build the extra module in the peripheral arm of complex I. With the 3D gel system described above, a total of 31 spots were obtained for complex I (Fig. 2), compared to the 22 spots obtained in our previous study by 2D BN/SDS-PAGE analysis 27. Ten of these spots matched a single polypeptide, eleven matched two co-migrating polypeptides, one matched four proteins, and nine spots did not lead to an identification in our database. Globally, this analysis revealed the presence of at least 45 polypeptides associated with complex I, with molecular masses ranging from 7.6 to 58.5 kDa, 36 of which were identified by MS analysis (Table 1). Six polypeptides correspond to complex I core subunits (NDUFS1/2/3/6/8, NDUFV1), eight to subunits inherited from the α-proteobacterial ancestor (NDUFA5/6/9/12/13, NDUFB10, CAG1/2), four are Euglenozoa-specific subunits (NDTB2/5/12/17), and fourteen proteins have no homologs and thus remain unnamed proteins (UP). This analysis also corroborated the association of two short domains related to DNAJ and the glyceraldehyde 3-phosphate dehydrogenase (GapC3) enzymes [erroneously annotated as glycerol-3-phosphate dehydrogenase (G3PD) in our previous study 27]. The ANT transporter and the Euglenozoa-specific ATP synthase subunit p18 were also detected bound to complex I (Table 1, subunits 16 and 30, respectively). The ANT is rather abundant in the mitochondrial inner membrane (12% of total protein) 38, and therefore we cannot exclude contamination due to hydrophobic interactions. The euglenozoan p18 subunit has been reported to be attached to the ATP synthase complex in many euglenozoan species 25,27,28,39-42 and, recently, its position in the complex, attached externally to the F1 sector, was determined 43. The reason why subunit p18 is attached to complex I is unclear.
SCIENTIFIC REPORTS | (2018) 8:9698 | DOI: 10.1038/s41598-018-28039-z
The NDTB5 subunit (subunit 3, Table 1) was previously described as a paralog of the NDUFA9 subunit 27. The euglenoid subunit has a higher molecular mass (52.2 kDa) than the conserved subunit (38.7 kDa) (subunit 14, Table 1). This pairing of homologs is also present in the trypanosomatid enzyme 44, and thus suggests that euglenoids acquired this new subunit by gene duplication.
Overall, these first results are similar to those of our previous study (Suppl. Information) and thus confirm the atypical subunit composition of Euglena gracilis complex I. The sum of the molecular masses of all the identified polypeptides (Table 1) is 1333 kDa, which is close to the value of 1.4 MDa estimated directly by BN-PAGE (Fig. 1 and Suppl. Information Fig. S1). Additional subunits might also be present in the mature complex but have probably escaped identification (see 27 for a discussion of the underlying reasons). Among the subunits not identified by mass spectrometry but previously identified at the genomic level for Euglena complex I 27 are subunits present in the canonical complex I of mammals, fungi and plants, such as the highly hydrophobic, mitochondrion-encoded core subunits (NAD1, NAD4, NAD6) and the nucleus-encoded subunits (NDUFV2, NDUFA7, NDUFAB1, and NDUFB7) (Suppl. Information), all inherited from the alpha-proteobacterial ancestor 45. The value of 1.4 MDa for Euglena complex I contrasts with the ca. 1 MDa value described in many other organisms, including mammals 46,47, yeasts 48,49, green algae 50 and flowering plants 51, and prompted us to investigate the structure of Euglena complex I.
Negatively stained complex I adopts mainly three positions on the carbon support film (Fig. 5). The first four side-view projections (panels A-D) show that complex I has an "L-shaped" conformation, in agreement with the bacterial and mitochondrial complex I structures [e.g. 6,52]. The membrane arm is best seen in projections A and B (green arrow heads). The other projections correspond to the upper view (panel E) and to tilted side views (panels F-H). The comparison with the structure of bovine respiratory complex I (pdb: 5LDW 52) revealed two main differences: (i) a matrix-exposed protuberance attached to the membrane arm at a central position (visible in projections B and C, purple arrow heads) and (ii) an extra domain located at the tip of the peripheral arm (panels A-D and F-H, red arrow heads). A third putative extra region is located in the distal part of the hydrophobic domain (panels E and F, orange arrow heads). However, we cannot exclude that this density is due to the presence of detergent micelles surrounding the membrane arm. All of these features are also highlighted upon comparison with yeast complex I (pdb: 4WZ7 53) (Suppl. Information Fig. S2). The first domain (purple arrow heads) probably corresponds to the gamma-carbonic anhydrase (CAG) domain, first described in complex I from flowering plant mitochondria 54 and later described in other organisms such as the chlorophycean alga Polytomella sp. 55 or the amoeboid protozoon Acanthamoeba castellanii 56. Members of the CAG family have been identified in all eukaryotic lineages except Opisthokonts (i.e. mammals and fungi) 56,57. In this respect, two members (CAG1-2) have been found by a proteomic approach in Euglena complex I (Fig. 2 and Table 1). In contrast, the extra domain located at the tip of the peripheral arm (Fig. 5, red arrow heads) has no counterpart in the complex I structures described in representatives of other lineages.
Interestingly, based on the presence of some additional subunits in trypanosomes, a fatty acid synthase (FAS) domain has been proposed as an additional complex I structural domain. Among the non-canonical subunits identified in trypanosomes 35,44 , NDTB2 and NDTB17 were also confirmed here as components of Euglena complex I. Neither NDTB2 nor NDTB17 subunits possess any putative transmembrane helices but comprise a functional domain related to fatty acid  Table 2 44 , but no extra density can be distinguished in this area in Euglena complex I. We propose that the FAS domain is therefore located at the tip of the matrix domain in Euglenozoa (Fig. 5, and Suppl. Information Fig. S2). It should be noted that a mitochondrial small acyl carrier protein NDUFAB1/ACPM is also found associated to complex I in mammals and fungi 61 but not in flowering plants 51 . This subunit has been found in Euglena at genome level (Suppl. Information). Altogether, these findings may indicate an ancient function related to fatty acid metabolism associated to complex I in Eukaryotes.
Overall conserved subunit composition and structure of Euglena Complex III. Euglena mitochondrial complex III was resolved into 10 protein spots of molecular masses ranging between 10.5 and 50.9 kDa (Fig. 3). Six corresponded to a single polypeptide in our database, one was found to comprise two different polypeptides, and two were not identified. Six of these polypeptides corresponded to canonical complex III subunits QCR1, QCR2, QCR7, CYT1, COB, and RIP1, whereas two polypeptides of about 20 kDa do not have homologs in the databases and thus remained as unnamed proteins (UP) (Table 2). The reported structures for the eukaryotic dimeric complex III are composed of 10 to 11 subunits per monomer 7,62. In Euglena, the sum of all the determined polypeptides (288 kDa per monomer, Table 2) is in good agreement with the size of a native dimeric complex (ca. 500 kDa, Fig. 1). The Euglena complex III particles classify into two principal groups on the carbon film (Fig. 6): side-view projections (panels A-D) and slightly tilted side-view projections (panels E-H). The membrane region of the complex is evident in the side-view projections (panels A-D, green arrow heads). Panels A and B represent the side-by-side monomer placement, and panel C represents the 90° rotation (one monomer in front of the other). The comparison with the structure of chicken dimeric complex III [pdb: 4U3F 63] explains all the projections obtained and corroborates the dimeric state of this complex. The absence of additional electron densities in any projection concurs with the absence of extra atypical subunits. In conclusion, the overall structure of E. gracilis complex III is highly similar to the one reported for other species. However, the lack of a QCR8 coding sequence in the Euglena nucleotide database 27 raises questions about complex III biogenesis in Euglena. QCR8, which has one TMH 63, is required to form an early core subcomplex with COB and QCR7 during complex III biogenesis in S. cerevisiae 62,64. The QCR7 subunit usually shields the quinone pocket of the CYT1 subunit from exposure to the aqueous environment on the matrix side 65. Interestingly, the Euglena QCR7 subunit presents an atypical N-terminal extension (~78 residues; Suppl. Information Fig. S3) with a putative TMH that may take the role of the missing QCR8 subunit. Another role for this QCR7 extension in Euglena could be to maintain the structural integrity of the quinone reduction site. Such a function has already been described for the atypical extension in the CYT1 subunit of the Rhodobacter sphaeroides bc1 complex 66. A subunit named QCR9 was also found in Euglena complex III by N-terminal sequencing 67, even though this subunit does not correspond to the canonical QCR9 subunit described in other species 27,67. Altogether, this analysis confirmed the dimeric state of the purified complex and showed no atypical domains within the overall structure.

Atypical subunit composition of Euglena complex IV is associated with the presence of an extra domain in the cytochrome c binding region. With the current 3D gel system, a total of 14 spots were obtained for Euglena mitochondrial complex IV (Fig. 4), whereas it was previously resolved into 10 protein spots 27. A single polypeptide could be identified in our database for eleven spots, one spot matched four polypeptides of similar molecular masses, and two spots remained unidentified. Three polypeptides correspond to classical complex IV subunits (COX1, COX3, and COX6B), three to Euglenozoa-specific subunits (COXTB2/4/5), and six polypeptides show no similarity with other existing proteins and therefore remain unnamed proteins (UP) (Table 3 and Suppl. Information). Here again, no CD could be confidently identified for non-classical subunits (e-value threshold = 10⁻⁵). Globally, our analysis allowed the identification of at least 16 polypeptides associated with Euglena complex IV, with molecular masses ranging from 7.2 to 38.5 kDa. A stoichiometry of one subunit per monomer was proposed based on Coomassie staining 68. Accordingly, the sum of all the associated polypeptides (365 kDa, Table 3) is lower than the molecular mass of 460 kDa determined for the whole complex by BN-PAGE (Fig. 1). Additional subunits might have escaped identification, such as the previously identified core subunit COX2, classical subunits identified at the genomic level in Euglena (COX5A/5B/8A) 27, or even some of the originally proposed kinetoplastid complex IV subunits (COXTB1/6/8/10/12/16) 26. This impressive shift in molecular mass compared to the mammalian enzyme [200 kDa 46] is therefore due to additional subunits rather than to dimerization. In this respect, classification of the 460 kDa cytochrome c oxidase images obtained by single-particle analysis led to four principal groups (Fig. 7).
All pictures depict asymmetric structures unlikely to be dimers: side views (panels A and B), views rotated by ~120° relative to projection A about an axis perpendicular to the membrane (panels C and D), and slightly tilted views (panels E-H). The membrane region of the complex is readily recognizable in the non-tilted projections (panels A-D, green arrow heads). The similarity with the structure of the monomeric bovine complex together with cytochrome c (pdb: 5IY5 13) was not obvious, and the resulting overlays shown in Fig. 7 should be interpreted with caution. In this regard, the opposite orientation of the complex, with respect to the domains located in the matrix and the intermembrane space, left more unexplained electron densities on both sides of the complex (Suppl. Information Fig. S2). Essentially, these comparisons brought two important pieces of information. They confirmed that the 460 kDa Euglena complex IV cannot be a dimer and is thus in a monomeric form. The overlays shown in Fig. 7 also highlighted a novel 5 nm extra density in the intermembrane space (red and yellow arrow heads). This extra density forms a "helmet-like" domain over the cyt c binding region. Because the projection of this extra domain varies across the classes, we cannot exclude the possibility that in some of them this domain is not complete. For instance, the tip-like densities at the top of the domain (Fig. 7, yellow arrows) cannot be visualized in all pictures. Previous attempts to measure the in vitro oxidase activity using exogenous cyt c as electron donor revealed a specific requirement of Euglena complex IV for its endogenous cyt c 68,69. This specificity was first explained by the atypical features found in the purified Euglena cyt c 70, even though other species could use the Euglena cyt c as electron donor in vitro 69. Our results allow us to propose that the structure of Euglena complex IV possesses a specific cavity for its endogenous cytochrome c.
Among the supernumerary subunits described in other eukaryotic species, only Cox6b, the mammalian homolog of yeast Cox12, was identified. Cox6b/Cox12 is a structural but non-essential subunit in close contact with the Cox2 and Cox3 subunits in the intermembrane space region 71. Other classical subunits, such as Cox5A and Cox5B, which were identified in the previous genomic survey, are located on the matrix side of the complex and have no TMH (Fig. 7, purple arrow heads), while Cox8A, an isoform of the ubiquitous Cox8B 72, is located in the membrane in close contact with the Cox1 subunit 13. Six additional TMHs can be observed in the high-resolution structures of this complex 13,14; they correspond to subunits Cox4I1/6A2/6C/7A1/7B/7C, which have not been identified at the genomic or proteomic level in Euglena (27 and Suppl. Information). One putative TMH is predicted in five UP sequences (UP 7,9,11,13,15). These subunits could take the place of the classical subunits inside the membrane region to maintain the overall structure of the complex. The absence of putative TMHs among the three trypanosomatid subunits (CoxTB2/4/5) and the UP4/8/15 subunits may indicate that they participate in the 150 kDa "helmet-like" domain. Altogether, our analyses indicate that the atypical subunits of Euglena complex IV may be involved in the construction of an atypical structure whose specific role remains to be elucidated.
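The mass comparisons made in the sections above (summed subunit masses versus the BN-PAGE estimate of each native complex) can be restated as a short calculation. This is only an illustrative sketch: the masses are the values quoted in the text, while the dictionary layout, the function name `coverage`, and the assumption of one copy of each subunit per monomer (with complex III counted as a dimer) are ours.

```python
# Masses in kDa as quoted in the text; "native" is the BN-PAGE estimate.
# Assumption: one copy of each identified subunit per monomer, and
# complex III counted as a dimer (copies = 2).
complexes = {
    "I":   {"subunit_sum": 1333.0, "copies": 1, "native": 1400.0},
    "III": {"subunit_sum": 288.0,  "copies": 2, "native": 500.0},
    "IV":  {"subunit_sum": 365.0,  "copies": 1, "native": 460.0},
}


def coverage(entry):
    """Ratio of the summed subunit masses to the BN-PAGE native mass."""
    return entry["subunit_sum"] * entry["copies"] / entry["native"]


for name, entry in complexes.items():
    total = entry["subunit_sum"] * entry["copies"]
    print(f"complex {name}: {total:.0f} kDa summed, "
          f"ratio to BN-PAGE mass = {coverage(entry):.2f}")
```

A ratio below 1, as for complex IV, is consistent with subunits that escaped identification; BN-PAGE mass estimates are themselves approximate, so small deviations in either direction should not be over-interpreted.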
Concluding remarks. The analysis of the subunit composition and the structure of the proton-pumping respiratory oxidases (complexes I, III, and IV) of Euglena gracilis showed that this highly divergent organism (when compared to canonical yeast and mammal models) shares many atypical subunits described in respiratory complexes of trypanosomes 27. While the canonical subunits and the lineage-specific subunits maintain the overall architecture of the respiratory complexes described in yeast and mammals, the lineage-specific subunits are presumably responsible for the atypical extra domains observed in the structure of monomeric complexes I and IV. Incidentally, the presence of these extra subunits/domains probably explains the shift in the molecular mass of complexes I and IV. One protein is also shared between complexes III and IV (UP7-CIII corresponds to UP9-CIV). Although we observed minor cross-contamination after the purification steps, the fact that this polypeptide is the only one found in common between both complexes may suggest that it is involved in interactions that lead to the formation of the supercomplex previously observed 27. Overall, the roles of these atypical subunits/domains in enzyme activities or supramolecular associations remain to be elucidated.

Material and Methods
Algal strain, growth conditions and mitochondria isolation. Euglena gracilis (SAG 1224-5/25) was obtained from the University of Göttingen (Sammlung von Algenkulturen, Germany). Cells were grown in the dark under orbital agitation at 25 °C. Ethanol (1%) was used as the carbon source. The liquid mineral Tris-minimum-phosphate medium (TMP), pH 7.0, was supplemented with a mix of vitamins (biotin 10⁻⁷%, vitamin B12 10⁻⁷% and vitamin B1 2 × 10⁻⁵% (w/v)). The cells were collected in the middle of the logarithmic phase by a 10-min centrifugation step at 7000 × g and stored at −70 °C until use. Mitochondria were obtained by differential centrifugation following the procedure previously described 28 and stored at −70 °C until use. Protein concentration was determined by the Bradford method (Bio-Rad).
Purification of respiratory complexes. All steps were performed at 4 °C. Seventy-five milligrams of mitochondrial proteins were solubilized with n-dodecyl-β-D-maltoside (DDM, 4 g detergent per g protein) in buffer A containing Tris-HCl 50 mM, amino caproic acid 50 mM, MgSO4 1 mM, NaCl 50 mM, glycerol 10%, phenylmethylsulfonyl fluoride (PMSF) 1 mM, and tosyl-lysyl-chloromethylketone (TLCK) 50 μg/ml (pH 8.4). The mixture was incubated with gentle stirring for 30 min and centrifuged at 38,000 × g for 30 min. The supernatant was diluted three times in buffer A without NaCl and supplemented with DDM 0.01%. After a filtration step (0.22 μm), the sample was loaded on an anion exchange column (Mono Q HR 5/5, 1 mL) connected to an ÄKTA explorer 100 (GE Healthcare Life Sciences) and equilibrated with the same buffer, and the column was washed until a baseline was obtained. The column was then washed with 50 mM NaCl in the same buffer (10 column volumes, VC) and eluted with a 50-500 mM NaCl linear gradient (40 VC). Two-milliliter fractions were collected and visualized by BN-PAGE.
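The quantities in the solubilization and anion-exchange steps above follow from simple arithmetic, sketched here for convenience. The input numbers come from the protocol; the variable names, the reading of "diluted three times" as a threefold dilution, the reading of "VC" as column volumes, and the neglect of any system dead volume are assumptions of this sketch.

```python
# Detergent needed for solubilization (4 g DDM per g protein).
protein_mg = 75.0
ddm_g_per_g_protein = 4.0
ddm_mg = protein_mg * ddm_g_per_g_protein

# NaCl after dilution in NaCl-free buffer A.
# Assumption: "diluted three times" means a threefold dilution.
nacl_buffer_a_mM = 50.0
dilution_factor = 3.0
nacl_loaded_mM = nacl_buffer_a_mM / dilution_factor

# Linear elution gradient on the 1 mL Mono Q column.
# Assumption: "VC" denotes column volumes; dead volume is ignored.
column_volume_mL = 1.0
gradient_cv = 40
fraction_mL = 2.0
gradient_start_mM, gradient_end_mM = 50.0, 500.0
gradient_volume_mL = gradient_cv * column_volume_mL


def nacl_at(volume_mL):
    """Nominal NaCl concentration (mM) at a given gradient volume."""
    if not 0.0 <= volume_mL <= gradient_volume_mL:
        raise ValueError("volume outside the gradient")
    span = gradient_end_mM - gradient_start_mM
    return gradient_start_mM + span * volume_mL / gradient_volume_mL


n_fractions = int(gradient_volume_mL / fraction_mL)
mM_per_fraction = (gradient_end_mM - gradient_start_mM) / n_fractions

print(f"DDM required: {ddm_mg:.0f} mg")
print(f"NaCl in loaded sample: {nacl_loaded_mM:.1f} mM")
print(f"{n_fractions} fractions, {mM_per_fraction:.1f} mM NaCl per fraction")
print(f"NaCl at mid-gradient: {nacl_at(gradient_volume_mL / 2):.0f} mM")
```

Under these assumptions each 2 mL fraction spans a nominal 22.5 mM window of the gradient, which gives a rough idea of how finely the complexes are separated by ionic strength.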
The fractions enriched in complex I were pooled and concentrated with an Amicon Ultra-15 Centrifugal Filter 100 kDa (EMD Millipore) to a final volume of 500 µL and injected onto a Superose 6 10/300 column (GE Healthcare Life Sciences) previously equilibrated with buffer A containing 200 mM NaCl and DDM 0.01%. The elution was carried out at 0.3 mL/min, and 0.5 mL fractions were collected and visualized by BN-PAGE. The fractions enriched in complex I were pooled and stored at −70 °C until use.
The fractions enriched in complex III and complex IV from the anion exchange column were pooled separately and concentrated to a final volume of 500 µL. Each pool was injected separately onto the Superose column equilibrated with buffer A containing 300 mM NaCl and DDM 0.01%. The elution of each injection was carried out as described above; 0.5 mL fractions were collected and visualized by BN-PAGE, and the fractions enriched in each complex were stored at −70 °C until use.
Non-denaturing and denaturing protein electrophoresis. Each complex was resolved by BN-PAGE using a 3-10% acrylamide gradient gel at 4 °C. The first-dimension band was then excised from the gel and subjected to a 2D/3D Glycine/Tricine SDS-PAGE carried out as in 73 at room temperature. Briefly, the subunits of each complex were separated by Glycine-SDS-PAGE (12% acrylamide), and each 2D lane was then excised and separated individually by Tricine-SDS-PAGE (14% acrylamide).
Coomassie blue-stained protein spots were manually excised from the 3D gels and analyzed by mass spectrometry (MS) as described in 27. Briefly, matrix-assisted laser desorption/ionization (MALDI) analysis was performed on a 4800 MALDI time-of-flight (TOF/TOF) system (Applied Biosystems). Trypsin autodigestion fragments as well as TOF/TOF Cal Mix 5 (AB Sciex) were used for internal calibration. For direct protein identification with MASCOT against our homemade database (available in https://figshare.com/s/57d2ba4ebfb-b472ae3de), protein scores greater than 60 were considered significant (P < 0.05). To determine the molecular mass of each subunit, the PageRuler Plus pre-stained protein ladder (Thermo Scientific) was used as the molecular size marker. The molecular mass of each protein spot was calculated from its migration distance by comparison with the migration of the molecular markers (the linear regression between the logarithm of the migration distance and the molecular mass (kDa) of the molecular markers has an R² of 0.989).
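As an illustration of how a molecular mass can be read off a marker calibration, the sketch below fits a straight line by least squares and reports R². It is not the authors' script: the marker masses and migration distances are hypothetical placeholders, and the sketch uses the conventional semi-log form (log10 of the marker mass against migration distance), which is an assumption and may differ from the exact variables regressed in the study.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x, with R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


def mass_from_distance(distance, slope, intercept):
    """Invert the semi-log calibration: mass = 10 ** (intercept + slope * d)."""
    return 10 ** (intercept + slope * distance)


# Hypothetical marker masses (kDa) and migration distances (cm).
marker_kda = [100.0, 70.0, 55.0, 35.0, 25.0, 15.0, 10.0]
marker_cm = [1.0, 1.9, 2.5, 3.7, 4.6, 5.9, 7.0]

slope, intercept, r2 = fit_line(marker_cm, [math.log10(m) for m in marker_kda])
spot_cm = 3.0  # hypothetical migration distance of a protein spot
print(f"R^2 = {r2:.3f}")
print(f"estimated mass = {mass_from_distance(spot_cm, slope, intercept):.1f} kDa")
```

With real gels the relationship is only approximately linear, so the marker range should bracket the masses of interest.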
In silico analysis of the identified subunits from the respiratory complexes. To further characterize the sequences identified by MS, each protein sequence was submitted to a tBLASTn analysis against the expressed sequence tags (EST) database from Euglena gracilis (taxid: 3039) available on the NCBI server. The obtained translated nucleotide sequences were manually assembled to generate the longest possible polypeptide. The resulting polypeptides were submitted to similarity searches by BLASTp against the non-redundant protein sequences database (NRPS) (http://blast.ncbi.nlm.nih.gov/Blast.cgi), and against the Kinetoplastid genomic resource database (TriTrypDB) (http://www.tritrypdb.org/tritrypdb/). The first methionine codon of each sequence was arbitrarily chosen as the putative start codon. The theoretical isoelectric point, molecular mass and grand average of hydropathicity (GRAVY) were determined using the ProtParam algorithm (http://web.expasy.org/protparam/). Transmembrane helices (TMH) were predicted using the Phobius server (http://phobius.sbc.su.se/) and TMHMM Server v. 2.0 (http://www.cbs.dtu.dk/services/TMHMM-2.0/). Local conserved domains (CD) were searched using the NRPS Blastp, DELTA-Blast and Conserved Domain Blast (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi). The e-value threshold for the in silico analysis results was 10⁻⁵; when a TMH or CD was found, its topological localization within the polypeptide was annotated. Clustal Omega (http://www.ebi.ac.uk/Tools/msa/clustalo/) was used for direct alignments of protein sequences.
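For readers unfamiliar with the GRAVY index, it is the mean Kyte-Doolittle hydropathy value over all residues of a sequence. The following minimal sketch reimplements that definition directly rather than calling ProtParam; the function name `gravy` and the example peptide are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathicity: mean hydropathy per residue."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = sorted(set(seq) - set(KYTE_DOOLITTLE))
    if unknown:
        raise ValueError(f"non-standard residues: {unknown}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


# Hypothetical peptide, used only to show the call.
example = "MLLAVILGAK"
print(f"GRAVY = {gravy(example):.3f}")
```

Positive values indicate an overall hydrophobic polypeptide, which is why the index is reported alongside the predicted transmembrane helices.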
Visualization of isolated complexes I, III and IV by transmission electron microscopy. 5 µl of each purified complex solution was adsorbed onto freshly glow-discharged carbon-coated copper grids; the excess sample was blotted with filter paper and the grids were subsequently stained with 2% uranyl acetate for contrast. Imaging was performed on a Tecnai T20 equipped with a LaB6 tip operating at 200 kV. The "GRACE" system for semi-automated specimen selection and data acquisition 74 was used to record 2048 × 2048 pixel images at 133,000× magnification using a Gatan 4000 SP 4K slow-scan CCD camera with a pixel size of 0.224 nm. Single particles were analyzed with the Xmipp software (including multi-reference and non-reference alignments, multivariate statistical analysis and classification, as in 75) and the RELION software 76. The best class members were used for the final class sums.
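The imaging parameters above fix the scale of each micrograph, which can be checked with a few lines. The pixel size and image dimensions are taken from the text; the helper name `nm_to_px` and the use of the 5 nm extra density of complex IV as an example length are choices made for this sketch.

```python
pixel_size_nm = 0.224   # pixel size at the specimen level
image_px = 2048         # image width and height in pixels

# Width of the field of view covered by one image.
field_nm = image_px * pixel_size_nm


def nm_to_px(length_nm):
    """Convert a specimen-level length in nm to pixels."""
    return length_nm / pixel_size_nm


print(f"field of view: {field_nm:.1f} nm per side")
print(f"a 5 nm feature spans about {nm_to_px(5.0):.1f} pixels")
```

At this sampling, a 5 nm feature covers roughly 22 pixels, which is consistent with such a density being visible in negative-stain class averages.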