Evolutionary analysis of cellular reduction and anaerobicity in the hyper-prevalent gut microbe Blastocystis

Blastocystis is the most prevalent microbial eukaryote in the human and animal gut, yet its role as commensal or parasite is still under debate. Blastocystis has clearly undergone evolutionary adaptation to the gut environment and possesses minimal cellular compartmentalization, reduced anaerobic mitochondria, no flagella, and no reported peroxisomes. To address this poorly understood evolutionary transition, we have taken a multi-disciplinary approach to characterize Proteromonas lacertae, the closest canonical stramenopile relative of Blastocystis. Genomic data reveal an abundance of unique genes in P. lacertae but also reductive evolution of the genomic complement in Blastocystis. Comparative genomic analysis sheds light on flagellar evolution, including 37 new candidate components implicated with mastigonemes, the stramenopile morphological hallmark. The P. lacertae membrane-trafficking system (MTS) complement is only slightly more canonical than that of Blastocystis, but notably, we identified that both organisms encode the complete enigmatic endocytic TSET complex, a first for the entire stramenopile lineage. Investigation also details the modulation of mitochondrial composition and metabolism in both P. lacertae and Blastocystis. Unexpectedly, we identify in P. lacertae the most reduced peroxisome-derived organelle reported to date, which leads us to speculate on a mechanism of constraint guiding the dynamics of peroxisome-mitochondrion reductive evolution on the path to anaerobiosis. Overall, these analyses provide a launching point to investigate organellar evolution and reveal in detail the evolutionary path that Blastocystis has taken from a canonical flagellated protist to the hyper-divergent and hyper-prevalent animal and human gut microbe.


INTRODUCTION
First described in humans by Émile Brumpt in 1912, 1 Blastocystis is possibly the most prevalent eukaryote colonizing the human gut. It is found in up to 100% of individuals in some populations, 2 and it is estimated that at least one out of every six humans worldwide could be carrying this organism 3; its prevalence in some groups of animals could be much higher. 4 Blastocystis is an obligate symbiont that has long been suspected of being a potential pathogen, but evidence for this is contradictory. 5 Indeed, it has been suggested recently that Blastocystis could, in fact, be a marker for a healthy human gut. 6 Blastocystis is also found in many other animal hosts, raising the potential for zoonotic transmission. One factor that may contribute to the uncertainty over both Blastocystis pathogenicity and the importance of zoonotic transmission is its genetic diversity. Although they are morphologically indistinguishable, at least 12 distinct variants, known as subtypes (STs), have been found in humans, four of which are common. 5,[7][8][9] These STs differ substantially from each other, with orthologous proteins sharing only 60% amino acid identity on average and genome comparisons revealing substantial differences in gene number. 10 At the time of description, the taxonomic affinities of Blastocystis were unclear and remained so for over 80 years. In the intervening period, it was variously described as a yeast, a sporozoan (older taxonomic term for the apicomplexans), and a flagellate cyst, among others. This lack of clarity was due in great part to the absence of useful morphological characters: as spheres 5-10 μm in diameter, Blastocystis cells can be said to resemble soap bubbles or frog-spawn. As a result, it was usually listed as Incertae sedis in taxonomic schemes. It was only when DNA sequences became available that definitive links to other organisms could be made.
When its relatives were finally identified, they were completely unexpected: Blastocystis is a member of the Stramenopila, 11 the lineage containing diatoms, large multicellular kelps, and much smaller, heterotrophic biflagellates. The latter are considered to represent the prototypical stramenopile morphology. 12 In order to understand the evolution of Blastocystis from a small biflagellated ancestor into today's non-flagellated intestinal symbiont, with the accompanying morphological simplification and metabolic adaptations, a suitable outgroup organism is needed. It has been clear since the first molecular identification of Blastocystis as a stramenopile that it is specifically related to the Slopalinida, a group of intestinal symbionts of various hosts comprising the Opalinidae and Proteromonadidae. The Slopalinida and Blastocystis together comprise the Opalinata. 13,14 One of the genera in the Slopalinida is Proteromonas, small biflagellate cells found commonly in the lower large bowel of amphibians and reptiles, and occasionally in rodents. 14 The best-studied species is Proteromonas lacertae, which has a typical stramenopile appearance. It is 15-20 μm in length, with a pyriform cell body, two flagella that lack the characteristic mastigonemes, 15 and somatonemes (hairlike structures) on the posterior half of the cell. 14,16,17 Not only is the external appearance nothing like Blastocystis, but the internal cell structure also differs dramatically. P. lacertae has a single nucleus in close contact with a single large mitochondrion. The flagellar rootlet (rhizoplast) passes through the Golgi apparatus and a groove in the nucleus, ending on the mitochondrion. 14,16 This results in all these organelles being located near the base of the flagellum and close to the apical end of the cell. Both organisms have mitochondria with tubular cristae and form cysts with a single nucleus.
However, trophic forms of Blastocystis have multiple nuclei and many mitochondrion-related organelles (MROs) distributed around the periphery of the cell together with the other organelles, including the Golgi apparatus. 18 Because Proteromonas is a member of the sister lineage to Blastocystis, structurally a more typical stramenopile, and yet is also a member of the anoxic animal intestinal microbiome, we have investigated the genome and cell biology of P. lacertae to better understand the reductive evolution and genomic peculiarities accompanying the transformation of Blastocystis into the highly divergent but dominant member of the protistan community in the human and animal gut microbiome today.

RESULTS
Microscopical investigations of P. lacertae
Since Brugerolle and Bardele, 17 there has not been a comprehensive investigation of Proteromonas using modern microscopy techniques. Thus, to better understand the cellular structure of P. lacertae as a comparator to Blastocystis, we started with microscopical analyses using a combination of fluorescence, transmission, and scanning electron microscopy. Unlike Blastocystis (Figure 1A), P. lacertae has an elongated cell shape that was confirmed using scanning electron microscopy (Figure 1B). This revealed the two classic heterokont types of flagella (but lacking mastigonemes), the cortical ridges on the cell body, and the somatonemes on the lower end of the cell, consistent with past reports. 17 Anecdotally, a more circular form was also observed (Figure 1Cii), which we speculate could be a cell going through the process of encystation. Transmission electron microscopy confirmed the presence in P. lacertae of the various typical eukaryotic organelles (nucleus, endoplasmic reticulum [ER], Golgi apparatus), including a single mitochondrion network surrounding the nucleus (Figures 1Bv and 1Ci). This was further confirmed using MitoTracker labeling and captured using fluorescence microscopy (Figure 1Bi). Given the highly divergent cellular structure of Blastocystis compared with the relatively canonical stramenopile form of P. lacertae, we decided to undertake an 'omics investigation to understand the cellular evolution of these two organisms.
The P. lacertae genome is substantially larger than that of Blastocystis
The P. lacertae LA genome was assembled into a 52.3 Mb sequence consisting of 1,449 contigs with a maximum contig length of 864,525 bp and N50 value of 92,586 bp (Table 1). Genome annotation contains 35,706 gene models, of which 28,067 were supported by transcriptomic data. The gene set returned a BUSCO score of 85.6%, comparable with the best-annotated Blastocystis genomes 10,19,20 (Table 1). We also produced a transcriptome for the marine stramenopile Cafeteria burkhardae (formerly roenbergensis 21) as a genuinely free-living outgroup in comparative analyses; if a gene presents a reciprocal best match in the C. burkhardae transcriptome, this provides a means of confirming a definite Blastocystis loss if that gene is absent in Blastocystis but present in P. lacertae. The final transcript set of C. burkhardae consisted of 28,952 transcripts after decontamination, with a BUSCO score of 70.4%. The P. lacertae genome sequence has higher gene density (684 genes/Mb) and thus shorter average intergenic length (534.9 bp) compared with Blastocystis. No evidence was found for the widespread segmental duplications that are a distinctive feature of the Blastocystis genome. 10 Similarly, we did not observe the insertion of premature stop codons upon poly-adenylation of P. lacertae transcripts, as observed in Blastocystis. 22 Overall, the P. lacertae genome is roughly three to five times larger than a Blastocystis genome and contains around six times as many genes.
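The assembly statistics quoted above can be illustrated with a short sketch. It assumes the standard definition of N50 (the contig length at which the cumulative sum of contigs, sorted from longest to shortest, first reaches half of the assembly) and computes gene density as gene models per megabase. The function names and the toy contig list are hypothetical and are not taken from the authors' pipeline.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    Contigs are sorted from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the
    assembly size.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("contig_lengths must not be empty")


def gene_density(n_genes, assembly_bp):
    """Gene models per megabase of assembled sequence."""
    return n_genes / (assembly_bp / 1_000_000)


# Toy contig set (hypothetical): total 300, half 150, reached at the 2nd contig.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))

# Figures quoted in the text: 35,706 gene models on a 52.3 Mb assembly.
print(round(gene_density(35_706, 52_300_000), 1))
```

With the rounded assembly size given in the text, this yields about 683 genes/Mb, close to the reported 684 genes/Mb; the small gap is presumably a consequence of rounding the assembly size to 52.3 Mb, although that is an assumption.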
Blastocystis genomes are reduced in gene number and diversity relative to other stramenopiles
The observed divergence in coding content could be due to gene gain in P. lacertae, gene loss in Blastocystis, or a combination thereof. To explore this, OrthoMCL was used to cluster the predicted proteomes of Blastocystis ST4 and ST7, P. lacertae, C. burkhardae, and a selection of other stramenopiles. To prevent poor gene models from influencing the clustering analysis, only genes with transcript support in P. lacertae were included. This reduced the number of genes to 28,067, with a BUSCO score of 75.1%. This approach established the phylogenetic distribution of gene clusters, distinguishing those that were species-specific (i.e., gene gains) from widely conserved genes that were absent in specific species (i.e., gene losses). OrthoMCL assigned 122,818 predicted protein sequences from 10 genomes or transcriptomes into 24,945 orthogroups, with an additional 52,806 sequences that did not cluster (see Table S1).
Of these clusters, 2,363 containing 13,585 sequences were unique to P. lacertae. In addition, 6,932 P. lacertae sequences did not cluster and were considered also species-specific, based on the current sampling. This analysis shows that up to 73% of the high-confidence P. lacertae gene set possessing transcriptomic support is species-specific and, therefore, suggests that much of the genome size discrepancy comes from unique genes in P. lacertae.
Nonetheless, besides these gains, gene loss has also contributed to the divergence between Blastocystis and P. lacertae. Figure 2A compares the proportions of all cluster types in Blastocystis ST7 and P. lacertae. For each organism, the genes missing from their genomes relative to the other (described as ''sister-lineage losses'' and listed in Table S1, sheets D-F for Blastocystis and sheets G-I for P. lacertae) are expressed as a proportion of their current gene number to reflect the scale of gene loss relative to the ancestor. After excluding species-specific gains, 3,161 P. lacertae genes were found to be missing from the Blastocystis ST7 genome (i.e., combined entries in Tables S1D-S1F, shown as indigo segment in Figure 2A); this is 52.5% of the total Blastocystis gene number (n = 6,020), or 93.7% of conserved Blastocystis genes (n = 3,372). Hence, almost as many ancestral genes have been lost from Blastocystis as retained; conversely, only 545 conserved genes are missing from P. lacertae, which is 1.9% of its high-confidence gene set that we are using here (n = 28,067) or 3.8% of conserved P. lacertae genes (n = 14,482). Thus, since sharing an ancestor, Blastocystis has lost many more conserved genes than P. lacertae, but gained far fewer new genes, resulting in a much greater reduction relative to its ancestral state.
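The proportions reported in the two preceding paragraphs follow directly from the stated counts, and can be checked with a few lines of arithmetic. The sketch below only re-derives percentages from numbers given in the text; the helper name `pct` is hypothetical.

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


# Species-specific P. lacertae genes: unique clusters plus unclustered sequences.
species_specific = 13_585 + 6_932
print(pct(species_specific, 28_067))   # share of the high-confidence gene set

# Sister-lineage losses in Blastocystis ST7 (3,161 genes).
print(pct(3_161, 6_020))               # relative to all Blastocystis genes
print(pct(3_161, 3_372))               # relative to conserved Blastocystis genes

# Sister-lineage losses in P. lacertae (545 genes).
print(pct(545, 28_067))                # relative to the high-confidence gene set
print(pct(545, 14_482))                # relative to conserved P. lacertae genes
```

The asymmetry is plain in these ratios: the Blastocystis losses are almost as numerous as its retained conserved genes, whereas the P. lacertae losses are a few percent of its conserved set.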
To explore the functional consequences of these distinct evolutionary histories, we examined the functional terms associated with gene losses, identifying KEGG orthology (KO) and gene ontology (GO) terms that are significantly over- or under-represented relative to their frequency across the whole genome (Table S2). Figure 2A shows significant KO terms for gene losses, alongside conserved genes for comparison. Although conserved gene sets are naturally enriched for terms associated with core cell function, such as "ribosome" (KO03011) and "transcription" (KO03021), such terms are under-represented among Blastocystis gene losses (p < 0.01). Conversely, sequences associated with "chromosome and associated proteins" (KO03036) and "cytoskeleton proteins" (KO04812) are over-represented among Blastocystis gene losses (p < 0.0001) but not P. lacertae losses. KO04812 is associated with 13 gene losses, including intraflagellar transport proteins, dyneins, and kinesins. This association of gene loss with the motile cytoskeleton is also reflected among over-represented GO terms, the most significant of which are "cilium" (GO:0044441; p < 0.0001) and "ATP-dependent microtubule motor activity" (GO:1990939; p < 0.0001) (Table S2B).
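According to the Figure 2 legend, term enrichment was assessed with a hypergeometric test and Bonferroni-adjusted p values. The following is a minimal sketch of that kind of test using only the Python standard library. It is not the authors' code, and all counts in the example are hypothetical placeholders rather than values from Table S2.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric.

    N: annotated genes in the genome (population size)
    K: genes in the genome carrying the term
    n: genes in the subset of interest (e.g. sister-lineage losses)
    k: genes in the subset carrying the term (observed incidence)
    """
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def expected_incidence(N, K, n):
    """Expected number of term-carrying genes in a random subset of size n."""
    return n * K / N


def bonferroni(p, n_tests):
    """Bonferroni-adjusted p value, capped at 1."""
    return min(1.0, p * n_tests)


# Hypothetical example: 40 of 2,000 annotated genes carry a term,
# and 13 of the 300 lost genes carry it.
p_raw = hypergeom_upper_tail(k=13, N=2000, K=40, n=300)
p_adj = bonferroni(p_raw, n_tests=50)
print(expected_incidence(2000, 40, 300), p_raw, p_adj)
```

Under-representation would be tested with the lower tail, P(X <= k), in the same way. Exact integer binomial coefficients are used here for transparency; a statistics library would be the more practical choice for genome-scale term lists.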
Consequently, while the obvious disparity in genome size is mainly due to considerable P. lacertae gene gains, there is a definite asymmetry between the species in gene loss. This reflects a substantial loss of conserved gene functions in Blastocystis, potentially in line with its more simplified morphology and biochemical adaptations to the anaerobic/gut environment.

Article
Analysis of a large-scale flagellar protein dataset identifies candidate mastigoneme proteins and supports mastigoneme-somatoneme homology
Enrichment of cytoskeleton-related GO terms among Blastocystis gene losses points to evolutionary changes associated with motility. Although there is both morphological and genomic variation, the flagellum and its associated molecular machinery are conserved across stramenopiles. 23,24 Our scanning electron microscopy shows well-developed flagella in P. lacertae (Figure 1Bvi), but no indication of flagella in Blastocystis (Figure 1Avi). Indeed, though different morphological forms of Blastocystis have been described with varying degrees of confidence, no flagellated stage has ever been reported, suggesting that flagellar motility has also been lost in this lineage. Our analysis of 16 proteins found previously 25 to be constant in flagellated organisms but absent in all non-flagellated organisms found no credible candidates in Blastocystis but presence of 13/16 in P. lacertae (Figures S1A and S2; Table S3A). These data further increased our confidence that Blastocystis truly lacks flagella, speculatively as a result of the specialization of living in the gut and due to the fecal-oral transmission mechanism. Importantly, we concluded that Blastocystis can be used as a de facto negative control for downstream analyses of stramenopile flagellar evolution.
Figure 2. Comparative genomic analyses of orthogroups and flagellar proteins (A) Comparison of gene clustering for P. lacertae and Blastocystis. Predicted gene sets for P. lacertae (P; n = 26,100) and Blastocystis ST7 (B; n = 6,020) were each clustered using OrthoMCL. The pie charts show the proportion of genes falling into six categories. Gene clusters that were present in both B and P, as well as stramenopile outgroups (OGs), were categorized as "conserved," as were clusters present in B or P (as appropriate) and OG. Clusters found in both B and P but not OG, as well as species-specific clusters and unclustered genes (assumed also to be species-specific), are also shown. These five categories cover all genes found in the genome. The sixth category, "sister-lineage loss," is shown in the same pie chart to emphasize the scale of gene loss relative to contemporary genome size. This category includes those genes assumed to be lost from the P. lacertae or Blastocystis ST7 genome since their lineage separation. For example, the P. lacertae genome contains 3,161 genes that are conserved in other stramenopiles but absent from Blastocystis, and so assumed to have been lost from Blastocystis after separating from P. lacertae. When combined with the contemporary Blastocystis gene set, these losses are 36% of all genes, and 51% when Blastocystis-specific genes are excluded. Five KEGG orthology (KO) terms that are significantly enriched among conserved genes (right) or sister-lineage losses (left) in each organism are tabulated besides the pie charts. For each KO term, a hypergeometric test assesses the significance of the difference between the observed (O) and expected (E) incidences, with a p value adjusted for multiple tests using Bonferroni correction. Terms that are over-represented relative to their genomic frequency are shaded red, while under-represented terms are shaded blue. (B) Flow chart of stepwise homology searching for flagellar-associated proteins. This shows the datasets (circles with protein numbers), homology searching analyses (dark arrow), and filtering of results (light arrows). See also Figure S1B and Tables S1, S2, and S4A-S4F.
Stramenopiles were defined by the possession of tripartite hairs or mastigonemes (i.e., tinselation) on their posterior flagellum. 26 Despite this synapomorphy, there is a range of flagellar states within the group.
The loss of flagella has taken place on at least two occasions, once in Blastocystis and once in pennate diatoms. 27 Moreover, there are a few taxa possessing flagella but lacking tinselation. The protein composition of the mastigonemes is poorly understood, with only three proteins having been localized to the structure. 28,29 Their exclusivity for this feature, and whether there are remaining components, is unclear. Notably, P. lacertae lacks tinselated flagella but its somatonemes have been proposed as homologous. 26 The P. lacertae genome thus provides a unique opportunity to identify candidate mastigoneme proteins and to assess this homology argument.
We performed a series of comparative genome analyses to identify a core set of flagellar- and mastigoneme-correlated proteins (Figure 2B; Tables S4A-S4F). From a set of 592 proteins previously used for investigating stramenopile flagellar evolution, 30 a reduced set of 236 was first identified using Blastocystis as a negative filter to remove proteins that have promiscuous or additional non-flagellar functions. This set was then searched against diverse stramenopile genomes or transcriptomes chosen to represent the canonical tinselated state (11 taxa), the non-tinselated state (Incisomonas marina 31 and Halocafeteria seosinensis 32), the non-flagellated state (Blastocystis and Phaeodactylum tricornutum), or the somatonemal state (P. lacertae). We found 116 proteins that are present in the majority of the flagellated taxa but absent in both Blastocystis and P. tricornutum, and that are therefore more confidently flagella-associated. Of these proteins, 37 were found to be widely present but missing in I. marina and H. seosinensis, 32 hence representing mastigoneme-associated candidates that warrant molecular biological investigation. Notably, 30/37 were found to be present in P. lacertae, consistent with homology between somatonemes and mastigonemes.
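The stepwise filtering described above amounts to set operations on a presence/absence table. The sketch below illustrates the logic on a toy dataset. The protein identifiers, the three placeholder tinselated taxa, and the 50% threshold used for "majority" are all assumptions made for illustration; the text does not state the exact cut-off used.

```python
def stepwise_filter(presence, flagellated, non_flagellated, non_tinselated,
                    majority=0.5):
    """Return (flagella_associated, mastigoneme_candidates).

    presence: dict mapping protein id -> set of taxa with a detected homolog.
    A protein is flagella-associated if it is absent from every
    non-flagellated taxon and present in more than `majority` of the
    flagellated taxa. It is a mastigoneme candidate if, in addition, it is
    absent from every non-tinselated flagellate.
    """
    flagella_associated = {
        protein for protein, taxa in presence.items()
        if not (taxa & non_flagellated)
        and len(taxa & flagellated) / len(flagellated) > majority
    }
    mastigoneme_candidates = {
        protein for protein in flagella_associated
        if not (presence[protein] & non_tinselated)
    }
    return flagella_associated, mastigoneme_candidates


# Toy taxon sets; T1-T3 are hypothetical stand-ins for tinselated taxa.
tinselated = {"T1", "T2", "T3"}
non_tinselated = {"I_marina", "H_seosinensis"}
somatonemal = {"P_lacertae"}
non_flagellated = {"Blastocystis", "P_tricornutum"}
flagellated = tinselated | non_tinselated | somatonemal

# Toy presence/absence data with hypothetical protein ids.
presence = {
    "protA": {"T1", "T2", "T3", "I_marina", "H_seosinensis", "P_lacertae"},
    "protB": {"T1", "T2", "T3", "P_lacertae"},
    "protC": {"T1", "T2", "T3", "Blastocystis"},
    "protD": {"T1"},
}

flagellar, candidates = stepwise_filter(
    presence, flagellated, non_flagellated, non_tinselated
)
in_p_lacertae = {p for p in candidates if "P_lacertae" in presence[p]}
print(sorted(flagellar), sorted(candidates), sorted(in_p_lacertae))
```

In this toy case, protA is flagella-associated but found in the non-tinselated taxa, protB is a mastigoneme candidate that is also present in P. lacertae, protC is rejected by the non-flagellated filter, and protD is too sparsely distributed.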
Both Blastocystis and P. lacertae possess a conserved membrane-trafficking system
Several GTPases in the Rab family have flagella-associated function, including IFT27, Rab23, Rab28, and RABL2. 33 A comprehensive molecular evolutionary analysis of the Rab complements across P. lacertae, Blastocystis, and a selection of stramenopile genomes was undertaken. Altogether we identified 40 Rab sequences from P. lacertae and assigned these to specific Rab subfamilies (Figure S1B; Data S1), allowing us to deduce Rab complement in the last stramenopile common ancestor (LSCA) and providing context for presence and absence of components in Blastocystis. The LSCA is deduced not to have possessed five Rab proteins present in the last eukaryotic common ancestor (LECA) 34: Rab4, Rab14, Rab20, Rab24, and Rab34 (Figure S1B). Notably, however, P. lacertae and other stramenopiles encode the flagella-associated RABs, i.e., Rab23, Rab28, RABL2, and IFT27, while these are not found in Blastocystis. 10 The difference in the Rab complement between Blastocystis and P. lacertae raised questions regarding the conservation of other machinery in the membrane-trafficking system (MTS). Membrane-trafficking is critical for basic cellular function in eukaryotes and is important for pathogenic mechanisms in diverse protistan parasites, being responsible for the movement of cellular material between organelles, as well as import and export of material from the cell, and cell surface modulation 35 (inter alia).
Comparative genomics and phylogenetics were used to identify and classify the membrane-trafficking machinery of P. lacertae. Overall, the P. lacertae genome encodes a relatively complete MTS, similar to that of free-living eukaryotes ( Figure S3; Table S3B). Notably, P. lacertae encodes the complete TSET complex and, using P. lacertae proteins as queries, we were able to find orthologs in Blastocystis (Data S2). This is the first instance of complete TSET complexes in stramenopiles, suggesting it was likely present in the LSCA. It also means that this complex can play a role in membrane-trafficking in Blastocystis, in contrast to previous reports. 10 Given its role as a primary modulator of clathrin mediated endocytosis in plants, 36 the presence of this complex has potentially important implications for understanding the cell biology of material uptake from the host in Blastocystis.
Furthermore, examination of the multi-subunit tethering complex complement identified three of the four proteins of the Dsl1 complex in P. lacertae but confirmed identification of only a single Dsl1 complex subunit in the majority of Blastocystis STs (Figure S3; Table S3B). In yeast, Dsl1 functions at the ER for vesicle tethering, but also peroxisome assembly. 37 Notably, paucity of Dsl1 complex components is correlated with the loss or modification of peroxisomes 38 and, consistent with this, peroxisomes have not been visualized in Blastocystis nor have any of the peroxin proteins (Pex), which are involved in peroxisome proliferation and assembly, been identified in the Blastocystis genomes. 10 The identification of Dsl1 machinery in P. lacertae raises the possibility that a peroxisomal organelle is present in this organism.

P. lacertae possesses the most rudimentary peroxisome ever reported
To test this possibility, we searched for Pex orthologs in the P. lacertae genome, identifying homologs for the peroxisomal membrane E3 ubiquitin ligase Pex10 and the farnesylated receptor of peroxisomal membrane proteins (PMPs) Pex19, as well as a possible homolog of the ubiquitin-conjugating protein Pex4 (Figure 3A; Table S3C). Notably, while Pex10 and 19 are considered to be unequivocal informatic markers of peroxisomes, 39 we did not identify any other Pex proteins. Consistent with the lack of Pex proteins comprising the peroxisomal targeting signals 1 (PTS1) and 2 (PTS2) recognition machinery in P. lacertae (Figure 3A; Table S3C), our searches for proteins bearing these targeting signals failed to identify any of the known peroxisome matrix proteins normally targeted by those methods. Although we did identify 126 and two proteins harboring PTS1 and PTS2 motifs, respectively (Table S5), these either had no known hits in the database (90 PTS1 and both PTS2) or were attributed to a variety of other cellular functions, and so we anticipate that the P. lacertae peroxisome does not function via proteins in its matrix. By contrast, examination of diverse additional stramenopiles revealed a relatively complete complement of peroxins. The Pseudofungi encode nearly all of the examined proteins, but even considering taxa outside of the Bigyra and Aureococcus anophagefferens, which appear to have degenerated their complement independently, we found an average of 14.8 of 17 examined proteins encoded, suggesting that the LSCA possessed a full peroxin set, consistent with previous data (Figures 3A and S4; Table S4G). However, we noted that within the Bigyra we could trace the losses of several peroxins upstream of P. lacertae and Blastocystis. Pex22 was not found in any of the bigyran taxa, while progressively smaller complements were seen within the opalozoan taxa examined (Figure S4; Table S4G).
Most notably, in model systems (e.g., yeast and mammals), Pex3 (Pex16) and Pex19 function together for the incorporation of PMPs such as Pex10 into the peroxisomal membrane. Although Pex10 and Pex19 were confidently reported in P. lacertae and C. burkhardae, we mapped the loss of Pex3 to the base of the Opalozoa within Stramenopila ( Figure S4), with the caveat that Pex3 may display a low degree of conservation and its identification can be problematic using bioinformatics tools. 40 Nonetheless, examining the primary sequence conservation of the established Pex3 and Pex10 binding regions of Pex19 (Tables S6A and S6B), we found that the P. lacertae and C. burkhardae Pex3 binding regions of Pex19 are less conserved (9.5% and 12.2% average identity, respectively) compared with those orthologs from organisms possessing Pex3 (14.8%). By contrast, the Pex10 binding regions of Pex19 in P. lacertae and C. burkhardae are slightly better conserved (26.2% and 29.7% average identity, respectively) than when compared among Pex3-possessing taxa (22.5%). This is consistent with a degeneration of the Pex3-binding region but conservation of the Pex10-binding region in Pex19-possessing organisms that have lost Pex3. Overall, the P. lacertae Pex complement is minimal, but does suggest the presence of a peroxisome-derived organelle.
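The binding-region comparison above rests on average percent identity between aligned sequence segments. As a rough sketch of how such a figure can be obtained, the code below computes pairwise identity over the columns of a pre-aligned region and averages it against a set of reference sequences. The exact identity definition used in Tables S6A and S6B is not stated in the text, so the choice here (identical residues divided by columns in which neither sequence has a gap) is an assumption, and the sequences are invented.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two rows of the same alignment.

    Columns containing a gap in either sequence are ignored (an assumed
    convention; other definitions divide by the full alignment length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        identical += a == b
    if compared == 0:
        return 0.0
    return 100 * identical / compared


def mean_identity(query, references):
    """Average percent identity of `query` against each reference row."""
    values = [percent_identity(query, ref) for ref in references]
    return sum(values) / len(values)


# Invented aligned segments, for illustration only.
query = "ACDE-G"
references = ["ACEEKG", "ACDEKG"]
print(percent_identity(query, references[0]))
print(mean_identity(query, references))
```

Applied separately to the Pex3-binding and Pex10-binding segments of a Pex19 alignment, this kind of calculation gives the region-specific averages that are being compared.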
Because the in silico analysis suggested the possible presence of a minimal peroxisomal organelle, we used a multiphasic approach. Western blotting using two heterologous anti-Pex19 antibodies 41 revealed a cross-reacting band at 68 kDa in P. lacertae protein extracts ( Figure 3B) that corresponds to the predicted size of P. lacertae Pex19.
Pex3-Pex19 binding is the best-established mechanism of Pex19 membrane-association. However, an alternate mechanism has been proposed, whereby Pex19 is targeted to membranes via a C-terminal farnesylation. 42,43 Using the program GPS-Lipid, 44 we identified putative farnesylation motifs (CAAX) in the P. lacertae Pex19, which are conserved in the majority of the Pex19 orthologs (Table S6C), meaning that Pex19 retains the capacity to interact with PMPs. 42 Therefore, we used confocal microscopy to localize the binding locations of anti-Pex19 (Abcam) in P. lacertae cells, which revealed punctate localization within the cell and no co-localization with the nucleus ( Figure 3C). To increase resolution of this localization, we employed immuno-electron microscopy using the same antibody. This mainly resulted in gold particles localized in the periphery of single-membrane-bound bodies, which is consistent with a peroxisomal localization ( Figure 3D). To confirm these observations, we raised a Pex10 antibody against a specific peptide of the predicted protein. Western blotting and immunofluorescence microscopy demonstrated the specificity of the antibody (clear band at 59 kDa), a punctate localization, and co-localization with Pex19, consistent with the protein being present in the same organelle ( Figures 3B and 3C; Video S1, 0:00-0:50 min). These data suggest the presence of a peroxisomal organelle in P. lacertae, but also raise further questions regarding its function.
The two hallmark activities of peroxisomes are β-oxidation of fatty acids (FAs) and metabolism of reactive oxygen species (ROS). 45 Therefore, we searched the predicted proteome and genome of P. lacertae for these enzymes and compared the same pathways of C. burkhardae and Blastocystis. Consistent with the absence of PTS-recognizing peroxins in P. lacertae and peroxisomes in Blastocystis, 10 we did not identify any of the peroxisomally targeted FA β-oxidation enzymes in these two species (Figure 4A). On the other hand, we reconstructed the full pathway in C. burkhardae. Surprisingly, the mitochondrial β-oxidation of FA is also incomplete in P. lacertae and Blastocystis, with 3-hydroxyacyl-CoA dehydrogenase, 3-ketoacyl-CoA thiolase, and trifunctional protein missing in the former and 3-ketoacyl-CoA thiolase and trifunctional protein missing in the latter, while mitochondrial β-oxidation is still operating in C. burkhardae (Figure 4A).
There are several ROS metabolizing enzymes known from different cell compartments. The best-known peroxisomal ROS metabolizing enzyme is catalase. However, catalase was shown to be missing in the Blastocystis genome, 10,19 and we further did not identify this protein in P. lacertae or C. burkhardae. Superoxide dismutase was identified in P. lacertae and C. burkhardae, yet both were predicted to be mitochondrion-localized. Peroxiredoxin, an additional enzyme able to reduce H2O2, is similarly predicted to operate only in the C. burkhardae mitochondria (Table S6D). PXMP2 and PXMP4 (Figure 4A), two additional peroxisomal proteins involved in ROS metabolism, are transmembrane and thus targeted to the peroxisome via the Pex19 system. Both are encoded in C. burkhardae and, notably, while we were unable to identify homologs of either in Blastocystis, we did identify a PXMP2 homolog in P. lacertae. A validated antibody was raised against the P. lacertae PXMP2 protein (Figures 4B and 4C) and showed co-localization with Pex10 in immunofluorescence microscopy (Figure 4D; Video S1, 0:51-1:40 min). Taken together, the evidence is consistent with a highly reduced peroxisome-derived organelle in P. lacertae, with the only known enzyme localized within being PXMP2, which is speculated to be involved in ROS metabolism.
(C) Cellular localization of Pex19 and Pex10 in P. lacertae cells by immunofluorescence. Rabbit anti-Pex19 antiserum or rat anti-Pex10 shows a discrete localization in the cells and co-localization of the two. DAPI stains the P. lacertae nucleus and mitochondrial DNA. Differential interference contrast (DIC) images show the cells used for immunofluorescence. Scale bar, 5 μm (rows 1, 2, and 4) and 20 μm (row 3). (D) Localization of Pex19 in P. lacertae cell by immuno TEM shows compartmental localization. Densities of labeling in different compartments of P. lacertae cells suggest that Pex19 is mainly localized in membrane vesicles. Scale bar, 5 μm and 200 nm (insert). See also Figure S4; Tables S3C, S4G, S5, and S6; and Video S1 (0:00-0:50).
It has been generally held that anaerobic and parasitic lineages have reduced peroxisomes. The recent descriptions of anaerobic peroxisomes in Archamoebae (Entamoeba histolytica, Mastigamoeba balamuthi, and Pelomyxa schiedti) are striking counter-examples and raise the possibility of alternate metabolic functions for the organelle in oxygen-shunning organisms. [46][47][48] Nonetheless, microaerophilic organisms such as Trichomonas and Giardia most prominently seem to have lost the organelles entirely, as apparently has Blastocystis. To our knowledge, the organelle present in P. lacertae is the most reduced, but putatively functional, peroxisomal organelle currently described and represents a tremendous opportunity to study a late intermediate stage in the evolutionary degeneration of this organelle in anaerobic lineages. Given the paucity of Pex proteins encoded in the P. lacertae genome but the presence of what appears to be a peroxisome-derived organelle, as well as the recent examples of peroxisomes in Entamoeba and Toxoplasma, 47,49 where the organelle was held not to be present, it may well be worthwhile re-examining some of the other organisms where the peroxisome has been presumed lost.
Blastocystis achieves a comparable metabolism to P. lacertae, but with reduced redundancy
Given this minimal peroxisomal complement and the integrally linked nature of this organelle with the mitochondria, we next investigated metabolic pathways with a particular focus on those that are modulated by these two unusual compartments. The comparison showed that Blastocystis STs have retained largely similar metabolic capabilities to both P. lacertae and C. burkhardae, with 291 pathways shared between all Blastocystis STs, P. lacertae, and C. burkhardae (Tables S3D and S6D). The greatest discrepancy comes from the overall number of genes that mapped from each genome. There is a difference of 347 genes between Blastocystis ST1 and P. lacertae, which appears to be made up of redundant KO terms, albeit with a notable difference in the aspartate biosynthetic pathway. Blastocystis encodes the fewest genes of these pathways, which may suggest that Blastocystis STs have lost complexity from conserved metabolic pathways without compromising capacity. Despite the difference in genome sizes and the numbers of sequences mapped to KEGG between Blastocystis STs and P. lacertae, they contain remarkably similar repertoires of pathways. If there is a fundamental difference between them, it is with respect to the number of genes involved in each pathway, the "gene richness" of metabolism. Blastocystis seemingly achieves a near comparable metabolic capacity to P. lacertae, but with substantially fewer genes (Figure 5A).
The most striking aspect of the metabolic comparison was found in the glycolytic pathway. It was previously shown that stramenopiles partitioned the second half of glycolysis in the mitochondrion. 50 We therefore checked the genome for the enzymes encoding glycolysis in P. lacertae to assess the presence or absence of possible mitochondrion-targeted enzymes. Similar to Blastocystis, 50 it seems that P. lacertae has replaced some ATP-utilizing enzymes for pyrophosphate utilizing ones. P. lacertae encodes all ten glycolytic enzymes. As is the case for all other studied stramenopiles, 50 glycolysis is branched and the enzymes for the second half of glycolysis (the C3 part) are located in both the cytosol and mitochondrion. Blastocystis, however, has lost the cytosolic branch and solely relies on the mitochondrial C3 branch. 50 Pyruvate kinase does not have a mitochondrial targeting signal in Blastocystis, but Blastocystis uses the pyrophosphate utilizing alternative to pyruvate kinase, phosphoenolpyruvate synthase (pyruvate dikinase), which does have a mitochondrial targeting signal. P. lacertae also seems to use phosphoenolpyruvate synthase, but this enzyme does not seem to have a recognizable mitochondrial targeting signal ( Figure 5B) although transcriptomic data suggest a short amino-terminal extension, which could function as targeting signal. Targeting to MROs has been shown to be non-canonical. 51 It is currently not clear how the last step from phosphoenol pyruvate to pyruvate proceeds in P. lacertae if the preceding steps are mitochondrial but the last one is not. Unlike Blastocystis, but like all other stramenopiles, P. lacertae contains a putative mitochondrial pyruvate carrier (MPC).
P. lacertae and Blastocystis have comparable MROs and associated aerobic metabolism
Blastocystis has attracted attention for its MROs, which have been studied as an example of an intermediate stage between classical mitochondria and mitochondrial remnants. 52,53 Although the functions of aerobic (canonical) mitochondria are very well known, the functions of, and distinctions among, anaerobic mitochondria, MROs, hydrogenosomes, and mitosomes are still blurred. 54 Our in silico predictions demonstrate that P. lacertae mitochondrial protein composition is similar to that of Blastocystis ST1, with 314 and 292 predicted proteins, respectively (Figure 5A). Despite the morphological differences between the two organelles (Figure 1), both MROs seem to have similar functions. Notably, the major distinction between the two organelles relates to the protein composition of the mitochondrial protein import machinery and proteins involved in organellar transcription and translation (Figure 5A; Table S3D). P. lacertae shows a reduced mitochondrial protein import machinery compared with Blastocystis, lacking proteins predicted to localize in the outer membrane of the organelle (e.g., Tom40, Tom70). By contrast, P. lacertae encodes almost double the number of proteins involved in mitochondrial transcription and translation when compared with Blastocystis (Figure 5A; Table S3D). Although the Blastocystis and P. lacertae mitochondrial genome complements have an identical number of protein coding genes, those of P. lacertae do encode more tRNAs and, notably, they do have distinctly different genomic organization (circular versus linear). 55 Whether the increased complement of transcriptional/translation machinery in P. lacertae reflects a requirement due to the linear mitochondrial genome organization is a matter for future molecular characterization.
Biochemically, anaerobic energy metabolism is the most striking difference between P. lacertae and Blastocystis MROs. The P. lacertae genome does not encode the [FeFe]-Hydrogenase and its maturase (HydE) that were shown to be present and localized in the Blastocystis organelle. 53 As the genes are also absent in C. burkhardae, the gene acquisitions parsimoniously took place in the Blastocystis lineage. Notably, attempts to show activity of this protein in Blastocystis have been unsuccessful, possibly due to incomplete machinery for maturation of the enzyme (it is lacking HydG and HydF 10). In addition, in silico predictions have revealed that the Blastocystis ST1 genome encodes multiple pathways for the decarboxylation of pyruvate into acetyl-CoA and CO2, 10,53 including the aerobic pyruvate dehydrogenase complex (PDC) and the anaerobic pyruvate:ferredoxin oxidoreductase (PFOR) and pyruvate:NADP+ oxidoreductase (PNO). 10 Despite the absence of [FeFe]-Hydrogenase, P. lacertae encodes both enzymes for anaerobic decarboxylation (PFOR and PNO), with both having predicted mitochondrial targeting signals (Table S3D), and lacks all the genes coding for the aerobic complement (PDC and the pyruvate dehydrogenase kinases [PDK] 2/3/4 present in Blastocystis). In vitro, P. lacertae does not require inoculation into pre-reduced medium, in contrast to Blastocystis axenic culture (STAR Methods), implying that P. lacertae is more oxygen-tolerant. The PDC absence, while maintaining an anaerobic means for pyruvate decarboxylation, is puzzling and requires further investigation.
Like Blastocystis MROs, P. lacertae mitochondria harbor components of complex I and complex II of the electron transport chain, along with proteins involved in the (anaerobic) quinone metabolism, including the rhodoquinone biosynthesis enzyme RQUA 56 and alternative oxidase (AOX), which have been previously shown to associate with both complexes 57 ( Figure 5B). These organelles also contain pathways for amino acid metabolism, cofactor/vitamin metabolism (folate, B5, B12, steroid, and lipoate), FA biosynthesis, and an incomplete tricarboxylic acid cycle, as well as maintenance of a mitochondrial genome ( Figure 5). Like Blastocystis, P. lacertae encodes proteins of ISC assembly and export (e.g., ATM1) to support Fe-S assembly in the cytosol (CIA machinery) ( Figure 5). In addition to the components of this machinery, P. lacertae encodes a fused sulfur mobilization protein (SufCB) that in Blastocystis was shown to bind to Fe-S clusters (e.g., [4Fe-4S]) and was expressed under oxygen stress and localized in its cytoplasm. 58 In summary, the P. lacertae MRO seems to have a patchy conservation of anaerobic metabolism, while maintaining similar or more reduced functions when compared with the Blastocystis organelles. Together with the peroxisomal data, P. lacertae seems to have a more aerobically inclined metabolism, and thus could represent an intermediate stage between the microaerophilic stage and the more advanced anaerobic metabolism that is found in Blastocystis.

DISCUSSION
P. lacertae is a morphologically typical stramenopile, the genome of which offers a better comparison to the derived condition of Blastocystis than previous stramenopile genomes. It is the most closely related stramenopile to Blastocystis sequenced to date and is also adapted for life in the gut. In providing some indication of the ancestral state, it shows how the Blastocystis genome is genuinely small by stramenopile standards. 10 The reduced size is due to both the loss of specific cellular functions, such as motility and peroxisomes, and a profound genome-wide streamlining that caused as many genes to be lost as were retained in the Blastocystis lineage since divergence from the last common ancestor with Proteromonas. Genomic reduction in the Blastocystis genomes has influenced almost all aspects of cellular physiology, but, in most cases, this has led to a simplification and not total loss of metabolic function.

Figure 6. The cartoon illustrates the proposed mechanism of evolutionary contingency in which, owing to shared pathways of lipid metabolism and defense against reactive oxygen species, whichever organelle starts to degenerate first places a constraint on the reductive evolution of the other until these metabolic requirements are lifted through parasitism or full anaerobiosis. In the case of P. lacertae/Blastocystis, the degeneration of the peroxisomal organelle manifests as a more conserved MRO. In other lineages the opposite may be the case. The span of organelle reduction is shown for peroxisomes (above the line) and MROs (below the line), with the line linking them representing their shared metabolic burden. I-V, respiratory complexes 1-5.
Our data also contrast the relatively complex and conserved complement of mitochondrial metabolism genes to the almost completely reduced shared pathways in the peroxisomes. Co-evolved degeneration of these two organelles in the transition to anaerobiosis is seen convergently across multiple eukaryotic lineages and yet rules guiding this dynamic, if any, remain unclear. Because some inferred losses of peroxisomal proteins (e.g., Pex22 and Pex3) appear to have predated the move to anaerobiosis in the Opalozoa, our results raise the intriguing possibility that the earlier degeneration of peroxisomes could have acted as a brake on pathway loss in the mitochondria. As other lineages have more degenerate MROs, but more complete peroxisomal pathways (e.g., M. balamuthi 46 and E. histolytica 47 ), the more general speculative evolutionary mechanism ( Figure 6) would be that, for peroxisomes and mitochondria, the organelle that begins degeneration first results in negative selective pressure on pathway loss in the other. Once full adaptation to anaerobiosis has been achieved, then degeneration of both organelles proceeds. Whether this is simply a matter of contingency in this anaerobic lineage or reveals an evolutionary constraint in the transition to anaerobiosis remains to be investigated once many more lineages have been sampled.
Overall, the evolution of Blastocystis genomes has been characterized by a progressive, but pervasive genome-wide streamlining, with general loss of redundancy and punctuated by the loss of systems associated with cellular organelles, such as flagella and peroxisomes. This resulted in a lack of genomic versatility that mirrors the developmental and ecological uniformity we see in contemporary Blastocystis. This streamlining process is consistent with adaptation to a restricted niche within the host gut.

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following:

Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Anastasios D. Tsaousis (A.Tsaousis@kent.ac.uk).

Materials availability
This study has generated two unique antibodies that can be obtained from the lead contact. Antibodies will be made available on request, but we may require a payment and/or a completed Materials Transfer Agreement if there is potential for commercial application.
Data and code availability
- Raw reads and assemblies have been deposited at NCBI and are publicly available as of the date of publication. Accession numbers are listed in the key resources table. This paper does not report original code.
- Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

EXPERIMENTAL MODEL AND SUBJECT DETAILS
Culturing conditions for P. lacertae and C. burkhardae
Proteromonas lacertae LA cultures 92 were grown axenically in LYI-S-2 93 with 15% adult bovine serum (Sigma Aldrich) at 22 °C in sterile borosilicate glass tubes and passaged every 2-3 weeks. Cafeteria burkhardae ATCC 50561 was maintained in artificial seawater for protozoa (ASWP) at 4 °C or at room temperature in light-shielded T-25 culture flasks.

METHOD DETAILS
'Omics sequencing, assembly and initial analysis
P. lacertae and C. burkhardae nucleic acid extraction
P. lacertae genomic DNA was extracted using the DNeasy mini kit (Qiagen) according to the manufacturer's protocol. Twenty μg of high-molecular-weight DNA was submitted for sequencing by the University of Liverpool's Centre for Genomic Research. Libraries were prepared by shearing DNA to approximately 10 kb fragments. Sequencing was done primarily on a single SMRT cell.
Whole RNA from P. lacertae and C. burkhardae was extracted using an RNeasy kit (Qiagen) according to the manufacturer's protocol. RNA samples were pooled and processed by the University of Liverpool's Centre for Genomic Research using polyA selection according to the manufacturer's protocol.
P. lacertae genome and transcriptome sequencing and assembly
Genome sequencing was done on a PacBio RSII sequencer with nine SMRT cells and assembled using SMRT Portal software (HGAP 3). The RNA samples were used to produce three Illumina RNASeq libraries from enriched RNA using the strand-specific ScriptSeq kit. Paired-end sequencing (2x125 bp) was carried out on one lane using the Illumina HiSeq platform.
Assembly parameters were default except for genome size, which was set to 35 Mb, based on kmer size estimation as previously described, 94 and minimum seed read length, which was increased to 10,000 bp.
Gene calling was done using AUGUSTUS v2.4 60 and SNAP 61 utilizing a training set of 188 genes. The final gene set was manually curated. Gene annotations were assigned based on homology (BLASTx 62 ), InterProScan v5.21-60.0, 63
C. burkhardae transcriptome sequencing and assembly
Library preparation and sequencing was performed as for P. lacertae. Reads were assembled using Trinity v2.1.1. 72 Bacterial contamination was removed using a PCA of kmer frequencies to filter sequences with >98% sequence identity with a known bacterial sequence.
The initial transcriptome contained 40,858 unique transcripts. Although polyA selected, bacterial transcripts from the xenic cell culture were frequent contaminants in the assembly. To remove prokaryotic transcripts, we examined kmer frequencies to identify a Cafeteria-specific signature. A positive transcript set included sequences with a top BLAST hit to a stramenopile, with >40% sequence identity. A negative transcript set included transcripts >95% sequence identity to a bacterial sequence. Kmer frequencies were calculated for these and all other transcripts and analysed using Principal Component Analysis (PCA). It was possible to clearly distinguish the positive and negative control groups, and thereby assign all remaining transcripts as eukaryotic or prokaryotic accordingly ( Figure S5). 11,633 transcripts that clustered with the negative control group were removed after manual checking of their BLAST affinities. The final transcriptome contains 28,952 transcripts, which returned a BUSCO score of 70.40%.
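The k-mer profiles that feed a PCA of this kind can be sketched as follows. This is a minimal illustration only: the k-mer size (k = 4 by default) and the function name `kmer_frequencies` are assumptions, since the text does not state the k-mer length or the software used. Each transcript is converted to a vector of relative k-mer frequencies; such vectors for the positive set, negative set, and unassigned transcripts would then be passed to a PCA routine.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(seq, k=4):
    """Return relative frequencies of all 4**k DNA k-mers in seq.

    The vector is ordered by the lexicographic order of the k-mers
    (AAAA, AAAC, ...). Windows containing characters other than
    A/C/G/T (for example N) are skipped. k=4 is an assumed default.
    """
    seq = seq.upper()
    valid = set(ALPHABET)
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    )
    total = sum(counts.values())
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # An all-zero vector is returned when no valid k-mer is present.
    return [counts[km] / total if total else 0.0 for km in kmers]
```

Because the vectors are normalized by the number of valid windows, transcripts of different lengths become directly comparable before ordination.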
For each of our datasets, completeness was calculated using BUSCO v1.1.1 73 with the eukaryota_odb9 dataset.

Genomics analysis
Comparative analysis of the P. lacertae and Blastocystis gene sets was carried out using OrthoMCL, to arrange genes into orthologous clusters that could then be examined for evolutionary conservation and loss. To establish whether genes were gained, lost, or conserved in each genome, we clustered them with various stramenopile outgroups, including the C. burkhardae transcriptome produced here. We included genome assemblies with low scaffold and contig counts, high N50 values and high BUSCO scores for completeness: Blastocystis sp. ST4 WR1, Blastocystis sp. ST7 Singapore isolate B, Pythium ultimum DAOM BR144, Phytophthora sojae strain P6497, Saprolegnia diclina VS20, Ectocarpus siliculosus strain Ec32, Thalassiosira pseudonana CCMP1335, and Schizochytrium sp. transcriptome. All genomes and transcriptomes were downloaded from NCBI Genome. OrthoMCL v2.0.9 74 was used with an E-value threshold of 1e-5 for all-v-all BLAST to generate clusters of homologous genes. We acknowledge the inherent potential for false positives and negatives when using reciprocal best hit (RBH) and high-throughput methods. However, given the dataset size, phylogenetic analysis of all proteins was untenable. In cases of specific cellular system analyses, phylogenetics was used as described below.
A cluster was considered 'conserved' if it contained at least one sequence from P. lacertae, at least one Blastocystis subtype, and at least one outgroup (including C. burkhardae). A cluster was 'species-specific' if it only contained sequences from a single genome (except Blastocystis, where it may contain representative sequences from both subtypes). Clusters were considered to represent losses from the Blastocystis genomes if they were absent from all Blastocystis genomes, but present in both P. lacertae and at least one other stramenopile.
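The three cluster categories defined above can be expressed as a small decision function. The following is a sketch under stated assumptions: the genome labels (`P_lacertae`, `Blastocystis_ST4`, `Blastocystis_ST7`) and the function name are hypothetical placeholders, and the input is assumed to be the set of genomes contributing at least one sequence to an OrthoMCL cluster.

```python
# Hypothetical genome labels; any consistent naming scheme would work.
P_LACERTAE = "P_lacertae"
BLASTOCYSTIS = {"Blastocystis_ST4", "Blastocystis_ST7"}


def classify_cluster(genomes):
    """Classify a cluster from the set of genomes represented in it."""
    genomes = set(genomes)
    has_pl = P_LACERTAE in genomes
    has_bl = bool(genomes & BLASTOCYSTIS)
    # Everything that is neither P. lacertae nor Blastocystis is an outgroup.
    outgroups = genomes - BLASTOCYSTIS - {P_LACERTAE}

    if has_pl and has_bl and outgroups:
        return "conserved"
    # A single genome, or Blastocystis subtypes only.
    if len(genomes) == 1 or (genomes and genomes <= BLASTOCYSTIS):
        return "species-specific"
    if has_pl and not has_bl and outgroups:
        return "lost in Blastocystis"
    return "other"
```

Clusters that match none of the three definitions (for example, P. lacertae plus Blastocystis with no outgroup) fall into a residual "other" class here; how such clusters were treated is not stated in the text.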
To confirm the P. lacertae species-specific clusters and rule out contamination several points were considered. 18,425 (89.8%) of the putative P. lacertae-specific genes are contiguous with conserved stramenopile genes. Furthermore, the remaining putative species-specific genes not physically linked to conserved loci have codon usage or predicted amino acid composition not significantly different to conserved genes (t-test, p>0.3), indicating that they are authentic coding sequences. Finally, these species-specific genes do not have close affinity to other organisms, which would be indicative of contamination; instead 78.8% have no homology with known proteins using BLASTp and, among those genes that do display homology, average sequence identity is low (30.1%) and does not exceed 87%. Thus, it is likely that the high proportion of P. lacertae-specific genes reflects the poor genomic sampling of stramenopiles to date.
After separating gene clusters in this way, KEGG orthology (KO) terms were associated with genes using the KEGGmapper tool. The observed incidence of each KO term among P. lacertae or Blastocystis conserved genes and gene losses, respectively, was compared to the expected incidence given its frequency in the whole genome, and a hypergeometric test with Bonferroni correction was applied to identify KO terms with significant under- or over-representation. To identify Gene Ontology (GO) terms that were significantly enriched, human homologues for P. lacertae or Blastocystis genes in the conserved and loss categories were identified using BLASTp, and these were compared to the background human Gene Ontology using Fisher's exact test with Benjamini correction using GOnet.

IFT22 (Intraflagellar transport protein 22; NP_073614.1); IFT52 (Intraflagellar transport protein 52; NP_057088.2); IFT57 (Intraflagellar transport protein 57; NP_060480.1); IFT88 (Intraflagellar transport protein 88; NP_001340501.1); SPAG6 (Sperm-associated antigen 6; NP_036575.1); and RIBC2 (RIB43A-like with coiled-coils protein 2; NP_056468.1).
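For the KO-term enrichment test described above, the following minimal sketch shows how an over-representation p-value and its Bonferroni adjustment can be computed. The function names and argument order are illustrative assumptions, not the authors' code; under-representation would use the corresponding lower tail.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for a hypergeometric draw.

    N: genes in the whole genome with a KO annotation (background)
    K: background genes carrying the KO term of interest
    n: genes in the category tested (e.g. conserved or lost)
    k: genes in that category carrying the term
    """
    upper = min(K, n)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible configurations contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]
```

In use, one p-value would be computed per KO term and the whole list passed to `bonferroni` before applying a significance threshold.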
A reciprocal best match by BLASTp between the human protein query and the subject protein in the non-human genome was required to confirm that an ortholog was 'present' in the latter. Homologous sequences that were not best matches in reverse comparisons were considered to be non-orthologues and the query protein was recorded as 'absent'. KIF3C, a kinesin-like protein that is found in multiple copies in the human genome, among others ( Figure S1A), was found in the three non-flagellated organisms and is a microtubule-based anterograde translocator with multiple functions, possibly unrelated to motility. BLASTp matches to other flagellar proteins were found in Blastocystis, (e.g. to cytosolic dynein DYNC1I2, dynein regulatory complex subunit 3, DRC3, and intraflagellar transport protein 22, IFT22) but these were not reciprocal best hits and, in fact, did not fully align with the query. This is detailed in Table S3A; for example, a Blastocystis protein was homologous to DRC3 but only to a 72-amino acid span, rather than to the >500 amino acids typical of true orthologues in flagellated organisms. Therefore, we conclude that these partial hits are caused by homologous domains shared by otherwise unrelated proteins.
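The reciprocal-best-match criterion described above can be sketched as follows. This is an illustration under assumptions: hits are represented as `(identifier, score)` pairs ranked by a single score such as bit score, and the function names are hypothetical; the text does not specify which BLAST statistic defined the "best" match.

```python
def best_hit(hits):
    """Return the identifier of the highest-scoring hit, or None if empty.

    hits: list of (identifier, score) tuples; higher score is better.
    """
    return max(hits, key=lambda h: h[1])[0] if hits else None


def is_reciprocal_best_hit(query, forward_hits, reverse_hits_by_subject):
    """True if the query's best subject also recovers the query as its best hit.

    forward_hits: hits of the human query against the target proteome.
    reverse_hits_by_subject: dict mapping each target protein to its hits
    against the human proteome.
    """
    top_subject = best_hit(forward_hits)
    if top_subject is None:
        return False  # no forward hit at all: recorded as absent
    return best_hit(reverse_hits_by_subject.get(top_subject, [])) == query
```

Under this rule, a target protein that matches a query only through a shared domain will usually recover a different human protein in the reverse search and is therefore recorded as absent, which is the situation described for the partial DRC3 match.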
For the 13 P. lacertae and 9 C. burkhardae proteins identified, phylogenetic analyses ( Figure S2) showed that the genetic distances to P. lacertae or C. burkhardae homologues are consistent with orthologues in other flagellated species. To visualise the orthology between P. lacertae or C. burkhardae proteins and matches from other flagellated organisms, a Maximum-Likelihood phylogeny was estimated from an alignment of each query protein. The phylogeny was estimated with PhyML 75 after automated model selection using the Akaike Information Criterion. Default settings for tree-searching were employed, and 100 non-parametric bootstrap replicates were applied to assess node robustness. Given that we used human protein sequences to initially screen the Blastocystis gene set, and that Blastocystis and human are separated by a large phylogenetic distance, we note that the absence of conserved flagellar protein genes in Blastocystis is not changed when Proteromonas/Cafeteria orthologues are used as search queries instead. Cross-checking their gene names in Table S1 (highlighted in yellow shading) shows that no Blastocystis orthologues were identified by OrthoMCL, although paralogues for KIF3C are present, as noted above.
We used a large-scale dataset of flagellar associated proteins generated previously 30 for their investigation of stramenopile flagella (Tables S4A-S4F). As this dataset included 592 proteins that had been implicated as acting in the flagellum, but were not necessarily exclusive to the organelle, we instituted a series of bioinformatic filters aimed at identifying strong candidates for exclusive action in our structures of interest. The dataset was first searched against the Blastocystis genome as a negative control. Any proteins present in Blastocystis were removed as likely acting in other cellular processes. The resulting 236 protein dataset was then searched against a curated dataset of stramenopile genomes and transcriptomes. All 236 proteins were searched via the AMOEBAE bioinformatic workflow 76 that incorporates BLAST analysis against predicted proteins and nucleotide contigs, as well as HMMer analyses, and applies a reciprocal best hit e-value cut-off.
To increase the selection for proteins likely acting exclusively at flagella, we filtered the resulting dataset to identify proteins present in 7 of the 11 organisms possessing tinselated flagella, to account for possible false negatives in the genomes and/or transcriptomes, but importantly absent in the flagellum-lacking P. tricornutum. This yielded 116 candidates, including 37 previously annotated as flagellar associated (dynein, kinesin, IFT, radial spoke, flagellar) and 54 conserved unknown proteins. To identify candidate mastigoneme proteins, these 116 proteins were filtered to identify those missing in the combined Halocafeteria seosinensis transcriptome and genomic datasets. Of the 37 such proteins, only 7 were missing in all three organisms lacking tinselated flagella (P. lacertae, H. seosinensis, 32 and Incisomonas marina 31 ), with the large remainder being present in P. lacertae, but missing from the other two.
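The presence/absence filter described above amounts to a simple set operation per protein. The sketch below assumes that "present in 7 of the 11 organisms" means at least 7, and uses hypothetical taxon labels and a hypothetical function name; it is not the AMOEBAE workflow itself.

```python
def flagellar_candidates(presence, tinselated, negative_control, min_present=7):
    """Select proteins found in enough tinselated taxa but not in the control.

    presence: dict mapping protein name -> set of taxa with a homolog
    tinselated: set of taxa possessing tinselated flagella
    negative_control: label of the flagellum-lacking taxon
    min_present: minimum number of tinselated taxa required (assumed >= 7)
    """
    keep = []
    for protein, taxa in presence.items():
        n_tinselated = len(taxa & tinselated)
        if n_tinselated >= min_present and negative_control not in taxa:
            keep.append(protein)
    return sorted(keep)
```

The same function could be reapplied with a different negative-control set to narrow the candidates further, as was done when screening against H. seosinensis.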
It is worth reiterating that the list of candidate proteins for flagellum and mastigonemes that we have generated is intentionally non-exhaustive, as it excludes any proteins that may have redundant cellular functions or localizations, and could have been repurposed and thus retained in taxa that have lost the traits of interest. For example, only 236 of the 592 flagellar associated proteins reported previously 30 were retained; the remainder were excluded because their presence in flagellum-lacking taxa strongly suggests promiscuous or redundant function, although this in no way invalidates their flagellar functions. Similarly, the three known mastigoneme-associated proteins were rejected by our filters due to their presence in H. seosinensis. This notwithstanding, our analysis has generated a list of 37 candidate proteins for investigation as to their involvement in mastigonemes. Moreover, the fact that 30/37 such candidates are present in P. lacertae but absent in the other two non-tinselated taxa is consistent with the hypothesized homology between somatonemes and mastigonemes, the first such molecular evidence brought to bear on this argument.

Membrane-trafficking proteins
Rabs identified previously in Phytophthora sojae 34 served as queries in BLASTp and tBLASTn searches against P. lacertae predicted proteins and transcriptome, respectively. All identified hits meeting the E-value threshold of 1e-04 were subjected to reverse BLAST against a home-built database of GTPases and the NCBI nonredundant database. Only those (Figure S1B) that recovered Rab as their best BLAST hit in the reverse search in at least one of the databases were added to the dataset from a previous analysis. 34 Rabs were aligned using MAFFT v7.458 77 under the L-INS-I strategy with a maximum of 1,000 iterations, and poorly aligned positions were removed with trimAl v1.4 78 using the -gt 0.5 option. Maximum-Likelihood trees (Data S1) were inferred using the LG+C20+F+G model, the

Transmission and immuno-electron microscopy
Three tubes of P. lacertae culture (4-10 days post passage) were pelleted at 800 × g for 10 minutes at room temperature. The supernatant was discarded, and each pellet was resuspended in 2 ml of 2.5% glutaraldehyde (Sigma Aldrich) in 100 mM sodium cacodylate (Sigma Aldrich) buffer pH 7.2 and left to fix for two hours at room temperature. Following fixation, the sample was pelleted at 1,900 × g for two minutes and washed twice in cacodylate buffer for 10 minutes to remove the fixative. Once pelleted, the buffer was discarded and the pellet was resuspended in 500 μl of the same buffer, and subsequently 50 μl was transferred into another tube and warmed in a 55 °C water bath for five minutes. 50 μl of 3% low melting point agarose was added to the cells, and using a glass pipette, the mixture was quickly transferred into previously made gaskets (plastic cut and sandwiched between two glass slides, which were clamped together, to allow the gel to form a thin layer) and stored at 4 °C for 5-10 minutes until the gel had set. Once removed from the fridge, the gel was cut into very thin pieces and transferred into a drop of Alcian blue-0.1% acetic acid dye, after which it was gently removed from the dye using a bent toothpick and placed in 3 ml of buffer to remove excess dye. Using a glass pipette, the buffer was removed, and care was taken not to remove gel fragments. 1-1.5 ml OsO4 (made up from 1 ml 4% OsO4, 1 ml milli-Q water and 2 ml 200 mM cacodylate buffer) was added and the sample was left at room temperature for 1 hour. Following this step, OsO4 was discarded, and the sample was washed once for 10 minutes in 50% ethanol and was left overnight in 70% ethanol at 4 °C.
The following day, the sample was washed once in 90% ethanol and then three times in 100% ethanol. Following this wash step, the ethanol was discarded, and the fragments were washed twice in 3 ml propylene oxide. This was removed and 50% propylene oxide/50% low viscosity (LV) resin (12 g LV resin, 4 g VH1 hardener, 9 g VH2 hardener and 0.63 g LV accelerator) was added and left for 30 minutes at room temperature. The 50/50 mix was removed and 100% LV resin mixture was added and left for 90 minutes. Following this, 10-12 fragments were transferred into fresh LV and left for another 90 minutes. Using a Pasteur pipette, 6 ml LV resin was put in a small mould and fragments were placed a small distance from the edge of the mould and gently pushed to the bottom using a bent toothpick. They were then placed in a 60 °C oven for 20-24 hours in preparation for sectioning. To section, the mould was cut where the cells were most concentrated (following light microscopy) and were superglued onto blank resin capsules where it was filed down, and the edges of the capsule were cut away using a glass knife to leave the mould raised. The knife has a boat at the back which was filled with milli-Q water. The automated knife was used to cut very thin (few microns thick) slices from the block, which would stack in the water. To expand them, they were exposed to chloroform vapours. Once enough had been collected, roughly seven slices were attached to a slot grid coated in plastic.
To stain, a rectangle of dental wax with labelled columns was covered in milli-Q water and then sealed in parafilm. In each of the columns, a drop of 4.5% uranyl acetate was placed at the top of each, then below that a drop of milli-Q water, then another below that. The slot grid was placed on the uranyl acetate to stain for 45 minutes, then washed gently under milli-Q then placed on the drop of milli-Q, and repeated. It was dried using filter paper. On another grid of wax, which was also wrapped in parafilm, two drops of milli-Q were placed below a drop lead acetate (in this container, the space was filled with potassium hydroxide). The slot grid was placed in the lead for 7 minutes and transferred to the first, then second milli-Q drop, dried on filter paper, and left for a short while underneath a light (while being kept in the air by forceps).
For immuno-electron microscopy (IEM), aspirated cultures of P. lacertae were fixed for 1 hour in freshly prepared phosphate buffered saline (PBS) solution containing 4% formaldehyde and were then washed several times with PBS. IEM samples were suspended in LR white resin (Agar Scientific). Resin permeation was aided by placing the samples in a vacuum for 2 minutes. The resin was then aspirated and replaced with fresh resin and the samples transferred into gelatine (Agar Scientific) capsules and hardened for 15 hours in a pre-warmed 60 °C oven. The hardened blocks were then polished and subsequently sectioned by ultra-microtome at a thickness of 70 μm, then placed on gold EM grids with approximately five sections per grid. Immuno-staining of the IEM grids was performed in humidifying chambers. Blocking of the samples was achieved via a 1-hour incubation in 2% bovine serum albumin in PBS with 0.05% Tween 20. Primary antibody binding was performed by 15-hour incubations with the Pex19 antibody, at three dilutions (1:10, 1:20 and 1:50) at 8 °C. The IEM grids were subsequently incubated for 30 minutes at room temperature, with the corresponding gold-conjugated secondary antibodies. Counter-staining was achieved via a 15-minute incubation with 4.5% uranyl acetate in PBS and a 2-minute incubation in Reynold's lead citrate.
Both TEM and IEM grids were imaged in a Jeol 1230 Transmission Electron Microscope operated at 80 kV and images were captured with a Gatan OneView digital camera.

Scanning electron microscopy (SEM)
Specimens of Blastocystis and P. lacertae were prepared for scanning electron microscopy (SEM) from cultures (Blastocystis: xenic culture from Betts et al. 103 ; P. lacertae: axenic culture in LYI-S-2 medium + adult bovine serum). Specimens were deposited with a pipette from the culture tubes into hand-made baskets [top end of a 1,000 μl pipette tip fixed with silicon to a 5 μm polycarbonate membrane filter (Millipore Corp.)] and placed in 12-well culture plates filled with PBS. A piece of Whatman No. 1 filter paper was mounted on the lid of the well plates and saturated with 4% (w/v) OsO4. The lid was closed on the well plate and the specimens were fixed by OsO4 vapours for 30 minutes in the dark. Five drops of 4% (w/v) OsO4 were added directly to the basket and the specimens were fixed for an additional 30 minutes. The filters were washed with water and dehydrated with a graded series of ethanol. Filters were critical point dried with CO2, mounted on stubs, sputter coated with 5 nm of platinum, and viewed using a scanning electron microscope Hitachi S-4300 (Hitachi, Tokyo, Japan).