Single-Cell Genomics Reveals the Divergent Mitochondrial Genomes of Retaria (Foraminifera and Radiolaria)

ABSTRACT Mitochondria originated from an ancient bacterial endosymbiont that underwent reductive evolution by gene loss and endosymbiont gene transfer to the nuclear genome. The diversity of mitochondrial genomes published to date has revealed that gene loss and transfer processes are ongoing in many lineages. Most well-studied eukaryotic lineages are represented in mitochondrial genome databases, except for the superphylum Retaria—the lineage comprising Foraminifera and Radiolaria. Using single-cell approaches, we determined two complete mitochondrial genomes of Foraminifera and two nearly complete mitochondrial genomes of radiolarians. We report the complete coding content of an additional 14 foram species. We show that foraminiferan and radiolarian mitochondrial genomes contain a nearly fully overlapping but reduced mitochondrial gene complement compared to other sequenced rhizarians. In contrast to animals and fungi, many protists encode a diverse set of proteins on their mitochondrial genomes, including several ribosomal genes; however, some aerobic eukaryotic lineages (euglenids, myzozoans, and chlamydomonas-like algae) have reduced mitochondrial gene content and lack all ribosomal genes. Similar to these reduced outliers, we show that retarian mitochondrial genomes lack ribosomal protein and tRNA genes, contain truncated and divergent small and large rRNA genes, and contain only 14 or 15 protein-coding genes, including nad1, -3, -4, -4L, -5, and -7, cob, cox1, -2, and -3, and atp1, -6, and -9, with forams and radiolarians additionally carrying nad2 and nad6, respectively. In radiolarian mitogenomes, a noncanonical genetic code was identified in which all three stop codons encode amino acids. Collectively, these results add to our understanding of mitochondrial genome evolution and fill in one of the last major gaps in mitochondrial sequence databases.

In order to obtain mitochondrial genome sequences from Retaria, we chose to use singlecell approaches. Single-cell genomics can effectively recover mitochondrial genomes from diverse protists (10,61,62). Even though most species of Foraminifera and Radiolaria are not in culture, contain a multitude of symbionts (63,64), and show high levels of intragenomic polymorphisms (65,66), we show that single-cell approaches can effectively recover mitochondrial genomes from these complex assemblages. Our data demonstrate that foraminiferan and radiolarian mitochondrial genomes have an overlapping but reduced gene complement compared to other sequenced rhizarians, similar to other reduced mitochondrial genomes from other lineages. Retarian mitochondrial genomes do not encode ribosomal proteins or tRNAs. However, they do contain truncated and divergent small and large rRNAs and contain only 14 or 15 protein-coding genes, including nad1, -3, -4, -4L, -5, and -7, cob, cox1, -2, and -3, and atp1, -6, and -9, with forams and radiolarians additionally carrying nad2 and nad6, respectively. An alternative genetic code was identified in radiolarian mitogenomes in which all three stop codons encode amino acids (TGA = W, TAG = Y, and TAA = Y/stop). These results further add to our understanding of mitochondrial genome evolution across the eukaryotic tree of life.

RESULTS AND DISCUSSION
Retarian mitochondrial, but not nuclear, genomes can be readily recovered using single-cell methods. We isolated individual cells and Illumina sequenced and assembled mini-metagenomes of 31 Foraminifera from 15 species (which are impossible to separate from their symbionts) and single-cell amplified genomes (SAGs) of 5 Radiolaria from 2 species (see Table S1 in the supplemental material for a complete list). Foram minimetagenomes are referred to here as SAGs. One additional foraminiferan metagenome (Globobulimina sp.) was downloaded and reassembled from the NCBI sequence read archive (SRA accession number SRX3312059 [67]). Assemblies from Calcarina, Neorotalia, Lithomelissa, and Acanthometra SAGs are available for BLAST at SAGdb (https://evocellbio .com/SAGdb/macher_et_al/).
Both forams and radiolarians associate with many eukaryotic and bacterial endosymbionts (68)(69)(70), making it difficult to obtain bona fide sequence data from either lineage. To assess the contamination in foram and radiolarian SAGs, we collected all 18S and 16S sequences from all assemblies using Cafeteria roenbergensis 18S and Escherichia coli 16S sequences as BLAST queries. We found foraminifera 18S genes or gene fragments in 23 of 31 SAGs from 13 of 15 species (Table S1). We also identified specific symbiont 28S sequences (from a dinoflagellate symbiont) or rbcL (from diatom symbionts) in all foram SAGs and species except Calcarina mayori and the reassembled Globobulimina (which is expected, since Globobulimina does not contain photosymbionts) ( Table S2). The inability to identify the foram 18S genes in all specimens is likely due to their extreme within-cell variability (65,71), which prevented proper assembly. 16S BLAST searches recovered diatom symbiont chloroplast and mitochondrial genes. In addition, 16S sequences from two common bacterial genera were also recovered (Burkholderia and Cutibacterium). Blob plots from foram assemblies confirm 18S BLAST findings as large contigs of symbiont organelles ( Fig. S1A and B). From these data, we concluded that our foram SAG assemblies predominantly contain symbiont contigs, with only some representation from the host nuclear genome.
In radiolarians, we obtained high-coverage contigs with complete 18S sequences only from radiolarians (Table S2). In Acanthometra and Amphibelone (nc69, -78, -87, and -96) SAGs, a few contaminating 18S sequences (e.g., diatom, cryptophyte, and ciliate) were detected, but these contigs were fragmented with low coverage, indicating relatively few eukaryotic contaminants. Similarly, only a few fragmented low-coverage 16S contigs were recovered, again indicating very little prokaryotic contamination. These results are corroborated by blob plots showing relatively little contamination in acantharian SAGs (Fig. S1C). In the Lithomelissa SAG (r2m), only radiolarian 18S sequences were recovered. However, many high-coverage bacterial contigs containing 16S sequences were identified, indicating that eukaryotic contamination in this SAG was low but bacterial contamination was very high. These 18S and 16S results are corroborated by blob plots showing a degree of bacterial and eukaryotic contamination but a large proportion of "unknown" reads with no similar hits in the NCBI nonredundant database (Fig. S1D). To assess the contamination of our nuclear data, we used a phylogenetic placement approach to assess SAG contamination (Fig. S2). Forams were excluded because they lacked sufficient identifiable nuclear contigs. Briefly, we extracted BUSCO proteins from SAG assemblies and added them to existing alignments from EukProt (72). Even with low radiolarian BUSCO scores (nc69, 10.6%; nc78, 5.1%; and nc96, 6.3% for Acanthometra SAGs; nc87, 3.9% for Amphibelone SAGs; and 11.4% for Lithomelissa SAGs), Acanthometra SAGs were correctly placed with full support alongside the only radiolarian (Astrolonche serrata) in the EukProt data set (Fig. S2). Conversely, the Lithomelissa SAG was placed within alveolates with full support (Fig. S2), suggesting unseen eukaryotic contamination, even though no contaminating 18S could be detected. Collectively, these data indicate that our radiolarian SAGs contain a substantial amount of radiolarian nuclear contigs, though the Lithomelissa SAG contains a large degree of bacterial and possibly eukaryote contamination.
Since mitochondrial genomes are often overrepresented in genome assemblies (10), we sought to identify foram and radiolarian mitochondrial genomes in our SAGs. Using protein sequences encoded by the Andalucia godoyi mitochondrial genome, one of the most gene-rich mitogenomes known (12), we identified several putative retarian mitochondrial contigs in foram and most radiolarian SAGs. Amphibelone SAG nc87 (96% identical 18S sequences to other nc SAGs) lacked any obvious mitochondrial contigs and was not investigated further. Since many assemblies exhibited both eukaryotic and prokaryotic contamination, great care was taken to inspect the validity of each contig manually. In forams, the putative mitochondrial contigs had orders-of-magnitude-higher read coverage and much lower GC content than symbiont or putative foram nuclear contigs (seen clearly in blob plots [ Fig. S1A and B]). Contigs representing nearly complete or complete symbiont organelle genomes were also found in many foram SAGs, though these contigs had much lower coverage than the foram mitochondrial genomes ( Fig. S1A and B). In radiolarians, the results were less clear-cut. While the coverage (;40 to 60Â) of the putative mitochondrial contigs was much higher than the median for the SAG (;3 to 5Â for the Acanthometra SAGs and ;11 for Lithomelissa, likely due to some very highcoverage contigs [ Fig. S3]), the GC content was similar to that of the putative nuclear contigs (Fig. S1). Thus, though these contigs had relatively high coverage, they were not clearly separated from the majority of contigs in blob plots. The coverage of mitochondrial genome contigs and contigs containing radiolarian 18S sequences had similar coverage (;40 to 60Â). Since both mitochondrial genomes and 18S sequences are generally found in multiple copies in a cell, we reasoned that we likely sequenced bona fide mitochondrial genomes and not nuclear mitochondrial genomes (NuMts), which would likely have much lower read coverage.
Retarian mitochondrial genomes carry a reduced gene complement. From each assembly, we extracted mitochondrial contigs collectively representing the putatively complete mitochondrial gene complement from 16 foraminiferan and two radiolarian species (Fig. 1). We obtained complete circular-mapping mitochondrial genomes of the forams Calcarina hispida and Neorotalia gaimardi. The mitogenomes were 46 kb (Calcarina hispida) and 50 kb (Neorotalia gaimardi) long, and each had the same set of 14 protein-coding genes (Fig. 2). We recovered contigs with mitochondrial genes from two species of radiolarians (Lithomelissa sp. and Acanthometra sp.) that almost completely overlap the foraminiferan complement (Fig. 1A). We concluded from these data that we likely extracted the complete, or nearly complete, coding complement of these radiolarian mitochondrial genomes. However, we were unable to recover complete circular mitochondrial genomes from either radiolarian, likely due to repetitive intergenic regions that prevented proper assembly. To complete these genomes, we attempted to link contigs using primers designed to PCR amplify missing regions between contigs but were unsuccessful, likely due to complex repetitive regions.
Between the two phyla, retarians displayed a near-identical gene complement, including cytochrome c oxidase subunits (cox1, cox2, and cox3), cytochrome b (cob), and ATP synthase subunits (atp1, atp6, and atp9) and NADH dehydrogenase subunits (nad1, nad3, nad4, nad4L, nad5, and nad7; nad2 is missing from radiolarians and nad6 is missing from foraminifera), The lack of nad2 in both radiolarian mitochondrial genomes and the lack of nad6 in one radiolarian and both foram mitochondrial genomes is not without precedent, as both are either lost or extremely diverged and transferred to the nuclear genome in euglenids (73). Fragments of large-subunit (LSU)-rRNA (rrnL) and small-subunit (SSU)-rRNA (rrnS) genes were identified in the mitogenomes of the foraminiferans Calcarina hispida and Neorotalia gaimardi and the radiolarian Lithomelissa sp., while only rrnL was identified in the mitogenome of the radiolarian Acanthometra sp.; however, neither full-length ribosomal protein-coding genes nor tRNAs were detected (Fig. 1). The nad9 gene was not found in our retarian mitochondrial genomes, even though all other sequenced rhizarian mitochondrial DNAs contain this gene (10, 74-77) (except for Brevimastigomonas, which has lost complex I altogether). Since most core complex I subunit genes appear to be retained in rhizarian mitogenomes (including those of retarians), missing complex I genes could be carried by the nuclear genome; however, these genes have not been identified in the nuclear genomes of euglenids (73). However, we were unable to find any complex I components in the nuclear assemblies, likely indicative of their incompleteness (BUSCO scores , 10%). Another conspicuous absence from retarian mitochondrial genomes is atp8, which encodes subunit 8 of ATP synthase. Subunit 8 is likely an essential component of ATP synthase in most organisms (77) but appears to be absent in Caenorhabditis elegans (78) and cannot be identified in many rhizarian mitochondrial genomes (77,79). To further confirm that we collected bona fide mitochondrial contigs, we reconstructed the phylogeny of forams with radiolarians as an outgroup using concatenated mitochondrial proteins predicted from the contigs (Fig. S4). The resulting phylogeny at the family level recapitulates the topology seen in 18S rDNA trees of Foraminifera (80)(81)(82), except for Peneroplidae clustering within the Soritidae.
We also obtained 25 fragmented mitochondrial genomes from 14 additional foraminiferan species (see Table S1 for a list of samples) that could not be linked in a single contig but had the same set of 14 protein-coding genes (all for ETC subunits) present in the circular-mapping mitochondrial genomes of Calcarina hispida and Neorotalia gaimardi. We also downloaded the available (meta)genomes of the foraminiferans Reticulomyxa filosa (55) and Astrammina rara (56) but could not identify mitochondrial genes.
Retarian mitochondrial genomes have large intergenic regions. Since we found large intergenic regions in both foraminiferan and radiolarian mitochondrial genomes, we conducted searches for genes or gene fragments within these intergenic regions using blastx (v.2.11.0) (83), mfannot (https://github.com/BFL-lab/Mfannot), and hmmer (v3.3.2) (84). Twenty-four regions were identified as putatively homologous to genes typically encoded by rhizarian mitochondrial genomes (Fig. 2, black lines). Eighteen of these are very similar to fragments of genes present elsewhere within the retarian mitochondrial genomes (atp1, atp6, cob, nad4, nad5, nad7, and cox2). The remaining six fragments are homologous to genes normally present in rhizarians, including nad2 and nad9 in radiolarians and rps12 in a foram and a radiolarian. These fragments could represent pseudogenes or horizontally transferred DNA sequences (85,86) or could reflect past genomic recombinations and rearrangements. In all mitochondrial contigs, small (;50-bp) stretches were nearly identical in many places, differing by one or two nucleotides. In the Lithomelissa sp. SAG, a large contig with similar read coverage was detected that contained these ;50-bp pseudorepeats but no mitochondrial genes or fragments (Fig. 2C). Perhaps the missing mitochondrial rRNAs have diverged beyond recognition.
All three standard stop codons are likely recoded in radiolarian mitochondrial genomes. Deviations from the ancestral standard genetic code have evolved in numerous lineages (87,88). In particular, lineages with extremely low GC content and limited opportunities for recombination (i.e., organellar genomes) exhibit genetic code changes more frequently (89,90). One common trend of genetic code variability, and the easiest to detect, is when stop codons are reassigned as sense codons. The most common version of stop codon reassignment by far is the TGA stop codon being recoded to tryptophan (normally encoded only by TGG) (91). This change has occurred several times across mitochondrial genomes and in other bacterial lineages. The TAA and TAG stop codons can also be recoded. For example, in the mitochondrial genome of certain thraustochytrid stramenopiles, a new stop codon (TTA) (GenBank accession no. AF288091.2) evolved and, in some species, both TAA and TAG were recoded to tyrosine (normally encoded by TAT and TAC) (10). All three stop codons have been recoded in the nuclear genomes of the ciliate Condylostoma magnum (92) and the kinetoplastid genus Blastocrithidia (93,94). In both cases, TGA encodes tryptophan, and TAA and TAG encode glutamine (normally encoded only by CAA and CAG). For Blastocrithidia, authors showed that highly expressed genes have fewer TAA and TAG codons and speculate that changes in tRNA usage enable ribosomes to read through TAA and TAG sense codons in the middle of genes with moderate and low levels of expression, while TAA is still used as a termination codon at the end of transcripts (93,94). Here, we identified a similar example in radiolarian mitochondria, where all three stop codons are likely recoded to sense codons.
To determine the genetic code of radiolarian mitochondrial genomes, we aligned predicted proteins with mitochondrion-encoded proteins from diverse protists. These alignments revealed that in-frame TGA and TAG codons occur at sites often occupied by tryptophan and tyrosine residues, respectively (Fig. 3). Conversely, relatively few in-frame TAA codons are present in conserved domains. The majority of in-frame TAA codons occurred at locations for which there was no consensus amino acid in the alignment (Fig. 3). In Acanthometra, only two genes contained in-frame TAAs (nad5 [8 TAAs] and cox2 [2 TAAs]). All eight nad5 TAA codons were in the 39 region, which appears to have diverged compared to the same region of other nad5 genes. Similarly, the two cox2 TAA codons were also in regions of the gene that are not highly conserved. In Lithomelissa, eight genes contained in-frame TAA codons. Like Acanthometra, Lithomelissa contained a few TAAs that aligned with conserved residues (one glutamine, one arginine, and the other tyrosine) in the middle of protein alignments (nad7 and cox3).
When assessing pairwise alignments of the radiolarian proteins (e.g., pairwise alignment of Lithomelissa and Acanthometra cox1), of the 29 in-frame TAA codons, nearly half (14 TAAs) aligned with a tyrosine, phenylalanine, or tryptophan, and the majority (19 TAAs) aligned with a hydrophobic residue. In addition, 25 of 27 radiolarian mitochondrial protein-coding genes had a TAA codon near where the end of the protein is predicted. Two Acanthometra genes (nad4L and cox2) lacked stop codons and were contiguous with the open reading frames of cob and nad1, respectively. These data all suggest that a mechanism similar to the one proposed for the Blastocrithidia nuclear genome may be in place in radiolarian mitochondrial genomes. While TGA and TAG encode tryptophan and tyrosine, respectively, TAA appears to have a dual role, likely encoding tyrosine in some proteins at a few locations but acting primarily as a stop codon. Curiously, several proteins lack any in-frame TAA codon. Perhaps, similar to the case in Blastocrithidia, the most highly expressed mitochondrial proteins lack in-frame stop codons. The atp9 gene is among the most highly expressed and has no TAA or TAG present in either radiolarian. These data indicate that the mitochondrial genetic code in radiolarians has diverged from the ancestral code and has recoded all three stop codons to code for amino acids. While TGA and TAG are recoded to tryptophan and tyrosine, TAA codons sometimes encode tyrosine but are the primary, and likely only, stop codon.
Retarian mitochondrial genomes contain fragmented rRNA genes, divergent atp6 genes, and split nad genes. In most eukaryotic lineages, mitochondrial genomes encode a combination of proteins involved in electron transport and ATP synthesis, ribosomal proteins, and a few auxiliary proteins involved in protein maturation or translocation (3). However, five major lineages (euglenids, retarians, chlorophycean algae, myzozoans, and animals [ Fig. 4]) have completely transferred all genes for mitoribosomal proteins to the nucleus, and two others are close behind (fungi contain only rps3, and glycomonads [Euglenozoa] contain at most rps3 and rps12) (34,77,(95)(96)(97). In all these lineages except animals and fungi, the EGT of mitoribosomal proteins has coincided with an extreme reduction or fragmentation of the mitochondrial rRNAs (Fig. 4, dark blue circles) (98). Animal and fungal mitochondrial rRNAs are truncated, but not to the extent of other mitochondrial rRNAs that are extremely divergent and nearly undetectable.
In addition to highly divergent rRNA genes, euglenozoans, retarians, chlorophycean algae, and myzozoans possess extremely divergent atp6 genes (Fig. 4, yellow circles). Since a few TAA codons appear in the 59 region of the Lithomelissa atp6 gene, we decided to model the Atp6 protein using Alphafold2 (99) to determine if the N-terminal extension is part of the protein or represents a noncoding upstream sequence. Alphafold2 modeled full-length subunit a into a structure that better resembles the classic subunit a (Fig. S5). This suggests that the TAA codons are in part of the coding sequence of Lithomelissa atp6. Divergence of ATP synthase structure can have consequences for mitochondrial crista morphologies (e.g., chlorophycean algae, euglenids, kinetoplastids, and apicomplexans all have unique crista morphologies) (100-103). Since mitochondrial crista architecture that departs from classic lamellar and tubular morphologies present in most other eukaryotes has also been reported for Foraminifera (104) and Radiolaria (105,106), retarian ATP synthase structures represent excellent candidates for future investigation into the structural and functional diversity of this amazing protein complex (107).
Curiously, nad1 is split into two parts in the Foraminifera Calcarina hispida and Neorotalia gaimardi. This suggests that some trans-splicing might be present in Retaria, similar to what has been reported for mitochondrial genes in other eukaryotes (108,109). It is also possible that two peptides are separately expressed and merged into a functional protein, as has been found in Chromera plastids (110). Furthermore, we identified a conserved frameshift in cox1 of all four analyzed species of the foraminiferan order Miliolida, which suggests that a mechanism for stop codon read-through or posttranscriptional mRNA modification of this codon exists in this lineage. Manual insertion of a single N into the miliolid sequences resulted in a continuous open reading frame (ORF), which, when translated, spans the entire length of the cox1 protein sequence (111). Posttranscriptional insertion modifications have evolved in several protist lineages, including euglenids and diplonemids (95,112). As the same pattern was found in all analyzed miliolid Foraminifera but not in any rotaliid species, we conclude that this is a unique feature of Miliolida mitochondria, which might be of interest in the future characterization of this group.
Conclusions. Why do mitochondrial genomes vary so drastically across eukaryotes? Specifically, what triggers the wholesale transfer of mitochondrial ribosomal genes to the nucleus in so many lineages? There are several possible benefits to mitochondrionto-nucleus gene transfer (113), and given enough time, mitochondrion-to-nucleus EGT is considered mathematically inevitable (114). Perhaps the diversity of mitochondrial genomes is simply a result of these evolutionary forces playing out over billions of years, with no functional cell biological consequences. However, this seems a somewhat unsatisfying answer given the diversity of mitoribosomal structures that have recently been solved (115)(116)(117).
In any case, retarian mitochondrial genomes represent a newly discovered ancient independent reduction in organellar gene content. The reduced gene complement of retarians displays more similarities to the mitochondrial genomes of euglenozoans, myzozoans, chlorophycean algae, animals, and fungi than it does to that of other rhizarians (Cercozoa). While several of these lineages may seem obscure and disparate, it is important to note that each lineage diverged in the mid-Neoproterozoic or earlier (118) (Fig. 4). These lineages therefore possess histories as deep and rich as those of animals and fungi, which are each traditionally considered independent "kingdoms." Given the ancient divergence of forams and radiolarians, the strikingly reduced mitochondrial genomes of Retaria have persisted without substantial change for over 500 million years. The persistence of mitochondrial gene content over large time spans suggests that mitochondrion-to-nucleus gene transfer does not occur consistently but rather occurs in relatively short macroevolutionary bursts. Further investigations into more deeply branching taxa at nodes of apparent sudden mass EGT will clarify this notion. In sum, the retarian mitochondrial genomes presented here bridge a major gap in our understanding and provide the first glimpse into the mitochondria of this diverse group of ancient protists.

MATERIALS AND METHODS
Sample collection. (i) Foraminifera samples. We analyzed 31 benthic Foraminifera cells (15 species) from the Spermonde Archipelago in Indonesia and from Coral Bay in Australia (see Table S1 for samples and locations). All specimens were stored in .90% ethanol after sampling and transferred to the Naturalis Biodiversity Centre laboratory for morphological species identification and molecular analyses. Specimens were sorted into morphotypes and identified and photographed using a ZeissDiscovery v12 stereomicroscope (Zeiss, Oberkochen, Germany).
(ii) Radiolaria samples. Marine surface water plankton samples were collected from the Pacific Ocean near the California coast (33.454219, 2117.705215) by towing an 87-mm-mesh plankton net from the back of a kayak on 7 February 2021 at 10:00 a.m. Bulk environmental plankton samples were immediately aliquoted into 15-mL Falcon tubes and preserved with RNAlater. Plankton samples were stored on ice during transportation to the lab. Radiolarian cells were identified by morphology and imaged prior to single-cell isolation under an inverted microscope using a micropipette. Individual cells were washed four times in DNase-and RNase-free water to remove extracellular material from each radiolarian. This process was repeated twice with new water each time before each cell was transferred to 4 mL of RNAlater and then stored at 220°C before further processing.
DNA extraction and sequencing. (i) Foraminifera. Single Foraminifera specimens were dried in sterile 1.5-mL Eppendorf tubes and ground to a fine powder using a porcelain mortar and pestle. Total genomic DNA extraction was carried out using the QIAamp DNA Micro kit (Qiagen; Hilden, Germany) as described in reference 111. After extraction, DNA quantification was conducted using the FragmentAnalyzer system (Agilent Technologies, Santa Clara, CA, USA). Since extracted DNA was already fragmented to an average length of less than 500 bp, no further fragmentation using ultrasonication or enzymes was conducted.
Shotgun metagenomic libraries were prepared using the NEBNext Ultra II DNA library preparation kit (New England Biolabs, Ipswich, MA, USA) with the corresponding NEBNext multiplex oligonucleotides for Illumina, following the manufacturer's protocol but reducing volumes by 50%. Final concentration and fragment size were checked on the Tapestation system (Agilent Technologies, Santa Clara, CA, USA). All samples were pooled in equimolar amounts before being sent for sequencing on the Illumina NovaSeq 6000 platform (2 Â 150-bp read length) at Baseclear (Leiden, The Netherlands), targeting 5 million reads per sample.
(ii) Radiolaria. Single-cell DNA extractions were performed using the MasterPure DNA and RNA purification kit (Epicentre Biotechnologies) following the protocol as written, with the addition of a 30-min incubation with a solution of lysis buffer and proteinase K at 65°C and 1,000 rpm. Purified total genomic DNA was eluted into 4 mL Tris-EDTA (TE) buffer and quantified using a Qubit HS double-stranded-DNA (dsDNA) kit.
Genomic DNA from each cell was amplified using the Repli-G Advanced DNA single-cell kit and protocol (Qiagen) for amplifying purified genomic DNA. Final concentration and fragment size were checked using the Tapestation and Qubit systems. An aliquot of each singly amplified genome containing a total of 500 ng DNA was provided to the ASU Genomics Facility for library preparation using KAPA Biosystem's LTP library preparation kit before the samples were sequenced on the Illumina NovaSeq 6000 platform targeting 10 million 2 Â 150-bp reads per sample.
Bioinformatic analysis. (i) Foraminifera. MultiQC (119) was used for the quality assessment of raw reads. Megahit (120) was used for the initial assembly of reads into contigs, which were loaded into Geneious Prime (v.2020) together with raw reads. Contigs were mapped against the mitochondrial genome of the rhizarian Lotharella oceanica deposited in GenBank (accession number NC_029731.1 [77]) with up to 50% mismatch, a word length of 5, and up to 10% gaps (gap size, 10) allowed. Since none of the assembled contigs could be mapped, raw reads and contigs were mapped against the L. oceanica reference with the settings mentioned above and against Foraminifera mitochondrial cytochrome oxidase subunit I (COI) barcode sequences published in reference 121. Regions with high coverage of mapped reads or with mapped contigs were manually inspected. When mapped contigs did not represent a full mitochondrial genome (which was the case only for Neorotalia gaimardi), mapped reads were used as a reference for repeated mapping with the Geneious Prime mapper, with a minimum of 100 bp overlap, a maximum of 1% mismatch, and no gaps allowed. Mapping was repeated until no further reads could be mapped. The resulting contigs were checked for ORFs with mitochondrial translation table 4, which was reported previously for protist mitochondrial genomes (10).
Contigs were submitted to the mfannot mitochondrial annotation web server of the University of Montréal (https://megasun.bch.umontreal.ca/cgi-bin/mfannot/mfannotInterface.pl). ORFs identified as the cytochrome oxidase subunit 1 gene (cox1) were searched against the NCBI GenBank reference database (122) and the Foraminifera cox1 database (121) using BLASTn to identify the cox1 sequence stemming from putative symbionts and the putative foraminiferal cox1. Annotations were manually curated in Geneious Prime. ORFs that were not annotated by mfannot were translated to proteins, subjected to transmembrane prediction with TMHMM (123), and searched against Pfam (124), UniProt (125), Swiss-Prot (126), and Ensembl (127) databases using the hmmer web server (84) to check for potential matches with known mitochondrial genes. When a complete mitochondrial genome could not be obtained, the putative foraminiferal mitochondrial genes were identified by mapping reads against the newly assembled Calcarina hispida and Neorotalia gaimardi mitochondrial genomes as described above.
To verify that foraminiferal mitochondrial genes could also be obtained from previously published data sets, we downloaded the Globobulimina (order Rotaliida) metagenome from the NCBI Sequence Read Archive (accession number SRX3312059 [67]) and assembled the foraminiferal mitochondrial genes as described above. Furthermore, we downloaded the genomic contigs of the foraminiferans Reticulomyxa filosa (55) and Astrammina rara (56) and searched for mitochondrial genes as described above, though none could be found.
(ii) Radiolaria. MultiQC (119) was used to trim and filter raw fastq reads, which were then normalized with BBNorm (an addition to BBMap v.38.12). SAGs were assembled using SPAdes (v.3.15.2) (128). Normalized reads were mapped back to contigs with BBMap, and genome completeness was assessed with BUSCO (v.5.1.2) (129). BlobTools (v.1.0) (130) was used to visualize contigs with similar read coverage and GC content. Mitochondrial contigs were identified using Andalucia godoyi mitochondrion-encoded proteins as queries in tblastn searches against radiolarian SAG assemblies. The mitochondrial contigs identified were manually stitched together by identifying regions with overlaps of .50 bp between contigs with similar read coverages.
Putative mitochondrial contigs were submitted to the mfannot mitochondrial annotation (https:// megasun.bch.umontreal.ca/apps/mfannot/) web server. Because mfannot did not identify full-length rRNA genes within our mitochondrial genomes, nhmmer (131) was used to search each genome for rRNA genes using manually curated rRNA databases. Fragments of mitochondrial genes were also identified by searching intergenic regions and open reading frames that were not annotated by mfannot against a manually curated database of mitochondrial protein sequences with representatives from all protist genera with a sequenced mitochondrial genome in NCBI GenBank using blastx. Intergenic regions and ORFs with at least four hits from the same gene were considered significant enough for annotation on the mitochondrial genome maps. Annotations were added manually using Geneious Prime.
Stop codon analysis. Amino acid multiple sequence alignments were used to assess the locations within a mitochondrial gene at which radiolarians have a stop codon. Alignments were generated with MUSCLE (132) using radiolarian genes identified by mfannot and genes from every available protistan genus in GenBank. If more than one mitochondrial genome existed for a genus in GenBank, then the most recent two mitochondrial genomes from different species were chosen as representatives of that genus. The total number of stop codons present within each mitochondrial gene from the two radiolarian mitochondrial genomes was visually counted. The 50% consensus amino acid identities at locations for which a radiolarian mitochondrial gene had an in-frame stop codon were also tallied to assess which amino acids radiolarian stop codons are potentially coding for instead. Radiolarian stop codons that occurred at locations where the consensus alignment contained a gap or where the majority of genes within the alignment were not present (the very beginnings and ends of the alignment) were not counted. Pairwise alignments of radiolarian mitochondrion-encoded proteins were performed using MUSCLE and inspected manually.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only.

ACKNOWLEDGMENTS
This material is based upon work supported by the National Science Foundation under grant DBI-2119963 (J.G.W.).
We are grateful to T. Coots for assistance with computational analyses.