A unicellular relative of animals generates an epithelium-like cell layer by actomyosin-dependent cellularization

In animals, cellularization of a coenocyte is a specialized form of cytokinesis that results in the formation of a polarized epithelium during early embryonic development. It is characterized by coordinated assembly of an actomyosin network, which drives inward membrane invaginations. However, whether coordinated cellularization driven by membrane invagination exists outside animals is not known. To that end, we investigate cellularization in the ichthyosporean Sphaeroforma arctica, a close unicellular relative of animals. We show that the process of cellularization involves coordinated inward plasma membrane invaginations dependent on an actomyosin network, and reveal the temporal order of its assembly. This leads to the formation of a polarized layer of cells resembling an epithelium. We show that this epithelium-like stage is associated with tightly regulated transcriptional activation of genes involved in cell adhesion. Hereby we demonstrate the presence of a selforganized, clonally-generated, polarized layer of cells in a unicellular relative of animals.


Introduction
Cellularization of a coenocyte -a multinucleate cell that forms through sequential nuclear divisions without accompanying cytokinesis -is a specialized form of coordinated cytokinesis that results in cleavage into individual cells. Cellularization commonly occurs during development of animals and plants. In both, cellularization results in the formation of a multicellular, spatially organized epitheliumlike structure. Despite the similarities, distinct mechanisms are involved in the cellularization of animal and plant coenocytes. During endosperm development in most flowering plants, coenocytes cellularize through cell wall formation around individual nuclei, forming a peripheral layer of cells surrounding a central vacuole (Hehenberger et al., 2012) . This is coordinated by the radial microtubule system and is dependent on several microtubule-associated proteins (Pignocchi et al., 2009;Sorensen, 2002) . In contrast, in a model animal coenocyte, the syncytial blastoderm of the fruit fly Drosophila melanogaster, cellularization is accomplished through plasma membrane invaginations around equally spaced, cortically positioned nuclei (Farrell and O'Farrell, 2014;Mazumdar and Mazumdar, 2002) . This process is driven by a contractile actomyosin network (Mazumdar and Mazumdar, 2002) . It depends on several actin nucleators, such as the Arp2/3 complex (Stevenson et al., 2002) and formins (Afshar et al., 2000) . It also requires multiple actin-binding proteins, including myosin II (Royou et al., 2002) , which mediates actin cross-linking and contractility, as well as septins (Adam et al., 2000;Cooper and Kiehart, 1996) , cofilin (Gunsalus et al., 1995) and profilin (Giansanti et al., 1998) . In addition, it depends on cell-cell adhesion proteins including cadherin, and alpha-and beta-catenin (Hunter and Wieschaus, 2000;Wang et al., 2004) . This coordinated cellularization results in the formation of a single layer of polarized epithelial tissue, also known as cellular blastoderm (Mazumdar and Mazumdar, 2002) . Such cellularization is common in insects, however, whether this mechanism of cellularization is found outside animals, remains unknown.
Among holozoans -a clade that includes animals and their closest unicellular relatives ( Figure 1A) -ichthyosporeans are the only known lineage besides animals that forms coenocytes during their life cycles (Mendoza et al., 2002;de Mendoza et al., 2015) . All characterized ichthyosporeans proliferate through rounds of nuclear divisions within a cell-walled coenocyte, followed by release of newborn cells (Marshall andBerbee, 2011, 2013;Suga and Ruiz-Trillo, 2013;Whisler, 1968) . We have previously suggested that they undergo cellularization (Ondracka et al., 2018;Suga and Ruiz-Trillo, 2013) . However, at present nothing is known about the ichthyosporean cellularization, and whether it involves animal-like mechanisms.

The cortical actin network establishes sites of membrane invagination and generates an epithelium-like layer of cells
To assess whether cellularization in S. arctica involves encapsulation of nuclei by plasma membranes, we imaged the plasma membrane using live time-lapse imaging in the presence of the lipid dye FM4-64 (Betz et al., 1996) . We observed a rapid increase in FM4-64 intensity at the periphery of the coenocyte 30 minutes prior to flip (Figure 2A, panel II, Movie S3 and S4). The plasma membrane invagination sites formed at the periphery and progressed synchronously and rapidly from the outside toward the center of the coenocyte, forming polarized, polyhedral cells ( Figure 2A, panels II-V, Movie S3 and S4). Finally, following flip, cells lost their polyhedral shape and became round, suggesting that they were no longer attached to each other ( Figure 2A, panel VI, Movie S3 and S4).
In animal coenocytes, plasma membrane invagination is associated with dynamic organization of the actomyosin cytoskeleton (Mazumdar and Mazumdar, 2002) . To investigate actin dynamics during cellularization, we took advantage of the timeline described above and imaged coenocytes that were fixed and stained for actin and nuclei (using phaloidin and DAPI, respectively) at different time points during cellularization ( Figure 2B, S2A and S2B). During growth, actin localized exclusively as small patches at the surface of the coenocyte ( Figure S2A, panel I). Only at the onset of cellularization, multiple actin patches increased in size to form actin nodes (Figure 2B,panel I and II,and S2A,panel II). This phase was followed by cortical compartmentalization surrounding the nuclei through gradual formation of an actin filament network solely at the cortex of the coenocyte ( Figure 2B, panel III, and S2A, panels III and IV). Following this cortical compartmentalization, an epithelium-like cell layer was formed by inward growth of the actin filaments from the cortex to generate a structure resembling the cellular blastoderm in Drosophila ( Figure 2B, panels IV and S2A, panels V and VI). During this stage, the actin signal intensity was uneven and higher on the internal side (Movie S5), and nuclei were localized close to the cortex, indicating that cells in this epithelium-like layer are polarized ( Figure 2B, panel IV, Movie S5). The epithelium-like layer progressively grew towards the center of the coenocyte to fill the cavity, until the center space was occupied ( Figure S2A, panel VII). After flip, similar to the plasma membrane organization mentioned above, the epithelium-like layer of cells was reorganized to form spherical cells ( Figure 2B, panel VI, and S2A, panel VIII).
To determine the order of actin organization and plasma membrane invaginations, we assessed the localization of both actin and plasma membrane in fixed samples using phalloidin and a fixable analog of FM4-64 (FM4-64FX). We found that the cortical actin network formed prior to the appearance of the membrane dye ( Figure 2C, panel II). 6 The intensity of FM4-64FX labeling also increased and co-localized with the underlying actin network around the time of plasma membrane invagination (Figure 2C,panels III and IV). This suggests that the cortical actin network determines the site of plasma membrane invagination.
Finally, to determine the timing of cell wall formation, we stained the cells with calcofluor. We observed that labeling co-localized with the membrane marker FM4-64FX around individual cells in the postflip stage ( Figure S2C). This shows that the newborn cells already build the cell wall before the release, as was suggested previously in other Sphaeroforma species (Marshall and Berbee, 2013) .
In early insect embryos, cellularization of the syncytial blastoderm occurs through actin-dependent invagination of the plasma membrane. Here, we demonstrate that the cellularization of the ichthyosporean coenocyte also involves active actin reorganization and membrane invagination at the site of actin cytoskeleton. Additionally, this results in the formation of a polarized epithelium-like layer of cells with an internal cavity that morphologically resembles the cellular blastoderm of insects.

Cellularization is associated with extensive sequential transcriptional waves and is associated with evolutionarily younger transcripts
To gain insight into the regulation of the cellularization of S. arctica, we performed RNA-seq data on synchronized cultures with high temporal resolution, and comprehensively analyzed the dynamics of transcription, alternative splicing, and long intergenic non-coding RNAs (lincRNAs). Because the published genome assembly of S. arctica (Grau-Bové et al., 2017) was fragmented and likely resulted in incomplete gene models, we first re-sequenced the genome combining the Illumina technology with the PacBio long read sequencing technology. The final assembly sequences comprised 115,261,641 bp, and the metrics were greatly improved compared to the previous assembly (Grau-Bové et al., 2017) (Table S1). Ab initio gene annotation resulted in the discovery of novel ORFs due to the absence of repetitive regions in the previous assembly. RNA-seq data was used to improve the ORF prediction, to define the 5' and 3' untranslated regions, and to discover lincNAs. In total, 33,682 protein coding genes and 1,071 lincRNAs were predicted using this combined pipeline (see methods). This final transcriptome assembly was used as the reference transcriptome for further analysis.
To perform the time-resolved transcriptomics, we isolated and sequenced the mRNA from two independent synchronized cultures at 6-hour time intervals during the entire coenocytic cycle, encompassing time points from early 4-nuclei stage throughout growth and cellularization stages until the release of newborn cells ( Figure S3A). 7 We first analyzed transcript abundance during the time series. The majority of the transcriptome (20,196 out of 34,753 predicted protein-coding and lincRNA genes) was transcribed at very low levels (mean expression throughout the time course <0.5 tpm [transcripts per million]) and were removed for the subsequent clustering analysis. Clustering of transcript abundance data from both biological replicates revealed a clear separation between the transcriptomes of the growth stage (12h to 42h time points) and the cellularization stage (48h to 66h time points) ( Figure 3A). Furthermore, we observed that the transcriptome patterns between replicates 1 and 2 were shifted by 6 hours from 48 hours onwards ( Figure 3A), presumably due to differences in temperature and conditions influencing the kinetics of the coenocytic cycle. We thus adjusted the time of the second replicate by 6 hours according to the clustering results, although we emphasize that the clustering analysis did not depend on time. Among the expressed transcripts (defined as mean expression levels higher than 0.5 tpm across all samples; in total 13,542 coding genes and 1,015 lincRNAs), consensus clustering using Clust (Abu-Jamous and Kelly, 2018) extracted 9 clusters of co-expressed transcripts with a total of 4,441 protein coding genes ( Figure 3A), while the rest of the transcripts were not assigned to any coexpression cluster. The assigned cluster membership was robust to using either of the replicates or averaging ( Figure S3B). Visualization by heatmap and t-SNE plot separated the clusters into two meta-clusters containing the growth stage (clusters 1-3, totaling 2,197 genes) and cellularization stage (clusters 4-9, totaling 2,314 genes) clusters ( Figure 3C and 3D). Among the cellularization clusters, we see a clear temporal separation of different transcriptional waves, with clusters 5, 6 and 7 representing the three largest clusters of co-expressed transcripts of early, mid and late cellularization, respectively ( Figure 3B and 3D). Altogether, this shows that cellularization is associated with extensive sharp transcriptional activation in multiple temporal waves, in total affecting 17% of the expressed transcriptome.
In parallel to transcript abundance dynamics, we also assessed alternative splicing (AS) across the time series. This analysis identified 2,022 genes affected by intron retention (12.9% of all intronbearing genes, totaling 4,310 introns), 914 by exon skipping (12.3% of genes with >2 exons, 1,206 exons) and 44 with mutually exclusive exons (0.7% of genes with >3 exons, involving 118 exon pairs) in all samples ( Figure S3C and S3D). Overall, neither the number of AS events nor the number of genes affected vary dramatically along the S. arctica growth cycle ( Figure S3E-S3J). Analysis of AS events over time did not yield any discernible global dynamics, although we found a small number of events differentially present between the growth and cellularization stages ( Figure S3K).

8
Interestingly, skipped exons were more likely to be in-frame (38.63%, compared to 30.33% of in-frame exons in genes with >2 exons, p = 4.34e-05, Fisher's exact test) and yield non-truncated transcripts, a phenomenon commonly observed in animal transcriptomes but not in transcriptomes of other unicellular eukaryotes (Grau-bové et al., 2018) . The transcripts affected by such in-frame exon skipping events are enriched in biological processes such as regulation of multicellular organismal processes, assembly of the focal adhesion complex and cell growth ( Figure S3L). Overall, although pervasive, alternative splicing likely does not play a major role in regulation of the coenocytic cycle and cellularization of S. arctica.
Additionally, we analyzed the dynamics of lincRNA expression. Overall, lincRNAs represent ~3% of total transcript abundance across the time series ( Figure S4A). Among the long non-coding RNAs, 70 lincRNA transcripts clustered with coding genes into temporally co-expressed clusters ( Figure 4A).
Sequence homology search revealed that 24 of the S. arctica lncRNAs were conserved in distantly related ichthyosporean species such as Creolimax fragrantissima, Pirum gemmata and Abeoforma whisleri (estimated to have diverged ~500 million years ago (Parfrey et al., 2011) . This is a remarkable depth of conservation, since animal lincRNAs are not conserved between animal phyla (Bråte et al., 2015;Gaiti et al., 2015;Hezroni et al., 2015) . Other lincRNAs were either specific to S. arctica (511) or conserved only in closely related Sphaeroforma species (536). Comparison of lincRNA by degree of conservation showed no notable difference in %GC content ( Figure S4B) or dynamics of expression during the coenocytic cycle ( Figure S4C). However, we found that conserved lincRNAs were on average longer than non-conserved ones ( Figure S4D) and, strikingly, expressed at much higher levels ( Figure 4B). Among the 24 deeply conserved lincRNAs, 5 clustered in the temporally coexpressed clusters, which is higher than expected by chance (p = 0.0166, Fisher's exact test). Among these, 3 belonged to the cellularization clusters, including, lincRNA asmbl_31839, which has a remarkably high sequence similarity with its homologs from other ichthyosporeans ( Figure S4E). Furthermore, its transcriptional regulation is independent of the transcriptional regulation of its neighboring coding genes (located within 3kb) ( Figure 4C). In summary, we discovered deeply conserved lincRNAs that are expressed at high levels and are transcriptionally activated during cellularization.
Finally, to assess the evolutionary origins of the co-expression clusters, we used a phylostratigraphic analysis to classify genes into evolutionary gene age groups. We carried out orthology analysis of the predicted S. arctica proteome along with 30 representative species from the eukaryotic tree of life to identify "orthogroups" (i.e. groups of putative orthologs between species). S. arctica protein-coding genes clustered in 6,149 orthogroups representing 12,527 genes; the rest of the genes did not have an ortholog outside S. arctica. Next, we inferred the age of each gene using Dollo parsimony (Csuros, 2010) to classify them into phylostrata (sets of genes with the same phylogenetic origin) ( Figure 4D). Analysis of gene expression by phylostrata revealed a trend toward more variable expression throughout the coenocytic cycle in younger genes ( Figure 4E), although their mean expression levels were lower ( Figure S5A). Such a correlation has been observed in animal development, where developmentally regulated genes tend to be of younger origin (Domazet-Lošo and Tautz, 2010) . Analysis of enrichment of gene phylostrata in each gene expression cluster ( Figure S5B) showed that the growth clusters are enriched for paneukaryotic genes. In contrast, we find that the cellularization clusters were enriched for younger genes ( Figure S5B and S5C). Importantly, we found that genes with ichthyosporean origins were significantly enriched in all three of the largest cellularization clusters ( Figure S5B and S5C). Computing the transcriptome age index (Domazet-Lošo and Tautz, 2010) revealed an hourglass pattern, with older genes expressed at later stages of growth, and younger genes expressed during early growth and cellularization ( Figure 4F). Such an hourglass transcriptomic pattern has previously been observed in animal development, where it reflects the morphological similarities and differences of embryos of different taxa (Domazet-Lošo and Tautz, 2010) , and it has been suggested as a conserved logic of embryogenesis across kingdoms (Quint et al., 2012). Despite this, we currently do not have a morphological explanation for this transcriptional hourglass pattern in ichthyosporeans. Altogether, the phylostratigraphic analysis suggests that cellularization is a comparatively younger process, whereas the growth stage represent an evolutionarily ancient process.

Temporal co-expression of actin cytoskeleton, cell adhesion and cell polarity pathways during cellularization
To functionally assess the gene expression clusters, we also carried out gene ontology (GO) enrichment analysis (Table S2). The largest growth cluster (cluster 1) was enriched in GO terms related to cell growth and biosynthesis. Early and mid-cellularization clusters 5 and 6 were enriched for GO terms related to membrane organization and actin cytoskeleton. Late cellularization clusters were, in addition to GO terms related to actin, also enriched for GO terms related to cell adhesion and polarity. Given that these processes play a major role during cellularization of the insect blastoderm, we investigated the expression pattern of homologs of known regulators of cellularization in

Drosophila.
In the Drosophila blastoderm, cellularization depends on the spatial organization of both the microtubule and actin cytoskeleton. It involves several microtubule and actin binding proteins, including kinesins, Myosin II, Myosin V, profilin (Chickadee), cofilin (Twinstar) and Septins, and the conserved family of Rho GTPases (Mazumdar and Mazumdar, 2002) .
In S. arctica, we found that the expression of tubulins and kinesins, except for one (S. arctica Kinesin 2), is constant throughout the coenocytic cycle ( Figure 5A and 5B). On the other hand, we found many actin-associated genes dynamically expressed. All actin nucleators of the formin family and members of the Arp2/3 complex peaked during cellularization ( Figure 5C). In contrast to formins, which largely exhibited sharp peaks, the gene expression of the Arp2/3 complex was initiated earlier and increased gradually ( Figure 5C). Septins, Cofilin, Profilin and Myosin V were temporally co-expressed during mid-cellularization and peaked at the same time as actin nucleators ( Figure 5D), whereas myosin II, which has a role in organizing actin filaments and contractility, peaked later ( Figure 5D). We likewise found the expression of all four members of the Cdc42/Rho1 orthogroup to be sharply activated during late cellularization ( Figure 5E). Temporal co-expression of these genes suggests that the cellular pathways responsible for organizing the cytoskeleton and cell polarity in Drosophila cellularization are also involved in the cellularization process of S. arctica.
In addition to cytoskeletal organization, formation of a polarized epithelium also requires cell adhesion (Rodriguez-Boulan and Macara, 2014) . Integrins and the cytoplasmic members of the integrin adhesome mediate cell-matrix adhesion in animal tissues. We observed that in S. arctica, both integrin receptors, alpha and beta, as well as all cytoplasmic members, are temporally co-expressed during late cellularization ( Figure 5F). Beta and alpha catenin, together with cadherins in animals, mediate cell-cell adhesion in epithelial tissues (Rodriguez-Boulan and Macara, 2014) . In S. arctica, we found three copies of Aardvark, a homolog of beta-catenin (Murray and Zaidel-Bar, 2014) , as well as one homolog of alpha-catenin (Miller et al., 2013) . We found that the expression of two out of three Aardvark transcripts, as well as the expression of the alpha catenin homolog, peaked during late cellularization ( Figure 5F). This suggests that both cell-matrix and cell-cell adhesion pathways play a role in the establishment of the polarized epithelium-like structure during the late cellularization.

The actomyosin cytoskeleton is essential for cellularization
Finally, we tested whether disrupting the cytoskeleton would lead to defects in cellularization. In the absence of genetic tools, we used small inhibitory molecules that target specific conserved cytoskeletal components. We synchronized the cultures and added the inhibitors at the onset of cellularization (54 h time point). We first assessed the role of the microtubule cytoskeleton during cellularization by adding carbendazim (MBC), a microtubule depolymerizing agent (Castagnetti et al., 2007). Microtubule inhibition did not prevent plasma membrane invagination, although it resulted in a delay of flip and release of newborn cells ( Figure 6A and 6B, and S6A, Movie S6 and Movie S7). Furthermore, by staining the MBC-treated cells with DAPI and phalloidin, we observed a loss of the uniform spacing of the nuclei and actin filaments during the cortical compartmentalization stage of cellularization ( Figure 6C). After MBC treatment, newborn cells varied in size and number of nuclei ( Figure S6B, and S6C, Movie S6). These results suggest that the microtubule cytoskeleton is not essential for plasma membrane invagination but is crucial for nucleus and actin filament positioning at the cortex of the coenocyte.
Next, we sought to disrupt actin polymerization using either the broad actin depolymerizing agent Latrunculin A (LatA) (Braet et al., 1996) , the Arp2/3 inhibitor CK666 (Hetrick et al., 2013) , or the formin inhibitor SMIFH2 (Kim et al., 2015b) . Cells treated with Latrunculin A lacked any actin patches or actin filaments and failed to undergo flip or produce newborn cells ( Figure 6A-6C, Figure S6A, Movie S6 and S7). Furthermore, plasma membrane invagination did not occur after LatA treatment (Movie S6 and S7). In contrast, CK666-treated cells formed cortical actin patches, but were not able to form actin nodes, and they were unable to generate plasma membrane invaginations ( Figure 6A-6C, Movie S6 and S7). These results show that the actin cytoskeleton and Arp2/3-mediated actin nucleation are required for the formation of actin nodes, the first step in the cellularization of S. arctica. In contrast, the formin inhibitor SMIFH2 did not block the formation of actin nodes, but it prevented the formation of actin filaments in the later stages ( Figure 6A-6C). In addition, the plasma membrane invagination did not occur ( Figure 6A-6C, Figure S6A, Movie S6 and S7), although we note that a small fraction of cells was not affected. This suggests that formins play a role in the nucleation of actin filaments after the formation of actin nodes.
Finally, we assessed the role of myosin II, using blebbistatin, an inhibitor of myosin II ATPase activity (Kovács et al., 2004) . Blebbistatin treatment blocked plasma membrane invaginations and prevented cellularization ( Figure 6A-6C, Figure S6A, Movie S6 and S7). Although blebbistatin-treated cells were able to form actin filaments, they had an aberrant wavy cortical actin filament network, suggesting that inhibition of myosin II causes loss of actin crosslinking and actin network contractility ( Figure 6C).
Taken together, these results indicate that the actomyosin apparatus is essential for cellularization in S. arctica and reveal a temporal sequence of involvement of Arp2/3 complex, formins and myosin II ( Figure 6D). This temporal sequence is reflected in the relative timing of expression of Arp2/3, formins and myosin II genes ( Figure 5C and 5D).

Discussion
To address whether animal-like cellularization leading to formation of an epithelium exists outside animals, we performed imaging, transcriptomic analysis and pharmacological inhibition in a unicellular relative of animals, the ichthyosporean Sphaeroforma arctica ( Figure 6D). We show that at the onset of cellularization, Arp2/3 complex mediates the formation of actin nodes at the cortex of the coenocyte. This is followed by formin-dependent nucleation of actin filaments. These actin filaments are then crosslinked in a myosin II-dependent manner, which results in formation of a cortical actomyosin network that surrounds evenly-spaced nuclei. This spatial organization of both the actomyosin This strongly suggests that the general mechanisms of cellularization in S. arctica are likely conserved within ichthyosporeans. Altogether, our results argue that cellularization in ichthyosporeans requires both pathways conserved in animals, as well as pathways exclusive to ichthyosporeans. The later may include lincRNAs that we found to be remarkably conserved among ichthyosporeans and show high expression levels. Overall, our temporal gene expression dataset provides a good resource for further functional studies in S. arctica and other ichthyosporean species.
The epithelium is the first tissue that is established during embryogenesis and perhaps the first that emerged in evolution, thus representing the basic form of multicellular organization in animals.
Epithelia are characterized by apical-basal polarity and cell-cell junctions, mediated by cadherins and catenins, that segregates the internal medium from the outside environment (Rodriguez-Boulan and Macara, 2014) .

13
Epithelia were found in all animal lineages, including sponges (Leys et al., 2009) . It has previously been suggested that the origin of epithelial structures predates the origin of animals, as a polarized epithelium-like structures, regulated by alpha-and beta-catenins, is also present in the slime mold Dictyostelium discoideum (Dickinson et al., 2011) . Based on this finding, it has been suggested that all unikonts (Amorphea)a clade including animals, its close holozoan unicellular relatives, as well as Fungi, Apusomonadida, Breviatea, and Amoebozoaevolved from an ancestor with simple multicellular organization (Dickinson et al., 2012) . However, epithelial structures have not yet been observed in other unikont lineages besides animals and slime molds (Brunet and King, 2017) , raising questions whether the origin of these epithelial structures in these two relatively divergent lineages might have originated independently.
Here we argue that epithelium-like structures are generated during cellularization of ichthyosporeans.
We show that the epithelium-like layer of cells is polarized due to the asymmetrical localization of cellular components and the direction of growth. Furthermore, despite lacking genetic tools to demonstrate the function of alpha and beta catenin, the sharp expression of catenins during the epithelium-like stage of cellularization strongly suggests a function in cell-cell junctions. However, whether this layer of cells functions to segregate the interior of the coenocyte from the outside environment remains to be addressed. Our results in ichthyosporeans lends support to the hypothesis that the origin of epithelial tissues predates the origin of animals. However, unlike the epithelium formation in social amoebae, which originates through aggregation (Bonner, 1998) , ichthyosporean epithelium is generated clonally, such as epithelial tissues in all modern animals. Further studies to better understand the mechanisms of epithelium formation in ichthyosporeans will provide insight into the evolutionary origin of one of the defining features of animal multicellularity, the spatiotemporal cellular organization of tissues.

Declaration of interests
The authors declare no competing interests.

Culture conditions
Sphaeroforma arctica cultures were grown and synchronized as described previously (Ondracka et al., 2018) . Briefly, cultures were grown in Marine Broth (Difco BD, NJ, USA; 37.4g/L) in flasks at 12ºC until saturation, producing small cells. These cells were then diluted into fresh media with low density (1:300 dilution of the saturated culture), resulting in a synchronously growing culture. Cultures of S. gastrica, S. tapetis, S. nootkatensis, P. gemmata and A. whisleri were similarly kept in Marine Broth (Difco BD, NJ, USA; 37.4g/L) at 12°C in dark conditions.

Sphaeroforma arctica genome sequencing and assembly
Genomic DNA from S. arctica was extracted using QIAamp DNA Blood Midi Kit (Qiagen) from 300 mL culture incubated at 12 C° for 1 week in six 75 cm 2 flasks. The Qubit (Invitrogen) quantification found ~150 μg genomic DNA in total.
A SMRTbell library for P6/C4 chemistry was constructed and was run on 32 SMRT cells in a PacBio RSII system (Pacific Biosciences), generating 2,209,004 subreads with a total of 25,164,714,269 bp.
Raw subreads were first assembled using the FALCON_unzip assembler (v0.4.0) and the initial assembled sequences were polished by the Quiver integrated in SMRT analysis (v2.3.0). Genomic DNA was then sheared using a Focused-ultrasonicator (Covaris, Inc.). A paired-end library with an insert size of 600 bp was sequenced on an Illumina HiSeq2500 platform, producing 136,566,600 reads with read length of 250 bp. Paired-end reads were mapped against the polished sequences using the BWA mem (v0.7.12) followed by error-correction using Pilon (v1.22) (Walker et al., 2014) .

Sphaeroforma arctica gene annotation
We annotated predicted gene models in our scaffolds using BRAKER2 (Hoff et al., 2018) . First we used STAR 2.7 (Dobin et al., 2012) to map all the RNA-seq samples to the genomic scaffolds, so as to obtain empirical evidence of gene bodies and guide the prediction of gene coordinates. These read mappings were then supplied to BRAKER2 in BAM format, which combied these external evidence with the gene coordinates predicted by Augustus 3.2.2 (Keller et al., 2011) (--noInFrameStop mode) and Genemark v4.21 (Lomsadze et al., 2005) to obtain a consolidated set of gene models in GFF format.
The transcriptome of S. arctica was assembled in order to add 5' and 3' UTR regions to the genome annotation and to search for long non-coding RNA-seq using data obtained here (see below).
The RNA-seq reads from the first time series replicate (excluding timepoint 30) were subjected to quality trimming and adaptor removal using Trimgalore (v0.4.5) (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) and Trimmomatic v0.35 (Bolger et al., 2014) . Trimgalore was first run to remove sequencing adaptors, allowing 0 mismatch and minimum adaptor length of 1 nt. Trimmomatic was run by trimming bases with phred score <20 from both ends. Furthermore, a sliding window of 4 bases was used to trim reads from the 5' end when the mean phred score dropped below 20. Finally, the IlluminaClip option was used to search for remaining sequencing adaptors, allowing 2 seed mismatches, a palindrome clip threshold of 20 and minimum single match threshold of 7 nt. The quality filtered RNA reads were subsequently assembled using Trinity v2.5.1 (Grabherr et al., 2011) , running both a de novo and a genome-guided assembly. To perform a genome-guided transcriptome assembly, we first mapped the RNA-seq reads against the genome using Hisat2 v2.1.0 (Kim et al., 2015a) , in strand-specific mode with default parameters. Both Trinity assemblies were run in strand-specific mode while applying the jaccard clip option and otherwise default parameters. We next evaluated the strand-specificity of the assemblies by mapping RNA-seq reads back to the Trinity assemblies using scripts supplied within the Trinity software.
Transcripts were subsequently removed if >80% of the reads mapped in the wrong direction. The PASA pipeline v2.3.1 (Haas et al., 2003) was then used to update the existing genome annotation.
First, the assembled transcripts described above were mapped against the genome using the initial PASA script to make a temporary annotation file. This was performed with default parameters, using both Blat v35 (Kent, 2002) and Gmap v2015-09-29 (Wu and Watanabe, 2005) aligners, with transcripts specified as strand-specific. Lastly, the transcriptome-based annotations were compared with the existing genome annotation in a final PASA run, also using default parameters. In this step, the existing genes were expanded with UTR annotations and, in cases where a single RNA transcript covered multiple genes, these become merged into a single gene.

Clustering of the gene expression and gene ontology enrichment analysis in S. arctica
Sphaeroforma arctica transcripts were clustered by their gene expression profiles using Clust software (Abu-Jamous and Kelly, 2018) with default parameters and automatic normalization mode.
This yielded 4511 transcripts (4441 coding genes and 70 lncRNA genes) that were clustered into 9 clusters, ranging from 41 to 2081 co-expressed genes. Gene expression profiles were visualized using superheat package in R ( Barter and Yu, 2018

Transcriptome assembly of other unicellular holozoans
Raw reads (from sequencing libraries and SRA data) were processed with Trimmomatic (Bolger et al., 2014) to remove adapters and low-quality bases, by trimming bases with phred score <28 from both ends. Furthermore, a sliding window of 4 bases was used to trim reads from the 5' end when the mean phred score dropped below 28. Finally, the IlluminaClip option was used to search for remaining sequencing adaptors, allowing 2 seed mismatches, a palindrome clip threshold of 28 and minimum single match threshold of 10 nt. All libraries were assembled denovo with Trinity (v2.3.2-2.5.1) (Grabherr et al., 2011) using default parameters. The assemblies of RNA-seq data were performed with the strand specific option, while assemblies based on SRA data were run in standard mode. For the S. sirkka assembly we also applied the jaccard clip option. Coding regions were predicted using TransDecoder v5.2.0 (Haas et al., 2013) , by first extracting the longest possible ORFs (only top strand was searched in the strand-specific assemblies), on which likely coding region was predicted. Only the longest ORF was kept for each transcript.

DNA isolation and genome assembly of S. gastrica and S. tapetis
Genomic DNA (gDNA) was isolated from S. gastrica and S. tapetis, by lysing cells on a FastPrep system (MP Biomedicals, CA, USA; 4m/s, 20s) followed by gDNA purification using the DNeasy kit (Qiagen, NRW, Germany) and subsequently sequenced on an Illumina HiSeq X system (150 bp paired end). Raw reads were subjected to quality trimming and adaptor removal as described above for RNA-seq data, with a phred score cut-off of 26.
The resulting spades assemblies were scaffolded using L_rna_scaffolder (Xue et al., 2013) and polished with Pilon (Walker et al., 2014) . L_RNA_Scaffolder was run by first mapping the respective transcriptome assemblies to the genome assemblies using Blat (Kent, 2002) which was inputted to L_rna_scaffolder. Next, we run Pilon by first mapping the quality trimmed genomic Illumina reads to the genome assembly using Bowtie2 (Langmead et al 2012). The resulting mapping file was then used in Pilon with default parameters. L_rna_scaffolder and Pilon were run repeatedly 5 times, followed by 3 final runs using only Pilon. These genome assemblies were used as reference genomes for genome-guided Trinity assemblies and PASA annotation as previously described for S. arctica.

S. arctica long intergenic non-coding RNA identification and conservation analysis
Long intergenic non-coding RNAs (lincRNAs) were identified from the PASA annotation described above. First, transcripts shorter than 200 nt were discarded. Then, coding potential was evaluated using both TransDecoder (Haas et al., 2013) and CPC2 (Kang et al., 2017) with default parameters.
All transcripts lacking coding potential were then compared with the Rfam database (Kalvari et al., 2017) to exclude other non-coding RNAs such as rRNAs and tRNAs. To exclude possible UTRs actually belonging to fragmented protein coding genes, a minimum genomic distance of 1000 bp from the closest protein coding gene was required. The remaining transcripts were compared with protein coding genes in the Swissprot database using Blastx (Altschul et al., 1990) , and all sequences with a match better than e-value 1e-5 were removed.
Next, the potential lncRNA transcripts were blasted against the proteomes of S. arctica and other closely related ichthyosporeans (Sphaeroforma tapetis, Sphaeroforma sirkka, Sphaeroforma nootkatensis, Sphaeroforma napiecek, Sphaeroforma gastrica, Creolimax fragrantissima, Pirum gemmata and Abeoforma whisleri), and all transcripts with a match better than e-value 1e-5, were discarded. To remove transcripts resulting from potential spurious transcription, we required a minimum expression level of at least 1 tpm in at least one time series sample.
To search for possible conserved lncRNAs in other species, we performed Blastn searches against transcriptomes and genomes of several holozoan species (Richter et al., 2018) (Table S3). All genes providing hits with an e-value less than 1e-50 were considered to be a conserved homolog.

Genome-wide analysis of alternative splicing
Each RNA-seq run was independently aligned to the Sphaeroforma genome using Hisat2 (Kim et al., 2015a) , using default parameters except for longer anchor lengths to faciliate de novo transcriptome assembly (--dta flag).

19
The resulting alignments (in SAM format) were used to build sample-specific transcriptome assemblies with Stringtie2, using existing gene models (in GFF format, -G flag) as a reference, a minimum isoform abundance of 0.01 (-f flag) and a minimum isoform length of 50 bp (-m flag). For each sample, we only retained transcripts that overlapped with known genes in the final GFF file (using bedtools (Quinlan and Hall, 2010) ). Then, we built a consolidated set of isoforms by pooling all sample-specific GFF annotations and the reference annotation using Stringtie2 (--merge flag), without imposing any limitation of minimum expression levels (-T flag set to 0), and retaining isoforms with retained introns (-i flag). We also calculated the expression levels at the isoform level using using orthofinder (Emms and Kelly, 2015) in Diamond mode. The generated orthogroups were used to determine orthologs of S. arctica genes in other species, unless a phylogeny of a specific gene family has been published. We then determine gene age of each orthogroup by Dollo parsimony implemented in Count software (Csuros, 2010) to classify the S. arctica proteome into 10 phylostrata, ranging from paneukaryotic to S. arctica-specific. The transcriptome age index was computed using a formula defined in (Domazet-Lošo and Tautz, 2010) .

Flow cytometry
Flow cytometry was performed as described previously (Ondracka et al., 2018) Cells were fixed in 4% formaldehyde, 1M sorbitol solution for 15 minutes at room temperature, washed once with marine PBS (PBS with 35 g/L NaCl), and stained with DAPI (final concentration 0.5 μg/mL) in marine PBS.
Samples were analyzed using an LSRII flow cytometer (BD Biosciences, USA) and the data were collected with FACSDiva software. DAPI signal was measured using a 355nm laser with the 505nm longpass and 530/30nm bandpass filters. Approximately 2,000 events were recorded in each measurement. The flow cytometry data were processed and analyzed using FloJo software (Ashland, OR).

Microscopy
Confocal microscopy of the spatiotemporal organization of actin in Figure 2A and Movie S5, was performed using a confocal laser scanning Leica TCS SP5 II microscope with an HC PL APO 63x/1.40 Oil CS2 oil objective. All remaining live and fixed images were obtained using a Zeiss Axio Observer

Cell fixation and staining
Cells were fixed using 4% formaldehyde and 250mM sorbitol for 30 minutes before being washed twice with PBS. For actin and nuclei staining phalloidin (Figures 2A, 2C For figures 5A, S5A and S5B, cells were pre-grown at 12ºC for 48h prior to fixing every hour for a total of 14 hours.
To ensure oxygenation during the whole period of the experiment, the cover has been removed. To maintain the temperature at 12°C we used a P-Lab Tek (Pecon GmbH) Heating/Cooling system connected to a Lauda Ecoline E100 circulating water bath. To reduce light toxicity, we used a 495nm Long Pass Filter (FGL495M-ThorLabs). Kymographs of cells undergoing cellularization in Figure 1D were constructed in ImageJ v1.46 by drawing a 3-pixel-wide (

Image analysis
Image analysis was done using ImageJ software (version 1.52) (Schneider et al., 2012) . For measurements of cell diameter in Figures 1E and S1A, we cropped movies to ensure having a single cell per movie. We then transformed the movies into binaries to ensure later segmentation. We then used particle analysis function in ImageJ with a circularity parameter set to 0.65-1 to quantify measure cell perimeter. As cells are spherical, we computed cell diameter as: For quantification of fraction of cells in each stage of cellularization Figure S2B, we used the ObjectJ plugin in ImageJ (National Institutes of Health).