Thermotogales origin scenario of eukaryogenesis

How eukaryotes arose is an enigma of evolutionary biology. Widely accepted archaeal-origin scenarios of eukaryogenesis, based on similarities of genes and related characteristics between archaea and eukaryotes, cannot explain several eukaryote-specific features of the last eukaryotic common ancestor, such as glycerol-3-phosphate-type membrane lipids, large cells and genomes, and endomembrane formation. Thermotogales spheroids, which have multicopy-integrated large nucleoids and produce progeny in the periplasm, may explain all of these features as well as endoplasmic reticulum-type signal cleavage sites, although the spheroids cannot divide. We hypothesize that the progeny chromosome is formed by the random joining of small DNAs in immature progeny, followed by reorganization by mechanisms including homologous recombination enabled by the multicopy-integrated large genome (MILG). We propose that Thermotogales ancestor spheroids came to divide owing to archaeal cell division genes horizontally transferred via virus-related particles, forming the first eukaryotic common ancestor (FECA). Under this hypothesis, the archaeal information-processing system would have been established in FECA by the random joining of DNAs excised from the MILG, which contained horizontally transferred archaeal and bacterial DNAs, followed by reorganization through MILG-enabled homologous recombination. Thus, the large genome may have been a prerequisite for, rather than a consequence of, eukaryogenesis. The random joining of DNAs likely provided the basic mechanisms of eukaryotic evolution, producing diversity through the formation of supergroups, novel genes, and introns that are involved in exon shuffling. © 2020 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Eukaryogenesis studies from the prokaryotic side
Eukaryogenesis is considered a genomic innovation that converted prokaryotes to eukaryotes, introducing organizational complexity, including endomembranes, cytoskeletons, and mitochondria, in addition to the large cells and genomes needed to manage them (Koonin, 2010; Lane and Martin, 2010). The large genome appears to be the essential feature of eukaryotes, since prokaryotic genomes are necessarily small, whereas the other features can be seen at a rudimentary level in prokaryotes (Lane and Martin, 2010). In addition, it is widely accepted that eukaryotes are chimeras of archaea and bacteria; eukaryotes contain informational genes similar to those of archaea and operational genes similar to those of bacteria (Rivera et al., 1998). Various types of genome fusion between archaea and bacteria and hypothetical scenarios have been proposed for eukaryogenesis, as reviewed previously (Embley and Martin, 2006; Martin et al., 2015; Poole and Penny, 2006). The proposed archaeal partner has changed from eocytes (hyperthermophilic Crenarchaea) (Rivera and Lake, 2004) through the TACK superphylum (Guy and Ettema, 2011) to the Asgard superphylum (Zaremba-Niedzwiedzka et al., 2017), including Lokiarchaeota (Spang et al., 2015), as sampling areas expanded, which eventually suggested that eukaryote-specific proteins could have originated in the deep sea (Spang et al., 2015; Zaremba-Niedzwiedzka et al., 2017). The specific bacterial partner was proposed to be a cyanobacterium or a proteobacterium (Rivera and Lake, 2004), but this remains uncertain, as these speculations are not supported by further studies (Gould et al., 2016; Guy and Ettema, 2011; Spang et al., 2015; Zaremba-Niedzwiedzka et al., 2017). Recently, Gould et al. (2016) hypothesized that the bacterial genes had been horizontally transferred to the α-proteobacterial proto-symbiont and that, after endosymbiosis, the outer membrane (OM) vesicles of mitochondria transferred the genes to the host nucleus, accompanied by endomembrane formation by the vesicles. This paper was a breakthrough in understanding the possible mechanism underlying endomembrane formation, but it appears to contain inconsistencies regarding the structural aspects of the cell. OM vesicle-producing bacteria should have peptidoglycan (PG); no eukaryotes have been reported to have mitochondria with PG, although some chloroplasts contain PG (Aitken and Stanier, 1979; Pfanzagl et al., 1996), even with genes coding for PG formation obtained from bacteria other than cyanobacteria (Sato and Takano, 2017), the taxon believed to be the origin of chloroplasts. Moreover, lipopolysaccharide (LPS) (Schwechheimer and Kuehn, 2015) in the OM should have been immediately removed upon formation of endomembranes, unless the proto-symbiont was an LPS-less α-proteobacterium, such as Sphingomonas (Kawasaki et al., 1994), which has never been considered as the origin of mitochondria. The lack of LPS is not discussed in the OM vesicle-derived endomembrane hypothesis (Gould et al., 2016). Thus, this hypothesis can be valued for its vesicle-fusion mechanism of endomembrane formation but not for its proposed source of vesicles.
* Corresponding author. E-mail address: kuwabara@biol.tsukuba.ac.jp (T. Kuwabara).

Eukaryogenesis studies from the eukaryotic side
According to the classification by Adl et al. (2012) , eukaryotes can be divided into five supergroups: Stramenopiles, Alveolata, and Rhizaria (SAR); Archaeplastida; Excavata; Amoebozoa; and Opisthokonta. The mechanism of the diversification, which produced the supergroups from the last eukaryotic common ancestor (LECA), is unknown. Parasitic amitochondriate eukaryotes were previously thought to be closely related to the LECA ( Cavalier-Smith, 1989 ;Pace, 1997 ). However, the phylogenetic proximity was found to be caused by a long branch attraction ( Felsenstein, 1978 ) artifact with the discovery that their organelles, hydrogenosomes and mitosomes, are the retrogressed forms of mitochondria ( Embley and Martin, 2006 ). A free-living excavate, Naegleria gruberi, was found to have a large genome (41 Mb), including 4133 genes that are shared with at least one other eukaryote supergroup, in addition to novel genes accounting for nearly 40% of the total protein-coding genes ( Fritz-Laylin et al., 2010 ). N. gruberi, compared to other eukaryotes, appears evolutionarily closer to LECA; the supergroups may have evolved from the LECA by reducing the gene number ( Fritz-Laylin et al., 2010 ). Protein-coding genes of Saccharomyces cerevisiae (Opisthokonta) with evolutionary affinities to the Crenarchaeota-Thaumarchaeota-Nanoarchaeota group and α-proteobacteria are only 3% and 4%, respectively, of the total, while those of Cyanidioschyzon merolae (Archaeplastida) are 3% and 6%, respectively, of the total, with novel genes accounting for more than 30% of the total in both microorganisms ( Koonin, 2010 ). Thus, the possibility cannot be excluded that the LECA obtained more than 50% of its protein-coding genes from other bacteria or archaea. 
Pittis and Gabaldón (2016) reported that some bacterial proteins were obtained before acquisition of mitochondria, supporting the late mitochondria-acquisition ( Koumandou et al., 2013 ), though this topic is still debated in the field ( Degli Esposti, 2016 ;Martin et al., 2017 ). Such acquisition of bacterial proteins is possible by direct horizontal gene transfer (HGT) to the host or by more than one endosymbiotic event ( Gabaldón, 2018 ). In addition to the transfer of bacterial genes, the production of novel genes appears an important feature of eukaryogenesis. However, how they were produced remains unanswered.

Are experimental studies on eukaryogenesis possible?
We thought that experimental research on eukaryogenesis would be possible, if appropriate model microorganisms were available, by introducing and/or removing some genes, similar to the way induced pluripotent stem cells were established by introducing four factors, Oct3/4, Sox2, c-Myc, and Klf4, into adult mouse fibroblasts (Takahashi and Yamanaka, 2006). However, such model microorganisms have not yet been obtained. Eukaryogenesis is likely to be a process in which an anaerobic host prokaryote captures an aerobic α-proteobacterial symbiont, possibly under partially aerobic conditions. Environments around hydrothermal vents provide one such naturally occurring partially aerobic condition, which is dynamically maintained by the mixing of anaerobic hydrothermal fluids and aerobic seawater (Jannasch and Mottl, 1985). We isolated a possible model prokaryote, Thermosipho globiformans (JCM 15059^T/DSM 19918^T/NBRC 109617^T) (Kuwabara et al., 2011), whose spheroids can produce progeny rods in the periplasm (Kuwabara and Igarashi, 2017), from a hydrothermal vent of Suiyo Seamount (Japan), using an in-situ cultivation device that creates an interface of anaerobic and aerobic conditions (Kuwabara et al., 2006).
The genus Thermosipho belongs to the order Thermotogales (Huber and Stetter, 1992). Thermotogales comprises (hyper)thermophilic, anaerobic heterotrophs with an enigmatic morphology: a large periplasm at the ends of rods and a thick surrounding surface structure, the toga, which consists of the OM and an amorphous layer (AL) (footnote 1). Thermotogales bacteria are microscopically distinguishable from other prokaryotes by the large periplasm. Thermotogales are known to have been highly prone to HGT (Nelson et al., 1999; Nesbø et al., 2009; Zhaxybayeva et al., 2009), as they are suggested to have obtained as many as 80% of their protein-coding genes by HGT. Consistent with this are the many clustered regularly interspaced short palindromic repeat (CRISPR) sequences (Horvath and Barrangou, 2010) in their genomes (Zhaxybayeva et al., 2009), indicative of previous infection by viruses and thereby HGT (Brodt et al., 2011). In this article, we hypothesize a mechanism of chromosome formation in progeny. Using this hypothesis, we explain the establishment of the archaeal information-processing system in Thermotogales ancestor spheroids as well as the generation of supergroups, novel genes, and introns in eukaryogenesis.

Intra-outer-membrane multicellularity of Thermotogales
Enrichment cultures for the model prokaryotes were performed with heterotrophic media at 55 °C. T. globiformans, which naturally forms spheroids containing multiple cells within an OM in the early growth phases, was selected as the model prokaryote (Kuwabara et al., 2011). This multicellular state is referred to in this article as intra-outer-membrane multicellularity (IOM), emphasizing the difference from ordinary multicellularity, which is regarded as a eukaryote-specific feature. We considered how the IOM was involved in eukaryogenesis. We suggested that even if the genomic innovation to eukaryotes failed, the Thermotogales lineage would have been preserved, and speculated that the IOM was related to the genomic innovation and/or the lineage preservation.
Footnote 1. Cell structures of Thermotogales. In Thermotogales, the cell is defined as the cytoplasmic membrane (CM) plus the entity surrounded by the CM, while the rod is defined as the total of the rod-shaped cell, periplasm, and the Thermotogales-specific surface structure, toga (Huber and Stetter, 1992), which is composed of an outer membrane (OM) and an amorphous layer (AL). The distinction between cell and rod is necessary because cell division and rod fission do not occur simultaneously (Kuwabara and Igarashi, 2012), unlike in other bacteria (Errington, 2003; Margolin, 2005).
The AL is an approximately 40-nm-thick structure that lines the OM and contains or is equivalent to peptidoglycan (PG) (Kuwabara et al., 2011), as described below. Thermotogales PG has not been definitively identified by TEM, but it has been extensively characterized biochemically (Boniface et al., 2006; Boniface et al., 2009). PG of Thermotoga maritima has been shown to have two different peptides (1:1 in molar ratio). The third amino acid residues from the glycan chain, central to the three-dimensional (3D) structure of PG (Typas et al., 2012), are L-lysine and D-lysine. As a result of this heterogeneity, PG is expected to have a less compact 3D structure, which would be hard to form and easy to degrade, despite functioning overall as a cell wall. In TEM images, the thick AL is the only recognizable structure between the OM and the CM. Lysozyme treatment of rods eliminates the AL, causing the transformation of rods to spheroids (Kuwabara et al., 2011), which suggests that the AL contains or is equivalent to PG. Although an OM protein, Omp α, has been shown to be associated with the AL (Engel et al., 1992), proteins are not affected by lysozyme, indicating that at least the main constituent of the AL is PG.

Large cells and genomes, and an endomembrane of Thermotogales spheroids
The cell structure of Thermotogales differs from that of other bacteria (footnote 1). Owing to these differences, Thermotogales can form rods with IOM (Kuwabara and Igarashi, 2012). Briefly, in Thermotogales, the septal PG is formed in the periplasm between previously divided cells, unlike in other bacteria, in which the septal PG (plus the OM in the case of gram-negative bacteria) forms concomitantly with the constriction of the CM (Errington, 2003; Margolin, 2005; Wissel et al., 2005). Through rounds of cell division without septation, rods with IOM are formed. The transformation of rods to spheroids by the alteration or degradation of the surrounding PG is common in bacteria (Lam et al., 2009; Leaver et al., 2009; Young, 2008). The resulting spheroids lack the ability to accumulate PG. The transformation of T. globiformans occurs in the early growth phases (Kuwabara and Igarashi, 2017), in addition to the stationary phase as in other bacteria (Huber et al., 1989; Huber et al., 1990; Huber et al., 1986; Lam et al., 2009; Leaver et al., 2009; Podosokorskaya et al., 2011; Podosokorskaya et al., 2014; Young, 2008). Owing to the early transformation, spheroids with IOM, which are produced by the transformation of rods with IOM, enlarge in the exponential phase, accompanied by the growth of multiple cells. These cells fuse with each other and can form a single dish-shaped cell that continues to grow (Kuwabara and Igarashi, 2017) (footnote 2). The resulting large dish-shaped cells likely fulfill the size requirements of LECA (Koonin, 2010; Lane and Martin, 2010). Although the nucleoids of dish-shaped cells should only contain components of the genome of T. globiformans, we posit that their copies likely integrate into a larger entity.
The nucleoid of large dish-shaped cells appears distinct from that of rods in both its size (footnote 2) and its transformation during growth; the nucleoid transitions from a dish shape to a ring shape with a hole in the center, and finally to a lip shape with the vacancy pulled in opposite directions (Kuwabara and Igarashi, 2017), resembling a dividing eukaryotic chromosome (Kuwabara et al., 2011). These macroscopic differences suggest that the large dish-shaped cells have a multicopy-integrated large genome (MILG), not identical to the genome of rods, and that the IOM is likely necessary to create the MILG through cell fusion.
Spheroids having a large dish-shaped cell can have vesicles and/or rods in the periplasm, as shown by transmission electron microscopy (TEM) images (Fig. 1). They are regarded as immature and mature progeny, respectively, based on comparison with the various-sized moving progeny (MP) observed by optical microscopy, which were identified by the presence of DNA and membrane (footnote 3). The immature progeny likely fuse with each other (arrow), forming larger immature progeny (I'), which is not surprising since multiple cells translocate from rods into spheroids and fuse with each other to form dish-shaped cells (Kuwabara and Igarashi, 2017).
Footnote 2. Large dish-shaped cells with a large nucleoid. Spheroid growth can be studied by high-temperature microscopy (HTM), which was developed for live observation of the growth of anaerobic thermophiles (Kuwabara and Igarashi, 2012). T. globiformans spheroids with IOM, which have a diameter as small as 2 μm in the early growth phases, grow up to 12 μm in diameter in the late exponential phase (Kuwabara and Igarashi, 2017). Along with the growth of spheroids, multiple cells fuse with each other to form a dish-shaped cell with a diameter of ~3 μm, which further grows to ~12 μm. Based on this growth, the cell volume increased about 7500-fold in 367 min (Kuwabara and Igarashi, 2017). This increase is not surprising, considering the ability of T. globiformans rods to grow rapidly with a doubling time of 24 min (Kuwabara et al., 2011) and the inability of dish-shaped cells to divide (Kuwabara and Igarashi, 2017). Accompanying the cell fusion, the nucleoids from multiple cells fuse with each other and further enlarge to form a large nucleoid, occupying 1/7 to 1/6 of the area of spheroids (diameter, 7-9 μm), as seen in epifluorescence microscopy images (Kuwabara and Igarashi, 2017; Kuwabara et al., 2011), which show very strong green epifluorescence of Live/Dead, indicating the viability of the cells.
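The growth figures quoted in the note on large dish-shaped cells (a roughly 7500-fold volume increase in 367 min, against a 24-min doubling time for rods) can be checked with a short calculation. The sketch below assumes simple exponential growth; the function names are illustrative and do not come from the cited papers.

```python
import math


def doublings(fold_increase: float) -> float:
    """Number of volume doublings needed to reach a given fold increase."""
    return math.log2(fold_increase)


def equivalent_doubling_time(fold_increase: float, elapsed_min: float) -> float:
    """Doubling time (min) implied by a fold increase over an elapsed time,
    assuming exponential growth throughout (an assumption of this sketch)."""
    return elapsed_min / doublings(fold_increase)


def max_fold_increase(elapsed_min: float, doubling_time_min: float) -> float:
    """Fold increase reachable in elapsed_min at a fixed doubling time."""
    return 2 ** (elapsed_min / doubling_time_min)


# Values quoted in the text: 7500-fold in 367 min; rod doubling time 24 min.
implied = equivalent_doubling_time(7500, 367)
ceiling = max_fold_increase(367, 24)
print(f"implied doubling time: {implied:.1f} min")
print(f"fold increase possible at 24 min per doubling: {ceiling:.0f}")
```

Under these assumptions the observed increase corresponds to a doubling time somewhat longer than 24 min, so it lies within what the growth rate of rods would allow, in line with the statement that the increase is not surprising.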
The fusion of immature progeny is noteworthy given that the OM vesicles (Gould et al., 2016) are hypothesized to fuse with each other and form eukaryotic endomembranes. The immature progeny appear to be a better source of endomembranes than the OM vesicles, since the immature progeny membrane is likely derived from the CM (footnote 3) and thus may not be accompanied by either PG or LPS.
Footnote 3. Progeny production in spheroid periplasm. Rapidly moving objects, which have various sizes depending on the cultivation time, can be produced in the periplasm of spheroids having large dish-shaped cells (Kuwabara and Igarashi, 2017; Kuwabara et al., 2011). These objects are stainable with Live/Dead and FM1-43, indicating the presence of nucleic acids and membrane, respectively (Kuwabara et al., 2011). When fixed, they can be stained with 4',6-diamidino-2-phenylindole (DAPI), indicating the presence of DNA (Kuwabara and Igarashi, 2017). Given this, they are called moving progeny (MP), which should have a cytoplasmic membrane; they were called simply 'progeny' in a previous paper (Kuwabara and Igarashi, 2017). The size variety of MP suggests that MP grow in the spheroid periplasm, although live observation of MP growth has been unsuccessful, because spheroids having small MP tend to sediment from just beneath the coverslip of the observation chamber during HTM (Kuwabara and Igarashi, 2017). In transmission electron microscopy (TEM), vesicles smaller than rods, as well as rods, which have togas, can be observed in the periplasm of spheroids having a large dish-shaped cell (Fig. 1). Given their subcellular localization and size, the vesicles and rods are likely equivalent to the small and large MP observed by optical microscopy and, thus, are called immature and mature progeny, respectively.
It is uncertain whether the movement of MP is derived from cell motility or Brownian movement. It should be noted that Thermosipho genomes (Zhaxybayeva et al., 2009), including that of T. globiformans (GCA_003990895.1), contain flagella-related genes. Nevertheless, Thermosipho is generally considered to be immotile, with a few exceptional species (L'Haridon et al., 2001; Podosokorskaya et al., 2011; Podosokorskaya et al., 2014). The conditions under which the Thermosipho flagellar genes are expressed remain to be clarified.
Other Thermotogales species have not been reported to produce MP. Nevertheless, Thermotoga maritima (Huber et al., 1986) does produce MP when the medium of a stationary-phase culture is exchanged via precipitation and resuspension, followed by cultivation (Kuwabara and Igarashi, 2017). Thus, the apparently special ability of T. globiformans to form in-periplasm progeny in the exponential phase likely resides in the high efficiency of spheroid formation in the early growth phases (Kuwabara and Igarashi, 2017). One possible reason for the early spheroid formation could be related to its natural habitat, Suiyo Seamount, which is a caldera where a fast bottom current flows twice a day (Kuwabara et al., 2006), likely bringing oxic seawater to the anoxic hydrothermal vents, so that anoxic and oxic environments alternate twice a day. Thermotogales species isolated from other habitats are not reported to form spheroids in the early growth phases.
Fig. 1. The TEM image of a spheroid containing vesicles and rods of various sizes together with a large dish-shaped cell was reproduced from Kuwabara and Igarashi (2017). D, dish-shaped cell; I, small vesicle (immature progeny); I', large vesicle (immature progeny); and M, rod (mature progeny). Arrows indicate possible fusion of small vesicles. Bar: 1 μm. A movie of epifluorescence microscopy results showing a spheroid having a rod-sized MP in the periplasm is available as Online Resource 6 from Kuwabara and Igarashi (2017).
If the eukaryotic endomembranes are related to the Thermotogales CM, the similar N -terminal signal cleavage systems should be found between them. However, it is well known that the endoplasmic reticulum-type eukaryotic signal cleavage sites ( Nielsen, 2017 ) are distinct from those of prokaryotes ( Bagos et al., 2009 ;Nielsen, 2017 ) and have never been found in prokaryotic proteins. Surprisingly, the eukaryotic signal cleavage sites are found in Thermotogales periplasmic proteins if they are translated from non-ATG translation-initiable codons ( Fig. 2 ). Briefly, FtsI, the d,d -transpeptidase catalyzing the final step of PG synthesis ( Typas et al., 2012 ) is considered to be soluble in Thermotogales , since septal PG is formed far from the CMs of previously divided cells ( Kuwabara and Igarashi, 2012 ). In fact, most Thermosipho FtsIs are predicted by SignalP ( Nielsen, 2017 ) to have eukaryotic signal cleavage sites. SignalP is considered the best predictor of N -terminal signal cleavage sites in bacteria and eukaryotes ( Choo et al., 2009 ). Moreover, another tool, Signal-3L ( Shen and Chou, 2007 ), produced similar but ambiguous results. In contrast to the Thermosipho FtsIs, no signal cleavage sites were predicted in most FtsIs of other Thermotogales species. Since the cleavage site-predicted Thermosipho FtsIs are all reported to be translated from non-ATG translation-initiable codons, the other Thermotogales FtsIs were also tested to determine if they were translated from non-ATG translation-initiable codons, as detailed in the legend of Fig. 2 . Eukaryotic signal cleavage sites are now predicted in all Thermosipho and Thermotoga FtsIs and in most Thermotogales FtsIs (Supplementary Table 1). 
Eukaryotic signal cleavage sites are also predicted in other proteins of Thermotoga maritima including xylanase ( Liebl et al., 2008 ), glucose-binding protein ( Nanavati et al., 2002 ;Palani et al., 2012 ), maltose-binding proteins ( Nanavati et al., 2002 ), and arginine-binding protein ( Ausili et al., 2013 ), for which the periplasmic localization has been biochemically suggested. This finding suggests that the endoplasmic reticulum membrane would have been derived from immature progeny, rather than from the OM vesicles, in which a comprehensive N -terminal signal cleavage system has not been reported. Unfortunately, no signal peptidases with appreciable homology between Thermotogales and eukaryotes have yet been found. This is likely because substrate specificity may not be evident based on the primary sequences of signal peptidases.
Fig. 2. Gene and protein sequences were retrieved from the National Center for Biotechnology Information (NCBI), as shown in Supplementary Tables 1 and 2. Protein sequences are aligned with reference to the eukaryotic signal cleavage sites, which were predicted as follows and are indicated by triangles. The N-terminal hydrophobic stretches (von Heijne, 1983) formed by non-charged amino acid residues are highlighted. In the region from the N-terminus to the end of the hydrophobic stretch, amino acid residues encoded by translation-initiable codons other than ATG, such as GTG (for valine, V), ATA (isoleucine, I), and TTG (leucine, L), are underlined. The original sequence as well as the truncated sequences starting from the underlined positions (replaced by methionine) or from internal methionine residues were analyzed for signal cleavage sites of eukaryotes and of gram-negative and gram-positive bacteria using SignalP 4.1 (Nielsen, 2017) (http://www.cbs.dtu.dk/services/SignalP/) with default settings. For predicted eukaryotic signal cleavage sites in the sequence, the letter at the N-terminal position is written in red. Irrespective of the N-terminus, the same signal cleavage site was predicted in each FtsI, with one exception (Supplementary Table 1). Notably, FtsIs of Pseudothermotoga thermarum (AEH50221) and Fervidobacterium pennivorans (WP_014452532), which belong to Thermotogales, in addition to those of Petrotogales and Kosmotogales bacteria, except for Marinitoga piezophila (AEX86383), were predicted to have no signal cleavage sites. This indicates that the order Thermotogales, especially the genera Thermosipho and Thermotoga, is more closely related to eukaryotes than the other orders, Petrotogales and Kosmotogales, in the phylum Thermotogae (Bhandari and Gupta, 2014). In some T. maritima MSB8 proteins, signal cleavage sites of gram-negative and/or gram-positive bacteria were also predicted in conjunction with those of eukaryotes, but never alone (Supplementary Table 2).
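The candidate-generation step described in this legend can be sketched in code. The following is a minimal illustration, not the authors' pipeline: it scans the 5' region of a coding sequence for the translation-initiable codons named in the legend (GTG, ATA, TTG) plus ATG, and emits one truncated protein per hit with methionine at the new N-terminus. The function names, the `window_codons` parameter (standing in for "up to the end of the hydrophobic stretch"), and the toy sequence are all assumptions made for illustration; each candidate would then be submitted to SignalP separately, which is not reproduced here.

```python
from itertools import product

# Standard genetic code in NCBI order (first base varies slowest, T-C-A-G).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# ATG plus the alternative initiation codons mentioned in the legend.
START_CODONS = ("ATG", "GTG", "ATA", "TTG")


def translate(cds: str) -> str:
    """Translate a DNA coding sequence up to (not including) the first stop."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def truncated_candidates(cds: str, window_codons: int):
    """Return (codon_index, codon, protein) for every possible start codon
    within the first window_codons codons; the start residue becomes M."""
    cds = cds.upper()
    candidates = []
    for idx in range(min(window_codons, len(cds) // 3)):
        codon = cds[3 * idx:3 * idx + 3]
        if codon in START_CODONS:
            rest = translate(cds[3 * idx + 3:])
            candidates.append((idx, codon, "M" + rest))
    return candidates


# Toy sequence (hypothetical, not a real FtsI gene).
for idx, codon, protein in truncated_candidates("GTGAAATTGGCTATGCCCTAA", 5):
    print(idx, codon, protein)
```

Keeping the codon index and the codon alongside each candidate makes it straightforward to check afterwards whether a predicted cleavage site depends on which start position was assumed.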
None of the Thermotogales FtsI sequences show bacterial signal cleavage sites (Supplementary Table 1), consistent with the aforementioned involvement of FtsI in the Thermotogales -specific morphogenesis, which suggests that FtsI is a Thermotogales -original protein. Further, no bacterial signal cleavage sites have been found in the arginine-binding protein. In contrast, some signal cleavage sites of the sugar-related proteins are predicted to be both eukaryotic and bacterial (Supplementary Table 2), suggesting that these proteins may have been obtained by HGT via signal cleavage sites.
It is currently unknown how the translation from non-ATG translation-initiable codons relates to eukaryogenesis. However, this is the first example of eukaryotic signal cleavage sites in prokaryotic proteins, which likely is significant given the importance of these sites in other prokaryotic proteins. Because of the large cell and genome sizes, the endomembrane formation, and the eukaryotic signal cleavage sites, Thermotogales ancestor spheroids, which have the G3P-type membrane lipids, could be regarded as a proto-eukaryote ( Poole and Penny, 2006 ) possessing some eukaryote-specific features, even if archaeal informational genes were absent.

Hypothesis of fragment-joining chromosome formation in progeny
The large dish-shaped cells can produce MP, which are detectable by optical microscopy, in the spheroid periplasm (footnote 3). The size of the MP is negatively correlated with their number (Kuwabara and Igarashi, 2017; Kuwabara et al., 2011). This suggests that an immature progeny is unlikely to mature by itself, since that would yield a similar number of MP irrespective of size. In TEM images (Kuwabara and Igarashi, 2017), immature progeny can be as small as 80 nm in diameter (Fig. 1, label I). Such small immature progeny may not contain the whole genome of T. globiformans (1.9 Mb, GCA_003990895.1) (Igarashi, 2019), when compared with Nanoarchaeum equitans (Huber et al., 2002), which is one of the smallest prokaryotes, with a cell size of 400 nm in diameter and a genome size of 0.49 Mb (Waters et al., 2003). Briefly, the amount of chromosomal DNA in a prokaryotic cell is represented by the genome size multiplied by the copy number (Taniguchi et al., 2010). A sphere 80 nm in diameter has (80/400)^3 = 1/125 of the volume of a sphere 400 nm in diameter. Thus, even if the copy number of N. equitans were 55, the largest archaeal copy number, reported for the euryarchaeon Methanococcus maripaludis (Hildenbrand et al., 2011; Soppa, 2011), the amount of DNA in the small immature progeny would be 0.49 Mb × 55 / 125 ≈ 0.22 Mb, approximately 11% of the T. globiformans genome, assuming the same concentration of chromosomal DNA. This amount is likely insufficient for growth. Taken together with the apparent inconsistency between the numbers of immature and mature progeny, it is possible that immature progeny fuse with each other (arrow) to eventually form mature progeny, with the DNA fragments of the former ligated to form the chromosome of the latter. Alternatively, mature progeny might be produced directly by the large dish-shaped cells, but if so, this should be accompanied by the removal of immature progeny. This removal is unlikely, considering that immature progeny have both DNA and a membrane (footnote 3). If this were the case, for what purpose would immature progeny be produced?
Furthermore, it is reasonable to think that the dish-shaped cells cannot produce PG, since they are generated through the results of the alteration or degradation of surrounding PG ( Lam et al., 2009 ;Leaver et al., 2009 ;Young, 2008 ). If the dish-shaped cells can produce PG, why is PG absent from the spheroid OM ( Fig. 1 ) ( Kuwabara and Igarashi, 2017 ;Kuwabara et al., 2011 )? Thus, the PG of mature progeny may not be formed by the large dish-shaped cells, but is likely formed under a different genetic control in spheroids, namely in large immature progeny (label I' ) when the PG-forming genes or the chromosome is completed. The presumed fragment joining appears to be consistent with the findings of Nesbø et al. (2006) who showed that some Thermotoga genes contain patches of four or more consecutive nucleotides termed "potentially recombinant fragments." Thus, we believe that the fragment joining theory is worth serious consideration, although such a mechanism of chromosome formation has not yet been identified.
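The estimate of the DNA content of the smallest immature progeny given in this section can be reproduced with a few lines of arithmetic. The sketch below uses only the numbers stated in the text (diameters of 80 nm and 400 nm, a 0.49-Mb genome, a copy number of 55, and the 1.9-Mb T. globiformans genome) and the text's own assumption of equal DNA concentration; treating both cells as spheres and the function names are assumptions of this sketch.

```python
def sphere_volume_ratio(d_small_nm: float, d_large_nm: float) -> float:
    """Volume ratio of two spheres; depends only on the cube of the diameters."""
    return (d_small_nm / d_large_nm) ** 3


def scaled_dna_mb(genome_mb: float, copy_number: float, volume_ratio: float) -> float:
    """DNA content (Mb) of the smaller compartment at equal DNA concentration."""
    return genome_mb * copy_number * volume_ratio


# Values taken from the text.
ratio = sphere_volume_ratio(80, 400)        # immature progeny vs. N. equitans
dna_mb = scaled_dna_mb(0.49, 55, ratio)     # genome size x copy number x ratio
fraction = dna_mb / 1.9                     # relative to T. globiformans genome

print(f"volume ratio: {ratio:.4f}")
print(f"DNA in smallest immature progeny: {dna_mb:.2f} Mb")
print(f"fraction of the 1.9-Mb genome: {fraction:.1%}")
```

Because the ratio scales with the cube of the diameter, the result is sensitive to the measured vesicle size: doubling the assumed diameter would raise the estimated DNA content eightfold.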
If the DNA fragments in immature progeny had appropriate cohesive ends, the correct ligation of fragments reproducing the rod chromosome might be possible. However, such a cohesive end-generating mechanism is unknown. Thus, further research is required to understand the mechanism that ligates non-homologous ends. In Mycobacterium and eukaryotes, a non-homologous end-joining (NHEJ) system has been reported, which repairs dsDNA breaks with either blunt or cohesive ends (Brissett and Doherty, 2009; Glickman, 2014; Kegel et al., 2006). The NHEJ system employs the Ku protein, which binds to the ends, and LigD or a homolog, which is recruited by Ku and ligates the ends. Unfortunately, no homologs of Ku and LigD of Mycobacterium were found in Thermotogales genomes in a BLASTP search (Altschul et al., 1990). However, we found a LigD homolog (WP_121509529.1) with an identity as high as 44% in a mesophilic member of the Thermotogae (Bhandari and Gupta, 2014), Mesotoga sp. H07pep.5.4 (Nesbø and Charchuk, 2019). Thus, a fragment-joining system in Thermotogales, which could be analogous to the NHEJ system of Mycobacterium, cannot be excluded.
In the presumed fragment joining system in Thermotogales , the ligation of DNA fragments would occur each time following fusion of immature progeny, which must result in a sequence different from that in the chromosome, since the order of the fusion of immature progeny is expected to be random. Thus, reorganization of random ligated DNA may be necessary to produce the rod chromosome. One of the possible mechanisms of reorganization would be homologous recombination with fragments excised from different loci of the MILG of large dish-shaped cells. For example, by using fragments including the border of DNAs neighboring in the MILG, the DNAs separated by unrelated fragments in the random ligated product could regain the neighboring locations. Nevertheless, this mechanism may not be sufficient for reorganization. Other mechanisms such as the knock-in activities of CRISPR systems ( Auer et al., 2014 ), could also be involved in reorganization. Regardless, the chromosome formation by fragment joining followed by reorganization cannot be excluded.
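Why random ligation would almost never reproduce the original chromosome order, and hence why a reorganization step is invoked, can be illustrated with a toy model. The sketch below is purely illustrative and rests on simplifying assumptions that are not in the source: fragments are distinct, every joining order is equally likely, and fragment orientation is ignored. Under these assumptions n fragments can be joined in n! orders, only one of which matches the original.

```python
import math
import random


def count_orderings(n_fragments: int) -> int:
    """Number of distinct linear orders of n distinct fragments (n!)."""
    return math.factorial(n_fragments)


def random_ligation(fragments, rng: random.Random):
    """Join fragments in a uniformly random order (toy model of random fusion)."""
    order = list(fragments)
    rng.shuffle(order)
    return order


def fraction_restored(n_fragments: int, trials: int, seed: int = 0) -> float:
    """Fraction of simulated ligations that reproduce the original order."""
    rng = random.Random(seed)
    original = list(range(n_fragments))
    hits = sum(random_ligation(original, rng) == original for _ in range(trials))
    return hits / trials


for n in (4, 10, 20):
    print(n, "fragments:", count_orderings(n), "possible orders")
print("simulated success rate for 4 fragments:", fraction_restored(4, 20000))
```

Even with a few dozen fragments the number of possible orders is astronomically large, which is consistent with the argument that a separate mechanism, such as homologous recombination against the MILG, would be needed to restore the rod chromosome.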

Virus-related particles with large DNAs and broad host ranges
We postulate, in this article, a Thermotogales origin scenario of eukaryogenesis. The genetic vehicles transferring archaeal informational genes could have been virus-related particles, as described in the next paragraph. Thus, this scenario could be regarded as a version of viral eukaryogenesis ( Livingstone Bell, 2001 ) with the specific host, Thermotogales ancestor spheroids, which were speculated to have large cells and genomes, and an endomembrane.

Establishment of FECA
If the fragment joining followed by the homologous recombination-mediated reorganization represents an aspect of chromosome formation in progeny, the establishment of FECA as well as the eukaryotic evolution could be, at least partly, explained. It would be reasonable to speculate that immature progeny fused with each other and formed eukaryotic endomembranes, including the nuclear membrane ( Fig. 3 ), as suggested in the OM vesicle-derived endomembrane hypothesis ( Gould et al., 2016 ). For the endomembrane formation, the mature rod formation from immature progeny should be prevented. We posit that the inability of dish-shaped cells to produce PG ( Kuwabara and Igarashi, 2017 ) likely eased the permanent cessation of PG formation, for example, by inserting unrelated fragments into PG-forming genes. It is noteworthy that Thermotogales OM, which does not contain LPS ( Sutcliffe, 2010 ), could be used for the CM of the resulting eukaryote-like cells without significant modifications of lipids.
Fig. 3. Foreign DNAs were horizontally transferred into Thermotogales ancestor spheroids, whose growth yielded a multicopy-integrated large genome (MILG). Immature progeny therefrom contained a fragment derived from Thermotogales (colored) or foreign (white) DNA ( a ). Immature progeny containing Thermotogales and foreign DNAs neighboring in the MILG would also have been produced, but are not shown for simplicity. The immature progeny fused with each other, forming the nuclear membrane, and their DNAs were randomly ligated, followed by homologous recombination-mediated reorganization. Among the resulting spheroids, those that became capable of division owing to the archaeal cell division genes are shown as FECA ( b ). Note that the random joining of DNA fragments introduces diversity among the FECA members; the lengths, numbers, and gene repertoires of chromosomes varied among them.
We assume that ancestors of Thermotogales had no CRISPR sequences. Therefore, their spheroids are expected to have been able to acquire a large amount of foreign DNA, taking advantage of the absence of the physical barrier, PG, without degrading it through CRISPR-related immunity ( Horvath and Barrangou, 2010 ). Thus, it would be possible that virus-related particles with large DNA and broad host ranges transferred a massive number of archaeal genes simultaneously, including those for cell division ( Lindås et al., 2008 ) and cytoskeletons for phagocytosis ( Yutin et al., 2009 ;Zaremba-Niedzwiedzka et al., 2017 ). The MILG of large dish-shaped cells would have contained more foreign DNA than Thermotogales DNA (for example, 40 Mb in total, referring to the genome size of N. gruberi, vs 2 Mb, that of Thermotogales). The Thermotogales and foreign DNA fragments in immature progeny should have been ligated in a random order. The resulting DNAs had a linear morphology because the ends were unrelated to each other. Most DNAs were dysfunctional because of an inappropriate joining order, but some could have come to carry functional genes owing to the aforementioned homologous recombination-mediated reorganization of DNA. The spheroids in which the archaeal cell division genes ( Lindås et al., 2008 ) became functional were the FECA ( Fig. 3 ). The absence of an active bacterial cell division system in the spheroids ( Kuwabara and Igarashi, 2017 ) should have eased the recruitment of an archaeal cell division system. Even if massive HGT failed to transfer all archaeal informational genes, some bacterial genes could have substituted for the missing archaeal ones. Such FECA members would have obtained the archaeal genes through subsequent HGT, which eventually replaced the bacterial genes. The selective force for replacement was the faster growth provided by the archaeal genes, which matched the archaeal cell division system.
Archaeal gene donors are likely to have been the deep-sea members of Asgard archaea ( Zaremba-Niedzwiedzka et al., 2017 ), consistent with one of the habitats of Thermotogales, deep-sea hydrothermal vents, but their pelagic members ( Zaremba-Niedzwiedzka et al., 2017 ) could also have been involved in the HGT; pelagic SAR11 clade α-proteobacteria (related to Candidatus Pelagibacter ubique) were observed in seawater near the habitat of T. globiformans 3 ( Kato et al., 2009 ). Since Thermotogales bacteria are suggested to have obtained as many as 80% of their protein-coding genes by HGT, it cannot be excluded that some genes of other bacteria were transferred before or upon the massive HGT, namely before the acquisition of mitochondria, which supports late mitochondria acquisition ( Koumandou et al., 2013 ). The origins of bacterial genes appear varied, as evaluated from the protein-coding genes of S. cerevisiae and C. merolae ( Koonin, 2010 ), although these data would also reflect HGT after the FECA formation.

Fragment joining suggests the mechanisms of eukaryotic evolution
The fragment joining followed by the reorganization could have enabled various chromosome compositions of FECA members, even from spheroids having an identical gene repertoire ( Fig. 3 ). These members are thought to have evolved into supergroups. With partial reorganization of randomly joined products, existing genes would be disrupted, while novel genes could be produced by chance, which should have been the genomic innovation process of eukaryogenesis. Genes not producing indispensable proteins are thought to have had a chance to be lost during the reductive evolution from LECA ( Fritz-Laylin et al., 2010 ;Koonin, 2010 ). Some gene-disrupting inserts became introns when their corresponding parts in the primary transcripts happened to be spliced for end modification ( Livingstone Bell, 2001 ), forming mRNA. All eukaryotes, including N. gruberi ( Fritz-Laylin et al., 2010 ), have introns. Thus, it would not be surprising if introns were produced in the FECA. Introns are known to be involved in exon shuffling ( Gilbert, 1978 ), a natural process of creating new combinations of exons by intronic recombination ( Kolkman and Stemmer, 2001 ;Long et al., 2003 ;Patthy, 1999 ). One of the mechanisms of exon shuffling, illegitimate recombination ( van Rijk and Bloemendal, 2003 ), is dependent on the NHEJ pathway for dsDNA-break repair ( Kegel et al., 2006 ). Exon shuffling can produce new genes and is, thus, considered to be an evolutionary driving force in eukaryotes ( Long et al., 2003 ;Patthy, 1999 ). Therefore, the presumed fragment joining in Thermotogales ancestor spheroids, which is thought to have generated supergroups, novel genes, and introns, likely provided basic mechanisms for eukaryotic evolution.
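How intronic recombination yields new exon combinations can be shown with a minimal sketch. It is a toy enumeration, not a description of the molecular mechanism: genes are hypothetical tuples of exon labels, an intron is assumed to lie between every pair of neighboring exons, a single crossover inside an intron joins the 5' part of one gene to the 3' part of another, and reading-frame (intron phase) compatibility is ignored. The function name intronic_recombinants is introduced here for illustration.

```python
from itertools import product


def intronic_recombinants(gene_a, gene_b):
    """Enumerate chimeric exon orders from one crossover within introns.

    Each gene is a tuple of exon labels. Crossover points are the internal
    exon boundaries (assumed intron positions), so at least one exon of each
    parent gene is kept in every chimera.
    """
    chimeras = set()
    for cut_a, cut_b in product(range(1, len(gene_a)), range(1, len(gene_b))):
        # 5' exons of gene_a joined to 3' exons of gene_b
        chimeras.add(gene_a[:cut_a] + gene_b[cut_b:])
    return chimeras


gene_a = ("a1", "a2", "a3")   # hypothetical three-exon gene
gene_b = ("b1", "b2", "b3")   # hypothetical three-exon gene
new_genes = intronic_recombinants(gene_a, gene_b)
```

Even two three-exon genes give four distinct chimeras in this model, and the count grows with the number of introns, which gives a sense of why introns are thought to favor the production of new genes.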

Establishment of LECA
Although our scenario covers up to the establishment of FECA, an example of LECA formation compatible with the FECA should be described. The members of the FECA community are thought to have acquired mitochondria by phagocytosis, supported by archaeal actin-related protein homologs ( Yutin et al., 2009 ;Zaremba-Niedzwiedzka et al., 2017 ), and to have become the LECA. The FECA should have conserved energy for phagocytosis under anaerobic conditions and expended it under partially aerobic conditions, where they encountered the proto-symbiont α-proteobacteria. The geological site for the mitochondria acquisition may have been a partially aerobic hydrothermal vent area. Each member of the FECA community should have captured a distinct α-proteobacterium, implying that the mitochondrial origin was not unique among the members, although selection after capture could have caused it to appear so. Members not acquiring mitochondria may not have conserved the energy for proliferation under aerobic conditions and, thus, are thought to have become extinct. However, such members could have survived under anaerobic conditions, similar to an amitochondriate animal symbiont, Monocercomonoides sp. PA203, which entirely lacks mitochondrial proteins and is thought to have lost the mitochondria that were previously obtained ( Karnkowska et al., 2016 ).

Experimental evaluation of the Thermotogales origin scenario is possible
The scenario can be experimentally evaluated by using T. globiformans spheroids. Whether the spheroids become capable of cell division, the most critical step of the scenario, could be studied by introducing archaeal cell division genes. Nevertheless, there is a concern that CRISPR-related immunity ( Horvath and Barrangou, 2010 ) could prevent the acquisition of foreign genes. Thus, prior elimination of this activity may be necessary. The genome structure of large dish-shaped cells can be studied by single-cell sequencing ( Eberwine et al., 2013 ). Doing so would suggest the mechanisms of large genome formation, progeny production, and prevention of cell division and PG accumulation. Fragment joining in progeny chromosome formation can be evaluated through purification of immature progeny. Metagenomic and proteomic analyses would reveal the DNA sequences in immature progeny and the supposed DNA-joining enzymes. The eukaryotic signal cleavage sites in the Thermotogales periplasmic proteins can be studied by N-terminal sequencing and by signal cleavage of the in vitro translated proteins.

Conclusions
A eukaryogenesis scenario is proposed in which Thermotogales ancestor spheroids became capable of division owing to horizontally transferred archaeal cell division genes, becoming the FECA. The genetic vehicles involved in the HGT were virus-related particles with large DNA and broad host ranges. The archaeal genes necessary for eukaryogenesis, including those for cell division and phagocytosis, were transferred at practically the same time.
Spheroids of extant Thermotogales can show some eukaryote-specific features, such as large nucleoids, which are formed by integration of multiple genome copies, and an endomembrane, which is the CM of immature progeny produced in periplasm, in addition to the G3P-type membrane lipids and the endoplasmic reticulum-type signal cleavage sites in periplasmic proteins. Thus, Thermotogales could be regarded as a proto-eukaryote. The former two features are acquired through Thermotogales-specific IOM and subsequent fusion of multiple cells, which are brought about by retrogressing alterations of PG, the landmark of bacteria. Therefore, the Thermotogales cell structures may be related to eukaryogenesis. We hypothesize that the progeny chromosome is formed in immature progeny by random joining of small DNAs excised from the MILG, followed by reorganization of the products by mechanisms including homologous recombination enabled by the MILG.
In Thermotogales ancestor spheroids, the presumed absence of CRISPR-related immunity enabled the MILG to contain archaeal and bacterial DNAs horizontally transferred via virus-related particles. Fusion of immature progeny containing small DNAs from the MILG formed endomembranes, including the nuclear membrane, and randomly ligated DNAs. The spheroids in which the archaeal cell division genes became functional through reorganization of randomly ligated DNAs by MILG-enabled homologous recombination were the FECA. This means that the large genome, considered a eukaryote-specific feature, was not a consequence but a prerequisite of eukaryogenesis. The random joining of DNAs likely provided the basic mechanisms for eukaryotic evolution, generating various FECA members leading to supergroups, novel genes, and introns that are involved in exon shuffling. Each FECA member phagocytosed a distinct α-proteobacterium, which became the mitochondrion, using the energy conserved under anaerobic conditions, and thereby formed the LECA.

Author contributions
T.K. contributed to the sequence analyses and manuscript writing; K.I. contributed to sequence analyses and experiments.
Correspondence and requests for materials should be addressed to kuwabara@biol.tsukuba.ac.jp

Declaration of Competing Interest
There are no financial, general, or institutional competing interests.