Emergence of a “Cyclosome” in a Primitive Network Capable of Building “Infinite” Proteins

We argue for the existence of an RNA sequence, called the AL (for ALpha) sequence, which may have played a role at the origin of life; this role entailed the AL sequence helping generate the first peptide assemblies via a primitive network. These peptide assemblies included “infinite” proteins. The AL sequence was constructed on an economy principle as the smallest RNA ring having one representative of each codon’s synonymy class and capable of adopting a non-functional but nevertheless evolutionarily stable hairpin form that resisted denaturation due to environmental changes in pH, hydration, temperature, etc. Long subsequences from the AL ring resemble sequences from tRNAs and 5S rRNAs of numerous species like the proteobacterium, Rhodobacter sphaeroides. Pentameric subsequences from the AL are present more frequently than expected in current genomes, in particular, in genes encoding some of the proteins associated with ribosomes like tRNA synthetases. Such relics may help explain the existence of universal sequences like exon/intron frontier regions, Shine-Dalgarno sequence (present in bacterial and archaeal mRNAs), CRISPR and mitochondrial loop sequences.


Introduction
After the first observations of a ribosome sixty years ago by G.E. Palade and P. Siekevitz [1], theoreticians proposed the first models about the origin of life using tools from statistical mechanics: In 1967, S. Ulam simulated large automata networks and remarked that, with simple growth rules, he obtained complicated patterns similar to those observed in biology [2]. In 1968, inspired by these results, J. Conway started to do similar simulations of cellular automata, in particular a new one called the "Game of Life" by M. Gardner in 1970, because it showed discrete numerical structures moving on a plane and being duplicated [3]. Independently, in parallel, a French team composed of two biologists and one physician-mathematician (J. Besson, P. Gavaudan, and M.P. Schützenberger) worked on the optimality of the genetic code and the existence of primitive RNAs [4] similar to the invariant parts of tRNA loops (see supplementary material in [5]). Unfortunately, Conway's algorithm

A Primitive Network at the Origin of Life
In our hypothesis, amino acids were concentrated around the AL, which acted as a "proto-nucleus" to allow the first "organ" or "cyclosome" to synthesize peptides. Insofar as an object corresponds to a discontinuity in a field of connectivity [26], the boundary of this cyclosome corresponded to a discontinuity in the gradient of peptides around the AL.
The boundary of the first functional "machine" able to build peptides can be defined as a peptide gradient boundary centred on the "proto-nucleus" AL, resulting from an amino acid confinement around the AL favoring the occurrence of peptide bonds. This "organ" functioned as a "cyclosome" in a "proto-membrane", thus as a "proto-cell" with a circular organization. This proto-cell is a solution to the problem of how to obtain autopoiesis: peptide synthesis favored by the AL was necessary to repair the proto-cell membrane made of hydrophobic peptides and lipids, which reciprocally protected the AL against denaturation by ensuring the integrity of the proto-nucleus. The autopoiesis network underlying this organization has been studied in [27][28][29] and exhibits exponential growth if the peptide proto-membrane allows the entry of nucleic acids for AL replication.
We can represent its dynamics by defining the variables of the network and their interactions using a system of differential equations (1), whose Jacobian graph is given in Figure 1. Let us denote by R, A, B, E, M, and P, respectively, the concentrations of the AL Ring, Amino acids, nucleotide Bases, hydrophilic Enzyme peptides, hydrophobic Membrane peptides, and the Pool of lipids plus the elements C, N, and H2O. In the absence of diffusion (diffusion coefficients $d_i$ equal to 0), the differential system (1) has an initial exponential growth behaviour and tends, if P is constant and the initial values of the variables are not zero (zero initial values correspond to an unstable steady state), towards the unique stable stationary state:
$$(R^*, A^*, B^*, E^*, M^*) = \left(\frac{K}{k_R},\ \frac{K' k_R}{K (k_A + k'_E)},\ \frac{K}{k_B},\ \frac{K' k'_E}{k_E (k_A + k'_E)},\ \frac{K' k_A}{k_M (k_A + k'_E)}\right) \qquad (2)$$
The Jacobian matrix $J^*$ at this state has the characteristic polynomial
$$(k_R + \lambda)\left(\frac{K (k_A + k'_E)}{k_E} + \lambda\right)(k_B + \lambda)(k_E + \lambda)(k_M + \lambda) = 0,$$
whose roots are all negative, ensuring the stability of the stationary state (2).
If we add a diffusion term for the different metabolites, these dynamics lead to the spatial segregation of R, A, B, M, and P into structures like a protein-synthesizing machine made of the AL or the anti-sense AL (R), proto-cytoplasm (A, B), proto-membrane (M), and building blocks (P). The AL serves as a template for the formation of hydrophilic enzymatic peptides E (Figure 1) able to activate (or inhibit, depending on their catalytic properties) the AL or anti-sense AL replication with a reaction constant $k_r$ [30]. If we introduce diffusion processes (whose viscosity coefficients depend on the membrane concentration M) in the purely reaction differential system, we get the diffusion-reaction equations (1), for which a close discrete analogue has already been simulated in [31,32], and which show a progressive space segmentation by the M gradient. During its exponential growth and diffusion, the boundary of the system (1) is chosen as the gradient boundary of peptides polymerized from amino acids. Growth stops in the case of a lack of nucleic acid or protein precursors, i.e., because of the disappearance of the elements of the C, N, and H2O pool (which provides the amino acids and nucleotides that are consumed during the growth).

Construction of the AL RNA Ring
We have shown by using constraint programming and a step-by-step computation [33,34] that only 25 RNA rings satisfy the following constraints:
• All dinucleotides should appear at least once (apart from CG because of CG suppression).
• Among rings satisfying the constraint "to be as short as possible and contain at least one codon of each amino acid synonymy class", there is no solution for a length below 22 nucleotides. For length 22, 29,520 solutions contain the codon AUN twice, N being G for 52% of the solutions.
• From the 29,520 solutions, only 25 rings allow the formation of a hairpin at least 9 bases long.
• Of these 25 rings, 19 have both start and stop codons.
• Through calculation of the average genetic distances to the others (e.g., circular Hamming distance, permutation distance, and edit distance), one singular ring exhibits a minimum distance as compared to the others. Only one sequence, called AL (for ALpha), thus acts as the barycenter of the set of the 18 others: 5′-AUGGUACUGCCAUUCAAGAUGA-3′.
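The barycenter selection in the last constraint can be illustrated with a minimal sketch. It assumes that only the circular Hamming distance is used (the text also mentions permutation and edit distances), and the rings in `toy_rings` are hypothetical placeholders, not the 19 rings of the study.

```python
from typing import Sequence


def circular_hamming(a: str, b: str) -> int:
    """Smallest Hamming distance between ring a and any rotation of ring b."""
    if len(a) != len(b):
        raise ValueError("rings must have the same length")
    return min(
        sum(x != y for x, y in zip(a, b[k:] + b[:k]))
        for k in range(len(b))
    )


def barycenter(rings: Sequence[str]) -> str:
    """Ring with the smallest mean circular Hamming distance to the others."""
    def mean_distance(i: int) -> float:
        others = [r for j, r in enumerate(rings) if j != i]
        return sum(circular_hamming(rings[i], r) for r in others) / len(others)

    best = min(range(len(rings)), key=mean_distance)
    return rings[best]


# Hypothetical toy rings, used only to show the selection step.
toy_rings = ["AAAA", "AAAU", "AAUU"]
print(barycenter(toy_rings))  # AAAU
```

Taking the minimum over all rotations makes the distance independent of where each ring is opened, which is why a plain Hamming distance on linearized sequences would not be suitable here.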
Figure 1. Jacobian graph of the network [28], with only activation arrows except the dashed arrow, which can represent either an activation or an inhibition. P (in red) represents the Pool of the elements C, N, and H2O; E (in brown), hydrophilic Enzyme peptides; R, the AL Ring; A, Amino acids; B, nucleotide Bases; and M, hydrophobic Membrane peptides.
Then, we remark that AL appears by merging the following sequences of the genome of Rhodobacter sphaeroides (Figure 2): AATGGTACTTCCATTCGATATG from the Gly-tRNA(TCC) loops and AATGGTACTGCGTCTCAAGACG from 5S rRNA [35].
Figure 2. ... [35]); (C) Optimal hairpin form for AL (from Kinefold [36]).
It is possible to design, by using the Kinefold® algorithm [36], the most thermodynamically stable hairpin (Gibbs free energy equal to ∆G = −9.5 kcal/mol in Figure 2) among the 22 RNA chains obtained from the circular permutations of AL (Figure 2C). This structure could explain why, during denaturation, there is first a loss of the AL-hexamer CUGCCA (anticodon loop of current Gly-tRNA(GCC) molecules) and then a break between the AL-heptamers UUCAAGA (the TΨ-loop of current tRNAs) and AAUGGUA (the D-loop of current tRNAs). An argument in favor of this scenario is the distribution of the pentamer frequencies inside current genomes (from the Rfam database, http://rfam.xfam.org/), which shows the two highest survival probabilities for the AL-pentamers coming from the most stable part of AL, which are also parts of the D-loop and TΨ-loop of present tRNAs, i.e., AAUGG, AUGGU, UGGUA, GGUAC, TTCAA, TCAAG, and CAAGA. If we consider other subsequences of AL, we find many repeated motifs, such as AATGG [37] and GATG [38] from human microsatellites, AGAT from vertebrate repeated UTR motifs [39], and CCATTCA from the Alpha Satellite of Human Chromosome 17 [40] and from the HMG box (High Mobility Group Box, a protein domain involved in DNA binding [41]), as well as the optimal codons that determine mRNA stability in the yeast genome [42].
We can generalize the result obtained from R. sphaeroides to other archaeal, bacterial, and eukaryotic species, as shown in Table 1. The genetic code consists of 64 triplets made of 3 letters representing purine bases (A for Adenine and G for Guanine) and pyrimidine bases (U for Uracil and C for Cytosine), which can be grouped into 21 synonymy classes. Each class contains between 1 and 6 triplets; 20 classes correspond to the 20 amino acids (except for one class containing only 1 triplet, which corresponds either to the amino acid Methionine or, if this triplet initiates a sequence of messenger RNA (mRNA), to a "start" punctuation symbol), plus one class corresponding to the "end" punctuation symbol terminating the mRNA sequences. It has been shown that stereochemical bonds can favor a non-permanent, reversible link between amino acids (AA) and codons or anticodons of their AA synonymy class [43][44][45][46][47]. The 25 selected rings satisfy two opposite constraints corresponding to a min-max problem: (i) to be as short as possible, and (ii) to contain one and only one triplet corresponding to each amino acid synonymy class. The latter constraint would allow the rings to serve as a "matrimonial agency" concentrating amino acids in the vicinity of the ring and thereby favoring the links between any pair of them via peptide bonds [48][49][50][51][52][53][54][55]. The 25 RNA rings selected can be considered as ancestors of the tRNA of the 22 AAs including Pyrrolysine and Selenocysteine (Figure 3), with Serine counted twice, and Tyrosine and Aspartic Acid able to replace C by U in their tRNA anticodons [56,57].
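The coverage constraints can be checked directly on the AL sequence. The sketch below is only an illustration, assuming that the ring is read circularly in all 22 overlapping triplet positions and that the standard genetic code applies; the helper name `circular_kmers` is ours.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons enumerated in UCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

AL = "AUGGUACUGCCAUUCAAGAUGA"  # AL ring as given in the text


def circular_kmers(ring: str, k: int) -> list:
    """All k-mers read around the ring, one per starting position."""
    doubled = ring + ring[: k - 1]
    return [doubled[i : i + k] for i in range(len(ring))]


triplets = circular_kmers(AL, 3)
classes = {CODON_TABLE[t] for t in triplets}
all_dinucleotides = {"".join(p) for p in product(BASES, repeat=2)}
missing = all_dinucleotides - set(circular_kmers(AL, 2))

print(len(AL), len(classes))   # 22 21
print(triplets.count("AUG"))   # 2
print(missing)                 # {'CG'}
```

Under these assumptions the 22 circular triplets cover all 21 synonymy classes (20 amino acids plus stop), AUG is the only codon read twice, and CG is the only absent dinucleotide.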
The 12 rings in red in Figure 2 could correspond to an intermediary genetic code using the wobble mechanism present in Archaea [58][59][60] and many other organisms [61,62]. The AL ring (resp. AL' anti-ring) selects and confines more L-amino acids (resp. D-amino acids) and catalyses the synthesis of either hydrophobic or hydrophilic peptides [63][64][65]. We can note that in [9] peptide synthesis was achieved experimentally by using as RNA template a subsequence of AL, AAUGGU.

Nucleo-Nucleic and Nucleo-Peptidic Mechanisms
Different intracellular mechanisms involving RNA, DNA, and proteins conserve as relics subsequences of AL, in particular from its short hairpin ATTCAAGATGAAT.


tRNA Loops
tRNA loops (D-loop, anticodon loop, TΨ-loop, and articulation loop) form a sequence that has many similarities to AL. For example, the loops of the mitochondrial Gly-tRNA(GCC) of Lupine [46] fit AL almost perfectly (Figure 4), and this tRNA exists in 242 species in the NCBI Nucleotide database [59].
"cyclosome". Bottom Middle: A hairpin form of AL. Bottom Right: The most stable hairpin form of the Archetypal Bound AB proposed as a variant of the cyclosome AL.


Figure 4. Gly-tRNA(GCC) of Lupine [66], whose loops (articulation, D-, anticodon, and TΨ-loops) fit AL almost perfectly with the sequence formed by its loops (in red).
In the tRNADB-CE database, a high percentage of tRNAs have loops that fit the AL, with TGGTA in the D-loop and TTCNA in the TΨ-loop among tRNAs with NTGCCAN as the anticodon loop (Table 2).

Giant Viruses
The hypothesis that de novo template-free RNAs appear spontaneously, as at the origin of life, and invade modern genomes (in particular those related to the giant viruses) is based on their resemblance to the 25 putative ancestors of the present tRNAs (cf. Figure 3 and [67,68]). Moreover, the AL-pentamers are often observed in the sequences of the giant viruses. To quantify the frequency of the AL-pentamers, we define the AL-proximity of a given genome as the percentage of pentamers in this genome that belong to the 9 most frequent pentamers from the AL (Table 2). If this genome contains 1,000,000 nucleotides, the percentage expected at random equals 0.88 ± 0.016* (* for the 90%-confidence interval). From calculations using the NCBI nucleotide database [67][68][69]
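A minimal sketch of how such an AL-proximity and its random baseline could be computed is given below. It assumes independent, equiprobable bases and a normal approximation to the binomial with z = 1.645 for the random baseline; the function names are ours. Because the 9 pentamers of Table 2 are not listed in this section, the example uses the seven AL-pentamers quoted earlier in the text as a stand-in.

```python
import math


def al_proximity(sequence: str, pentamers) -> float:
    """Percentage of pentamer windows of `sequence` that belong to `pentamers`."""
    seq = sequence.upper().replace("U", "T")
    targets = {p.upper().replace("U", "T") for p in pentamers}
    windows = len(seq) - 4
    if windows <= 0:
        return 0.0
    hits = sum(seq[i : i + 5] in targets for i in range(windows))
    return 100.0 * hits / windows


def random_baseline(n_pentamers: int, length: int, z: float = 1.645):
    """Expected percentage and half-width for a random sequence (assumed model)."""
    p = n_pentamers / 4**5
    half_width = z * math.sqrt(p * (1 - p) / length)
    return 100.0 * p, 100.0 * half_width


# Stand-in set: the seven AL-pentamers quoted in the text, not the Table 2 set.
stand_in = ["AAUGG", "AUGGU", "UGGUA", "GGUAC", "TTCAA", "TCAAG", "CAAGA"]
print(al_proximity("AATGGTACTTCCATTCGATATG", stand_in))

mean, half = random_baseline(9, 1_000_000)
print(round(mean, 2), round(half, 3))
```

With 9 pentamers out of 4^5 = 1024, this model gives an expected percentage of about 0.88, in agreement with the value quoted above, and a half-width of about 0.015 for 1,000,000 nucleotides, close to the quoted 0.016.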

Circular RNAs
The 3801 human circular RNAs from circBase [70], observed after the first discovery of circular RNAs in many organisms [71,72], contain 36,228,352 possible pentamers. The numbers of AL-pentamers from a branch of its hairpin form are given in Table 3; they significantly exceed the number obtained at random.
Table 3. The most frequent pentamers in 3801 human circular RNAs from circBase. The observed numbers can be compared to the number of pentamers obtained at random, 35,379 ± 310* (* for the 90%-confidence interval).
AL-Pentamer    Observed Number
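The random reference count quoted in the caption of Table 3 can be approximated as follows. This is a sketch under an assumed model (each of the 1024 pentamers equally likely at every position, normal approximation, z = 1.645), not necessarily the authors' exact procedure.

```python
import math


def expected_pentamer_count(n_windows: int, z: float = 1.645):
    """Mean and half-width of the count of one given pentamer among n_windows."""
    p = 1 / 4**5
    mean = n_windows * p
    half_width = z * math.sqrt(n_windows * p * (1 - p))
    return mean, half_width


mean, half = expected_pentamer_count(36_228_352)
print(round(mean), round(half))  # 35379 309
```

The mean matches the 35,379 given in the caption, and the half-width is within rounding of the quoted ± 310.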

Ribozymes
An RNA catalytic domain has been found within the sequence of the 359-base-long negative-strand satellite RNA of tobacco ringspot virus [73]. The catalytic domain contains 2 minimal sequences of satellite RNA, a 14-base substrate RNA and a 50-base catalytic RNA containing 2 AL-pentamers: 5′-AAACAGAGAAGUCAACCAGAGAAACACACGUUGUGGUAUAUUACCUGGUA-3′. A minimal RNA hairpin ribozyme discovered 18 years later [74] shows an interesting catalytic activity due to its chain D, with 3 AL-pentamers present in its 19 bases: 5′-UCGUGGUACAUUACCUGCC-3′. The AL-tetramer UGGU is generally not cleavable by ribozymes [75], an empirical fact that explains its survival in present ribozymes. AL-pentamers can also be found in the D chain of many other hairpin ribozymes [76][77][78][79][80][81][82][83], used to build simple RNA systems consisting of two ribozymes with concerted activity allowing replication [84].
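The pentamer counts quoted for these two ribozyme sequences can be reproduced with a short scan. The sketch assumes that an "AL-pentamer" is any of the 22 pentamers read circularly on the AL ring; the helper names are ours.

```python
AL = "AUGGUACUGCCAUUCAAGAUGA"  # AL ring as given in the text


def ring_kmers(ring: str, k: int = 5) -> set:
    """Set of k-mers read circularly around the ring."""
    doubled = ring + ring[: k - 1]
    return {doubled[i : i + k] for i in range(len(ring))}


def al_hits(sequence: str, k: int = 5) -> list:
    """(position, k-mer) pairs of `sequence` that also occur on the AL ring."""
    targets = ring_kmers(AL, k)
    seq = sequence.upper().replace("T", "U")
    return [
        (i, seq[i : i + k])
        for i in range(len(seq) - k + 1)
        if seq[i : i + k] in targets
    ]


chain_d = "UCGUGGUACAUUACCUGCC"
catalytic = "AAACAGAGAAGUCAACCAGAGAAACACACGUUGUGGUAUAUUACCUGGUA"

print(al_hits(chain_d))    # [(3, 'UGGUA'), (4, 'GGUAC'), (14, 'CUGCC')]
print(al_hits(catalytic))  # [(33, 'UGGUA'), (45, 'UGGUA')]
```

Under this reading, chain D carries 3 AL-pentamers and the 50-base catalytic RNA carries 2 occurrences (both UGGUA), consistent with the counts stated above.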

Intron-Exon Frontier
The heptamers GGTAAGT and TTCA(G)GA present in the AL ring are frequently observed at the exon/intron and intron/exon frontiers, respectively, in the genomes of many organisms (Figure 5) [85].

Synthetases
Using the AL-proximity calculated from the 9 most frequent pentamers from the AL ring (Table 3), glycyl-tRNA synthetases from [69] have a value above the 95%-confidence upper threshold, which is equal to 0.88 + 0.49 = 1.37* (calculated for a sequence of size 1000). Table 4 and Figure 6 show the values of the AL-proximity (for the 9 most frequent AL pentamers) for the tRNA synthetases of the different microorganisms studied in [86][87][88][89][90][91][92][93], especially Bacteria, Archaea, and one Fungus. We observe that Archaea have the maximal values of this proximity. By comparing the sequences of these synthetases [93], we see that synthetases lying close together in the clustering tree based on sequence resemblance (Hamming distance) have similar values of their AL-proximity (except the pair Haloferax larsenii / Helicobacter pylori).

Table 5 gives the values of the AL-proximity for different tRNA synthetases (also called tRNA ligases) and ribosomal or transfer RNAs, showing that the 40S ribosomal RNAs; tRNA synthetases; and 60S, 18S, and 16S ribosomal RNAs have, in this order, decreasing proximities to the AL. The high values of the AL-proximity for the synthetases are consistent with a very early role in a protein-synthesizing machine, by increasing the efficacy of amino acid binding to an RNA-oligopeptide complex such as the AL ring coupled with ligases.

Defence Mechanisms
The CRISPR-CAS system provides bacteria like Streptococcus agalactiae with adaptive immunity. The AL-pentamers ATGGT and ATTCA, the AL-hexamer AATGGT, and the AL-heptamer TCAAGAT (the latter two corresponding respectively to the D-loop and TΨ-loop of many tRNAs) are often found at many levels of the system (CAS proteins, Casposon TIR, and CRISPR repeats [94]). For example, typical repeat sequences for CRISPR1 and CRISPR3 [95] contain AL-heptamers shared by AL and tRNA loops: GTTTTTGTACTCTCAAGATTTAAGTAACTGTACAAC (CRISPR1) and GTTTTAGAGCTGTGTTGTTTCGAATGGTTCCAAAAC (CRISPR3). The same holds for the sequences of TIR and CRISPR compared in [96,97], a consensus sequence from the central part of the murine RSS VκL8, Jß2.6, and Jß2.2 [98][99][100], and the human RSS spacer common to Vh, V328h2, and V328 [101][102][103], described in Table 6, and also for the protein H354 of Mimivirus kasaii [104] (see Supplementary Material 3 for calculations of the AL-proximity of CRISPR-CAS proteins).
Table 6. Sequences of elements involved in defence mechanisms compared to AL.
In Mimivirus, a mechanism similar to CRISPR has been discovered [104], which involves two exonucleases R350 and R354 having respectively 3. Moreover, the sequences of the genes of these exonucleases contain numerous heptamers like: 5′-GATGATGAAGATGATGATGAAGAT-3′ (MIMIVIRE gene H354).

Mitochondrial D-loop
In [105], the 2D-structure of the mitochondrial D-loop (7S mtDNA) is given with its central AL-octamer and hexamer TACTGCCAGTCAACATGAAT and in false colour the frequency of its bases (Figure 7). This D-loop is conserved among different species and contains putative mitochondrial micro-RNAs, called mito 2 miRs in [106].

Cytidine Deaminases
The AID/APOBEC protein family comprises cytidine deaminases capable of deaminating cytosine to uracil in the context of a single-stranded polynucleotide [107]. They are found first in yeast, and then in fishes, birds, amphibians, and mammals. They act as RNA-editing enzymes, contributing to the co-evolution of viruses and their antibodies [108], and perhaps initially to the co-evolution of the first RNAs. Among the 50 members of this family given in [107] (see Supplementary Material 5), 96% have values of the AL-proximity over the 95%-confidence expected upper threshold of 2.3*, their mean value being equal to 3.36.


Origins of the AL Ring
The sequence of the AL ring was obtained by trying to satisfy the constraints of both being as short as possible and being long enough to encode all the amino acids. A justification for the former constraint can be found in the 'lipid world'. Even membranes composed of a single species of molecule can have domains in gel and fluid phases whilst membranes composed of different molecules contain many, predominantly small, domains [109]. We have proposed that such interfaces catalysed the polymerisation of both RNAs and amino acids [110]. In this scenario, in which there would have been many very small domains with closed loop interfaces, there would have been a correspondingly greater production of small RNA rings (Figure 8).

Figure 8. Combination of the lipid interface mechanism and the functioning of AL as a protein-synthesizing machine without the whole ribosomal machinery.
A justification for the second constraint can be found in the hypothesis that interaction between amino acids and nucleotides stabilised both species, thereby leading to their accumulation in the abiotic flux of molecular creation and destruction, as previously proposed [27,28]. In this case, there would have been a strong selection for RNAs to have compositions that would have resulted in the binding of the maximum proportion of the amino acids present in the prebiotic ecology. Hence, if proteins had been synthesised endlessly, they would have remained dynamically attached to the selected RNAs and would have protected them.

The AL-Pentamer Proximity as a Marker of Age of the Genome
Class II of aminoacyl-tRNA synthetases constitutes a set of very ancient multi-domain proteins [25,93]. By calculating their AL-proximity, we see that their genes are closer to AL than the genes of class I (Figure 9 Top). This holds for the 20 synthetases in human, an archaeon (Methanobacterium lacus), a proteobacterium (Rickettsia prowazekii) close to mitochondria, and an extremophilic bacterium (Deinococcus radiodurans, see Supplementary Material 6).
... [111]) indicate the AL-proximity (calculated for the 9 most frequent AL-pentamers of Table 3) of the giant viruses' genomes.
The lowest proximity is observed for Deinococcus radiodurans, which is capable of genetic transformation by homologous recombination. We observe the same phenomenon for the Pandora viruses (Figure 10 Bottom), which are able to create neogenes and which are considered as recently evolved additions to the large family of giant viruses [111]. These observations as well as the order observed between mean AL-proximities of SASPs (4.49), 5S rRNAs (4.01) and cytidine deaminases (3.36), which respectively protect DNA backbone (SASP), act as mediator between tRNA and ribosome (5S rRNA), and control the cell pyrimidine level (cytidine deaminase) suggest AL-proximity as a marker of genome age, which could constitute a further topic of study.

'Infinite' Proteins
The existence of circular mRNAs makes it possible for ribosomes to translate them without ever encountering a translational stop. This could lead to the synthesis of essentially 'infinite' proteins. We propose that the synthesis of such proteins could have occurred at an early stage of the origins of life scenario if the AL cyclosome were simultaneously mRNA/tRNA/synthetase/rRNA. In support of this, the oldest synthetase genes (type II) of Rickettsia prowazekii are close to AL (Table 4 and Figure 9 Top), which supports the idea that AL functioned as a primitive protein-synthesizing machine acting without the whole ribosomal machinery for catalysing the first peptides (Figure 10).
A reversible stereo-binding between AL and amino acids from a Miller-like source could have catalysed peptide bonds to synthesize a protein with a sequence that would have only been partly random, since some juxtaposition, alignment, and orientation on the cyclosome would have occurred [110]; the UGA inside the AL would not necessarily have perturbed the machine because neither reading frames nor punctuation codons would have been needed to produce an "infinite" protein in this way. At a later stage of the evolution of the translational machinery, we propose that synthesis of such proteins would have been associated with (1) a relatively weak primitive Shine-Dalgarno RBS sequence GGAGGU, which has a weak complementary sequence inside the AL, CUGCCA, and which would have had the advantage of limiting steric problems due to too many ribosomes trying to bind; (2) a relatively long mRNA; (3) a limited codon repertoire; and (4) the tendency of these proteins to form filaments.

tRNA Building
One way to build a tRNA molecule from four AL hairpins follows the suggestion of many studies [112][113][114], which propose that the contemporary tRNA was formed by the ligation of four half-sized hairpin-like RNAs. In Figure 11, four partial hairpins from the AL ring have been used for

Conclusions
To conclude, a small circular RNA, called AL, has been proposed with a sequence that has the following features:
- Its subsequences (namely, pentamers) are observed as relics in many parts of modern genomes, especially in Archaea;
- AL relics are often present in tRNA loops and in mitochondrial D-loops;
- An AL-heptamer constitutes the major part of the exon/intron boundary;
- A scalar proximity to AL explains the relationships between polymerases and, more generally, between complete genomes in phylogenetic trees of Archaea. This proximity suggests a common origin for these genomes.
Hence, the AL cyclosome could have played the role of an ancient protein-synthesizing machine. This claim is central to the stereochemical hypothesis of the genetic code [115] and to the proposal by A. Katchalsky in 1973 [116]: The existence of catalytic RNAs in clays such as the "montmorillonite" may have facilitated the synthesis of small peptides and long RNAs (as is now done by synthetases, polymerases and replicases), thereby constituting an autocatalytic loop at the origin of life.
The existence of a simple RNA structure capable of surviving as a stable hairpin or functioning in a ring form was postulated soon after Katchalsky's hypothesis [46,47,112], and numerous experimental works [117][118][119] now reinforce this stereochemical hypothesis in a field that continues to advance both experimentally and theoretically.
We anticipate six research developments will follow from the hypothesis presented here:
- An attempt to take into account the potential evolutionary path from the AL ring to the large ribosomal subunits (LSU) extracted from the modular organization of the rRNA structure [17,94,120];
- A search for more AL relics in modern genomes at critical functional steps of the nuclear transcription/translation processes (notably when they are coupled as in Archaea [121], in which the archaeal Gly-tRNA presents the following sequence in its three successive loops: TGGTA CTGCCA TTCAA, that is, a 16-mer from AL [122]) and of the mitochondrial energetic or cellular immune receptor machineries;
- An attempt to explain the evolution of tRNA secondary structures in relation to the genetic code [123][124][125][126][127][128][129][130];
- An attempt to understand the evolution of immune systems (from CRISPR and TOLL to RAG systems [94][95][96][97]), taking into account the reuse of former AL RNA fragments already present in the "cyclosome";
- The discovery of sequences linked to AL useful for synthetic biology and studies on the "minimal cell" and its primitive genome, with original stable structures such as those observed in the "cyclosome" (Figure 12);
- The identification of genetic networks based on common sequences inherited from AL and appearing in regulatory RNAs like microRNAs or circular RNAs.
Life 2019, 9, 51
Tracqui for many discussions about the molecular structures potentially involved at the origin of life.

Conflicts of Interest:
The authors declare that there is no conflict of interest regarding the publication of this paper.