Genetic Code Optimization for Cotranslational Protein Folding: Codon Directional Asymmetry Correlates with Antiparallel Betasheets, tRNA Synthetase Classes

A new codon property, codon directional asymmetry in nucleotide content (CDA), reveals a biologically meaningful genetic code dimension: palindromic codons (first and last nucleotides identical, codon structure XZX) are symmetric (CDA = 0), whereas codons with structures ZXX/XXZ are 5′/3′ asymmetric (CDA = −1/+1; CDA = −0.5/+0.5 if Z and X are both purines or both pyrimidines; assigning negative/positive (−/+) signs is an arbitrary convention). Negative/positive CDAs associate with (a) Fujimoto's tetrahedral codon stereo-table; (b) tRNA synthetase class I/II (aminoacylate the 2′/3′ hydroxyl group of the tRNA's last ribose, respectively); and (c) high/low antiparallel (not parallel) betasheet conformation parameters. Preliminary results suggest CDA-whole organism associations (body temperature, developmental stability, lifespan). Presumably, CDA impacts spatial kinetics of codon-anticodon interactions, affecting cotranslational protein folding. Some synonymous codons have opposite CDA sign (alanine, leucine, serine, and valine), putatively explaining how synonymous mutations sometimes affect protein function. Correlations between CDA and tRNA synthetase classes are weaker than between CDA and antiparallel betasheet conformation parameters. This effect is stronger for mitochondrial genetic codes, and potentially drives mitochondrial codon-amino acid reassignments. CDA reveals information ruling nucleotide-protein relations embedded in reversed (not reverse-complement) sequences (5′-ZXX-3′/5′-XXZ-3′).

Here we present a previously unknown dimension of the genetic code. Analyses suggest that the genetic code is optimised in relation to this new property. The property reflects differences between nucleotides at first versus second codon positions, as compared to differences between nucleotides at third versus second codon positions. In this context, previous analyses [43] showed that the difference between dipole moments of nucleotides at first and second codon positions correlates with hydrophobicities of the corresponding amino acids, after accounting for another, previously reported, correlation between codon and amino acid hydrophobicities [44,45]. The analyses presented here generalise this principle to all codon positions and nucleotide properties.

Codon Directional Asymmetry
The new codon property is derived from comparing two differences in nucleotide contents, the difference between nucleotides at first and second codon positions, and the difference between nucleotides at second and third codon positions. This defines a codon's directional asymmetry in nucleotide content, CDA. CDA reflects semi-quantitatively the extent to which a nucleotide at either the 5′ or 3′ codon extremity differs from the codon's two remaining nucleotides. Following this principle, palindromic codons with the same nucleotide at 5′ and 3′ extremities (at first and third positions, XZX (including codons with X = Z)) are symmetric, CDA = 0. When the nucleotide at the 5′ extremity belongs to a different nucleotide group (purine/pyrimidine) than the two other positions and the latter are identical (ZXX), CDA = −1. When the nucleotide at the 3′ extremity differs from other positions (XXZ), CDA = +1. Signs for 5′- and 3′-dominant CDAs are arbitrary, but necessarily opposite (positive versus negative).

Purines and Pyrimidines
For codons of types ZXX/XXZ, CDA = −0.5/+0.5 when X and Z are both purines or both pyrimidines. This reflects smaller purine-purine and pyrimidine-pyrimidine structural differences than for purine-pyrimidine comparisons. This principle also assigns a CDA score to some codons of type XZW, where all three nucleotides differ, and Z belongs to the same chemical group (purine or pyrimidine) as the nucleotide at either codon extremity. For codons where nucleotides Z and W are both purines or both pyrimidines, X is the most different nucleotide (CDA = −0.5), because chemical structural differences between X and Z are greater than between W and Z. By the same rationale, for codons where nucleotides X and Z are both purines (or both pyrimidines), W is the most different nucleotide (CDA = +0.5).

Complementarity Between Nucleotides at Different Codon Positions
For some codons with structure XZW, Z does not belong to the same group (in terms of purines/pyrimidines) as any nucleotide at the other positions. In these cases, an additional rule determines which of the nucleotides X or W differs more from the two others. We propose that the members of canonical base pairs (C:G and A:T/U) are the most different nucleotide pairs. Hence for codons with structure XZW, CDA = −0.5 when X is the canonical complement of Z, and CDA = +0.5 when W is the canonical complement of Z. This rule set defines CDA for all 64 codons (Table 1).
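The rules of this and the two preceding subsections can be sketched in code. The following Python function is a minimal illustration of the rules as stated in the text, not a reproduction of Table 1, against which it has not been checked. The names `cda`, `same_group` and `COMPLEMENT` are hypothetical, and U is treated as T.

```python
from itertools import product

PURINES = {"A", "G"}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def same_group(a, b):
    """True when both nucleotides are purines or both are pyrimidines."""
    return (a in PURINES) == (b in PURINES)


def cda(codon):
    """Codon directional asymmetry, following the rules stated in the text."""
    x, z, w = codon.upper().replace("U", "T")
    if x == w:                      # XZX (includes X = Z): symmetric
        return 0.0
    if z == w:                      # ZXX: the 5' nucleotide stands out
        return -0.5 if same_group(x, z) else -1.0
    if x == z:                      # XXZ: the 3' nucleotide stands out
        return 0.5 if same_group(w, z) else 1.0
    # XZW: all three nucleotides differ
    if same_group(z, w):            # Z groups with W, so X differs most
        return -0.5
    if same_group(z, x):            # Z groups with X, so W differs most
        return 0.5
    # Z groups with neither extremity: the complement of Z differs most
    return -0.5 if COMPLEMENT[z] == x else 0.5


codons = ["".join(c) for c in product("TCAG", repeat=3)]
negative = sum(cda(c) < 0 for c in codons)
zero = sum(cda(c) == 0 for c in codons)
positive = sum(cda(c) > 0 for c in codons)
print(negative, zero, positive)
```

Under these rules, reversing a codon (5′-ZXX-3′ to 5′-XXZ-3′) flips the sign of its score, so negative and positive codons are equally numerous by construction.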

A New Dimension of the Genetic Code
The distribution of CDA in Table 1 is symmetric. Therefore, the genetic code table could probably be reordered so as to reveal graphically this symmetry, as done for other symmetry properties of the genetic code [46].
To what extent does CDA represent a dimension of the genetic code that is independent of other dimensions? In this respect, we compare Table 1 with the binary representation of the genetic code [47, therein figure 6], a rather complete 6-bit representation of each codon. It assigns to each codon position two binary values, the first representing the purine-pyrimidine divide, the second representing whether the nucleotide forms two or three hydrogen bonds when in duplex conformation with an inverse-complementary strand. This defines two binary variables for each codon position, hence six binary variables for each codon.
Pearson correlation coefficients r of CDA with any of these six binary codon properties are 'zero', indicating that CDA is independent of each of these properties. Correlations with sums and subtractions between any pairs of these six binary values also yield r = 0. Results are identical if one pairs nucleotides according to keto versus amino nucleotides as previously reported [47,48]. This means that CDA catches a genetic code dimension that differs from classically recognised codon properties.
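As an illustration of this comparison, the sketch below computes Pearson correlations between CDA and the six binary codon properties. It is a minimal example under stated assumptions: `cda` re-implements the rules as stated in the text (not checked against Table 1), while `six_bits` and `pearson` are hypothetical helper names; the second bit is set for C and G (three hydrogen bonds).

```python
from itertools import product
from math import sqrt

PURINES = {"A", "G"}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def same_group(a, b):
    return (a in PURINES) == (b in PURINES)


def cda(codon):
    """CDA following the rules stated in the text (assumed reading)."""
    x, z, w = codon
    if x == w:
        return 0.0
    if z == w:
        return -0.5 if same_group(x, z) else -1.0
    if x == z:
        return 0.5 if same_group(w, z) else 1.0
    if same_group(z, w):
        return -0.5
    if same_group(z, x):
        return 0.5
    return -0.5 if COMPLEMENT[z] == x else 0.5


def six_bits(codon):
    """Two bits per position: purine (1) or pyrimidine (0); C/G (1) or A/T (0)."""
    bits = []
    for nucleotide in codon:
        bits += [int(nucleotide in PURINES), int(nucleotide in "CG")]
    return bits


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


codons = ["".join(c) for c in product("TCAG", repeat=3)]
scores = [cda(c) for c in codons]
bit_columns = list(zip(*(six_bits(c) for c in codons)))
correlations = [pearson(scores, column) for column in bit_columns]
print(correlations)
```

Under these rules the zero correlations follow from symmetry: exchanging every nucleotide with its complement (A↔T, C↔G) leaves CDA unchanged but inverts the purine/pyrimidine bit, and exchanging A↔G and C↔T leaves CDA unchanged but inverts the hydrogen-bond bit.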

Tetrahedral Representations and CDA
The genetic code can also be presented as a tetrahedron, with four equal triangular faces each subdivided into 16 equilateral, smaller triangles, representing the 64 codons. Castro-Chavez [49] reviews these representations, and proposes a tetrahedral representation, placing codons so that hydrophobic amino acids are central to each tetrahedral face, named faces A-D. Applying CDA to Castro-Chavez's tetrahedral representation, faces A and D tend to have CDA < 0, and faces C and B CDA > 0. Within each face, in total 19 triangle vertices (over all 4 faces) with CDA < 0 are common with vertices belonging to triangles with CDA > 0. This is very close to the 18 vertices expected if codons were randomly distributed in relation to CDA (P > 0.5, chi-square test), considering that 24 codons have CDA < 0, 16 have CDA = 0, and 24 have CDA > 0. Eleven among 24 vertices common between triangles from different faces of the tetrahedron are for triangles/codons with opposite CDA. This is slightly more than the 6.75 expected by random CDA distribution (P = 0.054, chi-square test). Hence the tetrahedral representation of Castro-Chavez [49] is random in relation to CDA within tetrahedral faces, and probably also between faces.
Fujimoto's tetrahedral codon stereo-table [50] is much more ordered in relation to CDA's distribution among and within tetrahedral faces (Fig. 1): Faces A-D each have six codons with CDA < 0, six codons with CDA > 0, and four codons with CDA = 0. Within each face, there are exactly two contacts between codons/triangles with opposite CDA. This total of eight contacts between triangles with opposite CDA is significantly less than the expected 18 contacts for randomly distributed CDA within faces of the tetrahedron (P = 0.018, chi-square test). There are no contacts between tetrahedron faces for codons/triangles with opposite CDA (P = 0.0096, chi-square test). Hence Fujimoto's tetrahedral representation is most compatible with the genetic code's symmetries implied by CDA in Table 1.
The specific examples used here illustrate randomness versus CDA, and close to perfect reorganisation of the genetic code in relation to CDA, respectively. Other representations might reorganise the genetic code more optimally in relation to CDA. However, these representations may not relate to interpretable phenomena in the real world.

Codon Directional Asymmetry and Codon Participation in Error Correcting Codes
Genetic codes include a subjacent punctuation code called the natural circular code that enables retrieving the ribosomal translation frame […-55]. Mechanisms for coding frame retrieval remain unknown, but are probably associated with circular code motifs conserved in tRNAs and ribosomal RNAs [56][57][58][59]. Codon symmetry is particularly informative in relation to frame retrieval, as codons of type XZX (CDA = 0) have maximal capacity for reading frame retrieval [55,60,61], and have highest occurrences within various types of error-correcting codes [62]. Absolute values of CDA are lower for codons belonging to the natural circular code than for the remaining codons (P = 0.016, two-tailed Mann-Whitney test). This principle is confirmed also when comparisons imply only codons belonging to the natural circular code: their absolute CDA increases with codon-specific reading frame retrieval (r = −0.615, P = 0.002; rs = 0.44, P = 0.026, one-tailed tests). Hence processes determining the near-universal natural circular code probably contributed biological functions to CDA.

Table 1. The genetic code's 64 codons and their codon directional asymmetry, CDA. Shaded nucleotides indicate the nucleotide at one of the codon's extremities that is the most different from nucleotides at other positions, along rules described in text, and which determines the dominant side of codon directional asymmetry: negative CDA when the first (5′) codon position has the most different nucleotide, and positive CDA when the third (3′) position has the most different nucleotide. Codons assigned to amino acids aminoacylated by class I tRNA synthetases are framed, remaining amino acids are aminoacylated by class II tRNA synthetases.

Codon Directional Asymmetry and tRNA Synthetase Classes
CDA in Table 1 reflects a genetic code symmetry that does not follow the purine-pyrimidine, keto-amino, nor the weak-strong base-pairing patterns. A little known symmetry within the genetic code relates to Rumer's transformation [63][64][65], which replaces systematically all adenine (A) with cytosine (C) and vice versa, and also all guanine (G) with thymine (T) and vice versa. It is one among 23 bijective transformations [60], also called systematic nucleotide exchanges [66,67] or 'swinger' transformations [68][69][70][71]. RNA and DNA sequenced by several different methods and published in GenBank by various groups match these transformations. Hence while a priori, transformations such as Rumer's seem theoretical processes, they reflect biological realities, such as actual nucleotide sequences that were presumably produced by replication or transcription that systematically inserts a specific nucleotide instead of another specific nucleotide. This phenomenon of systematic nucleotide exchanges has similarities with isolated nucleotide misinsertions [60,66,67].
Rumer's transformation also correlates with a notable biological property, tRNA synthetase classes [72] of amino acids assigned to codons. The tRNA synthetases are enzymes that load amino acids to their cognate tRNA. The twenty tRNA synthetases form two groups of equal size, tRNA synthetase classes I and II based on structural homology [73,74]. tRNA synthetases class I covalently link cognates to the 2′ hydroxyl group of the tRNA's last ribose, and class II to its 3′ hydroxyl group [75,76].
The symmetry in the genetic code that correlates with tRNA synthetase classes exchanges nucleotides at the first and third codon positions along rule A↔C + G↔T (Rumer's transformation), and A↔G + C↔T at the second codon position. If instead of applying the nucleotide exchange rule A↔C + G↔T to the third codon position, one applies the exchange rule A↔T + C↔ G, the symmetry between codons whose corresponding tRNA is aminoacylated by tRNA synthetase class I or class II is also recovered [77]. These symmetries by nucleotide exchanges are not mere theoretical considerations. Homologies of some DNA and RNA sequences in GenBank were detected after accounting for systematic nucleotide exchanges for the mitogenome [66][67][68][69][70][71][78][79][80]. In addition, the regular human mitogenome includes numerous repeats that can only be detected when assuming systematic exchanges [81], including palindromes [82].
CDA associates with tRNA synthetase classes. On average, codons assigned to amino acids aminoacylated by tRNA synthetases class I have CDA < 0 (15 among 21 codons (stops excluded), P = 0.039, two-tailed sign test). For tRNA synthetases class II, the situation is opposite: most codons have CDA > 0 (17 among 24 codons, excluding CDA = 0, P = 0.032, two-tailed sign test). Sign tests are inadequate to handle codons with CDA = 0, therefore codons with CDA = 0 are excluded from these calculations. Mean CDA for tRNA synthetase classes differ significantly (two-tailed P = 0.002 for each t-test and Mann-Whitney test). These comparisons between means include codons with CDA = 0.
CDAs are averaged for codons assigned to specific amino acids. Mean CDA < 0 for 8 among 10 amino acids for class I; and CDA > 0 for 8 among 10 amino acids for class II (P = 0.006, two-tailed sign test for each tRNA synthetase class). Exceptions are Cys and Leu for class I, and Ala and Thr for class II. Overall, the sign of mean CDA for codons assigned to an amino acid follows expected patterns (class I, CDA < 0; class II, CDA > 0) for 16 among 20 amino acids/tRNA synthetases (P = 0.00296, one-tailed sign test).
Note that stop codons have CDA < 0, predicting tRNA synthetase class I. However, the tRNA synthetase of pyrrolysine, which is inserted at some stop codons, belongs to tRNA synthetase class II [83]. Exceptions might reflect historical constraints on the genetic code's genesis [77].
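The per-amino-acid averaging can be sketched as follows. This is a minimal illustration under stated assumptions: `cda` re-implements the rules as stated in the text (not checked against Table 1), the codon table is the standard genetic code in NCBI's TCAG ordering, and the class memberships in `CLASS_I` and `CLASS_II` follow the usual textbook assignment (with Lys in class II), which the text does not list explicitly. All names are hypothetical.

```python
from itertools import product

PURINES = {"A", "G"}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def same_group(a, b):
    return (a in PURINES) == (b in PURINES)


def cda(codon):
    """CDA following the rules stated in the text (assumed reading)."""
    x, z, w = codon
    if x == w:
        return 0.0
    if z == w:
        return -0.5 if same_group(x, z) else -1.0
    if x == z:
        return 0.5 if same_group(w, z) else 1.0
    if same_group(z, w):
        return -0.5
    if same_group(z, x):
        return 0.5
    return -0.5 if COMPLEMENT[z] == x else 0.5


# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}

# Assumed class memberships (one-letter amino acid codes)
CLASS_I = set("RCQEILMWYV")
CLASS_II = set("ANDGHKFPST")


def mean_cda_by_amino_acid(code):
    """Average CDA over the codons assigned to each amino acid (stops skipped)."""
    scores = {}
    for codon, aa in code.items():
        if aa != "*":
            scores.setdefault(aa, []).append(cda(codon))
    return {aa: sum(v) / len(v) for aa, v in scores.items()}


means = mean_cda_by_amino_acid(STANDARD_CODE)
# Amino acids whose mean CDA sign departs from the class I < 0 / class II > 0 pattern
exceptions = sorted(
    aa for aa, m in means.items()
    if (aa in CLASS_I and m >= 0) or (aa in CLASS_II and m <= 0)
)
print(exceptions)
```

Under these assumptions the sketch returns the four amino acids named as exceptions in the text (Cys and Leu for class I; Ala and Thr for class II).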
Hence the rationale defining CDA reveals a symmetry that is close to that of the combination of nucleotide exchanges that reveal the genetic code's symmetry in relation to tRNA synthetase classes. However, the rationale behind CDA is simpler and perhaps more amenable to mechanistic reduction.

Alternative Scores for Codons with CDA = |0.5|
Three different types of codons get CDA = |0.5|, based on different rationales: (a) codons with structures ZXX/XXZ where both X and Z are purines/pyrimidines; (b) codons with structure XZW where Z belongs to the same nucleotide family (purine/pyrimidine) as either X or W; and (c) codons with structure XZW where Z belongs to a different nucleotide family than X and W. This scoring is somewhat arbitrary, and might not be optimal to reflect biological properties. Keeping signs, we rescore each of these three codon types with values |0.25| and |0.75|, resulting in different scoring systems for these three codon groups. This heuristic approach suggests that associations between tRNA synthetase classes (an ancient property of the translational apparatus) and CDA are robust in relation to CDA's semi-quantitative scoring.

Translation Kinetics
The tRNA synthetase classes differ in the position of aminoacylation of the amino acid on the tRNA's acceptor stem. This probably affects the spatial kinetics of peptide elongation. We suggest that CDA also affects the spatial kinetics of codon-anticodon interactions in the ribosome's translational core (site P [84]; site A [85]). Hence both tRNA synthetase class and CDA would affect cotranslational protein folding, meaning folding during the process of peptide extension by ribosomal translation [86][87][88][89][90][91][92][93][94][95][96][97]. Tentatively, we consider that associations between CDA and tRNA synthetase classes suggest synergistic effects on cotranslational protein folding by each CDA and tRNA synthetase class.
Note that cotranslational protein folding does not occur for all proteins [98]. Cotranslational protein folding frequently increases the yield of proper folds, but is not always an absolute requirement [99][100][101][102][103]; yet decreases misfolding probabilities [104][105][106]. Among others, at least in some cases, cotranslational folding requires complete protein structural subdomains [107,108]. Cotranslational protein folding following the sense of translation (from the N terminal) predicts more accurately protein structures than when proceeding in the opposite sense (from the C terminal) [109,110], indicating that cotranslational protein folding is a reality for most proteins. Nevertheless, cell free protein folding shows that cotranslational folding is not always required [111].
mRNA properties affecting translation speed and ribosomal pausing [112][113][114], also affect protein folding independently of that protein's amino acid sequence. Synonymous codons associate with different types of protein secondary structures [115,116], in particular for clusters of rare codons on mRNAs [117][118][119]. These associations might explain effects of synonymous single nucleotide polymorphisms on protein function [120][121][122][123] and are in line with selection at amino acid level that affects synonymous codon choice [124,125].
Following these rationales, CDA might reflect (a) indirectly tRNA synthetase classes and their effects on amino acid positioning during peptide elongation; and (b) directly the spatial kinetics of codon-anticodon interactions, such as tRNA-mRNA approach angles during codon-anticodon duplex formation in the ribosomal translational core(s). According to the cotranslational protein folding hypothesis, these two components should affect folding patterns of elongating peptides. Hence CDA is predicted to correlate with amino acid secondary structure conformational parameters for alpha helices, beta turns and/or betasheets (conformational indices are from [142][143][144][145]). The main candidates are the conformational parameters associated with transmembrane foldings (beta turns, and/or parallel and antiparallel betasheets, from references [146,147]).

Antiparallel Betasheet Formation and Codon Directional Asymmetry
The hypothesis that CDA associates with cotranslational protein folding predicts correlations between CDA and secondary structure conformation parameters. Betasheets are the major secondary structures found in transmembrane proteins, and antiparallel betasheets are more frequent than parallel betasheets [147]. Biases in tRNA synthetase amino acid contents correlate with the amino acid's antiparallel betasheet conformation parameter [148]. Hence, we predict correlations between CDA and conformation parameters, and in particular antiparallel betasheet conformation parameters.
Indeed, antiparallel betasheet conformation parameters correlate negatively with mean CDA of codons assigned to the amino acid according to the standard genetic code (Pearson correlation coefficient r = −0.642, two-tailed P = 0.0023; non-parametric Spearman rank correlation coefficient rs = −0.564, two-tailed P = 0.01; Fig. 2). In contrast, and functioning as a negative control, the correlation between mean CDA and parallel betasheet conformation parameters is not statistically significant (r = −0.28, two-tailed P = 0.23, not shown). The presumed effect of CDA is specific for formation of antiparallel, not parallel, betasheets.
The variation around the regression line is similar for negative and positive CDA ranges (Fig. 2). Hence the determinism of CDA on conformation is comparable for 5′ versus 3′ CDA dominance: effects are independent of coding importance of codon positions. In other words, the 'information' in CDA that is relevant to protein secondary structure is similar for asymmetry at first and third codon positions. Alternative scores (Section 4.1) do not change qualitatively the results (P values for rs remain above 0.05).
The correlation between mean CDA of codons assigned to amino acids and these amino acids' antiparallel betasheet conformational indices might be due to transitivity, due to associations between CDA and tRNA synthetase classes (see above section) and the association between tRNA synthetase class and conformational indices. In order to control for effects of tRNA synthetase classes, we calculate mean CDA and mean antiparallel betasheet index separately for each tRNA synthetase class. These means are subtracted from CDA and conformational indices of each amino acid in that respective class. These values are residual CDA and conformational indices after excluding effects of tRNA synthetase classes. Residual CDA and residual antiparallel betasheet indices correlate negatively (r = −0.435, P = 0.0275; rs = −0.461, P = 0.0205, one tailed tests). Hence the correlation between CDA and antiparallel betasheet indices is not indirect, through colinearity with tRNA synthetase classes.
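The control for tRNA synthetase class described above amounts to centering each variable on its class mean before correlating. The sketch below illustrates the procedure only; the input numbers are toy values chosen for illustration, not CDA or conformational indices from this study, and the function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def center_within_classes(values, classes):
    """Subtract from each value the mean of its own class (residuals)."""
    totals, counts = {}, {}
    for value, cls in zip(values, classes):
        totals[cls] = totals.get(cls, 0.0) + value
        counts[cls] = counts.get(cls, 0) + 1
    return [v - totals[c] / counts[c] for v, c in zip(values, classes)]


# Toy inputs (illustration only): one entry per amino acid
classes = [1, 1, 1, 2, 2, 2]                       # tRNA synthetase class
mean_cda = [-0.50, -0.25, -0.75, 0.25, 0.50, 0.75]
beta_index = [1.6, 1.2, 1.5, 0.9, 0.8, 0.6]

residual_cda = center_within_classes(mean_cda, classes)
residual_beta = center_within_classes(beta_index, classes)
r_residual = pearson(residual_cda, residual_beta)
print(round(r_residual, 3))
```

Because residuals sum to zero within each class, any correlation remaining between them cannot be produced by a difference between class means.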
The association between CDA and antiparallel betasheet indices has rs with P < 0.05 for eight among ten alternative scores (as in Section 4.1) after controlling for tRNA synthetase class. The genetic code seems structured so as to enable synergistic effects of CDA and tRNA synthetase classes on antiparallel betasheet formation, presumably by cotranslational protein folding.
Independently of the correlation between CDA and antiparallel betasheet conformation parameters, a weaker correlation exists between CDA and alpha-helix conformation parameters (r = −0.556, P = 0.011; rs = −0.499, P = 0.05, two-tailed test, not shown). This further correlation confirms that CDA affects protein folding. To our knowledge, these are the first described correlations between a codon property and secondary structure conformational parameters of assigned amino acids. CDA < 0 associates independently with each of the alpha and antiparallel beta conformational indices, in line with the literature on cotranslational protein folding [136][137][138][139][140]. Hence according to the working hypothesis, similar kinetic conditions favor each of these two very different secondary structures. Presumably, factors other than CDA (for example chain polarity) determine whether an alpha helix rather than an antiparallel betasheet is initiated during peptide elongation.

Codon Directional Asymmetry and Prediction of Protein Secondary Structure
The correlation between CDA and conformation parameters might have two causes. First, it could be intrinsic to the genesis of the genetic code, but relatively inconsequential to modern organisms. Secondly, CDA might still affect protein folding. In the latter case, correlations between CDA and secondary structure conformation parameters could explain why some synonymous mutations perturb protein function. Indeed, several amino acids have some synonymous codons with opposite CDA, such as alanine, leucine, serine and valine. Putatively, this would indicate that for these amino acids, synonymous codons with CDA < 0 occur preferentially in mRNA regions coding for antiparallel betasheets, and those with CDA > 0 in other mRNA regions.
Codon usage frequencies are adapted to minimise effects of mutations and translation errors [149][150][151][152]. Hence weighing mean CDA for a given amino acid according to observed synonymous codon usages might increase correlations between CDA and conformation parameters. However, this is not the case for the pool of genes encoded by the human nucleus, nor those coded by the human mitogenome: correlations become in both cases weaker (not shown).
This does not mean that associations between synonymous codons in modern mRNAs and secondary structures of modern proteins do not exist. However, this suggests that testing these predictions is not as straightforward as it seems. Among others, secondary structure annotations available in GenBank don't indicate whether a betasheet is parallel or antiparallel. Hence these tests will require involvement of more adequately equipped specialised proteomics teams (for example Caudron and Jestin [147]). Until then, the contribution of CDA for improving secondary structure predictions [26,167], especially such based on optimization of multiple approaches [168], will remain speculative.

Mitochondrial Genetic Codes Optimise Codon Directional Asymmetry
Many variant genetic codes are from mitochondria [169]. The reduced mitogenomes almost exclusively encode for mitochondrial transmembrane proteins, which include mainly antiparallel betasheets. In contrast, nuclear genomes encode also for large proportions of cytosolic proteins, which include much fewer betasheets. Hence, we predict that the correlation between CDA and antiparallel betasheet conformation parameters is weaker for genetic codes associated with nucleus-encoded proteomes than for mitochondrial genetic codes. The correlation in Fig. 2 (for the standard genetic code) is calculated for the remaining genetic codes listed by Elzanowski and Ostell [169], after recalculating mean amino acid CDA, considering codon-amino acid reassignments. The correlation's strength for each genetic code is estimated by the Pearson correlation coefficient r.
The correlation between tRNA synthetase classes and CDA is also calculated, by assigning to tRNA synthetase classes I and II values '1' and '2', respectively, and calculating the Pearson correlation coefficients r between this dummy variable representing tRNA synthetase classes and the mean CDA of codons assigned to the corresponding amino acid, for each variant genetic code. The CDA-antiparallel betasheet correlation coefficients are plotted as a function of the CDA-tRNA synthetase class correlation coefficients for the various genetic codes (Fig. 3). The line in Fig. 3 indicates y = x, meaning that both correlations have equal strengths. Note that in context of this particular section, Pearson correlation coefficients are used as quantitative estimates of the strength of a correlation, not as test statistics to infer that a correlation exists.
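The dummy-variable correlation for a variant genetic code can be sketched as follows. This is a minimal illustration under stated assumptions: `cda` re-implements the rules as stated in the text (not checked against Table 1), class memberships follow the usual textbook assignment, and the vertebrate mitochondrial code is chosen as an example variant, built from the standard code with the reassignments AGA/AGG to stop, ATA to Met and TGA to Trp. All names are hypothetical, and the values printed are those of this sketch, not the coefficients plotted in Fig. 3.

```python
from itertools import product
from math import sqrt

PURINES = {"A", "G"}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def same_group(a, b):
    return (a in PURINES) == (b in PURINES)


def cda(codon):
    """CDA following the rules stated in the text (assumed reading)."""
    x, z, w = codon
    if x == w:
        return 0.0
    if z == w:
        return -0.5 if same_group(x, z) else -1.0
    if x == z:
        return 0.5 if same_group(w, z) else 1.0
    if same_group(z, w):
        return -0.5
    if same_group(z, x):
        return 0.5
    return -0.5 if COMPLEMENT[z] == x else 0.5


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {
    "".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}
# Example variant: vertebrate mitochondrial code (assumed reassignments)
VERTEBRATE_MITO = {**STANDARD, "AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"}

# Dummy variable: 1 for class I, 2 for class II (assumed memberships)
CLASS_DUMMY = {aa: 1 for aa in "RCQEILMWYV"}
CLASS_DUMMY.update({aa: 2 for aa in "ANDGHKFPST"})


def class_cda_correlation(code):
    """Pearson r between the class dummy and mean CDA per amino acid."""
    per_aa = {}
    for codon, aa in code.items():
        if aa != "*":
            per_aa.setdefault(aa, []).append(cda(codon))
    amino_acids = sorted(per_aa)
    mean_cda = [sum(per_aa[a]) / len(per_aa[a]) for a in amino_acids]
    dummy = [CLASS_DUMMY[a] for a in amino_acids]
    return pearson(dummy, mean_cda)


r_standard = class_cda_correlation(STANDARD)
r_mito = class_cda_correlation(VERTEBRATE_MITO)
print(round(r_standard, 3), round(r_mito, 3))
```

The same function applied to the antiparallel betasheet conformation parameter in place of the dummy variable would give the second coordinate of each point in a plot such as Fig. 3; those parameters are not reproduced here.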
All eleven mitochondrial genetic codes have stronger correlations between CDA and antiparallel betasheet conformation parameters than between CDA and tRNA synthetase classes. Obtaining this result for all eleven mitochondrial genetic codes has P = 0.00049 (two-tailed sign test). Two additional genetic codes occur in nuclear and mitochondrial genomes: the standard genetic code, and the Mycoplasma/ Spiroplasma genetic code that also occurs in mold, protozoan and coelenterate mitochondria. These two genetic codes follow the pattern observed for the eleven genetic codes only found in mitochondria.
Six among eight genetic codes associated only with nuclear genomes are below the line y = x in Fig. 3, indicating that the CDA-tRNA synthetase class correlation is frequently a greater constraint for nuclear genetic codes than mitochondrial ones. This qualitative difference between nuclear and mitochondrial genetic codes has P = 0.001 (two-tailed Fisher exact test). This divide might reflect different constraints on protein folding for populations of mitochondrion-encoded versus nucleus-encoded proteins. This pattern might indicate stronger synergy between effects of CDA and tRNA synthetase class on cotranslational folding for nucleus-encoded proteins translated in the cytosol than mitogenome-encoded ones.
These results indicate that associations between CDA and conformation parameters, and between CDA and tRNA synthetase classes, differentially drive the evolution of mitochondrial versus nuclear genetic codes. Tentatively, amino acid positioning on the tRNA acceptor stem is less relevant for mitochondrial translation than CDA, whereas the opposite is true for cytosolic translation.

Lepidosaurian Body Temperature and Codon Directional Asymmetry
Temperature reflects noise in molecular movements, potentially affecting cotranslational protein folding, which indeed depends on optimal temperatures [186]. Hence, formation of antiparallel betasheets might be impeded by high temperatures. Therefore, we expect that negative CDAs promote betasheet formation despite high temperature. Hence when comparing the mean CDA calculated across all 13 membrane-embedded mitogenome-encoded proteins of different organisms, we expect that organisms with high temperatures have low mean CDA for the same homologous genes. Indeed, the mean CDA of lepidosaurian mitochondrion-encoded proteins decreases with their body temperature (ro = −0.283, one-tailed P = 0.018, Fig. 4, temperature data compiled for species with complete mitogenome available in GenBank by Seligmann and Labra [183], therein Table 1).
This correlation is also statistically significant within the family Lacertidae (ro = −0.842, one-tailed P = 0.001). It is negative for Agamidae (ro = −0.255, one-tailed P = 0.238), Gekkota (ro = −0.25, one-tailed P = 0.258), iguanid lizards (ro = −0.333, one-tailed P = 0.21), Varanidae (ro = −1.00, one-tailed P = 0.005) and Chamaeleo (ro = −0.40, one-tailed P = 0.30), and for the pool of remaining isolated species from various families (Heloderma, Shinisaurus, Lepidophyma, Sphenodon; ro = −0.238, one-tailed P = 0.285). The correlation is positive for Amphisbaenia (ro = 0.40, one-tailed P = 0.30). Hence seven among eight phylogenetically independent samples yield negative correlations, which is a significant majority according to a sign test (P = 0.0176).

Fig. 3. Correlation between antiparallel betasheet conformation parameter of amino acids and mean directional asymmetry (CDA) of codons assigned to that amino acid as a function of the correlation between CDA and the tRNA synthetase class for the corresponding amino acid for different genetic codes. Correlations are Pearson correlation coefficients. Filled/open circles are nuclear/mitochondrial genetic codes, shaded circles are for genetic codes existing in nuclei and mitochondria. The line indicates y = x. Nuclear genetic codes tend to optimise the association between CDA and tRNA synthetase classes, mitochondrial genetic codes tend to optimise the association between CDA and the antiparallel betasheet conformation parameter. Most mitogenome-encoded proteins are transmembrane proteins, hence antiparallel betasheets are particularly frequent in these proteins. Hence genetic code evolution optimises the CDA-antiparallel betasheet association in mitochondria.
Considering the qualitative direction of correlations for phylogenetically independent species samples follows the principle of phylogenetically independent contrasts [187]. This confirms that positive results are not confounded by phylogenetic inertia among species. Results of this sign test are valid independently of P value adjustments for multiple tests.
GC content could confound this correlation, because G:C base pairs are linked by three hydrogen bonds, while A:T and A:U base pairs are linked by only two. Hence GC content usually increases with temperature, as it confers higher stability to structures formed by nucleotide chains [188,189,190]. However, GC codon content does not correlate with body temperature for mitochondria of the above-mentioned lepidosaurian species (r = −0.0425, one-tailed P = 0.379). This is in line with results from various analyses [191][192][193] that did not detect the expected GC-temperature correlation. This negative control stresses that the association in Fig. 4 is not trivial.

Developmental Stability and CDA
Molecular noise (in terms of erratic molecular movements) affecting mitochondrial transmembrane protein folding might cause developmental inaccuracies at the whole organism level. Hence, we explore the correlation between mean CDA of mitogenome-encoded proteins and developmental stability of the 4th toe of Lepidosauria, estimated by the Pearson correlation coefficient r between subdigital lamellae counts on left and right sides (data from [194][195][196][197][198]). Developmental stability/accuracy decreases with mean CDA of mitogenome-encoded proteins (ro = − 0.316, one-tailed P = 0.0235), as expected by the working hypothesis. However, analysing species separately by phylogenetic group (as in the previous section) yields negative correlations in only five among eight groups, which is not statistically significant at P < 0.05 according to a one-sided sign test. Hence this preliminary result on CDA and developmental stability is at best tentative.

Lifespan and CDA
Patterns between CDA and temperature, and CDA and developmental stability (Figs. 4 and 5) suggest that CDA < 0 for mitogenome-encoded proteins associates with longevity. To test this, we compared codon contents in mitogenomes of 112 semi-supercentenarians and 96 centenarians versus those of 97 healthy young controls [199,200] (Table 3). Codons with CDA = − 1 are more frequent in supercentenarians than in controls for seven among eight comparisons, which is a significant majority according to a one-tailed sign test (P = 0.0176). No tendencies are observed for other CDA values (− 0.5, 0, 0.5, 1), nor for comparisons between centenarians and controls. The result suggests that CDA < 0 could contribute to extreme longevity, but the high number of tests and the small differences in codon frequencies call for cautious interpretation.

[Caption fragment: Table 1 of Seligmann and Labra [183].]

Overall, analyses weakly confirm predictions for correlations between CDA and whole organism properties (body temperature, developmental stability, longevity). These suggest that analyses considering additional information, such as residue-specific location in three-dimensional protein structures, might yield clearer positive results. More up-to-date methods for including phylogenetic information in relation to evolutionary adaptive optima might also alter conclusions [201].
Mean CDA of the 13 human mitogenome-encoded proteins does not correlate with the time spent single-stranded by that gene during replication, assuming light strand replication initiates at the OL, the light strand replication origin (ro = 0.033, P = 0.92, two-tailed test). DNA templating for tRNA genes presumably also sometimes functions as replication origin [184,218,219]. Integrating the possibility of these multiple replication origins yields gene-wise single-strand durations that converge with transcriptional single-strandedness [220]. The correlation between transcriptional duration of single-strandedness and mean gene CDA is also not statistically significant (ro = 0.418, P = 0.156, two-tailed test). Hence, we do not detect statistically significant effects of mutation pressures on mean CDA of human mitochondrial genes.

Adjusting Statistical Significances for Multiple Tests
Analyses that include several tests have to adjust P values according to the number of tests. This is because, when a result is declared positive at P < 0.05 and k tests are performed, on average k × 0.05 tests are false positives. Under Bonferroni's correction, a specific test among k tests is statistically significant at the 0.05 level only if P < 0.05/k. This correction is reputedly overconservative [221,222]. Unadjusted Ps minimise risks of false negative results, Bonferroni's method minimises risks of false positives. The Benjamini-Hochberg adjustment for false discovery rates [223] optimises between these two risks and seems most adequate [224]. This method ranks all k P values from highest to lowest (best); adjusted Ps are the product of P and k, divided by the rank i, where i ranges from 1 to k. This means that the 'best' (lowest) P is unchanged, and that the 'worst' (highest) P value after adjustment follows Bonferroni's adjustment. Ps with intermediate rank are intermediate between these extremes.
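The two adjustments can be sketched as follows. This is a minimal illustration of Bonferroni's correction and of the Benjamini-Hochberg step-up adjustment in its standard formulation, not a reproduction of the calculations behind Table 2; the function names are hypothetical. Note that the standard formulation, as implemented in common statistical software, ranks P values from lowest (rank 1) to highest (rank k), so that the lowest P is multiplied by k and the highest P is left unchanged. The ranking described in the text above runs in the opposite direction, so this sketch need not return the adjusted values reported here.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: each P is multiplied by the number of tests k (capped at 1)."""
    k = len(pvalues)
    return [min(1.0, p * k) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Standard Benjamini-Hochberg step-up adjustment.

    P values are ranked in ascending order (rank 1 = lowest P). Each adjusted
    value is P * k / rank, then made monotone by taking the running minimum
    from the highest rank downwards.
    """
    k = len(pvalues)
    order = sorted(range(k), key=lambda idx: pvalues[idx])  # indices by ascending P
    adjusted = [0.0] * k
    running_min = 1.0
    for rank in range(k, 0, -1):  # walk from the highest P to the lowest
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * k / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical P values, for illustration only (not taken from Table 2).
example_ps = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(example_ps))
print(benjamini_hochberg(example_ps))
```

The running minimum ensures that adjusted values never decrease as the raw P values increase, which is why several neighbouring ranks can share the same adjusted P.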
Here we consider only P values from non-parametric tests when parametric tests were also done. For some of the associations described, more than one test was done, but these are then summarised by a test that integrates the previous tests. In these cases, adjustments consider only the latter P value. With this approach, a total of 29 hypothesis tests were done, as detailed in Table 2. Control analyses (such as with GC contents and mutational gradients, in total 29 tests) are also included in the list of multiple tests. These are not related to the main CDA hypothesis and could arguably be excluded. Excluding controls does not qualitatively alter the results of the adjustments of P values. The analysis for codon usage associated with lifespan includes 10 tests (for CDA values −1, −0.5, 0, 0.5, and 1, each for comparisons between controls and centenarians, and between controls and supercentenarians). Among unadjusted P values with P < 0.05, only the adjusted P value for the correlation between mean CDA of codons assigned to amino acids and the amino acids' alpha helix conformational indices is above 0.05. This occurs when considering all 58 tests, and when considering only the 29 tests directly pertaining to the working hypothesis about CDA. Qualitatively, results of P adjustments are robust in relation to numbers of tests included in this analysis: for example, for P with rank 17 to get P > 0.05 after adjustment, one requires k = 89 when including negative controls and k = 33 when excluding negative controls. Hence even if one were to increase the number of tests included in the analyses, the relevant cutoff property of the distribution of adjusted Ps is relatively robust, so that issues related to multiple tests are unlikely to alter conclusions.

A New Directional Codon Dimension
Intuitively, it seems conceivable that CDA, via its plausible effects on codon-anticodon interactions, affects cotranslational protein folding. However, developing a mechanistic scenario that explains why this effect should occur for antiparallel betasheets rather than parallel ones, or for alpha helices, is more difficult. We propose that some (unspecified) conformations depend on translational speed. Other conformations might be favoured by random movements of the tRNA's loaded acceptor stem in relation to the elongating peptide, versus more directed movements of that stem, hence by some ratio between kinetic noise and direction. Our educated guess (but nothing beyond that) is that CDA relates more to the latter type of mechanism. We also lack clues on why CDA < 0 promotes antiparallel betasheets, and CDA > 0 prevents them. Alpha helices might be simpler structures that require less order than antiparallel betasheets. A similar rationale might apply to parallel and antiparallel betasheets. In addition, the ratio between parallel and antiparallel betasheets is about 1:7 [26]: the genetic code might be optimised towards 'coding' for the most frequent protein conformation.
The genetic code can be characterised as a hypercomplex mathematical multidimensional symmetry structure [225]. In other terms, the genetic code is reminiscent of spontaneously self-organizing structures such as crystals [226,227]. Crystals result from specific rules organizing relations between atoms. Similarly, but at a much higher level of molecular complexity, the genetic code organises relations between nucleic and amino acid sequences. The genetic code might be thought of as an imaginary polyhedron with 64 triangular faces (64 codons with three nucleotide positions). The geometrical form of this structure remains unknown, but several symmetries implied by RNA/DNA structure and chemistry are known, such as reverse-complementarity (implied by the double helix structure), and the purine-pyrimidine as well as the alpha-keto groupings of nucleic acids. Formulating a generalised description of this complex structure is a difficult task. It is simplified by projecting the complex structure on specific scales/planes of probable biological interest.
Here learned intuition detects a new symmetry property, based on codon content directionality. Analyses here can be seen as projecting that complex genetic code structure on the CDA scale, enabling detection of some new properties of the genetic code. The details of the scale of CDA scores as presented here are probably inaccurate and will hopefully be amended. CDA implies that a directional dimension that had not been apprehended links codons and amino acids: biologically meaningful information relating to protein structure is embedded in the comparison between codons and their reversed (not reverse-complemented) sequence. This palindrome-minded approach to codons probably reflects error-correcting properties of primitive genetic code(s) [228].

Table 3. Mean codon frequencies (per mille) in the 13 mitogenome-encoded genes of three groups of Japanese males: 97 healthy controls, 112 semi-supercentenarians and 96 centenarians from references [199,200].

Conclusions
A property of codons, codon directional asymmetry (CDA), is defined for the genetic code. Codons are classified into symmetric (CDA = 0), 5′-asymmetric (negative CDA) and 3′-asymmetric (positive CDA). CDA maps non-randomly on Fujimoto's tetrahedral representation of the genetic code. Symmetric codons are the most common codons in frame-error-correcting codes, such as comma-free and circular codes. Most codons assigned to amino acids aminoacylated to cognate tRNAs by class I tRNA synthetases have CDA < 0, while those assigned to cognates of class II tRNA synthetases usually have CDA > 0.
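The classification can be sketched in code, following the definition given in the abstract: codons with identical first and last nucleotides (XZX) score 0, ZXX codons score −1 and XXZ codons score +1, with scores halved (−0.5/+0.5) when Z and X are both purines or both pyrimidines. This is a minimal sketch under these stated rules only; the function name `cda` is hypothetical, and codons with three distinct nucleotides are returned as `None` because the rules summarised here do not cover them.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}


def same_chemical_class(a: str, b: str) -> bool:
    """True if both nucleotides are purines or both are pyrimidines."""
    return {a, b} <= PURINES or {a, b} <= PYRIMIDINES


def cda(codon: str):
    """Codon directional asymmetry under the rules stated in the abstract.

    Returns 0.0 for symmetric codons (first == last nucleotide),
    -1.0 or -0.5 for 5'-asymmetric codons (structure ZXX),
    +1.0 or +0.5 for 3'-asymmetric codons (structure XXZ),
    and None for codons with three distinct nucleotides (not covered here).
    """
    codon = codon.upper().replace("T", "U")  # accept DNA or RNA spelling
    if len(codon) != 3 or any(n not in PURINES | PYRIMIDINES for n in codon):
        raise ValueError(f"not a valid codon: {codon!r}")
    first, second, third = codon
    if first == third:  # XZX (including XXX): symmetric
        return 0.0
    if second == third:  # ZXX: the odd nucleotide is at the 5' end
        return -0.5 if same_chemical_class(first, second) else -1.0
    if first == second:  # XXZ: the odd nucleotide is at the 3' end
        return 0.5 if same_chemical_class(second, third) else 1.0
    return None  # three distinct nucleotides: outside the rules summarised here


for example in ("GCG", "CAA", "GAA", "AAC", "AAG", "ACG"):
    print(example, cda(example))
```

The sign convention (negative for 5′ asymmetry) is arbitrary, as noted in the abstract; reversing it would flip the sign of every correlation without changing its strength.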
Amino acid tendencies to participate in antiparallel betasheets decrease with CDA. Results suggest that CDA and tRNA synthetase class affect spatial kinetics of peptide elongation. These spatial kinetics affect local peptide elongation rates, which determine cotranslational peptide folding during peptide synthesis. Hence CDA, a property of gene sequences, bears useful information to predict protein folding. Some synonymous codons have CDA with opposite signs, potentially explaining how some synonymous mutations alter protein function.
CDA probably played a role in the evolution of genetic codes. Mitochondrial genetic codes optimise associations between CDA and antiparallel betasheet formation, nuclear genetic codes tend to optimise associations between CDA and tRNA synthetase class. This difference might mean that synergistic effects of CDA and tRNA synthetase class on cotranslational protein folding are stronger for nuclear than mitochondrial genetic codes. CDA affects codon-amino acid (re)assignments, hence plays an important role in genetic code evolution.
Preliminary analyses suggest that lower average CDA of mitochondrion-encoded proteins associates with higher body temperature, greater developmental stability and longer lifespan, but further controlled analyses are required to confirm these potential whole organism effects of codon directional asymmetry (CDA).

Conflicts of Interests
None.