Chimeric Translation for Mitochondrial Peptides: Regular and Expanded Codons

Frameshifting protein translation occasionally results from insertion of amino acids at isolated mono- or dinucleotide-expanded codons by tRNAs with expanded anticodons. Previous analyses of two different types of human mitochondrial MS proteomic data (Thermo Fisher and Waters technologies) detected peptides corresponding entirely to translation of expanded codons. Here, these proteomic data are reanalyzed in search of peptides consisting of at least eight consecutive amino acids translated according to regular tricodons and at least eight adjacent consecutive amino acids translated according to expanded codons. Both datasets include chimerically translated peptides (42 with mononucleotide and 37 with dinucleotide expansions). The regular tricodon-encoded part of some chimeric peptides corresponds to standard human mitochondrial proteins: six with mononucleotide expansions (AT6, CytB, ND1, 2 × ND2, ND5) and one with a dinucleotide expansion (ND1). Chimeric translation probably increases the diversity of mitogenome-encoded proteins, putatively producing functional proteins. These peptides might result from translation by tRNAs with expanded anticodons, or from regular tricodon translation of RNAs in which transcription or posttranscriptional editing systematically deleted a mono- or dinucleotide after each trinucleotide. The pairwise matched combination of adjacent peptide parts translated from regular and expanded codons strengthens the hypothesis that stretches of consecutive expanded codons are translated. Results indicate that statistical translation produces distributions of alternative proteins. Genetic engineering should account for potential unexpected, unwanted secondary products.


Expanded Codons and Anticodons
Ribosome-free translation seems impossible because codon-anticodon duplexes are too weak to enable peptide elongation. A simple potential solution for ancestral ribosome-free translation by codon-anticodon interactions assumes that modern codons are reduced forms of longer ancestral codons. Ancestral ribosome-free translation presumably resulted from interactions between codons and anticodons consisting of more than three nucleotides [1,32-34]. Such longer duplexes are stable long enough to enable ribosome-free peptide elongation (the ribosome also channels the spatial dynamics of aminoacylated tRNA-peptide interactions [35]). In addition, a code based on a specific subset of tetracodons (codons expanded by a fourth, silent nucleotide) would have self-correcting symmetry properties and could be an adequate ancestor of some modern (mitochondrial) genetic codes (the tessera hypothesis [36,37]).

Peptides Coded by Expanded Codons
In addition to tetracoding sequences predicted by alignment methods, peptides corresponding in their entirety to translation of tetra- and pentacodons have been detected in MS data from the human mitochondrial peptidome, produced by the medium-accuracy Thermo Fisher (Illkirch, France) and the high-accuracy Waters (Milford, MA, USA) technologies [62-67]. Detection of completely tetracoded peptides differs from occurrences of isolated tetracodons, notably in mitochondrial genomes [68,69]; isolated tetracodons might be decoded by specific expanded anticodons, or as regular codons after deletion of the extra nucleotide(s) by post-transcriptional editing [70].

Alternative Transcriptions
Post-transcriptional editing that systematically deletes the mono- or dinucleotide following each transcribed nucleotide triplet could produce noncanonical transcripts whose regular translation matches the noncanonical translation of regular transcripts according to tetra- or pentacodons. Mitochondrial transcripts detected in several independent datasets produced by independent sequencing technologies (Sanger and Illumina) match sequences predicted by systematic deletions after each transcribed nucleotide triplet. Because these noncanonical transcripts (delRNAs) are long, corresponding to numerous tricodons separated by deleted mono- or dinucleotides, they seem more likely to be produced by noncanonical transcription systematically deleting mono- or dinucleotides than by posttranscriptional editing, though the latter cannot be excluded [97]. These delRNAs contain more homopolymers than expected [98], and homopolymers frequently induce frameshifting [99,100].
Note that the human mitogenome, assuming systematic deletion of mono- or dinucleotides after each transcribed nucleotide triplet, includes more palindromes than random sequences with the same length and nucleotide content [100,101].
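The deletion rule can be made concrete with a minimal sketch, assuming that the deleted mono- or dinucleotide immediately follows each retained triplet. The function name `del_transform` is hypothetical and not part of the authors' pipeline; it only illustrates why a delRNA read by regular tricodons yields the same amino acids as tetra- or pentacodon reading of the undeleted transcript.

```python
def del_transform(seq, n_deleted):
    """Keep each nucleotide triplet and drop the n_deleted nucleotides that follow it.

    Hypothetical helper. n_deleted = 1 gives the delRNA whose tricodon reading
    matches tetracodon reading of the original; n_deleted = 2 matches pentacodons.
    For a trailing incomplete block, its first (up to) three nucleotides are kept.
    """
    if n_deleted not in (1, 2):
        raise ValueError("n_deleted must be 1 (tetracodons) or 2 (pentacodons)")
    block = 3 + n_deleted
    # Take the first three nucleotides of every (3 + n_deleted)-nucleotide block.
    return "".join(seq[i:i + 3] for i in range(0, len(seq), block))


print(del_transform("ATGACTGTAAAG", 1))     # ATGCTGAAA
print(del_transform("ATGACCTGTTAAAGG", 2))  # ATGCTGAAA
```

Both inputs collapse to the same three tricodons (ATG, CTG, AAA), which is the equivalence between systematic deletion and expanded-codon reading discussed above.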
A different type of noncanonical transcription consists of systematic exchanges between nucleotides, producing swinger RNAs that do not resemble their template DNA unless the transformation rule that produced them is taken into account. Nine systematic nucleotide exchanges are symmetric (X↔Y, for example G↔T [87,88,90]), and fourteen are asymmetric (X→Y→Z→X, for example C→T→G→C [89]). Empirical genomic coverages by swinger RNAs are replicable across independent datasets [98] and sequencing techniques (Sanger and Illumina, human mitogenomes [63]; 454 and SOLiD, Mimivirus [102]). The human mitogenome, assuming systematic nucleotide exchanges, includes more palindromes forming stem-loop hairpins than random sequences with the same length and nucleotide content [103]. Systematic nucleotide exchanges conserve the error-correcting properties of the genetic code and of its embedded circular code, which regulates the ribosomal translation frame [104-107].
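A swinger rule can be sketched as a rewriting of the four nucleotides. The sketch below assumes that the 23 swinger transformations are the non-identity bijections of A, C, G and T, an assumption consistent with the 9 + 14 counts above: symmetric rules are those that undo themselves when applied twice. In this enumeration the 14 asymmetric rules include four-nucleotide cycles (X→Y→Z→W→X) as well as the three-nucleotide cycles shown in the text. The names `swinger` and `is_symmetric` are hypothetical.

```python
from itertools import permutations

NUCLEOTIDES = "ACGT"


def swinger(seq, image):
    """Rewrite seq by mapping A, C, G, T to the four letters of `image`, in that order."""
    return seq.translate(str.maketrans(NUCLEOTIDES, image))


def is_symmetric(image):
    """A rule is symmetric if applying it twice restores every nucleotide."""
    mapping = dict(zip(NUCLEOTIDES, image))
    return all(mapping[mapping[n]] == n for n in NUCLEOTIDES)


# Assumption: swinger rules = every permutation of ACGT except the identity.
rules = ["".join(p) for p in permutations(NUCLEOTIDES) if "".join(p) != NUCLEOTIDES]
symmetric = [r for r in rules if is_symmetric(r)]
asymmetric = [r for r in rules if not is_symmetric(r)]
print(len(rules), len(symmetric), len(asymmetric))  # 23 9 14

# G<->T (symmetric): A->A, C->C, G->T, T->G
print(swinger("GATTACA", "ACTG"))  # TAGGACA
# C->T->G->C (asymmetric): A->A, C->T, G->C, T->G
print(swinger("CTGA", "ATCG"))  # TGCA
```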

Chimeric RNAs and Peptides
Note that swinger DNA has also been reported [108,109], in one case with abrupt switches between the regular and the swinger-transformed parts of the mitogenome [110]. Chimeric RNAs, corresponding partly to regular transcription of the human mitogenome and partly to swinger transcription of adjacent parts of the mitogenome, also occur, including abrupt switches between these two parts [111]. These result either from regular transcription of genomic swinger DNA or from swinger transcription of part of the template mitogenome. Chimeric peptides corresponding to translation of adjacent parts of regular and swinger-transformed RNA exist in mitochondrial proteomic data, including peptides whose regular part corresponds to mitochondrion-encoded proteins, for example CytB [112].
These chimeric molecules are strong evidence for swinger phenomena. First, if they reflected unknown artifacts, these artifacts would not have produced the regular parts of the RNA and peptide sequences. Chimeric RNAs and peptides thus show that unknown phenomena producing variants of known RNAs and proteins exist. Second, the regular parts of chimeric RNAs and peptides are natural, matched positive controls for the adjacent noncanonical parts. This strengthens confidence in the biological reality of these noncanonical phenomena.
In the context of long stretches of translation according to expanded codons, chimeric peptides corresponding in part to regular translation and in adjacent parts to translation according to expanded codons (Fig. 1) would also constitute strong evidence for translation of stretches of expanded codons, and would indicate that variants of known proteins including parts encoded by expanded codons exist. Hence, we present here analyses of two mitochondrial MS datasets (one produced by Thermo Fisher and one by Waters technologies) that search for chimeric peptides resulting from translation of adjacent stretches of regular and expanded codons.
These analyses presume that translation produces a distribution of alternative protein products, some of which might present functional advantages, for example by having functional optima at conditions (for example temperature) that differ from those of known canonical proteins. The approach implies caution: genetic engineering should account for potential unwanted byproducts resulting from little known and/or unknown alternative transcriptions and translations. Unlike engineered genes, natural genes have adapted to avoid or minimize disruptive effects of proteins resulting from rarer noncanonical transcriptions and/or translations.

Materials and Methods
The data and analytical methods are identical to those previously used for chimeric swinger peptides [112]. The human mitogenome (NC_012920, length 16,569 basepairs) was downloaded in its entirety.

Predicted Peptides
Unlike previous analyses, the present analyses compare observed MS/MS data with hypothetical peptides that result partly from canonical and partly from noncanonical translation of consecutive, adjacent stretches of amino acids. The design of these hypothetical chimeric peptides uses two sizes of running windows: one for tetracodons (codons expanded by one nucleotide) and one for pentacodons (codons expanded by two nucleotides). Each running window codes for 30 amino acids according to regular tricodons, 30 consecutive amino acids according to noncanonical codons expanded by a mono- or dinucleotide, and another 30 consecutive amino acids according to regular tricodons. Hence each hypothetical chimeric peptide consists of 30 canonically coded amino acids at each of its 5′- and 3′-encoded extremities, flanking 30 amino acids coded by noncanonical codons, each expanded by a mono- or dinucleotide. This produces window sizes of (3 × 30) + (4 × 30) + (3 × 30) = 90 + 120 + 90 = 300 nucleotides for tetracodons and (3 × 30) + (5 × 30) + (3 × 30) = 90 + 150 + 90 = 330 nucleotides for pentacodons. Windows move in steps of single nucleotides along the complete genome, producing 2 × 16,569 theoretical chimeric tetracoded and 2 × 16,569 chimeric pentacoded peptides, for the + and − mitogenome strands.
The length of 30 consecutive amino acids with the same translation mode was chosen for each part of the hypothetical chimeric peptides because previous experience shows that most detected peptides are shorter than 30 amino acids. Adopting shorter lengths for the different parts of the hypothetical chimeric peptide, for example 15 amino acids, would miss chimeric peptides whose canonical and/or noncanonical parts are longer than 15 amino acids.
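The window design can be sketched in a few lines. This is a minimal illustration, not the authors' code, and it rests on stated assumptions: translation uses NCBI translation table 2 (the vertebrate mitochondrial code, where AGA/AGG are stops, ATA is Met and TGA is Trp); each expanded codon is decoded by its first three nucleotides, matching the systematic-deletion equivalence; and the mitogenome is treated as circular so that each of the 16,569 positions starts one window per strand. All function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 2 (vertebrate mitochondrial code); '*' marks stop codons.
MITO_AAS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
MITO_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), MITO_AAS)}


def translate(seq, codon_len=3):
    """Read codon_len nucleotides per codon and decode the first three of each.

    Assumption: an expanded codon is decoded by its first three nucleotides and the
    extra one or two are skipped. 'X' marks codons with ambiguous bases (e.g. N).
    """
    return "".join(
        MITO_CODE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - codon_len + 1, codon_len)
    )


def chimeric_peptide(window, expansion, n_aa=30):
    """n_aa tricodons, then n_aa codons expanded by `expansion` nucleotides, then n_aa tricodons."""
    a = 3 * n_aa
    b = a + (3 + expansion) * n_aa
    return translate(window[:a]) + translate(window[a:b], 3 + expansion) + translate(window[b:b + a])


def all_chimeric_peptides(genome, expansion, n_aa=30):
    """Yield (strand, start, peptide) for one window per nucleotide on each strand.

    Assumption: the mitogenome is treated as circular so every position starts a window.
    """
    size = (9 + expansion) * n_aa  # 300 nt (tetracodons) or 330 nt (pentacodons) when n_aa = 30
    plus = genome.upper()
    minus = plus.translate(str.maketrans("ACGT", "TGCA"))[::-1]  # reverse complement
    for strand, seq in (("+", plus), ("-", minus)):
        circular = seq + seq[:size]
        for start in range(len(seq)):
            yield strand, start, chimeric_peptide(circular[start:start + size], expansion, n_aa)
```

Applied to the 16,569-nucleotide NC_012920 sequence (loading it from FASTA is not shown), each call to `all_chimeric_peptides` would yield 2 × 16,569 peptides of 90 amino acids, with stops still marked as `*`.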
For each hypothetical peptide, the proteome analysis software predicts a theoretical mass spectrum, which is compared with the observed MS/MS data.
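As a simplified illustration of what a predicted spectrum contains, the sketch below computes the monoisotopic peptide mass and the singly charged b- and y-ion ladders from standard residue masses, with carbamidomethylated cysteine as a fixed modification. Search engines such as Sequest and PLGS model more ion types, charge states and intensities, so this is only a sketch; the function names are hypothetical. It also shows directly why leucine and isoleucine cannot be told apart.

```python
from itertools import accumulate

# Monoisotopic residue masses (Da); L and I are identical.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728
CARBAMIDOMETHYL = 57.02146  # fixed modification added to every cysteine


def residue_masses(peptide):
    return [RESIDUE_MASS[aa] + (CARBAMIDOMETHYL if aa == "C" else 0.0) for aa in peptide]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of the intact peptide (residues plus one water)."""
    return sum(residue_masses(peptide)) + WATER


def b_y_ions(peptide):
    """Singly charged b ions (N-terminal fragments) and y ions (C-terminal fragments)."""
    masses = residue_masses(peptide)
    prefix = list(accumulate(masses))[:-1]            # b1 .. b(n-1) before adding the proton
    suffix = list(accumulate(reversed(masses)))[:-1]  # y1 .. y(n-1) before water and proton
    return [m + PROTON for m in prefix], [m + WATER + PROTON for m in suffix]


b, y = b_y_ions("PEPTIDE")
print(round(peptide_mass("PEPTIDE"), 3))  # 799.36
print(b_y_ions("LK") == b_y_ions("IK"))   # True
```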

Translation of Stop Codons
Peptides translated from sequences that include stop codons are included 19 times in the pool of theoretical peptides, each time inserting the same amino acid at all stop codons. There are 20 amino acids, but leucine and isoleucine have equal masses and cannot be distinguished by MS/MS techniques, which yields 19 alternative peptides, each with a different amino acid species inserted at all stops. Hence the analyses consider the possibility that any amino acid can be inserted at stop codons. Approximately 2 × 16,569 × 19 = 629,622 chimeric peptides exist for each of the tetracoded and pentacoded chimeric translations.
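The stop-codon rule and the resulting count can be sketched as follows, assuming stops are marked `*` in the translated sequence and that leucine represents the isobaric L/I pair. The helper name `stop_variants` is hypothetical. Peptides without stops contribute a single entry, so 629,622 is an upper bound on the pool size per expansion type.

```python
# 19 mass-distinguishable amino acids: isoleucine omitted because it is isobaric with leucine.
STOP_FILLERS = "ACDEFGHKLMNPQRSTVWY"


def stop_variants(peptide):
    """Return [peptide] if it has no stop ('*'); otherwise 19 variants, one per filler."""
    if "*" not in peptide:
        return [peptide]
    # The same amino acid replaces every stop within a given variant.
    return [peptide.replace("*", aa) for aa in STOP_FILLERS]


windows_per_strand = 16569
upper_bound = 2 * windows_per_strand * len(STOP_FILLERS)
print(upper_bound)  # 629622
```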

Medium Accuracy Data Searches
For the MS/MS data from [113], consensus searches matching observed and predicted MS/MS data were run with the Sequest algorithm (Thermo Fisher Scientific, Illkirch) using the following mass tolerances: parent = 1 Da and fragment = 0.5 Da (monoisotopic masses). Fixed carbamidomethyl (C) and variable oxidation (M) modifications were activated, as well as the lysine → pyrrolysine modification, and only one missed trypsin cleavage was allowed. The false discovery rate was estimated against a reverse decoy database using the Percolator algorithm. No protein grouping was allowed, since the database contained only non-redundant entries. Peptides with false discovery rate q < 0.05 and score Xcorr > 1.99 were considered identified. The score Xcorr measures the match between expected and observed MS/MS data and is unaffected by peptide length. The false discovery rate q is adapted to populations of detected peptides [114].
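The logic of decoy-based q-values and the identification thresholds can be illustrated with a plain target-decoy estimate. This is explicitly not Percolator, which rescores matches with a semi-supervised classifier before estimating q; the function names and the example scores are hypothetical, and tied scores are not treated specially.

```python
def target_decoy_qvalues(psms):
    """Attach a q-value to each (score, is_decoy) match, best scores first.

    FDR at a score threshold = decoys / targets among matches scoring at or above it;
    a match's q-value is the smallest FDR of any threshold that still accepts it.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # Running minimum from the lowest score upward turns FDRs into q-values.
    qvalues, running_min = [], float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append(running_min)
    qvalues.reverse()
    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qvalues)]


def identified(psms, q_max=0.05, xcorr_min=1.99):
    """Target matches passing q < 0.05 and Xcorr > 1.99."""
    return [score for score, is_decoy, q in target_decoy_qvalues(psms)
            if not is_decoy and q < q_max and score > xcorr_min]


example = [(3.5, False), (3.1, False), (2.8, True), (2.6, False), (2.2, False), (1.5, True)]
print(identified(example))  # [3.5, 3.1]
```

In this toy list the target scoring 2.6 passes the Xcorr cutoff but not the q cutoff, because a decoy already scores above it.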
Previous analyses show that searches of the trypsinized proteome produced by [113] detect many more peptides when data are analyzed separately for peptides cleaved at K and for those cleaved at R, and when searches assume cleavage at the amino extremity of these amino acids rather than at the carboxyl extremity, where trypsin cleaves protein chains. These searches mainly detect peptides ending at the carboxyl extremity of K or R, corresponding to the experimental trypsin cleavage, by assuming one missed cleavage. This increased search efficiency remains unexplained, but is not due to artifacts: noncanonical peptides detected by all searches (assuming cleavage at any amino acid and at either the carboxyl or the amino extremity) map at comparable rates onto corresponding, detected noncanonical RNAs, and correspond in their overwhelming majority to trypsinized peptides, as expected from the experimental parameters (addition of trypsin) [66].

High Accuracy Data Searches
The same pool of theoretical chimeric peptides was used for PLGS searches of another human mitoproteome dataset [115], extracted by a higher-accuracy method (Waters, Milford, MA). Mass peak estimates are more accurate (5 ppm for data extracted by [115], versus 0.5 Da for data from [113]). Precise comparison of accuracies between these techniques is unfeasible: Sequest (Thermo Fisher Scientific, Illkirch) uses fixed cutoffs, whereas PLGS adapts cutoffs to the masses of detected peptides. A tolerance of 0.5 Da in the latter sample would occur only for peptides with mass 0.5 Da / (5 × 10⁻⁶) = 10⁵ Da.
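The conversion between a relative (ppm) and an absolute (Da) tolerance is simple arithmetic, sketched here with hypothetical helper names. It reproduces the 10⁵ Da figure and shows how much tighter 5 ppm is at typical peptide masses.

```python
def ppm_tolerance_da(mass_da, ppm):
    """Absolute tolerance (Da) corresponding to a relative tolerance in parts per million."""
    return mass_da * ppm * 1e-6


def mass_for_tolerance(tolerance_da, ppm):
    """Peptide mass (Da) at which a ppm tolerance equals a given absolute tolerance."""
    return tolerance_da / (ppm * 1e-6)


print(ppm_tolerance_da(2000.0, 5))  # about 0.01 Da for a 2 kDa peptide
print(mass_for_tolerance(0.5, 5))   # about 100,000 Da
```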
The twelve samples from [115] were processed using ProteinLynx Global Server version 3.0.1 (Waters, Saint-Quentin En Yvelines, France). Processing parameters were 250 counts for the low energy threshold, 100 counts for the elevated energy threshold and 750 counts for the intensity threshold. Hits are considered significant according to standard criteria, with a PLGS peptide score threshold of 6.49. This score is compared with scores against a decoy database to estimate the FDR, as done for Xcorr in the dataset produced by [113], and peptides with q < 0.05 are retained. Each sample was searched separately 38 times, each search assuming cleavage at a different extremity (carboxyl or amino) of one amino acid species (merging L and I, 2 × 19 = 38).
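The 38 cleavage assumptions can be illustrated with a toy in-silico digestion. This sketch only shows what "cleavage at the carboxyl or amino extremity" of a residue means; it is not how PLGS implements its digests, and the names `SPECIES`, `SEARCHES` and `cleave` are hypothetical. Missed cleavages are not modeled.

```python
# 19 residue species, with L standing for the isobaric L/I pair.
SPECIES = "ACDEFGHKLMNPQRSTVWY"
SEARCHES = [(aa, side) for aa in SPECIES for side in ("C", "N")]  # 2 x 19 = 38 searches


def cleave(protein, residue, side):
    """Split `protein` at every occurrence of `residue`.

    side="C": cut after the residue (carboxyl side, where trypsin cuts after K or R).
    side="N": cut before the residue (amino side).
    """
    targets = {"L", "I"} if residue == "L" else {residue}
    pieces, current = [], ""
    for aa in protein:
        if side == "N" and aa in targets and current:
            pieces.append(current)
            current = ""
        current += aa
        if side == "C" and aa in targets:
            pieces.append(current)
            current = ""
    if current:
        pieces.append(current)
    return pieces


print(len(SEARCHES))                # 38
print(cleave("MAKPEKR", "K", "C"))  # ['MAK', 'PEK', 'R']
print(cleave("MAKPEKR", "K", "N"))  # ['MA', 'KPE', 'KR']
```

With amino-side cleavage at K, the fragment KPE begins with K instead of ending with it, which is the kind of shifted peptide boundary the amino-extremity searches assume.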

Minimal Size of Detected Chimeric Peptides
Detected peptides are further filtered to retain only peptides with at least eight consecutive amino acids coded according to regular codons and at least eight consecutive amino acids coded according to noncanonical expanded codons. This size was chosen so that each of the regularly and noncanonically encoded parts of a chimeric peptide has an approximate maximal e value of 0.0014 (629,622 × (1/19)^8).
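The bracketed product can be checked directly. The short computation below (variable names are illustrative) treats each of 8 positions as matching by chance with probability 1/19. The product 629,622 × (1/19)^8 evaluates to about 3.7 × 10⁻⁵, below the stated maximum of 0.0014. That value is reproduced if the product is also multiplied by the 38 cleavage-specific searches per sample; this is offered only as one possible reading of the bound, not as the authors' stated computation.

```python
n_peptides = 2 * 16569 * 19    # 629,622 theoretical chimeric peptides per expansion type
p_random_8mer = (1 / 19) ** 8  # chance match of 8 residues among 19 distinguishable masses
e_value = n_peptides * p_random_8mer

print(f"{e_value:.2e}")        # 3.71e-05
# Possible reading (assumption): bound summed over the 38 searches per sample.
print(f"{e_value * 38:.4f}")   # 0.0014
```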

Chimeric Peptides With a Tetra- or Pentacoded Part
Tables 1 and 2 present 28 chimeric peptides detected in the MS/MS data published by [113] and 14 chimeric peptides detected in the MS/MS data published by [115], with at least eight amino acids coded by tricodons and eight adjacent amino acids coded by tetracodons. Tables 1 and 2 also present 19 chimeric peptides detected in the MS/MS data published by [113] and 18 chimeric peptides detected in the MS/MS data published by [115], with at least eight amino acids coded by tricodons and eight adjacent amino acids coded by pentacodons.

Table 1. Chimeric peptides translated in part according to regular codons and in part according to expanded codons (tetra- and pentacodons) from the human mitogenome, detected in MS data from Guegneau et al. (2014). Columns are: 1. Regular tricodon translation frame (positive strand, 1-3; negative strand, 4-6), with tetra- and pentacoded parts indicated by T and P; 2. Position of the regular tricoded part on the translated human mitogenome; 3. S, amino acid inserted at stop codons; 4. Detected peptide sequence: lowercase letters indicate translated stops, "|" separates the regular tricoded part from the other part, and tetra- and pentacoded parts are underlined. Ambiguous limits between the tricoded and the other part are also indicated when they occur; the ambiguous part is parsimoniously considered tricoded. 5. Xcorr between expected and observed MS; 6. PSM, counts of observed MS matching expected MS; 7. q, false discovery rate; 8. PEP, peptide-specific posterior error probability; 9. Position-specific amino acid modifications; 10. Positions in regular mitogenome-encoded proteins matching the regular tricoded part of the detected chimeric peptide.
In about half the cases, for both tetra- and pentacoded parts and in both datasets, the noncanonical part is at the 5′-encoded extremity of the peptide. The noncanonical part can thus be at either the 5′- or the 3′-encoded extremity of chimeric peptides, with no apparent bias in the current results.

Chimeric Peptides Integrated in Regular Mitogenome-Encoded Proteins
Most peptides include amino acids inserted at stop codons, in both their regular-encoded and noncanonical parts. The regular-encoded part of a total of seven chimeric peptides corresponds to one of the 13 classical mitogenome-encoded proteins. Six of these have adjacent tetracoded parts, and one has an adjacent pentacoded part. The proteins are AT6, CytB, ND1, ND2 (two different peptides), ND5 (tetracoded) and ND1 (pentacoded). These numbers are too small to test whether some proteins tend to include more or fewer noncanonical parts, or whether such biases relate to their positions on the mitogenome, as these genes are scattered across the whole mitochondrial operon. The hypothesis that noncanonical peptides result from mitochondrial polymorphisms and heteroplasmy [116-121] is unlikely: their exact correspondence to sequences predicted by translation of expanded codons excludes this option. As a group, they cannot result from regular mitogenomic DNA variability.

Noncanonical Transcription or Translation?
The above results confirm that tetra- and pentacoded stretches of amino acids occur, joined to regularly encoded stretches of amino acids. The alternative, that chimeric peptides originate from regular translation of chimeric RNAs produced partly by regular transcription and partly by noncanonical transcription systematically deleting mono- or dinucleotides after each transcribed trinucleotide, cannot be excluded. The data at hand do not allow discriminating between these two alternatives, which can produce identical peptides. We tentatively presume that both mechanisms are at work, because expanded codons and anticodons have been reported previously, and because noncanonical transcripts corresponding to transcription systematically deleting mono- and dinucleotides also exist.

Adaptive Diversity
Amino acid stretches encoded by noncanonical codons (or resulting from noncanonical transcription) might be integrated in regular mitogenome-encoded proteins, as suggested by their association with stretches of tricodon-encoded amino acids clearly corresponding to regular membrane-bound mitochondrial proteins. The possibility that these chimeric peptides are part of functional proteins cannot be excluded. The existence of chimeric peptides suggests that natural protein diversity can be increased by mixing types of decoding processes, such as regular tricodons and noncanonical codons expanded by one or two nucleotides. This diversity might have unknown adaptive or functional components, including widening the ranges of functionally optimal conditions under which some metabolic activities might occur. Results also stress that natural translation of expanded codons is not extremely rare. The hypothesis that expanded codons (but not systematic deletions) are adaptive at high temperatures could be tested by comparing abundances of detected peptides coded by expanded codons at different temperatures, expecting more translation according to expanded codons at higher temperatures.

Table 2. Chimeric peptides translated in part according to regular codons and in part according to expanded codons (tetra- and pentacodons) from the human mitogenome, detected in MS data from Alberio et al. (2014). PLGS is the score estimating goodness of fit between observed and expected MS in the PLGS peptide detection software. Δ ppm is the difference between expected and observed MS total mass. Cl indicates the cleavage assumed by the MS/MS search that detected the specified peptide: C indicates cleavage at the carboxyl end and N at the amino end of the amino acid. Chym and elas indicate cleavage by chymotrypsin and elastase.
Further analyses searching for peptides corresponding to codons expanded by more than two nucleotides will also contribute to our understanding of these noncanonical transcriptions and translations, which increase the coding potential of sequences.

Unwanted Effects of Genetic Engineering
Experiments and analyses exploring which genome regions undergo the different noncanonical transcriptions and translations, in which cell types and under which conditions, would deepen our understanding of cell metabolism, with likely biomedical applications. Results also stress that genetic engineering should explore potential effects of proteins produced from noncanonical transcripts and/or by noncanonical translations, to avoid undesirable effects from overlooked noncanonical processes such as swinger and del-transcriptions, translation of stop codons, and translation according to expanded codons.

Declaration of Competing Interest
None.