Transcripts with systematic deletions of 1-12 nucleotides in the human mitochondrion suggest potential non-canonical transcription

Raw transcriptomic data contain numerous RNA reads whose homology with template DNA does not match canonical transcription. Transcriptome analyses usually ignore such noncanonical RNA reads. Here, analyses search for noncanonical mitochondrial RNAs that result from systematically deleting 1 to 12 nucleotides after each transcribed nucleotide triplet, producing deletion-RNAs (delRNAs). We detected delRNAs corresponding to systematic deletions of 1 to 12 nucleotides after each transcribed trinucleotide in the human whole cell and purified mitochondrial transcriptomes, and in GenBank's human EST database. DelRNAs detected in both transcriptomes mapped along with 55.63% of the EST delRNAs. A bias exists for delRNAs covering identical mitogenomic regions in both transcriptomic and EST datasets. Among 227 delRNAs detected in these 3 datasets, 81.1% and 8.4% mapped on mitochondrial protein-coding regions and on hypervariable region 2 of the dloop, respectively. Del-transcription analyses of GenBank's EST database confirm observations from whole cell and purified mitochondrial transcriptomes, eliminating the possibility that detected delRNAs are false positive matches, cytosolic DNA/RNA nuclear contamination or sequencing artefacts. These detected delRNAs are enriched in frameshift-inducing homopolymers and are poor in frameshift-preventing circular code codons (a set of 20 codons which regulate reading frame detection, over- and underrepresented in coding and other frames of genes, respectively), suggesting a motif-based regulation of non-canonical transcription. These findings show that rare non-canonical transcripts exist. Such non-canonical del-transcription increases mitochondrial coding potential and non-coding regulation of intracellular mechanisms, and could explain the dark DNA conundrum.


Introduction
Raw transcriptomic data include RNA reads that do not correspond to canonical transcription of the genome [1]. The DNA template of most known noncanonical RNAs is easily recovered because their sequence usually differs from the template DNA by a few single-nucleotide edits. Around 80-90% of the human genome is transcribed at some point during development [2,3] and has biochemical functions [4]. Most noncanonical transcripts are noncoding RNAs (ncRNAs). Small ncRNAs are <200 bp long and contribute to transcription activation [5], transcription maintenance [6], translation inhibition [7], mRNA degradation [8], gene regulation [9], epigenetic modification [10,11], and RNA polymerase II backtracking [9]. Other unknown roles might exist. These coding and non-coding RNAs in the transcriptome were detected assuming canonical transcription, and transcripts not matching template DNA were ignored in further analyses. Hence this genetic information is lost because analyses do not consider non-canonical transcription. This lost genetic information could hold answers to many questions related to evolution and genetic diseases, and could explain the dark DNA conundrum.
Dark DNA refers to hidden genes in the genome of an organism: genes that are essential for its survival and whose translational products have been detected in that organism, but which are missing from its sequenced genome. The term dark DNA was first coined in a genome study of the sand rat, which had 87 missing genes essential for its survival. However, the functional translational products of the missing genes were detected in tissue samples, implying possible hidden genes in the sand rat's genome [12]. Similarly, 274 genes essential for survival were reported missing in bird genomes [13]. The high GC content of these hidden genes is thought to hinder their detection by current DNA sequencing technology.
The detection of hidden genes is mainly based on the sequence similarity of RNA sequencing reads, assuming canonical transcription. Similarly, with recent advances in sequencing and bioinformatic technologies, numerous sequences are annotated as hypothetical genes/proteins whose existence is predicted but which lack experimental evidence. These hypothetical proteins are predicted assuming canonical transcription followed by translation, based on the identification of open reading frames in the sequenced genome. However, it is quite possible that many of these gene products are not canonical RNAs or peptides.
Two additional types of non-canonical mitochondrial RNAs have been detected in human whole cell transcriptomes. These RNAs differ from template DNA along systematic rules. Some noncanonical RNAs result from transcription systematically exchanging nucleotides along one among 23 bijective transformations. These produce 'swinger RNA'. Nine symmetric exchanges are of type X↔Y (for example A↔C) [31][32][33][34] and fourteen asymmetric exchanges are of type X→Y→Z→X (for example A→C→G→A) [34][35][36][37]. Swinger DNA has been reported until now only for one symmetric exchange, A↔T+C↔G, in organelles [38,39] and for eukaryote nucleus-encoded rRNAs [40]. Swinger RNAs were confirmed by two NGS methods (454 and SOLID) in the amoeban giant virus Mimivirus [1]. Some RNAs are chimeric: they partly correspond to regular transcription, and an adjacent stretch is transformed along a non-identical swinger transformation [41]. Peptides matching translation of such chimeric swinger RNAs (chimeric peptides) also exist [42].
Here we use del-transformed versions of the human mitochondrial genome to mimic and detect actual noncanonical RNAs in two transcriptome datasets, one originating from the whole cell and one from purified mitochondrial lines. RNAs aligning with the transformed versions of the mitogenome are considered to be produced by del-transcription. To study noncanonical delRNAs, we focus on the mitochondrial genome because the entire mitogenome is transcribed, i.e., coding and non-coding sequences on both mitogenome strands are transcribed [52], and because the short mitogenome is compatible with current computational limitations for analyses that consider transformations such as systematic deletions.
Here the principle of systematic deletions after each transcribed triplet is examined beyond deletion of mono- and dinucleotides [43], up to deletions of twelve nucleotides. Systematic deletions can start at different positions on a sequence, defining the transcriptional deletion 'frame', each producing different delRNAs. Numbers of potential deletion frames increase with deletion size (Fig 2): there are 3+k deletion frames for deletion size k. We also search for potential punctuation motifs signaling del-transcription, following the previous study on homopolymers and circular code codons in delRNA 3-1 and delRNA 3-2 [47]. DelRNAs could result from enzyme slippage caused by homopolymers (AAA, CCC, GGG, TTT). Conversely, delRNAs are expected to contain fewer codons that usually conserve the transcription and/or translation frame, such as the natural circular code identified in prokaryotes and eukaryotes [48][49][50][51], a set of 20 comparatively conserved codons (AAC, AAT, ACC, ATC, ATT, CAG, CTC, CTG, GAA, GAC, GAG, GAT, GCC, GGC, GGT, GTA, GTC, GTT, TAC, TTC) [53] that enables ribosomal translation frame retrieval [48,54,55] and apparently conserves transcriptional frames [47]. Punctuations by homopolymers and circular code codons have opposite roles; therefore they are expected to be over- and under-represented in detected delRNAs, respectively. These patterns have been previously found for delRNA 3-1 and delRNA 3-2 [47]. We expect similar patterns for k = 3 to k = 12: over- and under-representation of homopolymers and circular code codons, respectively.
Figure caption. To construct the delRNA 3-1 transformation of the mitogenome, the nucleotide following each transcribed trinucleotide (every 4th nucleotide) is deleted. Similarly, the two nucleotides following each transcribed trinucleotide (the 4th and 5th of every five) are deleted in delRNA 3-2 transformations of the mitogenome. The highlighted nucleotide(s) are those missing in del-transcribed delRNAs. These principles work for any deletion size k (1 to 12), where k is the number of nucleotides deleted after each transcribed nucleotide triplet.
Analyses of whole cell transcriptome data [56] are replicated on the transcriptome data extracted from purified mitochondrial lines [57] and on GenBank's human EST database, to test del-transcription reproducibility across independent datasets and sequencing techniques, and to verify that results are not due to confounding effects from cytosolic RNAs matching del-transformations of the mitogenome by chance. To test the hypothesis of transcriptional frame retrieval, analyses of the universal circular code codons (X) identified in the genes of bacteria, eukaryotes, plasmids, and viruses [48][49][50][51] are compared with analyses of the circular code codons identified in the protein-coding genes of mitochondria (ACA, ACC, ATA, ATC, CTA, CTC, GAA, GAC, GAT, GCA, GCC, GCT, GGA, GGC, GGT, GTA, GTC, GTT, TTA, TTC) [58]. This candidate mitochondrial circular code shares 13 of its 20 codons with the universal circular code but differs from the latter because it is not self-complementary, and its circular permutations do not produce circular codes (the "C3" property, which holds for the universal circular code). This mitochondrial circular code is derived from a much smaller sample than the universal circular code and hence could result from sampling biases. Nevertheless, if it is relevant to mitochondrial frame detection, the mitochondrial circular code should show stronger negative associations with detected delRNAs than the universal circular code.
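The two codon sets listed above can be compared directly. The following is a minimal sketch, not the authors' pipeline: the names `UNIVERSAL_X`, `MITO_X`, `is_self_complementary` and `code_fraction` are hypothetical, and counting in-frame, non-overlapping triplets is an assumption about how circular code frequencies are measured.

```python
# Codon sets copied from the text; all function and variable names are illustrative.
UNIVERSAL_X = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC GAG GAT "
    "GCC GGC GGT GTA GTC GTT TAC TTC".split()
)
MITO_X = frozenset(
    "ACA ACC ATA ATC CTA CTC GAA GAC GAT GCA GCC GCT "
    "GGA GGC GGT GTA GTC GTT TTA TTC".split()
)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(codon: str) -> str:
    """Return the reverse complement of a DNA codon."""
    return codon.translate(_COMPLEMENT)[::-1]


def is_self_complementary(code: frozenset) -> bool:
    """True if the reverse complement of every codon is also in the set."""
    return all(reverse_complement(codon) in code for codon in code)


def code_fraction(seq: str, code: frozenset, frame: int = 0) -> float:
    """Fraction of in-frame, non-overlapping triplets of seq that belong to code."""
    triplets = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    if not triplets:
        return 0.0
    return sum(t in code for t in triplets) / len(triplets)


shared = UNIVERSAL_X & MITO_X
print(len(shared))                         # number of codons common to both sets
print(is_self_complementary(UNIVERSAL_X))  # self-complementarity of the universal code
print(is_self_complementary(MITO_X))       # self-complementarity of the candidate code
```

Under these definitions the two sets share 13 codons, and only the universal set is closed under reverse complementation, in line with the description above.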
We produce in silico 3+k del-transformed versions of the human mitogenome for each deletion size k, where each version corresponds to a different deletion frame (Fig 2). For example, a del-transformed version of the mitogenome with three transcribed nucleotides followed by deletion of the next three nucleotides (k = 3) is constructed and noted "delRNA 3-3.0" (Fig 2A.i). A second mitogenome version follows the same systematic transcription/deletion pattern, except that it first deletes the first nucleotide of the sequence. This transformation is noted delRNA 3-3.1 (Fig 2A.ii). A third mitogenome version excludes the first 2 nucleotides of the genome, then the rest of the genome is assumed transcribed along the same systematic transcription/deletion pattern, noted delRNA 3-3.2 (Fig 2A.iii). A fourth mitogenome version deletes the first 3 nucleotides before initiating the systematic transcription/deletion pattern, noted delRNA 3-3.3 (Fig 2A.iv). The fifth mitogenome version excludes the first 4 nucleotides, then applies the above systematic transcription/deletion pattern, noted delRNA 3-3.4 (Fig 2A.v). A sixth mitogenome version excludes the first 5 nucleotides, then applies the above systematic transcription/deletion pattern, noted delRNA 3-3.5 (Fig 2A.vi). For delRNA 3-3, these six del-transformation frames include all possible theoretical delRNA transformations for transcription of three nucleotides followed by the deletion of the next nucleotide triplet (k = 3). Similarly, Fig 2B shows all possible del-transformed mitogenome versions for delRNA 3-4 (i.e., k = 4), producing seven del-transformed versions of the mitogenome. There are 3+k del-transformed versions of any sequence for deletion size k. In total, 114 del-transformed mitogenome versions were created for deletion sizes 1 to 12 (S1 File).
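The construction described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: the function names `del_transform` and `all_del_versions` are hypothetical, and the sequence is treated as linear (the circularity of the mitogenome is ignored).

```python
def del_transform(seq: str, k: int, offset: int = 0, keep: int = 3) -> str:
    """Skip the first `offset` nucleotides, then repeatedly keep `keep`
    nucleotides and delete the next `k` (deletion frame `offset`)."""
    period = keep + k
    if not 0 <= offset < period:
        raise ValueError("offset must lie in [0, keep + k)")
    trimmed = seq[offset:]
    return "".join(trimmed[i:i + keep] for i in range(0, len(trimmed), period))


def all_del_versions(seq: str, max_k: int = 12) -> dict:
    """Build every del-transformed version for k = 1..max_k, with 3+k frames
    per deletion size, labelled in the 'delRNA 3-k.offset' style of the text."""
    return {
        f"delRNA 3-{k}.{offset}": del_transform(seq, k, offset)
        for k in range(1, max_k + 1)
        for offset in range(3 + k)
    }


toy = "ABCDEFGHIJKL"                 # letters stand in for nucleotides
print(del_transform(toy, k=3))       # keeps ABC, drops DEF, keeps GHI, drops JKL
print(del_transform(toy, k=3, offset=1))
print(len(all_del_versions(toy)))    # sum of (3 + k) for k = 1..12
```

The total number of versions is the sum of 3+k over k = 1 to 12, that is 36 + 78 = 114, which agrees with the count given in the text.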

Whole cell transcriptome analyses
Analyses below follow methods previously described for the transcriptome analyses of delRNA 3-1 and delRNA 3-2 [43]. We analyse seventy-one samples (SRX768406-SRX768476) [56] of human transcriptomic datasets available in the Sequence Read Archive (SRA) of GenBank. Twenty SRA entries were analysed simultaneously by BLASTN, detecting RNA reads aligning with in silico constructed delRNA versions of the human mitogenome. We used default alignment criteria for BLASTN searches.

Transcriptome analyses of purified mitochondrial lines
The same BLASTN analysis method was applied to transcriptome data (SRX084350-SRX084355 and SRX087285) extracted from purified mitochondrial lines [57]. BLASTN results of both whole cell and purified mitochondrial transcriptomes were compared to test the detection reproducibility of delRNA coverages.

EST database search
The same 114 del-transformed mitochondrial sequences as used to search SRA data are used to search for ESTs matching delRNAs in GenBank's human EST database. MegaBLAST parameters (word size of 16 and gap cost (Existence: 5, Extension: 2)) were tailored to ensure inclusion of short delRNAs matching ESTs. DelRNAs detected by MegaBLAST with more than 90% identity with input del-transformed mitogenome versions were scrutinized for further analyses.
Since the transcriptome and EST datasets originate from independent studies, delRNAs detected in these datasets were mapped on the mitogenome. If delRNAs are random sequencing artefacts, or random BLAST hits due to nuclear DNA/RNA contamination, overlaps between mitogenome sequences covered by delRNAs originating from EST and SRA datasets should be rare and follow random predictions. A positive bias for overlapping sequences would be considered evidence for reproducibility in delRNA detection across independent experiments and independent sequencing methods, and strong confirmation that del-transcription exists.
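The overlap comparison described above can be illustrated with a simple interval check. This is only a sketch: the coordinate convention (1-based, inclusive start and end on the mitogenome) and the names `intervals_overlap` and `count_overlapping` are assumptions, and the computation of the number of overlaps expected by chance is not reproduced here because the text does not specify it.

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # (start, end), inclusive mitogenome coordinates (assumed)


def intervals_overlap(a: Interval, b: Interval) -> bool:
    """True if two inclusive intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def count_overlapping(query: Iterable[Interval], reference: Iterable[Interval]) -> int:
    """Count query intervals that overlap at least one reference interval."""
    reference = list(reference)
    return sum(any(intervals_overlap(q, r) for r in reference) for q in query)


# Toy coordinates, not taken from the study.
est_hits = [(1, 10), (50, 60), (100, 120)]
sra_hits = [(5, 20), (115, 130)]
print(count_overlapping(est_hits, sra_hits))
```

The quadratic scan is adequate for the few thousand delRNAs involved; sorting the intervals first would be preferable for much larger datasets.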

Control sequence
A randomized mitogenomic sequence with the same nucleotide composition and size as the natural human mitogenome was created as a negative control. The randomized mitogenome had no significant sequence similarity with the actual mitogenome. This randomized mitogenome was used to produce 114 del-transformed (k = 1 to 12) randomized mitogenomic versions as explained in Fig 2. These sequences were separately used as a control for BLASTN and MegaBLAST searches and analyses.
In addition, to eliminate false positive alignments, delRNAs that aligned in the SRA/EST databases were also mapped on the natural mitogenome sequence (NC_012920.1). DelRNAs that mapped on it with significant identity were removed from further analysis.
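A randomized control with the same size and nucleotide composition as the input can be obtained by shuffling. The text does not state which randomization procedure was used, so the sketch below is an assumption: it uses a seeded shuffle, and the name `randomized_control` is hypothetical.

```python
import random
from collections import Counter


def randomized_control(seq: str, seed: int = 0) -> str:
    """Return a shuffled copy of seq with identical length and base composition."""
    rng = random.Random(seed)   # seeded so that the control is reproducible
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


toy = "AAACCCGGGTTTACGT"        # toy sequence, not the mitogenome
control = randomized_control(toy)
print(Counter(control) == Counter(toy), len(control) == len(toy))
```

Each del-transformation would then be applied to the shuffled sequence exactly as to the natural one, so that both searches see the same number and length of query sequences.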

BLASTN detects delRNAs with k = 1 to 12 in the human whole cell and purified mitochondrial transcriptomes
BLASTN yields alignment of 11869 reads (Sheet A in S2 File) with the 114 del-transformed versions of the human mitogenome, as detected within 71 publicly available human whole cell transcriptome datasets (SRX768406-SRX768476). Contig alignment of these reads gave 968 delRNA contigs (Sheet B in S2 File) with a mean length of ~36 bp and average identity >90%. Similarly, the 114 del-transformed versions of the human mitogenome aligned with 5084 SRA reads (Sheet A in S3 File) in 7 publicly available transcriptome datasets from purified human mitochondrial lines (SRX084350-SRX084355 and SRX087285). The mean length of the 1767 detected delRNA contigs is 29.354 bp, with mean identity >93% (Sheet B in S3 File).
DelRNAs in both transcriptomic datasets were mapped on the mitogenome to find numbers of delRNAs covering the same mitogenomic regions. Details about delRNAs detected in the whole cell and purified mitochondrial transcriptome, i.e. the number of reads in BLASTN search for each delRNA, their position in the transformed mitogenome, percentage identity and the length of each delRNA for all deletion sizes are enclosed in S2 and S3 Files. A total of 367 delRNAs in the whole cell transcriptome (S2 File) and 390 delRNAs in the purified mitochondrial transcriptome (S3 File) had more than 2 reads.
Sequencing errors producing RNAs that artificially differ from the original natural sequences typically affect one specific strand, as cDNA libraries consist of double-stranded DNA produced on the template of natural RNA. DelRNAs could result from such sequencing artefacts. This possibility would be confirmed if delRNAs covering the same genome region overwhelmingly originate from the same DNA strand. On the contrary, if delRNAs originate from both strands, sequencing artefacts are far less likely, supporting that delRNAs are natural phenomena. This method has been previously used to confirm that A→I hyper-edited reads are not sequencing artifacts [29]. Indeed, 75.38% of delRNAs having more than 2 reads originate from both + and - strands (Sheet B in S2 File and Sheet B in S3 File). This result suggests that most delRNAs detected in this study are not sequencing artifacts.

Whole cell vs purified mitochondrial transcriptomes
Surprisingly, coverages of del-transformed mitogenomes by detected delRNAs are greater for transcriptome data extracted from purified mitochondrial lines than those extracted from whole cells for 97 among 114 del-transformations of the human mitogenome (85.1%) (S4 File). This might be an effect of internal cutoffs automatically set by BLASTN considering the size of the analysed dataset (the purified mitochondrial line dataset includes fewer reads than the whole cell data). This could also reflect technical and/or biological effects specific to each experiment. Independently of this, this result shows that a wide majority of detected delRNAs are not false positives due to confounding effects of large populations of cytosolic RNAs. At most, few isolated delRNAs might be false positives due to cytosolic contaminations.
Mitogenome sequences covered by delRNAs detected in both whole cell and purified mitochondrial data are more frequent than expected by chance for 109 among 114 del-transformed versions of the mitogenome (95.6%) (S4 File). Fig 3 gives an overview of the number of delRNAs detected in each dataset, along with the number of overlapping delRNAs in both datasets compared to the number of overlaps estimated by chance for each systematic nucleotide deletion (k = 1 to 12). All 12 del-transformations (k = 1 to 12) across both transcriptomes had more overlapping delRNAs than expected by chance (P = 0.000122 according to a one-tailed sign test using the binomial distribution). Overall, 436 delRNAs detected in both transcriptomes overlapped the same mitogenomic region, with an average length of ~24 bp. Such overlaps are four times more frequent than expected by chance (S4 File).
Del-RNAs detected for a del-transformed randomized version of the human mitogenome in the mitochondrial transcriptome were also mapped on their respective del-transformed genome. Coverages of delRNAs detected in the purified mitochondrial transcriptome were higher for 11 among 12 deletion sizes (k) than for the randomized mitogenome sequence with the same deletion size. Obtaining this result across 12 del-transformations has P = 0.00158 (one tailed sign test, binomial distribution). Hence delRNA detections are reproducible across independent SRA transcriptome datasets and sequencing methods, and are not confounded by random matches.
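The sign tests used above rest on the binomial distribution with success probability 0.5. The sketch below computes the conventional one-tailed upper tail, P(X ≥ s) for s successes among n comparisons; the function name `sign_test_upper_tail` is hypothetical. Under this convention, 12 of 12 gives 1/4096 ≈ 0.000244 and 11 of 12 gives 13/4096 ≈ 0.00317, which are twice the values quoted in the text, so the exact convention used by the authors is not reproduced here and remains an assumption.

```python
from math import comb


def sign_test_upper_tail(successes: int, n: int) -> float:
    """One-tailed sign test: P(X >= successes) for X ~ Binomial(n, 0.5)."""
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    return sum(comb(n, i) for i in range(successes, n + 1)) / 2 ** n


for s in (12, 11, 9):
    print(s, sign_test_upper_tail(s, 12))
```

`math.comb` requires Python 3.8 or later.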

MegaBLAST detects delRNAs in EST database
The GenBank ESTs that aligned with the 114 del-transformed versions of the natural mitogenome and of the randomized mitogenome were mapped on the respective del-transformed mitogenomic version. The total mean coverage by delRNAs for each deletion size (i.e., k = 1 to 12) is higher for del-transformed versions of the natural mitogenome than for del-transformed versions of the randomized genome (S1 Fig). Obtaining this result across all 12 deletion sizes (k) has P = 0.000122 (one-tailed sign test according to the binomial distribution). Fewer randomized del-transformed mitogenome versions obtained hits with the EST database than del-transformed versions of the natural mitogenome (one-tailed P = 0.0027, Fisher exact test) (Sheets A and B in S5 File). EST-delRNAs having more than 90% identity were further considered for analyses (S6 File). In total, 1395 ESTs were detected for 111 among 114 del-transformed mitogenomic versions in GenBank's human EST database (S6 File). The mean length of these delRNAs is 24 base pairs, with an average identity of 97.76%. The del-transformed randomized mitogenome sequences match 602 ESTs across 98 among 114 del-transformed randomized versions (Sheet B in S5 File). The mean length of EST-delRNAs is smaller than that of the delRNAs detected in the mitochondrial and whole cell transcriptomes. This difference in mean length could be due to the MegaBLAST search used for GenBank's human EST database being more stringent than the BLASTN search used for the transcriptomes. ESTs aligned with delRNA 3-1 were longer than those aligned with the remaining del-transformed versions (k>1) (S6 File). We believe that this difference in size of EST-delRNAs is due to the limited ability of the BLAST search algorithm to align sequences with more than one gap after each trinucleotide, which affects the alignment score.
DelRNAs detected in Genbank's human EST database were mapped with the delRNAs detected in purified mitochondrial transcriptome and whole cell transcriptomic data to test for overlaps between delRNA coverages from these independent experimental datasets and sequencing methods.
DelRNAs in GenBank's human EST database overlapping delRNAs in purified mitochondrial transcriptome. Fig 4 shows delRNAs detected in the purified mitochondrial transcriptome and in the EST database, the numbers of delRNAs overlapping the same mitogenomic region, and the overlaps expected by chance for each systematic nucleotide deletion (k). Among a total of 1395 delRNAs from the EST database (S6 File), 615 (44.02%) EST-delRNAs overlapped with delRNAs detected in the mitochondrial transcriptome. Overall, these overlaps are 8 times more frequent than expected by chance (Fig 4, S7 File). DelRNAs in the EST database were [...]
Overlapping delRNAs in all three independent datasets are strong evidence that deletion transcription is a true phenomenon and not a sequencing artefact or contamination. Analyses below also underline the mitogenomic regions that are hotspots for deletion-transcriptions.
Association of delRNAs with protein coding genes. The 227 EST-delRNAs (S9 File) that were also detected in the whole cell and purified mitochondrial transcriptomes were mapped on the mitochondrial genome (NC_012920.1) to determine whether del-transformation is associated with coding or non-coding regions. Because the mitogenome span covered by a delRNA increases with the number of systematically deleted nucleotides (k), most of these delRNAs were expected to overlap both coding and non-coding regions, making it difficult to link them to any particular mitogenomic region. Surprisingly, among 227 EST-delRNAs, 222 (97.8%) mapped on a single coding or non-coding region: 184 (81.1%) mapped on protein coding genes, 17 (7.5%) on the rRNA region, 19 (8.4%) on the dloop and 2 (0.88%) on the tRNA region. The remaining 5 delRNAs mapped on more than one mitogenomic region. Interestingly, all 19 dloop delRNAs mapped on hypervariable regions 2 and 3 (HVR2 and HVR3; coverage 3.47% of the mitogenome), suggesting a possible association of delRNAs with HVR2. Results indicate that del-transformation is rare in the conserved tRNA and rRNA regions, whereas it is associated with the dloop hypervariable and protein coding regions of the mitogenome. Because the coverage of delRNAs on the natural mitogenome increases with systematic deletion size, we could not find an association between delRNAs and disease-causing mutations. DelRNAs that mapped on the dloop hypervariable region had a homopolymer frequency 2.85 times that of the remaining del-transformed versions (S9 File). Similarly, the 184 delRNAs mapping on protein coding genes had an overall homopolymer frequency 1.98 times that of the remaining del-transformed mitogenome.
Homopolymers in delRNAs. Homopolymer nucleotide triplets (AAA, CCC, GGG, TTT) cause frameshifts during DNA replication and transcription [46,62] and ribosomal slippages during translation [46,63-66]. Frequencies of these homopolymers in detected delRNAs are compared with their frequencies in the remaining transformed mitogenome not covered by detected delRNAs. Table 1 shows frequencies of homopolymers in detected delRNAs for each del-transformation (k = 1 to k = 12) in the whole cell and purified mitochondrial line transcriptomes. In both datasets, across all del-transformations (k = 1 to k = 12), frequencies of homopolymers in detected delRNAs are higher than homopolymer frequencies in the remaining del-transformed mitogenome (Table 1). Obtaining this result across all 12 del-transformations has P = 0.000122 according to a one-tailed sign test using the binomial distribution. Chi-square tests show that the difference in homopolymer frequencies between regions covered by delRNAs and those not covered by delRNAs is statistically significant at P <0.05 for each del-transformation, for both the whole cell and the purified mitochondrial transcriptomes. The only exception is delRNA 3-6 for the whole cell transcriptome data (Table 1).
Overall, homopolymer frequencies were 1.5 times higher in detected delRNAs than in the remaining del-transformed mitochondrial sequences for analyses of both transcriptome datasets (Table 1). This shows that polymerase slippage at homopolymers contributes to noncanonical del-transcription. These patterns are incompatible with spurious detections of delRNAs by alignment searches among massive sequence data.
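The baseline against which these homopolymer frequencies are judged is the frequency expected from mononucleotide composition alone, that is the sum of the cubed base frequencies. The sketch below is illustrative and its names are hypothetical. It uses the C, G and T frequencies reported with Table 1 and takes A = 0.30931257 so that the four frequencies sum to 1, which is an assumption. How triplets were counted in the study is not stated; the `step` parameter (1 for sliding windows, 3 for in-frame triplets) is likewise an assumption.

```python
def expected_homopolymer_frequency(base_freqs: dict) -> float:
    """Expected frequency of AAA, CCC, GGG and TTT triplets if bases were
    independent: the sum of cubed mononucleotide frequencies."""
    return sum(f ** 3 for f in base_freqs.values())


def observed_homopolymer_frequency(seq: str, step: int = 1) -> float:
    """Fraction of triplets (sampled every `step` positions) made of one base."""
    starts = range(0, len(seq) - 2, step)
    if len(starts) == 0:
        return 0.0
    hits = sum(seq[i] == seq[i + 1] == seq[i + 2] for i in starts)
    return hits / len(starts)


# A is chosen so that the frequencies sum to 1 (assumption); C, G, T as reported.
mito_freqs = {"A": 0.30931257, "C": 0.31269238, "G": 0.13090712, "T": 0.24708794}
print(round(expected_homopolymer_frequency(mito_freqs), 6))
print(observed_homopolymer_frequency("AAAACGTTT", step=3))
```

With these inputs the expected frequency is close to the 0.0775 reported with Table 1, below the observed 0.0887 for the untransformed mitogenome.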
To further strengthen our findings, we scrutinized homopolymer frequencies in overlapping delRNAs detected independently in the transcriptomes of the whole cell and purified mitochondrial cell lines. Among 12 del-transformations (k = 1 to 12), the percentage of homopolymers in overlapping delRNAs is greater than in delRNAs detected only once, either in the whole cell or the purified mitochondrial cell line transcriptomes in 11 among 12 comparisons (Table 1). This result has P = 0.00158 according to a one-tailed sign test using the binomial distribution.
Homopolymer frequencies of delRNAs detected in GenBank's human EST database unsurprisingly give similar results. DelRNAs detected for all 12 systematic nucleotide deletions (k) have overall higher homopolymer frequencies (P = 0.000122 according to a one-tailed sign test using the binomial distribution). Similarly, homopolymer frequencies of EST-delRNAs mapping with delRNAs detected in the whole cell and mitochondrial transcriptomes (S10 File) are higher for all delRNAs from k = 1 to 12, with P = 0.000122 in each case according to one-tailed sign tests using the binomial distribution.
Detection of overlapping delRNAs from two independent transcriptomes and in GenBank's human EST database in itself supports our hypothesis of del-transcription, confirming result reproducibility. The higher homopolymer frequencies in overlapping delRNAs, as compared to those detected only in one dataset, indicate that high homopolymer densities increase the frequency of del-transcription, and further confirm that del-transcription exists.
Circular code and delRNAs. The natural circular code (X) is a specific set of 20 codons universally overrepresented in the coding frame of protein-coding genes as compared to their remaining, non-coding frames [48]. As a group, codons of X enable recognizing the coding frame [55,67,68]. Though mechanisms by which this occurs are still unknown, it is believed that stretches of codons belonging to X enable ribosomes to recognize the translation frame [48,68-70]. Distributions of triplets belonging to X in the structures formed by ribosomes [70] and tRNAs [69,71] suggest that X motifs in these molecules central to translation play roles in recognizing protein-coding frames. Previous analyses of delRNA 3-1 and delRNA 3-2 showed that X also regulates the frame of deletion transcriptions [47].
Table 1 caption: The table shows the percentage of homopolymers in delRNAs detected for each del-transformation in whole cell and purified mitochondrial cell line transcriptomes. It also shows the homopolymer percentage in mitogenomic regions covered by overlapping delRNAs detected in both whole cell and purified mitochondrial RNA datasets. Homopolymer frequencies in detected delRNAs are higher for all del-transformations in the whole cell transcriptome, the mitochondrial transcriptome and the mitogenomic regions covered by delRNAs from both RNA datasets than in the remaining del-transformed mitogenome. Considering the frequencies of A, C, G and T in the human mitogenome (A = 0.30931257, C = 0.31269238, G = 0.13090712 and T = 0.24708794), the expected frequency of homopolymer triplets is 0.077496. All results find more homopolymer triplets in del-transformed human mitogenome versions than expected from these mononucleotide frequencies. This is also the case for the untransformed human mitogenome, for which the expected homopolymer triplet frequencies are AAA = 0.02959325, CCC = 0.03057397, GGG = 0.00224331 and TTT = 0.01508532 (total 0.07749586) and the observed frequency is 0.0887199.
Here, frequencies of motifs corresponding to the universal circular code X proposed for the protein-coding genes of prokaryotes and eukaryotes [48,50,51] (considering only the nucleotide triplets that are not separated by deletions) are lower in detected delRNAs than in the rest of the mitogenome not covered by delRNAs in 9 among 12 del-transformations, for both the whole cell and the purified mitochondrial line transcriptomic data (Table 2).
This tendency for the avoidance of the universal circular code in mitochondrial delRNAs has P = 0.0365 for each of the whole cell and mitochondrial transcriptomes (one-tailed sign tests using the binomial distribution, Table 2) and a combined P = 0.004 using Fisher's method for combining P values [72], based on a chi-square distribution with 2k degrees of freedom, where k here is the number of independent P values that are combined (not the deletion size). We also analysed frequencies of codons belonging to the proposed mitochondrial circular code X0(MIT) [58]. Among 12 del-transformations, 11 del-transformations in the whole cell and 9 del-transformations in the purified mitochondrial transcriptome (Table 3) had lower frequencies of X0(MIT) in detected delRNAs, with P values of 0.00158 and 0.0365, respectively (one-tailed sign test using the binomial distribution for each transcriptome dataset). Combining these P values by Fisher's method yields P = 0.00028.
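Fisher's method, as described above, sums the logarithms of m independent P values into a statistic X = -2 Σ ln(p_i) and compares it with a chi-square distribution with 2m degrees of freedom. Because the degrees of freedom are even, the upper tail has a closed form, which the sketch below uses to stay within the standard library. The function name `fisher_combined_p` is hypothetical, and the sketch does not attempt to reproduce the combined values reported in the text.

```python
from math import exp, factorial, log


def fisher_combined_p(p_values) -> float:
    """Combine independent P values with Fisher's method.

    X = -2 * sum(ln p_i) follows a chi-square distribution with 2m degrees of
    freedom under the null hypothesis; for even degrees of freedom its upper
    tail is exp(-X/2) * sum_{i<m} (X/2)**i / i!.
    """
    p_values = list(p_values)
    if not p_values or any(not 0 < p <= 1 for p in p_values):
        raise ValueError("P values must lie in (0, 1]")
    m = len(p_values)
    half_stat = -sum(log(p) for p in p_values)   # equals X / 2
    return exp(-half_stat) * sum(half_stat ** i / factorial(i) for i in range(m))


print(fisher_combined_p([0.5, 0.5]))
print(fisher_combined_p([0.04]))   # a single P value is returned unchanged
```

A library routine such as a chi-square survival function would give the same result; the closed form is used here only to avoid external dependencies.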
Frequencies of codons belonging to the universal circular code were also calculated for delRNAs detected in the human EST database (S11 File). Among 12 del-transformations, 6 del-transformations had higher percentages of universal X than the remaining del-transformed mitogenome (P = 0.3063, one-tailed sign test using the binomial distribution). Eight among 12 del-transformations had higher percentages of universal X in EST-delRNAs overlapping delRNAs detected in the whole cell transcriptome than in the remaining non-overlapping delRNAs detected in only a single database (P = 0.0969). Similarly, the percentage of codons belonging to the circular code in EST-delRNAs overlapping delRNAs from the mito-transcriptome is higher for 9 among 12 del-transformed versions (P = 0.0365, one-tailed sign test, binomial distribution). Combining these P values by Fisher's method yields P = 0.0126.
Results for the proposed mitochondrial circular code X0(MIT) are slightly stronger than for the universal circular code for delRNAs detected in the EST database. Among 12 del-transformations, 7 del-transformations in the EST database, 8 del-transformations for EST-delRNAs overlapping delRNAs detected in the whole cell transcriptome and 8 del-transformations for EST-delRNAs overlapping delRNAs detected in the mito-transcriptome had higher percentages of X0(MIT) (S12 File), with P = 0.0969 according to one-tailed sign tests using the binomial distribution. Fisher's method to combine P values yields combined P = 0.011. Positive results were stronger for avoidance of the proposed mitochondrial circular code in delRNAs as compared to the universal circular code X identified for prokaryote and eukaryote protein-coding genes (Tables 2 and 3). These results cautiously support that a different circular code exists in mitochondria [58]. We have no explanation for the higher frequency of X (Table 2, Table 3, S11 File, S12 File) in delRNAs detected in the mitochondrial transcriptome for high k (delRNA 3-10, delRNA 3-11 and delRNA 3-12).
Table 2. Frequencies of codons belonging to the universal circular code X {AAC, AAT, ACC, ATC, ATT, CAG, CTC, CTG, GAA, GAC, GAG, GAT, GCC, GGC, GGT, GTA, GTC, GTT, TAC, TTC}.

General discussion and future prospects
In this study we tested a phenomenon of systematic nucleotide deletion during transcription in three independent datasets. Results support the working hypothesis that transcription sometimes systematically deletes nucleotides. Analyses detect delRNAs corresponding to systematic deletion of 1 to 12 nucleotides after every transcribed trinucleotide in the human mitochondrial transcriptome, in three independent transcriptome datasets, two sequenced by NGS methodologies and one by Sanger methodology. Numerous delRNAs were detected in all three datasets, indicating high reproducibility of the human mitochondrial del-transcriptome. DelRNAs detected in this study are around 25-30 bp long. A BLASTN search with the delRNA 3-1 transformed mitogenome aligned with much longer ESTs that have a systematic single gap in the subject sequence and continuous alignment at the termini of the subject sequence (not shown). We believe these delRNAs could be part of much longer chimeric RNAs produced when the RNA polymerase switches from canonical transcription to non-canonical del-transcription.

Table 2. https://doi.org/10.1371/journal.pone.0217356.t003

Homopolymers AAA, CCC, GGG and TTT, which cause polymerase slippage and transcriptional frameshift [46], have higher frequencies in detected delRNAs than in the rest of the mitogenome. This suggests that these nucleotide triplets signal del-transcription. Their function seems opposite to that of codons belonging to the circular codes, which apparently prevent del-transcription. DelRNAs might result from systematic deletions occurring during transcription, but a second possibility is that RNA editing by an elusive mitochondrial spliceosome also produces delRNAs, resembling observations from eukaryotic cytosols [73]. Circular code codons (X) enable recognition of the coding frame [55,67,68] and of the transcriptional frame [47]. Higher circular code frequencies in delRNAs of the highest del-versions (k = 10 to 12) might indicate a role of circular codes in post-transcriptional editing by the mitochondrial spliceosome.
DelRNAs detected in the purified mitochondrial transcriptome have particularly high homopolymer frequencies (Table 2). Notably, detected delRNAs are poor in codons belonging to the hypothetical mitochondrial circular code, which strengthens the still weak evidence for a mitochondrial circular code distinct from the universal circular code. This observation adds to the discussion about mitochondria representing an independent branch of the tree of life [74], perhaps together with giant viruses [75].
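The kind of frequency comparison discussed here can be sketched in Python as the fraction of triplets in a sequence that belong to a given set, shown for the four homopolymer triplets. The function name is hypothetical, and reading the sequence as consecutive non-overlapping triplets in a fixed frame is an assumption made for illustration; the counting procedure actually used for the tables may differ.

```python
HOMOPOLYMER_TRIPLETS = frozenset({"AAA", "CCC", "GGG", "TTT"})


def triplet_fraction(seq, triplets=HOMOPOLYMER_TRIPLETS, frame=0):
    """Fraction of non-overlapping triplets of `seq` that belong to `triplets`.

    Assumption: triplets are read consecutively starting at offset `frame`;
    an incomplete trailing triplet is ignored.
    """
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    if not codons:
        return 0.0
    return sum(codon in triplets for codon in codons) / len(codons)


if __name__ == "__main__":
    # Toy sequence, not taken from the mitogenome.
    print(triplet_fraction("AAACGTTTT"))
```

Passing a different set as `triplets`, for example the codons of a circular code, gives the corresponding percentage for that set.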
Observations here support the hypothesis that mitochondrial genomes have greater coding potential than believed, which is also supported by the observation of two lncRNAs in human mitochondrial DNA [76]. An association between detected delRNAs (k = 1 and 2) and peptides matching human proteome data [43] suggests translation of delRNAs. We recommend similar mass spectrometry analyses of the human proteome for peptides translated from delRNAs with k = 3 to 12. Noncoding RNAs associated with ribosomes are translated into peptides [77], suggesting possible dual roles of delRNAs. Further in-depth analyses of delRNAs may help to explain various missing/hidden genes called "Dark DNA", whose translational products are detected in mass spectrometry data [12].

1. Systematic deletion lengths (k = 1 to 12 deleted nucleotides); 2. 'n': nucleotides deleted before del-transcription starts; 3-5. Numbers of delRNAs detected, with their coverage on the del-transformed mitogenome and average lengths for each del-transformation in whole cell transcriptome data (SRX768406-SRX768476); 6-8. Numbers of delRNAs detected, coverage on the del-transformed mitogenome and their average lengths in purified mitochondrial transcriptome data (SRX084350-SRX084355 and SRX087285); 9 and 10. Numbers of delRNAs, with their average lengths, detected in both whole cell and purified mitochondrial transcriptomes and covering the same mitogenomic regions; and 11. Numbers of delRNAs covering the same mitogenomic region expected by chance.
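The del-transformation parameterised by k and n can be sketched in Python as follows: n nucleotides are skipped first, then three nucleotides are kept and k are deleted, repeatedly. The function name is hypothetical, and treating the sequence as linear and keeping an incomplete trailing triplet are assumptions of this sketch, not details confirmed by the analyses.

```python
def del_transform(seq, k, n=0):
    """Return the del-transformed version of `seq`.

    After skipping the first `n` nucleotides, keep 3 nucleotides, delete
    the next `k`, and repeat to the end of the sequence.
    Assumptions: the sequence is treated as linear and an incomplete
    trailing triplet is kept.
    """
    if k < 0 or n < 0:
        raise ValueError("k and n must be non-negative")
    step = 3 + k
    return "".join(seq[i:i + 3] for i in range(n, len(seq), step))


if __name__ == "__main__":
    # Letters stand in for nucleotides so the deleted positions are visible.
    for k in (1, 2):
        print(k, del_transform("ABCDEFGHIJKL", k))
```

Running the transformation for k = 1 to 12, and for each starting offset n, would give the set of del-transformed sequences against which reads can then be aligned.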