Evolution and structural variations in chloroplast tRNAs in gymnosperms

Chloroplast transfer RNAs (tRNAs) can participate in various vital processes. Gymnosperms have important ecological and economic value, and they are the dominant species in forest ecosystems in the Northern Hemisphere. However, the evolution and structural changes in chloroplast tRNAs in gymnosperms remain largely unclear. In this study, we determined the nucleotide evolution, phylogenetic relationships, and structural variations in 1779 chloroplast tRNAs in gymnosperms. The numbers and types of tRNA genes present in the chloroplast genomes of different gymnosperms did not differ greatly, where the average number of tRNAs was 33 and the frequencies of occurrence for various types of tRNAs were generally consistent. Nearly half of the anticodons were absent. Molecular sequence variation analysis identified the conserved secondary structures of tRNAs. About a quarter of the tRNA genes were found to contain precoded 3′ CCA tails. A few tRNAs have undergone novel structural changes that are closely related to their minimum free energy, and these structural changes affect the stability of the tRNAs. Phylogenetic analysis showed that tRNAs have evolved from multiple common ancestors. The transition rate was higher than the transversion rate in gymnosperm chloroplast tRNAs. More loss events than duplication events have occurred in gymnosperm chloroplast tRNAs during their evolutionary process. These findings provide novel insights into the molecular evolution and biological characteristics of chloroplast tRNAs in gymnosperms.


Background
Gymnosperms comprise a large group of seed plants with a widespread distribution around the world. Gymnosperms are the dominant species that form forest ecosystems in the Northern Hemisphere, which constitute 39% of the world's forests, and they have great ecological and economic significance [1]. According to the Christenhusz gymnosperms system, the extant gymnosperms are divided into 12 families, 86 genera, and about 1063 species [2]. Conifers are the most abundant group of existing gymnosperms, and they occupy a similar niche to that in the early stages of their evolution because they have strong drought resistance [3]. The genetic relationships between gymnosperms and angiosperms mean that their phylogenetic status is important [4]. Furthermore, gymnosperms have a long and extensive fossil record that dates back to the Carboniferous (c. 290 million years ago (Mya)). The five main lineages of gymnosperms (cycads, Ginkgos, cupressophytes, Pinaceae, and gnetophytes) separated from each other during the Late Carboniferous to the Late Triassic (311-212 Mya) [5].
Transfer RNA (tRNA) is one of the most ancestral types of RNA and tRNAs are ubiquitous in all living organisms from prokaryotes to eukaryotes [6]. tRNAs comprise a class of microRNAs that carry and transport amino acids, and they play central roles as the links between mRNA and protein. During protein translation, a tRNA pairs its anticodon with a codon on mRNA and carries specific amino acids to ribosome sites to mediate protein biosynthesis [7,8]. tRNAs are multifunctional molecules that are involved in multiple metabolic processes in cells in addition to their translation function, e.g., aminoacyl-tRNA is a biosynthetic precursor and amino acid donor for other macromolecules [9]. Each tRNA can carry only one amino acid, but one amino acid can be carried by multiple tRNAs called isoreceptor tRNAs [10]. In 1965, the first tRNA comprising tRNA Ala in yeast was sequenced to determine its primary structure [11]. The secondary structures of tRNAs are mostly conserved and clover-shaped, where they have an amino acid receiving arm, D-arm, anticodon arm, D-loop (a loop coupled to the D-arm), anticodon loop (a loop coupled to the anticodon arm), and a TΨC loop (a loop coupled to the TΨC arm) [12]. The nucleotide sequence of a tRNA is hydrogen bonded to form a clover-shaped secondary structure, which then folds into an inverted L-type tertiary structure [13].
Chloroplasts are multi-copy organelles in plant cells that are responsible for photosynthesis and carbohydrate metabolism [14]. Chloroplasts play vital roles in the growth and development of plants, including the synthesis of nucleotides, amino acids, fatty acids, vitamins, phytohormones, and several other metabolites [15][16][17]. The chloroplast genome is a highly conserved, doublestranded circular molecule containing genes that encode tRNAs, rRNAs, and many proteins [18,19]. The semiautonomous and complete expression system of the plant plastid genome makes it a good material for evolutionary and genomics research [20,21]. In addition, tRNAs act as a bridge in the gene expression process. Therefore, analyzing the tRNA genes in chloroplasts can provide a theoretical basis to facilitate further studies of the structure, function, and evolutionary relationship of tRNAs.
Previous studies have investigated the evolution and structure of tRNAs in several gymnosperms, Adoxaceae plants, and Oryza sativa [22][23][24]. In the present study, we selected 54 species belonging to 54 different genera in the gymnosperm phyla and systematically analyzed their chloroplast tRNAs. We extracted and re-annotated tRNA genes in the chloroplast genome of each species to determine the differences in the composition, conservation, and structural changes in chloroplast tRNAs in different plants, as well as the evolutionary relationships and main events that affected tRNAs during their evolutionary process. In addition, the relationships between the structure of gymnosperm chloroplast tRNAs and their minimum free energy were studied for the first time. This main aims of this study were to understand: (1) the distributions and conservation of different types of tRNAs in gymnosperm chloroplasts; (2) why certain tRNAs always contain precoded 3′ CCA tails; (3) how the minimum free energy affects the stability of the secondary structure of tRNAs; and (4) the main types of events that have occurred in gymnosperm chloroplast tRNAs during their evolutionary history.

Chloroplast tRNA gene compositions in gymnosperms
In the chloroplast genomes of the 54 gymnosperms considered in this study (Table S1), 1779 tRNA genes were annotated that encoded 20 essential amino acids. The chloroplast tRNA gene contents of the plants were relatively uniform [8]. The average number of chloroplast tRNA genes in each species was approximately 33. Callitris rhomboidea, Dacrycarpus imbricatus, and Pseudotaxus chienii encoded only 27 tRNAs, and Gnetum parvifolium and Macrozamia mountperriensis encoded up to 39 tRNAs (Fig. 1).
Almost every tRNA was encoded in the chloroplast genome of each species, but some tRNAs were not encoded in some species (Fig. 1). In particular, tRNA Ala was found to be missing in eight species, tRNA Thr , tRNA Glu , tRNA Phe , and tRNA Leu were missing in one species, tRNA Val was missing in two species, tRNA Lys was missing in 14 species, and tRNA Gln was missing in five species. More tRNA Ser , tRNA Arg , and tRNA Leu genes were present in the chloroplast genomes of all species. tRNA Ser appeared three times in most species, two or four times in some species, and six times in Nothotsuga longibracteata. tRNA Arg and tRNA Leu generally appeared 2-3 times in many species, but tRNA Leu appeared six times in Ephedra equisetina. tRNA Gly , tRNA Pro , and tRNA Thr were the next most abundant tRNA genes and they occurred twice in most species. However, suppressor tRNA and selenocysteine were completely absent from the chloroplast genomes of the 54 gymnosperms, as also found in Adoxaceae [23] and monocot plants [24].
The lengths of the gymnosperm chloroplast tRNAs ranged from 56 to 90 nucleotides, and the average length was about 82 nucleotides. tRNA Gly (UCC) in Cunninghamia lanceolata was the smallest gene detected and it only contained 56 nucleotides. The sequences of tRNA-Leu , tRNA Ser , and tRNA Tyr all contained more than 80 nucleotides. A few tRNA Ser genes contained 90 nucleotides, and tRNA Gly (UCC) in Sequoia sempervirens was also 90 nucleotides in length. The lengths of the other tRNAs were all about 73 nucleotides, but a few were shorter than 70 nucleotides.

Conservation of gymnosperm chloroplast tRNAs
Different tRNAs can transport different amino acids according to their nucleotide compositions and structures. The tRNA sequences were analyzed to identify their conserved regions (Table 2). Comparative analysis of the nucleotide compositions in the tRNA loops and arms detected conserved nucleotides or nucleotide sequences in multiple positions. In particular, these analyses showed that at the first position in the acceptor arm, tRNA Ala (UGC), tRNA Gly (GCC and UCC), tRNA Thr (GGU), tRNA Ser (GCU, GGA, and UGA), tRNA Arg (ACG and CCG), tRNA Leu (CAA, GAG, UAA, and UAG), tRNA Lys (UUU), tRNA Phe (GAA), tRNA Asp (GUC), tRNA Glu (UUC), tRNA His (GUG), tRNA Ile (CAU, GAU), tRNA Tyr (GUA), and tRNA Cys (GCA) contained a conserved 5′ G nucleotide, whereas tRNA Pro (UGG), tRNA Met (CAU), and tRNA Val (GAC and UAC) contain a conserved A nucleotide, and tRNA Asn (GUU) and tRNA Gln (UUG) contained a U nucleotide. However, the nucleotide in the first position in the acceptor arm was not highly conserved in tRNA Trp (CCA), tRNA fMet (CAU), and tRNA Thr (UGU). The G nucleotide content was higher in the region of the acceptor arm. tRNA Ser (GCU and UGA) had conserved G-G-A-G-A-G-A nucleotide sequences in the acceptor arm. In the first position in the D-arm, tRNA Val (GAC) and tRNA Lys (UUU) contained a conserved A nucleotide, tRNA Tyr (GUA) contained a conserved C nucleotide, and tRNA Pro (GGG) and tRNA Thr (GGU) contained an A or G nucleotide, whereas tRNA Met (CAU) contained no conserved nucleotides in this position, and all of the other tRNAs contained a conserved G nucleotide. In addition, tRNA Ala (UGC), tRNA Thr (UGU), tRNA Val (UAC), tRNA Arg (ACG and CCG), tRNA Leu (GAG), tRNA Phe (GAA), tRNA Asn (GUU), and tRNA Ile (GAU) contained a conserved G-C-U-C nucleotide sequence, and tRNA Cys (GCA), tRNA His (GUG), and tRNA Gln (UUG) contained a conserved G-C-C nucleotide sequence. The D-loop was found to contain a conserved A nucleotide in the first position, except in tRNA Gly (GCC), tRNA Ser (GCU and GGA), tRNA Leu (UAA), and tRNA Ile (CAU). The last position in the D-loop comprised a highly conserved A nucleotide, except in tRNA Gly (GCC). The degree of conservation was lower in the anticodon arms with no conserved nucleotides in any position ( Table 2). The second position in the anticodon loop was a conserved T nucleotide. The last position in the anticodon loop was generally a conserved A nucleotide. In addition, it should be noted uracil and adenine were strongly preferred in the anticodon loop. Moreover, the conservation of nucleotides was very low in the variable region because of its structural variability, although many tRNAs still possessed a conserved C nucleotide in the last position in the variable region. The Ψ-arm and Ψ-loop were the most highly conserved regions in terms of both the nucleotide number and nucleotide composition. The Ψarms all contained five nucleotides in the last two positions in this region, and they were mostly G nucleotides in the tRNAs. The Ψ-loops all contained seven nucleotides with a highly conserved U-U-C sequence and most tRNAs had a conserved U nucleotide in the last position. The presence of an intact CCA sequence is a basic prerequisite for the participation of tRNAs in the mRNA decoding process [26]. The 3′ terminal regions of  eukaryotic tRNAs generally lack a CCA sequence, and thus adding a 3′ CCA tail is an important step in tRNA biosynthesis. In the gymnosperms investigated in the present study, tRNA Ala , tRNA Arg , tRNA Glu , tRNA Leu , tRNA Tyr , and tRNA Lys were found to contain a 3′ CCA tail (Fig. 2), but most tRNAs did not have 3′ CCA tails.

Nucleotide variations in tRNA arms and loops
The number of nucleotides was also conserved in the loop arm of each tRNA. In the 1779 tRNAs considered in this study, the number of nucleotides in the acceptor arm ranged from 0 to 8 (  contained four, and the remaining tRNAs contained five nucleotides. All tRNAs possessed seven nucleotides in the Ψ-loop.

Four types of structural changes in tRNAs
The general structure of a tRNA is characterized by an amino acid receiving arm, D-arm, D-loop, anticodon arm, anticodon loop, variable region loop, TΨC arm, and TΨC loop. However, some novel tRNA structures were found in the present study, which were assigned to the following four types ( , and tRNA Tyr (GUA) had the same structure in all species, with extra loops and arms. tRNA Leu also possessed a UAG anticodon, but the variable region did not have this structure. The only two tRNAs with the type 4 structure were tRNA Tyr (GUA) and tRNA Ser (UGA). We calculated the minimum free energy (ΔG) for the novel tRNA and some normal tRNA structures ( Table  4). The result showed that the average minimum free energy was − 12.6 kcal/mol for tRNAs with the type 1 structure, which was much higher than the normal tRNAs (ΔG = − 26.5 kcal/mol). Therefore, the absence of the acceptor arm generally reduced the stability of the tRNA structure. The minimum free energy was around − 19.3 kcal/mol for the tRNAs with the type 2 structure. tRNA Gly (GCC) in Sequoia sempervirens had the lowest minimum free energy (ΔG = − 28.3 kcal/mol) among those with the type 2 structure, and thus the presence of extra nucleotides at the 3′ end greatly improved the stability of the structure. By contrast, tRNA Met (CAU) in Cephalotaxus oliveri had the highest minimum free energy (ΔG = − 11.8 kcal/mol), and thus its stability was greatly reduced due to the presence of atypical nucleotides at the 3′ end. The average minimum free energy was − 33.2 kcal/mol in tRNAs with the type 3 structure. The minimum free energy values determined for these tRNAs were generally below − 30.0 kcal/mol. The values were always very low for tRNA Tyr (GUA). Therefore, the loops and arms in the variable region acted together with the structures in other regions to create an extremely stable tRNA structure. However, compared with other tRNAs with the type 3 structure, tRNA Leu (CAA) was remarkable because of its higher minimum free energy value of around − 26.1 kcal/mol, which was much greater than the average for tRNAs with the type 3 structure (ΔG = − 32.8 kcal/mol) and close to that for tRNAs with the normal structure (ΔG = − 26.5 kcal/mol).
Thus, the structural changes in the variable region of tRNA Leu (CAA) had no obvious effects. The average minimum free energy was − 28.3 kcal/mol for tRNAs with the type 4 structure and the values were quite different for each of these tRNAs, where some were above the average value and some were below. Therefore, multiple influences may have been involved when the structure changed at the 3′ end and in the variable region. Moreover, considering the average minimum free energy value for the tRNAs with normal structures (ΔG = − 26.5 kcal/mol) as a reference, the values for those with type 1 and type 2 structures were much higher, but lower for those with the type 3 structure. Thus, changes in the structures of the tRNAs affected their stability.

Gymnosperm tRNAs evolved from multiple common ancestors
In this study, the consensus coding sequences (CDSs) in the complete chloroplast genomes of 54 gymnosperms and the chloroplast genome of Alsophila spinulosa as an outgroup were used to construct a phylogenetic tree (Fig. S1). The result showed that species from the same family clustered on the same branch, which is consistent with the Christenhusz gymnosperms system [2] and previous studies [27]. In addition, a phylogenetic tree was constructed used the maximum likelihood method to assess the evolutionary relationships among all of the gymnosperm tRNAs (see Fig. 4 and Fig. S2, where the numbers on the branches of the evolutionary tree represent the bootstrap values). The phylogenetic tree contained two large clusters and 32 small groups. Cluster I contained 28 groups and it was much larger than cluster II with four. Not every type of anticodon was present in a group and the anticodons that occurred less frequently were often present on the same branch as other anticodons. For example, tRNA Lys (CUU) appeared only once in Cunninghamia lanceolata and it grouped together with tRNA Asn (GUU). In addition, tRNA Ile (UAU) appeared only once in Taxus baccata and it grouped with tRNA Val (GAC). Similarly, tRNA Leu (GAG) appeared twice in Ephedra equisetina and it grouped on the branch with tRNA Ile (GAU). These findings were due to the high similarity among the tRNA sequences. The low values on most branches were due to the extremely high conservation of tRNAs, where there were very few differences among the sequences.
In the top clade in the phylogenetic tree, the branches with tRNA fMet (CAU) and tRNA Trp (CCA), tRNA Asn (GUU), tRNA Arg (ACG), and tRNA Arg (CCG) together indicated a stepwise evolutionary relationship. However, the other UCU anticodon of tRNA Arg did not appear with tRNA Arg (ACG) and tRNA Arg (CCG) in 55 tRNAs, and it co-occurred with another stepwise evolutionary relationship involving tRNA Glu (UUC), tRNA Gly (UCC), and tRNA Lys (UUU). The three tRNA Ser anticodons (GGA, UGA, and GCU) occurred simultaneously in 208 tRNAs and grouped together with tRNA Gln (UUG) on the same branch. These findings suggest that tRNA Gln and tRNA Ser belonged to a common evolutionary lineage. The three tRNA Leu anticodons (CAA, UAA, and UAG) occurred simultaneously in 158 tRNAs on one branch and they were at the bottom of the phylogenetic tree. The branches containing tRNA Leu and tRNA Pro (GGG) together formed the second cluster. Therefore, tRNA Pro and tRNA Leu had a close relationship and they were far from the first cluster of tRNA groups. Moreover, tRNA Thr (UGU), tRNA Val (UAC), and tRNA Ala (UGC) grouped together, thereby indicating their common evolutionary lineage. Similarly, the common evolutionary lineage of tRNA Met (CAU), tRNA Thr (GGU), and tRNA Val (GAC) was evident because they were present on the same branch, and the same applied to the branch containing tRNA Tyr (GUA), tRNA Pro (UGG), tRNA Cys (GCA), and tRNA His (GUG). The phylogenetic tree also showed that tRNA Phe (GAA) and tRNA Ile (GAU) were grouped separately, where they each occupied a small branch instead of grouping together with the other types of tRNAs.

Higher rate of transitions than transversions
A transition is a change from one purine to another purine (A to G or G to A) or one pyrimidine to another pyrimidine (C to U/T or U/T to C). A transversion is a change from one purine to a pyrimidine (A or G to U/T or C) or the opposite (U/T or C to A or G) [28]. Analyzing the patterns of base mutations can help to understand the molecular basis of evolution. Table 5 shows the transition and transversion rates for each tRNA as well as the overall levels in the gymnosperms investigated in the present study. tRNA Asp

Duplication and loss events in gymnosperm chloroplast tRNAs
After a gene duplication event, a copy of each replicated gene pair tends to undergo a loss event. Gene loss events occur frequently [29]. We calculated the duplication and loss events in the gymnosperm chloroplast tRNAs (Fig. 5 and Fig. S3) and found that 1333 genes were duplicated whereas 3657 genes were lost. In addition, 314 genes were affected by conditional duplication events. Loss events were far more frequent than duplication events, and most of the chloroplast tRNAs had been affected by loss events during the course of their evolution.

Distribution of tRNAs
Our analysis of gymnosperm chloroplast tRNAs showed that the tRNA genes were conserved in terms of both their quantity and composition. The number of tRNA genes in the chloroplast genome differed little between species and the frequency of each tRNA gene was basically the same, with only slight differences. Some tRNAs may have been lost occasionally in a few species, but tRNA genes in the nucleus or other organelles can replace the functions of these missing tRNA genes [30]. It has been shown that tRNA Ser , tRNA Arg , and tRNA Leu always occur at higher frequencies in the chloroplast genomes of gymnosperms. In addition, the lengths of tRNA Ser and tRNA Leu sequences can clearly be longer due to the different nucleotides in the variable region. A previous study also demonstrated that tRNA Leu has a large variable region [31]. The main function of the variable region in tRNAs has not been fully elucidated, but it has been shown that larger variable regions can increase the affinity of tRNA for ribosomes and stabilize the tRNA-ribosomal complex in various environments to enhance the interactions between tRNAs and ribosomes [32]. This may explain why these types of tRNAs are more commonly found in plant chloroplasts and their association with many biological processes. Suppressor tRNA is a mutated form of tRNA and it can read mRNA in a new manner and allow the insertion of appropriate amino acids at a mutation site in a protein-coding gene to suppress the phenotypic effect of a coding mutation, thereby affecting the production of functional cellular proteins [33][34][35][36]. Suppressor tRNA is not found in gymnosperm chloroplast genomes. In addition, selenocysteine inserting tRNAs are absent from the chloroplast genomes in gymnosperms, Adoxaceae, and monocot plants [23,24]. Selenocysteine is an atypical amino acid [37] and the 21st amino acid involved in the ribosome-mediated synthesis of proteins via a UGA codon. Selenocysteine is found in both prokaryotes and eukaryotes [38], but it is an oxygen-labile amino acid with a degree of toxicity [39]. In the present study, we found at least one tRNA Met and one tRNA fMet in each species, where both corresponded to the CAU anticodon. It is known that tRNA fMet is necessary for initiating the protein translation process in prokaryotes [40][41][42][43]. The initiator tRNA is always well conserved [44]. tRNA fMet (CAU) and tRNA Met (CAU) are both essential in plants [45]. Interestingly, we found that tRNA Met and tRNA Ile both contained the same CAU anticodon. The relationship between the identification and matching of codons is highly complex. Previous studies in bacteria [46][47][48] have shown that when the C nucleotide is modified in the CAU anticodon, tRNA Ile can recognize isoleucine, whereas the unmodified tRNA Ile with the CAU anticodon will interact with methionine. It has also been demonstrated that this change has the same effect in plant chloroplasts [45], which can be explained by the prokaryotic origin of the chloroplast.

Distribution of anticodons
The genetic code is based on 64 codons where 61 can encode amino acids and three are stop codons, but they usually do not all appear together. In the present study, only 34 types of anticodons were found in gymnosperm chloroplast tRNA genes, where some anticodons occurred in all species and some were only found occasionally in a few species. These 34 types of anticodons can fulfill the roles of all 61 anticodons and they are responsible for protein translation in the chloroplast. By contrast, 28 anticodons are found in the chloroplast genomes of Adoxaceae species [23] and 28 anticodons in monocot plants [24]. The degeneracy of the genetic code is explained by the "wobble hypothesis" where the first and second bases in a codon pair strongly with the anticodon but the third base can form a non-Watson-Crick     base pair with the anticodon [49]. Thus, some types of tRNAs can correspond to multiple anticodon types and one amino acid can be carried by multiple tRNAs. Substitutions in protein-coding genes are usually distributed according to the codon structure and substitutions often occur at the third position in the codon. Moreover, multiple anticodons corresponding to one tRNA have the same "evolutionary potential" [25,50]. In addition, organisms can differ in terms of their codon usage preferences. The use of synonymous codons is non-random and it is mainly determined by specific preferences in the translation process [51]. There is a strong correlation between codon usage and the tRNA content, and the codon selection pattern tends to be highly conserved in the evolutionary process. Genes with high expression levels often have codons that correspond to more abundant tRNA types [52][53][54], and thus the gene expression levels are strongly related to codon usage preferences [55,56]. According to the results in Fig.  1 and Table 1, the overall frequencies of the codons contained in tRNA Ser , tRNA Arg , and tRNA Leu were higher. Codon usage selectivity occurs in organisms because the use of common codons in highly abundant tRNAs can greatly reduce the risk of depleting the translation mechanism [57].
Highly conserved secondary structure of tRNAs The secondary structure of tRNAs is shaped like a clover leaf, with an acceptor arm containing seven nucleotides, D-arm containing 3-4 nucleotides, D-loop containing 4-12 nucleotides, anticodon arm containing five nucleotides, anticodon loop containing three nucleotides, variable region containing 4-23 nucleotides, Ψ-loop containing five nucleotides, and Ψ-arm containing seven nucleotides [58,59]. However, we found that some of the chloroplast tRNAs had different secondary structures in gymnosperms and not all fully conformed to the traditional pattern. Moreover, the differences in the numbers of nucleotides in different tRNA regions were strongly related to the type of tRNA and they even varied according to the corresponding anticodon. For example, tRNA Gly (GCC) contained four nucleotides in the variable region but tRNA Gly (UCC) contained five nucleotides (Table 2). In addition to the number of nucleotides, the nucleotide compositions in different regions also varied. The sequences of tRNAs were found to be highly conserved with common nucleotides or sequences in almost every region, and their conservation was related to the type of tRNA. Alignment of the tRNA sequences showed that the Ψ-loop was the most highly conserved without any changes and the Ψ-arm was also extremely well conserved, where only a small part of the tRNA was mutated in this region. Similar results were found in a previous study of the conserved regions of chloroplast tRNAs in monocot plants [24]. The Ψ-loop contained a common sequence comprising U-U-C and it was previously reported that conserved bases in the Ψloop determine the stability of tRNAs in thermophilic bacteria [60]. The anticodon loops were also highly conserved where most contained seven nucleotides. The anticodon loop is the region that matches with the codon in mRNA, so high accuracy is required. The addition of a conserved C-C-A sequence at the 3′ end of tRNA is necessary for tRNA maturation, which is mediated by tRNA nucleotidyltransferase, and tRNAs can only carry amino acids when the CCA tail is present [61,62]. However, the addition of CCA tails does not always require the action of tRNA nucleotidyltransferase, and CCA tails are sometimes included in the tRNA gene templates in bacteria. It has been reported that the templated 3′ CCA sequence in bacteria is very common in the initial tRNA (tRNA fMet ) as well as in tRNA Tyr [63].
In the present study, we found that gymnosperm plant chloroplast tRNA genes for tRNA fMet and tRNA Tyr all carried an encoded 3′ CCA sequence in each species, which suggests that part of the prokaryotic translation mechanism was retained during chloroplast evolution. In addition, the main factor that affects protein synthesis is the initiation of translation [64,65], and thus 3′ CCA templating can greatly enhance the rate of protein expression because it accelerates the maturation of tRNA.

Phylogenetic relationships
The phylogenetic analysis of all tRNA genes showed that tRNA fMet (CAU) appeared twice in the phylogenetic tree, i.e., at the top of the tree grouped together with tRNA Trp (CCA), and in the lower part of the tree grouped together with tRNA Thr (GGU) and tRNA Val (GAC) (Fig. 4 and Fig. S2). These findings indicate that tRNA fMet (CAU) evolved from multiple common ancestors, and that tRNA fMet (CAU) has undergone more frequent duplication events during its evolution.

Effects of structural changes on the stability of tRNAs
The minimum free energy of a molecule is closely related to its structure and it ensures the thermodynamic stability of RNA [66]. The minimum free energy can be used to predict the secondary structure of RNA [67][68][69].
In thermophiles, the folding of tRNA undergoes adaptive changes to improve its stability because changes in the tertiary structure can affect the stability of tRNA [70]. In this study, we found several changes in the structure of tRNAs, which were roughly divided into four categories and clear patterns were identified in the corresponding minimum free energy values. Compared with the normal structure of tRNAs, these structural changes increased or decreased the minimum free energies of tRNAs. It has been reported that changes in the acceptor arm will increase the free energy of tRNA [71], which is consistent with the results obtained in the present study because the free energy was higher when the acceptor arm was lacking nucleotides or redundant nucleotides were present. tRNAs with large variable regions were not rare and the large variable regions have even evolved into conserved structures in some types of tRNAs, such as tRNA Leu . Thus, this type of structural change greatly reduced the free energy of tRNA to increase the stability of the structure. Figure 1 shows that the frequency of occurrence was relatively high for tRNA Leu , which may indicate that this type of structural change in tRNA Leu proved beneficial for its utilization by plants.

Evolution of substitution rate
Eight types of transversion and four types of transition are possible, and thus transversions should be more frequent from a probabilistic perspective, but our statistical results indicated a high transition rate, i.e., "transition bias" [72]. This bias can be explained by the fact that transitions have less effect on proteins than transversions [73]. In particular, conversion involves substitution with bases of the same type whereas inversion involves substitution with bases of a different type. The structural differences are small among the members within the separate purine and pyrimidine families, whereas the structural differences are large between purines and pyrimidines. Thus, transitions have less effect on the structure of proteins. In addition, the transversion rate was zero for tRNA Asp . One possibly because it has not undergone any transversions during its evolution. Another possibility is that the synthesis of this tRNA will be terminated if a transversion occurs in this gene, thereby resulting in an undetectable transversion rate. Overall, the chloroplast tRNAs in gymnosperms mainly underwent base transversion during their evolution.

Duplication and loss events during evolution
Gene duplication and loss events have occurred very frequently in plant genomes, and they have been important factors during their evolution [74][75][76]. The size of the chloroplast genome has decreased throughout evolutionary history and gene loss events continue to occur [77]. As shown in Fig. 1 and Table 1, some tRNAs were absent from certain species and nearly half of the anticodons were also absent. These results demonstrate that loss events mainly occurred during the evolution of chloroplast tRNA genes in gymnosperms.

Conclusions
This work provides a further explanation for the structural variations and evolution of the chloroplast tRNA in gymnosperms. We found that the chloroplast tRNAs in gymnosperms mainly underwent base transversion and loss events during their evolution. The precoded 3′ CCA sequences were found in some gymnosperm chloroplast tRNAs sequences, it suggested that part of the prokaryotic translation mechanism was retained. In addition, we speculated that the utilization of certain tRNA types in gymnosperms chloroplasts might be related to the patterns of tRNA minimum free energy.

Materials and methods
Acquisition of chloroplast tRNA genes and secondary structure analysis Chloroplast genomes for 54 gymnosperms (Table S1) were downloaded from the public database at the National Center for Biotechnology Information (NCBI, https:// www.ncbi.nlm.nih.gov/). Chloroplast genome annotation and tRNA gene extraction were conducted with Geneious [78]. All of the gymnosperm chloroplast tRNA gene sequences were uploaded to the tRNAscan-Se server to predict their secondary structure and to obtain other related results [79]. The free energies of tRNAs with structural changes were calculated using the RNAalifold web server with the default parameters.

Multiple sequence alignment
All tRNAs were classified according to their different types to identify the consensus sequence in each region. Similarly, the consensus sequence in each region was determined at the overall tRNA level. Multiple sequence alignment of tRNA genes was performed with the Multalin server [80]. All of the sequences were used for alignment analysis with the following parameters in FASTA format: sequence input format, auto; display of sequence alignment, colored; alignment matrix, Blo-sum61-12-2; gap penalty at opening and extension, default; gap penalty at extremities, none and one iteration only, none; highest consensus value, 90% (default); and low consensus value, 50% (default). In the displayed alignments, red indicates similarity/conservation of 90% or more, blue indicates sequence conservation less than 90%, and black indicates no conservation. The CDS sequences of chloroplast genomes in each species were obtained using Geneious and the consensus CDS sequences were then extracted. The consensus CDS sequences in the chloroplast genomes in gymnosperms and Alsophila spinulosa were aligned with the Linux version of MAFFT software [81].

Phylogenetic tree construction
A phylogenetic tree was constructed using MEGA7 software to identify the phylogenetic relationships among all of the tRNAs [82]. The model with the lowest Bayesian information criterion (BIC) score was selected as the best model for constructing a phylogenetic tree. Calculations using MEGA7 software showed that the K2 + G + I model had the lowest BIC score (50,455.017), and thus it was used to construct a phylogenetic tree based on gymnosperm chloroplast tRNAs. The other parameters used to construct the phylogenetic tree were: analysis, phylogeny reconstruction; statistical model, maximum likelihood; test of phylogeny, bootstrap method; no. of bootstrap replicates, 1000; substitution type, nucleotide; rates among sites, Gamma distributed with invariant sites (G + I); no. of discrete Gamma categories, 5; gaps/ missing data treatment, partial deletion; site coverage cutoff, 95%; and branch swap filter, very strong. The phylogenetic tree based on the consensus CDS sequences in the chloroplast genomes in gymnosperms and Alsophila spinulosa was constructed using RaxML via the CIPRES Science Gateway [83]. Node supports for the maximum likelihood analyses were estimated by performing 1000 bootstrap iterations.

Analysis of transitions and transversions
The tRNA gene sequences used to construct the phylogenetic tree were also employed to calculate the transition and transversion rates. All of the tRNA gene sequences were classified according to their different types, before calculating the transition and transversion rates. The same calculations were performed at the overall tRNA level. The calculations were performed using MEGA7 software [82]. The following parameters were used to calculate the transition and transversion rates: analysis, substitution pattern estimation (ML); tree to use, automatic (neighborjoining tree); statistical method, maximum likelihood; substitution type, nucleotide; model/method, Kimura2parameter model; rates among sites, Gamma distributed (G); no. of discrete Gamma categories, 5; gaps/missing data treatment, partial deletion, site coverage cutoff 95%; and branch swap filter, very strong.

Analysis of gene duplication and loss events
The phylogenetic trees based on the tRNA genes and species were reconciled in order to calculate duplication and loss events in tRNA genes. A species tree was constructed based on 54 gymnosperm species via the NCBI taxonomy server (https://www.ncbi.nlm.nih.gov/ Taxonomy/CommonTree/wwwcmt.cgi). The species tree and gene tree were reconciled using Notung 2.9 [84].