Skip to main content
Advertisement
Browse Subject Areas
?

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

Codon Adaptation of Plastid Genes

  • Haruo Suzuki,

    Current address: Institute for Advanced Biosciences, Keio University, Fujisawa, Japan

    Affiliation Graduate School of Science and Engineering, Yamaguchi University, Yamaguchi, Japan

  • Brian R. Morton

    bmorton@barnard.edu

    Affiliation Department of Biology, Barnard College, Columbia University, New York, New York, United States of America

Abstract

Codon adaptation is codon usage bias that results from selective pressure to increase the translation efficiency of a gene. Codon adaptation has been studied across a wide range of genomes and some early analyses of plastids have shown evidence for codon adaptation in a limited set of highly expressed plastid genes. Here we study codon usage bias across all fully sequenced plastid genomes which includes representatives of the Rhodophyta, Alveolata, Cryptophyta, Euglenozoa, Glaucocystophyceae, Rhizaria, Stramenopiles and numerous lineages within the Viridiplantae, including Chlorophyta and Embryophyta. We show evidence that codon adaptation occurs in all genomes except for two, Theileria parva and Heicosporidium sp., both of which have highly reduced gene contents and no photosynthesis genes. We also show evidence that selection for codon adaptation increases the representation of the same set of codons, which we refer to as the adaptive codons, across this wide range of taxa, which is probably due to common features descended from the initial endosymbiont. We use various measures to estimate the relative strength of selection in the different lineages and show that it appears to be fairly strong in certain Stramenopiles and Chlorophyta lineages but relatively weak in many members of the Rhodophyta, Euglenozoa and Embryophyta. Given these results we propose that codon adaptation in plastids is widespread and displays the same general features as adaptation in eubacterial genomes.

Introduction

Codon usage bias, or the non-uniform representation of synonymous codons within a coding sequence, is a universal feature of genomes that arises from a combination of an underlying mutational bias and natural selection [13]. When considering codon usage bias a distinction must be made between the pattern, or ‘direction’, of codon bias–that is, the specific set of codons that are over-represented–and the degree of codon bias. Two genes can have the same degree of bias, measured as deviation from uniform representation of synonymous codons, but be biased towards a different set of codons and since mutation and selection can vary across a genome, genes within a genome can vary in both the degree and pattern of codon usage bias.

Mutation biases that shape genomic G+C content typically result in different genes within a genome displaying variation only in the degree of codon bias, not in the pattern of codon bias. In general, the underlying mutational process of a genome is biased either towards A+T or towards G+C and these two situations will lead to a pattern of codon usage bias in which the NNA and NNT codons (those with A or T at the third codon position) of all synonymous codon groups are over-represented or under-represented respectively. The degree to which this bias is observed in any individual gene within a genome can vary across the genome depending upon the variation in mutation bias across chromosome loci [4].

The contribution of natural selection to the codon usage of a gene can take two forms [5]. First, there can be selection at individual nucleotide sites that is independent of any protein-coding function of that site. We will consider this to be general background selection that can be folded into the mutational bias such that we can simply refer to the contribution of the substitution bias in a genome to codon usage. The second possible contribution of selection is dependent on the amino acid coding function of codons. In some genomes there is evidence that selection acts to increase the translation efficiency of certain genes by favoring a set of codons that are optimal for this process, with the implication that there are fitness differences between synonymous codons [2,6,7]. This action of selection is commonly thought to increase the representation of codons that yield the best trade-off between more rapid and more accurate translation by the available tRNA population in the cell [3,6,8,9], which could involve a co-evolution of codon usage and tRNA levels [10].

This second role for selection results in codon adaptation, which we define as an adaptation of the codon usage of a gene towards an increased representation of the codons that increase translation efficiency. These codons are referred to as adaptive codons. Codon adaptation has been observed in many organisms, particularly unicellular organisms [2,6]. The strength of this sort of selection varies amongst genes within a genome as a function of expression level with selection acting most strongly on highly expressed genes [2]. The result, in genomes where there is codon adaptation, is variation amongst genes in the pattern of codon usage as well as in the degree of bias towards the adaptive codons. In such genomes a large majority of genes show a codon usage pattern that can be largely attributed to the underlying substitution bias while a smaller number of highly expressed genes show a pattern of codon usage with an increased representation of adaptive codons. If codon adaptation and the substitution bias converge on the same codon usage pattern then adaptation can be more difficult to detect, but the two codon usage patterns are often distinguishable.

Given the influences of substitution bias and selection, or the general difference between degree and pattern of codon bias, we need to distinguish between codon adaptation and strict codon bias. Codon adaptation in the highly expressed, or highly translated, genes under strong selection will be manifest in a strong bias towards adaptive codons. If we consider a synonymous group with two codons, Cs and Ca, where Ca is the adaptive codon, if the substitution bias is such that it would lead to Cs >> Ca then selection can result in Ca >> Cs in highly translated genes. However, genes under weaker selection can have levels of codon adaptation in which the action of selection somewhat offsets the action of the substitution bias such that Ca ≅ Cs. In these cases there is codon adaptation since selection has led to an increased frequency of Ca but there is very little absolute codon bias. To account for the pattern of codon bias we can employ measures of codon usage, such as the Codon Adaptation Index (CAI, [11]), that measure the degree of bias towards a specified set of adaptive codons (such as Ca, which must be determined separately, typically from codon usage in highly expressed genes) as opposed to statistics, such as the Effective Number of Codons (ENC, [12]), which measure only the degree of deviation from uniform codon usage regardless of which codons are over-represented. Ca >> Cs and Cs >> Ca both have high codon bias (low ENC) but the latter will have a low CAI. Cs >> Ca will have a higher codon bias (lower ENC) than Ca ≅ Cs but the latter will have a higher CAI.

In this study we investigate codon adaptation across a wide taxonomic range of sequenced plastid genomes. The plastid genome is a small genome encoding a limited set of genes that are fully expressed within the organelle. Evidence indicates that plastids are descended from a single cyanobacteria-like endosymbiont with the green plants, red algae and Glaucophytes retaining the descendant of this primary endosymbiont and at least two subsequent secondary endosymbiotic events giving rise to plastids in other lineages such as the Euglenozoa, Alveolata, Stramenopiles, Cryptophyta and Haptophyceae [13,14]. Among the 601 of the 605 completely sequenced and annotated plastid genomes at National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov) as of September 2014, genome size ranged from 29,623 base pairs (bp) to 521,168 bp, with a median of 152,968 bp, and the protein-coding sequence (CDS) number ranged from 21 to 273 with a median of 84. (The other four of the 605 sequenced plastid genomes had no annotated CDS, which may indicate incomplete annotation and so they are excluded from this analysis.) Across different lineages there is a general substitution bias towards A and T with the sequenced genomes showing a %G+C content ranging from 13.7 to 57.7 with a median of 37.2; of the sequenced genomes only 3 have a %G+C greater than 50%. The general pattern of codon usage in plastid genes reflects this substitution bias with a high representation of NNA and NNT codons [15]. However, the substitution process in flowering plant chloroplasts is known to be strongly context-dependent in that the mutational dynamics of any given nucleotide are influenced by the composition of nucleotides flanking that site [1618]. The result of this is that, despite a general bias towards NNA and NNT codons, the exact pattern of codon usage bias across synonymous codon groups is somewhat more complex in flowering plant chloroplasts [15,19]. Since it is not known if similar context effects exist in other plastids, the general bias towards NNA and NNT codons may mask similar complexity that we cannot consider in this study.

We perform an analysis of plastid gene codon usage in these plastid genomes with a focus on codon adaptation. Given that seed plant chloroplasts make up the vast majority of the sequenced genomes we limited their representation to a single dicot (Nicotiana tabacum), a single monocot (Oryza sativa) and a single gymnosperm (Pinus thunbergii) leaving a total of 103 genomes. We apply a uniform approach to all of the genomes and show evidence that highly translated genes, particularly psbA and rbcL, display codon adaptation in most plastid genomes and that selection favors the same, or a very similar, adaptive pattern of codon usage across all of the lineages. We also attempt to assess the strength of selection for codon adaptation, either in terms of the degree of bias towards the adaptive pattern or in the number of genes showing evidence for some degree of codon adaptation, across the plastid genomes. In general we find that Rhodophyta, with the exception of the Bangiales, and seed plants (represented in our study by Nicotiana tabacum, Oryza sativa, and Pinus thunbergii) have relatively low, but detectable, levels of codon adaptation while Stramenopiles and Chlorophyta in particular have relatively strong codon adaptation.

The results lead us to propose that all plastid codon usage can be largely explained by a single general model of substitution bias towards A+T and codon adaptation to the same set of codons in all plastids. This adaptation is based on the limited plastid tRNA population. The degree of adaptation towards these favored codons varies across genomes and across genes as a function of some aspect of gene expression, most likely translation efficiency. Although there are likely to be additional factors that make minor contributions to the variation in codon usage amongst genes within any specific genome, our data indicate that codon adaptation is a common feature of plastid genomes.

Materials and Methods

Software and Databases

Most analyses described below were implemented on the G-language Genome Analysis Environment version 1.9.0 [2022], available at http://www.g-language.org. The exception is the codon resampling test, which was performed using a Java Package written by BRM. Statistical computing and graph drawing were conducted with R version 3.1.2 [23], available at http://www.R-project.org.

A list of the 103 plastid genomes along with taxanomic information is given in S1 Table. All genome sequences were taken from NCBI FTP Site (ftp://ftp.ncbi.nlm.nih.gov/genomes/) in September 2014. tRNA genes encoded in the 38 plastid genomes were retrieved from the tRNA Gene DataBase Curated by Experts "tRNADB-CE" [24], available at http://trna.ie.niigata-u.ac.jp/.

Codon fitness values

Codon fitness values, also called the relative adaptiveness of codons [11], are typically calculated from the codon usage of highly expressed genes. Based on the fact that product of the psbA gene is the major translation product in chloroplast [25] we used the psbA gene from a small set of taxa to estimate codon fitness. For each codon, fitness is calculated by dividing the usage of that codon by the maximum usage within the synonymous group. A codon with no representation in the reference group is assigned a fitness value of 0.002 as in [11]. For any gene, or cumulative set of genes, the geometric mean of the codon fitness values of the codon usage table is the Codon Adaptation Index (CAI). The use of high expression genes to define fitness values results in a CAI that is a measure of adaptation for expression, more specifically translation efficiency [11] but more generally, CAI as an information statistic can be used to measure the degree of fit to any defined codon usage pattern. The codon fitness values used in this study are listed in S2 Table. The number of high expression genes used to estimate codon fitness values was limited so that the CAI value of a gene would not generally utilize codon fitness values derived from that same gene or from genes from the same genome. However, altering the set of high expression genes used to infer codon fitness values did not substantively alter any of the results (data not shown).

Codon resampling test

We applied a resampling method to each genome to test for genes with codon adaptation levels significantly above what would be generated by the genome substitution bias. The null hypothesis is that all genes within the genome have the same level of codon adaptation; this would be the level of adaptation arising from the genome substitution bias. Therefore, if variation in substitution bias is minimal across the small genome, then the codon usage in each gene would represent a random sample from the same codon distribution. To test this hypothesis the codons from every protein-coding sequence from a given genome were pooled. For every gene we generated a random codon usage by drawing with replacement from this pool until the sampled codons had the same amino acid usage as the gene and calculated CAI for this random codon usage using the codon fitness values described above. This resampling was repeated 1000 times for each gene to yield an expected distribution of CAI for that gene under the null hypothesis. A gene was rejected if the observed CAI was more than 3 standard deviations greater than expected.

Once we had run the resampling on every gene within a genome we repeated the test leaving out all genes rejected in the previous round. This was repeated until no genes were rejected in a round indicating that the level of codon adaptation in each of the remaining genes could be explained by the same codon usage. All genes rejected in this resampling were considered to have evidence for significant codon adaptation. The protocol was performed using a Java Package written by BRM.

Hierarchical clustering of genes based on codon usage

Dendrograms were constructed using hierarchical clustering (Unweighted Pair Group Method with Arithmetic mean; UPGMA) based on dissimilarity in codon usage between genes. Dissimilarity between two genes based on 59 variables of codon usage was measured using Pearson correlation distance (one minus Pearson product-moment correlation coefficient). Absolute codon frequencies (codon count data) were used for clustering genes putatively translated at high (psbA, rbcL, and psbC) and low (rps3, rps4, and rpoB) levels from 43 plastid genomes. To control for amino acid composition, codon usage data were normalized by dividing the usage of each codon by the maximum usage in each amino acid. The normalized codon usage data were used in the analysis of all psbA and psbN genes.

The strength of selection for codon adaptation (S)

Following [7] and [26], an S value was calculated for each plastid genome using the codon frequencies for four amino acids, Phe (Ca = TTC and Cs = TTT), Tyr (Ca = TAC and Cs = TAT), Ile (Ca = ATC and Cs = ATT), and Asn (Ca = AAC and Cs = AAT), where the two codons (Cs and Ca) are recognized by the same tRNA species but Ca is recognized more efficiently. The S is based on a comparison of codon frequencies within these synonymous groups between high expression genes and all other genes ([7] and [26]). Given the small number of genes coded by the plastid genome we used only psbA as the reference highly expressed gene to calculate S values for all 103 plastids. A second analysis used three genes, psbA, rbcL and psbC as the reference highly expressed genes and the value generated in this case referred to as S3.

Within-group correspondence analysis (WCA) of codon usage

WCA combines multivariate data into a small number of variables (axes) that explains most of the variation among the original variables [27,28]. In our study our variables are the 18 codons for 9 two-fold degenerate amino acids (C, D, E, F, H, K, N, Q, and Y by the single letter amino acid code) for each gene in a genome, and WCA yields the coordinates of each gene on each new axis.

Replication strand skew analysis

The degree of replication strand bias was measured by the GC skew index (GCSI) [29,30]. The GCSI was calculated with a window number of 256, considering that each window should contain at least 100 bp and that the genome sizes for the plastid genomes ranged from 29,623 to 521,168 bp. The GCSI can take values from 0 (no bias) to approximately 1 (high bias), and empirically a genome with a clear GC skew has a GCSI of > 0.1. For the plastid genomes with a GCSI over 0.1, the origin and terminus of DNA replication were predicted using cumulative GC skew [31] so that genes were located on the leading or lagging strand.

Results

Codon usage patterns in plastid genes

Previous analyses of a limited number of plastid genes showed evidence for two main patterns of codon usage, one of which was proposed to be a result of codon adaptation [15,25]. The two codon usage patterns are illustrated here using the codon usage tables from three widely separated taxa, the liverwort Marchantia polymorpha, the green alga Chlamydomonas reinhardtii and the red alga Porphyra purpurea (Table 1). One codon usage pattern is observed in the cumulative codon usage of each genome. This pattern is a general bias towards the NNA and/or NNT codons within each synonymous group and is consistent with the general bias across plastid genomes [15,19], likely due to a general substitution bias.

The highly expressed psbA gene displays a second general codon usage pattern as observed in Table 1. Although the psbA gene, which codes a highly translated core photosystem II protein, differs from the cumulative codon usage pattern of its genome, the codon usage patterns of the three psbA genes are similar to one another. The most obvious difference between the two codon usage patterns–that is, between the cumulative codon usages and the codon usage of the psbA genes—is in the two-fold degenerate NNY codon groups (i.e. two-fold degenerate codon groups with a third position pyrimidine). In these synonymous groups we see a relatively high frequency of the NNC codons in psbA when compared to the cumulative bias towards NNT (Table 1). Although this bias towards NNC is particularly obvious there are other differences between the two codon usage patterns; for example, unlike in the cumulative codon usage pattern the psbA genes do not utilize either CGG or CGA for Arginine, nor do they use GGG to code glycine, and there is a strong bias towards coding glycine with GGT instead of GGA.

To examine these two codon usage patterns across a broader spectrum of taxa, we constructed a dendrogram by clustering a set of high and low expression genes based on similarity of codon usage. The former set was composed of two highly translated genes, psbA, and rbcL, which codes the large subunit of RuBisCO [15,19,25], and a third gene, psbC, that codes a core component of photosystem II and so is likely to have a relatively high level of translation. For comparison to these we chose three plastid genes expected to have relatively low translation levels [15]. We clustered these genes from each of 43 genomes (Fig 1) that were randomly selected with the goal of having a representative of each Order while reducing the density of the cluster at the same time.

thumbnail
Fig 1. Codon usage patterns in plastid genes.

A cluster of three putative high-translation (psbA, rbcL, psbC in red) and three putative low-translation (rps3, rps4, rpoB in blue) genes from 43 plastid genomes selected to represent the major lineages (see text). Genes are clustered by similarity in codon usage as described in the Materials and Methods.

https://doi.org/10.1371/journal.pone.0154306.g001

The codon usages of the genes in Fig 1 show that the pattern of codon usage in the psbA genes in Table 1 is observed in high expression genes across a wide taxanomic range. Two major clusters of genes are apparent in Fig 1, one composed of the high translation genes and the other composed of the low translation genes. The differences in codon usage pattern between the low and high translation genes are the same as the differences between the codon usage patterns given in Table 1 (S3 Table). For simplification these differences are reduced to two general compositional differences, the increased frequency of NNC codons within twofold degenerate groups and the higher frequency of NNT codons in fourfold degenerate codon groups, and presented in Fig 2. As with the differences observed in Table 1, the psbA genes have a total of 66.4% NNC within the two-fold degenerate NNY groups while the low translation genes have a total of just 16.5% NNC in these groups. This increased frequency of NNC in the NNY codon groups is also apparent to a lesser degree in the high translation rbcL (41.4% NNC) and psbC (30.1% NNC) genes. In addition, this bias towards NNC in the genes within the high expression genes does not extend to the four-fold degenerate codon groups. Instead there is a lower frequency of NNC in these codon groups relative to the other genes (Fig 2). Consistent with previous proposals [15,25] we interpret this as evidence of codon adaptation in highly expressed genes across a wide range of taxa. This proposal will be discussed in detail below.

thumbnail
Fig 2. Third position composition patterns.

A plot of %C (C/[C+T]) base composition at two-fold degenerate and four-fold degenerate sites for the genes shown in Fig 1. Values are the cumulative base composition for each gene. For the low translation genes we show the cumulative composition of the rps3, rps4 and rpoB genes.

https://doi.org/10.1371/journal.pone.0154306.g002

Variation in levels of codon adaptation across plastid genomes

If the general codon usage pattern in the psbA genes in Table 1 and S3 Table is an adaptive pattern and selection favors the same adaptive pattern of codon usage across plastid lineages, then we can try estimate the relative strength of selection acting on different genomes by measuring the degree to which highly expressed genes are biased towards the adaptive pattern. The strength of selection could be manifest in the degree to which selection is increasing the codon adaptation of high expression/translation genes within a genome and/or in the number of genes that are under some level of selective pressure to increase the representation of adaptive codons above the level that would result from the substitution bias alone. We examined this using three different metrics that can measure one or the other, or both, of these two aspects of codon adaptation; a codon resampling test [15], the calculation of a genome selection coefficient, S, following [7] and comparison of CAI values based on the single set of codon fitness values as described in the Materials and Methods. Based on the three metrics we then put forward a single statistic, called Spca, to estimate relative selection strength within a genome. The full results of the analyses are given in S1 Table and the results of the three approaches are discussed separately below.

Codon resampling test.

The resampling test results are summarized in Tables 2 and 3. In this test, resampling with replacement from a combined genome pool of codons from all identified protein-coding sequences was used to estimate an expected level of codon adaptation for individual genes in that genome. As discussed in the Materials and Methods, rejection indicates that the level of adaptation is significantly higher than expected were the gene composed of codons representative of the genome-wide codon pool. The resampling test showed a strong trend towards rejecting highly expressed genes (Table 2) and there is striking similarity across genomes in what genes display a significantly increased CAI values, or codon adaptation. The psbA gene, the major translation product in plant chloroplasts, was rejected in the resampling test of every genome except Cyanidium caldarium [32], one of the few genomes in which rbcL was also not rejected. Other genes that were rejected in a majority of genomes were the small subunit of RuBisCo (rbcS), tufA, which codes a translation elongation factor [33], the cpeA and cpeB genes coding for the two subunits of the Phycoerythrin protein involved in light harvesting (see http://www.ncbi.nlm.nih.gov/gene/810008 and http://www.ncbi.nlm.nih.gov/gene/856988 respectively), psbC (discussed above), and another major gene for photosystem II psbD [34].

thumbnail
Table 2. Genes with highest rejection rate across genomes in the resampling analysis.

https://doi.org/10.1371/journal.pone.0154306.t002

thumbnail
Table 3. Genomes with the highest levels of rejection in the resampling analysis.

https://doi.org/10.1371/journal.pone.0154306.t003

Of the genes coded by at least 85 of the genomes we found the lowest rejection rates for psbN (1 of 97 genomes) a gene of unknown function associated with the biogenesis of both photosystems [35], rpoC2 (2 of 97 genomes) a component of the plastid RNA polymerase [36], ycf4 (2 of 93 genomes), which codes a non-essential assembly factor of photosystem I [37], rpl20 and rpl36 (3 of 98 each), both encoding ribosomal proteins, rpoA (3 of 95) and rpoC1 (3 of 91), both of which encode plastid RNA polymerase components [38], ccsA (3 of 86), which appears to encode a protein involved in heme attachment to chloroplast cytochromes [39], rpl23 (4 of 95), encoding a ribosomal protein and ycf3 (4 of 87), whose function is unknown but may be related to photosystem biogenesis [37].

Table 3 shows the number of genes rejected within each genome. The plastid genomes with the highest levels of rejection (i.e. fraction of genes rejected) are all from Chlorophytes, and the four plastid genomes with the highest rejection levels, Gonium, Chlamydomonas, Dunaliella and Pleodorina, are all from members of the Chlamydomonadales. These data indicate that within some lineages, particularly some of the green algae, a large number of plastid genes are under selection for codon adaptation. The plastid genomes with the lowest rejection rates are predominantly from Rhodophytes with notable exceptions being pine (Pinus thunbergii) and the non-photosynthetic Cryptomonas paramecium, which does not code a psbA gene. For the genomes with the lowest rejection rates the results indicate that essentially all genes are equivalent in terms of codon usage and it is most likely that their codon usage is determined almost exclusively by substitution bias. The few genes that are rejected in these genomes are genes such as psbA, rbcL, cpeA and cpeB that are widely rejected, and the rejection is indicative that they differ significantly from the cumulative codon pool. Two genomes, one from the bovine pathogen Theileria parva [40] and the other from the Chlorophyte Helicosporidium sp. [41], showed no evidence for significant codon adaptation in any gene. Theileria has a highly reduced genome of just 39,579 nucleotides and does not code any photosystem I or photosystem II gene, nor does it code rbcL. Helicosporidium is a parasitic green alga that also has a highly reduced genome of just 37,454 nucleotides. Like Theileria it does not code any photosystem genes or rbcL.

Genome S values.

The results of the S calculations are summarized in Table 4. S provides information on the strength of selection on those genes under selection as measured by the degree to which they differ from the overall genome codon usage. This is in contrast to the resampling which is more informative about the breadth, in terms of the number of genes affected, of selection.

thumbnail
Table 4. Genomes with the Lowest and Highest S Coefficients.

https://doi.org/10.1371/journal.pone.0154306.t004

Overall, the genomes with low S values are from Rhodophyta, including Cyanidium, which was the only genome that did not reject either psbA or rbcL in the resampling analysis, Euglenozoa and Embryophyta. Taxa with high S values are from a number of orders and include representatives of the Stramenopiles, Chlorophyta and basal lineages within the Streptophyta. Interestingly, the Rhodophyta have representatives with both low and high S values. The members of the Bangiales, represented here by Porphyra and Pyropia, have high S values while the other representatives of the Bangiophyceae—Cyanidioschyzon, Cyanidium and Galderia which are Cyanidiales, and Porphyridium which is a member of the Porphyridiales–as well as the members of the Florideophyceae have low S values.

One difficulty with using just the psbA gene to calculate S is that, although it is the best gene to use in terms of codon adaptation, the amino acid composition pattern of this gene results in an invalid S in several genomes. Therefore, we repeated the analysis using psbA, rbcL and psbC as the high translation genes (Table 5) to generate a second S value that we call S3. The genomes with the lowest S3 values are essentially the same as those with the lowest S values but the genomes with the highest S3 values are different. Since none of the 15 genomes with the highest S3 values had a valid S and some of the genomes with the highest S values did not have a valid S3 (S1 Table) the two sets of putative strong selection genomes are not directly comparable but we interpret either a high S or a high S3 as an indication of strong selection.

thumbnail
Table 5. Genomes with the Lowest and Highest S3 Coefficients1.

https://doi.org/10.1371/journal.pone.0154306.t005

Maximal Codon Adaptation Index (CAI).

Selection strength can also be assessed from the maximum CAI value observed for a gene in any given genome (Table 6), which is psbA in almost every genome (data not shown). As with the S statistic, the maximum CAI value provides information about the strength of selection on highly expressed genes but not about the number of genes that are under selection. Since all CAI values are based on the single set of codon fitness values (see Materials and Methods), they are comparable among different genes and genomes. Taxa with the highest maximal CAI values are predominantly members of the Chlorophyta, Stramenopiles and Alveolata. Genomes with low maximal CAI values are predominantly vascular plants and members of the Rhodophyta meaning that codon adaptation is relatively weak in these taxa. One interesting case is the Chlorophyte Trebouxiophyceae sp. which has a low maximal CAI. This genome also had a low rejection rate in the resampling analysis but it ranked relatively high in the S3 analysis. Of interest is that this is the only plastid genome in the analysis with a genome GC content over 50% (57.5%) and the only one with a GC3 content over 50% (63.5%). Since the calculation of S is based on the difference between genome composition and the composition of designated genes, the S3 result for this species is probably a result of its unusual genome composition.

A combined measurement of selection.

The three analyses presented above, the resampling test, the S values and the maximal CAI values, measure different aspects of codon adaptation. In an attempt to unify them we followed the example of [26] and generated a single summary statistic of codon adaptation by performing a principal component analysis on the percent rejection in the resampling test, S3 values and maximal CAI and taking the value along the first principal component for each genome. This summary statistic, which we call Spca, is given in S1 Table for all genomes with a valid S3. (S3 was chosen over S since more genomes had a valid S3 measurement.) The results are summarized in Table 7 and presented graphically in Fig 3, which shows the general strength of codon adaptation in the major plastid lineages.

thumbnail
Fig 3. Strength of codon adaptation across lineages.

A phylogeny of plastids with the strength of codon adaptation indicated for different lineages. Strength of selection is based on the Spca measure described in the text and given in S1 Table. An average value for the plastids within a given lineage greater than 1 is considered strong adaptation and an average value less than -1 is considered weak adaptation. A dashed line indicates variation within the genomes of that lineage. The primary endosymbiont is indicated as are the two proposed secondary events, one from green plant ancestors to the Euglenoids and another from the red algae ancestors to the lineage leading to extant Cryptophytes, Alveolates, Stramenopiles and Haptophytes (see text). Branches preceding the endosymbiosis are shaded black and indicate a lack of a plastid. The phylogeny overall is based on the general relationships from different sources [13,14].

https://doi.org/10.1371/journal.pone.0154306.g003

thumbnail
Table 7. Genomes with the strongest and weakest overall codon adaptation as measured by Spca.

https://doi.org/10.1371/journal.pone.0154306.t007

Correspondence analysis

Multivariate ordination analysis methods, e.g. principal components analysis and correspondence analysis, have been used to identify orthogonal axes that successively explain the most variation in codon usage among genes. We performed a correspondence analysis to compare to the codon adaptation results. Within-group Correspondence Analysis (WCA) can separate different directions of synonymous codon usage biases (as orthogonal axes) such as those towards NNA and NNT codons, or towards adaptive/favored codons [27,28]. Based on the adaptive pattern observed in the psbA genes in Table 1, we compared the first axis of a WCA to the C content at third codon positions over the nine two-fold degenerate amino acids. The results are shown for representative genomes with a low S value or a low rejection rate in the resampling test (Fig 4) and with high S values (Fig 5).

thumbnail
Fig 4. Within-group correspondence analysis (WCA) of codon usage for genomes with low S values.

WCA first component plot against %C at two-fold degenerate sites for four plastid genomes inferred to be under weak selection; Helicosporidium sp, Euglena gracilis, Chara vulgaris and Galdieria sulphuraria. Genes rejected in the resampling test are highlighted in red. Gene names are given based on NCBI annotation. Full taxon names from the NCBI annotation are given.

https://doi.org/10.1371/journal.pone.0154306.g004

thumbnail
Fig 5. Within-group correspondence analysis (WCA) of codon usage for genomes with high S values.

WCA first component plot against %C at two-fold degenerate sites for four plastid genomes inferred to be under strong selection; Chaetosphaeridium globosum, Dunaliella salina, Chlamydomonas reinhardtii and Pseudendoclonium akinetum. Genes rejected in the resampling test are highlighted in red. Gene names are given based on NCBI annotation. Full taxon names from the NCBI annotation are given.

https://doi.org/10.1371/journal.pone.0154306.g005

Overall, variation in NNC content is correlated with the primary axis in most genomes (data not shown) with the interesting exceptions being Theilaria and Helicosporidium, the highly reduced genomes discussed above for which no gene was rejected in our resampling test.

Genome structure

The GC skew index (GCSI) value ranges for different plastid genomes are summarized in Fig 6 and all values are presented in S1 Table. Seed plants (represented by Pinus thunbergii, Nicotiana tabacum and Oryza sativa in this study) are not included in the summary since their chloroplast genomes do not replicate from a single origin [42,43].

thumbnail
Fig 6. GC Skew in plastid genomes.

Box-and-whisker plots summarizing the GC skew index (GCSI) for plastid genomes of different lineages.

https://doi.org/10.1371/journal.pone.0154306.g006

We also tested whether or not genes tend to be coded on the leading strand. For the 68 plastid genomes with a GCSI over 0.1 the ratio of inferred leading strand genes to inferred lagging strand genes ranged from 0.46 to 1.00 with a median value of 0.70 (Fig 7). However, there was no indication that genes with high CAI values were coded on the leading strand (data not shown).

thumbnail
Fig 7. Gene distribution by strand in plastid genomes.

Box-and-whisker plots summarizing the distributions of the ratio of leading strand genes to the total number of genes for chloroplasts from nine phyla. A deviation from the ratio = 0.5 (red line) indicates that genes tend to be unevenly distributed between the leading and lagging strands of DNA replication.

https://doi.org/10.1371/journal.pone.0154306.g007

Discussion

Similar to what is observed in bacteria [44] low expression plastid genes display a codon usage with a general bias towards NNA and NNT codons, consistent with the general substitution bias towards A+T that is prevalent across plastid genomes [15,19]. In contrast, highly translated plastid genes, exemplified by psbA, across a wide range of taxa display a codon usage pattern that is distinct from low expression genes (Table 1, Figs 1 & 2). Since the bias towards C at the third codon position of some codon groups but towards T, away from both C and A, in other codon groups (Table 1), is not consistent with a general mutational model, some form of selection is required to explain the codon usage of these high expression genes such as psbA. We argue that this selection is for translation efficiency: the protein product of psbA undergoes oxidative damage during the light reaction and, as a result, is turned over at an extremely high rate making it the major translation product in chloroplasts [25,45]. This makes it likely that the predominant selective pressure generating this codon usage is to increase translation efficiency [15,25] and so we propose that the pattern of codon usage observed in the psbA genes in Table 1 and in the other high expression genes reflects codon adaptation in plastids.

This proposal concerning codon adaptation is also consistent with the match between the high frequency of NNC codons and the plastid tRNA population; each of the 38 plastid genomes listed in the tRNA database [24] (http://trna.ie.niigata-u.ac.jp/, see Table 1) has a single tRNA to translate the NNY codon group and in each case the anticodon is complementary to the NNC codon. Since there is no evidence for import of tRNAs from the cytoplasm [46] the tRNA population indicated in Table 1 represents those available for translation of plastid-encoded genes. This codon bias in highly expressed plastid genes is consistent with the proposal that codon selection would be expected to favor C over T in the TTY, TAY, ATY and AAY degenerate groups in all species [7]. The clustering of high translation genes from different taxa suggests that codon adaptation favors the same, or a very similar, set of adaptive codons across plastid genomes. Differences between psbA, rbcL and psbC in Fig 2 would then reflect different levels of codon adaptation based on different levels of translation.

Although a recent article [47] has claimed that plastid genes do not display codon adaptation, this analysis, and some earlier studies [15,25], have shown compelling evidence for codon adaptation in highly expressed plastid genes. The existence of codon adaptation, along with the context-dependent nature of substitutions in the chloroplast genome [1618], makes analyses of codon usage in plastid genomes more complex than is frequently considered. For example, a number of recent studies, particularly in flowering plants [4851], failed to account for codon adaptation and context-dependency with the result that their conclusions concerning selection are likely to be incorrect or only partially correct. The conclusions in these analyses concerning selection based on plots of Effective Number of Codon plotted against %G+C content at third codon positions (ENC-GC3 plot) were all based on the assumption that every synonymous site should display a uniform equilibrium A+T content. However, because of the context-dependent nature of substitutions [1618] the expected equilibrium A+T composition varies significantly across sites and this draws into question any conclusions in these studies about selection on codon usage. In another example, Guisinger et al. [52] analyzed chloroplast genomes from the family Geraniaceae using an ENC-GC3 plot and suggest that relaxed selection and/or mutational biases lead to increased G+C content, which in turn led to a change in codon usage bias. However, since this analysis fails to account for the context-dependency of equilibrium base frequencies the suggestion is unwarranted. Further, given the evidence presented here, analyses of plastid codon usage need to be performed within the context of codon adaptation on highly expressed or translated genes.

The evidence that different plastid lineages show the same adaptive codon usage pattern allows us to measure the degree of ‘fit’ of any gene to this pattern. Although this does not require that the underlying explanation is necessarily codon adaptation, we interpret the degree of fit as a measure of the level of codon adaptation, which would also represent the strength of selection, on a given gene. Therefore, comparisons of genes across genomes will give evidence for different levels of selection in different lineages.

No single measure captures all aspects of codon adaptation so three different metrics of selection strength were presented along with a proposed unifying measurement that we called Spca. The resampling test can measure the “breadth” of selection in terms of how many genes may be under selective pressure, but S and maximal CAI provide evidence for the strength of selection on the most highly expressed genes. One drawback to our measurement of S is the low number of genes in the plastid genome, which limits the number of reference genes that can be used. This results in many genomes having no valid S value, something that also limits calculations of Spca.

Despite these drawbacks, the data indicate that the Embryophyta and the Rhodophyta other than the Bangiales have relatively weak codon adaptation. In contrast, the evidence suggests that the Chlorophyceae clade of the Chlorophyta, Cyanophora paradox (the lone Glaucocystophyceae), the Dinophyceae (Alveolata) and the Bacillariophyta (Stramenopiles) are under relatively strong selection for codon adaptation. Noticeably, Helicosporidium sp., Theileria parva, which seem to be under very weak codon adaptation as discussed above, did not have valid Spca measures (and so are not given in Table 7). However, given the results of the individual metrics we would infer that these two genomes have among the weakest codon adaptation levels. In fact, given the data above we infer that there is virtually no codon adaptation in either of these two genomes. The mapping in Fig 3 indicates that strong codon adaptation is distributed widely making it difficult to infer whether or not it is an ancestral state but given the presence of at least weak codon adaptation in essentially every plastid genome it is likely that the primary endosymbiont had some degree of adaptation.

Variation across taxa in the level of codon adaptation could have a number of causes [3]. Although the plastid genomes share a relatively conserved set of genes it is likely that the translation level of a specific gene varies across genomes. The results strongly suggest that some genes, psbA in particular but also rbcL and a few others (Table 2), are highly translated across most, or all, taxa but other genes might have stronger variation. A broad analysis of translation rates across different genes and plastid lineages would be required to assess this, and to compare codon adaptation to translation level generally. This is not a trivial comparison. We have focused on translation level with respect to codon adaptation since selection on codon usage is most likely to act at the level of protein translation. However, it is not established that this is the only, or even predominant, aspect of gene expression that could influence codon adaptation and even the manner in which translation level might be related to codon usage could be extremely complex [3]. Considering this, and the observation that chloroplast protein levels are not significantly influenced by changes in transcript abundance [53], transcript abundance is probably not well correlated with codon adaptation [3]. Therefore, even though some transcript data are available for chloroplasts [54] they are not informative for this type of study. Overall, how the levels of codon adaptation we measure are related to gene expression will require a complex analysis of different stages of expression.

Another consideration is variation in effective population size (Ne). Species with lower Ne values, which would presumably include the seed plants, may have selective pressures that are equivalent to other species, such as green algae, but which are insufficient to overcome genetic drift. Since selection on codon bias involves very small selective differences between synonymous codons [55] we would only expect to find evidence for codon adaptation in species with fairly large effective population sizes. This would be similar to studies of bacteria showing that relatively recent reductions in effective population size have led to a relaxation of selective pressure on codon usage [2,7]. Selective pressures on seed plants may not be different than what exists in the Chlorophyta but a difference in Ne would lead to the observed difference in levels of codon adaptation. Overall, the variation in codon adaptation that we observe is likely to result from a combination of all of these factors.

The evidence from our WCA is consistent with the other analyses of codon adaptation. There is a consistent separation of psbA along the two axes except for certain genomes that show evidence for relatively weak selection such as Cyanidium caldarium, Euglena gracilis, Chara vulgaris, Gracilaria salicornia, Euglenaformis proxima, and Galdieria sulphuraria. Interestingly, the genes psbJ and petL stand out along the primary axis in several genomes. Both of these genes are very short (amino acid length ranges from 39 to 50 for psbJ, and from 27 to 62 for petL) and so, although it is possible that they are under relatively strong codon adaptation in some genomes, codon sampling error (due to the short gene length) makes it difficult to be conclusive.

In each of the WCA plots (Fig 4 and Fig 5) the genes rejected by the resample study in each genome are highlighted. These genes do no generally display a complete separation from the other genes along the primary axis. This supports the data from Table 1 and Fig 2 that indicated that codon adaptation must involve more than just the NNC codons of two-fold degenerate groups. As stated above, the NNC codons are useful as an estimate of codon adaptation but they are not a complete description. The codon usage of different clusters in Fig 1 show that the usage of NNT codons in four-fold degenerate groups is correlated with the bias towards NNC codons of two-fold degenerate groups in high expression genes (Fig 2). Therefore, although the NNC codons are not a complete measure of codon adaptation, the WCA results strongly support the general proposal concerning codon adaptation across plastid genes.

We also compared codon adaptation to aspect of genome organization or structure. No relationship is observed between our measures of the strength of selection on codon usage, covered above, and degree of skew: GCSI is not correlated with percentage rejection in the resampling, with S or with maximum CAI (data not shown). The lineages with the strongest skews in Fig 6 are the Euglenazoa (such as E. gracilis), which have weak codon adaptation, and the Alveolata. The Chlorophyta, which have very strong adaptation, show very little skew, lower than the Rhodophyta, which are generally under weak adaptation. The one group of Rhodophyta with strong adaptation, the Bangiales, has relatively low skew. Overall there is no discernible relationship between skew and codon adaptation. Many prokaryotic genomes display asymmetric base composition between the leading and lagging strands of replication [5659] and some have a tendency to code highly expressed genes on the leading strand [5961]. Strand asymmetry, or skew, in composition and/or gene content is also observed in some plastid genomes [41,62,63] including E. gracilis [64] although no systematic analysis of genome skew across different plastids has been performed. The evidence here that plastid genes with high CAI values are not preferentially coded on the leading strand could be due to the fact that high expression genes are under strong selection for codon usage because of pressures at the translation level, not the transcription level.

Conclusions

Based on the data we propose that a single, general, model can explain the vast majority of the variation in codon usage across plastid genomes. Codon usage is determined by a substitution bias towards A+T and selection for adaptive codons that are favored by selection for translation efficiency, as with the basic model presented for bacteria [7]. Given the similarity in the codon usage of the psbA gene from different plastid genomes, the evidence indicates that although there is variation in the strength of selection across genomes and genes, the set of adaptive codons is the same across all plastid genomes, possibly as a result of similar tRNA gene contents (see S4 Table).

Supporting Information

S1 Table. Taxonomic information and summary statistics for the 103 genomes analyzed in this study.

https://doi.org/10.1371/journal.pone.0154306.s001

(XLSX)

S2 Table. Codon fitness values, as defined in the Materials and Methods, used to calculate CAI values.

https://doi.org/10.1371/journal.pone.0154306.s002

(XLSX)

S3 Table. Cumulative codon usages for the high and low expression genes clustered in Fig 1.

https://doi.org/10.1371/journal.pone.0154306.s003

(DOCX)

S4 Table. tRNA genes coded in the curated 38 plastid genomes as indicated in the Materials and Methods.

Presence (1) or absence (0) of a complementary tRNA is indicated. Some genomes contain multiple tRNA genes with the same anticodon sequence but these are not enumerated.

https://doi.org/10.1371/journal.pone.0154306.s004

(XLSX)

Author Contributions

Conceived and designed the experiments: HS BM. Performed the experiments: HS BM. Analyzed the data: HS BM. Contributed reagents/materials/analysis tools: HS BM. Wrote the paper: HS BM. Designed software used in analysis: HS BM.

References

  1. 1. Bulmer M (1991) The selection-mutation-drift theory of synonymous codon usage. Genetics 129: 897–907. pmid:1752426
  2. 2. Sharp PM, Emery LR, Zeng K (2010) Forces that influence the evolution of codon bias. Philos Trans R Soc Lond B Biol Sci 365: 1203–1212. pmid:20308095
  3. 3. Plotkin JB, Kudla G (2011) Synonymous but not the same: the causes and consequences of codon bias. Nat Rev Genet 12: 32–42. pmid:21102527
  4. 4. Ermolaeva MD (2001) Synonymous codon usage in bacteria. Curr Issues Mol Biol 3: 91–97. pmid:11719972
  5. 5. Hershberg R, Petrov DA (2008) Selection on codon bias. Annu Rev Genet 42: 287–299. pmid:18983258
  6. 6. Ikemura T (1985) Codon usage and tRNA content in unicellular and multicellular organisms. Mol Biol Evol 2: 13–34. pmid:3916708
  7. 7. Sharp PM, Bailes E, Grocock RJ, Peden JF, Sockett RE (2005) Variation in the strength of selected codon usage bias among bacteria. Nucleic Acids Res 33: 1141–1153. pmid:15728743
  8. 8. Akashi H (1994) Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics 136: 927–935. pmid:8005445
  9. 9. Rocha EP (2004) Codon usage bias from tRNA's point of view: redundancy, specialization, and efficient decoding for translation optimization. Genome Res 14: 2279–2286. pmid:15479947
  10. 10. Bulmer M (1987) Coevolution of codon usage and transfer RNA abundance. Nature 325: 728–730. pmid:2434856
  11. 11. Sharp PM, Li WH (1987) The codon adaptation index—a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res 15: 1281–1295. pmid:3547335
  12. 12. Wright F (1990) The 'effective number of codons' used in a gene. Gene 87: 23–29. pmid:2110097
  13. 13. Keeling PJ (2004) Diversity and evolutionary history of plastids and their hosts. Am J Bot 91: 1481–1493. pmid:21652304
  14. 14. Reyes-Prieto A, Weber AP, Bhattacharya D (2007) The origin and establishment of the plastid in algae and plants. Annu Rev Genet 41: 147–168. pmid:17600460
  15. 15. Morton BR (1998) Selection on the codon bias of chloroplast and cyanelle genes in different plant and algal lineages. J Mol Evol 46: 449–459. pmid:9541540
  16. 16. Morton BR, Clegg MT (1995) Neighboring base composition is strongly correlated with base substitution bias in a region of the chloroplast genome. J Mol Evol 41: 597–603. pmid:7490774
  17. 17. Morton BR, Oberholzer VM, Clegg MT (1997) The influence of specific neighboring bases on substitution bias in noncoding regions of the plant chloroplast genome. J Mol Evol 45: 227–231. pmid:9302315
  18. 18. Zheng T, Ichiba T, Morton BR (2007) Assessing substitution variation across sites in grass chloroplast DNA. J Mol Evol 64: 605–613. pmid:17541677
  19. 19. Morton BR (2003) The role of context-dependent mutations in generating compositional and codon usage bias in grass chloroplast DNA. J Mol Evol 56: 616–629. pmid:12698298
  20. 20. Arakawa K, Mori K, Ikeda K, Matsuzaki T, Kobayashi Y, Tomita M (2003) G-language Genome Analysis Environment: a workbench for nucleotide sequence data mining. Bioinformatics 19: 305–306. pmid:12538262
  21. 21. Arakawa K, Tomita M (2006) G-language System as a platform for large-scale analysis of high-throughput omics data. J Pesticide Sci 31: 282–288.
  22. 22. Arakawa K, Suzuki H, Tomita M (2008) Computational Genome Analysis Using The G-language System. Genes, Genomes and Genomics 2: 1–13.
  23. 23. R_Core_Team (2012) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
  24. 24. Abe T, Inokuchi H, Yamada Y, Muto A, Iwasaki Y, Ikemura T (2014) tRNADB-CE: tRNA gene database well-timed in the era of big sequence data. Front Genet 5: 114. pmid:24822057
  25. 25. Morton BR (1993) Chloroplast DNA codon use: evidence for selection at the psb A locus based on tRNA availability. J Mol Evol 37: 273–280. pmid:8230251
  26. 26. Vieira-Silva S, Rocha EP (2010) The systemic imprint of growth and its uses in ecological (meta)genomics. PLoS Genet 6: e1000808. pmid:20090831
  27. 27. Charif D, Thioulouse J, Lobry JR, Perriere G (2005) Online synonymous codon usage analyses with the ade4 and seqinR packages. Bioinformatics 21: 545–547. pmid:15374859
  28. 28. Suzuki H, Brown CJ, Forney LJ, Top EM (2008) Comparison of correspondence analysis methods for synonymous codon usage in bacteria. DNA Res 15: 357–365. pmid:18940873
  29. 29. Arakawa K, Tomita M (2007) The GC skew index: a measure of genomic compositional asymmetry and the degree of replicational selection. Evol Bioinform Online 3: 159–168. pmid:19461976
  30. 30. Arakawa K, Suzuki H, Tomita M (2009) Quantitative analysis of replication-related mutation and selection pressures in bacterial chromosomes and plasmids using generalised GC skew index. BMC Genomics 10: 640. pmid:20042086
  31. 31. Arakawa K, Tomita M (2012) Measures of compositional strand bias related to replication machinery and its applications. Curr Genomics 13: 4–15. pmid:22942671
  32. 32. Glockner G, Rosenthal A, Valentin K (2000) The structure and gene repertoire of an ancient red algal plastid genome. J Mol Evol 51: 382–390. pmid:11040290
  33. 33. Hwang S, Kawazoe R, Herrin DL (1996) Transcription of tufA and other chloroplast-encoded genes is controlled by a circadian clock in Chlamydomonas. Proc Natl Acad Sci U S A 93: 996–1000. pmid:8577775
  34. 34. Tsunoyama Y, Ishizaki Y, Morikawa K, Kobori M, Nakahira Y, Takeba G, et al. (2004) Blue light-induced transcription of plastid-encoded psbD gene is mediated by a nuclear-encoded transcription initiation factor, AtSig5. Proc Natl Acad Sci U S A 101: 3304–3309. pmid:14976253
  35. 35. Krech K, Fu HY, Thiele W, Ruf S, Schottler MA, et al. (2013) Reverse genetics in complex multigene operons by co-transformation of the plastid genome and its application to the open reading frame previously designated psbN. Plant J 75: 1062–1074. pmid:23738654
  36. 36. Chen Z, Schertz KF, Mullet JE, DuBell A, Hart GE (1995) Characterization and expression of rpoC2 in CMS and fertile lines of sorghum. Plant Mol Biol 28: 799–809. pmid:7640353
  37. 37. Krech K, Ruf S, Masduki FF, Thiele W, Bednarczyk D, Albus CA, et al. (2012) The plastid genome-encoded Ycf4 protein functions as a nonessential assembly factor for photosystem I in higher plants. Plant Physiol 159: 579–591. pmid:22517411
  38. 38. Little MC, Hallick RB (1988) Chloroplast rpoA, rpoB, and rpoC genes specify at least three components of a chloroplast DNA-dependent RNA polymerase active in tRNA and mRNA transcription. J Biol Chem 263: 14302–14307. pmid:3049574
  39. 39. Xie Z, Merchant S (1996) The plastid-encoded ccsA gene is required for heme attachment to chloroplast c-type cytochromes. J Biol Chem 271: 4632–4639. pmid:8617725
  40. 40. Gardner MJ, Bishop R, Shah T, de Villiers EP, Carlton JM, Hall N, et al. (2005) Genome sequence of Theileria parva, a bovine pathogen that transforms lymphocytes. Science 309: 134–137. pmid:15994558
  41. 41. de Koning AP, Keeling PJ (2006) The complete plastid genome sequence of the parasitic green alga Helicosporidium sp. is highly reduced and structured. BMC Biol 4: 12. pmid:16630350
  42. 42. Kunnimalaiyaan M, Nielsen BL (1997) Fine mapping of replication origins (oriA and oriB) in Nicotiana tabacum chloroplast DNA. Nucleic Acids Res 25: 3681–3686. pmid:9278490
  43. 43. Nielsen BL, Cupp JD, Brammer J (2010) Mechanisms for maintenance, replication, and repair of the chloroplast genome in plants. J Exp Bot 61: 2535–2537. pmid:20513654
  44. 44. Hershberg R, Petrov DA (2010) Evidence that mutation is universally biased towards AT in bacteria. PLoS Genet 6: e1001115. pmid:20838599
  45. 45. Mullet JE, Klein RR (1987) Transcription and RNA stability are important determinants of higher plant chloroplast RNA levels. EMBO J 6: 1571–1579. pmid:16453773
  46. 46. Pfitzinger H, Weil JH, Pillay DT, Guillemaut P (1990) Codon recognition mechanisms in plant chloroplasts. Plant Mol Biol 14: 805–814. pmid:2102858
  47. 47. Sugiura M (2014) Plastid mRNA translation. Methods Mol Biol 1132: 73–91. pmid:24599847
  48. 48. Xing ZB, Cao L, Zhou M, Xiu LS (2013) [Analysis on codon usage of chloroplast genome of Eleutherococcus senticosus]. Zhongguo Zhong Yao Za Zhi 38: 661–665. pmid:23724670
  49. 49. Nair RR, Nandhini MB, Monalisha E, Murugan K, Sethuraman T, Ganesh D (2012) Synonymous codon usage in chloroplast genome of Coffea arabica. Bioinformation 8: 1096–1104. pmid:23251044
  50. 50. Sablok G, Nayak KC, Vazquez F, Tatarinova TV (2011) Synonymous codon usage, GC(3), and evolutionary patterns across plastomes of three pooid model species: emerging grass genome models for monocots. Mol Biotechnol 49: 116–128. pmid:21308422
  51. 51. Liu Q, Xue Q (2005) Comparative studies on codon usage pattern of chloroplasts and their host nuclear genes in four plant species. J Genet 84: 55–62. pmid:15876584
  52. 52. Guisinger MM, Kuehl JV, Boore JL, Jansen RK (2011) Extreme reconfiguration of plastid genomes in the angiosperm family Geraniaceae: rearrangements, repeats, and codon usage. Mol Biol Evol 28: 583–600. pmid:20805190
  53. 53. Eberhard S, Drapier D, Wollman FA (2002) Searching limiting steps in the expression of chloroplast-encoded proteins: relations between gene copy number, transcription, transcript abundance and translation rate in the chloroplast of Chlamydomonas reinhardtii. Plant J 31: 149–160. pmid:12121445
  54. 54. Lee J, Kang Y, Shin SC, Park H, Lee H (2014) Combined analysis of the chloroplast genome and transcriptome of the Antarctic vascular plant Deschampsia antarctica Desv. PLoS One 9: e92501. pmid:24647560
  55. 55. Hartl DL, Moriyama EN, Sawyer SA (1994) Selection intensity for codon bias. Genetics 138: 227–234. pmid:8001789
  56. 56. Lobry JR (1996) Asymmetric substitution patterns in the two DNA strands of bacteria. Mol Biol Evol 13: 660–665. pmid:8676740
  57. 57. Necsulea A, Lobry JR (2007) A new method for assessing the effect of replication on DNA base composition asymmetry. Mol Biol Evol 24: 2169–2179. pmid:17646257
  58. 58. Morton RA, Morton BR (2007) Separating the effects of mutation and selection in producing DNA skew in bacterial chromosomes. BMC Genomics 8: 369. pmid:17935620
  59. 59. Rocha EP (2008) The organization of the bacterial genome. Annu Rev Genet 42: 211–233. pmid:18605898
  60. 60. McLean MJ, Wolfe KH, Devine KM (1998) Base composition skews, replication orientation, and gene orientation in 12 prokaryote genomes. J Mol Evol 47: 691–696. pmid:9847411
  61. 61. Wu H, Qu H, Wan N, Zhang Z, Hu S, Yu J (2012) Strand-biased gene distribution in bacteria is related to both horizontal gene transfer and strand-biased nucleotide composition. Genomics Proteomics Bioinformatics 10: 186–196. pmid:23084774
  62. 62. de Cambiaire JC, Otis C, Lemieux C, Turmel M (2006) The complete chloroplast genome sequence of the chlorophycean green alga Scenedesmus obliquus reveals a compact gene organization and a biased distribution of genes on the two DNA strands. BMC Evol Biol 6: 37. pmid:16638149
  63. 63. de Cambiaire JC, Otis C, Turmel M, Lemieux C (2007) The chloroplast genome sequence of the green alga Leptosira terrestris: multiple losses of the inverted repeat and extensive genome rearrangements within the Trebouxiophyceae. BMC Genomics 8: 213. pmid:17610731
  64. 64. Morton BR (1999) Strand asymmetry and codon usage bias in the chloroplast genome of Euglena gracilis. Proc Natl Acad Sci U S A 96: 5123–5128. pmid:10220429