Codon bias and the folding dynamics of the cystic fibrosis transmembrane conductance regulator

Synonymous or silent mutations are often overlooked in genetic analyses for disease-causing mutations unless they are directly associated with potential splicing defects. More recent studies, however, indicate that some synonymous single nucleotide polymorphisms (sSNPs) are associated with changes in protein expression, and in some cases, protein folding and function. The impact of codon usage and mRNA structural changes on protein translation rates, and how these can affect protein structure and function, is just beginning to be appreciated. Examples are given here that demonstrate how synonymous mutations alter translational kinetics and protein folding and/or function. The proposed mechanism is based on a model in which codon usage modulates the translational rate by introducing pauses caused by nonoptimal or rare codons or by introducing changes in the mRNA structure, and this in turn influences co-translational folding. Two examples of this include the multidrug resistance protein (P-glycoprotein) and the cystic fibrosis transmembrane conductance regulator gene (CFTR). CFTR is also used here as a model to illustrate how synonymous mutations can be examined using in silico predictive methods to identify which sSNPs have the potential to change protein structure. The methodology described here can be used to help identify “non-silent” synonymous mutations in other genes.

and then discuss how in silico methods can be used to help identify these types of mutations.
Amyotrophic lateral sclerosis (ALS) or Lou Gehrig's disease is a rapidly progressive fatal neurological disease with unclear etiology, although mutations in copper/zinc superoxide dismutase (SOD1) are often associated with this disease. Michael Strong and colleagues demonstrated that wild type SOD1 mRNA forms ribonucleoprotein complexes with protein homogenates of neuronal tissues that stabilize the SOD1 mRNA, whereas mRNAs containing ALS missense mutations fail to form these complexes and subsequently have less stable mRNA [6]. More interestingly, 4 silent mutations have been identified in ALS, Gly11 (C/T) [14], Ser60 (T/C; rs373888553) [15], Thr117 (A/G) [16], and Ala141 (T/A; rs143100660) [15], and all of these fail to form the ribonucleoprotein complexes in a manner similar to that seen for the missense mutations [6]. The results from these studies indicate that loss of ribonucleoprotein binding results in a loss of mRNA stability. This illustrates an interesting and unexpected mechanism for how a synonymous mutation can affect protein expression levels.
Another example is found in the catechol-O-methyltransferase (COMT) gene. COMT regulates pain perception and there are three haplotypes of the COMT gene that are associated with pain perception and developing temporomandibular joint disorder [17]. The three haplotypes are made up of four SNPs, one located in the promoter region, and the other 3 in coding regions. Two are synonymous mutations, one at His62 (C/T; rs4633) and another at Leu136 (C/G; rs4818), and the third is a missense mutation, Val158Met (A/G; rs4680) [7]. Interestingly, the synonymous changes account for the largest change in enzyme activity [7]. Diatchenko and colleagues focused on understanding how these synonymous mutations caused this loss of protein expression and suggested that the differences were associated with mRNA secondary structural differences. To test this idea, they analyzed the mRNA secondary structures with the predictive algorithms Mfold [18] and Afold [19]. This study also illustrated that the in silico predictive methods for mRNA folding were confirmed by using site-directed mutagenesis to destabilize the predicted stem-loop structures. This manipulation destabilized the predicted mRNA structures and resulted in an increase in protein and enzyme activities, establishing that very stable mRNA secondary structures or those less likely to unfold easily during translation are associated with less translated protein [19].
Even in the cancer genomics field, the role of synonymous mutations is now beginning to be appreciated. Yardena Samuels and colleagues identified somatic mutations in 29 melanoma samples and found an interesting synonymous mutation, a C to T change at position 51 (F17F), in the Bcl-2-like protein 12 (BCL2L12) gene, which encodes an anti-apoptotic factor [8]. This mutation leads to increased BCL2L12 mRNA and protein levels. In characterizing this silent mutation, they found that it occurred in 10 of 256 melanomas and that the elevations in mRNA and protein were not due to splicing or translation changes or to changes in protein stability [8]. They found that the mutation caused an accumulation of mRNA and protein, and that this promoted antiapoptotic signaling in the melanoma cells [8]. Analysis of the mechanism involved revealed the surprising finding that this synonymous mutation elevated mRNA levels because of the differential targeting of wild type and mutant BCL2L12 by hsa-miR-671-5p [8]. Interestingly, a similar link between synonymous mutations and altered miRNA binding was seen previously in the immunity-related GTPase family M (IRGM) gene in Crohn's disease [20].
An extremely thorough and interesting examination of the multidrug resistance 1 (MDR1) gene indicated that a synonymous SNP in this gene altered the drug and inhibitor interactions of the gene product, P-glycoprotein [1]. P-glycoprotein is an ATP-driven efflux pump that contributes to the multidrug resistance of cancer cells. One particular synonymous SNP, C3435T, that was part of a common haplotype was associated with altered P-glycoprotein activity, and Chava Kimchi-Sarfaty and colleagues analyzed this mutation in detail in a broad range of cell lines and found no change in the level of mRNA or protein in cells expressing this sSNP [1]. Their results, however, indicated that this sSNP introduced a rare codon that altered the protein structure and function, suggesting that translation was altered in the presence of the rare codon [1].
Perhaps the biggest reason that synonymous mutations are often overlooked is that the vast majority of them are functionally neutral. In a study on the human dopamine receptor D2 gene (DRD2), however, Gejman and colleagues examined the functional properties of six known naturally occurring synonymous mutations and surprisingly found that two had functional effects [9]. C957T was predicted to alter the mRNA folding, and this affected the mRNA stability and translation and, importantly, weakened the response to dopamine-induced up-regulation of DRD2 [9]. The other synonymous mutation, G1101A, did not have an effect on its own, but did block the effects of the C957T mutation in the compound clone C957T/G1101A, demonstrating that compound synonymous mutations can have unexpected consequences [9]. Given that dopamine receptors are drug targets in the therapies of schizophrenia, Parkinson's and Huntington's diseases [9], the importance of analyzing synonymous mutations in this gene is obvious.
In human congenital heart disease, there are known mutations in cardiac-specific transcription factor genes that impact protein function, and the NK2 transcription factor related, locus 5 gene (NKX2-5) provides a good example of this [10]. In NKX2-5, more than 40 mutations have been identified in congestive heart failure patients. In a recent study, Jurgen Borlak and colleagues analyzed cardiac biopsies of 28 patients and identified a missense mutation in the NKX2-5 gene, A119E, along with two synonymous mutations in-cis, c.543G > A (Q181Q) and c.63A > G (E21E) [10]. In vitro functional analyses of the transcriptional activities of NKX2-5 using reporter plasmids revealed that the A119E mutation resulted in as much as a 40% reduction in activity, and the addition of one or two synonymous mutations reduced the transcriptional activities even further, suggesting that the synonymous mutations exacerbated the phenotype of the missense A119E mutation [10]. Furthermore, using the Vienna RNA folding algorithm for predicting mRNA structure, the authors found that the mRNA secondary structure of the A119E mutant differed from that of wild type mRNA and that the addition of the two synonymous mutations changed the structure even further [10]. These studies suggest that in some cases, synonymous mutations, while perhaps not causal in and of themselves, can exacerbate the effects of a missense mutation [10].

The cystic fibrosis transmembrane conductance regulator (CFTR) gene and the F508del mutation
Prior studies by Bebok and colleagues found a similar effect in the CFTR gene [12]. We examined a synonymous mutation in the most common mutation in the CFTR gene, F508del, an out-of-frame deletion of phenylalanine that creates a synonymous mutation for isoleucine at position 507 [12]. The human CFTR gene is particularly interesting given that it codes for a protein that is highly sensitive to co-translational folding [1,21-23]. CFTR is a chloride and bicarbonate channel and key regulator of epithelial functions [24-27]. Mutations in the CFTR gene lead to reduced or dysfunctional CFTR protein and cause cystic fibrosis (CF), a generalized exocrinopathy that affects multiple organs but is most notably associated with lung disease [28].
The CFTR protein consists of a modular structure composed of two membrane-spanning domains (MSD1 and MSD2, each comprising six transmembrane regions), two nucleotide binding domains (NBD1 and NBD2), and a unique domain among ATP-binding cassette (ABC) transporters called the regulatory domain (R) (Fig. 1) [24]. NBD1 and NBD2 participate in ATP binding and hydrolysis, while phosphorylation of the R domain regulates channel gating [24]. Achieving the proper conformation of the individual domains and interactions between these domains during protein synthesis is critical for proper CFTR assembly [21,22,29,30].
In order to reach its native tertiary structure, CFTR molecules undergo complex hierarchical folding processes and posttranslational modifications [31-33]. The rate of wild type and F508del CFTR translation in transfected human embryonic kidney (HEK 293) cells has been calculated to be 2.7 residues per second based on an average translation time of 9.2 min [34]. This is slower than the average translation rate for other proteins of 4-5 residues per second [35], suggesting that the CFTR translation rate is unusual [21]. CFTR folding begins co-translationally [21], and is completed post-translationally [22,31,32]. Du et al. (2009) suggest that the individual domains achieve loosely folded conformations co-translationally, but the final native tertiary structure requires completion of the proper domain interactions [22]. CFTR domain assembly has been estimated to take ~30-120 min [36].
Fig. 1 A schematic model of the proposed structure of the cystic fibrosis transmembrane conductance regulator (CFTR) using RasMol 2.7.5.2 (http://www.openrasmol.org) based on the RCSB PDB database coordinates deposited for the human CFTR. The domain models are based on the data published by [57]
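As a quick check on the figures quoted above, the sketch below recomputes the per-residue rate from the 9.2 min translation time and, for comparison, the time a protein of the same length would need at 4-5 residues per second. The chain length of 1480 residues is an assumption taken from the known size of full-length human CFTR; it is not stated in the text, and the function names are purely illustrative.

```python
# Back-of-the-envelope check of the CFTR translation rate.
CFTR_LENGTH_AA = 1480        # assumption: full-length human CFTR, not given in the text
TRANSLATION_TIME_MIN = 9.2   # average translation time reported in [34]


def residues_per_second(length_aa, minutes):
    """Average elongation rate for a chain of length_aa made in the given time."""
    return length_aa / (minutes * 60.0)


def minutes_needed(length_aa, rate_per_second):
    """Time in minutes to translate length_aa residues at a constant rate."""
    return length_aa / rate_per_second / 60.0


cftr_rate = residues_per_second(CFTR_LENGTH_AA, TRANSLATION_TIME_MIN)
print(f"CFTR: {cftr_rate:.1f} residues per second")

# Time the same chain would take at the 4-5 residues per second cited for other proteins [35]
for typical_rate in (4.0, 5.0):
    print(f"at {typical_rate:.0f} residues/s: {minutes_needed(CFTR_LENGTH_AA, typical_rate):.1f} min")
```

Under this assumption the rate rounds to 2.7 residues per second, and the same chain would be completed in roughly 5-6 min at the typical rate, which is what makes the measured 9.2 min stand out.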
Since CFTR translation and folding occur simultaneously, the choice of codons can affect the translational kinetics during protein synthesis, but how that is linked to protein folding is just beginning to be appreciated [37]. Translational kinetics are believed to be controlled, at least in part, by optimal and non-optimal codons (reviewed in [37]). Optimal codons are postulated to be translated faster, whereas non-optimal are translated slower, with non-optimal codons strategically placed to slow translation and promote co-translational folding. Given the co-translation folding of CFTR and its slow translation rate, we investigated how codon usage is predicted to influence CFTR's translational rate based on its utilization of optimal and non-optimal codons, and how these changes are predicted to affect the co-translational folding within the individual domains of CFTR. We also analyzed the known CFTR sSNPs that have been identified in order to predict how they might affect the CFTR translational kinetics, mRNA structure, and the co-translational protein folding changes.

In silico predictive methods for identifying synonymous mutations that impact protein function
Highly expressed genes often contain codons that are recognized by the most abundant tRNAs and are considered optimal or fast since they are translated faster [38]. CFTR, on the other hand, is expressed at extremely low levels, and the translation rate appears to be slower than average [34]. Complex proteins generally utilize rare codons that often localize at strategic domain-domain interfaces [39], and these rare codons (or clusters of rare codons) promote ribosome pauses that may contribute to changes in the folding pathways [39]. Since CFTR is a complex and multi-domain transmembrane protein with distinct transmembrane and cytoplasmic regions, we examined the composition of codons used in human CFTR and determined whether the codons were optimal or rare, and how their placement corresponded to predicted secondary structures and CFTR domain organization.
In order to identify the predicted fast and slow translating regions in CFTR, we used the relative synonymous codon usage (RSCU) method [40-43] to calculate the potential codon impact on the translation rate (for methods, see Additional file 1). The analysis reveals that CFTR's codon bias clearly consists of fast and slow translating regions, and that the N-terminal transmembrane MSD1 domain shows the highest content of slow translating codons (Fig. 2a, negative log RSCU numbers, compared to the entire CFTR molecule, which was normalized to 1). This is particularly evident at the end of MSD1, which is critical to endoplasmic reticulum-associated degradation (ERAD) escape and is also the location responsible for binding to the CFTR misfolding corrector drug, VX-809 [44]. The log RSCU of the individual CFTR domains is shown in Fig. 2b. All of these regions were compared to the entire CFTR molecule that was normalized to 1. The results shown in Fig. 2b indicate that transmembrane MSD1 and MSD2 are predicted to be the slowest translated regions in CFTR (~5 fold slower than the mean rate of CFTR). Other regions predicted to be translated slowly include the sequences between MSD1 and NBD1 (MSD1/NBD1) and between MSD2 and NBD2 (MSD2/NBD2), and the carboxy-terminal tail region (C'). The RSCU predictions also indicate that the N terminal region (N'), NBD1 and NBD2, the region between NBD1 and the R domain (NBD1/R), and the R domain are translated significantly faster than the CFTR average. The sequence between NBD1 and the R domain has the highest log RSCU value in CFTR (Fig. 2b). These data predict that CFTR translation starts relatively fast, slows down while forming the MSD1 domain and then speeds up again during synthesis of the NBD1 and R domains. The MSD2 domain translation is slow, and the slowest predicted translational rate is at the interface between MSD2 and NBD2 (MSD2/NBD2). After this, the translation of NBD2 is predicted to proceed quickly again, before slowing down again at the C-terminus (C') (Fig. 2b).
Fig. 2 a The distribution of optimal and rare codons in CFTR. The logarithm transformed moving median of RSCU values (3-amino acid window) suggests the presence of slow/nonoptimal (negative log RSCU values) and fast/optimal (positive) translated patches within the CFTR primary structure. The amino acid medians were normalized to whole CFTR median RSCU (value 1). The CFTR domain location is marked above the graph. b CFTR domains are translated with different rates as shown by their median RSCU values. The domain medians were normalized to whole CFTR median RSCU (value 1). Significantly faster (>1) and slower (<1) translation of the domains are marked with an *, while error bars represent the standard error of the mean (SEM)
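The RSCU profile described above can be illustrated with a minimal sketch. It uses the standard definition of RSCU (observed codon count divided by the mean count across that amino acid's synonymous codons) and a log-transformed moving median over a 3-codon window, normalized to the whole-sequence median. The authors' exact procedure is in Additional file 1 and may differ. The toy sequence, the function names, and the choice to derive codon counts from the sequence itself rather than from a reference codon usage table are all assumptions made for illustration.

```python
from collections import Counter
from math import log10
from statistics import median

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TO_AA = dict(zip(CODONS, AMINO))


def split_codons(cds):
    """Split a coding sequence into codons, dropping any trailing partial codon."""
    cds = cds.upper().replace("U", "T")
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def rscu_table(codon_counts):
    """RSCU = observed count / mean count over the synonymous codon family."""
    families = {}
    for codon, aa in CODON_TO_AA.items():
        families.setdefault(aa, []).append(codon)
    table = {}
    for family in families.values():
        total = sum(codon_counts.get(c, 0) for c in family)
        if total == 0:
            continue  # amino acid absent from the counts
        for c in family:
            table[c] = codon_counts.get(c, 0) * len(family) / total
    return table


def log_moving_median(values, window=3):
    """log10 of the centered moving median, normalized to the overall median.

    Assumes every value is positive (true when each codon scored was counted).
    """
    overall = median(values)
    half = window // 2
    profile = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        profile.append(log10(median(values[lo:hi]) / overall))
    return profile


toy_cds = "ATGCTGCTGCTGTTAGCCGCCGCGAAA"  # hypothetical 9-codon sequence, not CFTR
codons = split_codons(toy_cds)
table = rscu_table(Counter(codons))
profile = log_moving_median([table[c] for c in codons], window=3)
```

In this sketch, negative entries of `profile` correspond to the slow/nonoptimal patches of Fig. 2a and positive entries to the fast/optimal ones. In a real analysis the counts would more plausibly come from a genome-wide codon usage table than from the gene under study.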
Both transmembrane domains (MSD1 and MSD2) are composed of membrane spanning alpha helices connected by extracellular and cytosolic loops. Since both of these regions appear to be translated more slowly than average for CFTR, we examined these regions in more detail. As shown in Fig. 3a, almost the entire MSD1 is predicted to be composed of relatively slow translating codons except for cytosolic loop 2 (CL2) and external loop 3 (EL3). The slowest translating regions of MSD1 are helices 2 and 6 and external loop 1 (EL1) (Fig. 3a). The MSD2 codon bias is similar to MSD1 except that the prediction is that helices 1' and 4' and external loops 4 and 6 (EL4 and EL6) are translated faster than their counterparts in MSD1 (Fig. 3b). Interestingly in both MSDs, the prediction is that the final external loops are translated very rapidly (EL3 and EL6), while the final helices are translated very slowly (helices 6 and 6'), suggesting that the slowdown in translation is important at the end of both of these transmembrane domains.

Codon usage effects on the translational kinetics
There are 128 reported sSNPs in the human CFTR mRNA sequence and we analyzed the RSCU value changes introduced by these sSNPs to see how these might affect the translational kinetics (Additional file 2: Table S1). The median ΔRSCU value for the SNPs was 0.075, and almost one-third of those analyzed (42 sSNPs) introduced either significantly higher or lower RSCU values (highlighted in light grey in Additional file 2: Table S1). To determine if these sSNPs were predicted to alter translational rate, we analyzed each sSNP in 3-, 5-, and 10-amino acid RSCU windows as described in Additional file 1. As shown in Table 1, only 5 of the selected sSNPs (shown in bold in Table 1) were predicted to alter the translational rate (one standard deviation above or below the median for all 3 RSCU window analyses): c.1098A > G (G366 at the MSD1/NBD1 interface); c.1641A > T (T547 in NBD1); c.3472C > A (R1158 at the MSD2/NBD2 interface); c.3772T > C (L1258 in NBD2); and c.3789T > C (T1263 in NBD2) (Table 1). The other sSNPs listed in Table 1 showed effects in at least 2 of the 3 RSCU windows, and depending on their location, could potentially affect the CFTR translational kinetics.
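The selection rule described above (a change lying more than one standard deviation from the median in all three window sizes) can be sketched as follows. This is only an illustration of the criterion as worded in the text; the precise statistics are described in Additional file 1 and may differ. The ΔRSCU values, the use of the sample standard deviation, and the function names are hypothetical.

```python
from statistics import median, stdev


def outliers(values):
    """Indices of values lying more than one SD away from the median."""
    mid, sd = median(values), stdev(values)
    return {i for i, v in enumerate(values) if abs(v - mid) > sd}


def flagged_in_all_windows(delta_by_window):
    """Return indices of sSNPs that are outliers in every window size.

    delta_by_window maps window size -> list of window-level ΔRSCU values,
    one per sSNP, in the same sSNP order for each window.
    """
    per_window = [outliers(values) for values in delta_by_window.values()]
    return set.intersection(*per_window)


# Hypothetical ΔRSCU values for five sSNPs in 3-, 5- and 10-aa windows
toy_deltas = {
    3: [0.05, 0.08, 0.90, 0.07, -0.60],
    5: [0.04, 0.07, 0.70, 0.06, 0.05],
    10: [0.03, 0.06, 0.50, 0.05, 0.04],
}
flagged = flagged_in_all_windows(toy_deltas)
print(sorted(flagged))
```

With these toy numbers only the third sSNP (index 2) passes in all three windows, while the fifth is an outlier in the 3-aa window alone and would not be selected.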

Role of sSNPs in mRNA structural changes
The sSNP location within the domain, however, may not be the only decisive factor that affects protein folding. Here we examine how codon usage changes could potentially alter mRNA structure given that mRNA structural changes can affect the translational rate and protein function [12,13]. Using RNAsnp software [45,46] to determine if any of the SNPs could potentially influence CFTR mRNA structure (Additional file 3: Table S2), we identified 8 SNPs as potential candidates. All 8 of the identified SNPs changed the mRNA secondary structure by introducing hairpin turns, or by reorganizing or removing them (shown in Additional file 4: Figure S1).

Predicted pathogenicity of the sSNPs using CADD
In the third type of analysis, we used a recently described tool for estimating the relative pathogenicity of human genetic variants based on the combined annotation dependent depletion (CADD) method [47]. CADD estimates the relative pathogenicity of variants based on annotations from a variety of sources, combining them into a single measure that is expressed as a C-score [47]. The calculated C-scores resemble results from both conservation-based metrics and subset-relevant functional metrics. Interestingly, the sSNP distribution in CFTR is characterized by significantly higher average C-scores than observed in the whole genome SNP distribution (Additional file 5: Figure S2) [47]. Examining this for CFTR, we considered only C-score values that were significantly higher than the whole genome sSNP mean (6.36866 + 3.701498 = 10.07) [47], and this led to the initial selection of 47 SNPs (Additional file 5: Figure S2; Additional file 6: Table S3). Interestingly, 4 of these sSNPs also showed up in the mRNA structural changes analyses, and one of these showed up in all 3 analyses, c.3472C > A (R1158), that is in the MSD2/NBD2 interface (Table 2).
Table 1 sSNPs selected for significant change in local codon bias motifs analyzed for 3-, 5- and 10-aa clusters. Columns: sSNP; CFTR domain; Impact on CFTR translation; Significant RSCU change when analyzed in 3 aa, 5 aa, 10 aa
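The C-score cut-off quoted above is a simple sum, and the filtering step can be sketched in a few lines. The second term (3.701498) is interpreted here as one standard deviation of the whole genome sSNP C-score distribution, which is an assumption. The variant names and scores are hypothetical placeholders, not CADD output.

```python
# C-score cut-off as quoted in the text: whole-genome sSNP mean plus a second term
GENOME_SSNP_MEAN = 6.36866
GENOME_SSNP_SPREAD = 3.701498  # assumption: one standard deviation
CUTOFF = GENOME_SSNP_MEAN + GENOME_SSNP_SPREAD


def above_cutoff(c_scores, cutoff=CUTOFF):
    """Return the variants whose C-score exceeds the cut-off."""
    return {variant for variant, score in c_scores.items() if score > cutoff}


# Hypothetical C-scores; real values would come from CADD annotation of each sSNP
toy_scores = {"variant_A": 12.4, "variant_B": 3.1, "variant_C": 10.07}
selected = above_cutoff(toy_scores)

print(f"cut-off = {CUTOFF:.2f}")
print(sorted(selected))
```

Keeping each analysis as a set of variant identifiers makes the final comparison straightforward: variants supported by the RSCU, mRNA structure, and CADD analyses together are the intersection of the three sets.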

Translation rates and protein folding
CFTR fits well into the classic model of protein folding that suggests that the transmembrane regions are translated slower than cytosolic regions. Indeed, if one takes a closer look at the composition of the CFTR transmembrane domains (Fig. 3), it is clear that the helices are formed very slowly, particularly as the amino acids leave the transmembrane region. MSD2 is formed slightly faster than MSD1 and both of these domains share common features. The last external loops of these transmembrane domains (EL3 and EL6) contain optimal codons whereas the last helices (6 and 6′), as well as the interface between the membrane spanning domains and the NBDs (MSD1/NBD1 and MSD2/NBD2) are composed of rare codons. It is clear that MSD1 formation and cytosolic loop assembly are crucial for both co- and posttranslational CFTR folding. Hence, the changes in codon bias in these regions, especially at helices entering or leaving the membrane, introduced by mutations or sSNPs could affect CFTR folding efficiency significantly and thus the levels and/or function of the mature protein.
Translation pauses are a general strategy that is employed in the co-translational folding of individual domains in a multi-domain protein. The time separation provided by the pause allows completion of the processes without interruption, thus helping to avoid problems in protein folding and aggregation [48]. Interfering with this process through codon changes introduced by sSNPs can lead to downstream effects. For example, changing codons has been shown in a number of cases to alter protein structure and function [1,3,39,49,50]. SNPs have been proposed to lead to alternate folding pathways through ribosome stalling, a lower concentration of cognate tRNAs (codon usage), or through alteration of the RNA structure [12,13,39,51]. Altered RNA secondary structures have been shown to influence the length of the pause cycles and the rate of translation [52]. Thus, mRNA structural-related changes in translational dynamics likely influence membrane integration and co-translational folding of multispanning membrane proteins like CFTR [21,53]. Our previous studies demonstrate that mRNA structural changes associated with the I507 SNP introduced by the F508del CFTR mutation result in a decreased translational rate of F508del CFTR [12]. Furthermore, a synonymous single nucleotide variant of the F508del CFTR (Ile507ATC), that reverts the I507 ATT triplet to the original ATC found in the wild type sequence, has wild type-like CFTR mRNA structure and enhanced expression levels when compared with native F508del CFTR [12]. More importantly, this substitution also affects the function of the protein [13] and sensitivity to drugs [11]. CFTR folding appears to be extremely complex (reviewed in [36]). The slow predicted translation rate for the MSDs makes sense given that homology models predict a complex domain swap structure of two six-spanning helical bundles containing transmembranes 1-2, 9-12 and transmembranes 7-8, 3-6 that are twisted around a central ion-conducting pore [36,54].
Furthermore, CFTR transmembrane helices contain a number of charged residues which may be important for this complex arrangement of TMDs, and this in combination with the hydrophobic amino acids and the non-optimal codons could slow down translation, and in doing so, provide the necessary time for the proper assembly for this complex, pore-forming structure. Hopefully, this type of in silico analysis and discussion provides a framework for where to begin analyzing synonymous polymorphisms and establishes the concept that these types of changes should not be overlooked in future genetic screens.

Prospects
How codon usage and mRNA structure affect protein translational rates is just beginning to be understood. Algorithms for mRNA structure predictions identify the lowest energy structure among a mixture of structures that certainly exist in equilibrium [45,55,56]. Even if the correct structure is identified, supporting biochemical evidence from circular dichroism studies or mRNA folding assays such as the SHAPE assay needs to be obtained to confirm the predictions [12]. This also means that clonal cell lines have to be established to test for these effects, and the mRNA and protein expression levels and stabilities need to be tested. In the case of I507-ATC->ATT, we found this synonymous codon change altered the mRNA structure and protein expression levels [12], increased the thermal stability and channel gating properties as monitored by whole-cell patch-clamp recordings and single channel recordings, respectively [13], and altered the channel's sensitivity to drugs [11]. In this particular case, the I507 sSNP exacerbated the effect of the F508del mutation. This suggests the intriguing possibility that other silent polymorphisms have the potential to exacerbate or even mollify disease-causing mutations, or in extreme cases, even be disease-causing themselves. Given the large number of silent polymorphisms found in most genes, and the amount of work required to determine if a polymorphism actually has any effect, bioinformatics approaches such as the ones described here will continue to be an important aspect of future studies that determine which silent polymorphisms alter protein expression and/or function.

Conclusions
An interesting aspect of these studies is the fact that the individual rates of translation of the different domains of CFTR are predicted to be very different and are consistent with the idea that the domains fold co-translationally. How these sSNPs actually affect the translational kinetics, however, can only be determined experimentally. An intriguing possibility, however, is that sSNPs, especially in combination with known mutations, could either exacerbate or mollify the severity of the mutation through their influence on the translational kinetics of the domain itself or within a domain-domain interface.