Genomic Evidence for Simultaneous Optimization of Transcription and Translation through Codon Variants in the pmoCAB Operon of Type Ia Methanotrophs

Microbial methane oxidation plays a fundamental role in the biogeochemical cycle of Earth’s system. Recent reports have provided evidence for the acquisition of methane monooxygenases by horizontal gene transfer in methane-oxidizing bacteria from different environments, but how evolution has shaped the coding sequences to execute methanotrophy efficiently remains unexplored. In this work, we provide genomic evidence that among the different types of methanotrophs, type Ia methanotrophs possess a unique coding sequence of the pmoCAB operon that is under positive selection for optimal resource allocation and efficient synthesis of transcripts and proteins. This adaptive trait possibly enables type Ia methanotrophs to respond robustly to fluctuating methane availability and explains their global prevalence.

to climate change (4). Microbial oxidation of methane also provides an entry point for methane into the global food web and can serve as a primary carbon source for large trophic systems (5,6). Methane oxidation is a globally distributed phenotype expressed in microorganisms from diverse taxonomic groups. Based on a range of phenotypic (e.g., metabolic pathways for carbon fixation, fatty acid compositions) and phylogenetic (e.g., Proteobacteria, Verrucomicrobia, NC10) features, methanotrophic bacteria can be categorized into seven major types (7): Ia, Ib, Ic, IIa, IIb, III, and the candidate division NC10. The distribution of marker gene sequences for the major methanotroph types suggests that they are differentially prevalent across environments (7). In bacteria, methane oxidation begins with two enzymatic steps where methane is first converted to methanol by a soluble or particulate methane monooxygenase (sMMO or pMMO, respectively), and then methanol is oxidized to formaldehyde by a pyrroloquinoline quinone-containing methanol dehydrogenase that can be calcium dependent (Mxa) or lanthanide dependent (Xox). The resulting formaldehyde can be directed to energy production or biomass synthesis (8). The functional sMMO is encoded in the six-gene operon mmoXYZBCD, while pMMO is encoded in the three-gene operon pmoCAB. Of the methanol dehydrogenases, the functional Mxa enzyme is encoded in the two-gene operon mxaFI and Xox is encoded by the gene xoxF.
In evolutionary history, methane oxidation appeared at around the same time as oxygenic photosynthesis, nitrogen fixation, nitrification, and denitrification (9), and it is possible that the emergence of methanotrophy occurred soon after the last universal common ancestor (10). Hence, evolution has likely shaped methanotrophs, with many as-yet-undiscovered properties (10). One unexplored question of fundamental importance to our understanding of methanotrophy is how the genes that encode the methane oxidation metabolic module have been shaped by evolution to efficiently execute methanotrophy, especially after recent reports have suggested that MMOs were potentially acquired by horizontal gene transfer in some types of methanotrophs (11)(12)(13)(14). In the three cellular domains and in viruses, evolution has selected gene sequences that perform cellular functions beyond just encoding the amino acid compositions of proteins. For example, gene sequences have nucleotide and codon variants that direct mRNA folding (15,16), transcript abundance (17), mRNA degradation (18), RNA toxicity (19), protein synthesis (20), and cotranslational protein folding (21), and they promote the interaction of peptides with the signal recognition particle (22). Gene sequences can also affect the cellular economy of protein synthesis (23), reduce the metabolic burden of nucleotide synthesis by incorporating less expensive nucleotides (24), and allocate resources required for transcription (25) and translation (26).
In this study, we performed an in-depth genomic meta-analysis to investigate the coding sequences of the genes that encode the methane oxidation metabolic module (as defined in the KEGG metabolic module M00174 [conversion of methane to formaldehyde]) in bacteria. We found that evolution has shaped the pmoCAB operon of type Ia methanotrophs with a unique coding sequence that optimizes resource allocation by reducing the biosynthetic costs of transcription and translation while ensuring translation efficiency and accuracy. This study provides novel insights into the molecular biology and evolution of methanotrophic bacteria and extends our understanding of the mechanisms developed by nature to sustain metabolism and life on Earth.
Type Ia pmoCAB coding sequences have an anomalous nucleotide composition bias. We measured the distribution of nucleotide composition biases of all coding sequences in each genome (Fig. S1b). The guanine-plus-cytosine (GC) content of every coding sequence was calculated, as well as the GC content in the third nucleotide of each codon (GC 3 content). In genomes of methanotroph types Ia, Ib, IIa, and IIb, the GC 3 contents of coding sequences tended to be higher than their GC contents, while in type III genomes, GC 3 contents tended to be lower than GC contents (Fig. S1b). Next, we analyzed the nucleotide composition biases of the 12 coding sequences in the four operons forming the methane oxidation metabolic module (KEGG M00174, methane to formaldehyde) (Fig. S1c). All coding sequences from all five types showed patterns of relative GC/GC 3 bias consistent with those of the whole genomes (Fig. S1), except for pmoCAB coding sequences of type Ia methanotrophs, where the GC 3 content was lower than the GC content (Fig. 1). The pxmABC and hao coding sequences, encoding a copper᎑containing membrane monooxygenase (13) and a hydroxylamine oxidoreductase (27), respectively, were selected as outgroups as they are present in a large number of methanotroph genomes but are not part of the set of genes that formed the focus of this study. As expected, these outgroup coding sequences showed a pattern of relative GC/GC 3 bias consistent with those of the whole genomes (Fig. S1c). As with the type Ia pmoCAB sequences, the type III coding sequences tended to have lower GC 3 contents than the GC contents ( Fig. S1c), consistent with the whole-genome pattern in type III genomes (Fig. S1).
The anomalous nucleotide composition of type Ia pmoCAB coding sequences provides an atypical codon usage bias. The anomalous nucleotide composition of type Ia pmoCAB coding sequences led us to hypothesize that this may generate an atypical bias in codon usage. To test whether codon usages differ among coding sequences in the methane oxidation metabolic module of each methanotroph, we first calculated the relative synonymous codon usage (RSCU) in every coding sequence (outgroups were also included). On ordination of the RSCU values of type Ia coding sequences (Fig. S2a), pmoCAB coding sequences clustered separately (Euclidean distance permutational multivariate analysis of variance [PERMANOVA], F ϭ 36.85, P Ͻ 0.01), and the effect of the source operon in the separation was strong (R 2 ϭ 0.38). A reduced effect of the source operon was observed in the clustering pattern for coding sequences of types Ib (F ϭ 3.82, P Ͻ 0.01, R 2 ϭ 0.21) and IIa (F ϭ 12.67, P Ͻ 0.01, R 2 ϭ 0.24), while weak effects (R 2 Ͻ 0.2) or nonsignificant probabilities (P Ն 0.01) were observed for types IIb and III (Fig. S2).
This result indicates that the type Ia pmoCAB coding sequences have an atypical codon usage bias compared to the codon usages of other coding sequences in the methanotrophy module. We hypothesized that this may be an adaptation to allow for higher expression of these genes. To test this, we calculated a codon adaptation index (CAI) that compared the codon usage of a coding sequence to the codon usage of a reference set of sequences from the same genome. The first CAI, named CAI genome , used the codon frequency of all coding sequences in the genome as a reference, while the second CAI, named CAI ribosome , used the codon frequency of only ribosomal protein genes as a reference. The CAI ribosome thus served as a proxy to represent the codon usage of highly expressed genes. The comparison between the CAI genome and CAI ribosome showed that mxaF coding sequences tended to have codon usages similar to those of both reference sets in all types of methanotrophs (Fig. 2a), which was also observed for mxaI and xoxF coding sequences in type Ib genomes. Type Ia mxaI coding sequences tended to deviate from the codon bias of ribosomal protein genes and toward the genome-scale codon usage pattern, while the opposite was observed for type IIa and IIb mxaI coding sequences. Type Ia xoxF coding sequences appeared in the top percentile ranks of the CAI ribosome but below the 50th percentile ranks of the CAI genome . A similar but much stronger pattern was observed in type Ia pmoCAB coding sequences, which appeared in the bottom 10% of CAI genome values but in much higher percentile ranks for the CAI ribosome (Fig. 2a). The results for all coding sequences in the methane oxidation module and outgroups are given in Fig. S3a. Overall, the extreme deviation of type Ia pmoCAB coding sequences from the genomic norm shows that they have a codon usage more similar to that of the ribosomal protein-coding sequences. This deviation of codon usage from the genomic signature may be due to a higher codon usage bias in pmoCAB coding sequences. To test this, we computed the effective number of codons (ENC) of every coding sequence in all type Ia methanotrophs. The ENC measures the number of synonymous codons used to encode amino acids. Thus, the ENC is 61 if a coding sequence uses all codons equally to produce all 20 standard amino acids, whereas the ENC is 20 if only one codon is used per amino acid (i.e., it has a high codon usage bias). Contrarily to our hypothesis, the ENC of type Ia pmoCAB coding sequences was not significantly different (two-tailed Wilcoxon signed-rank test, 99% confidence level) from those of mxaFI and xoxF coding sequences (Fig. 2b), while the ENC of pmoCAB coding sequences was significantly lower than those of coding sequences of ribosomal proteins and whole genomes (Fig. S3b). Overall, this indicates that even though the codons in coding sequences of pmoCAB and ribosomal proteins tend to be similar (Fig. 2a), the codon usage bias of pmoCAB coding sequences is higher. We hypothesized that if the type Ia pmoCAB coding sequences have a codon usage bias similar to those of other coding sequences in the methane oxidation metabolic module ( Fig. 2b) but their codon usage frequencies are divergent (Fig. S2a), there should be a clear association between the level of codon bias (Fig. 2b) and the anomalous nucleotide composition bias (Fig. 1). We found that type Ia pmoCAB coding sequences increase their level of codon bias at the expense of decreasing the GC 3 content (Fig. 2c), which contrasts with what occurs with all other methanotroph types and coding sequences tested (Fig. S3c). The pmoCAB coding sequences of type III show patterns similar to those of type Ia, but this is expected based on the whole-genome nucleotide bias (Fig. S1). Codon biases in methanotrophy operons optimize translation efficiency. The anomalous nucleotide composition bias of type Ia pmoCAB coding sequences suggests that there is a fitness advantage either to the particular codons selected or to having a lower codon diversity. We hypothesized that this advantage may lie in increased translation efficiency. To evaluate this, we first analyzed the encoded tRNA pool of all isolate genomes. MAGs were not included in this analysis, as their tRNA pool might be partial due to incomplete genome assembly. We found that the main difference among methanotrophs is that the type Ia genomes lack tRNAs with cytosine at the 3= end of the anticodon (Fig. S4a). Taking wobble pairing into account, we queried whether the codon usage bias is an optimization strategy in which type Ia pmoCAB codon usage has been selected to better match available tRNA isoacceptors and thereby improve translation efficiency by recruiting the most abundant tRNAs. To address this question, we estimated the translation efficiency of coding sequences using the tRNA adaptation index (tAI). The tAI considers exact and wobble pairing to estimate how efficiently a given coding sequence will be translated based on its codon usage and the genomic tRNA pool. We found that even though the preferences for individual codons are different between type Ia pmoCAB, mxaF, and xoxF coding sequences, their translation efficiencies are similarly ranked in around the top 10th percentile of each genome; the only coding sequence showing a significant difference from pmoC is mxaI (t test, 99% confidence level) (Fig. 3a). We included an analysis to test the difference between the pmoC coding sequence and outgroups (Fig. 3a) and found that, in this case, the percentile ranks of tAIs are significantly different (t test, 99% confidence level). The tAI results of other coding sequences and methanotrophs can be found in Fig. S4b.
Overall, these results suggest that for type Ia methanotrophs, the translation efficiencies of pmoCAB, mxaF, and xoxF coding sequences are similarly high (tAIs Ն 90th percentile rank) (Fig. 3a), and these coding sequences have been optimized in parallel through different codon usage biases. A possible evolutionary explanation for these observations is that the distinct codon usage bias of each coding sequence causes these genes to segregate between different tRNA isoacceptors, allowing simultaneous translation of the coding sequences required for methane oxidation while avoiding competition for the same tRNAs. To investigate this, we reconstructed the protein synthesis network accounting for the distribution of tRNAs among coding sequences of the metabolic module (Fig. 3b). We also included in the network the median RSCU value of each codon per coding sequence of type Ia methanotrophs ( Fig. S4c and d) so that we can account for the codon preference and its interaction with tRNAs through different pairing mechanisms. The median codon usage biases of the whole genome and coding sequences of ribosomal proteins (n ϭ 1,186; median number of ribosomal proteins per genome, 55; standard deviation, 3.6) were integrated in the network to compare the links between codons and tRNAs outside the methane oxidation module (details on the tRNA pool of each genome are given in Fig. S4e). The resulting networks ( Fig. 3b; see Fig. S5a for networks for all amino acids) did not support the hypothesis that the codons favored by each operon provided differential access to tRNA isoacceptors. We also analyzed how access to different tRNA isoacceptors changes with variations in individual codon preferences of coding sequences (Fig. 3c). We found that the different codon usage biases do not provide the operons with access to significantly different fractions of the tRNA pool (chi-square test for equality of proportions, P ϭ 0.87). Taken together, these results suggest that the codon usage bias of type Ia pmoCAB coding sequences does not provide a fitness advantage either in translation efficiency or in differential access to different tRNA isoacceptors compared to those of the mxaF and xoxF coding sequences. Despite the differences in codon usage between operons from the methane metabolic module, simultaneous translation of these operons would still require competition for the same tRNA isoacceptors.
Type Ia pmoCAB coding sequences optimize translation accuracy. We hypothesized that the divergent codon bias of pmoCAB coding sequences might be a strategy to optimize translation accuracy. This strategy prevents the incorporation of incorrect amino acids in the nascent protein (28, 29) by reducing the binding probability of near-cognate tRNA isoacceptors (i.e., tRNAs that differ by 1 base from the correct anticodon) for the ribosomal A site (30). Such an optimization for accuracy can be achieved at the coding sequence level through specialization of the codon-anticodon interaction in which an amino acid is encoded by a frequent and optimal codon that is recognized by an abundant cognate tRNA (31). We hypothesized that if translation accuracy is a selective pressure acting on a coding sequence, the positive selection for an optimal codon (or the purifying selection acting on nonoptimal codons) should increase as that codon's amino acid is increasingly used in the encoded protein.
To test this, we first calculated the amino acid usage of each protein (Fig. S5b) and mapped the RSCU with the highest median per amino acid (Fig. S4c) to the median frequency of amino acid usage. Finally, we used a linear regression to analyze the relationship between the codon preference in coding sequences and the amino acid usage frequency of their proteins. Consistently with our translational accuracy hypothesis, we found that among all methanotrophs and all coding sequences of the methane oxidation metabolic module, only the type Ia pmoCAB coding sequences showed a significant linear relationship (P Ͻ 0.01) (Fig. 3d and Fig. S5c). This is especially notable for the two amino acids with the highest codon degeneracy (i.e., the 6-fold Leu and Arg).
Minimizing protein synthesis cost through favoring of prebiotic amino acids. The use of the CAI, ENC, RSCU, and tAI enabled us to investigate the role of synonymous codons in optimization for translation efficiency and accuracy. However, as these measures account for amino acid usage biases, optimization strategies at the protein sequence level can be masked. We hypothesized that if translation optimization at the protein sequence level is achieved through reducing the synthesis cost of proteins (32,33), there should be an increased usage of the prebiotic (less expensive) amino acids (34) in the type Ia pMMO sequence. To test this, we first visually inspected the occurrence of prebiotic and modern amino acids in the linear regression between the RSCU and amino acid usage (Fig. 3d) and then quantitatively compared the median compositions of less expensive and expensive amino acids in each protein (Fig. 3e). We observed that the usage of prebiotic amino acids is significantly higher than that of modern (expensive) amino acids in pMMO, sMMO, and Xox and that Mxa showed a nonsignificant difference (t test, 99% confidence level) (Fig. 3e).
The codon usage of pmoCAB coding sequences results in an effective resource allocation strategy to synthesize highly demanded transcripts. While the codon usage and nucleotide composition biases of coding sequences in the type Ia methane oxidation module do not seem to provide an advantage in translation efficiency, it is possible that they provide an efficiency advantage at the level of transcription. Previous reports on methanotrophic species (35)(36)(37) and communities (38)(39)(40) have shown that pmoCAB coding sequences are consistently highly expressed regardless of the experimental conditions, and the demand on cellular resources to transcribe pmoCAB coding sequences may be significant. To test whether the biases confer increased transcriptional efficiency, the transcriptomic data sets of three different type Ia methanotrophs obtained under different experimental conditions were examined. For illustration purposes, the results of Methylobacter tundripaludum 31/32 grown with methane as the sole carbon source and supplemented with lanthanum (37) are described here. Detailed results with the other methanotrophs and experimental conditions can be found in Fig. S6.
To test the transcription optimization hypothesis, we first analyzed the purine and pyrimidine contents of coding sequences in each genome and found that purinederived residues occurred more frequently (52%) than pyrimidine nucleotides (Fig. S6a). We postulated that, if pmoCAB coding sequences are optimized for efficient transcription, they should have a higher pyrimidine than purine content, which reduces the mRNA synthesis cost, as pyrimidines have fewer carbon and nitrogen atoms per molecule (Fig. S6a). We found that pmoCAB were the most expressed transcripts and that they have a higher pyrimidine than purine content (Fig. S6b), contrasting with the xoxF and mxaFI coding sequences (Fig. S6b) and the genome-scale frequency (Fig. S6a). We calculated the mean pyrimidine content of 1,000 random combinations of three coding sequences (Fig. S6c) and found that Ͻ3.5% of the combinations had pyrimidine contents higher than those of the pmoCAB coding sequences.
To understand the effect of pmoCAB nucleotide frequency bias on cellular ribonucleotide composition, we analyzed the effect on the total ribonucleotide composition if we removed all the transcripts of an individual operon from the transcriptome. We calculated the total ribonucleotide composition by summing the ribonucleotide compositions of all transcribed coding sequences in the transcriptome. We found that removing mxaFI and xoxF transcripts had a negligible effect on the total ribonucleotide composition (Fig. S6d to f). However, removing pmoCAB transcripts caused a shift of ϳ1% in the purine/pyrimidine balance, a shift of Ͼ2% in the GC content, and a shift of ϳ2% in uracil content (Fig. S6d to f).
If the nucleotide composition bias of pmoCAB coding sequences is an adaptation to reduce transcript synthesis cost, it is expected that the atomic element investment per codon would be reduced in transcribed pmoCAB coding sequences compared to those of others in the methane oxidation metabolic module and to that of the whole-genome average. To evaluate this, we calculated the atomic element compositions of each transcribed coding sequence in the metabolic module and of all transcribed coding sequences in the genome based on their adenine, guanine, cytosine, and uracil compositions. We found that pmoCAB transcripts have fewer carbon, hydrogen, and nitrogen atoms per codon than other transcripts from the methane metabolic module and the genome-scale mean ( Fig. 4a and b). However, although the codon usage of pmoCAB transcripts requires reduced investment of carbon, nitrogen, and hydrogen, this comes at the expense of increased oxygen demand, with the oxygen demand exceeding the genome-scale mean (Fig. 4b). We selected 1,000 random combinations of three transcribed coding sequences from the whole genome, calculated the percodon element compositions, and compared them to the demand of the transcribed pmoCAB coding sequences ( Fig. 4a and b). We found that Ͻ3% of the combinations of transcribed coding sequences yielded a mean element composition below that of the transcribed pmoCAB coding sequences for nitrogen, carbon, and hydrogen or above that for oxygen. This pattern was even more pronounced for Methylomicrobium alcaliphilum 20Z (Ͻ0.4%) (Fig. S6g and h) and Methylomicrobium buryatense 5G (Ͻ1.5%) ( Fig. S6g and h). Next, we calculated the distance (Δ) between the mean carbon and oxygen atom counts per codon for methane oxidation metabolic module transcripts and all transcripts from each genome. We found that the pmoCAB transcripts had low carbon and high oxygen compositions per codon (Fig. 4c). Finally, we assessed whether the elemental composition of the pmoCAB operon is a result of a genome-wide trend to reduce the number of atoms per codon as a function of mRNA abundance. Rejecting this hypothesis, we found only weak correlations between elemental content and mRNA abundance (Fig. S6i). Together, these results suggest an optimization strategy unique to type Ia pmoCAB coding sequences whereby highly demanded transcripts use fewer carbon, nitrogen, and hydrogen atoms at the expense of more oxygen atoms per codon. Given the high oxygen requirement for the synthesis of pmoCAB transcripts, we hypothesized that type Ia genomes may harbor a genetic regulatory mechanism adjusting pmoCAB transcription in response to oxygen availability. In support of this, we identified proteins containing the PAS domain which can potentially act as oxygen sensors (41), located near the pmoCAB coding sequences in the type Ia genomes of the genera Methylosarcina, Methylomonas, Methylomicrobium, and Methylomarinum (about 10 coding sequences upstream) and the species Methylobacter tundripaludum 31/32 (5 coding sequences downstream) (Fig. S7).

DISCUSSION
This study provides genomic evidence that the pmoCAB coding sequences of type Ia methanotrophs possess a unique adaptive trait manifesting as a strong nucleotide bias (Fig. 1) that fine-tunes codon usage (Fig. 2) and can optimize methane oxidation through maximizing translation efficiency and accuracy (Fig. 3), while minimizing protein (Fig. 3) and transcript (Fig. 4) synthesis costs. The discovery of this unique coding sequence was enabled by meta-analysis of a large number of isolate genomes and MAGs from diverse taxonomies and geographies (Fig. S1). This finding illustrates a sophisticated adaptive linkage between molecular genotype and phenotype for methane oxidation. Given the high metabolic modularity of methanotrophy (10,42), the methods presented here can be applied to probe the molecular strategies encoded in other functional modules (e.g., copper uptake; mbnABCM, corAB, and copCD) that are of particular relevance to our understanding of methanotrophs.
From an environmental point of view, this adaptive trait enables type Ia methanotrophs to be competitive and efficient in oxidizing methane, which is a gas with limited solubility and mass transfer in liquid (43). Past reports on the molecular optimization of genetic sequences have shown that transcribed regions of genes can have fine-tuned nucleotide compositions that reduce their biosynthetic cost (24). For example, DNA coding sequences are found to reduce the number of nitrogen atoms per codon when nitrogen is scarce in the environment (44), and a large proportion of bacterial genomes present a dual-optimization strategy to reduce per-codon nitrogen demand while increasing translation efficiency (45). Extending these findings, we show that type Ia pmoCAB coding sequences possess an adaptation that minimizes not only nitrogen but also carbon and hydrogen per transcribed codon, while reducing the metabolic burden of erroneous biosynthesis (30) through optimal translation efficiency and accuracy. This optimization strategy may increase the fitness of type Ia methanotrophs in the competition for methane and might contribute to the prevalence of type Ia methanotrophs across diverse habitats with various environmental conditions (7). However, given that the number of sequenced genomes available for other types of methanotrophs is limited, we might find alternative optimization strategies at the molecular level with future sequencing projects. Nonetheless, the transcript optimization strategy found in type Ia pmoCAB coding sequences results in an increased demand for oxygen atoms per codon. The high demand for oxygen to synthesize pmoCAB transcripts may be part of a mechanism to modulate cellular metabolism based on oxygen availability in the environment and to tightly regulate the first enzymatic step of the methane oxidation metabolic module at the transcriptional level. This is supported by the presence of genes encoding the PAS domain proteins, potentially acting as oxygen sensors (41), flanking the pmoCAB coding sequences in the genomes of several type Ia methanotrophs (Fig. S7).
In the second enzymatic step of the methane oxidation metabolic module, methanol is oxidized to formaldehyde by the methanol dehydrogenase encoded by either the mxaFI or xoxF gene. We show here that the type Ia pmoCAB, mxaFI, and xoxF coding sequences have similar translation efficiencies (Fig. 3a), similar levels of access to tRNA isoacceptors ( Fig. 3b and c), and similar levels of codon bias (Fig. 2b); however, their preferences for individual codons (Fig. S2a) and translation accuracy (Fig. 3d) differ. Considering only the genes for the second enzymatic step, the difference in codon usage between the mxaFI and xoxF coding sequences reduces the xoxF transcripts' demand for carbon, nitrogen, and hydrogen atoms per codon (Fig. 4). We also show that protein synthesis has been optimized through the frequent incorporation of prebiotic (inexpensive) amino acids in the pMMO and Xox protein sequences, though not in Mxa (Fig. 3e). Together, these results indicate that although the mxaFI and xoxF coding sequences have similar translation efficiencies, the xoxF coding sequence reduces transcript and protein synthesis costs (Fig. 5), suggesting that Xox has evolved to be the predominantly expressed methanol dehydrogenase. Overall, the results in this study show that the pMMO-Xox configuration of the methane oxidation module should be the most efficient molecular strategy to catalyze the sequential oxidation of methane and methanol (Fig. 5). This finding complements the results of recent investigations of isolates (46) and communities (47) by providing a molecular basis to explain the prevalent expression of pmoCAB and xoxF coding sequences. Future investigations focusing on the implications of the pmoCAB and xoxF coding sequences in the context of ecogenomics and ecophysiology will be required.
Genes encoding pMMO have been scrutinized for nucleotide composition bias (48,49), and it has been proposed that they have evolved independently following an ancient speciation event (50). An earlier work indicated that methanotrophy in Alphaproteobacteria was a result of horizontal gene transfer of MMOs (11). In fact, recent evidence supports the horizontal transfer of MMO genes to methanotrophs in Proteobacteria (12)(13)(14). In the context of the literature, the most parsimonious explanation for our observations is that pmoCAB genes were acquired through an ancient horizontal gene transfer event and subsequently subjected to strong selective pressure that restricts gene amelioration (51) to the nucleotide composition of the type Ia genomes, which preserves a foreign codon usage bias that optimizes the transcriptional and translational machinery (52).
Fundamentally, our results emphasize that the type Ia pmoCAB coding sequences have been shaped by natural selection and are not the result of random genetic drift even at the level of synonymous codons. The ecological role of methanotrophs as the primary producers in complex communities (5,6) can exert strong evolutionary pressures on pMMO. Efficient conversion of methane to methanol not only serves as the first enzymatic step for methanotrophs but also provides methanol as the carbon source for methylotrophic bacteria (37). The availability of methanol can also support methane oxidation in methanotrophs that have a low affinity for methane when methane concentration is at atmospheric levels (53). Consequently, the unique adaptations of the type Ia pmoCAB coding sequences are likely the result of molecular evolution at the organismal, community, and perhaps even planetary scale (54).

MATERIALS AND METHODS
Detailed description of the methods used in this study is available in Text S1 in the supplemental material.
Methanotroph genomes. Of the 59 methanotroph genomes analyzed in this study, 47 are isolates and 12 are metagenome-assembled genomes (MAGs) (see Table S1 for details). This collection included most of the isolate genomes and relevant methanotrophic metagenomic data sets available in the public domain as of October 2017. Genome annotations were compiled from the JGI/IMG (55), NCBI/RefSeq (56), and KEGG/BlastKOALA (57) databases. The phylogenies of the genomes were assigned by PhyloPhlAn (58).
Metagenomic analysis pipeline. Briefly, raw reads of publicly available metagenomic data sets were obtained from the NCBI. Quality control of raw reads was performed with illumina-utils (59). Contigs were coassembled with MEGAHIT (60), and binning was performed with MaxBin (61). The quality of MAGs was assessed with CheckM (62) and MAGs were categorized according to the quality standards defined in the article "Minimum Information about a Single Amplified Genome (MISAG) and a Metagenome-Assembled Genome (MIMAG) of Bacteria and Archaea" (63). Annotation of MAGs was conducted with Prokka (64) and eggNOG-mapper (65).
Sequence analysis. Nucleotide composition bias was analyzed with seqinR (66) in R. Codon usage metrics were formulated in R according to the equations described for the RSCU (67) and CAI (67). tAIs were calculated with codonR (68). ENCs (69) were calculated with chips in EMBOSS (70). The tRNA data sets required for the reproduction of tAI calculations can be found in Table S2. The network data set was generated using in-house scripts. The network matrix was imported and edited with Cytoscape (71). Whole-genome analyses of codon metrics were conducted as detailed elsewhere (72).
Data availability. Scripts required to reproduce all the results and figures can be obtained from https://github.com/PLeeLab/methane_oxidation_genetic_trait/.