The predominance of nucleotidyl activation in bacterial phosphonate biosynthesis

Phosphonates are rare and unusually bioactive natural products. However, most bacterial phosphonate biosynthetic capacity is dedicated to tailoring cell surfaces with molecules like 2-aminoethylphosphonate (AEP). Although phosphoenolpyruvate mutase (Ppm)-catalyzed installation of C-P bonds is known, subsequent phosphonyl tailoring (Pnt) pathway steps remain enigmatic. Here we identify nucleotidyltransferases in over two-thirds of phosphonate biosynthetic gene clusters, including direct fusions to ~60% of Ppm enzymes. We characterize two putative phosphonyl tailoring cytidylyltransferases (PntCs) that prefer AEP over phosphocholine (P-Cho) – a similar substrate used by the related enzyme LicC, which is a virulence factor in Streptococcus pneumoniae. PntC structural analyses reveal steric discrimination against phosphocholine. These findings highlight nucleotidyl activation as a predominant chemical logic in phosphonate biosynthesis and set the stage for probing diverse phosphonyl tailoring pathways.

W ith stability and bioactivity imparted by a characteristic carbon-phosphorus bond, phosphonates are unusual and relatively unexplored biological molecules. Since the discovery of 2-aminoethylphosphonate (AEP) in nature six decades ago 1 , phosphonates and phosphinates have gained recognition as a small but commercially successful class of natural products exemplified by the antibiotic fosfomycin and the herbicide phosphinothricin ( Fig. 1a) 2,3 . Although exploration of these molecules and their biosynthesis is still in its infancy, bioinformatic analyses have revealed widespread genetic capacity to produce chemically diverse phosphonates [4][5][6] . For example, 5% of sampled microbes encode phosphoenolpyruvate mutase (Ppm) 4 , which catalyzes rearrangement of phosphoenolpyruvate (PEP) to the C-P bond-containing product phosphonopyruvate (PnPy) (Fig. 1b) 7 . Despite this predicted ubiquity, natural product classes like polyketides and nonribosomal peptides have been far more intensively investigated, and the relative dearth of characterized phosphonates partly reflects technical challenges associated with high polarity and lack of chromophores 8 . However, the few phosphonate biosynthetic gene clusters that have been characterized have revealed unusual enzymes catalyzing unprecedented chemical transformations 9 . These findings highlight the gap between current biosynthetic knowledge and future discovery potential, and motivate continued investigation of biological phosphonates.
Even more overlooked than freely diffusible small molecule phosphonates are those that modify, or tailor, cell surfaces. Comprehensive bioinformatic analysis of microbial ppm gene distribution predicted that most putative phosphonate biosynthetic gene clusters encode cell wall phosphonoglycans and phosphonolipids 4 . Although a handful of phosphonylated glycans and lipids have been structurally identified-such as the AEP-modified capsular polysaccharide from Bacteroides fragilis 10 (Fig. 1c)-almost nothing is known about their biosynthesis or biological roles. Modulating biological function by adding or removing small molecules on cell surface materials has long been recognized as a key mechanism of bacterial adaptation and virulence. Common cell surface modifications include the addition of acetyl 11 , pyruvyl 12 , phosphoethanolamine 13,14 , and phosphocholine (P-Cho) [15][16][17] groups to lipids and glycans. Indeed, P-Cho is a virulence factor encoded by the Lic pathway in pathogens, such as Streptococcus pneumoniae 15,18 , and lic gene disruption impedes P-Cho decoration of teichoic acids and attenuates S. pneumoniae virulence [19][20][21] . Understanding the biosynthetic logic of these modifications therefore represents an important goal with therapeutic implications, as phosphonates may similarly modulate virulence.
To address the gap between bioinformatic prediction and biosynthetic knowledge, we are characterizing several biosynthetic gene clusters predicted to encode cell wall phosphonoglycans or phosphonolipids. For instance, guided by the presence of ppm genes, we previously identified gene expression and/or phosphonate production from anaerobic bacteria Atopobium rimae, Treponema denticola, and Olsenella uli 22 , each of which have been associated with disease of the human oral cavity [23][24][25] . Commonly associated with ppm and found in these gene clusters are ppd and aept, which respectively, encode PnPy decarboxylase (Ppd) and AEP transaminase (AEPT) to generate AEP (Fig. 1b) 26,27 . Unfortunately, additional genes commonly found in predicted phosphonyl tailoring gene clusters have not been described, and the biosynthetic steps leading to phosphonyl modifications remain unknown (Fig. 1). Herein we reveal widespread occurrence of nucleotidyltransferase-encoding genes in phosphonate biosynthesis and characterize two representative enzymes as phosphonatespecific cytidylyltransferases from the Gram-positive actinobacterium A. rimae and the Gram-negative spirochete T. denticola. These enzymes efficiently activate AEP, presumably for subsequent capture by carbohydrate or lipid nucleophiles. We propose the phosphonyl tailoring (pnt) nomenclature for these gene clusters, and specifically use PntC for AEP cytidylyltransferase based on its sequence and biosynthetic similarity to LicC of the P-Cho tailoring (Lic) pathway. Indeed, in contrast to PntC, LicC from S. pneumoniae (Spn-LicC) strongly prefers P-Cho over AEP as a substrate for cytidylyl activation. Structure determination of PntC from T. denticola (Tde-PntC) and molecular dynamics analyses indicate that PntC selectivity for the smaller AEP substrate is primarily driven by active site steric constraints. Overall, identifying this remarkably widespread family of phosphonate-specific PntC cytidylyltransferases sets the stage for deciphering the biosynthetic roles of CMP-phosphonate conjugates and the biological consequences of this underexplored cell surface chemistry.  Fig. 2b). PF12804 is exemplified by Spn-LicC 29 , and PF01467 by the Fom1 CyTase 28 . This predominance of nucleotidyltransferase domains fused to Ppm enzymes was unexpected and supports a co-evolutionary trajectory of the two catalytic activities. In bacteria, co-evolution of functionally related genes is generally observed as colocalization, or clustering, on the chromosome 30 . A compilation of calculated distances, in number of genes, between ppm and the nearest gene encoding either PF12804 or PF01467 clearly illustrated a tendency for these genes to be in close proximity (Fig. 2c). For instance, 59.3% of all nearest nucleotidyltransferases were fused to ppm (distance = 0 genes). The next most frequent distance was two genes away, and about two-thirds (67.7%) of nucleotidyltransferases were within five genes of ppm (Fig. 2c inset). Note that these calculations identify only the nearest nucleotidyltransferase in a given gene cluster and therefore do not account for gene clusters possessing more than one nucleotidyltransferase. For example, the gene clusters in Fig. 3c possess nucleotidyltransferases that are both ppm-fused as well as one or two genes away. In these cases only the nearest nucleotidyltransferase would be tabulated, which for all three would be the fused domain at a distance of zero. In summary, this proximity bias indicates a functional relationship between Ppm and nucleotidyltransferases, and strongly implies an important role for the latter in phosphonate biosynthesis.

Nucleotidyltransferases
Interestingly, many ppm-associated nucleotidyltransferases are annotated as LicC-like cytidylyltransferases, which provides an important clue regarding possible biosynthetic function. The lic (lipopolysaccharide core) operon catalyzes choline import, phosphorylation and attachment of P-Cho to a glycan substrate destined for the cell wall 18,31 . Mirroring the chemical logic of the eukaryotic Kennedy pathway, a LicC cytidylyltransferase catalyzes formation of CDP-choline from CTP and P-Cho (Fig. 3b). LicC from S. pneumoniae (Spn-LicC) has been structurally and biochemically characterized 29,32,33 , and gene disruption attenuated virulence in a mouse model 21 . Overall, the similarities between P-Cho and AEP and genomic proximity of ppm and licC-like genes suggested that phosphonates may be activated as CMP-conjugates in a manner analogous to P-Cho activation by LicC (Fig. 3a).
PntC enzymes preferentially activate AEP. Intrigued by the genomic proximity of ppm and licC-like cytidylyltransferases, we sought to characterize the putative phosphonate-specific nucleotidyltransferases Oul593, Ari1348 and Tde1415. We observed transcription of oul593 and ari1348 in pure cultures (Supplementary Fig. 1), and previously reported chemical shifts consistent with phosphonates via solid-state 31 P NMR analysis of T. denticola and O. uli whole cells 22 . Furthermore, each genomic neighborhood encodes AEP production and biosynthetic machinery associated with cell wall glycans, such as LicD-like phosphotransferases or glycosyltransferases (Fig. 3c). In fact, both A. rimae and O. uli possess lic genes relatively nearby on the chromosome. The Ppm protein sequences also phylogenetically cluster into phosphonoglycan/lipid clades 4,22 , which further suggests that these phosphonates are directed to the cell surface. To evaluate the substrate preference of putative phosphonate nucleotidyltransferases relative to Spn-LicC, we expressed and purified Ari1348, Tde1415, and Spn-LicC, but Oul593 was insoluble ( Supplementary Fig. 2). Dynamic light scattering (DLS) analysis indicated monomeric quaternary structures for Spn-LicC and Ari1348 ( Supplementary Fig. 3), as previously observed for Spn-LicC 33 . In contrast, Tde1415 has a predicted monomeric molecular weight of~70 kDa with DLS data indicating a dimer of 146 kDa. Cytidylyltransferase activity detected using an HPLC assay revealed time-dependent formation of CDP-choline (CDP-Cho) and CMP-AEP by Spn-LicC and Ari1348, respectively (Fig. 4a). CDP-Cho was verified by comparison to commercially sourced authentic standard and high-resolution mass spectrometry (expected mass, 488.1073; observed, 488.1071). CMP-AEP was identified based on high-resolution mass spectrometry and  . Overall, these results clearly demonstrated Ari1348-catalyzed formation of CMP-AEP and imply shared biosynthetic logic for P-Cho and phosphonate tailoring pathways (Fig. 3).
Although activity was observed for each enzyme with its proposed substrate, a comparison of substrate preference further supported our cognate substrate assignments and prompted the adoption of new nomenclature to describe this class of phosphonate-specific enzyme. Previous work revealed a 2700fold higher specificity constant for Spn-LicC towards P-Cho relative to phosphoethanolamine (Table 1), but activity towards phosphonate substrates was not examined 33 . Our HPLC assays revealed poor cross-reactivity with non-cognate substrates (Fig. 4b). Specifically, Ari1348 clearly preferred AEP over P-Cho under the same conditions, signifying phosphonate cytidylyltransferase activity. In addition, both Ari1348 and Tde1415 exhibited strong activity towards CTP, limited activity towards ATP, and no activity detected for GTP ( Supplementary Fig. 8). Similar limited reactivity towards ATP was previously observed in the Fom1 CyTase domain 28 and Spn-LicC 32,33 . Based on phosphonate preference and functional analogy to LicC, we propose phosphonyl tailoring cytidylyltransferase (PntC) terminology to distinguish these enzymes from P-Cho-specific LicC cytidylyltransferases (Fig. 3).  The strict substrate specificity implied by the HPLC analyses of Fig. 4b prompted quantitative investigation via steady-state kinetic analysis. Using a coupled assay to detect pyrophosphate release 34,35 , our initial rate measurements of Spn-LicC towards P-Cho with saturating concentrations of CTP revealed a k cat of 1.06 ± 0.04 s −1 and K M of 2.36 ± 0.65 µM, affording a k cat /K M comparable to previous reports 29,32,33 (Table 1, Supplementary  Fig. 9). In contrast, AEP as a substrate yielded k cat and K M values of 0.723 ± 0.054 s −1 and 318 ± 129 µM, respectively, revealing a 200-fold preference for P-Cho over AEP. Ari-PntC (Ari1348) exhibited the opposite preference, with a 440-fold higher specificity constant for AEP versus P-Cho (Table 1, Supplementary Fig. 9). Similarly, Tde-PntC (Tde1415) was not sufficiently active towards P-Cho to obtain steady-state kinetic data, but activity towards AEP was similar to Ari-PntC (Table 1). Overall, clear preference for AEP as a substrate supports assignment of these enzymes as phosphonate-specific cytidylyltransferases, or PntCs.
Molecular determinants of specificity are primarily steric. Structural studies were undertaken to understand PntC substrate selectivity for AEP versus P-Cho. In contrast to Ari-PntC, the larger two-domain protein Tde1415 was readily crystallized to afford a 2.72 Å structure of the apo enzyme and a 1.95 Å structure of the product complex obtained in the presence of CTP and AEP substrates (Supplementary Table 1). As predicted from gene annotations (Fig. 3c), the overall structure of the enzyme included an N-terminal PntC connected to a C-terminal AEPT domain ( Supplementary Fig. 10). The classic type I aminotransferase fold of the AEPT domain is very similar to AEPT from Salmonella typhimurium, which is the only other AEPT structure known 36 . Interestingly, neither of these structures possesses a covalently linked pyridoxal 5′-phosphate (PLP) cofactor, and the colorless Tde1415 solution was consistent with a missing internal aldimine chromophore. To our knowledge, these two enzymes and the alanine-glyoxylate aminotransferase from Anabaena 37 are the only crystallographic examples of non-covalently bound PLP. Nonetheless, aminotransferase activity was detected as 31 P NMR chemical shifts consistent with the transformation of AEP to phosphonoacetaldehyde (PnAA, Supplementary Fig. 11) and confirmed by mass spectrometry (Supplementary Fig. 12). The AEPT domain of Tde1415 possesses PLP and the requisite active site features to support classical aminotransferase catalysis ( Supplementary Fig. 13).
The crystal structure of the PntC domain of Tde1415 in complex with the CMP-AEP product possesses a binding orientation reminiscent of the previously reported Spn-LicC: CDP-Cho structure (PDB 1JYL, Fig. 5) 29 . Strikingly, the presence of two adjacent magnesium ions in the Tde-PntC active site contrasts with the single ion observed in the Spn-LicC complex, and each apo-enzyme has one less metal ion. Consistent with our assignment of the metals as magnesium ions, the temperature factors increased from 31.3 to 62.7 Å 2 when Zn 2+ was included in crystallographic refinements. In addition, EDTA treatment inactivated Tde1415, but activity was restored by the addition of Mg 2+ or Zn 2+ but not Ca 2+ (Supplementary Fig. 14). Varying metal ion content is well established among phosphate-processing enzymes. For example, a two-magnesium mechanism is favored for CMP-Kdo synthetase (KdsB) 38,39 , FomD employs two metals for cytidylyl (S)-2-hydroxypropoylphosphonate hydrolysis 40 , DNA polymerases commonly employ a two-metal ion mechanism with evidence for an additional third divalent metal ion 41 , and terpene synthases require three metal ions 42 . Overall, in contrast to Spn-LicC, which loses a salt bridge between Arg129 and a metal-coordinating residue Glu216, the metal coordination sphere of Tde-PntC does not undergo major changes upon ligand binding ( Supplementary Fig. 10).
The crystal structures of Spn-LicC 29 and Tde-PntC provide an opportunity to probe the selectivity determinants for respective cognate substrates P-Cho and AEP, which are present in the active sites as CMP-conjugate products. The two substrates clearly differ in size, and possibly in charge; the positive charge of the quaternary amine of P-Cho does not necessarily occur at the primary amine of AEP. Although primary amines are typically positively charged (pK a~1 0) 43 , active site pK a perturbations could occur 44 to neutralize AEP. However, few clues exist from the Tde-PntC crystal structure, which reveals water, Glu196, and Glu104 positioned to interact with the primary amine of AEP via hydrogen bonds. Intriguingly, the equivalent positions in Spn-LicC (D192 and D105) are occupied by smaller Asp side chains (Fig. 5), which may provide more room to accommodate the larger P-Cho substrate. In addition, Spn-LicC possesses a composite aromatic box for binding the choline quaternary amine via cation-π interactions (Fig. 5) 45 . Notably, Ari-PntC also possesses the equivalent aromatic residues, suggesting that charge may not be a key determinant of P-Cho versus AEP selectivity, and highlighting that Ari-PntC is more closely related to Spn-LicC than Tde-PntC ( Supplementary Fig. 15). Therefore, protein cavity volumes were calculated 46 to evaluate the effects of steric restriction against the larger P-Cho substrate in the Tde-PntC active site. The larger pocket volume of Spn-LicC (~381 Å 3 ) compared to Tde-PntC (~287 Å 3 ) (Fig. 5) corresponds to the difference of~64 Å 3 in molecular volume between CDP-Cho and CMP-AEP, or the~75 Å 3 volume of the choline trimethylamine moiety 47 .
To further probe the effects of steric constraint on substrate selectivity, we pursued molecular dynamics simulations. In addition to the enzyme:product crystal structures of Spn-LicC: CDP-Cho and Tde-PntC:CMP-AEP, we also modeled noncognate enzyme:product pairs Spn-LicC:CMP-AEP and Tde-PntC:CDP-Cho using 10 × 150 ps independent trajectories (1.5 ns in total) for each complex (Fig. 5). Notably, the Tde-PntC active site did not readily accommodate the additional volume of CDP-Cho, and significant conformational changes in the ligand occurred immediately during the simulations. Specifically, the choline moiety was forced towards cytidine, with a concomitant shift in the phosphate backbone enabled by rotation about the ribose C5′-O5′ bond. The distribution of this dihedral angle during the Tde-PntC simulations indicates: (i) the magnitude of this shift for the non-cognate ligand CDP-Cho, and (ii) the comparatively narrow distribution of dihedral angles accessed by the cognate ligand CMP-AEP (Fig. 5c). Similar results were obtained for Spn-LicC, characterized by a narrow distribution for the cognate ligand CDP-Cho and a broad bimodal distribution of dihedral angles for the non-cognate CMP-AEP. Insofar as dihedral angle distributions represent movement of the ligand, the narrow distribution of cognate ligand conformations may reflect a more tightly bound ternary complex. Although not observed in the crystal structure, it is possible that a proportion of the LicC:ligand complex in solution would carry an additional Mg 2+ ion, similar to the observed Tde-PntC:CMP-AEP complex. We therefore also performed simulations with a second Mg 2+ for both Spn-LicC product complexes, revealing a bimodal distribution of the dihedral for CDP-Cho with a second peak around -150° (Supplementary Fig. 16) and a narrower distribution for CMP-AEP. This may explain why Spn-LicC can accept the noncognate substrate (e.g. with two Mg 2+ bound), albeit with increased K M (Table 1) due to the limited interactions of the primary amine in the LicC choline-binding pocket. Overall, the molecular dynamics simulations provided further support for the influence of steric effects on enzyme selectivity.
Mechanistic insight from structural data. The crystal structures determined in apo form (PDB 6PD1) and in the presence of CTP and AEP (PDB 6PD2) provide additional mechanistic insight. Electron density outlines the CMP-AEP product (Fig. 6a) with partial density at the expected location of the pyrophosphate (PP i ) leaving group. This PP i pocket was also detected by the Roll algorithm in both Spn-LicC and Tde-PntC 46 (Fig. 5a, b). The PP i site is defined by the first five residues of the GXG(T/S)RX 4-8 PK nucleotidyltransferase consensus sequence (boxed region, Supplementary Fig. 15) 32 , which provides stabilizing positive charge via backbone amide protons and the conserved residue Arg15. The terminal Lys25 of the consensus sequence is positioned to interact most strongly with the magnesium-coordinated α-phosphate of CTP, and alanine substitution of this highly conserved residue abolished cytidylyltransferase activity in the related enzymes FrbH (K38A) and YgbP (K27A) 48,49 . Similar to the Spn-LicC structure, Lys25 appears positioned to accept additional negative charge arising during a pentacoordinate transition state, which would result from nucleophilic attack of AEP in an associative mechanism (Fig. 6b). The partially conserved residue Lys153 may guide attack by interacting with phosphonate oxygen. This mechanistic trajectory was also proposed based on a similar crystallographic orientation of Lys27 in the MEP cytidylyltransferase YgbP 49 . Indeed, we observed markedly diminished activities (<10% of wildtype) when alanine was substituted at Arg15, Lys25, and Lys153 of Tde-PntC ( Supplementary Fig. 17), lending support to the mechanism proposed in Fig. 6b. Although detailed kinetic analysis was beyond the scope of this study, we expect that PntC enzymes will employ ordered sequential binding. Previous substrate and product inhibition analyses of Spn-LicC favored CTP as the leading substrate and CDP-Cho as the final product off the enzyme 29 , and initial CTP binding was concluded from pulse chase analysis of YgbP 49 .

Discussion
We have identified nucleotidyltransferase Pfam families in about two-thirds of phosphonate biosynthetic gene clusters. Surprisingly, about 60% of all Ppm enzymes are fused to one or both of two Pfams, with MobA-like NTP transferase (PF12804) as the most commonly fused domain. Only one member of this family was previously known to act on a phosphonate substrate; FrbH catalyzes cytidylyltransfer to L-2-amino-4-phosphonobutyrate during FR-900098 biosynthesis 48 . The other commonly fused domain is cytidylyltransferase-like (PF01467), which includes the PhpF phosphonoformate cytidylyltransferase functioning in phosphinothricin biosynthesis 50 and the Fom1 HEP cytidylyltransferase from the fosfomycin biosynthetic pathway 28 . In demonstrating AEP-specific cytidylyltransferase activity for two PF12804 enzymes, we have expanded the catalytic repertoire of this family to include phosphonyl tailoring cytidylyltransferases (PntCs) involved in putative cell surface tailoring pathways. Cytidylyl activation has been well documented in the biosynthesis of cell surface glycans and phospholipids, but is rare in natural product biosynthesis. Indeed, we are not aware of examples beyond small molecule phosphonates and the MEP pathway (e.g. YgbP of the IspD family) for terpenoid biosynthesis.
In contrast, CMP-activated sialic, legionaminic, pseudaminic acids, as well as CMP-Kdo, exemplify common intermediates in cell surface glycan biosynthesis 38,51 . Similarly, CMP-conjugates facilitate phospholipid biosynthesis, including CDP-diacylglycerol and the aforementioned CDP-choline 52,53 . The significance of cytidylyltransferase-enrichment in phosphonate biosynthesis is unclear, but may simply reflect that most phosphonates occur on cell surface glycans and lipids.
We expect that PntC enzymes will activate a variety of phosphonates to furnish a mosaic of cell surface phosphonate chemistry. Identifying CMP-conjugates will enable discovery of new phosphonyl modifications and elucidation of their biological roles. Further examples of related chemical logic in biosynthetic pathways, such as AEP and the virulence factor P-Cho, will present opportunities to understand and manipulate enzyme selectivity for developing biocatalysts and selective antivirulence agents.

Methods
General. CTP was purchased from Thermo Fisher Scientific (Mississauga, ON, Canada). Chelex 100 sodium form (50-100 mesh), AEP, P-Cho, and CDP-Cho were purchased from Sigma-Aldrich (Oakville, ON, Canada). P-Cho was treated with Chelex 100 resin as previously described prior to use in order to remove calcium, which inhibits the enzyme 32 . All other chemicals were used without further purification. DNA sequencing was performed at The Centre for Applied Genomics at the Hospital for Sick Children (Toronto, ON, Canada). HR-ESI-MS was performed at the Alberta Glycomics Centre (Edmonton, AB, Canada). HPLC analysis was performed using a Prominence LC-20AT system equipped with a diode array detector (Shimadzu, Kyoto, Japan). FPLC purifications were performed using an NGC™ Chromatography System (Bio-Rad, Mississauga, ON). NMR spectra were recorded on an Agilent DD2 operating at 400 MHz for 1 H and 100 MHz for 13 54,55 . The GA (gathering) score thresholds were applied for determining matches, which is considered a reliable curated threshold for defining family membership 55 .
Retrieving Ppm and PntC homologs from HMMs. HMMs matching the retrieved Ppm and PntC mentioned above (for Ppm: PF13714; for PntC: PF12804, PF00483, PF01129) were used to scan collectively annotated protein sequences from the NCBI Refseq genome databases 56 of complete genomes (n = 8582), genome scaffolds (n = 41,568), and contigs (n = 60,795), using HMMscan. After potential homologs were thus identified, a Python script was written and used to narrow the results into homologs sharing alignment to at least one of the HMMs for PntC and the single HMM retrieved for Ppm. To improve accuracy of the search, results were filtered to ensure that only sequences with ≥60% alignment to the HMMs were included. Ppm homologs were further screened to ensure the presence of the conserved catalytic motif EDKXXXXXNS, which distinguishes PEP mutase from other members of the isocitrate lyase family 4 .
Inventory of Ppm fusion domains. Protein sequences were retrieved for all identified Ppm homologs from NCBI RefSeq genome databases and scanned for HMMs using the HMMER software suite to establish an inventory of fusion domains. The GA (gathering score) threshold was once again applied for determining family membership. A Perl script was applied to account for overlapping domains. In the case of overlapping domains, only the domain with the highest score was included in analysis. Results were also filtered with ≥60% alignment to the HMMs as described above.
Distribution of distances between ppm and pntC genes. A Python script was written to calculate distances, in number of genes, between ppm and pntC homologs (after filtering for ≥60% HMM alignment and presence of Ppm motif) in complete, contig, and scaffold NCBI RefSeq genome databases. If multiple ppm or pntC were present in one genome/contig, only the closest distance was tabulated.
To further limit the results to non-redundant genomes (one per species), the genome database was filtered using tri-nucleotide DNA signatures as reported previously 57 . The genomes were filtered at a distance delta = 0.03, which roughly corresponds to species.
Molecular dynamics simulations. The crystal structures of 1JYL (chain D) and 6PD2 (focusing on the chain B active site of the BD dimer) were used with experimental coordinates for cognate substrates CDP-Cho and CMP-AEP, respectively. Non-cognate substrates (i.e. 1JYL:CMP-AEP and 6PD2:CDP-Cho) were built in PyMOL 60 . The Enlighten plugin for PyMOL was used to execute protocols using AmberTools14 software for ligand parameterization with GAFF 61 , hydrogen and solvent addition (PREP protocol), structure relaxation and minimization (STRUCT protocol), and molecular dynamics (MD at 300 K; DYNAM protocol) 62 . All simulated active site residues were in their standard protonation states; Tde1415 histidines within the solvated simulation sphere were protonated as follows: HIE57, HID186, HIE187, HIE197, and HIE226. For each of the four complexes, 10 independent 150 ps MD simulations were run using a 20 Å solvent shell surrounding P A atom (equivalent of CTP α-phosphate P) of the substrate. From the 1500 ps of simulation data for each complex, dihedral angles were measured every 1 ps using the cpptraj program included in AmberTools 63 . The simulation data was clustered using cpptraj (hierarchical agglomerative clustering on the RMSD of the substrate, with ε = 0.85 for all complexes except for Tde-PntC: P-Cho complex, which used ε = 1.1) to identify centroid conformations that were representative of most common conformations observed.
Bacterial strains and media. Escherichia coli BL21(DE3) and E. coli DH5α (Supplementary Table 2) from Life Technologies (Carlsbad, CA) were used for protein production and cloning, respectively, and were cultured in standard LB broth. The spn-licC gene was purchased from BioBasic Canada (Markham, ON) in the pET57 vector to include the N-terminal sequence MGSSH 6 SSGLVPRGSH prior to the N-terminal methionine residue and preserve the exact sequence of the previously characterized enzyme 29,33 . The synthetic spn-licC gene was directly cloned as an NdeI/XhoI fragment into pET29 to afford pGH1000. The ari1348 gene was purchased codon-optimized for E. coli from BioBasic in the pET28a vector and subcloned as an NdeI/NotI fragment into pET29. The resulting pGH2000 expression construct is designed to generate C-terminal His 6 -tagged Ari-PntC protein.
The tde1415 gene was purchased from BioBasic in pET28a and subcloned without its stop codon as a NdeI/NotI fragment into pET21b to afford pGH3000 for producing C-terminal histidine-tagged protein. RT-PCR analysis. Gene transcription was monitored with primers outlined in Supplementary Table 4. RNA extraction from bacteria was performed using the Ribopure RNA isolation kit (Invitrogen, Carlsbad, CA). Any remaining genomic DNA that may have been carried through the RNA extraction process was removed with a 1 h DNaseI treatment step, which was included in the kit. Following RNA extraction and DNaseI treatment, a DNA-based PCR was always performed to detect undigested DNA. The lack of amplification products confirmed the absence of contaminating genomic DNA. RT-PCR was carried out using the OneStep RT-PCR kit (Qiagen) according to the manufacturer's recommendations. Briefly, each reaction setup consisted of 2 µl of dNTP mix (400 µM of each dNTP in a reaction), 0.6 µM of each primer, 2 µl of polymerase mix and 2 µl of specific RNA templates. The thermal cycling protocol was performed with the iCycler iQ multicolor realtime detection system (Bio-Rad, Hercules, CA) and contained an initial reverse transcription step of 50°C for 30 min followed by PCR activation at 95°C for 15 min. Subsequent steps in the protocol contained 40 cycles of amplification (94°C for 45 s, 55°C for 45 s, 72°C for 1 min) and a concluding step at 72°C for 10 min. Amplified products were analyzed on 1.8% (w/v) agarose gels stained with 0.5 µg/ml ethidium bromide and imaged with a VersaDocTM 4000 MP (BioRad).
Site-directed mutagenesis and analysis of mutants. Tde1415 enzyme variants R15A, K25A, and K153 were generated by overlap extension PCR mutagenesis using primers listed in Supplementary Table 4. All mutants were verified by sequencing and proteins expressed in pET21b. Activity of enzyme variants was monitored by 31 P NMR after 2 h incubation at room temperature in 50 mM Tris-Cl pH 8.0 containing 7.0 mM CTP, 3 mM AEP, 10 µM Tde1415, and 7.0 mM of MgCl 2 . Phosphonoacetic acid at 6.0 mM was added as an internal standard immediately prior to 31 P NMR spectral acquisition.
Protein production. Plasmids were transformed into E. coli BL21(DE3) and plated on the LB agar plates containing the appropriate antibiotic. Single colonies were picked into~5 ml and grown overnight at 37°C in LB supplemented with antibiotic, 1 ml of which was then transferred to 50 ml LB supplemented with antibiotic. After overnight growth at 37°C, 1 l batches of fresh LB were inoculated with 10 ml of the overnight culture and incubated at 37°C and 220 rpm. At OD 600~0 . using MWCO 3500 dialysis tubing, and then run over a HiTrap Q FF anion exchange column (GE Healthcare, Chicago, IL) attached to an NGC FPLC system (Bio-Rad, Hercules, CA). After isocratic flow in 50 mM Tris-Cl (pH 8.0) for 2.5 column volumes, a linear gradient from 0 to 700 mM NaCl was performed over 18 column volumes. Tde1415 eluted at~120 mM NaCl, and Ari1348 eluted at 250 mM NaCl, consistent with the respective predicted pI values of 5.7 and 5.0 as calculated using the ProtParam tool on the ExPASy web server 64 . Pooled FPLC fractions were concentrated to~2 ml by ultrafiltration using 10 kDa MWCO centrifugation filters (Pall Corporation, Port Washington, NY), then desalted into Tris-Cl pH 8.0 buffer using PD-10 columns (GE Healthcare). The desalted protein was concentrated to~200 µl by ultrafiltration, then either used immediately or frozen as beads in liquid nitrogen for storage at −80°C until further use.
Protein analysis. Protein concentrations were estimated using the Bradford assay 65 , and purity was assessed using SDS-PAGE. Oligomeric structure was evaluated using size-exclusion chromatography and DLS. The former was performed on a Bio-Rad NGC FPLC system equipped with a Superdex 75 10/300 GL column from GE Healthcare, eluted using an isocratic flow (0.5 ml/min) of 50 mM Tris-Cl and 150 mM MgCl 2 at pH 8.0. DLS was performed on 20 µl of freshly purified protein (8 mg/ml) at 20°C using a DynaPro NanoStar (Wyatt Technology, Santa Barbara, CA).
HPLC enzyme assays. Reactions containing 50 nM enzyme, 1 mM of either AEP or P-Cho, 4 mM CTP, 7 mM MgCl 2 , and 50 mM Tris-Cl pH 7.8 were allowed to proceed for 30 min prior to quenching a 10 µl aliquot with an equal volume of cold methanol (containing 0.1% TFA). The quenched reaction was centrifuged for 5 min prior to injecting 1 µl on an HPLC equipped with a Luna NH2 column (3 µm, 100 Å, 100 × 4.6 mm, Phenomenex, Torrance CA, USA) using an isocratic elution at 1 ml/min in a mobile phase of 20 mM ammonium acetate, pH 10.
Aminotransferase activity assays. The AEPT activity was observed by combining 1.5 mM AEP, 6 mM pyruvate, 30 µM pyridoxal-5′-phosphate (PLP) in 50 mM Tris-Cl buffer containing 50 mM MgCl 2 , pH 8.0. The reaction was initiated by adding 1 µM of Tde1415 enzyme, which was missing in the no-enzyme negative control. The reaction was either monitored directly via 31 P NMR as shown in Supplementary Fig. 11 or allowed to proceed at room temperature for 2 h prior to removal of enzyme by ultrafiltration using a Vivaspin 5000 Da MWCO membrane centrifugation device (Sartorius, Goettingen, Germany). Filtrate was stored frozen at −20°C prior to mass spectrometric analysis at the Mass Spectrometry Facility at the University of Guelph ( Supplementary Fig. 12).
Metal use assays. 140 µl of 400 µM recombinant wildtype Tde1415 was diluted to a volume of 3.0 ml with 50 mM EDTA, 50 mM Tris-HCl, pH 8.0 buffer. The protein solution was applied to an Econo-Pac 10DG desalting column (Bio-Rad) and exchanged into 50 mM EDTA, 50 mM Tris-Cl, pH 8.0 buffer. The eluted protein was concentrated using a 5 kDa MWCO ultracentrifuge device (Sartorius) and then desalted a second time into 50 mM Tris-Cl, pH 8.0. The de-metalated protein was again concentrated to 2.0 ml by ultracentrifugation and stored at 4 o C. PntC activity was monitored by 31 P NMR after 2 h incubation at room temperature in 50 mM Tris-Cl pH 8.0 containing 7.0 mM CTP, 3 mM AEP, 5.5 µM Tde1415, and 7.0 mM of divalent metal (MgCl 2 , CaCl 2 , or ZnCl 2 ). Phosphonoacetic acid at 6.0 mM was added as an internal standard immediately prior to 31 P NMR spectral acquisition.
Colorimetric coupled enzyme assays. The EnzCheck ® Pyrophosphate Assay Kit (Molecular Probes, Inc., Eugene, OR, USA) was used to determine the activity of Spn-LicC, Ari-PntC, and Tde1415. Each reaction was kept to a total volume of 100 µl and performed in triplicate. The reaction components were split between two wells, one well contained the substrate and enzyme, while the other well contained CTP. Well 1 contained 0.5 µl (1U) PNP, 0.5 µl (0.03U) IPP, 10 µl (0.2 mM) MESG, 2.5 µl 20X reaction buffer, 16 µl (4 mM) CTP, and 20.5 µl of water, making a total of 50 µl. Well 2 contained 0.5 µl (1U) PNP, 0.5 µl (0.03U) IPP, 10 µl (0.2 mM) MESG, 2.5 µl 20X reaction buffer, 12 µl (7 mM) MgCl 2 , enzyme (variable), substrate (variable), and H 2 O to fill to a final volume of 50 µl. A control with either no enzyme or substrate was made for each trial. The separate wells were incubated for 30 min at room temperature. After 30 min, well 2 was combined to well 1, and absorbance was recorded every 30 s at 360 nm for 30 min. Initial velocities (<10% substrate turnover) for each substrate concentration were exported to Excel and Michaelis-Menten and substrate inhibition equations were fit by non-linear regression using the R software 66 .
Purification of CMP-AEP. 5 ml reactions containing 1 µM Ari-PntC, 1 mM of AEP, 2 mM CTP, 7 mM MgCl 2 , and 50 mM Tris-Cl pH 8 were allowed to proceed for 1 h at room temperature prior to removing the enzyme through centrifugation in a 5 kDa ultracentrifugation tube. The flow-through was injected on the FPLC over a 5 ml GE Hi-Trap™ Q FF anion exchange column into the mobile phase A (25 mM ammonium bicarbonate pH 9) and eluted using a linear gradient of 0-70% mobile phase B (25 mM ammonium bicarbonate + 1 M NaCl pH 9). Peak fractions pertaining to the CMP-AEP product were collected and lyophilized to a dry solid. 20-25 mg of the lyophilized product was dissolved into 600 µl of D 2 O before subsequent 1 H, 13 C, and 31 P experiments were performed. Phosphonoacetic acid was added as an internal standard for the 31

P experiments.
Protein crystallization. Purified Tde1415 in 50 mM Tris-Cl, pH 8.0 and 240 mM NaCl, were concentrated to 20 mg/ml with and without presence of 5 mM CTP/5 mM AEP. Tde1415 was initially screened against commercially available sparce-matrix screens in a 1:1 ratio, 1 μL final volume using an ARI Crystal Gryphon (Art Robins Instruments). The microplates were sealed, incubated at 18 o C, and conditions in which crystal growth was observed were used to design hanging-drop vapor diffusion expansion crystal plates. The ligand-free structure was determined by crystals grown against a crystallization solution composed of 0.01 M nickel II chloride hexahydrate, 0.1 M Tris-Cl, pH 8.5, 1.0 M lithium sulfate monohydrate and cryoprotected with an equivalent solution supplemented with 30% (v/v) glycerol. The product complex was determined from a crystal grown against a crystallization solution composed of 0.2 M magnesium acetate tetrahydrate, 0.1 M sodium cacodylate trihydrate pH 6.5, 20% (w/v) polyethylene glycol 8000 cryoprotected with an equivalent solution supplemented with 30% (v/v) glycerol.
X-ray data collection and processing. Protein crystals were retrieved, briefly introduced in cryoprotectant, and snap-frozen in liquid nitrogen. Data analysis was performed at the Advanced Photon Source (APS) at Argonne National Laboratory (Argonne, Illinois, Illinois & Michigan Canal State Trail, 9700 Cass Ave, Lemont, IL 60439, USA) and the Canadian Light Source (CLS) at the University of Saskatchewan (Saskatoon, Saskatchewan, Canadian Light Source Inc., 44 Innovation Boulevard, SK S7N 2V3, Canada). At the APS, X-ray diffraction data was collected at beamline 17-ID of the Argonne National Laboratory Synchrotron Facility. At the CLS, X-ray diffraction data was collected at beamline CLS-08-ID. In both cases images were collected using 0.25 oscillations at a temperature of 100 K and wavelength of 0.9795 nm. Data were indexed and scaled using AutoProcess in the space group P2 1 and data processed to 2.72 Å resolution. Note that while both datasets belong to the P2 1 space group, the unit cell dimensions were different. The "apo" structure of Tde1415 was solved via molecular replacement using the program PHASER 67 searching for four copies of PDB coordinates 1JYK 29 and 1VJO 37 which are the cytidyltransferase LicC from S. pneumoniae and an alanineglyoxylate aminotransferase from Anabaena [Nostoc.] sp. PCC 7120, respectively. Cycles of iterative model building, structure refinement, and density modification were performed with PHENIX 68 and were interspersed with inspection and manual building with COOT 69 . Restraints for CDP-Cho were generated via the Grade Web Server (http://grade.globalphasing.org). Ramachandran analysis of the apo structure revealed: 2288 (94.1%) torsion angles in the preferred region, 119 (4.9%) allowed, and 25 (1.0%) disallowed; for the Tde1415:CMP-AEP complex: 2325 (95.1%) preferred, 98 (4.0%) allowed, 21 (0.9%) disallowed.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.