Insights into Substrate Recognition by the Unusual Nitrating Enzyme RufO

Nitration reactions are crucial for many industrial syntheses; however, current protocols lack site specificity and employ hazardous chemicals. The noncanonical cytochrome P450 enzymes RufO and TxtE catalyze the only known direct aromatic nitration reactions in nature, making them attractive model systems for the development of analogous biocatalytic and/or biomimetic reactions that proceed under mild conditions. While the associated mechanism has been well characterized in TxtE, much less is known about RufO. Herein we present the first structure of RufO alongside a series of computational and biochemical studies investigating its unusual reactivity. We demonstrate that free L-tyrosine is not readily accepted as a substrate despite previous reports to the contrary. Instead, we propose that RufO natively modifies L-tyrosine tethered to the peptidyl carrier protein of a nonribosomal peptide synthetase encoded by the same biosynthetic gene cluster, and we present both docking and molecular dynamics simulations consistent with this hypothesis. Our results expand the scope of direct enzymatic nitration reactions and provide the first evidence for such a modification of a peptide synthetase-bound substrate. Both of these insights may aid in the downstream development of biocatalytic approaches to synthesize rufomycin analogues and related drug candidates.


Enzyme expression and purification
The gene encoding RufO from Streptomyces atratus was synthesized and cloned into a pET28a(+) plasmid carrying an N-terminal 6xHis tag by Twist Bioscience (San Francisco, CA). Expression and purification procedures for RufO (UniProt: A0A224AU14) were adapted from the methods described by Tomita et al. 1 In short, E. coli strain BL21(DE3) cells were transformed with the pET28a(+)/rufO plasmid using the heat shock method and incubated overnight on lysogeny broth (LB) agar plates containing 50 µg/mL kanamycin. We subsequently inoculated 150 mL of LB (25 g/L) containing 50 µg/mL kanamycin with a single colony of E. coli BL21(DE3) cells carrying the pET28a(+)/rufO plasmid. This starter culture was grown at 37 °C for 18 hours before subculturing 10 mL per 1 L of LB supplemented with 50 µg/mL kanamycin, 0.6 mM ALA, and 0.15 mM FeSO4. Cells were then incubated at 37 °C with shaking at 240 rpm until the cultures reached an OD600 of 0.6, after which IPTG was added to a final concentration of 0.1 mM. Following induction, cells were further incubated at 18 °C and 240 rpm for 20 hours. 2 Cells were harvested via centrifugation at 9,000 x g for 15 minutes, then flash frozen in liquid nitrogen for storage at -80 °C.
All subsequent purification steps were performed at 4 °C to ensure protein stability. Cell paste was resuspended in buffer A (20 mM NaPi, pH 8.0, 200 mM NaCl, and 5% glycerol) at 4 mL per g, supplemented with 1X protease inhibitor cocktail, 1 mM phenylmethylsulfonyl fluoride (PMSF), 10 µg/mL DNase, and 1 mg/mL lysozyme. The resuspended cells were then lysed via sonication (10 s on, 10 s off for 45 minutes total at 50% amplitude). The resulting cell-free extract was collected by centrifugation at 32,000 x g for 30 min and loaded onto HisPur™ Ni-NTA resin equilibrated with 10 column volumes (CV) of buffer A. The column was then washed with 10 CV of 10 mM imidazole in buffer A before elution with 40 mM imidazole in buffer A. The fraction containing RufO was concentrated using Thermo Fisher Scientific Pierce™ Protein Concentrators with a 30-kDa molecular weight cutoff and exchanged into buffer B (20 mM HEPES, pH 8.0, 400 mM NaCl, and 5% glycerol) using a Cytiva PD-10 column (Marlborough, MA) for further purification via size exclusion. To obtain sufficient purity for crystallization, the enzyme was passed through a Cytiva HiLoad 16/600 Superdex 200 size-exclusion column pre-equilibrated in buffer B and eluted isocratically at 0.5 mL/min. Elution of RufO was monitored via absorbance at 280 nm and analyzed via SDS-PAGE. The purest fractions were pooled and concentrated to 25.4 mg/mL using Amicon Ultra filters with a 10-kDa molecular weight cutoff. Protein concentration was determined using a Thermo Fisher Scientific NanoDrop One C (Waltham, MA) with an extinction coefficient of ε280 = 29.45 mM⁻¹ cm⁻¹ estimated by Expasy ProtParam. 3 Samples were then buffer-exchanged into 20 mM HEPES, pH 8.0, 20 mM NaCl, 0.1 mM DTT, and 5% glycerol before being stored at -80 °C.
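The concentration determination above follows the Beer-Lambert law, c = A / (ε · l). The sketch below illustrates that arithmetic using the ε280 value quoted in the text; the function names, the example absorbance, the 1 cm path length, and the molecular weight argument are illustrative assumptions and are not taken from the original protocol.

```python
# Minimal Beer-Lambert sketch. Only EPSILON_280 comes from the text;
# all other names and example values are hypothetical.
EPSILON_280 = 29.45  # mM^-1 cm^-1, as estimated by Expasy ProtParam


def molar_concentration_mM(a280, path_cm=1.0, dilution=1.0, epsilon=EPSILON_280):
    """Return the protein concentration in mM from an A280 reading.

    `dilution` is the fold-dilution applied before the measurement
    (1.0 means the sample was measured undiluted).
    """
    return a280 * dilution / (epsilon * path_cm)


def mass_concentration_mg_per_mL(conc_mM, mw_kda):
    """Convert mM to mg/mL; 1 mM of a 1 kDa species is 1 mg/mL."""
    return conc_mM * mw_kda


# Example: an A280 of 0.589 in a 1 cm cell corresponds to 0.020 mM (20 µM).
example_mM = molar_concentration_mM(0.589)
```

Converting to mg/mL additionally requires the molecular weight of the tagged construct, which should be taken from the actual sequence rather than assumed.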

Crystallization of holo-RufO
The N-terminally 6xHis-tagged enzyme was first exchanged from the storage conditions above into a buffer comprised of 20 mM HEPES, pH 8.0, 20 mM NaCl, and 5% glycerol, and diluted to a final concentration of 13 mg/mL. Crystallization was performed via the hanging drop vapor diffusion method at room temperature, during which RufO was mixed 1:1 with a precipitant solution of 0.2 M MgCl2, 0.1 M BIS-TRIS, pH 6.7, and 23% (w/v) PEG 3350 to generate a final drop volume of 2 µL. Red, cube-like crystals appeared within 3 days and were fully formed within 1 week. Upon looping, the crystals were soaked in a sub-stock of precipitant solution supplemented with 30% (v/v) ethylene glycol before flash freezing in liquid nitrogen.

X-ray data collection and processing
All data were indexed, integrated, and scaled using XDS before merging with AIMLESS. 4,5 Model building and phasing/refinement were performed using Coot and Phenix, respectively. [6][7][8] The quality of the model was regularly assessed using MolProbity, and PDB_REDO was employed to check for model bias. 9,10 Selected data processing and refinement statistics are presented in Table S1. Figures depicting the structure were generated with PyMOL. 11 Solvent-accessible surface areas and volumes were calculated using the Computed Atlas of Surface Topography of proteins (CASTp) webserver (Fig. S2). 12 X-ray diffraction data were collected at beamline 23-ID-B of the Advanced Photon Source at Argonne National Laboratory (Chicago, IL) using a Dectris Eiger X 16M detector. All crystals were maintained at 100 K to minimize X-ray-induced damage. Preliminary data were collected using an inverse beam (Δϕ = 0.2°, wedge = 15°) and an incident wavelength set to the absorption edge of Fe at 1.740 Å. Molecular replacement single anomalous dispersion (MR-SAD) was performed using the AutoRickShaw automatic structure solution pipeline to phase the data. 13,14 As BLAST was unable to identify a PDB structure with sufficient sequence similarity to RufO to enable molecular replacement, an AlphaFold prediction of the apo-RufO structure was used as the search model. [15][16][17] A higher resolution X-ray dataset (1.87 Å) was subsequently obtained, for which the associated images were collected sequentially (Δϕ = 0.2°) with an incident wavelength of 0.9762 Å. The final solution was determined via isomorphous replacement. Note that the same Rfree flags were maintained and extended from the lower resolution dataset to ensure unbiased refinement. 6 The resultant model contained electron density for a single molecule (residues 11-394, missing 167-180) and a heme cofactor in the asymmetric unit.

UV-vis absorption spectroscopy
UV-vis absorption spectra were recorded using an Agilent Cary 3500 spectrophotometer (Santa Clara, CA) with a final protein concentration of 20 µM in a solution of 20 mM HEPES, pH 8.0, 20 mM NaCl, and 5% glycerol. Ferrous RufO samples were prepared in a Coy Laboratory Products anaerobic chamber (Grass Lake, MI) containing an atmosphere of approximately 97% N2 and 3% H2. All solutions were purged of O2 prior to use. RufO was reduced to the ferrous state by the addition of 1 mM dithionite and transferred to a sealed cuvette prior to measuring the absorption spectrum. Spectroscopic analysis revealed that the enzyme remained reduced for several days using this preparation. The effect of L-Tyr on RufO absorption spectra was determined by titrating in L-Tyr solubilized in either water or 1 M HCl. The effect of HCl alone was determined by titrating in an equivalent amount of 1 M HCl. Representative RufO spectra depicting the addition of 400 µM L-Tyr, 120 mM HCl, or both were chosen for Fig. 3A. The resulting spectra were corrected for dilution and for baseline absorbance at 800 nm. The effect of L-Tyr analogues on the absorption spectra of RufO was determined by the addition of 400 µM L-Tyr methyl ester (TME) or N-acetyl L-Tyr ethyl ester monohydrate (TEEM).
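The correction of titration spectra for dilution and for absorbance at 800 nm can be sketched as follows. This is a minimal illustration, assuming that the 800 nm reading is subtracted as a flat baseline before scaling by the volume ratio; the order of operations, the function name, and the example volumes are assumptions rather than details given in the text.

```python
def correct_spectrum(wavelengths_nm, absorbance, initial_volume_uL,
                     added_volume_uL, baseline_nm=800.0):
    """Baseline- and dilution-correct one titration spectrum.

    The absorbance at the wavelength closest to `baseline_nm` is
    subtracted from every point, and the result is multiplied by the
    dilution factor (total volume / initial volume).
    """
    if len(wavelengths_nm) != len(absorbance):
        raise ValueError("wavelength and absorbance lists differ in length")
    # Index of the measured wavelength closest to the baseline wavelength.
    baseline_index = min(range(len(wavelengths_nm)),
                         key=lambda i: abs(wavelengths_nm[i] - baseline_nm))
    baseline = absorbance[baseline_index]
    dilution_factor = (initial_volume_uL + added_volume_uL) / initial_volume_uL
    return [(a - baseline) * dilution_factor for a in absorbance]


# Hypothetical three-point spectrum after adding 100 µL of titrant to 1000 µL.
corrected = correct_spectrum([400.0, 600.0, 800.0], [0.52, 0.12, 0.02],
                             initial_volume_uL=1000.0, added_volume_uL=100.0)
```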

Stopped-flow spectroscopy
UV-vis absorption spectra of transient complexes were recorded using a two-syringe OLIS RSM 1000 stopped-flow spectrophotometer (Bogart, GA) equipped with a thermostated water bath. All experiments were carried out at 4 °C with a final protein concentration of 20 µM. Protein and substrate solutions were prepared in 20 mM HEPES, pH 8.0, 20 mM NaCl, and 5% glycerol inside an anaerobic chamber, as previously described. To ensure the ferrous state was maintained prior to fast mixing on the stopped-flow device, excess dithionite (30 mM) was added to 40 µM ferric enzyme to generate the reduced state. The oxidation state was confirmed by UV-vis spectroscopy using a sealed cuvette, similar to the experiments described above. To remove oxygen from the stopped-flow lines, the system was equilibrated with 30 mM dithionite before washing with anaerobic buffer. NO binding to the ferrous enzyme was assessed by mixing with a solution of 800 µM DEA NONOate prepared in degassed buffer. Similarly, O2 binding was assessed by mixing with buffer bubbled with O2 for 20 s at 4 °C (~400 µM). 18 Time-resolved spectral changes were recorded using a DeSa rapid-scanning monochromator equipped with a 0.6 mm entrance slit and a grating of 400 lines per mm and 500 nm blaze wavelength. Singular value decomposition and global analysis were performed using Olis GlobalWorks to generate representative spectra of the Fe(III)-O2•− and Fe(III)-NO traces (Figs. S3 & S4).
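The decomposition step applied to time-resolved spectra can be illustrated with NumPy. The sketch below is not the Olis GlobalWorks procedure; it only shows how a wavelength-by-time data matrix is factored and truncated to its significant components. The synthetic two-species data set, band positions, and rate constant are invented for illustration.

```python
import numpy as np


def svd_components(data, n_components):
    """Factor a (n_wavelengths, n_times) matrix and keep the leading components.

    Returns basis spectra (columns of U), singular values, and the
    corresponding time courses (rows of V^T).
    """
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    return u[:, :n_components], s[:n_components], vt[:n_components, :]


# Synthetic example: species A converts to species B with first-order kinetics.
wavelengths = np.linspace(350.0, 700.0, 71)   # nm, hypothetical grid
times = np.linspace(0.0, 1.0, 50)             # s, hypothetical grid
spectrum_a = np.exp(-((wavelengths - 420.0) / 15.0) ** 2)
spectrum_b = np.exp(-((wavelengths - 440.0) / 15.0) ** 2)
fraction_a = np.exp(-5.0 * times)             # hypothetical rate of 5 s^-1
data = np.outer(spectrum_a, fraction_a) + np.outer(spectrum_b, 1.0 - fraction_a)

basis, weights, time_courses = svd_components(data, n_components=2)
reconstructed = basis @ np.diag(weights) @ time_courses
```

Because the synthetic data contain exactly two species, two components reproduce the matrix to numerical precision; with real data the number of components retained is a judgment based on the singular values and the noise level.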

LC-MS analysis of RufO activity
Various reaction conditions were tested to ensure that the lack of detectable product was not due to the experimental setup, including permutations of substrate, DEA NONOate, peroxynitrite, ferredoxin, and ferredoxin-NADP+ reductase in a buffer of 25 mM Tris-Cl, pH 8.0, and 5% glycerol. In all cases, RufO was first incubated with substrate for 30 min at 23 °C to allow for binding. To assess activity with small-molecule mimics for the phosphopantetheinyl arm of a PCP-bound L-Tyr substrate, L-Tyr was replaced with TME or TEEM. Reactions were set up under either anaerobic or aerobic conditions. For anaerobic conditions, the reactions were prepared in an anaerobic chamber, as previously described, and dithionite was used to reduce the enzyme to the ferrous state. Anaerobic reactions were initiated by removing the samples from the glovebox, bubbling briefly with air to introduce O2, and then inverting to mix. Aerobic reactions were prepared outside the glovebox, where NADPH, ferredoxin, and ferredoxin-NADP+ reductase were utilized as a reducing system. Specific conditions for each activity assay can be found in Figures S5-S10. Reactions were carried out for 2 h at 23 °C, after which 20 µL of 1 M HCl was added to solubilize potential products and denature RufO. Reaction mixtures were subsequently passed through Millipore 0.5 mL 3-kDa molecular weight cutoff centrifugal filters, and product formation was assessed via electrospray ionization mass spectrometry (ESI-MS) operated in positive ion mode at the Georgia State University or Emory University Mass Spectrometry Facilities. Standards containing only L-Tyr, 3-nitro-L-Tyr, TME, or TEEM were analyzed with each mass spectrometer to ensure that the proper m/z values for each molecule were monitored in subsequent experiments.
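As a worked example of the m/z values relevant to positive-mode ESI-MS, the sketch below computes theoretical monoisotopic [M+H]+ values for the substrates and the expected nitrated product. The molecular formulas are standard ones supplied here (TEEM is entered as the anhydrous ester, assuming the water of hydration is not retained in the ion); the function names are illustrative, and the values actually monitored in the experiments are those reported in the supplementary figures.

```python
# Monoisotopic masses of the most abundant isotopes (Da).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727647  # mass of a proton (Da)

# Elemental compositions; supplied for illustration, not taken from the text.
FORMULAS = {
    "L-Tyr": {"C": 9, "H": 11, "N": 1, "O": 3},
    "3-nitro-L-Tyr": {"C": 9, "H": 10, "N": 2, "O": 5},
    "TME": {"C": 10, "H": 13, "N": 1, "O": 3},
    "TEEM (anhydrous)": {"C": 13, "H": 17, "N": 1, "O": 4},
}


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses of all atoms in an element-count dict."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def mz_protonated(formula):
    """Theoretical m/z of the singly protonated ion [M+H]+."""
    return monoisotopic_mass(formula) + PROTON


for name, formula in FORMULAS.items():
    print(f"{name}: [M+H]+ = {mz_protonated(formula):.4f}")

# Nitration replaces one H with NO2, a shift of about +44.985 Da.
nitration_shift = (mz_protonated(FORMULAS["3-nitro-L-Tyr"])
                   - mz_protonated(FORMULAS["L-Tyr"]))
```

The same shift applied to the TME or TEEM values gives the m/z at which a nitrated analogue would be expected.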
For direct injection MS, we utilized a Waters Xevo G2-XS Mass Spectrometer (Milford, MA) at Georgia State University. Samples were diluted by 100x with water, then 5 µL aliquots were introduced into the ion source through an autosampler with a flow rate of 200 μL/min. The instrument operation parameters were optimized as follows: capillary voltage of 1000 V, sample cone voltage of 20 V, desolvation temperature of 350 °C, and a source temperature of 120 °C. Nitrogen was used as cone gas and desolvation gas at flow rates of 25 and 800 L/h, respectively. Resulting mass spectra were acquired over a scan range of 50-800 m/z. MassLynx 4.2 software was used for data acquisition and processing. Resulting extracted ion chromatograms (EICs) were analyzed using MATLAB.
For LC-MS, a Thermo-LTQ Orbitrap Velos Mass Spectrometer equipped with a Thermo Dionex Ultimate 3000 dual pump, DAD, and a Shimadzu SIL-20AC HT autosampler at Emory University was employed. The instrument was controlled with Thermo Xcalibur and Chromeleon Xpress. Mass spectra were acquired at a resolution of 60,000 with the AGC target set at 50,000 and a maximum injection time of 100 ms. The source used a heated electrospray ionization probe at 3.0 kV with a source temperature of 350 °C. The sheath gas flow rate was set at 35 arbitrary units and the auxiliary gas flow rate at 5 arbitrary units. The capillary temperature was set to 320 °C and the S-lens RF level to 60%. A 10 µL injection volume per sample was used. The chromatography used a Zorbax RRHD Eclipse Plus C18 column at a 0.2 mL/min flow rate equilibrated in 98% solvent A (water) and 2% solvent B (acetonitrile). A multi-step gradient of 2-60% solvent B was applied from 0.1-10 min. Subsequently, a gradient of 60-95% solvent B was applied from 10.01-13 min. The column was equilibrated for 5 min before subsequent sample injections. Resulting EICs were analyzed using Thermo Freestyle.

Sequence similarity analysis of NRPS substrate-binding pockets
The sequences of 1,547 A-domains were downloaded from the non-ribosomal peptide synthetase substrate predictor database (NRPSsp). Among these, 22 distinct sequences were annotated as L-Tyr-binding domains and 15 as L-Trp-binding domains. 19 This subset of sequences, along with TxtB and the A-domain from the third module of RufT (UniProt: A0A224ANA9), was aligned to the A3-A6 phenylalanine activation domain sequence of GrsA via the Clustal method in Jalview. 20,21 Using methods described by the NRPS prediction blast server, the 8 core amino acids lining the binding pocket of each A-domain were identified. 20
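Extracting binding-pocket residues from an alignment amounts to mapping residue numbers of the reference sequence onto alignment columns and reading the same columns in every other sequence. The sketch below shows that mapping on toy data; the sequences and the residue positions are hypothetical placeholders and are not the GrsA positions used in the analysis.

```python
def reference_columns(aligned_reference, positions):
    """Map 1-based residue numbers of the ungapped reference to alignment columns."""
    wanted = set(positions)
    columns = {}
    residue_number = 0
    for column, character in enumerate(aligned_reference):
        if character != "-":          # gaps do not advance the residue count
            residue_number += 1
            if residue_number in wanted:
                columns[residue_number] = column
    return [columns[p] for p in positions]


def pocket_signature(aligned_sequence, columns):
    """Read the residues of one aligned sequence at the given columns."""
    return "".join(aligned_sequence[c] for c in columns)


# Toy alignment; real input would be the Clustal alignment against GrsA.
alignment = {
    "reference": "MK-DAWTL",
    "query": "MRGDGW-L",
}
pocket_positions = [3, 4, 5]  # hypothetical reference residue numbers
columns = reference_columns(alignment["reference"], pocket_positions)
signatures = {name: pocket_signature(seq, columns)
              for name, seq in alignment.items()}
```

A gap character in a signature indicates that the query has no residue aligned to that pocket position.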

In silico analysis of RufO binding to the third PCP-domain from RufT
As no crystal structures exist for any portion of RufT, a model of the PCP-domain (residues 3029-3104) was first generated using AlphaFold. The phosphopantetheinyl moiety of the PCP-domain was subsequently modeled onto Ser36 in PyMOL, and the prosthetic group was optimized using the program's 'clean' command, which performs an energy minimization using the MMFF94 forcefield. Docking was then performed using the High Ambiguity Driven protein-protein DOCKing (HADDOCK) 2.4 server. 22,23 All simulations were run with default parameters for protein-protein interactions, except that the minimum relative solvent accessibility (RSA) required for a residue to be considered accessible was set to 5.0%. To generate a structure of the complete enzyme, residues corresponding to the unmodeled FG-loop in RufO (residues 166-181) were added using Coot. The final rigid body template including this loop was generated via a 30 ns molecular dynamics (MD) simulation in GROMACS (see below for details). The docked complex formed between the apo-PCP-domain and the complete CYP enzyme was used to define ambiguous interaction restraints (AIRs) for subsequent HADDOCK runs with the phosphopantetheinyl-bound PCP-domain and the incomplete crystallographic model of RufO. A model in which the phosphopantetheinyl arm was observed reaching toward the heme cofactor in the active site of RufO was utilized for further analysis.

Molecular dynamics simulations of RufO
To ensure the docked conformation of the phosphopantetheinyl arm represented a stable configuration, MD simulations were performed. A structure of the docking model with just the phosphopantetheinyl arm was obtained by replacing RufT with a methyl group at the β-carbon of Ser36. 24,25 The carboxyl group of L-Tyr was attached to the free thiol of the phosphopantetheinyl arm using Avogadro, and the geometry was minimized using steepest descent over 1000 steps with constraints placed on the hydroxyl O of L-Tyr, the phosphate P of the phosphopantetheinyl moiety, and all surrounding amino acids. Parameters for the phosphopantetheinyl moiety were obtained using AmberTools20, 23 and atomic charges were derived using the AM1-BCC method in antechamber. 24 Protonation states of amino acid side chains were assessed using the H++ tool, [26][27][28] and the resulting topology and coordinate files were used to generate a protonated structure in AmberTools. To obtain the parameters for the high-spin (S = 2) heme of RufO, the Metal Center Parameter Builder Python extension (MCPB.py) was used. 29,30 Bond and angle force constants in MCPB.py were derived using the B3LYP functional, 31,32 the cc-pVTZ basis for the Fe, and cc-pVDZ for the remaining atoms using Gaussian 09. 33 The remainder of the protein was parameterized using the Amber99SB-ILDN forcefield. 34,35 AMBER parameters were translated to GROMACS-compatible topologies using acpype. 36 In preparation for simulations carried out using GROMACS 2019.3 37 and the Amber99SB-ILDN forcefield, the phosphopantetheinyl-bound RufO complex was placed in the center of an octahedral box (~10 Å from each hexagonal face) and subsequently solvated using the TIP3P model of water. Neutralization of the system was carried out by adding Na+ counterions. The system was then subjected to 5000 steps of energy minimization using steepest descent, followed by controlled heating to 300 K in the NVT ensemble using the velocity rescaling thermostat over 500 ps. 38 The NPT ensemble was obtained using a Parrinello-Rahman barostat set to 1 bar. 39 Note that the LINCS algorithm was used to constrain bonds involving hydrogens, and protein-ligand complex positions were restrained using a harmonic potential of 1000 kJ mol⁻¹ nm⁻² during the heating process. 40 Production MD runs were carried out over 50 ns each with a time step of 2 fs and a target pressure of 1 bar. Structure files from the resulting simulations were extracted for analysis using Chimera 2021-6-26. 41 RMSD analysis was carried out relative to the motion of the protein and considered non-hydrogen atoms from the entire phosphopantetheinyl arm.
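The RMSD measure described above can be sketched in a few lines. The example assumes each trajectory frame has already been superimposed on the protein, so that the remaining displacement of the arm is measured without further fitting; the atom representation and function names are illustrative and do not reflect the analysis tool actually used.

```python
import math


def heavy_atom_coords(atoms):
    """Keep coordinates of non-hydrogen atoms from (element, (x, y, z)) tuples."""
    return [xyz for element, xyz in atoms if element.upper() != "H"]


def rmsd(reference, frame):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    if len(reference) != len(frame) or not reference:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum((a - b) ** 2
                  for p, q in zip(reference, frame)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(reference))


# Hypothetical three-atom ligand in a reference frame and one later frame (Å).
reference_atoms = [("C", (0.0, 0.0, 0.0)), ("H", (1.0, 0.0, 0.0)), ("O", (0.0, 2.0, 0.0))]
frame_atoms = [("C", (0.0, 0.0, 1.0)), ("H", (5.0, 0.0, 0.0)), ("O", (0.0, 2.0, 1.0))]

arm_rmsd = rmsd(heavy_atom_coords(reference_atoms), heavy_atom_coords(frame_atoms))
```

In this toy case both heavy atoms move by 1 Å while the hydrogen is ignored, so the value depends only on the heavy-atom displacement.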

Supplementary Tables
a Sequences and UniProt accession codes were obtained from NRPSsp. b Bold and colored amino acids are consistent with the module 3 A-domain of RufT.
The carbon on L-Tyr that receives the nitro group via direct nitration by RufO is marked with a star. Note that the substrate, heme cofactor, and RufO are colored as in the main text.