Machine Learning-Supported Enzyme Engineering toward Improved CO2-Fixation of Glycolyl-CoA Carboxylase

Glycolyl-CoA carboxylase (GCC) is a new-to-nature enzyme that catalyzes the key reaction in the tartronyl-CoA (TaCo) pathway, a synthetic photorespiration bypass that was recently designed to improve photosynthetic CO2 fixation. GCC was created from propionyl-CoA carboxylase (PCC) through five mutations. However, despite reaching activities of naturally evolved biotin-dependent carboxylases, the quintuple substitution variant GCC M5 still lags behind 4-fold in catalytic efficiency compared to its template PCC and suffers from futile ATP hydrolysis during CO2 fixation. To further improve upon GCC M5, we developed a machine learning-supported workflow that reduces screening efforts for identifying improved enzymes. Using this workflow, we present two novel GCC variants with 2-fold increased carboxylation rate and 60% reduced energy demand, respectively, which are able to address kinetic and thermodynamic limitations of the TaCo pathway. Our work highlights the potential of combining machine learning and directed evolution strategies to reduce screening efforts in enzyme engineering.


Materials
Chemicals were obtained from Sigma-Aldrich, Carl Roth GmbH + Co. KG, Santa Cruz Biotechnology Inc. and Merck. Biochemicals and materials for cloning and protein expression were obtained from Thermo Fisher Scientific, New England Biolabs GmbH and Macherey-Nagel GmbH. Coenzyme A was bought from Roche Diagnostics. Materials and equipment for protein purification were obtained from GE Healthcare, BioRad and Merck Millipore GmbH. Pyruvate kinase/lactic dehydrogenase, malic dehydrogenase, glucose-6-phosphate dehydrogenase, glucose dehydrogenase and phosphoenolpyruvate carboxylase were bought from Sigma-Aldrich.

Strains
All strains used in this work are listed in the following table.
Invitrogen E. coli ElectroMAX DH5α was used to create the random mutagenesis libraries that provided the dataset of randomly mutagenized GCC variants used to train a machine learning model for the prediction of beneficial mutations. E. coli NEB Turbo was used to construct and maintain plasmids with site-specific mutations in the gene for GCC. E. coli BL21-birA was derived from E. coli BL21 DE3 by introducing a vector carrying a biotin ligase gene from Methylorubrum extorquens, which is required to activate GCC. E. coli BL21-birA was used for overexpression of GCC variants.

Plasmids
All plasmids used in this work are listed in Table S2.

Oligonucleotides
All oligonucleotides used in this work are listed in Table S3.
a Candidates are labeled after the substitution that distinguishes them from GCC M5. Positions of substitutions correspond to PCC from M. extorquens.
2 All GCC variant substitutions are listed versus the PCC from M. extorquens.

Supplemental Figures
Figure S1 (A-T). Lysate-based screen of the random mutagenesis library. All graphs show screening results for the same random mutagenesis library, measured in batches of 192 samples (one batch per individual 384-well plate), with each batch shown in a separate graph. The overall absorbance decrease during the first 10,000 s is plotted on the y-axis and is indicative of the ATP per carboxylation ratio. The initial slope of the absorbance decrease during the first 500 s of the reaction is plotted on the x-axis and is indicative of the carboxylation rate. In total, 3840 samples were measured: 3360 randomly mutagenized variants, 400 positive controls with unmutated GCC M5 and 80 negative controls with lysis buffer. Black: randomly mutagenized variants of GCC M5. Orange: positive controls with unmutated GCC M5. Blue: negative controls with lysis buffer (CelLytic B; Sigma-Aldrich) instead of cell lysate. A340nm: absorbance at 340 nm.
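The two screening readouts described in this caption can be sketched in a few lines of Python. This is a minimal illustration only, not the analysis code used for the figure: the function names, the least-squares fit for the initial slope, and the use of the first and last readings within the window to compute the overall decrease are all assumptions.

```python
def linear_slope(times, values):
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two data points to fit a slope")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


def screen_metrics(times_s, a340, slope_window_s=500.0, total_window_s=10000.0):
    """Return (initial_rate, overall_decrease) for one absorbance trace.

    times_s must be sorted in ascending order (assumption).
    initial_rate is the negated slope over the first slope_window_s seconds,
    so a falling absorbance gives a positive rate (x-axis readout).
    overall_decrease is the first reading minus the last reading within
    total_window_s seconds (y-axis readout).
    """
    early = [(t, a) for t, a in zip(times_s, a340) if t <= slope_window_s]
    slope = linear_slope([t for t, _ in early], [a for _, a in early])
    within = [a for t, a in zip(times_s, a340) if t <= total_window_s]
    overall_decrease = within[0] - within[-1]
    return -slope, overall_decrease
```

Applied to every well of a plate, the two returned values would give one point per sample in a scatter plot of the kind shown in the figure.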

Figure S5. Cryogenic electron microscopy (cryo-EM) data collection and analysis for GCC M5 L100N. A) Schematic processing workflow for the electron map of GCC M5 L100N. The dataset was collected on a Titan Krios G3i electron microscope equipped with a Gatan BioQuantum-K3 imaging filter. B) Gold-standard Fourier shell correlation plot from map refinement in CryoSPARC. Resolution determined at Fourier shell correlation (FSC) = 0.143. C) Angular particle distribution. D) Distribution of local resolution at FSC = 0.143. E) Local resolution as calculated by CryoSPARC mapped onto the refined density, with different views (top and side view) shown. F) Map to atomic model FSC plot with resolution (masked and unmasked) determined at FSC = 0.143.
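To illustrate what reading a resolution off an FSC curve at the 0.143 threshold involves, the following sketch finds the first point where the curve drops below the threshold and interpolates linearly between the neighbouring shells. It shows the general threshold-crossing idea only and is not CryoSPARC's implementation; the function name and the linear interpolation are assumptions.

```python
def resolution_at_threshold(spatial_freqs, fsc, threshold=0.143):
    """Resolution (in Å) at which the FSC curve first falls below threshold.

    spatial_freqs: shell frequencies in 1/Å, ascending (assumption).
    fsc: FSC value for each shell.
    Returns None if the curve never crosses the threshold.
    """
    for i in range(1, len(fsc)):
        if fsc[i - 1] >= threshold > fsc[i]:
            # Linear interpolation between the two shells around the crossing.
            frac = (fsc[i - 1] - threshold) / (fsc[i - 1] - fsc[i])
            freq = spatial_freqs[i - 1] + frac * (spatial_freqs[i] - spatial_freqs[i - 1])
            return 1.0 / freq
    return None
```

For example, a curve that passes from 0.243 at 0.3 1/Å to 0.043 at 0.4 1/Å crosses 0.143 at 0.35 1/Å, which corresponds to a resolution of about 2.86 Å.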

Figure S6. Michaelis-Menten kinetics for GCC M5 G20R and L100N. Michaelis-Menten kinetics were determined via LC-MS; each kinetic was measured in triplicate at a minimum of six different substrate concentrations. Quantification of product formation was used to determine enzyme activities. The data were analyzed using nonlinear regression. A) GCC M5 G20R with glycolyl-CoA as starting substrate. B) GCC M5 G20R with acetyl-CoA as starting substrate. C) GCC M5 L100N with glycolyl-CoA as starting substrate. D) GCC M5 L100N with acetyl-CoA as starting substrate.
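The caption states only that the data were analyzed by nonlinear regression; the software and algorithm are not specified. As a hedged sketch of what such a fit involves, the code below estimates Vmax and Km by Gauss-Newton least squares with step halving, using only the Python standard library. The function names, starting guesses and convergence tolerance are assumptions.

```python
def mm_rate(s, vmax, km):
    """Michaelis-Menten rate v = Vmax * S / (Km + S)."""
    return vmax * s / (km + s)


def sum_sq_error(substrate, rates, vmax, km):
    return sum((v - mm_rate(s, vmax, km)) ** 2 for s, v in zip(substrate, rates))


def fit_michaelis_menten(substrate, rates, max_iter=200, tol=1e-10):
    """Least-squares estimate of (Vmax, Km); replicates are passed as pooled points."""
    vmax = max(rates)                                # crude starting guess
    km = sorted(substrate)[len(substrate) // 2]      # median concentration
    current = sum_sq_error(substrate, rates, vmax, km)
    for _ in range(max_iter):
        # Build the 2x2 normal equations (J^T J) * step = J^T * residuals.
        a11 = a12 = a22 = b1 = b2 = 0.0
        for s, v in zip(substrate, rates):
            d_vmax = s / (km + s)
            d_km = -vmax * s / (km + s) ** 2
            resid = v - mm_rate(s, vmax, km)
            a11 += d_vmax * d_vmax
            a12 += d_vmax * d_km
            a22 += d_km * d_km
            b1 += d_vmax * resid
            b2 += d_km * resid
        det = a11 * a22 - a12 * a12
        if det == 0.0:
            break
        step_vmax = (a22 * b1 - a12 * b2) / det
        step_km = (a11 * b2 - a12 * b1) / det
        # Halve the step until the error does not increase and both
        # parameters stay positive.
        scale = 1.0
        while scale > 1e-8:
            new_vmax = vmax + scale * step_vmax
            new_km = km + scale * step_km
            if new_vmax > 0 and new_km > 0:
                candidate = sum_sq_error(substrate, rates, new_vmax, new_km)
                if candidate <= current:
                    break
            scale /= 2.0
        else:
            break  # no acceptable step found; keep the current estimate
        small = (abs(new_vmax - vmax) <= tol * vmax and abs(new_km - km) <= tol * km)
        vmax, km, current = new_vmax, new_km, candidate
        if small:
            break
    return vmax, km
```

Dividing the fitted Vmax by the enzyme concentration would give kcat, and kcat/Km the catalytic efficiency; in practice a statistics package that also reports confidence intervals would normally be preferred over a hand-written fit.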

Table S6. Spectrophotometric measurements of selected GCC variants.
a Candidates are labeled after the mutation that distinguishes them from GCC M5. The positions of the substitutions correspond to the original PCC of M. extorquens.
b Measured at 0.5 mM glycolyl-CoA and 37 °C.
n.d. = not detectable, n.m. = not measured; n = 6 for GCC M5, G20R and L100N, n = 3 for all other variants.