Machine learning analysis of RB-TnSeq fitness data predicts functional gene modules in Pseudomonas putida KT2440

ABSTRACT There is growing interest in engineering Pseudomonas putida KT2440 as a microbial chassis for the conversion of renewable and waste-based feedstocks, and metabolic engineering of P. putida relies on the understanding of the functional relationships between genes. In this work, independent component analysis (ICA) was applied to a compendium of existing fitness data from randomly barcoded transposon insertion sequencing (RB-TnSeq) of P. putida KT2440 grown in 179 unique experimental conditions. ICA identified 84 independent groups of genes, which we call fModules (“functional modules”), where gene members displayed shared functional influence in a specific cellular process. This machine learning-based approach both successfully recapitulated previously characterized functional relationships and established hitherto unknown associations between genes. Selected gene members from fModules for hydroxycinnamate metabolism and stress resistance, acetyl coenzyme A assimilation, and nitrogen metabolism were validated with engineered mutants of P. putida. Additionally, functional gene clusters from ICA of RB-TnSeq data sets were compared with regulatory gene clusters from prior ICA of RNAseq data sets to draw connections between gene regulation and function. Because ICA profiles the functional role of several distinct gene networks simultaneously, it can reduce the time required to annotate gene function relative to manual curation of RB-TnSeq data sets.

IMPORTANCE This study demonstrates a rapid, automated approach for elucidating functional modules within complex genetic networks. While Pseudomonas putida randomly barcoded transposon insertion sequencing data were used as a proof of concept, this approach is applicable to any organism with existing functional genomics data sets and may serve as a useful tool for many valuable applications, such as guiding metabolic engineering efforts in other microbes or understanding functional relationships between virulence-associated genes in pathogenic microbes. Furthermore, this work demonstrates that comparison of data obtained from independent component analysis of transcriptomics and gene fitness data sets can elucidate regulatory-functional relationships between genes, which may have utility in a variety of applications, such as metabolic modeling, strain engineering, or identification of antimicrobial drug targets.
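For readers unfamiliar with ICA, the decomposition described above can be summarized as a linear model. The notation here is an assumption introduced for illustration only: the symbols X, M, A, n, m, and k do not appear in the original text, while the terms "gene weights" and "activity" follow the figure legends. Under this reading, X holds the fitness value of each of n genes in each of m experiments, each column of M holds the gene weights of one component, and each row of A holds that component's activity across experiments.

```latex
\mathbf{X} \approx \mathbf{M}\,\mathbf{A},
\qquad
\mathbf{X} \in \mathbb{R}^{n \times m},\quad
\mathbf{M} \in \mathbb{R}^{n \times k},\quad
\mathbf{A} \in \mathbb{R}^{k \times m}
```

In this sketch, an fModule would correspond to the genes carrying large-magnitude weights in one column of M. How the weight cutoff is chosen is not stated in this section.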


Contents
The pK18sB vector backbone was amplified using the oKDD005/006 primer pair. Genomic regions 1,000 bp upstream and downstream of znuA1 were amplified using oKDD001/002 and oKDD003/004, respectively. These amplicons were then assembled via Gibson assembly and transformed into chemically competent E. coli DH5α F'Iq cells. Transformed colonies underwent colony PCR with oCJ680 and oCJ547. A correct colony was miniprepped and Nanopore sequenced (Plasmidsaurus) to verify the sequence.
a Kmʳ, resistance to kanamycin.

Figure S1. Dataset replicate quality and ICA result. (A) Pearson R correlation coefficients between every pair of the 332 experiments. Green and blue bars indicate pairs of known replicate samples and pairs of non-replicate samples (different conditions), respectively. (B) Determination of the optimal ICA dimension for the dataset. The optimal dimension was chosen where the number of final components is equal to or greater than the number of non-single-gene components.
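The replicate-quality check in panel A can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the experiment names, the fitness values, and the `condition` mapping are hypothetical toy data, and only the Python standard library is used.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: one fitness value per gene for each experiment.
fitness = {
    "vanillate_rep1": [-2.1, 0.1, -0.3, 1.2, 0.0],
    "vanillate_rep2": [-1.9, 0.2, -0.4, 1.0, 0.1],
    "glucose_rep1": [0.1, -1.5, 0.2, 0.0, -0.2],
}
# Hypothetical mapping from experiment name to growth condition.
condition = {
    "vanillate_rep1": "vanillate",
    "vanillate_rep2": "vanillate",
    "glucose_rep1": "glucose",
}

# Correlate every pair of experiments and sort each pair into one of two groups.
replicate_r, non_replicate_r = [], []
for a, b in combinations(fitness, 2):
    r = pearson_r(fitness[a], fitness[b])
    if condition[a] == condition[b]:
        replicate_r.append(r)      # same condition: known replicates
    else:
        non_replicate_r.append(r)  # different conditions
```

In a high-quality dataset, the values in `replicate_r` would be expected to sit well above those in `non_replicate_r`, which is the comparison the two bar colors in panel A convey.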

Figure S2. Tree map depicting the proportion of variance explained by each fModule. The proportion of variance explained by each fModule corresponds to its area, and the color of each fModule corresponds to a functional class designation. Numbers in the lower left of each box indicate the fModule number, as provided in Table 1.
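One common way to attribute variance to a single component is to compare the data matrix with the rank-one reconstruction from that component's gene weights and activities. The sketch below assumes this definition, which may differ from the one used to draw the tree map. The matrix `X` and the two components are hypothetical toy values, and `X` is treated as already centered so that sums of squares stand in for variance.

```python
def outer(weights, activities):
    """Outer product of a gene-weight column and an activity row."""
    return [[w * a for a in activities] for w in weights]


def sum_of_squares(matrix):
    """Sum of squared entries of a matrix stored as a list of rows."""
    return sum(v * v for row in matrix for v in row)


def variance_explained(X, weights, activities):
    """Fraction of the total sum of squares in X captured by one component."""
    reconstruction = outer(weights, activities)
    residual = [
        [x - r for x, r in zip(x_row, r_row)]
        for x_row, r_row in zip(X, reconstruction)
    ]
    return 1.0 - sum_of_squares(residual) / sum_of_squares(X)


# Hypothetical toy example: 3 genes x 3 experiments, built from two components.
components = [
    ([1.0, 1.0, 0.0], [2.0, 0.0, -2.0]),  # (gene weights, activities)
    ([0.0, 0.0, 1.0], [1.0, -1.0, 0.0]),
]
X = [
    [2.0, 0.0, -2.0],
    [2.0, 0.0, -2.0],
    [1.0, -1.0, 0.0],
]

fractions = [variance_explained(X, w, a) for w, a in components]
```

In this toy case the two components act on disjoint genes, so their fractions add up to 1. With real ICA output the components are not orthogonal in general, and the per-component fractions need not sum exactly to the total explained variance.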

Figure S3. Characteristics of fModule_21 indicate that its member genes function together in L-arginine metabolism. (A) Gene weights and (B) activity of fModule_21 under experimental conditions lacking L-Arg supplementation.

Figure S4. Activities of fModule_68 and fModule_73 indicate a role for member genes in lysine regulation, transport, and catabolism. (A, B) Gene weights for fModule_68 and fModule_73, respectively. (C, D) Activity of fModule_68 and fModule_73, respectively, when L-lysine, D-lysine, or catabolic intermediates were used as carbon or nitrogen sources.

Figure S5. Benzoate catabolism is the indicated function of fModule_5. ICA grouped genes with well-characterized, shared roles in benzoate catabolism, as indicated by (A) gene weights and (B) fModule activity values in conditions where benzoate was provided as a sole carbon source.

Figure S6. Activity of fModule_28 in conditions where O-methoxylated aromatics were provided as a sole source of carbon. Each condition on the x-axis refers to growth on a single carbon source, and the set number indicates the RB-TnSeq experiment set from which fitness data were collected. Error bars represent the standard deviation from the mean of 2-3 replicate RB-TnSeq experiments.
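The bar heights and error bars described in this legend amount to a per-condition mean and standard deviation across replicates. A minimal sketch follows, assuming the sample standard deviation (n - 1 denominator), which the legend does not specify. The condition labels and activity values are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical fModule activities: one value per replicate RB-TnSeq experiment.
activities = {
    "vanillate (set 1)": [4.0, 5.0, 6.0],
    "ferulate (set 2)": [2.0, 4.0],
}

# For each condition, compute the bar height (mean) and the error bar (sample SD).
summary = {
    cond: (mean(values), stdev(values))
    for cond, values in activities.items()
}
```

With only two or three replicates per condition, the choice between sample and population standard deviation changes the error bar noticeably, so it is worth confirming which one a plotting tool applies by default.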

Figure S7. Zinc transport is important for growth with vanillate. (A) Members of fModule_28 have previously described roles in demethylation and formaldehyde tolerance. The znuB gene, which encodes the inner membrane pore of a zinc ABC transporter, was also included in fModule_28. (B) For P. putida KT2440 wild-type (KT2440), addition of 100 nM ZnSO4 to the growth medium resulted in a slight improvement in growth with vanillate, but not glucose, as the carbon and energy source. A P. putida mutant impaired in zinc transport (KDD007, ΔznuA1) displayed growth defects relative to wild-type in all conditions, but the effect was more pronounced during growth with vanillate as the carbon and energy source. Addition of 100 nM ZnSO4 to vanillate growth media resulted in a slight improvement in growth for KDD007, likely due to the presence of multiple zinc transporters in P. putida (1). Error shading indicates the standard deviation from the mean of three biological replicates.

Figure S9. Growth with L-Glu as the sole nitrogen source. Growth of individual transposon disruption mutants (gene::Tc1) was compared to wild-type (KT2440) in M9 minimal medium in microtiter plates, using 20 mM glucose as a carbon source and (A) 5 mM ammonium or (B) 5 mM L-Glu as the nitrogen source. Error shading indicates the standard deviation from the mean of three biological replicates.

Figure S10. Overexpression of amaC does not improve growth with, or stress tolerance to, aromatics relative to wild-type. The growth of P. putida wild-type (KT2440) was compared to that of two strains engineered for overexpression of amaC. Strain ACB272 contains the strong Ptac promoter upstream of the native amaC sequence (Ptac:amaC), while strain ACB287 contains a second copy of amaC under control of the Ptac promoter at the fpvA locus (fpvA:Ptac:amaC). All strains were cultivated in M9 minimal medium in microtiter plates with (A) 20 mM glucose, (B) 10 mM 4-coumarate, (C) 10 mM ferulate, (D) 20 mM glucose + 60 mM 4-coumarate, (E) 20 mM glucose + 60 mM ferulate, or (F) 20 mM glucose + 60 mM protocatechuate. Error bars indicate the standard deviation from the mean of three biological replicates.

Table S1. Plasmids used in this study.

Table S2. Sequences of DNA oligos used in this study. Oligos were synthesized by Integrated DNA Technologies (IDT).

Table S3. Bacteria used in this study, including construction details for engineered strains and barcode details for individually arrayed mutants. Three transformants were pooled into a single culture to generate the strain stock. Kmᴿ.

Table S4. Synthetic DNA. Synthetic DNA was synthesized by Twist Bioscience.