Pangenomic landscapes shape performances of a synthetic genetic circuit across Stutzerimonas species

ABSTRACT Engineering identical genetic circuits into different species typically results in large differences in performance due to the unique cellular environmental context of each host, a phenomenon known as the “chassis-effect” or "context-dependency". A better understanding of how genomic and physiological contexts underpin the chassis-effect will improve biodesign strategies across diverse microorganisms. Here, we combined a pangenomic-based gene expression analysis with quantitative measurements of performance from an engineered genetic inverter device to uncover how genome structure and function relate to the observed chassis-effect across six closely related Stutzerimonas hosts. Our results reveal that genome architecture underpins divergent responses between our chosen non-model bacterial hosts to the engineered device. Specifically, differential expression of the core genome, gene clusters shared between all hosts, was found to be the main source of significant concordance to the observed differential genetic device performance, whereas specialty genes from respective accessory genomes were not significant. A data-driven investigation revealed that genes involved in denitrification and components of trans-membrane transporter proteins were among the most differentially expressed gene clusters between hosts in response to the genetic device. Our results show that the chassis-effect can be traced along differences among the most conserved genome-encoded functions and that these differences create a unique biodesign space among closely related species. IMPORTANCE Contemporary synthetic biology endeavors often default to a handful of model organisms to host their engineered systems. Model organisms such as Escherichia coli serve as attractive hosts due to their tractability but do not necessarily provide the ideal environment to optimize performance. As more novel microbes are domesticated for use as biotechnology platforms, synthetic biologists are urged to explore the chassis-design space to optimize their systems and deliver on the promises of synthetic biology. The consequences of the chassis-effect will therefore only become more relevant as the field of biodesign grows. In our work, we demonstrate that the performance of a genetic device is highly dependent on the host environment it operates within, promoting the notion that the chassis can be considered a design variable to tune circuit function. Importantly, our results unveil that the chassis-effect can be traced along similarities in genome architecture, specifically the shared core genome. Our study advocates for the exploration of the chassis-design space and is a step forward to empowering synthetic biologists with knowledge for more efficient exploration of the chassis-design space to enable the next generation of broad-host-range synthetic biology.

A s synthetic biologists continue to explore the design space of engineered genetic circuits, we are presented with a complex landscape where functional fidelity depends on host physiology, environment, and genetic tractability (1,2).This has given rise to the subdiscipline of broad-host-range synthetic biology, which aims to create versatile genetic systems that can operate across diverse organisms and is particularly beneficial toward biotechnologies that capitalize on microbial diversity (3,4).Despite the plethora of modular genetic parts available, we are still faced with the challenge that identical genetic devices exhibit consequentially different performance depending on the host context the device operates in, a process termed the "chassis-effect" (5,6) or "context-dependency" (7,8).The chassis-effect lowers the predictability of circuit function based on part composition alone and can cause any circuit optimization typically done in a model organism (e.g., Escherichia coli) to be rendered null once introduced into a different host environment (9).This added layer of instability limits the current state of microbial biodesign and biases our understanding toward model organisms, even though more suitable hosts may exist for a given application (10)(11)(12).Overall, the chassis-effect constrains the design-build-test cycle by demanding costly repetitions of trial-and-error experimentation.There remain major knowledge gaps as to which cellular processes underpin species-specific chassis-effects.Closing this knowl edge gap undoubtedly enables more efficient engineering of novel hosts and moves the current engineering dogma toward the notion that the host itself can be considered a part of tuning circuit functions (7,8), representing a paradigm shift toward a new understanding of different microbial species and their unique cellular environment as customized hardware for biodesign (13).
Previous efforts to address this knowledge gap have shown that device performances between bacterial hosts can be better explained by the differences in the physiology of hosts rather than genomic or phylogenetic relatedness (14).In turn, the available physiological states for a given host are ultimately shaped by those functions encoded in their genomes, and perhaps more importantly, the expression of gene products that control cellular physiology.Deeper investigations into the gene expression responses toward the activity of heterologous genetic circuits are needed to uncover insight into which cellular processes underpin chassis-effects.Designing such a study is difficult, as it requires a suit of microbial hosts capable of operating an identical genetic device and producing an observable chassis-effect under the same growth conditions while also sharing significant enough levels of genomic identity for their interspecies gene expression to be comparable.We developed a synthetic biology kit for the Stutzerimonas genus to specifically address this challenge.Recent re-examination of the Pseudomonas genus invigorated with high-quality genome sequencing data led to the delineation of the clade into several novel genera (15,16), one such genus being Stutzerimonas (17,18).Several members of the Stutzerimonas with sequenced genomes are available in culture collections and previous studies have highlighted the natural high transfor mation competence of Stutzerimonas spp (19,20).Furthermore, many Stutzerimonas members have innate phenotypes that have garnered attention as potential microbial cell factories and/or bioremediation agents (21)(22)(23)(24), making them appealing targets for domestication and biodesign applications.
In this work, we transformed an inducible genetic inverter device into six Stutzeri monas hosts that share a sufficient amount of genomic identity to form a sizeable core (i.e., orthologous genes shared between all hosts) and accessory genome (i.e., orthologous genes shared only by a subset of hosts or genes unique to host).We comparatively quantified an observable chassis-effect in device performance within this group of closely related Stutzerimonas hosts and sequenced the global transcriptomes during different operation modes of the inverter.This investigative workflow enabled us to ask the guiding scientific question of whether the observed chassis-effect is more influenced by the expression of conserved core genes that are fundamental to growth physiology and cellular housekeeping or by less conserved functions encoded within the accessory genome of our experimental platform.We were also able to ask if unique, species-specific gene expression patterns can serve as a concordant predictor of device performance.Our results show a significant correlation between the transcriptional response of shared core genes and inverter performance, as well as between growth physiology and inverter performance, suggesting the mechanism in which core genetic elements contribute to the chassis-effect effect is through changes in basic cellular physiology related to growth.

A Stutzerimonas tool kit for pangenome-guided synthetic biology
Connecting structures and functions of multiple genomes to a measurable chassis-effect requires a standardized, broad-host-range genetic circuit.We implemented a modified build from one of our previously described genetic inverters (14) (Fig. 1a).This inverter can be directionally induced (toggled) by anhydrotetracycline (aTc) and L-arabinose (Ara) and was cloned into plasmid pS5.The circuit features two inducible promoters, P BAD and P Tet , with each promoter regulating the expression of the other's cognate transcrip tion factor as well as a fluorescent reporter protein in a bicistronic manner.In all, 15 Stutzerimonas hosts were screened for plasmid transformability and inverter operabil ity, of which 10 were successfully transformed via electroporation with a pBBR1-KanR backbone vector (BB23).Six of these ten were chosen for further study, which are as follows: Stutzerimonas chloritidismutans NCTC10475 (S. chloritidismutans) (25), Stutzeri monas perfectomarina CCUG 44592 (S. perfectomarina) (18), Stutzerimonas degradans FDAARGOS 876 (S. degradans) (18), Stutzerimonas pgs16 24a13 (S. pgs16) (26), Stutzerimo nas pgs17 24a75 (S. pgs17) (26), and Stutzerimonas stutzeri DSM 4166 (S. stutzeri) (27,28).Phylogenomic analysis of the six Stutzerimonas species reveals them to be closely related (Fig. 1b), but comparing pair-wise average nucleotide identity reveals none to be of high similarity to cross the threshold (95%-97%) (29,30) to be considered of the same species (Fig. S1).Gomila et al. 's work (2022) (16,18) pioneered the clarification of the Stutzerimonas clade and also reported that our six hosts can each be considered distinct phylogenomic species.The names of our Stutzerimonas hosts follow the proposed names assigned by (18).
Comparative pangenomics of the selected Stutzerimonas hosts revealed almost equal sizes between the core and accessory genomes, propitiously setting the stage for investigating whether the measurable chassis-effect can be attributed to differential expression from the core or accessory genomes and which specific functions covary with host-specific device performance.Pangenome analysis of Stutzerimonas host genomes was performed using Anvi'o (31), leading to a total of 25,344 gene calls functionally annotated using the 2020 clusters of orthologous genes (COGs) database (Fig. 1c).These gene calls were grouped into 6,469 gene clusters or "pangenomic orthologous groups." A gene cluster is grouped into a "core" or "accessory" frequency group depending on the number of genomes the gene cluster occurs in.Core gene clusters were defined as the 42.5% (2,751) of gene clusters that had hits across all hosts.The "accessory" gene clusters made up the remaining 57.5% of gene clusters that had hits in five or fewer genomes.Within the accessory genome, we further distinguish gene clusters exclusive to a single genome, referred to as "unique" gene clusters, which make up 31.1% (2,013) of total gene clusters.The clustering of the six hosts by their presence/absence of the identified orthologous gene clusters differs by the phylogenomic clustering of GToTree (Fig. 1d).The number of host-specific gene calls ranged from 3,731 in S. degradans to 4,457 in S. pgs16, with all hosts sharing on average 67.2% ± 4.4% of genes.23% of all gene clusters were not assigned to any COG category (assigned to the "NA" group) (Fig. 1e), with an additional 3.6% and 4.9% assigned to COG category S (Function Unknown) and R (General Prediction Only), respectively.Of these three groups of lesser-known gene clusters, 81% belong to the accessory genome, with approximately half of these being unique gene clusters.This disproportionate number of unassigned gene clusters between the core, accessory, and unique groups is consistent with previous bacterial pangenome studies (17) and is theorized to be due to the accessory genome tending to house genes that confer a specific advantage to the organism within its niche environ ment.These gene clusters might therefore only be expressed under certain conditions, making them difficult to study under cultivation conditions that have been standardized across multiple species.
The seemingly uneven distribution of COG categories across the three frequency groups prompted an investigation of whether certain COG categories are overrepresen ted (Fig. 1f).The Fisher exact test (P-value > 0.05, with Bonferroni Correction) revealed that within the core genome, COG categories consistently enriched (i.e., significantly enriched in ≥3 species) for categories associated with housekeeping functions such as category C (Energy Production and Conversion), D (Cell Cycle Control, Cell Division and Chromosome Partitioning), E (Amino Acid Transport and Metabolism), and O (Post-Trans lational Modification, Protein Turnover and Chaperones).Meanwhile, COG groups G (Carbohydrate Transport and Metabolism), P (Inorganic Ion Transport and Metabolism), Q (Secondary Metabolites Biosynthesis, Transport, and Catabolism), and K (Transcription) are overrepresented in the accessory genome, suggesting the presence of metabolic pathways that confer unique nutrient assimilation capabilities.We note that phenotypic characterization done by (18) reveals that S. chloritidismutans, S. degradans, and S. perfectomarina tested negative in the metabolization of arabinose, with no character ization data available for the remaining three.Comparative BLAST search using the sequence of E. coli araABCD operon as a query against the genomes of all six Stutzerimo nas hosts yielded no significant similarities in sequence as well.Few COG categories were found to be consistently overrepresented within the unique genome, likely due to the majority of genes being of unknown function, but the few are category X (Mobilome: Prophages and Transposons), which hints toward past viral infection events, and V (Defense Mechanisms).Other prokaryotic pangenome studies report similar patterns of enriched COG categories in their defined shared and accessory genomes (32)(33)(34).The distinct functional identity of genomic sub-groups gives merit to the hypothesis that the expression of genes among the core or accessory groups may uniquely contribute to the chassis-effect.

The chassis-effect is observable between closely related Stutzerimonas hosts
Comparing the quantified performance of the engineered genetic inverter operated by each host under a standardized environment revealed a clear chassis-effect.The performance of the inverter operating within Escherichia coli DH5α (E.coli) was also quantified as a reference.Induction response dynamics were characterized by fitting the Hill function [(βx n / (K n + x n )) + C] to normalized induction curves (Fig. 2a through d).
Parameter C is the baseline output at 0 inducer concentration, β is the max output level at saturating input levels, K represents the sensitivity of the system to inducer input as well as the input responsive range (activation coefficient), and the Hill coefficient n reflects the steepness of the response (varying from step-like or dosage dependent).These parameters collectively quantify interactions between inducers (aTc and Ara), their respective transcriptional factors (TetR and AraC), and the responsive operons, thereby quantitatively describing device performance.We observe markedly different perform ance profile from the genetic inverter depending on host context -that is, a strong chassis-effect.For instance, the observed K aTc values range from 1.46 to 35.11 nM aTc among the Stutzerimonas hosts (Fig. 2a and b), suggesting host-specific factors affecting the intracellular aTc concentration and/or different levels of TetR repressor.E. coli exhibited the highest sensitivity to aTc, with a K aTc value of 0.39.The K Ara metrics were relatively more uniform across hosts, with only an overall 2.8-fold difference between the smallest and largest K Ara values (Fig. 2c and d).We define an additional metric, DR Ara and DR aTc (dynamic range) as the ratio of the estimated β and empirical C expressed as a fold-change value, representing the largest possible fold-change difference measurable from respective reporter proteins.At saturated inducer concentrations, the highest β Ara achieved among the Stutzerimonas hosts was S. perfectomarina at 8,190 RFU, which also achieves the highest DR Ara value of 13,75.Overall, E. coli had the lowest β Ara at 750 RFU.Meanwhile the range of observed DR aTc values were relatively uniform, ranging from 3.22 to 4.60 among the Stutzerimonas.
We next performed a toggle assay to demonstrate the invertibility of the device and to further quantify the chassis-effect (Fig. 2e through h).Initial OFF cells (grown in the absence of inducer) were diluted into media with inducer to prompt Ara-ON or aTc-ON states, respectively (Fig. 2e), and performance metrics from normalized sfGFP and mKate fluorescence curves were estimated across induction states.These metrics (Fig. 2f) include the maximum steady-state fluorescence at the late growth phase (F ss ), maximum rate of fluorescence (Rate) at the exponential phase, specific dynamic range (DR S ), and lag time (Lag).Each of these metrics has different biological implications, depending on the induction state.For example, the fluorescence output under OFF conditions is a measure of the inverter's unbiased output level-that is, the host-spe cific background fluorescence/expression leakage.Meanwhile, fluorescence output from cells grown under the presence of an antagonistic inducer (e.g., sfGFP output in the presence of aTc) indicates deficient transcriptional control.The range of observed fluorescence curves and estimated performance metrics further solidifies the existence of the chassis-effect.Inversion between states was controlled by washing cells twice and diluting them into media containing respective opposite inducers (Fig. 2g and h).Results from the toggle assay showed that upon being toggled from Ara-ON to aTc-ON, multiple hosts experienced an attenuated mKate output, to different degrees.For instance, the DR S_aTc values of all hosts other than S. stutzeri all drop below 1 when diluted from Ara-ON to aTc-ON, meaning their induced mKate output becomes lower than their baseline output.This decreased output was not observed when diluted from OFF to aTc-ON and can therefore be attributed to a measurable hysteresis effect.Two additional toggle assays with induction schemes 0.25 mM-40 nM aTc and 0.375 mM Ara-5 nM aTc were performed (Fig. S2), and for S. chloritidismutans, S. degradans, and S. stutzeri, the mKate output attenuation was relieved under induction schemes with decreased Ara concentration.However, the other three hosts experienced consistent attenuation across toggling schemes.As cells were washed twice and diluted, the mechanism in which mKate output is attenuated is likely not due to residual extracellular Ara concentrations.This result suggests the invertibility (input/output logic) of the inverter is dependent on past induction states and that the dynamics of this hysteresis effect varies between hosts based on intracellular molecular physiology.

Species-specific physiology responses to the genetic inverter
A consistent growth inhibition was observed when comparing wild-type hosts against their engineered genotypic counterparts and across induction states (Fig. 3).Growth of hosts operating the genetic inverter was measured simultaneously during the toggle assay from which growth physiology was characterized (Fig. 3a through d).The reference host E. coli showed the highest growth on LB media compared to all Stutzerimonas hosts.The addition of BB23 backbone led to decreased specific growth rates of all strains compared to their respective wild-type counterparts, with S. pgs17 exhibiting the most reduced growth rate (Fig. 3e and f).The consistent growth reduction is expected due to the added burden of maintaining the vector backbone in the presence of kanamycin.E. coli and S. chloritidismutans exhibit the greatest growth burden upon the addition of the inverter device into the backbone (complete pS5 plasmid) under NI conditions, likely due to a larger plasmid payload (35) and leakage expression of the device.However, some hosts experience close to zero additional growth burden, and while S. chloritidis mutans had a relatively high amount of leakage expression (Fig. 2), E. coli also had the lowest leakage expression, suggesting the mechanism in which the backbone vector and device impose growth inhibition varies between hosts.Induction of the genetic inverter exacerbated the growth inhibition (Fig. 3e and f).For instance, the growth rate of S. pgs17 was almost halved when induced with Ara when compared with the NI condition but showed little response when treated with aTc.Overall, these results exemplify that the genetic inverter's mode of operation uniquely affects the growth of each host.
The unique intracellular molecular environment within each host is ultimately shaped by their respective genomes, and perhaps more importantly, the expression pattern of their gene products.We therefore extracted total RNA from cells induced in both directions for mRNA sequencing to determine their transcriptome profiles.By augment ing our comparative transcriptome analysis with pangenomic insight, we can discern the impact that the core and accessory genome have on the observed host-specific inverter performance.

Unique transcriptional patterns arise from both the presence and operation of an engineered genetic circuit
Global transcriptome analysis revealed marked variability in gene expression profiles between hosts operating the engineered genetic inverter.The difference in response was observed in terms of magnitude (number of differentially expressed genes or DEGs), the functional composition of DEGs, and the direction of regulation of certain gene clusters shared among hosts' core genomes.These unique transcriptomic profiles support our hypothesis that differences in gene expression from core and/or accessory genomes uniquely correspond with differences in genetic inverter performances.
Cross-species comparison of DEG profiles was performed by pooling all read counts mapped to gene calls within a gene cluster, on the basis that genes within the same gene cluster are inferred to be highly similar.Differential expression of mRNA encoded from the genetic inverter confirmed its programmed operability (Fig. 4).Cells treated with Ara showed a higher proportion of reads (measured by transcripts per million, TPM) mapped to tetR and sfGFP compared to araC and mKate, and vice versa for aTc-induced cells (Fig. S3).Differential gene expression analysis of Ara against aTc-treated hosts reveals significant upregulation of tetR and sfGFP genes and downregulation of araC and mKate, in accordance with the design of the inverter.The degree of response differed greatly between hosts.Log2 fold-change values of tetR ranged from 4.9 (S. chloritidismu tans) to 8.2 (S. degradans) and ranged from 1.1 (S. perfectomarina) to 4.1 (S. pgs17) for sfGFP.These results support the observed chassis-effect via mRNA abundances from genes encoded within the engineered genetic inverter.The consistent downregulation of the kanR gene in Ara-induced cells could be due to high degree of transcriptional readthrough in aTc-induced cells when transcribing from the P Tet promoter, highlighting the importance of context insulation (36).A relatively large difference in TPM values was also observed between polycistronic repressor-reporter pairs (Fig. S3).
The mRNA abundance of genes encoded by the genetic inverter intuitively plays a major role in influencing the observed chassis-effect, but these heterologous gene products are not compartmentalized away from native cellular elements.We were therefore prompted to also explore the global transcriptome response of all hosts to better understand how the expression of genes from the core and accessory genomes might be concordant with the chassis-effect.Our results reveal a clear diversity in the global differential gene expression responses between hosts when comparing Ara against aTc-induced cells, indicating that genome-encoded functions respond differently depending on how the engineered genetic inverter is being operated.(Fig. 5).Differential expression analysis with an adjusted P-value > 0.05 threshold leads to a total of 4,825 significantly DEGs distributed across hosts, with 3,995 DEGs belonging to the core group distributed among 1,672 gene clusters (Fig. 5a and b).We observed that each host significantly expressed only a subset of shared core gene clusters, with 63% of core gene clusters being expressed by at least three hosts.Among significant DEGs, the direction of regulation and the strength of response (captured by log2 fold-change value) within a gene cluster differs.The number of DEGs is also unevenly distributed between the hosts, varying from 589 (S. chloritidismutans) to 1,106 (S. pgs17) meaning the operation of the genetic inverter (i.e., the user-defined induction state) incites a greater response in some hosts than others.We note that the differences in DEG response between states cannot be entirely attributed to the expression of the inverter alone but also from the presence of inducer compounds Ara and aTc.However, the addition of induction compounds is inherently coupled to the function of the inverter by design and we therefore include responses to inducers as a contributor to the chassis-effect.A different clustering pattern is observed upon performing hierarchical clustering of the hosts in terms of their differential expression of the core and accessory group (Fig. 5c  and d) except for S. stutzeri, whose expression response clustered uniquely into its own branch from the other five hosts in both core and accessory groups.Overall, this result further solidifies the difference in expression response among the hosts.
We next examined the functional profile of the core and accessory DEGs by assessing their distribution among the COG categories.The core DEGs of hosts show a uniform distribution across the 22 COG categories while a more varied response pattern in the accessory genome is observed.Among core DEGs, 45.3% ± 1.5 % of the DEGs can be found in categories C, E, J, M, and T alone (Fig. 5e).Meanwhile, in the accessory genome, COG categories P, K, T, and NA (unassigned) make up 43.0% ± 2.6 % of DEGs (Fig. 5f), with NA gene clusters making up 17.1% ± 2.3 % alone.We observed differences in the composition of the upregulated and downregulated genes between hosts within functional groups (especially J, T, and M in the core group) suggesting varied responses to inverter activity.The less uniform transcriptomic responses from accessory genomes are attributed to gene clusters shared by only a subset of species.In the accessory genome, larger numbers of DEGs were concentrated in category T (Signal Transduction) for all hosts, indicating unique response patterns occurring within each host because of inverter activity.A larger variation in the distribution of DEGs was observed within a COG category as well.For instance, 31.3% of DEGs in category K occurred in S. perfectomarina alone, exemplifying the diversity in gene expression response patterns.We note that the majority of DEGs are unassigned (NA), meaning with improved gene annotation, the transcriptional profile of the accessory DEGs could change substantially.The accessory genome, comprising genes with specialized functions shared by only a subset, can be a strong source for genes contributing to an observed chassis-effect.However, differential expression of functions encoded within the core genome often belongs to the major carbon, nitrogen, and energy metabolism, cell division, and housekeeping genes, which can more intuitively underpin the chassis-effect, given that the majority of DEGs stem from the core genome.

Differential expression of core genes is concordant with the chassis-effect
Enrichment analysis and Procrustes Superimposition (PS) analysis both indicate that the key genome-encoded functions responsible for the chassis-effect originate from differences in the expression of the core genes shared among all hosts.PS analysis is a statistical test that can be used to determine the strength of correlation between two multivariate data sets by comparing the goodness-of-fit between two configurations consisting of n items (our hosts) on a coordinate plane.When the distances between points in the configuration carry information of (dis)similarity, such as points plotted along principal components, PS analysis can be used to determine whether the (dis)simi larities between two datasets correlate (37,38).PS analysis outputs the Gower statistic (m 2 ), which is the sum of squared vector residuals that remains after optimal fitting and can be tested for significance through a permutation method that maintains the internal co-variance of each data set.
Fisher's exact test showed that core gene clusters were significantly enriched within the significantly DEGs for all hosts, while accessory and unique gene clusters were significantly underrepresented (Fig. 6a through c).The strongest source of gene expression response to the inverter device operation thereby stems from core genes, suggesting the core genome to be significant in driving the chassis-effect.Building upon this result, we conducted PS analysis to answer the question of whether the differential inverter device performance between our hosts correlates with the differences between them in terms of their gene expression response (i.e., do hosts with more similar gene expression response also have more similar performance?).
Principal component analysis (PCA) was first performed to project the captured differences among our hosts along the first two principal components for each data set.These configurations in ordinate space were then compared by PS.The process of fit optimization is illustrated in Fig. 6d.To avoid artificial inflation of distance, the unique gene clusters were omitted from the data set of accessory DEGs, as these gene clusters inherently have no comparison between hosts.A significant correlation is observed when comparing differences in inverter performance and differences in expression response for all DEGs (Fig. 6d; P-value = 0.043, m 2 = 0.328).Upon splitting the DEGs into respective core and accessory groups, we found that it is in fact the core genome that is responsible for the significance observed previously (P-value = 0.035, m 2 = 0.312) (Fig. 6e  and f).This result supports the following notions that the observed chassis-effect can be explained by the differential gene expression response between hosts and that hosts with more similar expression patterns of their shared core genome also have significantly more similar performances.Hence, for our given set of species and their shared gene clusters, core genes commonly associated with housekeeping functions, central carbon, and energy metabolism, are major biological drivers of the chassis-effect.In addition, PS analysis applied to the captured growth metrics shows a significant correlation with inverter device performance (Fig. S4), corroborating our previous result that observed differences in the physiological state of hosts can be used to potentially predict device performance (14).
We also interrogated the specific cellular functions that are most concordant with the observed chassis-effect by dissecting the differentially expressed core and accessory genomes into their functional categories in a second iteration of PS analysis.Ten of the 23 COG groups were found to be significant (Fig. S5).Category E (Amino acid Transport and Metabolism) is especially intuitive because the main carbon and energy source provided by LB media are amino acids (not sugars) and different catabolic strategies for these will, in turn, cause different growth phenotypes.Categories R (General Function Prediction Only) and S (Unknown Function) were also significant, suggesting that genetic elements of unknown function are contributing strongly to the observed chassis-effect.

Genes clusters involved in denitrification and efflux pumps are highly responsive to genetic inverter activity within Stutzerimonas spp
A data-driven investigation of the top 100 most differentiated gene clusters between host cells revealed that gene clusters involved in denitrification, iron acquisition, and membrane-bound transport proteins are among the most highly differentially expressed gene clusters in response between Ara versus aTc treatment applied to the Stutzerimonas hosts (Fig. 7).This was determined by ranking gene clusters by a metric that takes into consideration the number of hosts significantly expressing each gene cluster, the sum of absolute log2 fold-change values, and the combinatorial sum of absolute differences in  S1).To add confidence in the functional annotation of the gene clusters, we supplemented the COG annotation with annotation using the KEGG Orthology (KO) database (39) because COG accessions are (in some cases) only general descriptions of protein functions or families.For instance, the assigned COG accession for the four highly ranked gene clusters GC_00001064, GC_00001456, GC_00001627, and GC_00001727 describes all these gene clusters to encode for cytochrome c protein (CccA).Meanwhile, the corresponding KO annotations for the four CccA-annotated gene clusters are cytochrome c55X (NirC, K19344), dihydroheme d1 dehydrogenase (NirN, K24867), nitrite reductase (NirS, K15864), and cyto chrome-containing nitric oxide reductase subunit c (NorC, K02305), respectively, which are all heme-containing enzymes belonging to the cytochrome c protein family (40,41).Notably, every KO accession has a complementary COG accession entry in the KEGG Orthology database (https://www.genome.jp/kegg/ko.html),and the KO and COG annotations discussed here all match their respective database records, indicating high degree of agreement between the two annotation methods.
Genes involved in denitrification were upregulated in S. chloritidismutans and S. stutzeri and downregulated in S. pgs16 and S. pgs17 but showed little change in S. perfectomarina and S. degradans when comparing Ara against aTc induction states.This result was not anticipated and exemplifies that the operation of engineered genetic devices can impact and/or be impacted by fundamental cellular functions seemingly unrelated to the heterologous transcription and translation networks.Denitrification is canonically described as an anaerobic process, but aerobic denitrification has been identified in numerous bacteria, many of whom have been identified as Pseudomonas stutzeri species (21,42,43).The gene napA encodes for a nitrate reductase which catalyzes the important first step in the denitrification pathway.Only one gene cluster, GC_00001817, was annotated as napA by KO annotation, but this gene cluster was only significantly differentially expressed in S. perfectomarina and ranked low.Genes involved in iron acquisition showed the opposite trend, being upregulated in S. pgs16and S. pgs17 and downregulated to some degree in all other hosts.Denitrification activity requires bioavailable iron for heme biosynthesis (44); hence, we expected the differential expression of the two pathways to instead positively correlate.
Other highly ranked gene clusters of note from the data-driven investigation are GC_00000078, GC_00000115, and GC_00000605.COG annotation inferred these gene clusters to encode for components of the AcrAB-TolC efflux pump.Meanwhile, KO infers them to be components of the MexEF-OprN efflux pump.Both pumps are broadsubstrate Resistance-Nodulation-Division family transporters known to provide drug resistance (45,46).Another high-ranking gene cluster was GC_00002718, annotated by COG as a transcriptional regulator part of the LysR family, but as the MexT protein by KO.Incidentally, MexT is a LysR-type transcriptional regulator regulating the MexEF-OprN operon, indicating that gene clusters GC_00000078, GC_00000115, and GC_00000605 indeed encode for a MexEF-OprN efflux pump, as AcrAB-TolC is regulated by the transcription factor AcrR, which instead belongs to the TetR family of transcriptional regulators (47).Interestingly, hosts that exhibit downregulated or low change in the expression of denitrification gene clusters were also the hosts with the highest upregula tion of the AcrAB-TolC/MexEF-OprN encoding gene clusters (hosts S. perfectomarina, S. pgs16, and S. pgs17).Fetar et al. (46) have also reported that nitrosative stress in the form of nitric oxide accumulation is a direct inducer of MexEF-oprN expression (46), suggest ing that hosts expressing the efflux pump could be a response to nitric oxide produced as part of denitrification activity.statistic is determined through a permutation method.Colored lines between points have been added to visualize an arbitrary "configuration" formed by each data set, which connects each point in the following arbitrary order "-CHL-PER-DEGR-PGS16-PGS17-STU-." PS analysis comparing inverter performance against significantly differentially expressed (e) core and (f ) accessory gene clusters.P = P-value, m 2 = Gower statistic.

DISCUSSION
This study was driven by the overarching question as to whether closely related bacterial hosts can exhibit a strong chassis-effect when programmed with an identical engineered genetic inverter and how contributions from the core and accessory genomes might underpin this phenomenon.This was investigated using dimension-reduction techni ques combined with multivariate statistical analysis to determine which portions of the differentially expressed genome were concordant with the measured chassis-effect and then further trace these differences into specific genome-encoded functions.In addition to our main finding, we report the successful engineering of several Stutzerimonas species.S. degradans is of interest as a biotechnology host with its known applications in the bioremediation of contaminants (18).S. pgs17 is inferred to possess the nifDKH genes according to KEGG pathway mapping (48) and therefore has potential application as a biological nitrogen fixation agent to replace agrochemical similar to S. stutzeri, the latter having already received much attention for its nitrogen fixation capability (27).
Given the standardized growth conditions on rich LB medium, it is tempting to theorize that accessory and host-specific unique genes would be the leading cause of phenotypic distinction (and resulting chassis-effects), yet our findings empirically showed no such relations between inverter performance and the accessory genome.Instead, we observed concordance between the transcriptional response of shared core genes and inverter device performance.In other words, hosts with more similar gene expression from their core genomes also have more similar performance, suggesting that the expression of the core genome, and, in turn, its protein products have a greater impact on the cellular state.This holds especially true if the accessory genome consists of genes that are not expressed unless conditions call upon their expression, such as for unique biosynthetic gene clusters that may uniquely influence how each host responds to a change in substrate availability or changing growth conditions (49).The core genome of our Stutzerimonas platform is enriched for genes involved in central carbon and energy metabolism as well as housekeeping-associated functions.Combined with the result that observed differential growth physiology between hosts cultured under identical conditions was found significantly correlated with differential perform ance, the chassis-effect may manifest through the redistributive flux of cellular resources and gene expression machinery unique to each host context because of genetic device maintenance and operation.This concurs with previous studies that have demonstrated how resource dynamics such as ribosome abundance (50) can affect genetic circuit performance and even parameterized resource competition to build predictive models (51,52).
Data-driven analyses showed that denitrification genes were responsive to heterolo gous expression of the engineered inverter.This indicates that engineered gene circuits can have unforeseeable and unspecific cross-talk with fundamental cellular processes and that these events are context-specific to individual hosts.If denitrification is in fact occurring, the source of nitrate in the media is unclear and would require a more targeted analysis of proteins and metabolic intermediates involved in denitrification pathways.A possible explanation is that our hosts can perform nitrification of ammonia to produce nitrate, with the source of ammonia being the by-product of amino acid catabolization from the oligopeptides in the LB media.But KEGG pathway mapping reveals that neither of two the genes involved in the canonical nitrification pathway (ammonia monooxygenase, amo, and hydroxylamine reductase, hao) are found in the genomes of our hosts (Fig. S6), suggesting a lack of nitrification capability.However, there have been numerous reports of other Stutzerimonas (reported in the defunct name Pseudomonas stutzeri) capable of converting ammonia to nitrogen gas through heterotrophic nitrification and aerobic denitrification (HNAD) despite their genomes encoding neither amo nor hao (21,42,53,54).These reported HNAD-capable species are thought to perform HNAD through an undocumented pathway, or the enzymes catalyzing the reactions performed by amo and hao are significantly different enough in amino acid composition to escape current annotation algorithms.The same enig matic HNAD process could be occurring for our Stutzerimonas species with increased denitrification activity.Denitrification is a pragmatic phenotype due to its bioremedia tion potential (55), but bacteria capable of simultaneous HNAD are valuable as the cost of maintaining anaerobic conditions is reduced (21).
Given the complex interwoven network structure of cellular metabolism (56), it is impractical to postulate that the observable chassis-effect between a given set of hosts can be explained by a single or even a set of predictable genome-encoded functions without experimental insight.The examples from this study were denitrification and efflux pumps but given that it is specific to the host context, device operations in other species/strains may influence different cellular functions.Hence, deriving a mechanistic metabolic model detailing how these factors interact to bring about the produced output is likely intractable and may not hold true across different sets of hosts.Inci dentally, with the advent of dimension reduction using multivariate statistical models in supervised machine learning, synthetic biologists can advance data-driven engineer ing strategies and bypass dependence on a priori mechanistic insight for biodesign applications (57,58).In contemporary biological research, unveiling a mechanistic model that explains an observation is considered the pinnacle goal, but the goals of synthetic biologists more often prioritize the practical implementation of designed systems rather than knowledge acquisition.Supervised machine learning aligns well with this goal (59).
We have here developed new knowledge of how observable chassis-effects of an engineered microbial platform can be explained by genome-encoded functions that are largely conserved and represent fundamental cellular processes.In addition, we uncovered a context-specific phenomenon that engineered genetic devices have unpredictable interference with metabolic functions that regulate cellular physiology.The evidence for these conclusions is corroborated by changes in species and even strain-specific global transcriptome profiles and growth phenotypes, and the use of multivariate statistics to attribute how quantifiable chassis-effects are concordant with genome structure and function.This represents a genome-informed advancement within the field of broad-host-range biodesign which aims to lessen our reliance on model organisms so that we can better understand diverse microbial behaviors and use them in the blueprints of biodesign.The number of assimilated microbes available for use as industrial biotechnology platforms and biodesign engineering is increasing steadily (10,60,61), heralding the advancement toward broad-host-range synthetic biology where synthetic biologists must explore not only the design space of genetic parts but also the design space of host-chassis to optimize their engineered systems.As the field of biodesign progresses toward this new era, the constraints imposed by the chassis-effect will only become more relevant.It is therefore of high interest to develop novel hosts and data-driven predictive frameworks capable of using genomic insight to understand how chassis-effects might limit or in some cases expedite design-build-test cycles for the future's biotechnology applications such as bioremediation of soil and marine contaminates and production of renewable agrochemicals.

Species, cultivation, cloning, and transformation
An overview of the species used in this study can be found in Table S2.The six Stutzerimonas hosts selected for further study were verified by their rpoD sequence, determined through sanger sequencing (62) (Table S3).Cells were cultured in Lysog eny-Broth (LB) at 35°C unless specified otherwise.BB23 backbone and pS5-carrying strains were cultivated in the presence of 100 µg/mL kanamycin while wild types were grown without.Single colonies from streaked LB agar plates were picked to inoculate liquid media and incubated overnight with shaking to prepare overnight cultures.199 µL of media was inoculated with 1 µL of overnight culture in black clear-bottom 96-well plates (Thermo Fischer, 165305) and sealed with Breath-Easy film (Sigma-Aldrich, Z380059).OD 600 , sfGFP (Ex 485/Em 515, gain 75), and mKate (Ex 585/Em 615, gain 125) fluorescence was measured continuously using a Synergy H1 plate reader (Agilent Biotek, Serial Number 21031715) with continuous linear shaking (1,096 cpm, 1 mm) at 9 mm read height.Working stock solutions of 1 M L-Arabinose (VWR, A11921) and 1 mM aTc (VWR, CAYM10009542) were prepared by dissolving in MilliQ water and 70% ethanol, respectively.Cloning was performed using E. coli DH5α, made chemically competent, and transformed following the Inoue method (63).Stutzerimonas species were transformed via the electroporation method as previously described in Chan et al. (2023) (14).Primers used in this study can be found in Table S4.BB23 backbone was integrated into the BASIC Assembly format using pSEVA231 as a template with primers B_SEVA_F and B_SEVA_R.Plasmid pS5 was assembled in the Biopart Assembly Standard for Idempotent Cloning (BASIC) (64, 65) environment as previously described (14).Sequence and accession of pS5 components can be found in Table S5.

Induction assays
Overnight culture grown in the absence of an inducer was used to inoculate the media with various concentrations of aTc and Ara in 96-well plates.The normalized steady-state fluorescence at the late growth phase (Fss) averaged over a time window of 6-12 hours was used as a response variable of induction curves.In R (v4.3.1),Hill coefficient (n), activation coefficient (K), and max steady-state fluorescence output (β) were estimated by fitting the Hill function (1) using non-linear least-square regression with the "nls" function from the base R stats package.For parameter C, representing basal fluorescence output at 0 inducer concentration, an empirical value was used.

Toggle and growth assay
Overnight culture grown in the absence of inducer was used to inoculate media in 96-well plates supplemented with Ara, aTc, and no inducer condition.To toggle, cells were harvested by centrifugation at 4,000 RPM for 20 minutes at room temperature and the supernatant was removed before resuspending in 200 µL LB media, this washing step was repeated for a total of two washes.After final resuspension, 1 µL of washed cells was inoculated to 199 µL fresh media supplemented with the opposite respective inducer.
The growth difference metric Δµ was calculated using equation 2.
(2) ∆ μ = μ condition_1 − μ condition_2 /μ condition_1 *100 Where µ is max specific growth rate and "condition_1" and "condition_2" denote two different sample conditions describing genotype and induction state.In R, max rates of OD 600 and normalized fluorescence curves were estimated based on a rolling regression method using the "all_easylinear" function from the growthrates (v.0.8.4) R package.Lag times and curve plateaus of OD 600 and normalized fluorescence curves were determined using the "all_growthmodels" function, fitting the Gompertz growth model (66) with an additional lag (λ) parameter.

Phylogenomic tree and pangenome
Genomes of Stutzerimonas hosts were downloaded from NCBI.GToTree (67) was used to infer phylogenomic relationship using the 174 single-copy gene set (Gam maproteobacteria Hidden Markov Model set) under default settings, which infers tree through Maximum-Likelihood using FastTree (68).Comparative pangenomic analysis was performed in the Anvi'o (v7.1) environment.pS5 plasmid nucleotide sequence was manually added to each Stutzerimonas genome file.For complete data on binned gene clusters obtained from comparative pangenomic analysis, see Data and Code Availability.Unless specified, all Anvi'o commands were run using default settings.Briefly, genomes were converted to contigs databases using the "anvi-gen-genomes-storage" command, which uses Prodigal (69) to make gene calls.Gene calls were annotated with the Clusters of Orthologous Genes 2020 (70) database through Anvi'o and the KEGG Orthology (39) database through KofamKOALA.Pangenome analysis was done using "anvi-pan-genome" command with the --mcl-inflation parameter set to 10 for high cluster granularity and DIAMOND was run with the "-sensitive" flag as recommended when comparing genomes from closely related organisms.Gene clusters were binned according to their number of occurrences across the six genomes in the interactive Anvi'o pangenome display command "anvi-display-pan, " from "Bin 1" to "Bin 6. " We define gene clusters as belonging to the core genome if it has hits in all six genomes (Bin 6).All other gene clusters were defined as accessories, with gene clusters with only one hit ("Bin 1") further distinguished as unique.

Total RNA extraction and RNA sequencing
Cultures for total RNA harvesting were initiated as described for toggle assay with induction scheme 0.75 mM Ara and 20 nM aTc to ensure transcriptome profile represen tative of the toggle assay.Cells were harvested at the late exponential growth phase to ensure enough cell mass by centrifugation at 10,000 rpm for 1 minute before discarding the supernatant and immediately freezing in liquid nitrogen.Total RNA was extracted using the Quick-RNA Miniprep Kit (Zymo Research, R1055) following the manufacturer's instructions.The kit includes a cell lysis step and 15-minute on-column DNase I treatment at room temperature.RNA samples in biological triplicates for each group were sent to Eurofins Genomics (INVIEW Transcriptome Bacteria) for quality control, rRNA depletion [NEBNext rRNA Depletion Kit (Bacteria), New England Biolabs], cDNA library preparation, and sequencing (Illumina NovaSeq 6000 S4 PE150 XP).Across all samples, an average of 23.7 ± 3.1 million reads was obtained.

RNA-Seq and differential gene expression analysis
Trimming, mapping, and read counting were done in QIAGEN CLC Genomic Workbench (v22) using the "RNA Seq-Analysis" tool run in default settings (length and similarity fraction = 0.8) to obtain gene expression table files with mapped counts.Annotated reference genomes were made by annotating genome FASTA files using GFF3 files retrieved from Anvi'o contigs databases using the "anvi-get-sequences-for-gene-calls" command with the "--export-gff3" flag.Trimmed reads were mapped to annotated reference genomes.When two reads are equally likely to be mapped to two or more positions, the CLC pipeline randomly maps the read to one of the candidate positions and labels the mapped read as "non-unique." An average of 10.7 ± 1.3 million reads was mapped per sample, of which 9.9 ± 1.6 million reads (93%) were uniquely mapped.In R, pangenome gene cluster metadata were mapped to the gene expression table by matching Prodigal gene call IDs (see function "r2_RNA_merge_pan_and_count").Differential gene expression analysis was performed using DESeq2 (v1.40.2) with default settings with a P-value threshold of 0.05 and Benjamini and Hochberg adjustment (default to DESeq2) to correct for multiple testing.To allow cross-species comparison of DEG profiles, counts mapped to gene calls within the same gene cluster were pooled for each species.

Data and statistical analysis
Hierarchical clustering was done using the "hclust" function from the base R stats package, using Euclidean distance by the "complete" method.Fisher exact test was done using the "fisher.test"function from the stats package, testing for both depletion and enrichment.Principal component analysis and Procrustes Superimposition analysis were done using the Vegan (v.2.6.4) package.The first two principal components for each data set were used for downstream analysis.The m 2 statistic from PS analysis (scale and symmetric set true) was tested for significance by a permutation approach (n = 719, maximum number of iterations).Briefly, observations in one matrix are randomly reordered while maintaining the covariance structure within the matrix and a test statistic is calculated and recorded enough times to obtain a sizeable null distribution.A P-value for each statistic is then calculated, representing the probability of obtaining a statistic with a value equal to or more extreme of the experimental value.

FIG 1
FIG 1 The genetic inverter and pangenome of selected Stutzerimonas hosts.(a) Schematic representation of Ara-aTc genetic inverter design.In the presence of Ara (Ara+), Ara-bound AraC upregulates its cognate promoter (P BAD ), leading to sfGFP and TetR expression and, in turn, creates a distinct measurable fluorescent state and leads to the downregulation of the P Tet promoter.In the absence of Ara (Ara-), AraC functions as a repressor.The presence of aTc leads to mKate and AraC production.The two promoters thereby act antagonistically, where the upregulation of one leads to the downregulation of the other.(b) Inferred phylogenomic tree of the six Stutzerimonas hosts.The scale bar is in units of the number of amino acid substitutions per site between two sequences.(c) Composition of core, accessory, and unique gene clusters from pangenome analysis (Anvi'o) of our six Stutzerimonas species.Bin 6 means all six hosts contribute with at least one gene call to the gene cluster and Bin 5 means any combination of five hosts contribute with at least one gene call and so on.Bin 5 to Bin 1 are grouped as the accessory genome, with gene clusters belonging to Bin 1 further distinguished as unique.Percentages indicate the portion of (Continued on next page)

FIG 1 (
FIG 1 (Continued) the 6,469 gene clusters assigned to the three frequency groups (core, accessory, and unique).(d) Clustered presence/absence matrix of the 6,469 orthologous gene clusters with columns representing gene clusters.The number indicates the number of gene calls for each host.(e) Percentage composition of cluster of orthologous genes (COG) categories of core, accessory, and unique group.(f ) Enrichment analysis of COG categories within each frequency group by Fisher exact test."n/6" indicates the number of hosts in which COG category was found significantly (P-value < 0.05, Bonferroni correction) enriched within the group.Only COG categories enriched in three or more hosts are shown.Gray points represent outliers.COG category description is provided at the bottom of the figure.CHL = Stutzerimonas chloritidismutans NCTC10475; PER = Stutzerimonas perfectomarina CCUG 44592; DEGR = Stutzerimonas degradans FDAARGOS 876; PGS16 = Stutzerimonas pgs16 24a13, PGS17 = Stutzerimonas pgs17 24a75; STU = Stutzerimonas stutzeri DSM 4166.

FIG 2
FIG 2 The chassis-effect is observed through the measurable performance of the genetic inverter between closely related Stutzerimonas hosts.(a) aTc induction curves, the left-most plot shows all induction curves overlaid up to a given inducer concentration.All induction curves to the right show individual curves with scaled axes.Hosts are color-coded, error bars indicate standard error of the mean, n = 8.(b) Estimated Hill parameters from aTc induction curves.Color scale relative to each column.(c) Ara induction curves and (d) estimated Hill parameters.(e) OD 600 normalized fluorescence dynamics of one of three toggle assays with induction scheme 0.75 mM Ara and 20 nM aTc.Initial OFF cells were diluted to respective induction states.(f ) Estimated fluorescence metrics from fluorescence in (e) across induction state and fluorescence output type.(g) Fluorescence dynamics of toggled cells diluted to respective opposite inducer and (h) estimated fluorescence metrics.DR = dynamic range; NI = no induction; Fss = late phase steady-state fluorescence; Rate = max specific rate; DRs = specific dynamic range.

FIG 3
FIG 3 The growth dynamics are uniquely affected as a result of the host-specific operation of the genetic inverter.(a) Growth curves of initial OFF cells diluted to respective induction states.Hosts are color-coded.(b) Estimated growth metrics for each host from growth curves in (a).The color scale is relative to each column, as for all subsequent subpanels.(c) Growth curves of toggled cells diluted to respective opposite inducer to toggle induction state and (d) corresponding estimated growth metrics.(e) Boxplots of growth difference between host genotypes and/or induction state captured by the Δµ metric, defined as a relative percentage change in growth rate.Whiskers indicate minimum and maximum values.(f ) Table overview of metrics from (e).WT = wild type, BB23 = pBBR1, and KanR cloning vector.pS5 NI = pS5 plasmid in the absence of an inducer.pS5 Ara = pS5 plasmid in the presence of Ara (0.75 mM), pS5 aTc = pS5 plasmid in the presence of aTc (20 nM).NI = no induction; µ = max specific growth rate; A = carrying capacity; λ = lag time.

FIG 4
FIG4 The chassis-effect is measurable in the differential response of genes designed into the engineered genetic inverter.Log2 fold-change values of the six genes encoded in the pS5 plasmid between hosts comparing Ara against aTc-induced cells.An empty position indicates a non-significant differential expression.

FIG 6
FIG 6 Procrustes analysis reveals significant concordance between similarity in inverter performance and similarity in core genome response between hosts.(a) Composition of each host's genome in terms of core, accessory, and unique gene clusters.(b) Composition of significantly differentially expressed gene clusters in terms of core, accessory, and unique gene clusters.(c) Enrichment analysis results by Fisher exact test, testing for depletion (underrepresentation) and enrichment (overrepresentation) for the three frequency groups for each host.(d) Procrustes superimposition analysis comparing hosts in terms of all significant differential gene expression responses against inverter device performance metrics.For the accessory genome, unique gene clusters were omitted to reduce artificial inflation of distance.The key steps in PS analysis are schematically illustrated.Performance metric data set and differential gene expression data set are first projected onto ordinate space via PCA, then the configurations are compared through PS analysis, which involves centering, scaling, and transforming the two projections to minimize the sum of squared vector residuals (the m 2 statistic) between each respective point (host).The significance of the obtained (Continued on next page)

FIG 7
FIG 7 Gene clusters containing the most highly differentially expressed genes between hosts.Spider plots showing log2 fold-change data of most highly differentially expressed gene clusters between the hosts.Gene names in parentheses are names provided by KO annotation, all other gene names are provided by the annotated COG accession.In cases where the gene name provided by COG and KO match, only one gene name is shown.