Machine learning predicts system-wide metabolic flux control in cyanobacteria

Metabolic fluxes and their control mechanisms are fundamental in cellular metabolism, offering insights for the study of biological systems and biotechnological applications. However, quantitative and predictive understanding of controlling biochemical reactions in microbial cell factories, especially at the system level, is limited. In this work, we present ARCTICA, a computational framework that integrates constraint-based modelling with machine learning tools to address this challenge. Using the model cyanobacterium Synechocystis sp. PCC 6803 as chassis, we demonstrate that ARCTICA effectively simulates global-scale metabolic flux control. Key findings are that (i) the photosynthetic bioproduction is mainly governed by enzymes from the Calvin–Benson–Bassham (CBB) cycle, rather than by those involve in the biosynthesis of the end-product, (ii) the catalytic capacity of the CBB cycle limits the photosynthetic activity and downstream pathways and (iii) ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) is a major, but not the most, limiting step in the CBB cycle. Predicted metabolic reactions qualitatively align with prior experimental observations, validating our modelling approach. ARCTICA serves as a valuable pipeline for understanding cellular physiology and predicting rate-limiting steps in genome-scale metabolic networks, providing guidance for bioengineering of cyanobacteria. Highlights A workflow for flux control analysis in Synechocystis sp. PCC 6803. Machine learning with features derived from genome-scale metabolic modelling. Identification of potential key reactions for metabolic adaptations and cell bioengineering.


Introduction
Photoautotrophic microorganisms, such as cyanobacteria, are regarded as promising platforms for the sustainable manufacturing of various chemicals, based on their inherent ability to absorb light and fix CO 2 (Kukil and Lindberg, 2022;Liu et al., 2019;Mustila et al., 2021;Rodrigues and Lindberg, 2021).Yet, the light-to-product conversion efficiency has be to be improved in these cell factories in order to fully exploit their potential.
Several efforts to identify limiting steps in cellular metabolism have been recorded.Main approaches being employed consist of 13 C-based metabolic flux analysis (MFA) (Sauer, 2004) or metabolic control analysis (MCA) using kinetic models (Heinrich and Rapoport, 1973;Kacser and Burns, 1973).Fluxomic studies assisted in the improvement in isobutyraldehyde, 2,3-butanediol and sucrose production rate in Synechococcus elongatus PCC 7942 (Jazmin et al., 2017;Kanno et al., 2017;Wang et al., 2023).Kinetic models have also been used to understand cyanobacterial metabolism (Jablonsky et al., 2013(Jablonsky et al., , 2016;;Janasch et al., 2018), and to identify overexpression targets for enhanced production of limonene (Wang et al., 2016) and ethanol (Nishiguchi et al., 2019).While MFA offers precise and detailed quantification of the intracellular fluxes in the living cell, the high experimental demand of this technique hinders the wide application for systematic metabolic engineering analyses (Sauer, 2004).The construction of kinetic models is challenging due to the required knowledge on enzyme kinetic parameter values (e.g., Michaelis constants and turnover numbers), which are scarce (Davidi and Milo, 2017;Islam et al., 2021;Nilsson et al., 2017), especially for cyanobacteria (Chang et al., 2021;Wittig et al., 2018).
In contrast, constraint-based analysis (CBA), such as flux balance analysis (Orth et al., 2010), uses linear programming, which allows for metabolic modelling at the genome scale, by setting reaction constraints and an objective function.Thereafter, in silico flux distributions through the biochemical network can be calculated upon optimization of a chosen objective (e.g., maximal growth rate).Since the number of reactions in the metabolic network exceeds the number of metabolites, the system is said to be underdetermined.On the one hand, this could lead to overprediction of the metabolic capacity, for which the solution space can be minimized by integrating quantitative omics data into the reconstructed model (Reed, 2012).On the other hand, this inherent property provides an opportunity to study the robustness of the metabolic reactions within the network, which can be realized through, for example, protein and metabolic engineering (Greenhalgh et al., 2021).
Earlier studies have proposed CBA-based methods for selecting targets for cellular engineering (Choi et al., 2010;Ranganathan et al., 2010;Tsouka et al., 2021).These analyses maximize a singular objective function and are thus biased towards the investigated bioproduct.In reality, however, the cell has multiple objectives to satisfy.An alternate approach to characterize the metabolic network is using Monte-Carlo flux sampling, which generates an unbiased probability of the steady-state flux distributions, affected solely by the network topology and the imposed constraints (Schellenberger and Palsson, 2009;Wiback et al., 2004).Such technique was proven useful for revealing transcriptional regulation in key enzymes (Bordel et al., 2010), differentiating between strains producing specific compounds (Scott et al., 2021), as well as for deciphering the effect of environmental conditions on the flux distributions (Herrmann et al., 2019;Pearcy et al., 2022;Wanichthanarak et al., 2020).
In recent years, an integrative approach, combining machine learning and CBA, has become a key element in addressing biological questions.Results derived from CBA were processed by machine learning models, which improved the interpretability of the obtained findings.Examples include the classification of metabolic fluxes based on drug side effects (Shaked et al., 2016), as well as identification of growth-limiting factors from computed flux distributions (Vijayakumar et al., 2020).Alternatively, machine learning algorithms were used to improve the predictive capability of CBA, by integrating enzyme biochemical characteristics (Heckmann et al., 2018(Heckmann et al., , 2020) ) or other experimental data (Zhang et al., 2020).
Here, we present ARCTICA (mAchine leaRning for Constrainedbased meTabolIc Control Analysis) to identify key controlling reactions that influence bioproduction at the genome-scale, based on the changes in internal fluxes.Using the model cyanobacterium Synechocystis sp.PCC 6803 (hereafter Synechocystis) as chassis, we illustrate the utility of ARCTICA for assessing the metabolism of cyanobacteria producing various food ingredients and fuel precursors.

Metabolic network reconstruction
The iJN678_AK genome-scale metabolic model of Synechocystis was used for the computational analysis (Kugler and Stensjö, 2023).For the simulation of recombinant Synechocystis producing metabolites of interest, the metabolic reconstruction was expanded with unidirectional pseudo-reactions allowing product export of the respective compounds (Supplementary Tables 1-3).For lauric acid production, a heterologous thioesterase reaction (FatB1) was added, allowing for the conversion of lauryl-ACP to lauric acid (Supplementary Table 1).

Genome-scale metabolic modelling
To evaluate the maximum production rate of the metabolite of interest, flux balance analysis (Orth et al., 2010) was employed (Eqs (1)-( 3)).To estimate the flux solution space, Monte-Carlo flux sampling (Schellenberger and Palsson, 2009;Wiback et al., 2004) was carried out with the optGpSampler method (Megchelenbrink et al., 2014) implemented in COBRApy, using 100,000 sample points of randomly distributed fluxes and a thinning factor of 100.Cellular metabolism was simulated by constraining the carbon dioxide, bicarbonate and glucose uptake rates to 0, 3.7 and 0 mmol/gDW/h, respectively.In addition, the minimum flux through the biomass objective function was set to 10% of the predicted maximum growth rate (Kugler and Stensjö, 2023).In order to simulate cell phenotypes harboring low metabolite efflux and high metabolite efflux, the reaction rate for metabolite excretion was set to 10% and 90% of the predicted maximum production rate, respectively, prior to sampling.Then, to capture the metabolic mechanisms underlying responses to genetic perturbations, the obtained reaction rates were normalized to the export rate of the end-product, providing percentage of flux values in the metabolic pathways.
Where Z is the objective function, v is the flux vector of the metabolic reactions, c T is the transposed vector of the objective coefficient.S ij refers to the stoichiometric coefficient of metabolite i participating in reaction j, and the v j refers to the vector of reaction flux (mmol/gDW/h; gDW, gram dry weight) of reaction j at steady state.The flux v j is the j-th component of an n-dimensional flux vector v, where n is the total number of fluxes.LB and UB correspond to the lower bound and upper bound, respectively, of the j-th reaction in the flux vector v.

Machine learning and feature selection
The multiple datasets generated by flux sampling were concatenated into a single comprehensive data matrix, to which a machine learning algorithm was applied.To differentiate between the two virtual phenotypes (10% and 90% of maximal production rate), we used logistic regression model with an L1 penalty (Least Absolute Shrinkage and Selection Operator; LASSO) (Tibshirani, 1996) (Eqs (4)-( 7)).The metabolic reactions were ranked according to their model coefficients, also referred to as feature importance scores.The model coefficient is an estimated parameter associated with each predictor variable (input feature), reflecting the extent to which the different reactions in the network contribute to the phenotype.A greater importance score implies that the particular input feature exerts a more significant influence on the model utilized for predicting a specific variable.Thereafter, a metabolic reaction that is estimated to exert a larger effect for differentiating between the virtual phenotypes is associated with a high importance score.
To ensure model generalization, we randomly split the samples dataset into train and test subsets, comprising 67% and 33% of the main dataset, respectively, with ten-fold cross-validation.The predictive accuracy of the machine learning model was evaluated by assessing the accuracy scores and the classification report.For clarity, exchange and transport reactions (either naturally occurring or necessary for the modelling) were omitted from the analysis.For the list of metabolic reactions, their full name and abbreviation, refer to the GitHub repository https://github.com/amitkugler/ARCTICA.

P(y i |x
A. Kugler and K. Stensjö Where P(y i ⃒ ⃒ x i , β ) is the probability of observing the binary outcome y i , given the predictor variable (input feature) x i and the corresponding model coefficient β.L(β) is the likelihood function, l(β ) is the loglikelihood function, and J(β ) is the objective function, while applying LASSO regularization to encourage sparsity in the coefficients.i and j are used as indices to represent specific observation and predictor variable, respectively, whereas n denotes the total number of observations, and p denotes the total number of predictors.Upon minimization of the objective function J(β ), the model coefficient β is being estimated.

Construction of ARCTICA for predicting important metabolic reactions
The ideation of ARCTICA is based on findings from previous studies, showing that the rate of the intracellular reactions is a function of the maximal flux values of the constraining reactions in the network (Bordel and Nielsen, 2010).Another research revealed that the crowded environment in the cell influences the Michaelis-Menten kinetics (Weilandt and Hatzimanikatis, 2019), inferring a dynamic behavior of the metabolic capacity, rather than a single enzymatic state.
Here, to uncover the key steps that govern pathway flux towards enhanced production rate of target molecules, we developed a computational framework called ARCTICA (Fig. 1).In its essence, ARCTICA conducts a probability-weighted analysis of metabolic reactions with varying rates, encompassing all feasible fluxes in the solution space.Leveraging the flexible system of flux balance analysis (Orth et al., 2010), we generated two virtual phenotypes, representing cyanobacterial strains with low and high production capabilities.By comparing the resulting flux distributions under constraints on maximal production rates, we aimed to understand how alterations in the internal reaction rates influence the overall flux of other reactions in the network.This analysis is crucial for evaluating metabolic control, as metabolic fluxes are the ultimate outcome of enzyme catalysis, forming the basis for cellular operation (Nielsen, 2003;Sauer, 2006).
ARCTICA consists of three steps: (i) Implementation of flux balance analysis to compute the maximal production rate of the metabolite of interest.Then, the excretion rate of the selected metabolite is constrained to 10% and 90% of the predicted maximal production rate to yield two metabolic reconstructions, representing high-producing and low-producing strains (virtual phenotypes).(ii) Application of Monte Carlo random flux sampling on the two newly-generated metabolic reconstructions to compute the probability of all feasible fluxes that satisfy the network constraints.(iii) Generation of a comprehensive data matrix by concatenating the obtained datasets from step ii to be employed as input features (metabolic fluxes) for training the machine learning model.Thereafter, using logistic regression for feature selection and extracting feature importance score, metabolic reactions that statistically influence and distinguish the two virtual phenotypes are mined and ranked.
For evaluating the applicability of ARCTICA, we examined the network reconstruction of the model cyanobacterium Synechocystis, and chose three metabolites which belong to different families (according to their chemical characteristics) and are valuable in the food and fuel industries: Lauric acid (fatty acids), squalene (terpenoids) and L-leucine (branched-chain amino acids) (Fig. 2).The biosynthesis pathways of lauric acid and squalene have been previously studied (Englund et al., 2014;Yunus et al., 2020), whereas the biosynthesis of L-leucine has not been explored yet in cyanobacteria.In addition, to evaluate a whole-cell product, we examined flux control where biomass was chosen as the objective of analysis.

Assessment of ARCTICA performance for discriminating phenotypes
To assess model generalizability and the ability of ARCTICA to classify phenotypes [low-producing strain (10% of maximal production rate) and high-producing strain (90% of maximal production rate)], we examined the predictive metrices.The testing accuracy was 0.99 (Supplementary Table 4) and the F1-score values were 1 (Supplementary Table 5), indicating a high predictive performance of the machine learning model for the applied dataset (in silico fluxomic data).The performance of the machine learning model was visualized using the confusion matrix (Supplementary Fig. 1), where less than 0.1 percent of the trained data was identified as false positive.

Identification of controlling catalytic steps in the metabolic network
To qualitatively evaluate the potential effect of specific reaction rates on the phenotypic outcome, we assessed the resulting feature importance score (model coefficient) for each metabolic reaction, obtained by ARCTICA (Fig. 3).The analysis shows that all the investigated in silico production strains were mostly limited by the metabolic capacity of the Calvin-Benson-Bassham (CBB) cycle [ribulose-1,5-bisphosphate Fig. 1.Flowchart of the computational framework (ARCTICA) proposed in this study for predicting metabolic flux control.The input for ARCTICA is the light and carbon uptake rates.Upon computing maximal production capabilities and constraining the metabolic model (step 1), Monte Carlo flux sampling is employed on each of the generated metabolic reconstructions to analyze the probable metabolic fluxes (visualized as histograms) (step 2).Subsequently, feature selection of metabolic reactions is done using fluxes as input data (step 3).The output of ARCTICA is a shortlist of candidate targets for genetic manipulation.carboxylase/oxygenase (RBPC), phosphoribulokinase (PRUK), phosphoglycerate kinase (PGK)] and the photosynthetic machinery [Photosystem I ferrocytochrome c oxidation (PSI_2), Photosystem II (PSII) and ferredoxin-NADP + reductase (FNOR)] (Figs.3-7).ARCTICA showed that the enzyme with the highest model coefficient, and thus has the most significant influence on the desired pathway flux (phenotype), was PGK.In addition, reactions within the product-forming pathways were among the highest influencers of the optimal solution (Figs. 2 and 3).Further, reactions within nitrogen metabolism, mainly N-acetyl-γ-glutamylphosphate reductase (AGPR) and N-acetylornithine transaminase (ACOTA), were identified as important factors for the high-rate formation of the various examined bioproducts.
ipdp)].For L-leucine biosynthesis, in addition to the capacity of the CBB cycle, ARCTICA found ENO and RPI as top flux-controlling reactions, followed by enzymes within the L-leucine biosynthesis pathway and those involved in nitrogen metabolism (Fig. 3C and 6).Further, we were interested in identifying reactions that determine microbial growth (Fig. 3D and 7).In a similar manner to evaluating production of specific metabolites, we constrained the biomass pseudo-reaction to 10% and 90% of maximal growth rate and examined crucial reactions for biomass accumulation.ARCTICA found that, following PGK and RBPC, photosystem I activity (PSI_2) exerted the most control over biomass formation.A medium-scale control was found for PRUK and ENO, and a weaker control was simulated for SBP, transaldolase (TALA), fructosebisphosphate aldolase (FBA) and RPI (Figs. 3 and 7).

Discussion
In this study, we illustrated that combining high-throughput fluxomic data with machine learning enables the determination of flux control distribution, taking into account the modular enzymatic capacity within the cell.Thereby, ARCTICA complements previous machine learning-based studies on enzyme parameters (Heckmann et al., 2018;Kroll et al., 2021Kroll et al., , 2023;;Li et al., 2022;Wendering et al., 2023) and extends the traditional MCA (Heinrich and Rapoport, 1973;Kacser and Burns, 1973).Whereas MCA represents a helpful system to understand flux control along the biopathway, the results obtained from kinetic modelling are difficult to translate into actionable system-level metabolic engineering strategies.Therefore, as compared to earlier studies, focusing on the parameterization and the selection of kinetic models (Choudhury et al., 2022;Gopalakrishnan et al., 2020;Matos et al., 2022;Miskovic and Hatzimanikatis, 2010), ARCTICA relies solely on computer-based metabolic flux data.This approach offers a scalable, model-to-lab framework for investigating system-level flux control and identifying target enzymes to modulate.
Considering model coefficients as indicators for important reactions, we found that, under the conditions examined in this study, our results are in a good agreement with published observations.In order to validate the obtained predictions, we explored literature discussing kinetic measurements and bioengineering.As investigations on Synechocystis are scarce, we broadened our comparison to studies performed on other model biological systems, such as Arabidopsis thaliana, were applicable.
For example, for all the modelled products, enzymes within the CBB cycle exerted the highest flux control, more than the photosystems.In previous studies, the capacity of carbon sinks downstream to the photosynthetic machinery, e.g.CBB cycle, appeared to influence the photosynthetic electron transport rates in cyanobacteria and in plants (Raines, 2022;Santos-Merino et al., 2021;Stitt and Schulze, 1994;Zorz et al., 2015).Further, ensemble kinetic models identified that phosphoglycerate kinase (PGK) exerts a significant control over RuBisCo flux in Synechocystis (Janasch et al., 2018;Nishiguchi et al., 2019).Our results are also consistent with previous findings on Synechocystis, showing that the gluconeogenic glyceraldehyde-3-phosphate dehydrogenase (GAPDi_nadp) and enolase (ENO) possess a weaker control over the fluxes through the system (Janasch et al., 2018).
ARCTICA additionally revealed bottlenecks in the early stages of fatty acid synthesis, with acetyl-CoA carboxylase (ACCOAC) and malonyl-CoA-ACP transacylase (MCOATA) being contributing the most.While ACCOAC is often considered the key regulator of fatty acid biosynthesis (Page et al., 1994), its overproduction resulted in a disproportionate increase in the malonyl-CoA pool, compared to the actual fatty acid synthesis rate (Davis et al., 2000).This suggests that downstream steps, particularly the catalytic rate of MCOATA, impose limitations on the fatty acid pathway flux.Studies also indicated that overexpression of the gene encoding MCOATA led to significant fatty acid accumulation in various microalgae species (Chen et al., 2017;Lei et al., 2012;Li et al., 2018), with a strong correlation between MCOATA transcript abundance and fatty acid synthesis levels under stress induction (Tian et al., 2013).Furthermore, ARCTICA identified enoyl-ACP reductase (e.g., EAR100y) and beta-keto-ACP reductase (e.g., 3OAR120) as additional flux-controlling enzymes for lauric acid production, though at a lesser extent than ACCOAC and MCOATA.Earlier independent studies have highlighted a role for both enzymes in the regulation of fatty acid biosynthesis in E. coli (Heath andRock, 1995, 1996;Yu et al., 2011).Also, the promiscuous phosphoketolase (PKETX and PKETF) emerged as a stimulating player, providing an alternative and efficient pathway for acetyl-CoA supply in the cell (Meile et al., 2001;Xiong et al., 2016).In S. elongatus PCC 7942, enhancing the intracellular acetyl-CoA through a heterologous phosphoketolase pathway boosted fatty acid ethyl esters production (Lee et al., 2017).In E. coli, a substantial increase in fatty acid production was achieved by overproducing ACCOAC, MCOATA, PGK, and acetyl-CoA-forming enzymes (Xu et al., 2013).Given that the type II fatty acid synthesis (FAS II) pathway in Synechocystis is responsible for phospholipid synthesis, similarly to that in E. coli (Heath et al., 2002), our findings emphasize the pivotal role of acetyl-CoA carboxylase and malonyl-CoA-ACP transacylase, along with phosphoketolase, in determining the rate of fatty acid production in Synechocystis.
balancing the MEP pathway flux in Synechocystis.
We additionally examined reactions that determine cellular growth.ARCTICA unveiled both well-known and novel flux control checkpoints for biomass production.We focused on enzymes within the CBB cycle, an extensively studied biochemical system in photosynthesis research, and identified RBPC, SBP and FBA as key controllers on the overall network flux.These results align with existing experimental and modelling data on cyanobacteria and plants (Janasch et al., 2018;Johnson and Berry, 2021;Poolman et al., 2000;Raines et al., 2008;Zhu et al., 2007).SBP, FBA and RPI contribute to ribulose 1,5-bisphosphate (RuBP) regeneration, crucial for the operation of the CBB cycle and the synthesis of biomass building blocks, such as purine nucleotides.SBP and FBA jointly control the photosynthetic carbon assimilation with RuBisCO (Raines, 2003), and overexpression of the genes encoding the identified reactions was shown to influence cyanobacterial growth (Liang and Lindblad, 2016).In this study, RPI emerged as a significant control point, and we propose this enzyme as a potential flux-controlling factor for enhancing bioproduction in Synechocystis.
We also noted that enhanced flow rate through PSI (ferrocytochrome c oxidation) could be beneficial for improved growth.Previous comparative analyses on model cyanobacterial strains (Synechococcus and Synechocystis) (Jahn et al., 2018;Ungerer et al., 2018) revealed that a fast growth trait is attributable to a higher abundance of PSI and electron carriers per cell, which could allow for a higher flux of electrons through the photosynthetic electron transport chain.Here, we corroborate these results, and provide a mechanistic insight into the cause-and-effect relationship between an increased PSI capacity and cellular growth of Synechocystis.
The obtained results confirmed the validity of the proposed computational framework for the identification of flux-controlling enzymes, and encouraged us to evaluate reactions limiting L-leucine production, an unexplored biopathway in cyanobacteria.According to the  (King et al., 2016).For the list of metabolic reactions, their full name and abbreviation, refer to the GitHub repository https://github.com/amitkugler/ARCTICA.The map was generated with Escher web-tool (King et al., 2015).obtained results, L-leucine synthesis rate is partly controlled by the metabolic capacity of acetate kinase (ACKr).ACKr cooperates with either phosphotransacetylase (PTAr) or acetyl-CoA synthetase (ACS) to produce acetyl-CoA, essential for activating 2-isopropylmalate synthase (IPPS) involved in the L-leucine biosynthesis pathway.This observation is consistent with the fact that acetyl-CoA turnover dictates L-leucine synthesis (Dempo et al., 2014).Yet, it was rather surprising to discover that the ketol-acid reductoisomerase (KARA1), converting 2-acetolactate into 2,3-dihydroxy-3-methylbutanoate, exerts low control over L-leucine synthesis rate, despite its role in diverting flux from the central carbon metabolism to the branched-chain amino acid biosynthesis pathway.
This study also revealed the vital role of N-acetyl-γ-glutamyl-phosphate reductase (AGPR) and N-acetylornithine transaminase (ACOTA) in optimizing Synechocystis metabolism.In cyanobacteria, the ornithineammonia cycle, wherein AGPR and ACOTA participate, acts to facilitate the adaption to oscillations in environmental nitrogen availability (Zhang et al., 2018).Our analysis further implicates that these enzymes are effective in increasing the growth and production rate of metabolites of interest, probably due to the interlinkage with the metabolism of nitrogen-rich amino acids (L-glutamate, L-aspartate, L-proline and L-arginine metabolism).This could enable an efficient flux redistribution of carbon skeleton and nitrogen resources for the synthesis of diverse cellular components.
We would like to highlight few points that merit further attention.(1) ARCTICA indicated varying importance levels for predicted reactions, emphasizing the need for a synergistic action of multiple enzymes, rather than the reliance on overproducing individual enzymes, as earlier exemplified (Niederberger et al., 1992;Schaaff et al., 1989).( 2) The control over a metabolic flux is influenced by external conditions (Walsh and Koshland, 1985;Stitt and Schulze, 1994;Nies et al., 2023), implying that the identified rate-limiting enzymes in the current study may vary under the different experimental setup.(3) Sampling of unconstrained flux space could lead to an overestimation of flux capacity for key enzymes.To enhance the predictive accuracy of the machine learning models, ARCTICA could be adjusted by integrating omics data, such as in vivo kinetic parameters (Davidi et al., 2016;Heckmann et al., 2018), metabolic fluxes (Hendry et al., 2019) or protein costs (Sánchez et al., 2017).(King et al., 2016).For the list of metabolic reactions, their full name and abbreviation, refer to the GitHub repository https://github.com/amitkugler/ARCTICA.The map was generated with Escher web-tool (King et al., 2015).

Conclusions
We devised a strategy for the identification of controlling biochemical reactions associated with cell phenotypes, through a statistical evaluation of metabolic features.While the proposed workflow is computationally intensive, the accessibility of high-performance computers makes this pipeline viable.By considering alternative steady states of intracellular fluxes, we illustrate how cyanobacterial metabolism responds to genetic perturbations.The proof-of-concept application of ARCTICA demonstrates its capacity to learn mechanistic relations in order to automatically target specific metabolic pathways or enzymes for further functional analysis and strain design.Due to the flexibility of the presented framework, ARCTICA opens the venue for addressing scientific questions in cyanobacterial physiology that are impossible or laborious to test experimentally.We envision that this open-source framework will accelerate the design-build-test-learn

Declaration of competing interest
The authors declare that they have no competing interest.(King et al., 2016).For the list of metabolic reactions, their full name and abbreviation, refer to the GitHub repository https://github.com/amitkugler/ARCTICA.The map was generated with Escher web-tool (King et al., 2015).

Fig. 2 .
Fig. 2. Biosynthesis pathways modelled by the ARCTICA framework for the identification of flux control points.Metabolic reactions and metabolites, except heterologous ones (FatB1), are indicated by their BiGG identifier(King et al., 2016).For the list of metabolic reactions, their full name and abbreviation, refer to the GitHub repository https://github.com/amitkugler/ARCTICA.The map was generated with Escher web-tool(King et al., 2015).

Fig. 3 .
Fig. 3. Distribution of logistic regression coefficients (model coefficients) as feature importance scores for the top-20 contributing reactions identified by ARCTICA.A) Lauric acid production.B) Squalene production.C) L-leucine production.D) Biomass production.For the full list of metabolic reactions, their corresponding abbreviation and feature importance scores, refer to the GitHub repository https://github.com/amitkugler/ARCTICA.