A Novel Method for the Integration of Stochastic Petri Net Simulation and Transcriptomic Data Applied to a Metabolic Pathway

Over the years, methods capable of integrating data from omics, such as transcriptomics, proteomics and metabolomics have emerged in Systems Biology, principally the use of networks to integrate omics information. In particular, the role of each biological pathway aims to understand the intermolecular interactions. While there are theoretical and experimental strategies to investigate biological pathways involved in cellular metabolism, computational modeling methods allow for a better understanding of them. Here we propose a new method to connect transcriptomic data with simulation approach using stochastic Petri Net (PN) metabolic networks. This new approach was developed based on well-studied theoretical gene expression modeling while trying to assimilate dynamic ordinary systems to a stochastic model function. The developed method was used to perform stochastic PN simulations of ethanol fermentation by Saccharomyces cerevisiae considering glucose and xylose as carbon sources. Lastly, we developed the PeNTIOS software, which is capable of converting Saccharomyces cerevisiae metabolic pathways and transcriptomic data into SBML format. The generated files can be readily imported into PN simulation programs. Our results show that the reconstruction of stochastic PN systems with transcriptomics data is a promising method that can generate new insights about biological experiments, as shown through our case study with the xylose-fermenting yeast. A Novel Method for the Integration of Stochastic Petri Net Simulation and Transcriptomic Data Applied to a Metabolic Pathway


Introduction
The large-scale generation of biological information on metabolism and gene regulation led to the development of new bioinformatics methods based on mathematical modeling which, coupled with multiomics approaches, have enabled a deeper understanding of cellular systems. In this context, systems biology emerges as a field of study capable of integrating omics data and metabolic networks to extract information about biological responses and phenotypes.
The standard analysis in metabolic network simulation is Flux Balance Analysis (FBA) [1], an approach that studies metabolism through the reconstruction of metabolic pathways at the genomic scale and the integration of omics data [2]. This mathematical representation uses linear programming to solve reaction equations and to calculate possible flows using a metabolic model. In order to apply this methodology, one must have a completely validated model supported by experimental data, such as fermentation profiles and cellular growth. Moreover, transcriptomic and metabolomic data can be used as additional constraints to the model and to give support for the evaluation step.
Many other methodologies have been used for modeling complex biological systems. Examples include the ordinary differential equations (DOE), Boolean networks and Bayesian models. The Petri net (PN), introduced by Carl Petri in early 1960s [3], emerges as a potential mathematical approach for modeling biological systems [4] and metabolic networks [5]. There are various Petri Net models, such classical Petri Net, Continuous Petri Net, Colored Petri Net (CPT) and Stochastic Petri Net (SPN). SPN preserves the conditions of qualitative networks with the addition of an exponential probability distribution (delay function) to each transition trigger action. The modeling of stochastic networks can be generated by adding the transition probability estimated from previously available data. Considering the complexity of biological systems, the use of stochastic modeling represents an improvement over FBA, but it also brings the challenge of choosing the appropriate delay function.
In the present work we describe a new method that uses transcriptomic data to estimate the delay function in SPN and applies it to metabolic network simulations. Saccharomyces cerevisiae considering glucose and xylose as carbon sources, a case study for both the first and the second-generation ethanol production processes [6]. Lastly, we developed the PeNTIOS software, which is capable of converting Saccharomyces cerevisiae metabolic pathways and transcriptomic data into SBML format. The generated files can be readily imported into PN simulation programs.

Methods and Implementation
In a metabolic PN model, the network representation consists of states, transitions and tokens, normally represented by circles, rectangles and black dots, respectively. State S represents the substance (metabolites, cofactors, etc) and transition T represents the biochemical reaction, where the token-pass is activated only by the transition T, since the state immediately above contains a token. Edges connect a state to transition and represent the flux of the reaction promoted by token passing [7]. PN always reach the next state, unless if there is not any token to fire, resulting in the end of simulation. Other advantage in Petri Net simulation is the use of logical nodes. This type of notation is useful to represent multiple nodes in network that cause the same trigger during simulation. SPN can be expressed as a stochastic probability function F(t) of a PN transition using an exponential delay function given by F(t)=1e -λt , which has an expected distribution value of 1 λ . In terms of PN transition, different values of λ change the probability function ( Figure  1A) and generate different final states to the network simulations. Thus, the correct choice of λ values for each transition is an essential step for simulations to converge the results into a real biological system. Thus, our method is based on a gene expression model [8], in which the mRNA is produced from DNA at a rate k m and degraded at a rate γ m , as shown in equation 1: In a biological system, considering that at any given moment the metabolic flux of the cells is constant as a function of time (called steady state), we can assume that the transcription and degradation rates, and its respective ratio m m k y , will also be constant in the steady state. In this condition, we can consider that the ratio m m k y can be approximated by mRNA expression data obtained from transcriptome analysis, such as RPKM, FPKM, TPM or normalized microarray [9].
In this context, our method proposes a positive correlation between λ values and normalized gene expression data obtained from transcriptome analysis. This insight was achieved by comparing the graph of the delay function over the time ( Figure 1A) and the graph of the theoretical deterministic model of inverse mRNA production over time ( Figure 1B).
Based on the comparison of the functions steady state, we associated the expected value 1/λ of delay function with the plateau of the inverse of the equation (2)   Although this new method can be applied to any organism, we focused our analyses on the metabolic pathways of Saccharomyces cerevisiae, with a case study of the consumption of glucose and xylose and their conversion into ethanol. For this purpose, we implemented the PeNTIOS software (https://github.com/lmigueel/PeNTIOS) in Perl language to automatically generate a SBML file for the PN network simulations, including the calculation of the λ value extracted from transcriptomic data. PeNTIOS requires as inputs a (1) pathway name from the Saccharomyces Genome Database (SGD) (example: glycolysis)  The simulation initializes at β-D-glucose (red) and produces ethanol (green) and glycerol (blue). The logical nodes (grey) and the blue path (two reactions) were implemented manually to allow for glycerol production by the model. and (2) a tab-separated text file containing names and expression data (RPKM, FPKM, TPM, CPM or normalized microarray) for all genes in the condition of interest. PeNTIOS performs the download of metabolic pathways from SGD (HTML format) and makes a conversion into SBML format parsing the reactions. To attribute the correct gene name for each reaction ID, each reaction is downloaded from the HTML page using a reaction ID and parsed to recover the gene name. After that, gene expression data from each gene identified in metabolic pathway is extracted from transcriptomic data, normalized by median and converted to SBML math format for simulation ( Figure  2). Furthermore, all the simulation can be performed using Snoopy2 v1.21 [10] in Windows/Linux.

Example of Usage
In order to validate our method, we focused on the analysis of ethanol production by Saccharomyces cerevisiae having glucose and xylose as carbon sources. Although yeast glucose uptake and its conversion into ethanol has been well studied through several experimental techniques and models [11,12], xylose fermentation remains an unresolved problem that requires the development of new genetically modified strains and presents metabolism bottlenecks. Thus, several strategies have been employed to study this system, including transcriptomic and metabolomic approaches [12,13].

Ethanol production from glucose
PeNTIOS was used to convert the super pathway of glucose fermentation (glycolysis, glycerol biosynthesis and alcoholic fermentation pathways) and a publicly available transcriptome dataset (FPKM values) [11] into a SBML file containing all metabolic reactions, cofactors and λ value calculated by rescaled FPKM values. After the PeNTIOS conversion, all cofactors nodes, such as ATP and NADH, were manually converted to logical nodes. Also, the cofactor was modified from NAD-dependent to NADP-dependent in the acetate production reactions, as previously described [14], and two other reactions for acetate formation (ALD4 and ALD5) were excluded as they involve mitochondrial genes (details in Supplementary Figure 1). Finally, as shown in Figure 3, the initial conditions of the tokens in SPN simulation were configured to 1, 2, 1 for glucose, ATP and NAD, respectively, in order to allow for glucose consumption in the first steps of the metabolic pathway. The simulation was performed using Gillespie model over 1000 iterations during 60 ms. We compared the scenario where all the transition rates were defined as λ=1, symbolizing the simulation without any external information, against the scenario where all transition rates were corrected by the transcriptome data.
In the first scenario, the simulation resulted in ethanol and glycerol as the main products at a proportion of 50% each ( Figure 4A). On the other hand, in the second scenario, the simulation resulted in the production of ethanol and glycerol at a ratio of 9:1 ( Figure 4B), showing a good correlation with experimental results ( Figure 4C). These results reveal the potential of the new method for recovering the essence of biological systems through computational simulation supported by transcriptomic data.

Ethanol production from xylose
The production of ethanol using xylose as a carbon source is commonly referred to as second generation (2G) ethanol technology. It is a promising process which can increase production and reduce the costs associated with the first-generation (1G) ethanol production from glucose. The 2G process is based on the deconstruction of lignocellulosic biomass to release fermentable sugars (mainly glucose and xylose) [6]. Since wild type yeast cannot consume xylose, it is necessary to perform genetic modifications that enable xylose consumption, such as the insertion of the exogenous xylose pathway genes xylose reductase (XR) and xylitol dehydrogenase (XDH) [15,16]. These specific modifications cause an unbalanced redox in the yeast cell, leading to xylitol accumulation and ultimately hindering the production of ethanol.
In order to evaluate the performance of our new integration method in the study of xylose-fermenting yeast, we created a SPN containing a xylose consumption pathway by insertion of the XR and XDH genes and the pentose phosphate pathway (PPP) into the already described model (Supplementary Figure 1). The stochastic simulation was then performed using as input transcriptomic data from a study where glucose and xylose consumption and their conversion into ethanol by a genetically modified yeast were investigated [12]. The simulation was started with 5 tokens of glucose or xylose, 10 tokens of ATP, 5 tokens of NAD + and 5 tokens of NADP + . The Gillespie model was used over 1000 iterations during 60 ms. The resulting number of tokens of ethanol and xylitol predicted by the simulation with glucose and xylose as carbon sources are shown in Figure 5A. A comparison of these numbers with experimental data, obtained from Zeng et al. [12], shows very similar end-products profiles ( Figure 5B). The lower ethanol yield observed during the consumption of xylose when compared to glucose is related to the accumulation of xylitol, which is produced by the XR reaction and cannot be consumed by the XDH reaction due to the unavailability of the NAD cofactor.
In order to reinforce the correlation between simulation and experimental results, a publicly available metabolomic dataset from yeast genetically modified for xylose consumption was obtained from Wasylenko et al. [13]. Figure 5C shows the number of tokens of internal metabolites and the NAD/NADH ratio obtained through the simulation using glucose and xylose as carbon sources. The comparison with the retrieved metabolomic data ( Figure 5D) reveals good agreement between simulation and experimental results. This is especially true for the high accumulation of intracellular xylulose 5-phosphate (X5P) and the reduced NAD/NADH ratio. The reduction Figure 4: A) Amount of ethanol (red) and glycerol (blue) tokens produced from one token of glucose over 1000 cycles of simulation of the S. cerevisiae glycolytic pathway. As the probabilities are identical, we have equal production of ethanol and glycerol. The number of ethanol or glycerol tokens was normalized by the initial number of glucose tokens. B) Amount of ethanol and glycerol tokens produced from one glucose token in S. cerevisiae glycolytic pathway model, where the transition probability of the Petri Net network was substituted by the normalized expression values. C) Experimental results of ethanol and glycerol production using glucose as carbon source (data extracted from Carvalho-Netto et al. [11]).

Figure 5:
A) Fermentation end-product relative concentrations predicted by SPN simulation using glucose (red) and xylose (blue) as carbon sources. B) Experimental results obtained from Zeng et al. [12]. C) Normalized intensities of metabolite profiles obtained by SPN simulation using glucose (red) and xylose (blue) as carbon sources. D) Metabolite profiles obtained from experimental metabolomic data [13]. X5P and S7P denote xylulose 5-phosphate and sedoheptulose 7-phosphate, respectively.
of the NAD cofactor observed in the simulation and in the metabolomic data is related to xylitol production, as previously discussed. Moreover, the accumulation of xylulose 5-phosphate in the PPP during xylose consumption suggests a metabolism bottleneck that can be a target for genetic manipulation, such as the over expression of the transketolase (TKL1) gene.
The overall results showed that the here described method, which is based on the integration of transcriptomic data with Stochastic Petri Net simulation, can be used to predict end-products and internal metabolites profiles. The possibility of performing stochastic simulations guided by transcriptomic data enables the use of new strategies for interpreting transcriptome and metabolome experiments, besides generating new insights on metabolism. Although PeNTIOS has been implemented for Saccharomyces cerevisiae, the method developed in this work can be applied to other organisms.